langchain/docs/modules/indexes
Chetanya Rastogi 50c511d75f
Add new loader to load pdf as html content (#2607)
Adds a new PDF loader using the existing dependency on PDFMiner.

The new loader can help with chunking text semantically into sections:
the output HTML can be parsed with `BeautifulSoup` to recover richer,
more structured information (font sizes, page numbers, PDF
headers/footers, etc.) that may not be available from the other PDF
loaders.
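
The font-size idea in the commit message can be sketched roughly as follows. This is only an illustration under stated assumptions, not the loader's actual code: the sample HTML, the `FontSizeParser` class, the `split_into_sections` helper and the `heading_size` threshold are all hypothetical, and the standard library's `html.parser` stands in for `BeautifulSoup` so the sketch runs without extra dependencies.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample resembling HTML produced by a PDF-to-HTML conversion.
# Real output will differ in structure and styling.
SAMPLE_HTML = """
<div><span style="font-family: Times; font-size:18px">Introduction</span></div>
<div><span style="font-family: Times; font-size:10px">First paragraph of body text.</span></div>
<div><span style="font-family: Times; font-size:18px">Methods</span></div>
<div><span style="font-family: Times; font-size:10px">Second paragraph of body text.</span></div>
"""

FONT_SIZE = re.compile(r"font-size:\s*(\d+)px")


class FontSizeParser(HTMLParser):
    """Collect (font_size, text) pairs from <span> elements with a font-size style."""

    def __init__(self):
        super().__init__()
        self.snippets = []
        self._size = None  # font size of the span currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            match = FONT_SIZE.search(dict(attrs).get("style") or "")
            self._size = int(match.group(1)) if match else None

    def handle_endtag(self, tag):
        if tag == "span":
            self._size = None

    def handle_data(self, data):
        text = data.strip()
        if text and self._size is not None:
            self.snippets.append((self._size, text))


def split_into_sections(html, heading_size):
    """Start a new section at each snippet whose font size is >= heading_size.

    Text that appears before the first heading is dropped in this sketch.
    """
    parser = FontSizeParser()
    parser.feed(html)
    sections = []
    for size, text in parser.snippets:
        if size >= heading_size:
            sections.append({"heading": text, "body": []})
        elif sections:
            sections[-1]["body"].append(text)
    return sections


sections = split_into_sections(SAMPLE_HTML, heading_size=18)
for section in sections:
    print(section["heading"], "->", " ".join(section["body"]))
```

In practice the heading threshold would have to be chosen per document, for example by inspecting which font sizes occur most often in the converted HTML.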
2023-04-09 17:57:25 -07:00
document_loaders/examples Add new loader to load pdf as html content (#2607) 2023-04-09 17:57:25 -07:00
retrievers/examples Harrison/docs cleanup (#2633) 2023-04-09 12:55:22 -07:00
text_splitters Update huggingface_length_function.ipynb (#2203) 2023-03-30 20:43:58 -07:00
vectorstores Harrison/redis improvements (#2528) 2023-04-06 23:21:22 -07:00
document_loaders.rst big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
getting_started.ipynb Fix docstring in indexes/getting-started (#2452) 2023-04-06 12:48:08 -07:00
retrievers.rst big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
text_splitters.rst big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
vectorstores.rst big docs refactor (#1978) 2023-03-26 19:49:46 -07:00