mirror of
https://github.com/hwchase17/langchain.git
synced 2025-05-19 14:01:50 +00:00
Adds a new PDF loader built on the existing PDFMiner dependency. The loader outputs HTML, which can be parsed with `BeautifulSoup` to recover structured information such as font sizes, page numbers, and PDF headers/footers. This information, which other PDF loaders may not expose, helps when chunking text semantically into sections. |
||
---|---|---|
document_loaders/examples | ||
retrievers/examples | ||
text_splitters | ||
vectorstores | ||
document_loaders.rst | ||
getting_started.ipynb | ||
retrievers.rst | ||
text_splitters.rst | ||
vectorstores.rst |
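
The commit message above describes chunking PDF text into sections using font-size information recovered from HTML output. A minimal sketch of that idea follows. It rests on several assumptions: the sample HTML is hand-written to resemble PDF-to-HTML output rather than taken from the loader, the names `FontSizeSpans`, `split_into_sections`, and `heading_min_size` are hypothetical, and the standard library's `html.parser` stands in for `BeautifulSoup` so that the example runs without extra installs. The same traversal could be written with `BeautifulSoup`.

```python
import re
from html.parser import HTMLParser

# Hand-written sample meant to resemble HTML produced from a PDF.
# The exact markup a real loader emits is an assumption here.
SAMPLE_HTML = """
<div><span style="font-family: Times; font-size:20px">1. Introduction<br></span></div>
<div><span style="font-family: Times; font-size:11px">PDFs carry layout cues.<br></span></div>
<div><span style="font-family: Times; font-size:11px">Font size is one of them.<br></span></div>
<div><span style="font-family: Times; font-size:20px">2. Method<br></span></div>
<div><span style="font-family: Times; font-size:11px">Group text under headings.<br></span></div>
"""

FONT_SIZE = re.compile(r"font-size:\s*(\d+)px")


class FontSizeSpans(HTMLParser):
    """Collect (font_size, text) pairs from <span style="font-size:..."> tags."""

    def __init__(self):
        super().__init__()
        self.spans = []
        self._size = None  # font size of the span currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            match = FONT_SIZE.search(dict(attrs).get("style") or "")
            self._size = int(match.group(1)) if match else None

    def handle_endtag(self, tag):
        if tag == "span":
            self._size = None

    def handle_data(self, data):
        text = data.strip()
        if text and self._size is not None:
            self.spans.append((self._size, text))


def split_into_sections(html, heading_min_size=16):
    """Start a new section at each span whose font size is >= heading_min_size.

    Text that appears before the first heading is dropped in this sketch.
    """
    parser = FontSizeSpans()
    parser.feed(html)
    sections = []
    for size, text in parser.spans:
        if size >= heading_min_size:
            sections.append({"heading": text, "body": []})
        elif sections:
            sections[-1]["body"].append(text)
    return sections


sections = split_into_sections(SAMPLE_HTML)
for section in sections:
    print(section["heading"], "->", " ".join(section["body"]))
```

The fixed pixel threshold is the simplest possible heading test. For real documents, a threshold derived from the distribution of font sizes in the file would likely be more robust.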