mirror of
https://github.com/hwchase17/langchain.git
synced 2025-05-28 10:39:23 +00:00
Different PDF libraries have different strengths and weaknesses. PyMuPDF does a good job of extracting as much content as possible from a document, regardless of source quality, and it is extremely fast (especially compared to Unstructured). https://pymupdf.readthedocs.io/en/latest/index.html
| Name |
|---|
| chains |
| document_loaders |
| embeddings |
| examples |
| llms |
| vectorstores |
| __init__.py |
| test_googlesearch_api.py |
| test_googleserper_api.py |
| test_ngram_overlap_example_selector.py |
| test_nlp_text_splitters.py |
| test_pdf_pagesplitter.py |
| test_serpapi.py |
| test_text_splitter.py |
| test_wolfram_alpha_api.py |
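
The commit note above describes PyMuPDF as a fast way to pull text out of a PDF, and `test_pdf_pagesplitter.py` suggests the text is split per page. The following is a minimal sketch of that idea, not the code from this repository: the function names `extract_page_texts` and `pages_to_records` and the record layout (`page_content` plus `metadata`) are assumptions chosen for illustration. It assumes PyMuPDF is installed and importable as `fitz`.

```python
from typing import Any, Dict, Iterable, List


def extract_page_texts(path: str) -> List[str]:
    """Return the plain text of each page in the PDF at `path`.

    Hypothetical helper. PyMuPDF is imported lazily so the rest of this
    module can be used without it installed.
    """
    import fitz  # PyMuPDF

    doc = fitz.open(path)
    try:
        # Iterating a PyMuPDF document yields its pages in order.
        return [page.get_text() for page in doc]
    finally:
        doc.close()


def pages_to_records(page_texts: Iterable[str], source: str) -> List[Dict[str, Any]]:
    """Wrap per-page text in simple records with a zero-based page index.

    The record layout is an assumption for this sketch, not a library API.
    """
    return [
        {"page_content": text, "metadata": {"source": source, "page": index}}
        for index, text in enumerate(page_texts)
    ]
```

A typical call would be `pages_to_records(extract_page_texts("report.pdf"), "report.pdf")`, where `report.pdf` is a placeholder path. Keeping extraction separate from record building means the second step can be checked without a PDF or PyMuPDF at hand.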