langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-06-05 06:33:20 +00:00


Martin Triska 7a9149f5dd community: ZeroxPDFLoader (#27800)	2024-11-07 03:14:57 +00:00

# OCR-based PDF loader

This implements the [Zerox](https://github.com/getomni-ai/zerox) PDF document loader. Zerox takes a simple but very powerful (though slower and more costly) approach to parsing PDF documents: it converts the PDF to a series of images and passes them to a vision model, requesting the contents as markdown. It is especially suitable for complex PDFs that other alternatives do not parse well.

## Example use:

```python
import os

from langchain_community.document_loaders.pdf import ZeroxPDFLoader

os.environ["OPENAI_API_KEY"] = ""  ## your-api-key
model = "gpt-4o-mini"  ## openai model
pdf_url = "https://assets.ctfassets.net/f1df9zr7wr1a/soP1fjvG1Wu66HJhu3FBS/034d6ca48edb119ae77dec5ce01a8612/OpenAI_Sacra_Teardown.pdf"
loader = ZeroxPDFLoader(file_path=pdf_url, model=model)
docs = loader.load()
```

The Zerox library supports a wide range of providers/models. See the Zerox documentation for details.

- Dependencies: `zerox`
- Twitter handle: @martintriska1

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
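The approach the commit message describes (render each PDF page to an image, then ask a vision model for that page's contents as markdown) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the Zerox or LangChain implementation: `PageDocument`, `ocr_pdf_to_documents`, the prompt text, the one-document-per-page split, and the two injected callables are all hypothetical. The fakes stand in where a real PDF rasterizer and vision-model client would go.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class PageDocument:
    """Hypothetical stand-in for a LangChain Document: page text plus metadata."""
    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


# Hypothetical callable types:
# - a renderer that turns a PDF path into one image (as bytes) per page
# - a vision-model call that turns one page image plus a prompt into markdown
RenderPages = Callable[[str], Iterable[bytes]]
VisionToMarkdown = Callable[[bytes, str], str]

# Assumed prompt wording; not taken from Zerox.
PROMPT = "Convert this page to markdown. Return only the markdown."


def ocr_pdf_to_documents(
    path: str,
    render_pages: RenderPages,
    vision_to_markdown: VisionToMarkdown,
) -> List[PageDocument]:
    """PDF -> page images -> vision model -> one markdown document per page."""
    documents: List[PageDocument] = []
    for page_number, image in enumerate(render_pages(path), start=1):
        # One vision-model call per page image.
        markdown = vision_to_markdown(image, PROMPT)
        documents.append(
            PageDocument(
                page_content=markdown,
                metadata={"source": path, "page": page_number},
            )
        )
    return documents


# Fakes standing in for a real rasterizer and a real vision model.
def fake_render_pages(path: str) -> List[bytes]:
    return [b"page-1-image", b"page-2-image"]


def fake_vision_to_markdown(image: bytes, prompt: str) -> str:
    return "# " + image.decode()


docs = ocr_pdf_to_documents("report.pdf", fake_render_pages, fake_vision_to_markdown)
print([d.page_content for d in docs])  # ['# page-1-image', '# page-2-image']
```

Injecting the renderer and the model call keeps the sketch testable offline; swapping in real implementations would not change the loop. The loop also makes the cost trade-off mentioned above concrete: every page costs one vision-model request.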
_templates
additional_resources	docs: make docs mdxv2 compatible (#26798)	2024-09-23 21:24:23 -07:00
changes/changelog	docs: Add langchain over time (#21434)	2024-05-10 00:34:35 +00:00
concepts	docs: Update `messages.mdx` (#27856)	2024-11-04 20:36:31 +00:00
contributing	fix the grammar and markdown component (#27657)	2024-10-30 14:47:26 +00:00
example_data	docs[minor]: Add "Build a PDF ingestion and Question/Answering system" tutorial (#22570)	2024-06-05 17:09:28 -07:00
how_to	update llm graph transformer documentation (#27905)	2024-11-05 11:54:26 -05:00
integrations	community: ZeroxPDFLoader (#27800)	2024-11-07 03:14:57 +00:00
troubleshooting/errors	docs: fix more links (#27809)	2024-10-31 17:15:46 -04:00
tutorials	docs: Update VectorStore api reference url in rag.ipynb (#27841)	2024-11-04 20:27:03 +00:00
versions	docs: fix more links (#27809)	2024-10-31 17:15:46 -04:00
.gitignore
introduction.mdx	docs: fix more links (#27809)	2024-10-31 17:15:46 -04:00
people.mdx
security.md	community[minor]: add proxy support to RecursiveUrlLoader (#27364)	2024-10-16 16:29:59 +00:00