Mirror of https://github.com/hwchase17/langchain.git (synced 2026-01-21 13:52:48 +00:00)

## PyMuPDF4LLM integration into LangChain for PDF content extraction in Markdown format

### Description

[PyMuPDF4LLM](https://github.com/pymupdf/RAG) makes it easier to extract PDF content in Markdown format, as needed for LLM and RAG applications. (License: GNU Affero General Public License v3.0)

[langchain-pymupdf4llm](https://github.com/lakinduboteju/langchain-pymupdf4llm) integrates PyMuPDF4LLM with LangChain as a Document Loader. (License: MIT License)

This pull request adds [PyMuPDF4LLM](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm) to the LangChain project as an integration package: [`langchain-pymupdf4llm`](https://github.com/lakinduboteju/langchain-pymupdf4llm). The main changes are two new Jupyter notebooks that document the integration and an update to the package configuration file that registers the new package.

### Documentation:

* `docs/docs/integrations/providers/pymupdf4llm.ipynb`: Added a new Jupyter notebook that documents the integration of `PyMuPDF4LLM` with LangChain, including installation instructions and class imports.
* `docs/docs/integrations/document_loaders/pymupdf4llm.ipynb`: Added a new Jupyter notebook that documents in detail how to use `langchain-pymupdf4llm` as a LangChain integration package.

### Package registration:

* `libs/packages.yml`: Updated the package configuration file to include the `langchain-pymupdf4llm` package.

### Additional information

* Related to: https://github.com/langchain-ai/langchain/pull/29848

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
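
### Usage sketch

For orientation, the following is a minimal sketch of how the Document Loader described above might be used. It assumes that `langchain-pymupdf4llm` is installed and exposes a `PyMuPDF4LLMLoader` class that takes a PDF file path and implements the standard LangChain `load()` method. The helper names `load_pdf_as_markdown` and `join_markdown` are hypothetical and exist only for illustration; the notebooks added in this pull request remain the authoritative reference.

```python
from pathlib import Path


def load_pdf_as_markdown(pdf_path: str) -> list:
    """Load a PDF into LangChain Documents whose page_content is Markdown.

    Hypothetical helper, not part of langchain-pymupdf4llm.
    """
    if not Path(pdf_path).is_file():
        raise FileNotFoundError(pdf_path)

    # Imported lazily so this module can be imported without the package.
    # Assumption: the package exposes PyMuPDF4LLMLoader at the top level
    # and accepts the PDF path as its first argument.
    from langchain_pymupdf4llm import PyMuPDF4LLMLoader

    loader = PyMuPDF4LLMLoader(pdf_path)
    return loader.load()


def join_markdown(docs) -> str:
    """Concatenate the Markdown of each Document, separated by blank lines.

    Hypothetical helper; relies only on the `page_content` attribute.
    """
    return "\n\n".join(doc.page_content.strip() for doc in docs)
```

Calling `join_markdown(load_pdf_as_markdown("example.pdf"))` would then yield one Markdown string for the whole file, where `example.pdf` is a placeholder path.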