Mirror of https://github.com/hwchase17/langchain.git, synced 2025-09-08 14:31:55 +00:00
PyMuPDF4LLM integration to LangChain (#29953)
## PyMuPDF4LLM integration to LangChain for PDF content extraction in Markdown format

### Description

[PyMuPDF4LLM](https://github.com/pymupdf/RAG) makes it easier to extract PDF content in Markdown format, as needed by LLM and RAG applications. (License: GNU Affero General Public License v3.0)

[langchain-pymupdf4llm](https://github.com/lakinduboteju/langchain-pymupdf4llm) integrates PyMuPDF4LLM into LangChain as a Document Loader. (License: MIT License)

This pull request introduces the integration of [PyMuPDF4LLM](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm) into the LangChain project as an integration package: [`langchain-pymupdf4llm`](https://github.com/lakinduboteju/langchain-pymupdf4llm). The main changes add new Jupyter notebooks that document the integration and update the package configuration file to include the new package.

### Documentation

* `docs/docs/integrations/providers/pymupdf4llm.ipynb`: Added a new Jupyter notebook documenting the integration of `PyMuPDF4LLM` with LangChain, including installation instructions and class imports.
* `docs/docs/integrations/document_loaders/pymupdf4llm.ipynb`: Added a new Jupyter notebook documenting in detail how to use `langchain-pymupdf4llm` as a LangChain integration package.

### Package registration

* `libs/packages.yml`: Updated the package configuration file to include the `langchain-pymupdf4llm` package.

### Additional information

* Related to: https://github.com/langchain-ai/langchain/pull/29848

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
```diff
@@ -888,6 +888,13 @@ const FEATURE_TABLES = {
         api: "Package",
         apiLink: "https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.pdf.PyMuPDFLoader.html"
     },
+    {
+        name: "PyMuPDF4LLM",
+        link: "pymupdf4llm",
+        source: "Load PDF content to Markdown using PyMuPDF4LLM",
+        api: "Package",
+        apiLink: "https://github.com/lakinduboteju/langchain-pymupdf4llm"
+    },
     {
         name: "PDFMiner",
         link: "pdfminer",
```
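The hunk adds one object literal to a loader list inside `FEATURE_TABLES`. Every entry visible in the diff uses the same keys: `name`, `link` (apparently the slug of the integration's docs page, matching the new `pymupdf4llm.ipynb` notebook), `source`, `api`, and `apiLink`. The following is a hedged sketch, not code from the LangChain docs. `REQUIRED_FIELDS`, `missingFields`, and `linksToApiReference` are hypothetical helpers for checking an entry of that shape before adding it.

```javascript
// Hypothetical helpers (not part of the LangChain docs source) for checking a
// feature-table entry shaped like the ones in the diff.
const REQUIRED_FIELDS = ["name", "link", "source", "api", "apiLink"];

// Return the required fields that are missing or not non-empty strings.
function missingFields(entry) {
  return REQUIRED_FIELDS.filter(
    (field) => typeof entry[field] !== "string" || entry[field].length === 0
  );
}

// True when apiLink points into the LangChain Python API reference
// (as the PyMuPDF entry's does), false for any other URL.
function linksToApiReference(entry) {
  return entry.apiLink.startsWith("https://python.langchain.com/api_reference/");
}

// The entry added by this commit, copied from the diff.
const pymupdf4llmEntry = {
  name: "PyMuPDF4LLM",
  link: "pymupdf4llm",
  source: "Load PDF content to Markdown using PyMuPDF4LLM",
  api: "Package",
  apiLink: "https://github.com/lakinduboteju/langchain-pymupdf4llm",
};

console.log(missingFields(pymupdf4llmEntry));
console.log(linksToApiReference(pymupdf4llmEntry));
```

`missingFields` returns an empty array for the new entry. `linksToApiReference` returns `false` for it because its `apiLink` points to the package's GitHub repository, whereas the neighbouring PyMuPDF entry links into the LangChain API reference.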