Mirror of https://github.com/hwchase17/langchain.git (synced 2025-09-07 14:03:26 +00:00)
docs: add Docling loader docs (#29104)
### Description

This adds the docs for the Docling document loader. [Docling](https://github.com/DS4SD/docling) parses PDF, DOCX, PPTX, HTML, and other formats into a rich unified representation including document layout, tables etc., making them ready for generative AI workflows like RAG.

Some references:

- https://research.ibm.com/blog/docling-generative-AI
- https://www.redhat.com/en/blog/docling-missing-document-processing-companion-generative-ai
- [Docling Technical Report](https://arxiv.org/abs/2408.09869)

The introduced `DoclingLoader` enables users to:

- use various document types in their LLM applications with ease and speed, and
- leverage Docling's rich representation for advanced, document-native grounding.

### Issue

Replacing PR #27987 as discussed with @efriis [here](https://github.com/langchain-ai/langchain/pull/27987#issuecomment-2489354930).

### Dependencies

None

---------

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
@@ -808,6 +808,13 @@ const FEATURE_TABLES = {
         source: "API service that can be deployed locally, hosted version has free credits.",
         api: "API",
         apiLink: "https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.firecrawl.FireCrawlLoader.html"
       },
+      {
+        name: "Docling",
+        link: "docling",
+        source: "Uses Docling to load and parse web pages",
+        api: "Package",
+        apiLink: "https://python.langchain.com/docs/integrations/document_loaders/docling/"
+      }
     ]
   },
@@ -890,6 +897,13 @@ const FEATURE_TABLES = {
         source: "Load PDF files using UpstageDocumentParseLoader",
         api: "Package",
         apiLink: "https://python.langchain.com/api_reference/upstage/document_parse/langchain_upstage.document_parse.UpstageDocumentParseLoader.html"
       },
+      {
+        name: "Docling",
+        link: "docling",
+        source: "Load PDF files using Docling",
+        api: "Package",
+        apiLink: "https://python.langchain.com/docs/integrations/document_loaders/docling/"
+      }
     ]
   },
@@ -932,6 +946,12 @@ const FEATURE_TABLES = {
         source: "HTML files",
         apiLink: "https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.html_bs.BSHTMLLoader.html"
       },
+      {
+        name: "DoclingLoader",
+        link: "../../integrations/document_loaders/docling",
+        source: "Various file types (see https://ds4sd.github.io/docling/)",
+        apiLink: "https://python.langchain.com/docs/integrations/document_loaders/docling/"
+      },
     ]
   },
   vectorstores: {
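Each hunk in this commit adds a plain object to a list inside `FEATURE_TABLES`. As a minimal sketch of how such an entry could be sanity-checked before it is added, the snippet below tests that an entry carries non-empty string values for the fields shared by all three new entries. The `REQUIRED_FIELDS` list and the `missingFields` helper are hypothetical illustrations and are not part of the LangChain docs source; the sample entry is copied from the `DoclingLoader` hunk.

```javascript
// Hypothetical helper (an assumption, not LangChain code): the fields below are
// the ones shared by every Docling entry added in this commit.
const REQUIRED_FIELDS = ["name", "link", "source", "apiLink"];

// Returns the names of required fields that are absent or not non-empty strings.
function missingFields(item) {
  return REQUIRED_FIELDS.filter(
    (field) => typeof item[field] !== "string" || item[field].length === 0
  );
}

// Entry copied from the DoclingLoader hunk of this commit.
const doclingEntry = {
  name: "DoclingLoader",
  link: "../../integrations/document_loaders/docling",
  source: "Various file types (see https://ds4sd.github.io/docling/)",
  apiLink: "https://python.langchain.com/docs/integrations/document_loaders/docling/",
};

// A complete entry yields an empty list of missing fields.
console.log(missingFields(doclingEntry));

// An incomplete entry reports what still needs to be filled in.
console.log(missingFields({ name: "Docling" }));
```

The check is deliberately limited to the four common fields, since the `api` field appears in only two of the three tables touched here.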