docs: add Docling loader docs (#29104)

### Description
This adds the docs for the Docling document loader.
[Docling](https://github.com/DS4SD/docling) parses PDF, DOCX, PPTX,
HTML, and other formats into a rich, unified representation that
includes document layout and tables, making the parsed documents ready
for generative AI workflows like RAG.

Some references:
- https://research.ibm.com/blog/docling-generative-AI
- https://www.redhat.com/en/blog/docling-missing-document-processing-companion-generative-ai
- [Docling Technical Report](https://arxiv.org/abs/2408.09869)

The introduced `DoclingLoader` enables users to:
- use various document types in their LLM applications with ease and
speed, and
- leverage Docling's rich representation for advanced, document-native
grounding.
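
As a rough illustration of the first point, a minimal usage sketch might look like the following. It assumes the loader is installed from the separate `langchain-docling` package and accepts a `file_path` argument (a local path or URL); the helper name `load_with_docling` is hypothetical and exists only for this example.

```python
from typing import Any, List


def load_with_docling(source: str) -> List[Any]:
    """Load `source` (a local path or URL) into LangChain documents via Docling.

    `load_with_docling` is a hypothetical helper for illustration only.
    """
    # Assumption: DoclingLoader is importable from the `langchain-docling`
    # package. The import is done lazily so that this module can still be
    # imported when the package is not installed.
    from langchain_docling import DoclingLoader

    # Assumption: `file_path` is the constructor argument naming the input.
    loader = DoclingLoader(file_path=source)

    # `load()` is the standard LangChain loader method that returns the
    # parsed content as a list of documents.
    return loader.load()
```

Calling `load_with_docling("https://arxiv.org/pdf/2408.09869")` would then fetch and parse the Docling technical report referenced above, provided the package and network access are available.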

### Issue
Replacing PR #27987 as discussed with @efriis
[here](https://github.com/langchain-ai/langchain/pull/27987#issuecomment-2489354930).

### Dependencies
None

---------

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
This commit is contained in:
Panos Vagenas
2025-01-09 16:15:35 +01:00
committed by GitHub
parent cc55e32924
commit 858f655a25
4 changed files with 621 additions and 0 deletions

@@ -808,6 +808,13 @@ const FEATURE_TABLES = {
source: "API service that can be deployed locally, hosted version has free credits.",
api: "API",
apiLink: "https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.firecrawl.FireCrawlLoader.html"
},
{
name: "Docling",
link: "docling",
source: "Uses Docling to load and parse web pages",
api: "Package",
apiLink: "https://python.langchain.com/docs/integrations/document_loaders/docling/"
}
]
},
@@ -890,6 +897,13 @@ const FEATURE_TABLES = {
source: "Load PDF files using UpstageDocumentParseLoader",
api: "Package",
apiLink: "https://python.langchain.com/api_reference/upstage/document_parse/langchain_upstage.document_parse.UpstageDocumentParseLoader.html"
},
{
name: "Docling",
link: "docling",
source: "Load PDF files using Docling",
api: "Package",
apiLink: "https://python.langchain.com/docs/integrations/document_loaders/docling/"
}
]
},
@@ -932,6 +946,12 @@ const FEATURE_TABLES = {
source: "HTML files",
apiLink: "https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.html_bs.BSHTMLLoader.html"
},
{
name: "DoclingLoader",
link: "../../integrations/document_loaders/docling",
source: "Various file types (see https://ds4sd.github.io/docling/)",
apiLink: "https://python.langchain.com/docs/integrations/document_loaders/docling/"
},
]
},
vectorstores: {