Mirror of https://github.com/hwchase17/langchain.git, synced 2025-09-12 00:11:17 +00:00
community[minor]: Azure DocumentIntelligenceLoader/Parser support update with latest SDK (#14389)
- **Description:** Add DocumentIntelligenceLoader & DocumentIntelligenceParser implementation using the latest Azure Document Intelligence SDK with markdown support. The core logic resides in DocumentIntelligenceParser; DocumentIntelligenceLoader is a thin wrapper around the parser. The parser takes api_endpoint and api_key and creates a DocumentIntelligenceClient for the user. Four parsing modes are supported:
  1. Markdown (default)
  2. Single
  3. Page
  4. Object

  Unit tests and the notebook are updated accordingly.
- **Dependencies:** Azure Document Intelligence SDK: azure-ai-documentintelligence [azure-sdk-for-python/sdk/documentintelligence/azure-ai-documentintelligence at 7c42462ac662522a6fd21b17d2a20f4cd40d0356 · Azure/azure-sdk-for-python (github.com)](https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAzure%2Fazure-sdk-for-python%2Ftree%2F7c42462ac662522a6fd21b17d2a20f4cd40d0356%2Fsdk%2Fdocumentintelligence%2Fazure-ai-documentintelligence&data=05%7C01%7CZifei.Qian%40microsoft.com%7C298225aa3e31468a863108dbf07374ff%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638368150928704292%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=oE0Sl4HERnMKdbkV9KgBV46Z2xytcQAShdTWf7ZNl%2Bs%3D&reserved=0).

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
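The mode list above can be illustrated with a minimal, library-free sketch of how one analysis result might be split into documents. It assumes a simplified result that holds the full text in `content` and each page as a list of line strings; `PageStub`, `AnalyzeResultStub`, `SimpleDoc`, and `parse_result` are hypothetical names, not the LangChain or Azure SDK API. Markdown and single mode both return the whole content as one document here, on the assumption that they differ only in whether the service was asked for markdown output. Object mode is not modeled, because the description does not say what it produces.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PageStub:
    """Hypothetical stand-in for one analyzed page."""
    page_number: int
    lines: List[str]


@dataclass
class AnalyzeResultStub:
    """Hypothetical stand-in for the service's analysis result."""
    content: str  # full document text (assumed markdown in markdown mode)
    pages: List[PageStub] = field(default_factory=list)


@dataclass
class SimpleDoc:
    """Hypothetical minimal document: text plus metadata."""
    page_content: str
    metadata: Dict[str, int] = field(default_factory=dict)


def parse_result(result: AnalyzeResultStub, mode: str = "markdown") -> List[SimpleDoc]:
    """Split an analysis result into documents according to `mode` (sketch)."""
    if mode in ("markdown", "single"):
        # One document holding the entire content.
        return [SimpleDoc(page_content=result.content)]
    if mode == "page":
        # One document per page; joining lines with spaces is an assumption.
        return [
            SimpleDoc(page_content=" ".join(page.lines),
                      metadata={"page": page.page_number})
            for page in result.pages
        ]
    # "object" mode (and anything else) is outside this sketch.
    raise ValueError(f"mode not covered by this sketch: {mode!r}")
```

Returning a list rather than a generator means an unsupported mode fails at call time instead of on first iteration.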
@@ -542,9 +542,17 @@ class AmazonTextractPDFParser(BaseBlobParser):
 
 
 class DocumentIntelligenceParser(BaseBlobParser):
     """Loads a PDF with Azure Document Intelligence
-    (formerly Forms Recognizer) and chunks at character level."""
+    (formerly Form Recognizer) and chunks at character level."""
 
     def __init__(self, client: Any, model: str):
+        warnings.warn(
+            "langchain.document_loaders.parsers.pdf.DocumentIntelligenceParser"
+            "and langchain.document_loaders.pdf.DocumentIntelligenceLoader"
+            " are deprecated. Please upgrade to "
+            "langchain.document_loaders.DocumentIntelligenceLoader "
+            "for any file parsing purpose using Azure Document Intelligence "
+            "service."
+        )
         self.client = client
         self.model = model
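The hunk makes the old parser's constructor emit a warning every time it is instantiated. Because `warnings.warn` is called without a category, Python issues a `UserWarning`. The sketch below mirrors that pattern with a hypothetical `LegacyParser` class and a hypothetical `build_and_capture` helper (neither is LangChain code) and shows how a caller can capture the warning with `warnings.catch_warnings`. Adjacent string literals are joined with no separator: in the hunk, the first literal ends in `DocumentIntelligenceParser` and the second starts with `and`, so the emitted text runs those two words together. The sketch writes the space explicitly inside a literal.

```python
import warnings
from typing import Any


class LegacyParser:
    """Hypothetical stand-in for a parser whose constructor is deprecated."""

    def __init__(self, client: Any, model: str):
        # No category argument, so Python emits a UserWarning, as in the hunk.
        # The space between sentences lives inside the first literal, because
        # adjacent literals are concatenated with no separator.
        warnings.warn(
            "LegacyParser is deprecated. "
            "Please upgrade to the new loader."
        )
        self.client = client
        self.model = model


def build_and_capture() -> list:
    """Instantiate LegacyParser and return the list of recorded warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record even if already shown once
        LegacyParser(client=None, model="hypothetical-model")
    return caught
```

Recording inside `catch_warnings` restores the previous warning filters on exit, so a test can assert on the deprecation without changing global state.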