langchain/libs/community/langchain_community/document_loaders/parsers
Louis Auneau 0b532a4ed0
community: Azure Document Intelligence parser features not available fixed (#30370)

- **Description:** The Azure Document Intelligence OCR solution has a
*feature* parameter that enables capabilities such as high-resolution
document analysis and key-value pair extraction. In the LangChain parser,
it could be provided as an `analysis_feature` parameter to the
constructor, which passed it on to the `DocumentIntelligenceClient`.
However, according to the `DocumentIntelligenceClient` [API
Reference](https://learn.microsoft.com/en-us/python/api/azure-ai-documentintelligence/azure.ai.documentintelligence.documentintelligenceclient?view=azure-python),
this is not a valid constructor parameter. It was therefore removed and
instead stored as a parser property that is passed as
`begin_analyze_document`'s `features` parameter (see [API
Reference](https://learn.microsoft.com/en-us/python/api/azure-ai-formrecognizer/azure.ai.formrecognizer.documentanalysisclient?view=azure-python#azure-ai-formrecognizer-documentanalysisclient-begin-analyze-document)).
I also removed the "Supported features" check, since all features are
supported out of the box. I did not add a check that the provided `str`
corresponds to a member of the Azure package's feature enumeration, since
the `ValueError` raised when creating the enumeration object is already
explicit.
One last caveat: some features are not supported for some kinds of
documents. This is covered in Microsoft's documentation, and the
resulting exceptions are also explicit.
- **Issue:** N/A
- **Dependencies:** No
- **Twitter handle:** @Louis___A
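
The change described above can be illustrated with a minimal, self-contained sketch. It does not use the Azure SDK or the actual LangChain parser: `Feature`, `FakeClient`, `SketchParser`, and the `features` constructor argument are hypothetical stand-ins, and the enumeration values are illustrative only. It shows the feature list being stored on the parser and forwarded at analysis time instead of being passed to the client constructor, with the enumeration's own `ValueError` surfacing for unknown names.

```python
from enum import Enum
from typing import Any, Dict, List, Optional


class Feature(str, Enum):
    """Hypothetical stand-in for the SDK's feature enumeration."""

    OCR_HIGH_RESOLUTION = "ocrHighResolution"
    KEY_VALUE_PAIRS = "keyValuePairs"


class FakeClient:
    """Hypothetical stand-in client; its constructor accepts no features."""

    def __init__(self, endpoint: str) -> None:
        self.endpoint = endpoint

    def begin_analyze_document(
        self,
        model_id: str,
        body: Any,
        features: Optional[List[Feature]] = None,
    ) -> Dict[str, Any]:
        # A real client would start a long-running operation here;
        # this sketch just echoes what it was asked to do.
        return {"model_id": model_id, "features": features}


class SketchParser:
    """Stores the requested features and forwards them per analysis call."""

    def __init__(self, endpoint: str, features: Optional[List[str]] = None) -> None:
        # The client is built without any feature argument.
        self.client = FakeClient(endpoint)
        # Enum construction raises ValueError for unknown names,
        # so no separate "supported features" check is needed.
        self.features = [Feature(f) for f in features] if features else None

    def parse(self, data: bytes) -> Dict[str, Any]:
        # Features are passed at analysis time, not at construction time.
        return self.client.begin_analyze_document(
            "prebuilt-layout", data, features=self.features
        )
```

For example, `SketchParser("https://example.invalid", ["keyValuePairs"]).parse(b"...")` forwards `[Feature.KEY_VALUE_PAIRS]` to the analysis call, while an unknown feature name fails at construction with a `ValueError`.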

---------

Co-authored-by: Louis Auneau <louis@handshakehealth.co>
2025-03-26 14:40:14 -04:00
..
html
language Langchain_Community: SQL LanguageParser (#28430) 2024-12-19 20:30:57 +00:00
__init__.py community[minor]: Refactoring PyMuPDF parser, loader and add image blob parsers (#29063) 2025-01-20 15:15:43 -05:00
audio.py community: support in-memory data (Blob.from_data) in all audio parsers (#30262) 2025-03-17 19:52:33 -04:00
doc_intelligence.py community: Azure Document Intelligence parser features not available fixed (#30370) 2025-03-26 14:40:14 -04:00
docai.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
documentloader_adapter.py community: DocumentLoaderAsParser wrapper (#27749) 2024-12-18 12:47:08 -05:00
generic.py
grobid.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
images.py community: fix import exception too constrictive (#30218) 2025-03-17 22:09:02 -04:00
msword.py
pdf.py community[patch]: update PyPDFParser to take into account filters returned as arrays (#30489) 2025-03-26 14:16:54 -04:00
registry.py
txt.py
vsdx.py