langchain/libs/community/langchain_community/document_loaders
Louis Auneau 0b532a4ed0
community: Azure Document Intelligence parser features not available fixed (#30370)

- **Description:** The Azure Document Intelligence OCR service has a
*features* parameter that enables optional capabilities such as
high-resolution document analysis and key-value pair extraction. In the
LangChain parser, these could be provided as an `analysis_feature`
parameter to the constructor, which was passed on to the
`DocumentIntelligenceClient`.
However, according to the `DocumentIntelligenceClient` [API
Reference](https://learn.microsoft.com/en-us/python/api/azure-ai-documentintelligence/azure.ai.documentintelligence.documentintelligenceclient?view=azure-python),
this is not a valid constructor parameter. It was therefore removed and
is instead stored as a parser property that is passed to the `features`
parameter of `begin_analyze_document` (see [API
Reference](https://learn.microsoft.com/en-us/python/api/azure-ai-formrecognizer/azure.ai.formrecognizer.documentanalysisclient?view=azure-python#azure-ai-formrecognizer-documentanalysisclient-begin-analyze-document)).
I also removed the check for "Supported features", since all features are
supported out of the box. I did not check whether the provided `str`
actually corresponds to the Azure package's enumeration of features, since
the `ValueError` raised when creating the enumeration object is explicit
enough.
One last caveat: some features are not supported for some kinds of
documents. This is documented in the Microsoft documentation, and the
exceptions raised are also explicit.
- **Issue:** N/A
- **Dependencies:** No
- **Twitter handle:** @Louis___A
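
The change described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual LangChain or Azure code: `Feature`, `FakeClient`, and `SketchParser` are hypothetical stand-ins, and the enumeration members are only illustrative. It shows the feature strings being converted to an enumeration in the parser constructor (where an unknown string raises `ValueError`), stored on the parser, and passed per call through `features` rather than to the client constructor.

```python
from enum import Enum
from typing import List, Optional


class Feature(str, Enum):
    """Hypothetical stand-in for the Azure package's feature enumeration."""

    OCR_HIGH_RESOLUTION = "ocrHighResolution"
    KEY_VALUE_PAIRS = "keyValuePairs"


class FakeClient:
    """Hypothetical client: takes no feature argument at construction time."""

    def __init__(self, endpoint: str) -> None:
        self.endpoint = endpoint

    def begin_analyze_document(
        self,
        model_id: str,
        body: bytes,
        features: Optional[List[Feature]] = None,
    ) -> dict:
        # A real client would start a long-running operation; this stub only
        # echoes what it received so the data flow can be inspected.
        return {"model_id": model_id, "features": features or []}


class SketchParser:
    """Hypothetical parser that keeps the features as its own property."""

    def __init__(
        self, endpoint: str, analysis_features: Optional[List[str]] = None
    ) -> None:
        # The client is built without any feature argument.
        self.client = FakeClient(endpoint)
        # Feature("unknown") raises ValueError, so no extra validation is done.
        self.features = (
            [Feature(name) for name in analysis_features]
            if analysis_features
            else None
        )

    def parse(self, data: bytes) -> dict:
        # Features are supplied per analysis call, not at client construction.
        return self.client.begin_analyze_document(
            "prebuilt-layout", data, features=self.features
        )
```

Constructing `SketchParser("https://example.invalid", ["ocrHighResolution"])` and calling `parse` forwards the converted enumeration members to the stub client, while an unrecognized string fails at construction time with the enumeration's own `ValueError`.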

---------

Co-authored-by: Louis Auneau <louis@handshakehealth.co>
2025-03-26 14:40:14 -04:00
..
blob_loaders community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
parsers community: Azure Document Intelligence parser features not available fixed (#30370) 2025-03-26 14:40:14 -04:00
__init__.py community[patch]: Refactoring PDF loaders: 01 prepare (#29062) 2025-01-07 11:00:04 -05:00
acreom.py community[patch]: Add missing annotations (#24890) 2024-07-31 18:13:44 +00:00
airbyte_json.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
airbyte.py
airtable.py docs: fix kwargs docstring (#25010) 2024-08-02 19:54:54 -07:00
apify_dataset.py docs: update apify integration (#29553) 2025-02-12 20:02:55 -08:00
arcgis_loader.py
arxiv.py docs: Arxiv docs update (#23871) 2024-07-05 11:43:51 -04:00
assemblyai.py community[patch]: docstrings update (#20301) 2024-04-11 16:23:27 -04:00
astradb.py multiple: update removal targets (#25361) 2024-08-14 09:50:39 -04:00
async_html.py community[patch]: Release 0.2.11 (#24989) 2024-08-02 20:08:44 +00:00
athena.py community: make AthenaLoader profile_name optional and fix type hint (#24958) 2024-08-05 14:28:58 +00:00
azlyrics.py
azure_ai_data.py
azure_blob_storage_container.py
azure_blob_storage_file.py
baiducloud_bos_directory.py
baiducloud_bos_file.py
base_o365.py community: update base_o365.py (#29657) 2025-02-07 08:43:29 -05:00
base.py
bibtex.py
bigquery.py multiple: update removal targets (#25361) 2024-08-14 09:50:39 -04:00
bilibili.py community[patch]: fix bilibili loader handling of multi-page content (#30283) 2025-03-14 14:53:03 -04:00
blackboard.py community: add flag to toggle progress bar (#24463) 2024-07-20 13:18:02 +00:00
blockchain.py community: add supported blockchains to Blockchain Document Loader (#25428) 2024-08-23 14:39:42 +00:00
brave_search.py
browserbase.py community: updated Browserbase loader (#21757) 2024-05-16 08:21:23 -07:00
browserless.py
cassandra.py community[minor]: Add Cassandra ByteStore (#22064) 2024-05-23 10:46:23 -04:00
chatgpt.py
chm.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
chromium.py community[minor]: add user agent for web scraping loaders (#22480) 2024-06-05 15:20:34 +00:00
college_confidential.py
concurrent.py community[patch]: import flattening fix (#20110) 2024-04-10 13:01:19 -04:00
confluence.py community: add keep_newlines parameter to process_pages method (#30365) 2025-03-19 08:57:59 -04:00
conllu.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
couchbase.py
csv_loader.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
cube_semantic.py community: cube document loader - do not load non-public dimensions and measures (#30286) 2025-03-14 15:07:56 -04:00
datadog_logs.py
dataframe.py Update dataframe.py (#28871) 2024-12-22 19:16:16 -05:00
dedoc.py community[minor]: added new document loaders based on dedoc library (#24303) 2024-07-23 02:04:53 +00:00
diffbot.py
directory.py community: glob multiple patterns when using DirectoryLoader (#22852) 2024-06-18 09:24:50 -07:00
discord.py
doc_intelligence.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
docugami.py multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
docusaurus.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
dropbox.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
duckdb_loader.py
email.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
epub.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
etherscan.py
evernote.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
excel.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
facebook_chat.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
fauna.py
figma.py
firecrawl.py community: add 'extract' mode to FireCrawlLoader for structured data extraction (#30242) 2025-03-17 15:15:57 +00:00
gcs_directory.py multiple: update removal targets (#25361) 2024-08-14 09:50:39 -04:00
gcs_file.py multiple: update removal targets (#25361) 2024-08-14 09:50:39 -04:00
generic.py community[patch]: import flattening fix (#20110) 2024-04-10 13:01:19 -04:00
geodataframe.py
git.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
gitbook.py community: add flag to toggle progress bar (#24463) 2024-07-20 13:18:02 +00:00
github.py multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
glue_catalog.py community[minor]: Add glue catalog loader (#20220) 2024-04-16 11:39:23 -04:00
google_speech_to_text.py multiple: update removal targets (#25361) 2024-08-14 09:50:39 -04:00
googledrive.py multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
gutenberg.py
helpers.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
hn.py
html_bs.py multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
html.py community: add init for UnstructuredHTMLLoader to solve pathlib paths (#29091) 2025-01-08 10:19:27 -05:00
hugging_face_dataset.py
hugging_face_model.py community[patch]: Add missing annotations (#24890) 2024-07-31 18:13:44 +00:00
ifixit.py
image_captions.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
image.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
imsdb.py
iugu.py
joplin.py
json_loader.py community[minor]: Fix json._validate_metadata_func() (#22842) 2024-12-13 21:24:20 +00:00
kinetica_loader.py community[patch]: Kinetica Integrations handled error in querying; quotes in table names; updated gpudb API (#22724) 2024-06-11 10:01:26 -04:00
lakefs.py
larksuite.py community[minor]: Add LarkSuite wiki document loader. (#21016) 2024-04-29 10:37:50 -04:00
llmsherpa.py community[minor]: add support for llmsherpa (#19741) 2024-03-29 16:04:57 -07:00
markdown.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
mastodon.py
max_compute.py
mediawikidump.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
merge.py
mhtml.py community[patch]: upgrade to recent version of mypy (#21616) 2024-05-13 14:55:07 -04:00
mintbase.py community[minor]: add mintbase loader to langchain (#20089) 2024-04-30 04:11:56 +00:00
modern_treasury.py
mongodb.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
needle.py community: add Needle retriever and document loader integration (#28157) 2024-12-03 22:06:25 +00:00
news.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
notebook.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
notion.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
notiondb.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
nuclia.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
obs_directory.py
obs_file.py add mode arg to OBSFileLoader.load() method (#29246) 2025-01-16 11:09:04 -05:00
obsidian.py community[patch]: Add missing annotations (#24890) 2024-07-31 18:13:44 +00:00
odt.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
onedrive_file.py multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
onedrive.py community: Allow other than default parsers in SharePointLoader and OneDriveLoader (#27716) 2024-11-06 17:44:34 -05:00
onenote.py community[patch]: Fix validation error in SettingsConfigDict across multiple Langchain modules (#26852) 2024-09-25 10:02:14 -04:00
open_city_data.py
oracleadb_loader.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
oracleai.py community[minor]: Oraclevs integration (#21123) 2024-05-04 03:15:35 +00:00
org_mode.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
pdf.py community[minor]: 05 - Refactoring PyPDFium2 parser (#29625) 2025-02-07 21:31:12 -05:00
pebblo.py community[minor]: [Pebblo] Enhance PebbloSafeLoader to take anonymize flag (#26812) 2024-09-25 09:33:06 -04:00
polars_dataframe.py
powerpoint.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
psychic.py multiple: Remove unnecessary Ruff suppression comments (#21050) 2024-04-30 17:13:48 +00:00
pubmed.py community[patch]: upgrade to recent version of mypy (#21616) 2024-05-13 14:55:07 -04:00
pyspark_dataframe.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
python.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
quip.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
readthedocs.py
recursive_url_loader.py docs: fix broken Appearance of langchain_community/document_loaders/recursive_url_loader API Reference (#29305) 2025-01-20 10:56:59 -05:00
reddit.py
roam.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
rocksetdb.py
rspace.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
rss.py multiple: Remove unnecessary Ruff suppression comments (#21050) 2024-04-30 17:13:48 +00:00
rst.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
rtf.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
s3_directory.py
s3_file.py community[patch]: support unstructured_kwargs for s3 loader (#15473) 2024-03-27 22:03:48 +00:00
scrapfly.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
scrapingant.py community[minor]: Add ScrapingAnt Loader Community Integration (#24514) 2024-07-24 21:11:43 -04:00
sharepoint.py community: Allow other than default parsers in SharePointLoader and OneDriveLoader (#27716) 2024-11-06 17:44:34 -05:00
sitemap.py community[patch]: SitemapLoader restrict depth of parsing sitemap (CVE-2024-2965) (#22903) 2024-06-14 13:04:40 -04:00
slack_directory.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
snowflake_loader.py community[patch]: upgrade to recent version of mypy (#21616) 2024-05-13 14:55:07 -04:00
spider.py doc list not empty (#21208) 2024-05-20 08:24:06 -07:00
spreedly.py
sql_database.py community[patch]: restore compatibility with SQLAlchemy 1.x (#22546) 2024-06-19 17:58:57 +00:00
srt.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
stripe.py
surrealdb.py
telegram.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
tencent_cos_directory.py
tencent_cos_file.py
tensorflow_datasets.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
text.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
tidb.py
tomarkdown.py community[patch]: Update URL to the 2markdown API (#24546) 2024-07-23 14:27:55 +00:00
toml.py
trello.py
tsv.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
twitter.py
unstructured.py multiple: update removal targets (#25361) 2024-08-14 09:50:39 -04:00
url_playwright.py community[minor]: PlaywrightURLLoader can take stored session file (#30152) 2025-03-19 16:29:07 -04:00
url_selenium.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
url.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
vsdx.py community[patch]: import flattening fix (#20110) 2024-04-10 13:01:19 -04:00
weather.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
web_base.py community: Corrected aload func to be asynchronous from webBaseLoader (#28337) 2024-12-20 14:42:52 -05:00
whatsapp_chat.py
wikipedia.py community[patch]: upgrade to recent version of mypy (#21616) 2024-05-13 14:55:07 -04:00
word_document.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
xml.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
xorbits.py
youtube.py Youtube Loader load method Fixed (#30314) 2025-03-23 14:48:03 -04:00
yuque.py