langchain/libs/community/langchain_community/document_loaders
Fabian Blatz a2d05a376c
community: ConfluenceLoader: add a filter method for attachments (#29882)
Adds an `attachment_filter_func` parameter to the ConfluenceLoader class,
which can be used to determine which files are indexed. This is useful
if you want to exclude files based on their media type or other
metadata.
2025-02-19 18:20:45 -05:00
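A minimal sketch of what such a filter might look like, assuming the callable receives each attachment's metadata as a dictionary and returns True when the attachment should be indexed. The `metadata` / `mediaType` key layout, the helper name `make_media_type_filter`, and the excluded media types are assumptions for illustration only; check `confluence.py` for the exact signature the loader expects.

```python
from typing import Any, Callable, Dict, Set

# Hypothetical media types to skip; adjust to your own needs.
EXCLUDED_MEDIA_TYPES: Set[str] = {"image/png", "image/jpeg", "video/mp4"}


def make_media_type_filter(
    excluded: Set[str],
) -> Callable[[Dict[str, Any]], bool]:
    """Build a predicate that rejects attachments with an excluded media type."""

    def keep(attachment: Dict[str, Any]) -> bool:
        # Assumed layout: {"title": ..., "metadata": {"mediaType": ...}}.
        media_type = attachment.get("metadata", {}).get("mediaType", "")
        return media_type not in excluded

    return keep


keep_attachment = make_media_type_filter(EXCLUDED_MEDIA_TYPES)

# Sample attachment dictionaries (made up for this sketch).
pdf = {"title": "spec.pdf", "metadata": {"mediaType": "application/pdf"}}
png = {"title": "logo.png", "metadata": {"mediaType": "image/png"}}

print(keep_attachment(pdf))
print(keep_attachment(png))
```

Under those assumptions, the predicate would be passed as `attachment_filter_func=keep_attachment` when constructing the loader.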
blob_loaders community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
parsers community: add custom model for OpenAIWhisperParser (#29831) 2025-02-16 21:26:07 -05:00
__init__.py community[patch]: Refactoring PDF loaders: 01 prepare (#29062) 2025-01-07 11:00:04 -05:00
acreom.py community[patch]: Add missing annotations (#24890) 2024-07-31 18:13:44 +00:00
airbyte_json.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
airbyte.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
airtable.py docs: fix kwargs docstring (#25010) 2024-08-02 19:54:54 -07:00
apify_dataset.py docs: update apify integration (#29553) 2025-02-12 20:02:55 -08:00
arcgis_loader.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
arxiv.py docs: Arxiv docs update (#23871) 2024-07-05 11:43:51 -04:00
assemblyai.py community[patch]: docstrings update (#20301) 2024-04-11 16:23:27 -04:00
astradb.py multiple: update removal targets (#25361) 2024-08-14 09:50:39 -04:00
async_html.py community[patch]: Release 0.2.11 (#24989) 2024-08-02 20:08:44 +00:00
athena.py community: make AthenaLoader profile_name optional and fix type hint (#24958) 2024-08-05 14:28:58 +00:00
azlyrics.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
azure_ai_data.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
azure_blob_storage_container.py community[patch]: type ignore fixes (#18395) 2024-03-01 11:21:02 -08:00
azure_blob_storage_file.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
baiducloud_bos_directory.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
baiducloud_bos_file.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
base_o365.py community: update base_o365.py (#29657) 2025-02-07 08:43:29 -05:00
base.py core: Move document loader interfaces to core (#17723) 2024-03-06 13:59:00 -05:00
bibtex.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
bigquery.py multiple: update removal targets (#25361) 2024-08-14 09:50:39 -04:00
bilibili.py community[patch]: docstrings update (#20301) 2024-04-11 16:23:27 -04:00
blackboard.py community: add flag to toggle progress bar (#24463) 2024-07-20 13:18:02 +00:00
blockchain.py community: add supported blockchains to Blockchain Document Loader (#25428) 2024-08-23 14:39:42 +00:00
brave_search.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
browserbase.py community: updated Browserbase loader (#21757) 2024-05-16 08:21:23 -07:00
browserless.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
cassandra.py community[minor]: Add Cassandra ByteStore (#22064) 2024-05-23 10:46:23 -04:00
chatgpt.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
chm.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
chromium.py community[minor]: add user agent for web scraping loaders (#22480) 2024-06-05 15:20:34 +00:00
college_confidential.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
concurrent.py community[patch]: import flattening fix (#20110) 2024-04-10 13:01:19 -04:00
confluence.py community: ConfluenceLoader: add a filter method for attachments (#29882) 2025-02-19 18:20:45 -05:00
conllu.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
couchbase.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
csv_loader.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
cube_semantic.py community: add missing format specifier in error log in CubeSemanticLoader (#29172) 2025-01-13 09:32:57 -05:00
datadog_logs.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
dataframe.py Update dataframe.py (#28871) 2024-12-22 19:16:16 -05:00
dedoc.py community[minor]: added new document loaders based on dedoc library (#24303) 2024-07-23 02:04:53 +00:00
diffbot.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
directory.py community: glob multiple patterns when using DirectoryLoader (#22852) 2024-06-18 09:24:50 -07:00
discord.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
doc_intelligence.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
docugami.py multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
docusaurus.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
dropbox.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
duckdb_loader.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
email.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
epub.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
etherscan.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
evernote.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
excel.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
facebook_chat.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
fauna.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
figma.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
firecrawl.py Community: Updated Firecrawl Document Loader to v1 (#26548) 2024-10-15 13:13:28 +00:00
gcs_directory.py multiple: update removal targets (#25361) 2024-08-14 09:50:39 -04:00
gcs_file.py multiple: update removal targets (#25361) 2024-08-14 09:50:39 -04:00
generic.py community[patch]: import flattening fix (#20110) 2024-04-10 13:01:19 -04:00
geodataframe.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
git.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
gitbook.py community: add flag to toggle progress bar (#24463) 2024-07-20 13:18:02 +00:00
github.py multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
glue_catalog.py community[minor]: Add glue catalog loader (#20220) 2024-04-16 11:39:23 -04:00
google_speech_to_text.py multiple: update removal targets (#25361) 2024-08-14 09:50:39 -04:00
googledrive.py multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
gutenberg.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
helpers.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
hn.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
html_bs.py multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
html.py community: add init for UnstructuredHTMLLoader to solve pathlib paths (#29091) 2025-01-08 10:19:27 -05:00
hugging_face_dataset.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
hugging_face_model.py community[patch]: Add missing annotations (#24890) 2024-07-31 18:13:44 +00:00
ifixit.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
image_captions.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
image.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
imsdb.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
iugu.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
joplin.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
json_loader.py community[minor]: Fix json._validate_metadata_func() (#22842) 2024-12-13 21:24:20 +00:00
kinetica_loader.py community[patch]: Kinetica Integrations handled error in querying; quotes in table names; updated gpudb API (#22724) 2024-06-11 10:01:26 -04:00
lakefs.py docs: docstrings langchain_community update (#14889) 2023-12-19 08:58:24 -05:00
larksuite.py community[minor]: Add LarkSuite wiki document loader. (#21016) 2024-04-29 10:37:50 -04:00
llmsherpa.py community[minor]: add support for llmsherpa (#19741) 2024-03-29 16:04:57 -07:00
markdown.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
mastodon.py Merge pull request #18671 2024-03-06 13:23:14 -05:00
max_compute.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
mediawikidump.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
merge.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
mhtml.py community[patch]: upgrade to recent version of mypy (#21616) 2024-05-13 14:55:07 -04:00
mintbase.py community[minor]: add mintbase loader to langchain (#20089) 2024-04-30 04:11:56 +00:00
modern_treasury.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
mongodb.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
needle.py community: add Needle retriever and document loader integration (#28157) 2024-12-03 22:06:25 +00:00
news.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
notebook.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
notion.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
notiondb.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
nuclia.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
obs_directory.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
obs_file.py add mode arg to OBSFileLoader.load() method (#29246) 2025-01-16 11:09:04 -05:00
obsidian.py community[patch]: Add missing annotations (#24890) 2024-07-31 18:13:44 +00:00
odt.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
onedrive_file.py multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
onedrive.py community: Allow other than default parsers in SharePointLoader and OneDriveLoader (#27716) 2024-11-06 17:44:34 -05:00
onenote.py community[patch]: Fix validation error in SettingsConfigDict across multiple Langchain modules (#26852) 2024-09-25 10:02:14 -04:00
open_city_data.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
oracleadb_loader.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
oracleai.py community[minor]: Oraclevs integration (#21123) 2024-05-04 03:15:35 +00:00
org_mode.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
pdf.py community[minor]: 05 - Refactoring PyPDFium2 parser (#29625) 2025-02-07 21:31:12 -05:00
pebblo.py community[minor]: [Pebblo] Enhance PebbloSafeLoader to take anonymize flag (#26812) 2024-09-25 09:33:06 -04:00
polars_dataframe.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
powerpoint.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
psychic.py multiple: Remove unnecessary Ruff suppression comments (#21050) 2024-04-30 17:13:48 +00:00
pubmed.py community[patch]: upgrade to recent version of mypy (#21616) 2024-05-13 14:55:07 -04:00
pyspark_dataframe.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
python.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
quip.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
readthedocs.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
recursive_url_loader.py docs: fix broken Appearance of langchain_community/document_loaders/recursive_url_loader API Reference (#29305) 2025-01-20 10:56:59 -05:00
reddit.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
roam.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
rocksetdb.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
rspace.py community: Bump ruff version to 0.9 (#29206) 2025-02-08 01:21:10 +00:00
rss.py multiple: Remove unnecessary Ruff suppression comments (#21050) 2024-04-30 17:13:48 +00:00
rst.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
rtf.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
s3_directory.py community[patch]: Skip nested directories when using S3DirectoryLoader (#17829) 2024-03-08 16:50:58 -08:00
s3_file.py community[patch]: support unstructured_kwargs for s3 loader (#15473) 2024-03-27 22:03:48 +00:00
scrapfly.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
scrapingant.py community[minor]: Add ScrapingAnt Loader Community Integration (#24514) 2024-07-24 21:11:43 -04:00
sharepoint.py community: Allow other than default parsers in SharePointLoader and OneDriveLoader (#27716) 2024-11-06 17:44:34 -05:00
sitemap.py community[patch]: SitemapLoader restrict depth of parsing sitemap (CVE-2024-2965) (#22903) 2024-06-14 13:04:40 -04:00
slack_directory.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
snowflake_loader.py community[patch]: upgrade to recent version of mypy (#21616) 2024-05-13 14:55:07 -04:00
spider.py doc list not empty (#21208) 2024-05-20 08:24:06 -07:00
spreedly.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
sql_database.py community[patch]: restore compatibility with SQLAlchemy 1.x (#22546) 2024-06-19 17:58:57 +00:00
srt.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
stripe.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
surrealdb.py community[patch]: SurrealDB fix for asyncio (#16092) 2024-01-23 19:46:19 -08:00
telegram.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
tencent_cos_directory.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
tencent_cos_file.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
tensorflow_datasets.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
text.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
tidb.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
tomarkdown.py community[patch]: Update URL to the 2markdown API (#24546) 2024-07-23 14:27:55 +00:00
toml.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
trello.py community: Implement lazy_load() for TrelloLoader (#18658) 2024-03-06 13:04:36 -05:00
tsv.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
twitter.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
unstructured.py multiple: update removal targets (#25361) 2024-08-14 09:50:39 -04:00
url_playwright.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
url_selenium.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
url.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
vsdx.py community[patch]: import flattening fix (#20110) 2024-04-10 13:01:19 -04:00
weather.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
web_base.py community: Corrected aload func to be asynchronous from webBaseLoader (#28337) 2024-12-20 14:42:52 -05:00
whatsapp_chat.py community: Implement lazy_load() for WhatsAppChatLoader (#18677) 2024-03-06 13:03:46 -05:00
wikipedia.py community[patch]: upgrade to recent version of mypy (#21616) 2024-05-13 14:55:07 -04:00
word_document.py community: add init for unstructured file loader (#29101) 2025-01-13 09:26:00 -05:00
xml.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
xorbits.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
youtube.py community:Fix for Pydantic model validator of GoogleApiYoutubeLoader (#29694) 2025-02-10 08:57:58 -05:00
yuque.py community[minor]: add Yuque document loader (#17924) 2024-03-05 15:54:07 -08:00