langchain/docs/docs/integrations/document_loaders
Stefano Lottini 325f729a92
docs: improvements to Astra DB pages, especially modernize Vector DB example notebook (#30961)
This PR brings several improvements and modernizations to the
documentation around the Astra DB partner package.

- language alignment for better matching with the terms used in the
Astra DB docs
- updated several links to pages on said documentation
- for the `AstraDBVectorStore`, added mentions of the new features in
the overall `astra.mdx`
- for the vector store, rewrote and upgraded most of the usage example
notebook for a more straightforward experience that highlights the
main usage patterns (including new ones such as the newly introduced
"autodetect feature")

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-05-03 14:26:52 -04:00
example_data [docs]: doc loader changes (#25417) 2024-08-14 19:46:33 -07:00
parsers docs: update langsmith env vars (#30331) 2025-03-17 14:35:22 -07:00
acreom.ipynb docs: integrations/providers update 9 (#19941) 2024-04-04 21:37:48 +00:00
agentql.ipynb docs: Add AgentQL provider doc, tool/toolkit doc and documentloader doc (#30144) 2025-03-11 21:57:40 -04:00
airbyte_cdk.ipynb docs[patch]: Remove deprecated Airbyte loaders from listings (#23927) 2024-07-10 02:21:25 +00:00
airbyte_gong.ipynb docs[patch]: Remove deprecated Airbyte loaders from listings (#23927) 2024-07-10 02:21:25 +00:00
airbyte_hubspot.ipynb docs[patch]: Remove deprecated Airbyte loaders from listings (#23927) 2024-07-10 02:21:25 +00:00
airbyte_json.ipynb docs[patch]: Remove deprecated Airbyte loaders from listings (#23927) 2024-07-10 02:21:25 +00:00
airbyte_salesforce.ipynb docs[patch]: Remove deprecated Airbyte loaders from listings (#23927) 2024-07-10 02:21:25 +00:00
airbyte_shopify.ipynb docs[patch]: Remove deprecated Airbyte loaders from listings (#23927) 2024-07-10 02:21:25 +00:00
airbyte_stripe.ipynb docs[patch]: Remove deprecated Airbyte loaders from listings (#23927) 2024-07-10 02:21:25 +00:00
airbyte_typeform.ipynb docs[patch]: Remove deprecated Airbyte loaders from listings (#23927) 2024-07-10 02:21:25 +00:00
airbyte_zendesk_support.ipynb docs[patch]: Remove deprecated Airbyte loaders from listings (#23927) 2024-07-10 02:21:25 +00:00
airbyte.ipynb docs: remove spaces in percent pip (#27082) 2024-10-03 20:34:24 +00:00
airtable.ipynb community[patch]: Airtable to allow for addtl params (#22092) 2024-06-03 13:05:56 -07:00
alibaba_cloud_maxcompute.ipynb
amazon_textract.ipynb docs: fix typo on amazon_textract.ipynb (#26493) 2024-09-17 22:27:45 +00:00
apify_dataset.ipynb docs: update apify integration (#29553) 2025-02-12 20:02:55 -08:00
arcgis.ipynb
arxiv.ipynb multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
assemblyai.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
astradb.ipynb docs: improvements to Astra DB pages, especially modernize Vector DB example notebook (#30961) 2025-05-03 14:26:52 -04:00
async_chromium.ipynb docs[patch]: Address feedback from docs users (#23550) 2024-06-26 14:47:01 -07:00
async_html.ipynb community[minor]: allow enabling proxy in aiohttp session in AsyncHTML (#19499) 2024-05-22 18:25:06 +00:00
athena.ipynb docs: typo fix athena.ipynb and glue_catalog.ipynb (#27435) 2024-10-21 15:01:13 +00:00
aws_s3_directory.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
aws_s3_file.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
azlyrics.ipynb
azure_ai_data.ipynb
azure_blob_storage_container.ipynb
azure_blob_storage_file.ipynb
azure_document_intelligence.ipynb
bibtex.ipynb docs: integrations/providers update 9 (#19941) 2024-04-04 21:37:48 +00:00
bilibili.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
blackboard.ipynb
blockchain.ipynb docs: make docs mdxv2 compatible (#26798) 2024-09-23 21:24:23 -07:00
box.ipynb docs: streamline LangSmith teasing (#30302) 2025-03-28 15:13:22 -04:00
brave_search.ipynb
browserbase.ipynb [docs/community]: langchain docs + browserbaseloader fix (#30973) 2025-04-24 13:38:49 -04:00
browserless.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
bshtml.ipynb docs: streamline LangSmith teasing (#30302) 2025-03-28 15:13:22 -04:00
cassandra.ipynb
chatgpt_loader.ipynb
college_confidential.ipynb
concurrent.ipynb
confluence.ipynb community: support Confluence cookies (#28760) 2024-12-17 12:16:36 -05:00
conll-u.ipynb
copypaste.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
couchbase.ipynb docs: integrations/providers update 9 (#19941) 2024-04-04 21:37:48 +00:00
csv.ipynb docs[patch]: Update Unstructured loader notebooks and install instructions (#23726) 2024-07-01 13:36:48 -07:00
cube_semantic.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
datadog_logs.ipynb
dedoc.ipynb multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
diffbot.ipynb docs: Clean up Diffbot docs (#21781) 2024-05-20 23:09:22 +00:00
discord.ipynb
docling.ipynb docs: Fix for broken links for Docling project sites (#30313) 2025-03-17 16:47:09 +00:00
docugami.ipynb community[patch]: deprecate langchain_community Chroma in favor of langchain_chroma (#24474) 2024-07-22 11:00:13 -04:00
docusaurus.ipynb
dropbox.ipynb
duckdb.ipynb
email.ipynb docs[patch]: Update Unstructured loader notebooks and install instructions (#23726) 2024-07-01 13:36:48 -07:00
epub.ipynb docs[patch]: Update Unstructured loader notebooks and install instructions (#23726) 2024-07-01 13:36:48 -07:00
etherscan.ipynb
evernote.ipynb
facebook_chat.ipynb
fauna.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
figma.ipynb docs: make docs mdxv2 compatible (#26798) 2024-09-23 21:24:23 -07:00
firecrawl.ipynb Community: Updated Firecrawl Document Loader to v1 (#26548) 2024-10-15 13:13:28 +00:00
geopandas.ipynb
git.ipynb
gitbook.ipynb
github.ipynb "community: Fix GithubFileLoader source code", "docs: Fix GithubFileLoader code sample" (#19943) 2024-08-22 18:24:57 -04:00
glue_catalog.ipynb docs: typo fix athena.ipynb and glue_catalog.ipynb (#27435) 2024-10-21 15:01:13 +00:00
google_alloydb.ipynb
google_bigquery.ipynb docs: updated docs on langchain_google_community (#21064) 2024-04-30 20:20:49 -04:00
google_bigtable.ipynb docs; fix links in v0.2.0 (#21483) 2024-05-09 11:05:17 -04:00
google_cloud_sql_mssql.ipynb docs: update google_cloud_sql_mssql.ipynb (#29315) 2025-01-20 16:11:08 -05:00
google_cloud_sql_mysql.ipynb docs: throw on broken anchors (#27773) 2024-11-13 14:29:27 -05:00
google_cloud_sql_pg.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
google_cloud_storage_directory.ipynb docs: switched GCSLoaders docs to langchain-google-community (#20985) 2024-04-29 10:45:11 -04:00
google_cloud_storage_file.ipynb docs: switched GCSLoaders docs to langchain-google-community (#20985) 2024-04-29 10:45:11 -04:00
google_datastore.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
google_drive.ipynb docs: updated docs on langchain_google_community (#21064) 2024-04-30 20:20:49 -04:00
google_el_carro.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
google_firestore.ipynb docs; fix links in v0.2.0 (#21483) 2024-05-09 11:05:17 -04:00
google_memorystore_redis.ipynb docs; fix links in v0.2.0 (#21483) 2024-05-09 11:05:17 -04:00
google_spanner.ipynb docs; fix links in v0.2.0 (#21483) 2024-05-09 11:05:17 -04:00
google_speech_to_text.ipynb docs: fix the ImportError in google_speech_to_text.ipynb (#26522) 2024-09-17 22:18:57 +00:00
grobid.ipynb
gutenberg.ipynb
hacker_news.ipynb
huawei_obs_directory.ipynb
huawei_obs_file.ipynb
hugging_face_dataset.ipynb
hyperbrowser.ipynb docs: add HyperbrowserLoader docs (#29143) 2025-01-13 10:45:39 -05:00
ifixit.ipynb
image_captions.ipynb docs[patch]: Fix image caption document loader page and typo on custom tools page (#24635) 2024-07-24 17:16:18 -07:00
image.ipynb docs[patch]: Update Unstructured loader notebooks and install instructions (#23726) 2024-07-01 13:36:48 -07:00
imsdb.ipynb
index.mdx Update index.mdx (#29029) 2025-01-04 22:04:00 -05:00
iugu.ipynb
joplin.ipynb
json.ipynb docs: streamline LangSmith teasing (#30302) 2025-03-28 15:13:22 -04:00
jupyter_notebook.ipynb docs: notebook loader: change .html to .ipynb (#22407) 2024-06-03 14:26:28 +00:00
kinetica.ipynb community[patch]: Kinetica Integrations handled error in querying; quotes in table names; updated gpudb API (#22724) 2024-06-11 10:01:26 -04:00
lakefs.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
langsmith.ipynb concepts: update llm stub page and re-link (#27567) 2024-10-22 23:03:36 -04:00
larksuite.ipynb docs: add example for loading data from LarkSuite wiki. (#21311) 2024-05-06 09:56:12 -07:00
llmsherpa.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
mastodon.ipynb
mathpix.ipynb docs: streamline LangSmith teasing (#30302) 2025-03-28 15:13:22 -04:00
mediawikidump.ipynb docs: remove unnecessary args from the pip install (#19823) 2024-04-01 10:47:26 -04:00
merge_doc.ipynb
mhtml.ipynb
microsoft_excel.ipynb docs[patch]: Update Unstructured loader notebooks and install instructions (#23726) 2024-07-01 13:36:48 -07:00
microsoft_onedrive.ipynb community: Allow other than default parsers in SharePointLoader and OneDriveLoader (#27716) 2024-11-06 17:44:34 -05:00
microsoft_onenote.ipynb
microsoft_powerpoint.ipynb docs[patch]: Update Unstructured loader notebooks and install instructions (#23726) 2024-07-01 13:36:48 -07:00
microsoft_sharepoint.ipynb community: Allow other than default parsers in SharePointLoader and OneDriveLoader (#27716) 2024-11-06 17:44:34 -05:00
microsoft_word.ipynb docs[patch]: Update Unstructured loader notebooks and install instructions (#23726) 2024-07-01 13:36:48 -07:00
mintbase.ipynb docs: make docs mdxv2 compatible (#26798) 2024-09-23 21:24:23 -07:00
modern_treasury.ipynb
mongodb.ipynb docs: make docs mdxv2 compatible (#26798) 2024-09-23 21:24:23 -07:00
needle.ipynb community: add Needle retriever and document loader integration (#28157) 2024-12-03 22:06:25 +00:00
news.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
notion.ipynb docs: more indexing of document loaders (#25500) 2024-08-20 17:54:42 +00:00
nuclia.ipynb
obsidian.ipynb
odt.ipynb docs[patch]: Update Unstructured loader notebooks and install instructions (#23726) 2024-07-01 13:36:48 -07:00
open_city_data.ipynb
oracleadb_loader.ipynb Community: Add bind variable support for oracle adb docloader (#30937) 2025-04-21 08:47:33 -04:00
oracleai.ipynb docs: updates for OracleDB (#21745) 2024-05-20 16:01:35 -07:00
org_mode.ipynb docs[patch]: Update Unstructured loader notebooks and install instructions (#23726) 2024-07-01 13:36:48 -07:00
pandas_dataframe.ipynb
pdfminer.ipynb Fix typos in pdfminer and pymupdf documentations (#30513) 2025-03-27 11:29:11 -04:00
pdfplumber.ipynb docs: streamline LangSmith teasing (#30302) 2025-03-28 15:13:22 -04:00
pebblo.ipynb community[minor]: [Pebblo] Enhance PebbloSafeLoader to take anonymize flag (#26812) 2024-09-25 09:33:06 -04:00
polars_dataframe.ipynb
powerscale.ipynb docs: Add Dell PowerScale Document Loader (#30209) 2025-03-18 22:39:21 -04:00
psychic.ipynb docs: langchain-chroma package (#20394) 2024-04-12 11:17:05 -07:00
pubmed.ipynb
pull_md.ipynb docs: add langchain-pull-md Markdown loader (#29024) 2025-01-06 19:32:43 +00:00
pymupdf4llm.ipynb docs: streamline LangSmith teasing (#30302) 2025-03-28 15:13:22 -04:00
pymupdf.ipynb Fix typos in pdfminer and pymupdf documentations (#30513) 2025-03-27 11:29:11 -04:00
pypdfdirectory.ipynb docs: streamline LangSmith teasing (#30302) 2025-03-28 15:13:22 -04:00
pypdfium2.ipynb docs: streamline LangSmith teasing (#30302) 2025-03-28 15:13:22 -04:00
pypdfloader.ipynb docs: streamline LangSmith teasing (#30302) 2025-03-28 15:13:22 -04:00
pyspark_dataframe.ipynb
quip.ipynb
readthedocs_documentation.ipynb docs: update readthedocs document loader options (#29556) 2025-02-03 10:54:24 -05:00
recursive_url.ipynb docs: Minor typo fixed, install necessary pip (#28976) 2025-01-02 04:21:29 +00:00
reddit.ipynb
roam.ipynb
rockset.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
rspace.ipynb docs: make docs mdxv2 compatible (#26798) 2024-09-23 21:24:23 -07:00
rss.ipynb
rst.ipynb docs[patch]: Update Unstructured loader notebooks and install instructions (#23726) 2024-07-01 13:36:48 -07:00
scrapfly.ipynb Fix typo in ScrapflyLoader documentation (#26117) 2024-09-08 18:33:01 +00:00
scrapingant.ipynb multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
singlestore.ipynb docs: Register langchain-singlestore integration (#30841) 2025-04-18 12:11:33 -04:00
sitemap.ipynb docs: streamline LangSmith teasing (#30302) 2025-03-28 15:13:22 -04:00
slack.ipynb docs: Minor typo fixed, install necessary pip (#28976) 2025-01-02 04:21:29 +00:00
snowflake.ipynb
source_code.ipynb Community[minor]: Add language parser for Elixir (#22742) 2024-06-10 15:56:57 +00:00
spider.ipynb community: Spider integration (#20937) 2024-04-27 21:45:03 +00:00
spreedly.ipynb patch: deprecate (a)get_relevant_documents (#20477) 2024-04-22 11:14:53 -04:00
stripe.ipynb
subtitle.ipynb
surrealdb.ipynb docs: fix path for state_of_the_union sample file (#21609) 2024-05-13 11:46:02 -04:00
telegram.ipynb
tencent_cos_directory.ipynb
tencent_cos_file.ipynb
tensorflow_datasets.ipynb
tidb.ipynb
tomarkdown.ipynb docs: more api ref links, add linting step to prevent more (#28495) 2024-12-04 04:19:42 +00:00
toml.ipynb
trello.ipynb
tsv.ipynb docs[patch]: Update Unstructured loader notebooks and install instructions (#23726) 2024-07-01 13:36:48 -07:00
twitter.ipynb
unstructured_file.ipynb concepts: update llm stub page and re-link (#27567) 2024-10-22 23:03:36 -04:00
unstructured_markdown.ipynb docs: streamline LangSmith teasing (#30302) 2025-03-28 15:13:22 -04:00
unstructured_pdfloader.ipynb docs: streamline LangSmith teasing (#30302) 2025-03-28 15:13:22 -04:00
upstage.ipynb docs: Documentation update for Document Parse (#26844) 2024-10-03 20:36:04 +00:00
url.ipynb docs: integrationsreferences update (#25322) 2024-08-13 09:29:51 -04:00
vsdx.ipynb
weather.ipynb
web_base.ipynb community: Corrected aload func to be asynchronous from webBaseLoader (#28337) 2024-12-20 14:42:52 -05:00
whatsapp_chat.ipynb
wikipedia.ipynb docs: Updated WikipediaLoader documentation (#25647) 2024-08-23 01:19:03 -07:00
xml.ipynb docs: streamline LangSmith teasing (#30302) 2025-03-28 15:13:22 -04:00
xorbits.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
youtube_audio.ipynb Fixed the import error in OpenAIWhisperParserLocal and resolved the L… (#29168) 2025-01-13 09:47:31 -05:00
youtube_transcript.ipynb docs: udpated api reference (#25172) 2024-08-14 07:00:17 -07:00
yt_dlp.ipynb Add langchain-yt-dlp Document Loader Documentation (#28775) 2024-12-18 10:16:50 -05:00
yuque.ipynb docs: fix some notebook formatting (#21136) 2024-04-30 21:39:03 -07:00
zeroxpdfloader.ipynb community: ZeroxPDFLoader (#27800) 2024-11-07 03:14:57 +00:00