langchain/docs/docs/integrations/document_loaders
Mazen Ramadan 3c1d77dd64
community[minor]: Add Scrapfly Loader community integration (#22036)
Added [Scrapfly](https://scrapfly.io/) Web Loader integration. Scrapfly
is a web scraping API that allows extracting web page data into
accessible markdown or text datasets.

- __Description__: Added Scrapfly web loader for retrieving web page
data as markdown or text.
- Dependencies: scrapfly-sdk
- Twitter: @thealchemi1st

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-05-22 21:29:13 +00:00
example_data docs: v0.2 docs in master (#21438) 2024-05-08 12:29:59 -07:00
acreom.ipynb docs: integrations/providers update 9 (#19941) 2024-04-04 21:37:48 +00:00
airbyte_cdk.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
airbyte_gong.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
airbyte_hubspot.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
airbyte_json.ipynb docs: deprecate old airbyte loader docs (#19048) 2024-03-13 23:18:30 +00:00
airbyte_salesforce.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
airbyte_shopify.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
airbyte_stripe.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
airbyte_typeform.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
airbyte_zendesk_support.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
airbyte.ipynb docs: airbyte deps note (#18243) 2024-02-29 16:02:13 -08:00
airtable.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
alibaba_cloud_maxcompute.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
amazon_textract.ipynb community[patch]: adding linearization config to AmazonTextractPDFLoader (#17489) 2024-03-08 17:25:22 -08:00
apify_dataset.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
arcgis.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
arxiv.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
assemblyai.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
astradb.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
async_chromium.ipynb docs: Update async_chromium.ipynb (#19514) 2024-03-26 00:02:50 +00:00
async_html.ipynb community[minor]: allow enabling proxy in aiohttp session in AsyncHTML (#19499) 2024-05-22 18:25:06 +00:00
athena.ipynb docs: integrations/providers update 9 (#19941) 2024-04-04 21:37:48 +00:00
aws_s3_directory.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
aws_s3_file.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
azlyrics.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
azure_ai_data.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
azure_blob_storage_container.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
azure_blob_storage_file.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
azure_document_intelligence.ipynb community[patch]: Microsoft Azure Document Intelligence updates (#16932) 2024-03-26 23:36:59 -07:00
bibtex.ipynb docs: integrations/providers update 9 (#19941) 2024-04-04 21:37:48 +00:00
bilibili.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
blackboard.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
blockchain.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
brave_search.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
browserbase.ipynb community: updated Browserbase loader (#21757) 2024-05-16 08:21:23 -07:00
browserless.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
cassandra.ipynb astradb: bootstrapping Astra DB as Partner Package (#16875) 2024-02-15 15:50:59 -08:00
chatgpt_loader.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
college_confidential.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
concurrent.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
confluence.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
conll-u.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
copypaste.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
couchbase.ipynb docs: integrations/providers update 9 (#19941) 2024-04-04 21:37:48 +00:00
csv.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
cube_semantic.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
datadog_logs.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
diffbot.ipynb docs: Clean up Diffbot docs (#21781) 2024-05-20 23:09:22 +00:00
discord.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
docugami.ipynb docs: v0.2 docs in master (#21438) 2024-05-08 12:29:59 -07:00
docusaurus.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
dropbox.ipynb docs: Remove non-rendering images & output spamming from doc ntbks (#19475) 2024-03-24 23:47:38 -07:00
duckdb.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
email.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
epub.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
etherscan.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
evernote.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
facebook_chat.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
fauna.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
figma.ipynb patch: deprecate (a)get_relevant_documents (#20477) 2024-04-22 11:14:53 -04:00
firecrawl.ipynb community[minor]: Firecrawl.dev integration (#20364) 2024-04-12 19:13:48 +00:00
geopandas.ipynb docs: make links internal (#19063) 2024-03-14 16:22:56 +00:00
git.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
gitbook.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
github.ipynb docs: Fix typo in github.ipynb (#17259) 2024-02-08 12:03:00 -08:00
glue_catalog.ipynb community[minor]: Add glue catalog loader (#20220) 2024-04-16 11:39:23 -04:00
google_alloydb.ipynb docs: update Google Cloud database integration docs (#18711) 2024-03-07 19:36:00 -08:00
google_bigquery.ipynb docs: updated docs on langchain_google_community (#21064) 2024-04-30 20:20:49 -04:00
google_bigtable.ipynb docs; fix links in v0.2.0 (#21483) 2024-05-09 11:05:17 -04:00
google_cloud_sql_mssql.ipynb docs; fix links in v0.2.0 (#21483) 2024-05-09 11:05:17 -04:00
google_cloud_sql_mysql.ipynb docs; fix links in v0.2.0 (#21483) 2024-05-09 11:05:17 -04:00
google_cloud_sql_pg.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
google_cloud_storage_directory.ipynb docs: switched GCSLoaders docs to langchain-google-community (#20985) 2024-04-29 10:45:11 -04:00
google_cloud_storage_file.ipynb docs: switched GCSLoaders docs to langchain-google-community (#20985) 2024-04-29 10:45:11 -04:00
google_datastore.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
google_drive.ipynb docs: updated docs on langchain_google_community (#21064) 2024-04-30 20:20:49 -04:00
google_el_carro.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
google_firestore.ipynb docs; fix links in v0.2.0 (#21483) 2024-05-09 11:05:17 -04:00
google_memorystore_redis.ipynb docs; fix links in v0.2.0 (#21483) 2024-05-09 11:05:17 -04:00
google_spanner.ipynb docs; fix links in v0.2.0 (#21483) 2024-05-09 11:05:17 -04:00
google_speech_to_text.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
grobid.ipynb docs: make links internal (#19063) 2024-03-14 16:22:56 +00:00
gutenberg.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
hacker_news.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
huawei_obs_directory.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
huawei_obs_file.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
hugging_face_dataset.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
ifixit.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
image_captions.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
image.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
imsdb.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
iugu.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
joplin.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
jupyter_notebook.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
kinetica.ipynb community[minor]: Implemented Kinetica Document Loader and added notebooks (#20002) 2024-04-25 13:39:00 -07:00
lakefs.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
larksuite.ipynb docs: add example for loading data from LarkSuite wiki. (#21311) 2024-05-06 09:56:12 -07:00
llmsherpa.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
mastodon.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
mediawikidump.ipynb docs: remove unnecessary args from the pip install (#19823) 2024-04-01 10:47:26 -04:00
merge_doc.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
mhtml.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
microsoft_excel.ipynb community[patch]: Microsoft Azure Document Intelligence updates (#16932) 2024-03-26 23:36:59 -07:00
microsoft_onedrive.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
microsoft_onenote.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
microsoft_powerpoint.ipynb community[patch]: Microsoft Azure Document Intelligence updates (#16932) 2024-03-26 23:36:59 -07:00
microsoft_sharepoint.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
microsoft_word.ipynb community[patch]: Microsoft Azure Document Intelligence updates (#16932) 2024-03-26 23:36:59 -07:00
mintbase.ipynb community[minor]: add mintbase loader to langchain (#20089) 2024-04-30 04:11:56 +00:00
modern_treasury.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
mongodb.ipynb community[patch]: documented the feature to filter documents in MongoDBloader (#18842) 2024-03-09 13:41:34 -08:00
news.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
notion.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
notiondb.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
nuclia.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
obsidian.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
odt.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
open_city_data.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
oracleadb_loader.ipynb docs: Fix oracle doc loader format issue (#19628) 2024-03-26 22:13:36 -07:00
oracleai.ipynb docs: updates for OracleDB (#21745) 2024-05-20 16:01:35 -07:00
org_mode.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
pandas_dataframe.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
pebblo.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
polars_dataframe.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
psychic.ipynb docs: langchain-chroma package (#20394) 2024-04-12 11:17:05 -07:00
pubmed.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
pyspark_dataframe.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
quip.ipynb docs: Fix broken imports in documentation (#19655) 2024-03-27 13:54:05 -04:00
readthedocs_documentation.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
recursive_url.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
reddit.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
roam.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
rockset.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
rspace.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
rss.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
rst.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
scrapfly.ipynb community[minor]: Add Scrapfly Loader community integration (#22036) 2024-05-22 21:29:13 +00:00
sitemap.ipynb docs: fixed xml URL on sitemap docs exmaple, issue #17236 (#17304) 2024-03-29 01:36:54 -07:00
slack.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
snowflake.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
source_code.ipynb text-splitters[minor], langchain[minor], community[patch], templates, docs: langchain-text-splitters 0.0.1 (#18346) 2024-02-29 18:33:21 -08:00
spider.ipynb community: Spider integration (#20937) 2024-04-27 21:45:03 +00:00
spreedly.ipynb patch: deprecate (a)get_relevant_documents (#20477) 2024-04-22 11:14:53 -04:00
stripe.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
subtitle.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
surrealdb.ipynb docs: fix path for state_of_the_union sample file (#21609) 2024-05-13 11:46:02 -04:00
telegram.ipynb community:update telegram notebook (#18569) 2024-03-05 11:47:17 -08:00
tencent_cos_directory.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
tencent_cos_file.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
tensorflow_datasets.ipynb docs, templates: update schema imports to core (#17885) 2024-02-22 15:58:44 -08:00
tidb.ipynb community[minor]: Add Initial Support for TiDB Vector Store (#15796) 2024-03-07 17:18:20 -08:00
tomarkdown.ipynb docs: link to langsmith+langgraph docs (#21930) 2024-05-20 13:05:22 -07:00
toml.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
trello.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
tsv.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
twitter.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
unstructured_file.ipynb docs: Fix link in Unstructured notebook (#19851) 2024-04-01 15:26:48 -04:00
upstage.ipynb upstage: deprecate UPSTAGE_DOCUMENT_AI_API_KEY (#21363) 2024-05-08 18:02:26 +00:00
url.ipynb docs: integrations/providers/unstructured update (#19892) 2024-04-04 21:31:27 +00:00
vsdx.ipynb community[minor]: New documents loader for visio files (with extension .vsdx) (#16171) 2024-01-22 22:07:03 -08:00
weather.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
web_base.ipynb community: Spider integration (#20937) 2024-04-27 21:45:03 +00:00
whatsapp_chat.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 2024-01-02 16:47:11 -05:00
wikipedia.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
xml.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
xorbits.ipynb docs: migrate integrations using langchain-cli (#21929) 2024-05-20 18:14:49 +00:00
youtube_audio.ipynb docs: use standard openai params (#20160) 2024-04-08 10:56:53 -05:00
youtube_transcript.ipynb docs: integration package pip installs (#15762) 2024-01-09 11:13:10 -08:00
yuque.ipynb docs: fix some notebook formatting (#21136) 2024-04-30 21:39:03 -07:00