langchain/libs/community/langchain_community/document_loaders
Dong Shin 0b1359801e
community: add trust_env at web_base_loader (#28514)
- **Description:** This PR addresses an issue similar to the one
mentioned in https://github.com/langchain-ai/langchain/pull/19499.
Specifically, the `WebBaseLoader` used in open-webui fails to load
the proxy configuration. This PR aims to resolve that issue.
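
For readers unfamiliar with the term, a `trust_env`-style flag usually decides whether an HTTP client reads proxy settings from environment variables such as `https_proxy`. The sketch below illustrates that idea with the Python standard library only. It is not the code from this PR: `resolve_proxies` is a hypothetical helper, and `proxy.example` is a placeholder host.

```python
import os
from urllib.request import getproxies_environment


def resolve_proxies(trust_env: bool) -> dict:
    """Hypothetical helper: read proxy settings from the environment
    only when trust_env is True; otherwise ignore them."""
    # getproxies_environment() scans variables named <scheme>_proxy
    # (case-insensitive) and returns a {scheme: proxy_url} mapping.
    return getproxies_environment() if trust_env else {}


# Example value only; proxy.example is a placeholder host.
os.environ["https_proxy"] = "http://proxy.example:8080"

ignored = resolve_proxies(trust_env=False)   # environment is not consulted
honoured = resolve_proxies(trust_env=True)   # environment proxies are picked up
```

With the flag off, the environment's proxy settings are ignored, which matches the symptom described above (the proxy configuration is not loaded). With it on, the client sees them.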




---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-17 21:18:16 -05:00
..
blob_loaders all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
parsers community: 🐛 PDF Filter Type Error (#27154) 2024-12-13 23:30:29 +00:00
__init__.py community: add Needle retriever and document loader integration (#28157) 2024-12-03 22:06:25 +00:00
acreom.py community[patch]: Add missing annotations (#24890) 2024-07-31 18:13:44 +00:00
airbyte_json.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
airbyte.py
airtable.py docs: fix kwargs docstring (#25010) 2024-08-02 19:54:54 -07:00
apify_dataset.py multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
arcgis_loader.py
arxiv.py docs: Arxiv docs update (#23871) 2024-07-05 11:43:51 -04:00
assemblyai.py community[patch]: docstrings update (#20301) 2024-04-11 16:23:27 -04:00
astradb.py multiple: update removal targets (#25361) 2024-08-14 09:50:39 -04:00
async_html.py community[patch]: Release 0.2.11 (#24989) 2024-08-02 20:08:44 +00:00
athena.py community: make AthenaLoader profile_name optional and fix type hint (#24958) 2024-08-05 14:28:58 +00:00
azlyrics.py
azure_ai_data.py
azure_blob_storage_container.py
azure_blob_storage_file.py
baiducloud_bos_directory.py
baiducloud_bos_file.py
base_o365.py Community: add modified_since argument to O365BaseLoader (#28708) 2024-12-13 17:30:17 +00:00
base.py core: Move document loader interfaces to core (#17723) 2024-03-06 13:59:00 -05:00
bibtex.py
bigquery.py multiple: update removal targets (#25361) 2024-08-14 09:50:39 -04:00
bilibili.py community[patch]: docstrings update (#20301) 2024-04-11 16:23:27 -04:00
blackboard.py community: add flag to toggle progress bar (#24463) 2024-07-20 13:18:02 +00:00
blockchain.py community: add supported blockchains to Blockchain Document Loader (#25428) 2024-08-23 14:39:42 +00:00
brave_search.py
browserbase.py community: updated Browserbase loader (#21757) 2024-05-16 08:21:23 -07:00
browserless.py
cassandra.py community[minor]: Add Cassandra ByteStore (#22064) 2024-05-23 10:46:23 -04:00
chatgpt.py
chm.py
chromium.py community[minor]: add user agent for web scraping loaders (#22480) 2024-06-05 15:20:34 +00:00
college_confidential.py
concurrent.py community[patch]: import flattening fix (#20110) 2024-04-10 13:01:19 -04:00
confluence.py community: support Confluence cookies (#28760) 2024-12-17 12:16:36 -05:00
conllu.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
couchbase.py
csv_loader.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
cube_semantic.py
datadog_logs.py
dataframe.py community[patch]: support modin document loader (#18866) 2024-03-10 18:40:04 -07:00
dedoc.py community[minor]: added new document loaders based on dedoc library (#24303) 2024-07-23 02:04:53 +00:00
diffbot.py
directory.py community: glob multiple patterns when using DirectoryLoader (#22852) 2024-06-18 09:24:50 -07:00
discord.py
doc_intelligence.py community: bytes as a source to AzureAIDocumentIntelligenceLoader (#26618) 2024-11-07 03:40:21 +00:00
docugami.py multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
docusaurus.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
dropbox.py multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
duckdb_loader.py
email.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
epub.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
etherscan.py
evernote.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
excel.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
facebook_chat.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
fauna.py
figma.py
firecrawl.py Community: Updated Firecrawl Document Loader to v1 (#26548) 2024-10-15 13:13:28 +00:00
gcs_directory.py multiple: update removal targets (#25361) 2024-08-14 09:50:39 -04:00
gcs_file.py multiple: update removal targets (#25361) 2024-08-14 09:50:39 -04:00
generic.py community[patch]: import flattening fix (#20110) 2024-04-10 13:01:19 -04:00
geodataframe.py
git.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
gitbook.py community: add flag to toggle progress bar (#24463) 2024-07-20 13:18:02 +00:00
github.py multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
glue_catalog.py community[minor]: Add glue catalog loader (#20220) 2024-04-16 11:39:23 -04:00
google_speech_to_text.py multiple: update removal targets (#25361) 2024-08-14 09:50:39 -04:00
googledrive.py multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
gutenberg.py
helpers.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
hn.py
html_bs.py multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
html.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
hugging_face_dataset.py
hugging_face_model.py community[patch]: Add missing annotations (#24890) 2024-07-31 18:13:44 +00:00
ifixit.py
image_captions.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
image.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
imsdb.py
iugu.py
joplin.py
json_loader.py community[minor]: Fix json._validate_metadata_func() (#22842) 2024-12-13 21:24:20 +00:00
kinetica_loader.py community[patch]: Kinetica Integrations handled error in querying; quotes in table names; updated gpudb API (#22724) 2024-06-11 10:01:26 -04:00
lakefs.py
larksuite.py community[minor]: Add LarkSuite wiki document loader. (#21016) 2024-04-29 10:37:50 -04:00
llmsherpa.py community[minor]: add support for llmsherpa (#19741) 2024-03-29 16:04:57 -07:00
markdown.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
mastodon.py Merge pull request #18671 2024-03-06 13:23:14 -05:00
max_compute.py
mediawikidump.py
merge.py
mhtml.py community[patch]: upgrade to recent version of mypy (#21616) 2024-05-13 14:55:07 -04:00
mintbase.py community[minor]: add mintbase loader to langchain (#20089) 2024-04-30 04:11:56 +00:00
modern_treasury.py
mongodb.py community: Enhance MongoDBLoader with flexible metadata and optimized field extraction (#23376) 2024-09-17 10:23:17 -04:00
needle.py community: add Needle retriever and document loader integration (#28157) 2024-12-03 22:06:25 +00:00
news.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
notebook.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
notion.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
notiondb.py community: Correctly handle multi-element rich text (#25762) 2024-12-16 20:20:27 +00:00
nuclia.py
obs_directory.py
obs_file.py
obsidian.py community[patch]: Add missing annotations (#24890) 2024-07-31 18:13:44 +00:00
odt.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
onedrive_file.py multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
onedrive.py community: Allow other than default parsers in SharePointLoader and OneDriveLoader (#27716) 2024-11-06 17:44:34 -05:00
onenote.py community[patch]: Fix validation error in SettingsConfigDict across multiple Langchain modules (#26852) 2024-09-25 10:02:14 -04:00
open_city_data.py
oracleadb_loader.py community: Add support for clob datatype in oracle database (#27330) 2024-10-16 02:19:20 +00:00
oracleai.py community[minor]: Oraclevs integration (#21123) 2024-05-04 03:15:35 +00:00
org_mode.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
pdf.py community: ZeroxPDFLoader (#27800) 2024-11-07 03:14:57 +00:00
pebblo.py community[minor]: [Pebblo] Enhance PebbloSafeLoader to take anonymize flag (#26812) 2024-09-25 09:33:06 -04:00
polars_dataframe.py
powerpoint.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
psychic.py multiple: Remove unnecessary Ruff suppression comments (#21050) 2024-04-30 17:13:48 +00:00
pubmed.py community[patch]: upgrade to recent version of mypy (#21616) 2024-05-13 14:55:07 -04:00
pyspark_dataframe.py
python.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
quip.py community[major]: lint for usage of xml library (#22132) 2024-05-24 15:23:53 +00:00
readthedocs.py
recursive_url_loader.py community[minor]: add proxy support to RecursiveUrlLoader (#27364) 2024-10-16 16:29:59 +00:00
reddit.py
roam.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
rocksetdb.py
rspace.py
rss.py multiple: Remove unnecessary Ruff suppression comments (#21050) 2024-04-30 17:13:48 +00:00
rst.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
rtf.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
s3_directory.py community[patch]: Skip nested directories when using S3DirectoryLoader (#17829) 2024-03-08 16:50:58 -08:00
s3_file.py community[patch]: support unstructured_kwargs for s3 loader (#15473) 2024-03-27 22:03:48 +00:00
scrapfly.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
scrapingant.py community[minor]: Add ScrapingAnt Loader Community Integration (#24514) 2024-07-24 21:11:43 -04:00
sharepoint.py community: Allow other than default parsers in SharePointLoader and OneDriveLoader (#27716) 2024-11-06 17:44:34 -05:00
sitemap.py community[patch]: SitemapLoader restrict depth of parsing sitemap (CVE-2024-2965) (#22903) 2024-06-14 13:04:40 -04:00
slack_directory.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
snowflake_loader.py community[patch]: upgrade to recent version of mypy (#21616) 2024-05-13 14:55:07 -04:00
spider.py doc list not empty (#21208) 2024-05-20 08:24:06 -07:00
spreedly.py
sql_database.py community[patch]: restore compatibility with SQLAlchemy 1.x (#22546) 2024-06-19 17:58:57 +00:00
srt.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
stripe.py
surrealdb.py
telegram.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
tencent_cos_directory.py
tencent_cos_file.py
tensorflow_datasets.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
text.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
tidb.py
tomarkdown.py community[patch]: Update URL to the 2markdown API (#24546) 2024-07-23 14:27:55 +00:00
toml.py
trello.py
tsv.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
twitter.py
unstructured.py multiple: update removal targets (#25361) 2024-08-14 09:50:39 -04:00
url_playwright.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
url_selenium.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
url.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
vsdx.py community[patch]: import flattening fix (#20110) 2024-04-10 13:01:19 -04:00
weather.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
web_base.py community: add trust_env at web_base_loader (#28514) 2024-12-17 21:18:16 -05:00
whatsapp_chat.py
wikipedia.py community[patch]: upgrade to recent version of mypy (#21616) 2024-05-13 14:55:07 -04:00
word_document.py Update word_document.py | Fixed metadata["source"] for web paths (#27220) 2024-10-31 18:37:41 +00:00
xml.py all: test 3.13 ci (#27197) 2024-10-25 12:56:58 -07:00
xorbits.py
youtube.py multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
yuque.py