langchain/libs/community/langchain_community/document_loaders
mwmajewsk f7a1fd91b8
community: better support of pathlib paths in document loaders (#18396)
This arose from the problem, raised in
https://github.com/langchain-ai/langchain/pull/18397, of document
loaders not supporting `pathlib.Path`.

This pull request provides more uniform support for Path as an argument.
The core ideas for this upgrade: 
- if there is a local file path used as an argument, it should be
supported as `pathlib.Path`
- if there are some external calls that may or may not support
`pathlib.Path`, the argument is immediately converted to `str`
- if `self.file_path` is used in a way that allows it to stay a
`pathlib.Path` without conversion, it is only converted for the metadata.
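
A minimal sketch of the three ideas above, not code taken from this pull request. `ExampleTextLoader` and `_legacy_read` are hypothetical names used only for illustration, and documents are plain dicts here rather than LangChain `Document` objects, so the snippet runs with the standard library alone.

```python
from pathlib import Path
from typing import Any, Dict, Iterator, Union


def _legacy_read(path: str) -> str:
    """Hypothetical stand-in for an external call that only accepts str paths."""
    if not isinstance(path, str):
        raise TypeError("path must be a str")
    with open(path, encoding="utf-8") as f:
        return f.read()


class ExampleTextLoader:
    """Hypothetical loader illustrating the conventions listed above."""

    def __init__(self, file_path: Union[str, Path], use_legacy: bool = False) -> None:
        # Idea 1: a local file path argument may be a str or a pathlib.Path.
        self.file_path = file_path
        self.use_legacy = use_legacy

    def lazy_load(self) -> Iterator[Dict[str, Any]]:
        if self.use_legacy:
            # Idea 2: the external call may not support Path, so convert right away.
            text = _legacy_read(str(self.file_path))
        else:
            # Idea 3: open() accepts both str and Path, so no conversion is needed.
            with open(self.file_path, encoding="utf-8") as f:
                text = f.read()
        # The path is converted to str only for the metadata.
        yield {"page_content": text, "metadata": {"source": str(self.file_path)}}
```

Under these assumptions, `ExampleTextLoader(Path("notes.txt"))` and `ExampleTextLoader("notes.txt")` yield the same document, and `metadata["source"]` is a `str` in both cases.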

Twitter handle: https://twitter.com/mwmajewsk
2024-03-26 11:51:52 -04:00
blob_loaders community[patch]: speed up import times in the community package (#18928) 2024-03-11 16:37:36 -04:00
parsers community[patch]: speed up import times in the community package (#18928) 2024-03-11 16:37:36 -04:00
__init__.py community[patch]: speed up import times in the community package (#18928) 2024-03-11 16:37:36 -04:00
acreom.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
airbyte_json.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
airbyte.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
airtable.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
apify_dataset.py
arcgis_loader.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
arxiv.py community[minor]: Implement lazy_load() for ArxivLoader (#18664) 2024-03-06 09:16:49 -05:00
assemblyai.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
astradb.py community[patch]: Use langchain-astradb for AstraDB doc loader (#19071) 2024-03-15 22:57:25 +00:00
async_html.py
athena.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
azlyrics.py
azure_ai_data.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
azure_blob_storage_container.py community[patch]: type ignore fixes (#18395) 2024-03-01 11:21:02 -08:00
azure_blob_storage_file.py
baiducloud_bos_directory.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
baiducloud_bos_file.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
base_o365.py
base.py core: Move document loader interfaces to core (#17723) 2024-03-06 13:59:00 -05:00
bibtex.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
bigquery.py
bilibili.py
blackboard.py community[patch]: type ignore fixes (#18395) 2024-03-01 11:21:02 -08:00
blockchain.py
brave_search.py
browserless.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
cassandra.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
chatgpt.py
chm.py
chromium.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
college_confidential.py
concurrent.py
confluence.py community[patch]: expanding version in confluence loader (#19324) 2024-03-25 17:08:01 -07:00
conllu.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
couchbase.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
csv_loader.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
cube_semantic.py community[patch]: Implement lazy_load() for CubeSemanticLoader (#18535) 2024-03-05 17:32:31 -08:00
datadog_logs.py
dataframe.py community[patch]: support modin document loader (#18866) 2024-03-10 18:40:04 -07:00
diffbot.py
directory.py community[minor]: add exclude parameter to DirectoryLoader (#17316) 2024-02-16 09:42:42 -05:00
discord.py
doc_intelligence.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
docugami.py deprecate community docugami loader (#19230) 2024-03-18 12:56:47 -07:00
docusaurus.py
dropbox.py
duckdb_loader.py
email.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
epub.py
etherscan.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
evernote.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
excel.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
facebook_chat.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
fauna.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
figma.py
gcs_directory.py
gcs_file.py
generic.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
geodataframe.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
git.py Merge pull request #18539 2024-03-06 13:25:14 -05:00
gitbook.py community[minor]: Implement lazy_load() for GitbookLoader (#18670) 2024-03-06 09:14:36 -05:00
github.py community: Implement lazy_load() for GithubFileLoader (#18584) 2024-03-05 09:35:50 -08:00
google_speech_to_text.py
googledrive.py
gutenberg.py
helpers.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
hn.py
html_bs.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
html.py
hugging_face_dataset.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
hugging_face_model.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
ifixit.py
image_captions.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
image.py
imsdb.py
iugu.py
joplin.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
json_loader.py community: Implement lazy_load() for JSONLoader (#18643) 2024-03-08 13:58:17 -05:00
lakefs.py
larksuite.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
markdown.py
mastodon.py Merge pull request #18671 2024-03-06 13:23:14 -05:00
max_compute.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
mediawikidump.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
merge.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
mhtml.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
modern_treasury.py
mongodb.py community[minor]: added a feature to filter documents in Mongoloader (#18253) 2024-03-08 12:06:35 -08:00
news.py
notebook.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
notion.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
notiondb.py community[patch]: Fix NotionDBLoader 400 Error by conditionally adding filter parameter (#19075) 2024-03-14 13:56:57 +00:00
nuclia.py
obs_directory.py
obs_file.py
obsidian.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
odt.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
onedrive_file.py
onedrive.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
onenote.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
open_city_data.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
org_mode.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
pdf.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
pebblo.py community[patch]: Fix pwd import that is not available on windows (#17532) 2024-02-14 13:45:10 -08:00
polars_dataframe.py
powerpoint.py
psychic.py Merge pull request #18656 2024-03-06 13:05:04 -05:00
pubmed.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
pyspark_dataframe.py
python.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
quip.py
readthedocs.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
recursive_url_loader.py community[patch]: RecursiveUrlLoader: add base_url option (#19421) 2024-03-22 15:34:31 -07:00
reddit.py
roam.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
rocksetdb.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
rspace.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
rss.py
rst.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
rtf.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
s3_directory.py community[patch]: Skip nested directories when using S3DirectoryLoader (#17829) 2024-03-08 16:50:58 -08:00
s3_file.py community[minor]: S3FileLoader to use expose mode and post_processors arguments of unstructured loader (#19270) 2024-03-25 06:56:55 +00:00
sharepoint.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
sitemap.py community[minor]: Implement lazy_load() for SitemapLoader (#18667) 2024-03-06 09:15:35 -05:00
slack_directory.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
snowflake_loader.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
spreedly.py
sql_database.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
srt.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
stripe.py
surrealdb.py
telegram.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
tencent_cos_directory.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
tencent_cos_file.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
tensorflow_datasets.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
text.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
tidb.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
tomarkdown.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
toml.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
trello.py community: Implement lazy_load() for TrelloLoader (#18658) 2024-03-06 13:04:36 -05:00
tsv.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
twitter.py
unstructured.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
url_playwright.py community: Implement lazy_load() for PlaywrightURLLoader (#18676) 2024-03-06 16:52:13 -05:00
url_selenium.py
url.py
vsdx.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
weather.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
web_base.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
whatsapp_chat.py community: Implement lazy_load() for WhatsAppChatLoader (#18677) 2024-03-06 13:03:46 -05:00
wikipedia.py community[minor]: Implement lazy_load() for WikipediaLoader (#18680) 2024-03-06 13:03:21 -05:00
word_document.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
xml.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
xorbits.py
youtube.py
yuque.py community[minor]: add Yuque document loader (#17924) 2024-03-05 15:54:07 -08:00