langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-08-20 18:07:05 +00:00

Author	SHA1	Message	Date
yeounhak	f38fc89f35	community: Corrected aload func to be asynchronous from webBaseLoader (#28337 ) - Description: The aload function, contrary to its name, is not an asynchronous function, so it cannot work concurrently with other asynchronous functions. - Issue: #28336 - Test: Done - Docs: [here](`e0a95e5646/docs/docs/integrations/document_loaders/web_base.ipynb (L201)`) - Lint: All checks passed --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-20 14:42:52 -05:00
Anusha Karkhanis	26bdf40072	Langchain_Community: SQL LanguageParser (#28430 ) ## Description (This PR has contributions from @khushiDesai, @ashvini8, and @ssumaiyaahmed). This PR addresses Issue #11229 which addresses the need for SQL support in document parsing. This is integrated into the generic TreeSitter parsing library, allowing LangChain users to easily load codebases in SQL into smaller, manageable "documents." This pull request adds a new ```SQLSegmenter``` class, which provides the SQL integration. ## Issue Issue #11229: Add support for a variety of languages to LanguageParser ## Testing We created a file ```test_sql.py``` with several tests to ensure the ```SQLSegmenter``` is functional. Below are the tests we added: - ```def test_is_valid```: Checks SQL validity. - ```def test_extract_functions_classes```: Extracts individual SQL statements. - ```def test_simplify_code```: Simplifies SQL code with comments. --------- Co-authored-by: Syeda Sumaiya Ahmed <114104419+ssumaiyaahmed@users.noreply.github.com> Co-authored-by: ashvini hunagund <97271381+ashvini8@users.noreply.github.com> Co-authored-by: Khushi Desai <khushi.desai@advantawitty.com> Co-authored-by: Khushi Desai <59741309+khushiDesai@users.noreply.github.com> Co-authored-by: ccurme <chester.curme@gmail.com>	2024-12-19 20:30:57 +00:00
Martin Triska	e6b41d081d	community: DocumentLoaderAsParser wrapper (#27749 ) ## Description This pull request introduces the `DocumentLoaderAsParser` class, an adapter that turns document loaders into parsers within the LangChain framework. It lets any document loader that accepts a `file_path` parameter be used as a blob parser. Merged together with PR https://github.com/langchain-ai/langchain/pull/27716, it lets `SharePointLoader` / `OneDriveLoader` process any file type that has a document loader. ### Features - Flexible Parsing: The `DocumentLoaderAsParser` class can adapt any document loader that accepts a `file_path` argument, allowing for lazy parsing of documents. - Compatibility: The class has been designed to work with various document loaders, making it versatile for different use cases. ### Usage Example To use the `DocumentLoaderAsParser`, initialize it with a suitable document loader class and any required parameters. Here’s an example with the `UnstructuredExcelLoader`:
```python
from langchain_community.document_loaders.blob_loaders import Blob
from langchain_community.document_loaders.excel import UnstructuredExcelLoader
from langchain_community.document_loaders.parsers.documentloader_adapter import DocumentLoaderAsParser
# Import added for completeness; it was missing from the original snippet.
from langchain_community.document_loaders.parsers.generic import MimeTypeBasedParser

# Initialize the parser adapter with UnstructuredExcelLoader
xlsx_parser = DocumentLoaderAsParser(UnstructuredExcelLoader, mode="paged")

# Use the parser, e.g. pass it to MimeTypeBasedParser
MimeTypeBasedParser(
    handlers={
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": xlsx_parser
    }
)
```
- Dependencies: None - Twitter handle: @martintriska1 --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-18 12:47:08 -05:00
Mohammad Mohtashim	d49df4871d	[Community]: Image Extraction Fixed for `PDFPlumberParser` (#28491 ) - Description: One-bit images raised an error in `PDFPlumberParser`; this PR fixes it. - Issue: #28480 --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-18 11:45:48 -05:00
Dong Shin	0b1359801e	community: add trust_env at web_base_loader (#28514 ) - Description: I am working to address a similar issue to the one mentioned in https://github.com/langchain-ai/langchain/pull/19499. Specifically, there is a problem with the Webbase loader used in open-webui, where it fails to load the proxy configuration. This PR aims to resolve that issue. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-17 21:18:16 -05:00
Frank Dai	e81433497b	community: support Confluence cookies (#28760 ) Description: Some Confluence instances don't support personal access tokens; for those, a cookie is a convenient way to authenticate. This PR adds support for Confluence cookies. Twitter handle: soulmachine	2024-12-17 12:16:36 -05:00
Hiros	8f5e72de05	community: Correctly handle multi-element rich text (#25762 ) Description: - Add _concatenate_rich_text method to combine all elements in rich text arrays - Update load_page method to use _concatenate_rich_text for rich text properties - Ensure all text content is captured, including inline code and formatted text - Add unit tests to verify correct handling of multi-element rich text This fix prevents truncation of content after backticks or other formatting elements. Issue: With the Notion DB Loader, the text of `richtext` and `title` properties was truncated after the first element, because the loader only read the first element of the array. Dependencies: None. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 20:20:27 +00:00
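The truncation described in commit 8f5e72de05 above can be illustrated with a minimal sketch. It assumes that each element of a Notion rich text array carries its text in a `plain_text` field; the function name and sample data here are hypothetical and are not the loader's actual code.

```python
from typing import Any, Dict, List


def concatenate_rich_text(rich_text_array: List[Dict[str, Any]]) -> str:
    """Join the text of every element instead of reading only the first."""
    return "".join(item.get("plain_text", "") for item in rich_text_array)


# Hypothetical property value: plain text, inline code, then more text.
rich_text = [
    {"plain_text": "Run "},
    {"plain_text": "make test", "annotations": {"code": True}},
    {"plain_text": " before pushing."},
]

first_only = rich_text[0]["plain_text"]        # reading one element truncates
full_text = concatenate_rich_text(rich_text)   # reading all elements does not
```

Reading only the first element stops at the first formatting boundary, which is why content after inline code went missing.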
Vimpas	337fed80a5	community: 🐛 PDF Filter Type Error (#27154 ) PR title: "community: fix PDF Filter Type Error" - Description: fix PDF Filter Type Error - Issue: #27153 - Dependencies: no - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-13 23:30:29 +00:00
Philippe PRADOS	f3fb5a9c68	community[minor]: Fix json._validate_metadata_func() (#22842 ) JSONparse, in _validate_metadata_func(), checks the consistency of the _metadata_func() function. To do this, it invokes the function and makes sure it receives a dictionary in response. However, this validation call does not match the way the function is called later, as shown on line 100. This generates errors if, for example, the function looks like this:
```python
from typing import Any, Dict

from langchain_community.document_loaders import JSONLoader

# `url` and `file_path` are assumed to be defined by the caller.
def generate_metadata(json_node: Dict[str, Any], kwargs: Dict[str, Any]) -> Dict[str, Any]:
    return {
        "source": url,
        "row": kwargs["seq_num"],
        "question": json_node.get("question"),
    }

loader = JSONLoader(
    file_path=file_path,
    content_key="answer",
    jq_schema=".[]",
    metadata_func=generate_metadata,
    text_content=False,
)
```
To avoid this, the verification must comply with the specifications. This patch does just that. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-12-13 21:24:20 +00:00
Martin Triska	05ebe1e66b	Community: add `modified_since` argument to `O365BaseLoader` (#28708 ) ## What are we doing in this PR We're adding an optional `modified_since` argument to `O365BaseLoader`. When set, the O365 loader only loads documents newer than the `modified_since` datetime. ## Why? OneDrives / SharePoints can contain a large number of documents. The current approach is to download and parse all files and let the indexer deal with duplicates. This can be prohibitively time-consuming, especially when using an OCR-based parser like [zerox](`fa06188834/libs/community/langchain_community/document_loaders/pdf.py (L948)`). This argument makes it possible to skip documents that are older than the known time of indexing. _Q: What if a file was modified during the last indexing process? A: Users can set `modified_since` conservatively and the indexer will still take care of duplicates._ --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-13 17:30:17 +00:00
Botong Zhu	13c3c4a210	community: fixes json loader not getting texts with json standard (#27327 ) This PR fixes JSONLoader._get_text not converting objects to JSON strings correctly. If an object is serializable and is not a dict, JSONLoader uses Python's built-in str() to convert it to a string. The resulting string may not follow the JSON standard. For example, a list is converted to a string with single quotes, and json.loads raises an error when it tries to load that string. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 19:33:45 +00:00
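The difference described in commit 13c3c4a210 above can be reproduced with the standard library alone. This is a minimal sketch of the underlying behavior, not the JSONLoader code itself; the helper `is_valid_json` is introduced here purely for illustration.

```python
import json

value = ["a", "b"]

as_str = str(value)          # Python repr, uses single quotes
as_json = json.dumps(value)  # JSON text, uses double quotes


def is_valid_json(text: str) -> bool:
    """Return True if `text` can be parsed by json.loads."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(as_str), is_valid_json(as_json))
```

Only the `json.dumps` output round-trips through `json.loads`, which is why serializing non-dict objects with `str()` breaks downstream JSON consumers.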
Mohammad Mohtashim	fa155a422f	[Community]: `requests_kwargs` not being used in _fetch (#28646 ) - Description: `requests_kwargs` is not being passed to `_fetch` which is fetching pages asynchronously. In this PR, making sure that we are passing `requests_kwargs` to `_fetch` just like `_scrape`. - Issue: #28634 --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-11 23:46:54 +00:00
Johannes Mohren	c1d348e95d	doc-loader: retain Azure Doc Intelligence API metadata in Document parser (#28382 ) Description: This PR modifies the doc_intelligence.py parser in the community package to include all metadata returned by the Azure Doc Intelligence API in the Document object. Previously, only the parsed content (markdown) was retained, while other important metadata such as bounding boxes (bboxes) for images and tables was discarded. These image bboxes are crucial for supporting use cases like multi-modal RAG workflows when using Azure Doc Intelligence. The change ensures that all information returned by the Azure Doc Intelligence API is preserved by setting the metadata attribute of the Document object to the entire result returned by the API, rather than an empty dictionary. This extends the parser's utility for complex use cases without breaking existing functionality. Issue: This change does not address a specific issue number, but it resolves a critical limitation in supporting multimodal workflows when using the LangChain wrapper for the Azure API. Dependencies: No additional dependencies are required for this change. --------- Co-authored-by: jmohren <johannes.mohren@aol.de>	2024-12-10 11:22:58 -05:00
Alex Tonkonozhenko	0d20c314dd	Confluence Loader: Fix CQL loading (#27620 ) fix #12082	2024-12-10 11:05:23 -05:00
Martin Triska	75bc6bb191	community: [bugfix] fix source path for office files in O365 (#28260 ) # What problem are we fixing? Currently, documents loaded using `O365BaseLoader` fetch their source from `file.web_url` (where `file` is `<class 'O365.drive.File'>`). This works well for `.pdf` documents. Unfortunately, office documents (`.xlsx`, `.docx` ...) pass their `web_url` in the following format: `https://sharepoint_address/sites/path/to/library/root/Doc.aspx?sourcedoc=%XXXXXXXX-1111-1111-XXXX-XXXXXXXXXX%7D&file=filename.xlsx&action=default&mobileredirect=true` This obfuscates the path to the file. This PR uses the parent folder's path and the file name to reconstruct the actual location of the file. Knowing the file's location can be crucial for some RAG applications (the path to the file can carry information we don't want to lose). @vbarda Could you please look at this one? I'm @-mentioning you since we've already closed some PRs together :-) Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 12:34:59 -08:00
Naka Masato	ce3b69aa05	community: add include_labels option to ConfluenceLoader (#28259 ) ## Description: Enable `ConfluenceLoader` to include labels via the `include_labels` option (`false` by default for backward compatibility). The labels are stored in the `metadata` of the `Document`, e.g. `{"labels": ["l1", "l2"]}` ## Notes The Confluence API returns labels when `metadata.labels` is passed to the `expand` query parameter. All of the following functions support `expand` in the same way: - confluence.get_page_by_id - confluence.get_all_pages_by_label - confluence.get_all_pages_from_space - cql (internally using [/api/content/search](https://developer.atlassian.com/cloud/confluence/rest/v1/api-group-content/#api-wiki-rest-api-content-search-get)) ## Issue: No issue related to this PR. ## Dependencies: No changes. ## Twitter handle: [@gymnstcs](https://x.com/gymnstcs) - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 19:35:01 +00:00
Jan Heimes	ef365543cb	community: add Needle retriever and document loader integration (#28157 ) - Description: This PR adds a new integration for Needle, which includes: - NeedleRetriever: A retriever for fetching documents from Needle collections. - NeedleLoader: A document loader for managing and loading documents into Needle collections. - Example notebooks demonstrating usage have been added in: - `docs/docs/integrations/retrievers/needle.ipynb` - `docs/docs/integrations/document_loaders/needle.ipynb`. - Dependencies: The `needle-python` package is required as an external dependency for accessing Needle's API. It has been added to the extended testing dependencies list. - Twitter handle: Feel free to mention me if this PR gets announced: [needlexai](https://x.com/NeedlexAI). - [x] Add tests and docs: 1. Unit tests have been added for both `NeedleRetriever` and `NeedleLoader` in `libs/community/tests/unit_tests`. These tests mock API calls to avoid relying on network access. 2. Example notebooks have been added to `docs/docs/integrations/`, showcasing both retriever and loader functionality. - [x] Lint and test: - `make format`: Passed - `make lint`: Passed - `make test`: Passed (requires `needle-python` to be installed locally; this package is not added to LangChain dependencies). Additional guidelines: - [x] Optional dependencies are imported only within functions. - [x] No dependencies have been added to pyproject.toml files except for those required for unit tests. - [x] The PR does not touch more than one package. - [x] Changes are fully backwards compatible. - [x] Community additions are not re-imported into LangChain core. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-03 22:06:25 +00:00
Aksel Joonas Reedi	2cb39270ec	community: bytes as a source to `AzureAIDocumentIntelligenceLoader` (#26618 ) - Description: This PR adds functionality to pass in in-memory bytes as a source to `AzureAIDocumentIntelligenceLoader`. - Issue: I needed the functionality, so I added it. - Dependencies: NA - Twitter handle: @akseljoonas if this is a big enough change :) --------- Co-authored-by: Aksel Joonas Reedi <aksel@klippa.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-07 03:40:21 +00:00
Martin Triska	7a9149f5dd	community: ZeroxPDFLoader (#27800 ) # OCR-based PDF loader This implements the [Zerox](https://github.com/getomni-ai/zerox) PDF document loader. Zerox uses a simple but very powerful (even though slower and more costly) approach to parsing PDF documents: it converts the PDF to a series of images and passes them to a vision model, requesting the contents in markdown. It is especially suitable for complex PDFs that are not parsed well by other alternatives. ## Example use:
```python
import os

from langchain_community.document_loaders.pdf import ZeroxPDFLoader

os.environ["OPENAI_API_KEY"] = ""  ## your-api-key

model = "gpt-4o-mini"  ## openai model
pdf_url = "https://assets.ctfassets.net/f1df9zr7wr1a/soP1fjvG1Wu66HJhu3FBS/034d6ca48edb119ae77dec5ce01a8612/OpenAI_Sacra_Teardown.pdf"

loader = ZeroxPDFLoader(file_path=pdf_url, model=model)
docs = loader.load()
```
The Zerox library supports a wide range of providers/models. See the Zerox documentation for details. - Dependencies: `zerox` - Twitter handle: @martintriska1 --------- Co-authored-by: Erick Friis <erickfriis@gmail.com>	2024-11-07 03:14:57 +00:00
Martin Triska	90189f5639	community: Allow other than default parsers in SharePointLoader and OneDriveLoader (#27716 ) ## What this PR does? ### Currently `O365BaseLoader` (and consequently both derived loaders) are limited to `pdf`, `doc`, `docx` files. - Solution: here we introduce a _handlers_ attribute that allows custom handlers to be passed in. This is done in _dict_ form: Example:
```python
from langchain_community.document_loaders import OneDriveLoader, SharePointLoader
from langchain_community.document_loaders.excel import UnstructuredExcelLoader
# PR for DocumentLoaderAsParser here: https://github.com/langchain-ai/langchain/pull/27749
from langchain_community.document_loaders.parsers.documentloader_adapter import DocumentLoaderAsParser
from langchain_community.document_loaders.parsers.msword import MsWordParser
from langchain_community.document_loaders.parsers.pdf import PDFMinerParser
from langchain_community.document_loaders.parsers.txt import TextParser

xlsx_parser = DocumentLoaderAsParser(UnstructuredExcelLoader, mode="paged")

# create dictionary mapping file types to handlers (parsers)
handlers = {
    "doc": MsWordParser(),
    "pdf": PDFMinerParser(),
    "txt": TextParser(),
    "xlsx": xlsx_parser,
}

loader = SharePointLoader(
    document_library_id="...",
    handlers=handlers,  # pass handlers to SharePointLoader
)
documents = loader.load()

# works the same in OneDriveLoader
loader = OneDriveLoader(
    document_library_id="...",
    handlers=handlers,
)
```
This dictionary is then passed to `MimeTypeBasedParser`, same as in the [current implementation](`5a2cfb49e0/libs/community/langchain_community/document_loaders/parsers/registry.py (L13)`). ### Currently `SharePointLoader` and `OneDriveLoader` are separate loaders that both inherit from `O365BaseLoader` However, both of these implement the same functionality. The only differences are: - `SharePointLoader` requires the argument `document_library_id` whereas `OneDriveLoader` requires `drive_id`. These are just different names for the same thing. - `SharePointLoader` implements significantly more features. - Solution: `OneDriveLoader` is replaced with an empty shell that just renames `drive_id` to `document_library_id` and inherits from `SharePointLoader` Dependencies: None Twitter handle: @martintriska1	2024-11-06 17:44:34 -05:00
Ankan Mahapatra	905f43377b	Update word_document.py \| Fixed metadata["source"] for web paths (#27220 ) The metadata["source"] value for web paths was being set to a temporary path (/tmp). Fixed it by creating a new variable, self.original_file_path, which stores the original path.	2024-10-31 18:37:41 +00:00
ccurme	0172d938b4	community: add AzureOpenAIWhisperParser (#27796 ) Commandeered from https://github.com/langchain-ai/langchain/pull/26757. --------- Co-authored-by: Sheepsta300 <128811766+Sheepsta300@users.noreply.github.com>	2024-10-31 12:37:41 -04:00
Erick Friis	600b7bdd61	all: test 3.13 ci (#27197 ) Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-10-25 12:56:58 -07:00
ZhangShenao	455ab7d714	Improvement[Community] Improve Document Loaders and Splitters (#27568 ) - Fix word spelling error - Add static method decorator - Fix language splitter Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-24 21:42:16 +00:00
orkhank	9a277cbe00	community: Update `file_path` type in `JSONLoader.__init__()` signature (#27535 ) - Description: Change the type of the `file_path` argument from `str \| pathlib.Path` to `str \| os.PathLike`, since the latter is more widely used: https://stackoverflow.com/a/58541858 This is a very minor fix. I was just annoyed to see the red underline displayed by Pylance in VS Code: `reportArgumentType`. ![image](https://github.com/user-attachments/assets/719a7f8e-acca-4dfa-89df-925e1d938c71) The changes do not affect the behavior of the code.	2024-10-22 11:18:36 -07:00
sByteman	31e7664afd	community[minor]: add proxy support to RecursiveUrlLoader (#27364 ) Description This PR introduces the proxies parameter to the RecursiveUrlLoader class, allowing the user to specify proxy servers for requests. This update enables crawling through proxy servers, providing enhanced flexibility for network configurations. The key changes include: 1. Added an optional proxies parameter to the constructor (__init__). 2. Updated the documentation to explain the proxies parameter usage with an example. 3. Modified the _get_child_links_recursive method to pass the proxies parameter to the requests.get function. Sample Usage
```python
from bs4 import BeautifulSoup as Soup
from langchain_community.document_loaders.recursive_url_loader import RecursiveUrlLoader

proxies = {
    "http": "http://localhost:1080",
    "https": "http://localhost:1080",
}
url = "https://python.langchain.com/docs/concepts/#langchain-expression-language-lcel"
loader = RecursiveUrlLoader(
    url=url,
    max_depth=1,
    extractor=lambda x: Soup(x, "html.parser").text,
    proxies=proxies,
)
docs = loader.load()
```
--------- Co-authored-by: root <root@thb>	2024-10-16 16:29:59 +00:00
xsai9101	15c1ddaf99	community: Add support for clob datatype in oracle database (#27330 ) Description: This PR adds support for the clob/blob data types to the Oracle document loader. The oracledb package can only read clob/blob values while the connection is open, so the code was restructured to process data before the connection closes. Dependencies: oracledb package, same as before (pip install oracledb). Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-16 02:19:20 +00:00
Rafael Miller	fc14f675f1	Community: Updated Firecrawl Document Loader to v1 (#26548 ) This PR updates the Firecrawl Document Loader to use the recently released V1 API of Firecrawl. Key Updates: Firecrawl V1 Integration: Updated the document loader to leverage the new Firecrawl V1 API for improved performance, reliability, and developer experience. Map Functionality Added: Introduced the map mode for more flexible document loading options. These updates enhance the integration and provide access to the latest features of Firecrawl. --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2024-10-15 13:13:28 +00:00
Martin Triska	8dc4bec947	[community] [Bugfix] base_o365 document loader metadata needs to be JSON serializable (#26322 ) In order for indexer to work, all metadata in the documents need to be JSON serializable. Timestamps are not. See here: https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/indexing/api.py#L83-L89 @eyurtsev could you please review? It's a tiny PR :-)	2024-10-14 12:48:31 -04:00
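The serialization problem noted in commit 8dc4bec947 above is easy to reproduce with the standard library. The sketch below shows one possible way to make such metadata JSON serializable by converting datetimes to ISO 8601 strings; the helper name, the metadata keys, and the choice of `isoformat()` are assumptions for illustration, since the commit message does not say how the loader converts them.

```python
import json
from datetime import datetime, timezone

# Hypothetical document metadata containing a raw timestamp.
metadata = {
    "source": "Doc.xlsx",
    "modified": datetime(2024, 10, 14, tzinfo=timezone.utc),
}


def to_serializable(meta: dict) -> dict:
    """Replace datetime values with ISO 8601 strings; leave the rest as-is."""
    return {
        key: value.isoformat() if isinstance(value, datetime) else value
        for key, value in meta.items()
    }


try:
    json.dumps(metadata)
    raw_ok = True
except TypeError:
    raw_ok = False  # datetime objects are not JSON serializable

encoded = json.dumps(to_serializable(metadata))
```

Converting at load time keeps the metadata usable by anything that hashes or stores it as JSON, such as the indexer referenced in the commit.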
Erick Friis	7ecd720120	multiple: update docs urls to latest 2 (#26837 )	2024-09-30 17:37:07 -07:00
Rajendra Kadam	51c4393298	community[patch]: Fix validation error in SettingsConfigDict across multiple Langchain modules (#26852 ) - Description: This pull request addresses the validation error in `SettingsConfigDict` due to extra fields in the `.env` file. The issue is prevalent across multiple Langchain modules. This fix ensures that extra fields in the `.env` file are ignored, preventing validation errors. Changes include: - Applied fixes to modules using `SettingsConfigDict`. - Issue: NA, similar https://github.com/langchain-ai/langchain/issues/26850 - Dependencies: NA	2024-09-25 10:02:14 -04:00
Rajendra Kadam	7e5a9c317f	community[minor]: [Pebblo] Enhance PebbloSafeLoader to take anonymize flag (#26812 ) - Description: The flag is named `anonymize_snippets`. When set to true, the Pebblo server will anonymize snippets by redacting all personally identifiable information (PII) from the snippets going into VectorDB and the generated reports - Issue: NA - Dependencies: NA - docs: Updated	2024-09-25 09:33:06 -04:00
Rajendra Kadam	92003b3724	community[patch]: [SharePointLoader] Fix validation error in _O365Settings due to extra fields in .env file (#26851 ) Description: Fix validation error in _O365Settings by ignoring extra fields in .env file Issue: https://github.com/langchain-ai/langchain/issues/26850 Dependencies: NA	2024-09-25 09:31:59 -04:00
Rajendra Kadam	60dc19da30	[community] Added PebbloTextLoader for loading text data in PebbloSafeLoader (#26582 ) - Description: Added PebbloTextLoader for loading text in PebbloSafeLoader. - Since PebbloSafeLoader wraps document loaders, this new loader enables direct loading of text into Documents using PebbloSafeLoader. - Issue: NA - Dependencies: NA - [x] Tests: Added/Updated tests	2024-09-19 09:59:04 -04:00
Erik	4e0a6ebe7d	community: Add warning when page_content is empty (#25955 ) Page content is sometimes empty when PyMuPDF cannot find text on a page. For example, this can happen when the text of the PDF is not copyable "by hand". In that case an OCR solution is needed, which is not integrated here. This warning tells the user that some pages are lost during this process. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-09-19 05:22:09 +00:00
Tibor Reiss	85caaa773f	docs[community]: Fix raw string in docstring (#26350 ) Fixes #26212: replaced the raw string with backslashes. Alternative: raw-stringify the full docstring. --------- Co-authored-by: Erick Friis <erickfriis@gmail.com>	2024-09-19 04:18:56 +00:00
RUO	0a177ec2cc	community: Enhance MongoDBLoader with flexible metadata and optimized field extraction (#23376 ) ### Description: This pull request significantly enhances the MongodbLoader class in the LangChain community package by adding robust metadata customization and improved field extraction capabilities. The updated class now allows users to specify additional metadata fields through the metadata_names parameter, enabling the extraction of both top-level and deeply nested document attributes as metadata. This flexibility is crucial for users who need to include detailed contextual information without altering the database schema. Moreover, the include_db_collection_in_metadata flag offers optional inclusion of database and collection names in the metadata, allowing for even greater customization depending on the user's needs. The loader's field extraction logic has been refined to handle missing or nested fields more gracefully. It now employs a safe access mechanism that avoids the KeyError previously encountered when a specified nested field was absent in a document. This update ensures that the loader can handle diverse and complex data structures without failure, making it more resilient and user-friendly. ### Issue: This pull request addresses a critical issue where the MongodbLoader class in the LangChain community package could throw a KeyError when attempting to access nested fields that may not exist in some documents. The previous implementation did not handle the absence of specified nested fields gracefully, leading to runtime errors and interruptions in data processing workflows. This enhancement ensures robust error handling by safely accessing nested document fields, using default values for missing data, thus preventing KeyError and ensuring smoother operation across various data structures in MongoDB. 
This improvement is crucial for users working with diverse and complex data sets, ensuring the loader can adapt to documents with varying structures without failing. ### Dependencies: Requires motor for asynchronous MongoDB interaction. ### Twitter handle: N/A ### Add tests and docs Tests: Unit tests have been added to verify that the metadata inclusion toggle works as expected and that the field extraction correctly handles nested fields. Docs: An example notebook demonstrating the use of the enhanced MongodbLoader is included in the docs/docs/integrations directory. This notebook includes setup instructions, example usage, and outputs. (Here is the notebook link : [colab link](https://colab.research.google.com/drive/1tp7nyUnzZa3dxEFF4Kc3KS7ACuNF6jzH?usp=sharing)) Lint and test Before submitting, I ran make format, make lint, and make test as per the contribution guidelines. All tests pass, and the code style adheres to the LangChain standards. ```python import unittest from unittest.mock import patch, MagicMock import asyncio from langchain_community.document_loaders.mongodb import MongodbLoader class TestMongodbLoader(unittest.TestCase): def setUp(self): """Setup the MongodbLoader test environment by mocking the motor client and database collection interactions.""" # Mocking the AsyncIOMotorClient self.mock_client = MagicMock() self.mock_db = MagicMock() self.mock_collection = MagicMock() self.mock_client.get_database.return_value = self.mock_db self.mock_db.get_collection.return_value = self.mock_collection # Initialize the MongodbLoader with test data self.loader = MongodbLoader( connection_string="mongodb://localhost:27017", db_name="testdb", collection_name="testcol" ) @patch('langchain_community.document_loaders.mongodb.AsyncIOMotorClient', return_value=MagicMock()) def test_constructor(self, mock_motor_client): """Test if the constructor properly initializes with the correct database and collection names.""" loader = MongodbLoader( 
connection_string="mongodb://localhost:27017", db_name="testdb", collection_name="testcol" ) self.assertEqual(loader.db_name, "testdb") self.assertEqual(loader.collection_name, "testcol") def test_aload(self): """Test the aload method to ensure it correctly queries and processes documents.""" # Setup mock data and responses for the database operations self.mock_collection.count_documents.return_value = asyncio.Future() self.mock_collection.count_documents.return_value.set_result(1) self.mock_collection.find.return_value = [ {"_id": "1", "content": "Test document content"} ] # Run the aload method and check responses loop = asyncio.get_event_loop() results = loop.run_until_complete(self.loader.aload()) self.assertEqual(len(results), 1) self.assertEqual(results[0].page_content, "Test document content") def test_construct_projection(self): """Verify that the projection dictionary is constructed correctly based on field names.""" self.loader.field_names = ['content', 'author'] self.loader.metadata_names = ['timestamp'] expected_projection = {'content': 1, 'author': 1, 'timestamp': 1} projection = self.loader._construct_projection() self.assertEqual(projection, expected_projection) if __name__ == '__main__': unittest.main() ``` ### Additional Example for Documentation Sample Data: ```json [ { "_id": "1", "title": "Artificial Intelligence in Medicine", "content": "AI is transforming the medical industry by providing personalized medicine solutions.", "author": { "name": "John Doe", "email": "john.doe@example.com" }, "tags": ["AI", "Healthcare", "Innovation"] }, { "_id": "2", "title": "Data Science in Sports", "content": "Data science provides insights into player performance and strategic planning in sports.", "author": { "name": "Jane Smith", "email": "jane.smith@example.com" }, "tags": ["Data Science", "Sports", "Analytics"] } ] ``` Example Code: ```python loader = MongodbLoader( connection_string="mongodb://localhost:27017", db_name="example_db", 
collection_name="articles", filter_criteria={"tags": "AI"}, field_names=["title", "content"], metadata_names=["author.name", "author.email"], include_db_collection_in_metadata=True ) documents = loader.load() for doc in documents: print("Page Content:", doc.page_content) print("Metadata:", doc.metadata) ``` Expected Output: ``` Page Content: Artificial Intelligence in Medicine AI is transforming the medical industry by providing personalized medicine solutions. Metadata: {'author_name': 'John Doe', 'author_email': 'john.doe@example.com', 'database': 'example_db', 'collection': 'articles'} ``` Thank you. --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-09-17 10:23:17 -04:00
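The safe nested-field access described in the commit above can be illustrated with a minimal sketch. The helper names `get_nested` and `build_metadata`, the empty-string default, and the dot-to-underscore key renaming are assumptions made for illustration (the renaming mirrors the `author_name` keys in the expected output above); this is not the loader's actual implementation.

```python
from typing import Any, Dict, Mapping, Sequence


def get_nested(doc: Mapping[str, Any], dotted_path: str, default: Any = "") -> Any:
    """Walk a dotted path such as "author.name" without raising KeyError.

    Hypothetical helper: returns `default` as soon as a level is missing
    or is not a mapping.
    """
    current: Any = doc
    for key in dotted_path.split("."):
        if not isinstance(current, Mapping) or key not in current:
            return default
        current = current[key]
    return current


def build_metadata(doc: Mapping[str, Any], metadata_names: Sequence[str]) -> Dict[str, Any]:
    """Collect metadata values, flattening "a.b" into the key "a_b" (assumed naming)."""
    return {name.replace(".", "_"): get_nested(doc, name) for name in metadata_names}


sample = {"title": "AI in Medicine", "author": {"name": "John Doe"}}
print(build_metadata(sample, ["author.name", "author.email"]))
```

A document that lacks `author.email` yields the default value instead of interrupting the load, which is the behavior the commit message describes.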
Erick Friis	c2a3021bb0	multiple: pydantic 2 compatibility, v0.3 (#26443 ) Signed-off-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Dan O'Donovan <dan.odonovan@gmail.com> Co-authored-by: Tom Daniel Grande <tomdgrande@gmail.com> Co-authored-by: Grande <Tom.Daniel.Grande@statsbygg.no> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: ccurme <chester.curme@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com> Co-authored-by: ZhangShenao <15201440436@163.com> Co-authored-by: Friso H. Kingma <fhkingma@gmail.com> Co-authored-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Nuno Campos <nuno@langchain.dev> Co-authored-by: Morgante Pell <morgantep@google.com>	2024-09-13 14:38:45 -07:00
Alexander KIRILOV	6a8f8a56ac	community[patch]: added content_columns option to CSVLoader (#23809 ) Description: Adding a new option to the CSVLoader that allows us to explicitly specify the columns that are used for generating the Document content. Currently these are implicitly set as "all fields not part of the metadata_columns". In some cases, however, it is useful to have a field both as metadata and as part of the document content.	2024-09-02 20:25:53 +00:00
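The column-selection rule described in the commit above can be sketched with the standard library alone. The function `row_to_document` and the `key: value` line format are assumptions for illustration, not the CSVLoader source; the sketch only shows how an explicit `content_columns` list could override the implicit "everything not in `metadata_columns`" default.

```python
import csv
import io
from typing import Dict, Sequence, Tuple


def row_to_document(
    row: Dict[str, str],
    content_columns: Sequence[str] = (),
    metadata_columns: Sequence[str] = (),
) -> Tuple[str, Dict[str, str]]:
    """Split one CSV row into (page_content, metadata). Hypothetical helper."""
    if content_columns:
        # Explicit selection: a column may appear here and in the metadata.
        keys = [k for k in row if k in content_columns]
    else:
        # Implicit default: all fields not part of metadata_columns.
        keys = [k for k in row if k not in metadata_columns]
    page_content = "\n".join(f"{k}: {row[k]}" for k in keys)
    metadata = {k: row[k] for k in metadata_columns if k in row}
    return page_content, metadata


data = io.StringIO("id,title,body\n1,Hello,World\n")
for row in csv.DictReader(data):
    print(row_to_document(row, content_columns=("title", "body"), metadata_columns=("title",)))
```

Here `title` ends up in both the content and the metadata, which is the case the commit message says the implicit default could not express.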
mehdiosa	c6f00e6bdc	community: Fix branch not being considered when using GithubFileLoader (#20075 ) - Description: Added `ref` query parameter so data is not loaded only from the default branch but any branch passed --------- Co-authored-by: Osama Mehdi <mehdi@hm.edu> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-08-30 21:47:11 +00:00
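As a rough illustration of the `ref` query parameter mentioned in the commit above: GitHub's REST contents endpoint accepts `ref` to select a branch, tag, or commit. The helper `contents_url` and its arguments are hypothetical, and the exact request the loader builds may differ.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def contents_url(
    repo: str,
    path: str,
    branch: Optional[str] = None,
    api_url: str = "https://api.github.com",
) -> str:
    """Build a contents-API URL, adding `ref` only when a branch is given."""
    url = f"{api_url}/repos/{repo}/contents/{quote(path)}"
    if branch:
        # Without `ref`, GitHub serves the repository's default branch.
        url += "?" + urlencode({"ref": branch})
    return url


print(contents_url("octocat/hello-world", "docs/readme.md", branch="dev"))
```

Omitting `branch` leaves the URL without a query string, so the default branch is used, which matches the behavior the fix was meant to extend.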
Isaac Francisco	d5ddaac1fc	docs minor fix (#25794 )	2024-08-28 04:14:36 +00:00
Dristy Srivastava	7205057c3e	[Community][minor]: Added langchain_version while calling discover API (#24428 ) - Description: Added langchain version while calling discover API during both ingestion and retrieval - Issue: NA - Dependencies: NA - Tests: NA - Docs NA --------- Co-authored-by: dristy.cd <dristy@clouddefense.io>	2024-08-26 08:47:48 -04:00
Dristy Srivastava	fbb4761199	[Community][minor]: Updating source path, and file path for SharePoint loader in PebbloSafeLoader (#25592 ) - Description: Updating source path and file path in Pebblo safe loader for SharePoint apps during loading - Issue: NA - Dependencies: NA - Tests: NA - Docs NA --------- Co-authored-by: dristy.cd <dristy@clouddefense.io>	2024-08-26 08:38:40 -04:00
clement.l	642f9530cd	community: add supported blockchains to Blockchain Document Loader (#25428 ) - Remove deprecated chains. - Add more supported chains. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-08-23 14:39:42 +00:00
Kevin Engelke	3c7f12cbf5	community[minor]: Fix missing 'keep_newlines' parameter forward-pass to 'process_pages' function in confluence loader (#20086 ) (#20087 ) - Description: Fixed missing `keep_newlines` parameter forward-pass in confluence-loader - Issue: #20086 - Dependencies: None --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-08-23 12:59:38 +00:00
Nobuhiko Otoba	4b63a217c2	"community: Fix GithubFileLoader source code", "docs: Fix GithubFileLoader code sample" (#19943 ) This PR adds tiny improvements to the `GithubFileLoader` document loader and its code sample, addressing the following issues: 1. Currently, the `file_extension` argument of `GithubFileLoader` does not change its behavior at all. 1. The `GithubFileLoader` sample code in `docs/docs/integrations/document_loaders/github.ipynb` does not work as it stands. The respective solutions I propose are the following: 1. Remove `file_extension` argument from `GithubFileLoader`. 1. Specify the branch as `master` (not the default `main`) and rename `documents` as `document`. --------- Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>	2024-08-22 18:24:57 -04:00
Rajendra Kadam	1f1679e960	community: Refactor PebbloSafeLoader (#25582 ) Refactor PebbloSafeLoader - Created `APIWrapper` and moved API logic into it. - Moved helper functions to the utility file. - Created smaller functions and methods for better readability. - Properly read environment variables. - Removed unused code. Issue: NA Dependencies: NA tests: Updated	2024-08-22 11:46:52 -04:00
Dristy Srivastava	b002702af6	[Community][minor]: Updating metadata with full_path in SharePoint loader (#25593 ) - Description: Updating metadata for sharepoint loader with full path i.e., webUrl - Issue: NA - Dependencies: NA - Tests: NA - Docs NA Co-authored-by: dristy.cd <dristy@clouddefense.io> Co-authored-by: ccurme <chester.curme@gmail.com>	2024-08-21 13:10:14 +00:00
Mohammad Mohtashim	75c3c81b8c	[Community]: Fix - Open AI Whisper `client.audio.transcriptions` returning Text Object which raises error (#25271 ) - Description: The following [line](`fd546196ef/libs/community/langchain_community/document_loaders/parsers/audio.py (L117)`) in `OpenAIWhisperParser` returns a plain text object, although the official documentation says it should return a `Transcript` instance with a `text` attribute. For the example given in the issue, and when I ran it myself, I got the text directly. This small PR accounts for that. - Issue: #25218 I was able to replicate the error even without the GenericLoader, as shown below; the issue was with `OpenAIWhisperParser`: ```python parser = OpenAIWhisperParser(api_key="sk-fxxxxxxxxx", response_format="srt", temperature=0) list(parser.lazy_parse(Blob.from_path('path_to_file.m4a'))) ```	2024-08-19 09:36:42 -04:00
Bagatur	253ceca76a	docs: fix mimetype parser docstring (#25463 )	2024-08-15 16:16:52 -07:00
Isaac Francisco	966b408634	[docs]: doc loader changes (#25417 )	2024-08-14 19:46:33 -07:00
ccurme	27690506d0	multiple: update removal targets (#25361 )	2024-08-14 09:50:39 -04:00
Harrison Chase	967b6f21f6	docs: improve document loaders index (#25365 ) Co-authored-by: Erick Friis <erick@langchain.dev>	2024-08-14 01:48:48 +00:00
Isaac Francisco	f4ffd692a3	[docs]: standardize doc loader doc strings (#25325 )	2024-08-13 23:18:56 +00:00
Eugene Yurtsev	bd6c31617e	community[patch]: Remove more @allow_reuse=True validators (#25236 ) Remove some additional allow_reuse=True usage in @root_validators.	2024-08-09 11:10:27 -04:00
Shivendra Soni	66b7206ab6	community: Add llm-extraction option to FireCrawl Document Loader (#25231 ) Description: This minor PR aims to add `llm_extraction` to Firecrawl loader. This feature is supported on API and PythonSDK, but the langchain loader omits adding this to the response. Twitter handle: [scalable_pizza](https://x.com/scalablepizza) --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-08-09 13:59:10 +00:00
Rajendra Kadam	663638d6a8	community[minor]: [SharePointLoader] Load extended metadata for the root folder (#24872 ) - Title: [SharePointLoader] Load extended metadata for the root folder - Description: - Ensure extended metadata loads correctly for the root folder. - Cleanup: Refactor SharePointLoader to remove unused fields(`file_id` & `site_id`). - Dependencies: NA - Add tests and docs: NA	2024-08-08 14:39:16 -04:00
Eugene Yurtsev	bf5193bb99	community[patch]: Upgrade pydantic extra (#25185 ) Upgrade to using a literal for specifying the extra which is the recommended approach in pydantic 2. This works correctly also in pydantic v1. ```python from pydantic.v1 import BaseModel class Foo(BaseModel, extra="forbid"): x: int Foo(x=5, y=1) ``` And ```python from pydantic.v1 import BaseModel class Foo(BaseModel): x: int class Config: extra = "forbid" Foo(x=5, y=1) ``` ## Enum -> literal using grit pattern: ``` engine marzano(0.1) language python or { `extra=Extra.allow` => `extra="allow"`, `extra=Extra.forbid` => `extra="forbid"`, `extra=Extra.ignore` => `extra="ignore"` } ``` Resorted attributes in config and removed doc-string in case we will need to deal with going back and forth between pydantic v1 and v2 during the 0.3 release. (This will reduce merge conflicts.) ## Sort attributes in Config: ``` engine marzano(0.1) language python function sort($values) js { return $values.text.split(',').sort().join("\n"); } class_definition($name, $body) as $C where { $name <: `Config`, $body <: block($statements), $values = [], $statements <: some bubble($values) assignment() as $A where { $values += $A }, $body => sort($values), } ```	2024-08-08 17:20:39 +00:00
Dominik Fladung	ffa0c838d8	Allow ConfluenceLoader authorization via Personal Access Tokens (#25096 ) - community: Allow authorization to Confluence with bearer token - Description: Allow authorization to Confluence with [Personal Access Token](https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html) by checking for the keys `['client_id', token: ['access_token', 'token_type']]` - Issue: Currently the following error occurs when using a personal access token for authorization. ```python loader = ConfluenceLoader( url=os.getenv('CONFLUENCE_URL'), oauth2={ 'token': {"access_token": os.getenv("CONFLUENCE_ACCESS_TOKEN"), "token_type": "bearer"}, 'client_id': 'client_id', }, page_ids=['12345678'], ) ``` ``` ValueError: Error(s) while validating input: ["You have either omitted require keys or added extra keys to the oauth2 dictionary. key values should be `['access_token', 'access_token_secret', 'consumer_key', 'key_cert']`"] ``` With this PR the loader runs as expected. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-08-06 13:42:47 +00:00
Jim Baldwin	6890daa90c	community: make AthenaLoader profile_name optional and fix type hint (#24958 ) - Description: This PR makes the AthenaLoader profile_name optional and fixes the type hint which says the type is `str` but it should be `str` or `None` as None is handled in the loader init. This is a minor problem but it just confused me when I was using the Athena Loader to why we had to use a Profile, as I want that for local but not production. - Issue: #24957 - Dependencies: None.	2024-08-05 14:28:58 +00:00
Bagatur	e81ddb32a6	docs: fix kwargs docstring (#25010 ) Fix: ![Screenshot 2024-08-02 at 5 33 37 PM](https://github.com/user-attachments/assets/7c56cdeb-ee81-454c-b3eb-86aa8a9bdc8d)	2024-08-02 19:54:54 -07:00
Bagatur	8e2316b8c2	community[patch]: Release 0.2.11 (#24989 )	2024-08-02 20:08:44 +00:00
BottlePumpkin	bfc59c1d26	community: Fix KeyError in NotionDB loader when 'name' is missing (#24224 ) Description: This PR fixes a KeyError in NotionDBLoader when the "name" key is missing in the "people" property. Issue: Fixes #24223 Dependencies: None --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-08-01 13:55:40 +00:00
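The kind of failure fixed in the commit above can be sketched as follows. The property shape and the helper `people_names` are assumptions for illustration only; the sketch shows skipping entries that lack a `name` key rather than indexing them directly.

```python
from typing import Any, Dict, List


def people_names(prop_value: Dict[str, Any]) -> List[str]:
    """Return the names in a "people" property, ignoring entries without "name".

    Hypothetical helper: indexing p["name"] unconditionally would raise
    KeyError for an entry that carries only an id.
    """
    return [p["name"] for p in prop_value.get("people", []) if "name" in p]


print(people_names({"people": [{"id": "a", "name": "Ann"}, {"id": "b"}]}))
```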
Eugene Yurtsev	d24b82357f	community[patch]: Add missing annotations (#24890 ) This PR adds annotations in the community package. Annotations are only strictly needed in subclasses of BaseModel for pydantic 2 compatibility. This PR adds some unnecessary annotations, but they're not bad to have regardless for documentation pages.	2024-07-31 18:13:44 +00:00
Rajendra Kadam	a6add89bd4	community[minor]: [PebbloSafeLoader] Implement content-size-based batching (#24871 ) - Title: [PebbloSafeLoader] Implement content-size-based batching in the classification flow(loader/doc API) - Description: - Implemented content-size-based batching in the loader/doc API, set to 100KB with no external configuration option, intentionally hard-coded to prevent timeouts. - Remove unused field(pb_id) from doc_metadata - Issue: NA - Dependencies: NA - Add tests and docs: Updated	2024-07-31 09:10:28 -04:00
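Content-size-based batching, as described in the commit above, can be sketched like this. The function `batch_by_size`, the use of UTF-8 byte length as the size measure, and the greedy grouping strategy are assumptions; only the 100KB limit comes from the commit message.

```python
from typing import Iterable, List

MAX_BATCH_BYTES = 100 * 1024  # 100KB, the hard-coded limit named in the commit


def batch_by_size(texts: Iterable[str], max_bytes: int = MAX_BATCH_BYTES) -> List[List[str]]:
    """Greedily group texts so each batch stays within max_bytes.

    A single text larger than max_bytes is placed in a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for text in texts:
        size = len(text.encode("utf-8"))
        # Flush the running batch before it would exceed the limit.
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(text)
        current_size += size
    if current:
        batches.append(current)
    return batches


docs = ["a" * 60_000, "b" * 60_000, "c" * 10_000]
print([len(batch) for batch in batch_by_size(docs)])
```

Bounding each request by payload size rather than document count keeps any single call small, which fits the stated goal of preventing timeouts.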
AmosDinh	c113682328	community:Add support for specifying document_loaders.firecrawl api url. (#24747 ) Add support for specifying document_loaders.firecrawl api url. This is mainly to support the [self-hosting](https://github.com/mendableai/firecrawl/blob/main/SELF_HOST.md) option firecrawl provides. Eg. now I can specify localhost:.... The corresponding firecrawl class already provides functionality to pass the argument. See here: `4c9d62f6d3/apps/python-sdk/firecrawl/firecrawl.py (L29)` --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-07-28 14:30:36 -04:00
Oleg Kulyk	4b1b7959a2	community[minor]: Add ScrapingAnt Loader Community Integration (#24514 ) Added [ScrapingAnt](https://scrapingant.com/) Web Loader integration. ScrapingAnt is a web scraping API that allows extracting web page data into accessible and well-formatted markdown. Description: Added ScrapingAnt web loader for retrieving web page data as markdown Dependencies: scrapingant-client Twitter: @WeRunTheWorld3 --------- Co-authored-by: Oleg Kulyk <oleg@scrapingant.com>	2024-07-24 21:11:43 -04:00
John	d59c656ea5	unstructured, community, initialize langchain-unstructured package (#22779 ) #### Update (2): A single `UnstructuredLoader` is added to handle both local and api partitioning. This loader also handles single or multiple documents. #### Changes in `community`: Changes here do not affect users. In the initial process of using the SDK for the API Loaders, the Loaders in community were refactored. Other changes include: The `UnstructuredBaseLoader` has a new check to see if both `mode="paged"` and `chunking_strategy="by_page"`. It also now has `Element.element_id` added to the `Document.metadata`. `UnstructuredAPIFileLoader` and `UnstructuredAPIFileIOLoader`. As such, now both directly inherit from `UnstructuredBaseLoader` and initialize their `file_path`/`file` attributes respectively and implement their own `_post_process_elements` methods. -------- #### Update: New SDK Loaders in a [partner package](https://python.langchain.com/v0.1/docs/contributing/integrations/#partner-package-in-langchain-repo) are introduced to prevent breaking changes for users (see discussion below). ##### TODO: - [x] Test docstring examples -------- - Description: UnstructuredAPIFileIOLoader and UnstructuredAPIFileLoader calls to the unstructured api are now made using the unstructured-client sdk. - New Dependencies: unstructured-client - [x] Add tests and docs: If you're adding a new integration, please include - [x] a test for the integration, preferably unit tests that do not rely on network access, - [x] update the description in `docs/docs/integrations/providers/unstructured.mdx` - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified.
TODO: - [x] Update https://python.langchain.com/v0.1/docs/integrations/document_loaders/unstructured_file/#unstructured-api - `langchain/docs/docs/integrations/document_loaders/unstructured_file.ipynb` - The description here needs to indicate that users should install `unstructured-client` instead of `unstructured`. Read over closely to look for any other changes that need to be made. - [x] Update the `lazy_load` method in `UnstructuredBaseLoader` to handle json responses from the API instead of just lists of elements. - This method may need to be overwritten by the API loaders instead of changing it in the `UnstructuredBaseLoader`. - [x] Update the documentation links in the class docstrings (the Unstructured documents have moved) - [x] Update Document.metadata to include `element_id` (see thread [here](https://unstructuredw-kbe4326.slack.com/archives/C044N0YV08G/p1718187499818419)) --------- Signed-off-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com> Co-authored-by: ChengZi <chen.zhang@zilliz.com>	2024-07-24 23:21:20 +00:00
Morteza Hosseini	9e06991aae	community[patch]: Update URL to the 2markdown API (#24546 ) Update the URL to Markdown endpoint. API information is available here: https://2markdown.com/docs#url2md	2024-07-23 14:27:55 +00:00
Alexander Golodkov	2a70a07aad	community[minor]: added new document loaders based on dedoc library (#24303 ) ### Description This pull request added new document loaders to load documents of various formats using [Dedoc](https://github.com/ispras/dedoc): - `DedocFileLoader` (determine file types automatically and parse) - `DedocPDFLoader` (for `PDF` and images parsing) - `DedocAPIFileLoader` (determine file types automatically and parse using Dedoc API without library installation) [Dedoc](https://dedoc.readthedocs.io) is an open-source library/service that extracts texts, tables, attached files and document structure (e.g., titles, list items, etc.) from files of various formats. The library is actively developed and maintained by a group of developers. `Dedoc` supports `DOCX`, `XLSX`, `PPTX`, `EML`, `HTML`, `PDF`, images and more. Full list of supported formats can be found [here](https://dedoc.readthedocs.io/en/latest/#id1). For `PDF` documents, `Dedoc` allows to determine textual layer correctness and split the document into paragraphs. ### Issue This pull request extends variety of document loaders supported by `langchain_community` allowing users to choose the most suitable option for raw documents parsing. ### Dependencies The PR added a new (optional) dependency `dedoc>=2.2.5` ([library documentation](https://dedoc.readthedocs.io)) to the `extended_testing_deps.txt` ### Twitter handle None ### Add tests and docs 1. Test for the integration: `libs/community/tests/integration_tests/document_loaders/test_dedoc.py` 2. Example notebook: `docs/docs/integrations/document_loaders/dedoc.ipynb` 3. Information about the library: `docs/docs/integrations/providers/dedoc.mdx` ### Lint and test Done locally: - `make format` - `make lint` - `make integration_tests` - `make docs_build` (from the project root) --------- Co-authored-by: Nasty <bogatenkova.anastasiya@mail.ru>	2024-07-23 02:04:53 +00:00
Naka Masato	884f76e05a	fix: load google credentials properly in GoogleDriveLoader (#12871 ) - Description: - Fix #12870: set scope in `default` func (ref: https://google-auth.readthedocs.io/en/master/reference/google.auth.html) - Moved the code to load default credentials to the bottom for clarity of the logic - Add docstring and comment for each credential loading logic - Issue: https://github.com/langchain-ai/langchain/issues/12870 - Dependencies: no dependencies change - Twitter handle: @gymnstcs --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-07-22 17:43:33 +00:00
clement.l	d98b830e4b	community: add flag to toggle progress bar (#24463 ) - Description: Add a flag to determine whether to show progress bar - Issue: n/a - Dependencies: n/a - Twitter handle: n/a --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-07-20 13:18:02 +00:00
Asi Greenholts	372c27f2e5	community[minor]: [GoogleApiYoutubeLoader] Replace API used in _get_document_for_channel from search to playlistItem (#24034 ) - Description: Search has a limit of 500 results, playlistItems doesn't. Added a class in except clause to catch another common error. - Issue: None - Dependencies: None - Twitter handle: @TupleType --------- Co-authored-by: asi-cider <88270351+asi-cider@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-07-19 14:04:34 -04:00
Dristy Srivastava	020cc1cf3e	Community[minor]: Added checksum in while send data to pebblo-cloud (#23968 ) - Description: - Updated checksum in doc metadata - Sending checksum and removing actual content, while sending data to `pebblo-cloud` if `classifier-location` is `pebblo-cloud` in `/loader/doc` API - Adding `pb_id` i.e. pebblo id to doc metadata - Refactoring as needed. - Sending `content-checksum` and removing actual content, while sending data to `pebblo-cloud` if `classifier-location` is `pebblo-cloud` in `prompt` API - Issue: NA - Dependencies: NA - Tests: Updated - Docs NA --------- Co-authored-by: dristy.cd <dristy@clouddefense.io>	2024-07-19 13:52:54 -04:00
Brice Fotzo	034a8c7c1b	community: support advanced text extraction options for pdf documents (#20265 ) Description: - Updated constructors in PyPDFParser and PyPDFLoader to handle `extraction_mode` and additional kwargs, aligning with the capabilities of `PageObject.extract_text()` from pypdf. - Added `test_pypdf_loader_with_layout` along with a corresponding example text file to validate layout extraction from PDFs. Issue: fixes #19735 Dependencies: This change requires updating the pypdf dependency from version 3.4.0 to at least 4.0.0. Additional changes include the addition of a new test test_pypdf_loader_with_layout and an example text file to ensure the functionality of layout extraction from PDFs aligns with the new capabilities. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-07-17 20:47:09 +00:00
Eugene Yurtsev	c4e149d4f1	community[patch]: Add linter to catch @root_validator (#24070 ) - Add linter to prevent further usage of vanilla root validator - Update remaining root validators	2024-07-10 14:51:03 +00:00
Rajendra Kadam	ee8aa54f53	community[patch]: Fix source path mismatch in PebbloSafeLoader (#23857 ) Description: Fix for source path mismatch in PebbloSafeLoader. The fix involves storing the full path in the doc metadata in VectorDB Issue: NA, caught in internal testing Dependencies: NA Add tests: Updated tests	2024-07-05 15:24:17 -04:00
Klaudia Lemiec	a2082bc1f8	docs: Arxiv docs update (#23871 ) - [X] PR title - [X] PR message: *Delete this entire checklist* and replace with - Description: Update of docstrings and docpages - Issue: [22866](https://github.com/langchain-ai/langchain/issues/22866) - [X] Add tests and docs - [X] Lint and test	2024-07-05 11:43:51 -04:00
Bagatur	a0c2281540	infra: update mypy 1.10, ruff 0.5 (#23721 ) ```python """python scripts/update_mypy_ruff.py""" import glob import tomllib from pathlib import Path import toml import subprocess import re ROOT_DIR = Path(__file__).parents[1] def main(): for path in glob.glob(str(ROOT_DIR / "libs/*/pyproject.toml"), recursive=True): print(path) with open(path, "rb") as f: pyproject = tomllib.load(f) try: pyproject["tool"]["poetry"]["group"]["typing"]["dependencies"]["mypy"] = ( "^1.10" ) pyproject["tool"]["poetry"]["group"]["lint"]["dependencies"]["ruff"] = ( "^0.5" ) except KeyError: continue with open(path, "w") as f: toml.dump(pyproject, f) cwd = "/".join(path.split("/")[:-1]) completed = subprocess.run( "poetry lock --no-update; poetry install --with typing; poetry run mypy . --no-color", cwd=cwd, shell=True, capture_output=True, text=True, ) logs = completed.stdout.split("\n") to_ignore = {} for l in logs: if re.match("^(.*)\:(\d+)\: error:.*\[(.*)\]", l): path, line_no, error_type = re.match( "^(.*)\:(\d+)\: error:.*\[(.*)\]", l ).groups() if (path, line_no) in to_ignore: to_ignore[(path, line_no)].append(error_type) else: to_ignore[(path, line_no)] = [error_type] print(len(to_ignore)) for (error_path, line_no), error_types in to_ignore.items(): all_errors = ", ".join(error_types) full_path = f"{cwd}/{error_path}" try: with open(full_path, "r") as f: file_lines = f.readlines() except FileNotFoundError: continue file_lines[int(line_no) - 1] = ( file_lines[int(line_no) - 1][:-1] + f" # type: ignore[{all_errors}]\n" ) with open(full_path, "w") as f: f.write("".join(file_lines)) subprocess.run( "poetry run ruff format .; poetry run ruff --select I --fix .", cwd=cwd, shell=True, capture_output=True, text=True, ) if __name__ == "__main__": main() ```	2024-07-03 10:33:27 -07:00
Eugene Yurtsev	f24e38876a	community[patch]: Update root_validators to use explicit pre=True or pre=False (#23736 )	2024-07-01 17:13:23 -04:00
Alireza Kashani	c39521b70d	Update grobid.py (#23399 ) fixed potential `IndexError: list index out of range` in case there is no title	2024-06-26 09:11:02 -04:00
Rahul Triptahi	9ef93ecd7c	community[minor]: Added classification_location parameter in PebbloSafeLoader. (#22565 ) Description: Add classifier_location feature flag. This flag enables Pebblo to decide the classifier location, local or pebblo-cloud. Unit Tests: N/A Documentation: N/A --------- Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com> Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>	2024-06-24 17:30:38 -04:00
RUO	2b87e330b0	community: fix issue with nested field extraction in MongodbLoader (#22801 )
Description: This PR addresses an issue in the `MongodbLoader` where nested fields were not being correctly extracted. The loader now correctly handles nested fields specified in the `field_names` parameter.
Issue: Fixes an issue where attempting to extract nested fields from MongoDB documents resulted in `KeyError`.
Dependencies: No new dependencies are required for this change.
Twitter handle: (Optional, your Twitter handle if you'd like a mention when the PR is announced)
### Changes
1. Field Name Parsing:
   - Added logic to parse nested field names and safely extract their values from the MongoDB documents.
2. Projection Construction:
   - Updated the projection dictionary to include nested fields correctly.
3. Field Extraction:
   - Updated the `aload` method to handle nested field extraction using a recursive approach to traverse the nested dictionaries.
### Example Usage
Updated usage example to demonstrate how to specify nested fields in the `field_names` parameter:
```python
loader = MongodbLoader(
    connection_string=MONGO_URI,
    db_name=MONGO_DB,
    collection_name=MONGO_COLLECTION,
    filter_criteria={"data.job.company.industry_name": "IT", "data.job.detail": {"$exists": True}},
    field_names=[
        "data.job.detail.id",
        "data.job.detail.position",
        "data.job.detail.intro",
        "data.job.detail.main_tasks",
        "data.job.detail.requirements",
        "data.job.detail.preferred_points",
        "data.job.detail.benefits",
    ],
)
docs = loader.load()
print(len(docs))
for doc in docs:
    print(doc.page_content)
```
### Testing
Tested with a MongoDB collection containing nested documents to ensure that the nested fields are correctly extracted and concatenated into a single page_content string.
### Note
This change ensures backward compatibility for non-nested fields and improves functionality for nested field extraction.
### Output Sample
```python
print(docs[:3])
```
```shell
# output sample:
[
    Document(
        # Here in this example, page_content is the combined text from the fields below
        # "position", "intro", "main_tasks", "requirements", "preferred_points", "benefits"
        page_content='all combined contents from the requested fields in the document',
        metadata={'database': 'Your Database name', 'collection': 'Your Collection name'}
    ),
    ...
]
```
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
	2024-06-24 19:29:11 +00:00
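The nested-field handling described in commit 2b87e330b0 can be illustrated with a small standalone sketch. This is not the loader's actual code: `get_nested` and `build_projection` are hypothetical helper names, the traversal here is iterative rather than the recursive approach the commit mentions, and joining values with a space is an assumption made only for the demonstration.

```python
from typing import Any, Dict, List


def get_nested(document: Dict[str, Any], dotted_name: str, default: Any = None) -> Any:
    """Follow a dotted field name through nested dicts.

    Returns `default` instead of raising KeyError when any level is missing.
    """
    current: Any = document
    for key in dotted_name.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def build_projection(field_names: List[str]) -> Dict[str, int]:
    """Build a MongoDB-style projection; dotted names select nested fields."""
    return {name: 1 for name in field_names}


# Hypothetical document shaped like the commit's example field names.
doc = {"data": {"job": {"detail": {"id": 7, "position": "Engineer"}}}}
fields = [
    "data.job.detail.id",
    "data.job.detail.position",
    "data.job.detail.intro",  # absent in this document
]

projection = build_projection(fields)
values = [str(get_nested(doc, name, "")) for name in fields]
page_content = " ".join(values)
```

Returning a default for a missing level is what avoids the `KeyError` mentioned in the commit message; whether the real loader substitutes an empty string is not stated there.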
Rahul Triptahi	0cd3f93361	Enhance metadata of sharepointLoader. (#22248 ) Description: 2 feature flags added to SharePointLoader in this PR: 1. load_auth: if set to True, adds authorised identities to metadata 2. load_extended_metadata, adds source, owner and full_path to metadata Unit tests:N/A Documentation: To be done. --------- Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com> Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>	2024-06-21 17:03:38 -07:00
Rahul Triptahi	4bb3d5c488	[community][quick-fix]: changed from blob.path to blob.path.name in 0365BaseLoader. (#22287 ) Description: file_metadata_ was not getting propagated to returned documents. Changed the lookup key to the name of the blob's path. Changed blob.path key to blob.path.name for metadata_dict key lookup. Documentation: N/A Unit tests: N/A Co-authored-by: ccurme <chester.curme@gmail.com>	2024-06-21 15:51:03 -04:00
Zheng Robert Jia	a349fce880	docs[minor],community[patch]: Minor tutorial docs improvement, minor import error quick fix. (#22725 ) minor changes to module import error handling and minor issues in tutorial documents. --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>	2024-06-20 15:36:49 -04:00
Michał Krassowski	710197e18c	community[patch]: restore compatibility with SQLAlchemy 1.x (#22546 ) - Description: Restores compatibility with SQLAlchemy 1.4.x that was broken since #18992 and adds a test run for this version on CI (only for Python 3.11) - Issue: fixes #19681 - Dependencies: None - Twitter handle: `@krassowski_m` --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-06-19 17:58:57 +00:00
鹿鹿鹿鲨	6b46b5e9ce	community: add request_kwargs and expect TimeError AsyncHtmlLoader (#23068 ) - Description: add `request_kwargs` and expect `TimeError` in `_fetch` function for AsyncHtmlLoader. This allows you to fill in the kwargs parameter when using the `load()` method of the `AsyncHtmlLoader` class. Co-authored-by: Yucolu <yucolu@tencent.com>	2024-06-18 20:02:46 -07:00
Finlay Macklon	616d06d7fe	community: glob multiple patterns when using DirectoryLoader (#22852 ) - Description: Updated community.langchain_community.document_loaders.directory.py to enable the use of multiple glob patterns in the `DirectoryLoader` class. Now, the glob parameter is of type `list[str] | str` and still defaults to the same value as before. I updated the docstring of the class to reflect this, and added a unit test to community.tests.unit_tests.document_loaders.test_directory.py named `test_directory_loader_glob_multiple`. This test also shows an example of how to use the new functionality. - ~~Issue:~~Discussion Thread: https://github.com/langchain-ai/langchain/discussions/18559 - Dependencies: None - Twitter handle: N/a - [x] Add tests and docs - Added test (described above) - Updated class docstring - [x] Lint and test --------- Co-authored-by: isaac hershenson <ihershenson@hmc.edu> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>	2024-06-18 09:24:50 -07:00
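One way a `glob` parameter accepting either a single pattern or a list of patterns could be handled is sketched below using only the standard library. `iter_matches` is a hypothetical helper, not part of `DirectoryLoader`, and skipping duplicate matches is an assumption rather than something the commit message states.

```python
import tempfile
from pathlib import Path
from typing import Iterator, List, Union


def iter_matches(root: Path, glob: Union[List[str], str]) -> Iterator[Path]:
    """Yield files under `root` matching one pattern or any of several patterns."""
    # Normalize the argument so the rest of the function only deals with a list.
    patterns = [glob] if isinstance(glob, str) else list(glob)
    seen = set()
    for pattern in patterns:
        for path in sorted(root.glob(pattern)):
            # A file matched by two patterns is yielded only once.
            if path.is_file() and path not in seen:
                seen.add(path)
                yield path


# Small demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("a.md", "b.txt", "c.py"):
        (root / name).write_text("x")
    matched = [p.name for p in iter_matches(root, ["*.md", "*.txt"])]
    single = [p.name for p in iter_matches(root, "*.py")]
```

Normalizing a lone string into a one-element list keeps the single-pattern call working unchanged, which matches the backward-compatible default described in the commit.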
Lucas Tucker	e25a5966b5	docs: Standardize DocumentLoader docstrings (#22932 ) Standardizing DocumentLoader docstrings (of which there are many) This PR addresses issue #22866 and adds docstrings according to the issue's specified format (in the appendix) for files csv_loader.py and json_loader.py in langchain_community.document_loaders. In particular, the following sections have been added to both CSVLoader and JSONLoader: Setup, Instantiate, Load, Async load, and Lazy load. It may be worth adding a 'Metadata' section to the JSONLoader docstring to clarify how we want to extract the JSON metadata (using the `metadata_func` argument). The files I used to walkthrough the various sections were `example_2.json` from [HERE](https://support.oneskyapp.com/hc/en-us/articles/208047697-JSON-sample-files) and `hw_200.csv` from [HERE](https://people.sc.fsu.edu/~jburkardt/data/csv/csv.html). --------- Co-authored-by: lucast2021 <lucast2021@headroyce.org> Co-authored-by: isaac hershenson <ihershenson@hmc.edu>	2024-06-18 03:26:36 +00:00
Baskar Gopinath	c4f2bc9540	docs: Fix wrongly referenced class name in confluence.py (#22879 ) Fixes #22542 Changed ConfluenceReader to ConfluenceLoader	2024-06-14 14:00:48 -07:00
Eugene Yurtsev	9a877c7adb	community[patch]: SitemapLoader restrict depth of parsing sitemap (CVE-2024-2965) (#22903 ) This PR restricts the depth to which the sitemap can be parsed. Fix for: CVE-2024-2965	2024-06-14 13:04:40 -04:00
Isaac Francisco	75e966a2fa	docs, cli[patch]: document loaders doc template (#22862 ) From: https://github.com/langchain-ai/langchain/pull/22290 --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-06-13 19:28:57 -07:00
Isaac Francisco	034257e9bf	docs: improved recursive url loader docs (#22648 ) Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-06-13 11:09:35 -07:00
Mr. Lance E Sloan «UMich»	08c466c603	community[patch]: bugfix for `YoutubeLoader`'s `LINES` format (#22815 ) - Description: A change I submitted recently introduced a bug in `YoutubeLoader`'s `LINES` output format. In those conditions, curly braces ("`{}`") creates a set, not a dictionary. This bugfix explicitly specifies that a dictionary is created. - Issue: N/A - Dependencies: N/A - Twitter: lsloan_umich - Mastodon: [lsloan@mastodon.social](https://mastodon.social/@lsloan)	2024-06-12 12:29:34 -04:00
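The set-versus-dictionary pitfall behind commit 08c466c603 comes from Python's literal syntax: braces holding bare values build a `set`, while braces holding `key: value` pairs (or an explicit `dict(...)` call) build a `dict`. The snippet below is a generic illustration with made-up variable names, not the `YoutubeLoader` code itself.

```python
start = 12.5
text = "hello"

# Braces around bare values create a set: unordered and without keys.
as_set = {start, text}

# Braces around key: value pairs create a dictionary.
as_dict = {"start": start, "text": text}

# Calling dict() makes the intent explicit and cannot yield a set.
explicit = dict(start=start, text=text)

# Empty braces are the exception: they create an empty dict, not a set.
empty = {}
```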
Mr. Lance E Sloan «UMich»	84dc2dd059	community[patch]: Load YouTube transcripts (captions) as fixed-duration chunks with start times (#21710 ) - Description: Add a new format, `CHUNKS`, to `langchain_community.document_loaders.youtube.YoutubeLoader` which creates multiple `Document` objects from YouTube video transcripts (captions), each of a fixed duration. The metadata of each chunk `Document` includes the start time of each one and a URL to that time in the video on the YouTube website. I had implemented this for UMich (@umich-its-ai) in a local module, but it makes sense to contribute this to LangChain community for all to benefit and to simplify maintenance. - Issue: N/A - Dependencies: N/A - Twitter: lsloan_umich - Mastodon: [lsloan@mastodon.social](https://mastodon.social/@lsloan) With regards to tests and documentation, most existing features of the `YoutubeLoader` class are not tested. Only the `YoutubeLoader.extract_video_id()` static method had a test. However, while I was waiting for this PR to be reviewed and merged, I had time to add a test for the chunking feature I've proposed in this PR. I have added an example of using chunking to the `docs/docs/integrations/document_loaders/youtube_transcript.ipynb` notebook. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-06-11 17:44:36 +00:00
am-kinetica	ad101adec8	community[patch]: Kinetica Integrations handled error in querying; quotes in table names; updated gpudb API (#22724 ) - [ ] Miscellaneous updates and fixes: - Description: Handled error in querying; quotes in table names; updated gpudb API - Issue: Threw an error with an error message difficult to understand if a query failed or returned no records - Dependencies: Updated GPUDB API version to `7.2.0.9` @baskaryan @hwchase17	2024-06-11 10:01:26 -04:00
Mohammad Mohtashim	c3cce98d86	community[patch]: Small Fix in OutlookMessageLoader (Close the Message once Open) (#22744 ) - Description: A very small fix where we close the message when it opened - Issue: #22729	2024-06-10 13:08:39 -07:00
Max Mulatz	058a64c563	Community[minor]: Add language parser for Elixir (#22742 ) Hi 👋 First off, thanks a ton for your work on this 💚 Really appreciate what you're providing here for the community. ## Description This PR adds a basic language parser for the [Elixir](https://elixir-lang.org/) programming language. The parser code is based upon the approach outlined in https://github.com/langchain-ai/langchain/pull/13318: it's using `tree-sitter` under the hood and aligns with all the other `tree-sitter` based parsers added in that PR. The `CHUNK_QUERY` I'm using here is probably not the most sophisticated one, but it worked for my application. It's a starting point to provide "core" parsing support for Elixir in LangChain. It enables people to use the language parser in real-world applications, which may then lead to further tweaking of the queries. I consider this PR just the groundwork. - Dependencies: requires `tree-sitter` and `tree-sitter-languages` from the extended dependencies - Twitter handle:`@bitcrowd` ## Checklist - [x] PR title: "package: description" - [x] Add tests and docs - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified.	2024-06-10 15:56:57 +00:00
Isaac Francisco	ba3e219d83	community[patch]: recursive url loader fix and unit tests (#22521 ) Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-06-05 17:56:20 -07:00

329 Commits