langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-07-02 03:15:11 +00:00

Author	SHA1	Message	Date
Erick Friis	a119cae5bd	partners/mistralai: release 0.2.4 (#28803 )	2024-12-18 22:11:48 +00:00
Erick Friis	514d78516b	partners/ollama: release 0.2.2 (#28802 )	2024-12-18 22:11:08 +00:00
Bagatur	68940dd0d6	openai[patch]: Release 0.2.13 (#28800 )	2024-12-18 22:08:47 +00:00
Erick Friis	4dc28b43ac	community: release 0.3.13 (#28798 )	2024-12-18 21:58:46 +00:00
Bagatur	557f63c2e6	core[patch]: Release 0.3.27 (#28799 )	2024-12-18 21:58:03 +00:00
Bagatur	4a531437bb	core[patch], openai[patch]: Handle OpenAI developer msg (#28794 ) - Convert developer openai messages to SystemMessage - store additional_kwargs={"__openai_role__": "developer"} so that the correct role can be reconstructed if needed - update ChatOpenAI to read in openai_role --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-18 21:54:07 +00:00
Erick Friis	079f1d93ab	langchain: release 0.3.13 (#28797 )	2024-12-18 12:32:00 -08:00
Yuxin Chen	3256b5d6ae	text-splitters: fix state persistence issue in ExperimentalMarkdownSyntaxTextSplitter (#28373 ) - Description: This PR resolves an issue with the `ExperimentalMarkdownSyntaxTextSplitter` class, which retains the internal state across multiple calls to the `split_text` method. This behaviour caused an unintended accumulation of chunks in `self` variables, leading to incorrect outputs when processing multiple Markdown files sequentially. - Modified `libs\text-splitters\langchain_text_splitters\markdown.py` to reset the relevant internal attributes at the start of each `split_text` invocation. This ensures each call processes the input independently. - Added unit tests in `libs\text-splitters\tests\unit_tests\test_text_splitters.py` to verify the fix and ensure the state does not persist across calls. - Issue: Fixes [#26440](https://github.com/langchain-ai/langchain/issues/26440). - Dependencies: No additional dependencies are introduced with this change. - [x] Unit tests were added to verify the changes. - [x] Updated documentation where necessary. - [x] Ran `make format`, `make lint`, and `make test` to ensure compliance with project standards. --------- Co-authored-by: Angel Chen <angelchen396@gmail.com> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-18 20:27:59 +00:00
Mohammad Mohtashim	7c8f977695	Community: Fix `with_structured_output` for `ChatSambaNovaCloud` (#28796 ) - Description: The `kwargs` check against `None` was wrong, which kept the rest of the code in `with_structured_output` from being executed. This PR fixes that check. - Issue: #28776	2024-12-18 14:35:06 -05:00
V.Prasanna kumar	684b146b18	Fixed adding float values into DynamoDB (#26562 ) - PR title: Add float Message into Dynamo DB - community - Description: Pushing float values into DynamoDB raised an error; this is solved by converting them to `str`. - Issue: Float values are not getting pushed. - Twitter handle: VpkPrasanna I have added a utility function for the str conversion; let me know where to place it, happy to do a commit. This PR comes from the discussion in #26543. @hwchase17 @baskaryan @efriis --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-18 13:45:00 -05:00
William FH	50ea1c3ea3	[Core] respect tracing project name cvar (#28792 )	2024-12-18 10:02:02 -08:00
Martin Triska	e6b41d081d	community: DocumentLoaderAsParser wrapper (#27749 ) ## Description This pull request introduces the `DocumentLoaderAsParser` class, which acts as an adapter to transform document loaders into parsers within the LangChain framework. The class enables document loaders that accept a `file_path` parameter to be utilized as blob parsers. This is particularly useful for integrating various document loading capabilities seamlessly into the LangChain ecosystem. When merged together with PR https://github.com/langchain-ai/langchain/pull/27716, it opens up options for `SharePointLoader` / `OneDriveLoader` to process any file type that has a document loader. ### Features - Flexible Parsing: The `DocumentLoaderAsParser` class can adapt any document loader that meets the criteria of accepting a `file_path` argument, allowing for lazy parsing of documents. - Compatibility: The class has been designed to work with various document loaders, making it versatile for different use cases. ### Usage Example To use the `DocumentLoaderAsParser`, you would initialize it with a suitable document loader class and any required parameters. Here’s an example of how to do this with the `UnstructuredExcelLoader`: ```python from langchain_community.document_loaders.blob_loaders import Blob from langchain_community.document_loaders.parsers.documentloader_adapter import DocumentLoaderAsParser from langchain_community.document_loaders.excel import UnstructuredExcelLoader # Initialize the parser adapter with UnstructuredExcelLoader xlsx_parser = DocumentLoaderAsParser(UnstructuredExcelLoader, mode="paged") # Use parser, for ex. pass it to MimeTypeBasedParser MimeTypeBasedParser( handlers={ "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": xlsx_parser } ) ``` - Dependencies: None - Twitter handle: @martintriska1
--------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-18 12:47:08 -05:00
Erick Friis	9b024d00c9	text-splitters: release 0.3.4 (#28795 )	2024-12-18 09:44:36 -08:00
Erick Friis	5cf965004c	core: release 0.3.26 (#28793 )	2024-12-18 17:28:42 +00:00
Mohammad Mohtashim	d49df4871d	[Community]: Image Extraction Fixed for `PDFPlumberParser` (#28491 ) - Description: One-bit images were raising an error in `PDFPlumberParser`; this PR fixes that. - Issue: #28480 --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-18 11:45:48 -05:00
binhnd102	f723a8456e	Fixes: community: fix LanceDB return no metadata (#27024 ) - [x] Fix the case where LanceDB returns a table without a metadata column - Description: Check the table schema; if it has no metadata column, initialize the Document with `metadata` set to an empty dict - Issue: https://github.com/langchain-ai/langchain/issues/27005 - [x] Add tests and docs --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-12-18 15:21:28 +00:00
ANSARI MD AAQIB AHMED	91d28ef453	Add langchain-yt-dlp Document Loader Documentation (#28775 ) ## Overview This PR adds documentation for the `langchain-yt-dlp` package, a YouTube document loader that uses `yt-dlp` for YouTube video metadata extraction. ## Changes - Added documentation notebook for YoutubeLoader - Updated packages.yml to include langchain-yt-dlp ## Motivation The existing LangChain YoutubeLoader was unable to fetch YouTube metadata due to changes in YouTube's structure. This package resolves those issues by leveraging the `yt-dlp` library. ## Features - Reliable YouTube metadata extraction ## Related - Package Repository: https://github.com/aqib0770/langchain-yt-dlp - PyPI Package: https://pypi.org/project/langchain-yt-dlp/ --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-18 10:16:50 -05:00
GITHUBear	33b1fb95b8	partners: langchain-oceanbase Integration (#28782 ) Hi, langchain team! I'm a maintainer of [OceanBase](https://github.com/oceanbase/oceanbase). Following the integration guidance, I created a Python library named [langchain-oceanbase](https://github.com/oceanbase/langchain-oceanbase) to integrate `Oceanbase Vector Store` with `Langchain`, so I'd like to add the required docs. I would appreciate your feedback. Thank you! --------- Signed-off-by: shanhaikang.shk <shanhaikang.shk@oceanbase.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-18 14:51:49 +00:00
Rave Harpaz	986b752fc8	Add OCI Generative AI new model and structured output support (#28754 ) - [X] PR title: community: Add new model and structured output support - [X] PR message: - Description: add support for meta llama 3.2 image handling, and JSON mode for structured output - Issue: NA - Dependencies: NA - Twitter handle: NA - [x] Add tests and docs: 1. we have updated our unit tests, 2. no changes required for documentation. - [x] Lint and test: `make format`, `make lint` and `make test` ran successfully --------- Co-authored-by: Arthur Cheng <arthur.cheng@oracle.com> Co-authored-by: ccurme <chester.curme@gmail.com>	2024-12-18 09:50:25 -05:00
David Pryce-Compson	ef24220d3f	community: adding haiku 3.5 and opus callbacks (#28783 ) Description: Adding new AWS Bedrock models and their respective costs to match https://aws.amazon.com/bedrock/pricing/ for the Bedrock callback Issue: Missing models for those that wish to try them out Dependencies: Nothing added Twitter handle: @David_Pryce and / or @JamfSoftware	2024-12-18 09:45:10 -05:00
Yudai Kotani	05a44797ee	langchain_community: Add default None values to DocumentAttributeValue class properties (#28785 ) Description: This PR addresses an issue where the DocumentAttributeValue class properties did not have default values of None. By explicitly setting the Optional attributes (DateValue, LongValue, StringListValue, and StringValue) to default to None, this change ensures the class functions as expected when no value is provided for these attributes. Changes Made: Added default None values to the following properties of the DocumentAttributeValue class: DateValue LongValue StringListValue StringValue Removed the invalid argument extra="allow" from the BaseModel inheritance. Dependencies: None. Twitter handle (optional): @__korikori1021 Checklist - [x] Verified that KendraRetriever works as expected after the changes. Co-authored-by: y1u0d2a1i <y.kotani@raksul.com>	2024-12-18 09:43:04 -05:00
Satyam Kumar	90f7713399	refactor: improve docstring parsing logic for Google style (#28730 ) Description: Improved the `_parse_google_docstring` function in `langchain/core` to support parsing multi-paragraph descriptions before the `Args:` section while maintaining compliance with Google-style docstring guidelines. This change ensures better handling of docstrings with detailed function descriptions. Issue: Fixes #28628 Dependencies: None. Twitter handle: @isatyamks --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-18 09:35:19 -05:00
Dong Shin	0b1359801e	community: add trust_env at web_base_loader (#28514 ) - Description: I am working to address an issue similar to the one mentioned in https://github.com/langchain-ai/langchain/pull/19499. Specifically, the `WebBaseLoader` used in open-webui fails to load the proxy configuration. This PR aims to resolve that issue. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-17 21:18:16 -05:00
Erick Friis	be738aa7de	packages: enable vertex api build (#28773 )	2024-12-17 11:31:14 -08:00
Bagatur	ac278cbe8b	core[patch]: export InjectedToolCallId (#28772 )	2024-12-17 19:29:20 +00:00
Bagatur	e4d3ccf62f	json mode standard test (#25497 ) Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-17 18:47:34 +00:00
Frank Dai	e81433497b	community: support Confluence cookies (#28760 ) Description: Some Confluence instances don't support personal access tokens, in which case cookies are a convenient way to authenticate. This PR adds support for Confluence cookies. Twitter handle: soulmachine	2024-12-17 12:16:36 -05:00
ccurme	b745281eec	anthropic[patch]: increase timeouts for integration tests (#28767 ) Some tests consistently ran into the 10s limit in CI.	2024-12-17 15:47:17 +00:00
Vinit Kudva	a00258ec12	chroma: fix persistence if client_settings is passed in (#25199 ) …ent path given. --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-17 10:03:02 -05:00
Omri Eliyahu Levy	f8883a1321	partners/voyageai: enable setting output dimension (#28740 ) Voyage has introduced voyage-3-large and voyage-code-3, which feature different output dimensions by leveraging a technique called "Matryoshka Embeddings" (see blog - https://blog.voyageai.com/2024/12/04/voyage-code-3/). These two models are available in various sizes: [256, 512, 1024, 2048] (https://docs.voyageai.com/docs/embeddings#model-choices). This PR adds the option to set the required output dimension.	2024-12-17 10:02:00 -05:00
German Martin	3a1d05394d	community: Apache AGE wrapper. Ensure Node Uniqueness by ID. (#28759 ) Description: The Apache AGE graph integration incorrectly handled node merging, allowing duplicate nodes with different IDs but the same type and other properties. Unlike [Neo4j](`cdf6202156/libs/community/langchain_community/graphs/neo4j_graph.py (L47)`), [Memgraph](`cdf6202156/libs/community/langchain_community/graphs/memgraph_graph.py (L50)`), [Kuzu](`cdf6202156/libs/community/langchain_community/graphs/kuzu_graph.py (L253)`), and [Gremlin](`cdf6202156/libs/community/langchain_community/graphs/gremlin_graph.py (L165)`), it did not use the node ID as the primary identifier for merging. This inconsistency caused data integrity issues and unexpected behavior when users expected updates to specific nodes by ID. Solution: This PR modifies the `node_insert_query` to `MERGE` nodes based on label and ID only and updates properties with `SET`, aligning the behavior with other graph database integrations. The `_format_properties` method was also modified to handle id overrides. Impact: This fix ensures data integrity by preventing duplicate nodes, and provides a consistent behavior across graph database integrations.	2024-12-17 09:21:59 -05:00
gsa9989	cdf6202156	cosmosdbnosql: Added Cosmos DB NoSQL Semantic Cache Integration with tests and jupyter notebook (#24424 ) * Added Cosmos DB NoSQL Semantic Cache Integration with tests and jupyter notebook --------- Co-authored-by: Aayush Kataria <aayushkataria3011@gmail.com> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-16 21:57:05 -05:00
Brian Burgin	27a9056725	community: Fix ChatLiteLLMRouter runtime issues (#28163 ) Description: Fix ChatLiteLLMRouter ctor validation and model_name parameter Issue: #19356, #27455, #28077 Twitter handle: @bburgin_0	2024-12-16 18:17:39 -05:00
Mikhail Khludnev	00deacc67e	docs, external: introduce `langchain-localai` (#28751 ) Referring to https://github.com/mkhludnev/langchain-localai --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 22:22:37 +00:00
Erick Friis	d4b5e7ef22	community: recommend RedisVectorStore over Redis (#28749 )	2024-12-16 21:08:30 +00:00
Hiros	8f5e72de05	community: Correctly handle multi-element rich text (#25762 ) Description: - Add _concatenate_rich_text method to combine all elements in rich text arrays - Update load_page method to use _concatenate_rich_text for rich text properties - Ensure all text content is captured, including inline code and formatted text - Add unit tests to verify correct handling of multi-element rich text This fix prevents truncation of content after backticks or other formatting elements. Issue: When using the Notion DB Loader, the text for `richtext` and `title` was truncated after the first element, because the loader only read the first element. Dependencies: None. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 20:20:27 +00:00
Antonio Lanza	b2102b8cc4	text-splitters: Inconsistent results with `NLTKTextSplitter`'s `add_start_index=True` (#27782 ) This PR closes #27781 # Problem The current implementation of `NLTKTextSplitter` is using `sent_tokenize`. However, this `sent_tokenize` doesn't handle chars between 2 tokenized sentences... hence, this behavior throws errors when we are using `add_start_index=True`, as described in issue #27781. In particular: ```python from nltk.tokenize import sent_tokenize output1 = sent_tokenize("Innovation drives our success. Collaboration fosters creative solutions. Efficiency enhances data management.", language="english") print(output1) output2 = sent_tokenize("Innovation drives our success. Collaboration fosters creative solutions. Efficiency enhances data management.", language="english") print(output2) >>> ['Innovation drives our success.', 'Collaboration fosters creative solutions.', 'Efficiency enhances data management.'] >>> ['Innovation drives our success.', 'Collaboration fosters creative solutions.', 'Efficiency enhances data management.'] ``` # Solution With this new `use_span_tokenize` parameter, we can use NLTK to create sentences (with `span_tokenize`), but also add extra chars to be sure that we still can map the chunks to the original text. --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Erick Friis <erickfriis@gmail.com>	2024-12-16 19:53:15 +00:00
Tari Yekorogha	d262d41cc0	community: added FalkorDB vector store support i.e implementation, test, docs an… (#26245 ) Description: Added support for FalkorDB Vector Store, including its implementation, unit tests, documentation, and an example notebook. The FalkorDB integration allows users to efficiently manage and query embeddings in a vector database, with relevance scoring and maximal marginal relevance search. The following components were implemented: - Core implementation for FalkorDBVector store. - Unit tests ensuring proper functionality and edge case coverage. - Example notebook demonstrating an end-to-end setup, search, and retrieval using FalkorDB. Twitter handle: @tariyekorogha --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 19:37:55 +00:00
Aaron Pham	12fced13f4	chore(community): update to OpenLLM 0.6 (#24609 ) Update to OpenLLM 0.6, in which we decided to make use of OpenLLM's OpenAI-compatible endpoint. Thus, OpenLLM now becomes a thin wrapper around the OpenAI wrapper. Signed-off-by: Aaron Pham <contact@aarnphm.xyz> --------- Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: ccurme <chester.curme@gmail.com>	2024-12-16 14:30:07 -05:00
Lvlvko	5c17a4ace9	community: support Hunyuan Embedding (#23160 ) ## description - I refactored `Chathunyuan` using the tencentcloud sdk because I found the original one did not work in my application - I added `HunyuanEmbeddings` using the tencentcloud sdk - Both of them extend the base classes of langchain. I have fully tested them in my application ## Dependencies - tencentcloud-sdk-python --------- Co-authored-by: centonhuang <centonhuang@tencent.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 19:27:19 +00:00
Harrison Chase	de7996c2ca	core: add kwargs support to VectorStore (#25934 ) has been missing the passthrough until now --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 18:57:57 +00:00
Lorenzo	b79a1156ed	community: correct return type of get_files_from_directory in github tool (#27885 ) ### About: - Description: the _get_files_from_directory_ method returns a string, but it is used in other methods that expect a List[str] - Issue: None - Dependencies: None This pull request adds a new method _list_files_ with the old logic of _get_files_from_directory_, but it returns a List[str] at the end. The behavior of _get_files_from_directory_ is not changed.	2024-12-16 10:30:33 -08:00
Sheepsta300	580a8d53f9	community: Add configurable `VisualFeatures` to the `AzureAiServicesImageAnalysisTool` (#27444 ) - Description: The `AzureAiServicesImageAnalysisTool` is a good service and utilises the Azure AI Vision package under the hood. However, since the creation of this tool, new `VisualFeatures` have been added to allow the user to request other image-specific information to be returned. Currently, the tool offers neither configuration of which features should be returned nor any of the newer feature types. The aim of this PR is to address this and expose more of the Azure Service in this integration. - Dependencies: no new dependencies in the main class file, azure.ai.vision.imageanalysis added to extra test dependencies file. - Tests and docs: Although no tests exist for already implemented Azure Service tools, I've created 3 unit tests for this class that test initialisation and credentials, local file analysis and a test for the new changes/ features option. - Lint and test: All linting has passed. --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-16 18:30:04 +00:00
Erick Friis	1c120e9615	core: xml output parser tags docstring (#28745 )	2024-12-16 18:25:16 +00:00
Ana	ebab2ea81b	Fix Azure National Cloud authentication using token (RBAC) (Generated by Ana - AI SDE) (#25843 ) This pull request addresses the issue with authenticating Azure National Cloud using token (RBAC) in the AzureSearch vectorstore implementation. ## Changes - Modified the `_get_search_client` method in `azuresearch.py` to pass `additional_search_client_options` to the `SearchIndexClient` instance. ## Implementation Details The patch updates the `SearchIndexClient` initialization to include the `additional_search_client_options` parameter: ```python index_client: SearchIndexClient = SearchIndexClient( endpoint=endpoint, credential=credential, user_agent=user_agent, **additional_search_client_options ) ``` This change allows the `audience` parameter to be correctly passed when using Azure National Cloud, fixing the authentication issues with GovCloud & RBAC. This patch was generated by [Ana - AI SDE](https://openana.ai/), an AI-powered software development assistant. This is a fix for [Issue 25823](https://github.com/langchain-ai/langchain/issues/25823) --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-12-16 18:22:24 +00:00
chenzimin	169d419581	community: Remove all other keys in ChatLiteLLM and add api_key (#28097 ) - PR title: "community: Remove all other keys in ChatLiteLLM and add api_key" - PR message: Currently, no API key is passed to LiteLLM, and LiteLLM only takes an `api_key` parameter. Therefore I removed all current `*_api_key` attributes (they are not used) and added `api_key`, which is passed to ChatLiteLLM. - Should fix issue #27826 --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-16 17:54:29 +00:00
German Martin	d5d18c62b3	community: Apache AGE wrapper additional edge cases. (#28151 ) Description: The current AGEGraph() implementation does some custom wrapping for graph queries. The method involved is _wrap_query(), which parses the fields from the original query to add some SQL context to it. This PR improves the current parsing logic to cover additional edge cases, which are added to the test coverage: previously, any Node property name or value containing the "return" literal would break the graph / SQL query. We discovered this while dealing with real-world datasets; it is not an uncommon scenario, and I think it needs to be covered.	2024-12-16 11:28:01 -05:00
Rock2z	768e4a7fd4	[community][fix] Compatibility support to bump up wikibase-rest-api-client version (#27316 ) Description: This PR addresses the `TypeError: sequence item 0: expected str instance, FluentValue found` error when invoking `WikidataQueryRun`. The root cause was an incompatible version of the `wikibase-rest-api-client`, which caused the tool to fail when handling `FluentValue` objects instead of strings. The current implementation only supports `wikibase-rest-api-client<0.2`, but the latest version is `0.2.1`, where the current implementation breaks. Additionally, the error message advises users to install the latest version: [code reference](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/utilities/wikidata.py#L125C25-L125C32). Therefore, this PR updates the tool to support the latest version of `wikibase-rest-api-client`. Key changes: - Updated the handling of `FluentValue` objects to ensure compatibility with the latest `wikibase-rest-api-client`. - Removed the restriction to `wikibase-rest-api-client<0.2` and updated to support the latest version (`0.2.1`). Issue: Fixes [#24093](https://github.com/langchain-ai/langchain/issues/24093) – `TypeError: sequence item 0: expected str instance, FluentValue found`. Dependencies: - Upgraded `wikibase-rest-api-client` to the latest version to resolve the issue. --------- Co-authored-by: peiwen_zhang <peiwen_zhang@email.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-16 16:22:18 +00:00
André Quintino	a26c786bc5	community: refactor opensearch query constructor to use wildcard instead of match in the contain comparator (#26653 ) - Description: Changed the comparator to use a wildcard query instead of match. This modification allows for partial text matching on analyzed fields, which improves the flexibility of the search by performing full-text searches that aren't limited to exact matches. - Issue: The previous implementation used a match query, which performs exact matches on analyzed fields. This approach limited the search capabilities by requiring the query terms to align with the indexed text. The modification to use a wildcard query instead addresses this limitation. The wildcard query allows for partial text matching, which means the search can return results even if only a portion of the term matches the text. This makes the search more flexible and suitable for use cases where exact matches aren't necessary or expected, enabling broader full-text searches across analyzed fields. In short, the problem was that match queries were too restrictive, and the change to wildcard queries enhances the ability to perform partial matches. - Dependencies: none - Twitter handle: @Andre_Q_Pereira --------- Co-authored-by: André Quintino <andre.quintino@tui.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-16 11:16:34 -05:00
Davi Schumacher	0f9b4bf244	community[patch]: update dynamodb chat history to update instead of overwrite (#22397 ) Description: The current implementation of `DynamoDBChatMessageHistory` updates the `History` attribute for a given chat history record by first extracting the existing contents into memory, appending the new message, and then using the `put_item` method to put the record back. This has the effect of overwriting any additional attributes someone may want to include in the record, like chat session metadata. This PR suggests changing from using `put_item` to using `update_item` instead which will keep any other attributes in the record untouched. The change is backward compatible since 1. `update_item` is an "upsert" operation, creating the record if it doesn't already exist, otherwise updating it 2. It only touches the db insert call and passes the exact same information. The rest of the class is left untouched Dependencies: None Tests and docs: No unit tests currently exist for the `DynamoDBChatMessageHistory` class. This PR adds the file `libs/community/tests/unit_tests/chat_message_histories/test_dynamodb_chat_message_history.py` to test the `add_message` and `clear` methods. I wanted to use the moto library to mock DynamoDB calls but I could not get poetry to resolve it so I mocked those calls myself in the test. Therefore, no test dependencies were added. The change was tested on a test DynamoDB table as well. The first three images below show the current behavior. First a message is added to chat history, then a value is inserted in the record in some other attribute, and finally another message is added to the record, destroying the other attribute. 
![using_put_1_first_message](https://github.com/langchain-ai/langchain/assets/29493541/426acd62-fe29-42f4-b75f-863fb8b3fb21) ![using_put_2_add_attribute](https://github.com/langchain-ai/langchain/assets/29493541/f8a1c864-7114-4fe3-b487-d6f9252f8f92) ![using_put_3_second_message](https://github.com/langchain-ai/langchain/assets/29493541/8b691e08-755e-4877-8969-0e9769e5d28a) The next three images show the new behavior. Once again a value is added to an attribute other than the History attribute, but now when the followup message is added it does not destroy that other attribute. The History attribute itself is unaffected by this change. ![using_update_1_first_message](https://github.com/langchain-ai/langchain/assets/29493541/3e0d76ed-637e-41cd-82c7-01a86c468634) ![using_update_2_add_attribute](https://github.com/langchain-ai/langchain/assets/29493541/52585f9b-71a2-43f0-9dfc-9935aa59c729) ![using_update_3_second_message](https://github.com/langchain-ai/langchain/assets/29493541/f94c8147-2d6f-407a-9a0f-86b94341abff) The doc located at `docs/docs/integrations/memory/aws_dynamodb.ipynb` required no changes and was tested as well.	2024-12-16 10:38:00 -05:00
Christophe Bornet	6ddd5dbb1e	community: Add FewShotSQLTool (#28232 ) The `FewShotSQLTool` gets some SQL query examples from a `BaseExampleSelector` for a given question. This is useful to provide [few-shot examples](https://python.langchain.com/docs/how_to/sql_prompting/#few-shot-examples) capability to an SQL agent. Example usage: ```python from langchain.agents.agent_toolkits.sql.prompt import SQL_PREFIX embeddings = OpenAIEmbeddings() example_selector = SemanticSimilarityExampleSelector.from_examples( examples, embeddings, AstraDB, k=5, input_keys=["input"], collection_name="lc_few_shots", token=ASTRA_DB_APPLICATION_TOKEN, api_endpoint=ASTRA_DB_API_ENDPOINT, ) few_shot_sql_tool = FewShotSQLTool( example_selector=example_selector, description="Input to this tool is the input question, output is a few SQL query examples related to the input question. Always use this tool before checking the query with sql_db_query_checker!" ) agent = create_sql_agent( llm=llm, db=db, prefix=SQL_PREFIX + "\nYou MUST get some example queries before creating the query.", extra_tools=[few_shot_sql_tool] ) result = agent.invoke({"input": "How many artists are there?"}) ``` --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-16 15:37:21 +00:00
Mohammad Mohtashim	8d746086ab	Added `bind_tools` support for `ChatMLX` along with small fix in `_stream` (#28743 ) - Description: Added support for `bind_tools` as requested in the issue. Two issues in `_stream` were also fixed: - Corrected the positional argument passing for `generate_step` - Accounted for the case where the `token` returned by `generate_step` is an integer. - Issue: #28692	2024-12-16 09:52:49 -05:00
Jorge Piedrahita Ortiz	558b65ea32	community: SambaStudio Tool Calling and Structured Output (#28025 ) Description: Add tool calling and structured output support for SambaStudio chat models, docs included --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 06:15:19 +00:00
clairebehue	fb44e74ca4	community: fix AzureSearch Oauth with azure_ad_access_token (#26995 ) Description: AzureSearch vector store: create a wrapper class on `azure.core.credentials.TokenCredential` (which is not-instantiable) to fix Oauth usage with `azure_ad_access_token` argument Issue: [the issue it fixes](https://github.com/langchain-ai/langchain/issues/26216) Dependencies: None - [x] Lint and test --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 05:56:45 +00:00
SirSmokeAlot	29305cd948	community: O365Toolkit - send_event - fixed timezone error (#25876 ) Description: Fixed the formatting of the start and end times. Issue: The old formatting always resulted in a timezone error. Dependencies: / Twitter handle: / --------- Co-authored-by: Yannick Opitz <yannick.opitz@gob.de> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 05:32:28 +00:00
Erick Friis	4f6ccb7080	text-splitters: extended-tests without socket (#28736 )	2024-12-16 05:19:50 +00:00
Erick Friis	8ec1c72e03	text-splitters: test without socket (#28732 )	2024-12-15 22:10:35 +00:00
Aayush Kataria	d417e4b372	Community: Azure CosmosDB No Sql Vector Store: Full Text and Hybrid Search Support (#28716 ) - Added [full text](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/full-text-search) and [hybrid search](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/hybrid-search) support for Azure CosmosDB NoSql Vector Store - Added a new enum called CosmosDBQueryType which supports the following values: - VECTOR = "vector" - FULL_TEXT_SEARCH = "full_text_search" - FULL_TEXT_RANK = "full_text_rank" - HYBRID = "hybrid" - Users now need to provide this query_type to the similarity_search method so that the vector store makes the correct query API call. - Added a couple of workarounds, because the FULL_TEXT_RANK and HYBRID query functions do not support parameterized queries right now. I have added TODOs in place and will remove these workarounds by the end of January. - Added necessary test cases and updated the - [x] Add tests and docs - [x] Lint and test --------- Co-authored-by: Erick Friis <erickfriis@gmail.com>	2024-12-15 13:26:32 -08:00
Mohammad Mohtashim	4c1871d9a8	community: Passing the `model_kwargs` correctly while maintaining backward compatibility (#28439 ) - Description: `model_kwargs` was not being passed correctly to `sentence_transformers.SentenceTransformer`; this has been corrected while maintaining backward compatibility - Issue: #28436 --------- Co-authored-by: MoosaTae <sadhis.tae@gmail.com> Co-authored-by: Sadit Wongprayon <101176694+MoosaTae@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-15 20:34:29 +00:00
nhols	a3851cb3bc	community: FAISS vectorstore - consistent Document id field (#28728 ) make sure id field of Documents in `FAISS` docstore have the same id as values in `index_to_docstore_id`, implement `get_by_ids` method	2024-12-15 12:23:49 -08:00
Bagatur	a0534ae62a	community[patch]: Release 0.3.12 (#28725 )	2024-12-14 22:13:20 +00:00
Bagatur	089e659e03	langchain[patch]: Release 0.3.12 (#28724 )	2024-12-14 20:02:18 +00:00
Bagatur	679e3a9970	text-splitters[patch]: Release 0.3.3 (#28723 )	2024-12-14 19:20:22 +00:00
Erick Friis	387284c259	core: release 0.3.25 (#28718 )	2024-12-14 02:22:28 +00:00
Nawaf Alharbi	decd77c515	community: fix an issue with deepinfra integration (#28715 ) - PR title: langchain: add URL parameter to ChatDeepInfra class - Description: This PR introduces a `url` parameter to the ChatDeepInfra class in LangChain, allowing users to specify a custom URL. Previously, the URL for the DeepInfra API was hardcoded to "https://stage.api.deepinfra.com/v1/openai/chat/completions", which caused issues when the staging endpoint was not functional. The `_url` method was updated to return the value of the `url` parameter, enabling greater flexibility and addressing the problem. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-14 02:15:29 +00:00
Ben Chambers	008efada2c	[community]: Render documents to graphviz (#24830 ) - Description: Adds a helper that renders documents with the GraphVectorStore metadata fields to Graphviz for visualization. This is helpful for understanding and debugging. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-14 02:02:09 +00:00
Erick Friis	288f204758	docs, community: aerospike docs update (#28717 ) Co-authored-by: Jesse Schumacher <jschumacher@aerospike.com> Co-authored-by: Jesse S <jschmidt@aerospike.com> Co-authored-by: dylan <dwelch@aerospike.com>	2024-12-14 00:27:37 +00:00
Vimpas	337fed80a5	community: 🐛 PDF Filter Type Error (#27154 ) PR title: "community: fix PDF Filter Type Error" - Description: fix PDF Filter Type Error - Issue: #27153 - Dependencies: none - [x] Lint and test --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-13 23:30:29 +00:00
Ryan Parker	12111cb922	community: fallback on core async atransform_documents method for `MarkdownifyTransformer` (#27866 ) # Description Implements the `atransform_documents` method for `MarkdownifyTransformer` using the `asyncio` built-in library for concurrency. Note that this is mainly for API completeness when working with async frameworks rather than for performance, since the `markdownify` function is not I/O bound because it works with `Document` objects already in memory. # Issue Fixes #27865 # Dependencies No new dependencies added, but [`markdownify`](https://github.com/matthewwithanm/python-markdownify) is required since this PR updates the `markdownify` integration. # Tests and docs - Tests added - I did not modify the docstrings since they already described the basic functionality, and [the API docs also already included a description](https://python.langchain.com/api_reference/community/document_transformers/langchain_community.document_transformers.markdownify.MarkdownifyTransformer.html#langchain_community.document_transformers.markdownify.MarkdownifyTransformer.atransform_documents). If it would be helpful, I would be happy to update the docstrings and/or the API docs. # Lint and test - [x] format - [x] lint - [x] test I ran formatting with `make format`, linting with `make lint`, and confirmed that tests pass using `make test`. Note that some unit tests pass in CI but may fail when running `make_test`. Those unit tests are: - `test_extract_html` (and `test_extract_html_async`) - `test_strip_tags` (and `test_strip_tags_async`) - `test_convert_tags` (and `test_convert_tags_async`) The reason for the difference is that there are trailing spaces when the tests are run in the CI checks, and no trailing spaces when run with `make test`. I ensured that the tests pass in CI, but they may fail with `make test` due to the addition of trailing spaces. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-13 22:32:22 +00:00
Manuel	af2e0a7ede	partners: add 'model' alias for consistency in embedding classes (#28374 ) Description: This PR introduces a `model` alias for the embedding classes that contain the attribute `model_name`, to ensure consistency across the codebase, as suggested by a moderator in a previous PR. The change aligns the usage of attribute names across the project (see for example [here](`65deeddd5d/libs/partners/groq/langchain_groq/chat_models.py (L304)`)). Issue: This PR addresses the suggestion from the review of issue #28269. Dependencies: None --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-13 22:30:00 +00:00
Erick Friis	3107d78517	huggingface: fix standard test lint (#28714 )	2024-12-13 22:18:54 +00:00
Kaiwei Zhang	b909d54e70	chroma[patch]: Update logic for assigning ids	2024-12-13 21:58:34 +00:00
Karthik Bharadhwaj	498f0249e2	community[minor]: Opensearch hybridsearch implementation (#25375 ) community: add hybrid search in opensearch # Langchain OpenSearch Hybrid Search Implementation ## Implementation of Hybrid Search: I have taken LangChain's OpenSearch integration to the next level by adding hybrid search capabilities. Building on the existing OpenSearchVectorSearch class, I have implemented Hybrid Search functionality (which combines the best of both keyword and semantic search). This new functionality allows users to harness the power of OpenSearch's advanced hybrid search features without leaving the familiar LangChain ecosystem. By blending traditional text matching with vector-based similarity, the enhanced class delivers more accurate and contextually relevant results. It's designed to seamlessly fit into existing LangChain workflows, making it easy for developers to upgrade their search capabilities. In implementing the hybrid search for OpenSearch within the LangChain framework, I also incorporated filtering capabilities. It's important to note that according to the OpenSearch hybrid search documentation, only post-filtering is supported for hybrid queries. This means that the filtering is applied after the hybrid search results are obtained, rather than during the initial search process. Note: For the implementation of hybrid search, I strictly followed the official OpenSearch Hybrid search documentation and I took inspiration from https://github.com/AndreasThinks/langchain/tree/feature/opensearch_hybrid_search Thanks Mate! ### Experiments I conducted few experiments to verify that the hybrid search implementation is accurate and capable of reproducing the results of both plain keyword search and vector search. Experiment - 1 Hybrid Search Keyword_weight: 1, vector_weight: 0 I conducted an experiment to verify the accuracy of my hybrid search implementation by comparing it to a plain keyword search. 
For this test, I set the keyword_weight to 1 and the vector_weight to 0 in the hybrid search, effectively giving full weightage to the keyword component. The results from this hybrid search configuration matched those of a plain keyword search, confirming that my implementation can accurately reproduce keyword-only search results when needed. It's important to note that while the results were the same, the scores differed between the two methods. This difference is expected because the plain keyword search in OpenSearch uses the BM25 algorithm for scoring, whereas the hybrid search still performs both keyword and vector searches before normalizing the scores, even when the vector component is given zero weight. This experiment validates that my hybrid search solution correctly handles the keyword search component and properly applies the weighting system, demonstrating its accuracy and flexibility in emulating different search scenarios. Experiment - 2 Hybrid Search keyword_weight = 0.0, vector_weight = 1.0 For experiment-2, I took the inverse approach to further validate my hybrid search implementation. I set the keyword_weight to 0 and the vector_weight to 1, effectively giving full weightage to the vector search component (KNN search). I then compared these results with a pure vector search. The outcome was consistent with my expectations: the results from the hybrid search with these settings exactly matched those from a standalone vector search. This confirms that my implementation accurately reproduces vector search results when configured to do so. As with the first experiment, I observed that while the results were identical, the scores differed between the two methods. This difference in scoring is expected and can be attributed to the normalization process in hybrid search, which still considers both components even when one is given zero weight. 
This experiment further validates the accuracy and flexibility of my hybrid search solution, demonstrating its ability to effectively emulate pure vector search when needed while maintaining the underlying hybrid search structure. Experiment - 3 Hybrid Search - balanced keyword_weight = 0.5, vector_weight = 0.5 For experiment-3, I adopted a balanced approach to further evaluate the effectiveness of my hybrid search implementation. In this test, I set both the keyword_weight and vector_weight to 0.5, giving equal importance to keyword-based and vector-based search components. This configuration aims to leverage the strengths of both search methods simultaneously. By setting both weights to 0.5, I intended to create a scenario where the hybrid search would consider lexical matches and semantic similarity equally. This balanced approach is often ideal for many real-world applications, as it can capture both exact keyword matches and contextually relevant results that might not contain the exact search terms. Kindly verify the notebook for the experiments conducted! Notebook: https://github.com/karthikbharadhwajKB/Langchain_OpenSearch_Hybrid_search/blob/main/Opensearch_Hybridsearch.ipynb ### Instructions to follow for Performing Hybrid Search: Step-1: Instantiating OpenSearchVectorSearch Class: ```python opensearch_vectorstore = OpenSearchVectorSearch( index_name=os.getenv("INDEX_NAME"), embedding_function=embedding_model, opensearch_url=os.getenv("OPENSEARCH_URL"), http_auth=(os.getenv("OPENSEARCH_USERNAME"),os.getenv("OPENSEARCH_PASSWORD")), use_ssl=False, verify_certs=False, ssl_assert_hostname=False, ssl_show_warn=False ) ``` Parameters: 1. index_name: The name of the OpenSearch index to use. 2. embedding_function: The function or model used to generate embeddings for the documents. It's assumed that embedding_model is defined elsewhere in the code. 3. opensearch_url: The URL of the OpenSearch instance. 4. 
http_auth: A tuple containing the username and password for authentication. 5. use_ssl: Set to False, indicating that the connection to OpenSearch is not using SSL/TLS encryption. 6. verify_certs: Set to False, which means the SSL certificates are not being verified. This is often used in development environments but is not recommended for production. 7. ssl_assert_hostname: Set to False, disabling hostname verification in SSL certificates. 8. ssl_show_warn: Set to False, suppressing SSL-related warnings. Step-2: Configure Search Pipeline: To enable hybrid search, you first need to configure a search pipeline. Implementation Details: This method configures a search pipeline in OpenSearch that: 1. Normalizes the scores from both keyword and vector searches using the min-max technique. 2. Applies the specified weights to the normalized scores. 3. Calculates the final score using an arithmetic mean of the weighted, normalized scores. Parameters: * pipeline_name (str): A unique identifier for the search pipeline. It's recommended to use a descriptive name that indicates the weights used for keyword and vector searches. * keyword_weight (float): The weight assigned to the keyword search component. This should be a float value between 0 and 1. In this example, 0.3 gives 30% importance to traditional text matching. * vector_weight (float): The weight assigned to the vector search component. This should be a float value between 0 and 1. In this example, 0.7 gives 70% importance to semantic similarity. ```python opensearch_vectorstore.configure_search_pipelines( pipeline_name="search_pipeline_keyword_0.3_vector_0.7", keyword_weight=0.3, vector_weight=0.7, ) ``` Step-3: Performing Hybrid Search: After creating the search pipeline, you can perform a hybrid search using the `similarity_search()` method or any other search method supported by `langchain`. 
This method combines both `keyword-based and semantic similarity` searches on your OpenSearch index, leveraging the strengths of both traditional information retrieval and vector embedding techniques. parameters: * query: The search query string. * k: The number of top results to return (in this case, 3). * search_type: Set to `hybrid_search` to use both keyword and vector search capabilities. * search_pipeline: The name of the previously created search pipeline. ```python query = "what are the country named in our database?" top_k = 3 pipeline_name = "search_pipeline_keyword_0.3_vector_0.7" matched_docs = opensearch_vectorstore.similarity_search_with_score( query=query, k=top_k, search_type="hybrid_search", search_pipeline = pipeline_name ) matched_docs ``` twitter handle: @iamkarthik98 --------- Co-authored-by: Karthik Kolluri <karthik.kolluri@eidosmedia.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-12-13 16:34:12 -05:00
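The scoring described in Step-2 of the commit above (min-max normalization, weighting, then an arithmetic mean) can be illustrated with plain arithmetic. This is only a sketch of the idea, not the OpenSearch pipeline itself; `min_max` and `hybrid_score` are hypothetical helper names, and mapping a constant score list to 1.0 is an assumption made here to avoid dividing by zero.

```python
from typing import List


def min_max(scores: List[float]) -> List[float]:
    """Rescale scores to the range [0, 1] using min-max normalization."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption for this sketch: identical scores all normalize to 1.0.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_score(
    keyword_scores: List[float],
    vector_scores: List[float],
    keyword_weight: float,
    vector_weight: float,
) -> List[float]:
    """Weighted arithmetic mean of the two normalized score lists."""
    kw = min_max(keyword_scores)
    vec = min_max(vector_scores)
    total = keyword_weight + vector_weight
    return [
        (keyword_weight * k + vector_weight * v) / total
        for k, v in zip(kw, vec)
    ]
```

With `keyword_weight=1` and `vector_weight=0` the ranking follows the keyword scores alone, but the values are normalized rather than raw BM25 scores, which matches the observation in Experiment 1 that results agree while scores differ.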
Philippe PRADOS	f3fb5a9c68	community[minor]: Fix json._validate_metadata_func() (#22842 ) JSONparse, in _validate_metadata_func(), checks the consistency of the _metadata_func() function. To do this, it invokes it and makes sure it receives a dictionary in response. However, during the call, it does not respect future calls, as shown on line 100. This generates errors if, for example, the function is like this: ```python def generate_metadata(json_node:Dict[str,Any],kwargs:Dict[str,Any]) -> Dict[str,Any]: return { "source": url, "row": kwargs['seq_num'], "question":json_node.get("question"), } loader = JSONLoader( file_path=file_path, content_key="answer", jq_schema='.[]', metadata_func=generate_metadata, text_content=False) ``` To avoid this, the verification must comply with the specifications. This patch does just that. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-12-13 21:24:20 +00:00
Keiichi Hirobe	67fd554512	core[patch]: throw exception indexing code if deletion fails in vectorstore (#28103 ) The delete methods in the VectorStore and DocumentIndex interfaces return a status indicating the result. Therefore, we can assume that their implementations don't throw exceptions but instead return a result indicating whether the delete operations have failed. The current implementation doesn't check the returned value, so I modified it to throw an exception when the operation fails. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-12-13 16:14:27 -05:00
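The change described in the commit above amounts to checking the status returned by `delete` instead of ignoring it. A minimal sketch of that pattern follows; `InMemoryStore`, `IndexingError`, and `delete_or_raise` are hypothetical names used only for illustration, and treating `None` as "status not reported" is an assumption.

```python
from typing import Dict, List, Optional


class IndexingError(Exception):
    """Raised when a delete reports failure (hypothetical name)."""


class InMemoryStore:
    """Hypothetical stand-in for a store whose delete returns a status."""

    def __init__(self, fail: bool = False) -> None:
        self.docs: Dict[str, str] = {}
        self.fail = fail

    def delete(self, ids: List[str]) -> Optional[bool]:
        if self.fail:
            return False  # report failure instead of raising
        for doc_id in ids:
            self.docs.pop(doc_id, None)
        return True


def delete_or_raise(store: InMemoryStore, ids: List[str]) -> None:
    """Delete ids and raise if the store explicitly reports failure."""
    result = store.delete(ids)
    # Assumption: only an explicit False counts as failure; None is
    # treated as "status not reported".
    if result is False:
        raise IndexingError("The delete operation failed.")
```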
Keiichi Hirobe	258b3be5ec	core[minor]: add new clean up strategy "scoped_full" to indexing (#28505 ) ~Note that this PR is now Draft, so I didn't add change to `aindex` function and didn't add test codes for my change. After we have an agreement on the direction, I will add commits.~ `batch_size` is very difficult to decide because setting a large number like >10000 will impact VectorDB and RecordManager, while setting a small number will delete records unnecessarily, leading to redundant work, as the `IMPORTANT` section says. On the other hand, we can't use `full` because the loader returns just a subset of the dataset in our use case. I guess many people are in the same situation as us. So, as one of the possible solutions for it, I would like to introduce a new argument, `scoped_full_cleanup`. This argument will be valid only when `cleanup` is Full. If True, Full cleanup deletes all documents that haven't been updated AND that are associated with source ids that were seen during indexing. Default is False. This change keeps backward compatibility. --------- Co-authored-by: Eugene Yurtsev <eugene@langchain.dev> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-12-13 20:35:25 +00:00
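The deletion rule stated in the commit above (delete documents that were not updated AND whose source ids were seen during indexing) can be sketched as a small set computation. This is an illustration of the rule only, not the indexing code; `scoped_full_stale_ids` and its arguments are hypothetical.

```python
from typing import Dict, List, Set


def scoped_full_stale_ids(
    records: Dict[str, str],
    seen_source_ids: Set[str],
    updated_ids: Set[str],
) -> List[str]:
    """Return ids to delete under the scoped rule.

    records maps each stored document id to its source id.
    A document is stale only if its source was seen in this run
    and the document itself was not updated in this run.
    """
    return sorted(
        doc_id
        for doc_id, source_id in records.items()
        if source_id in seen_source_ids and doc_id not in updated_ids
    )
```

In this sketch, documents from sources the loader did not return in the current run are left alone, which is the difference from a plain full cleanup when the loader yields only a subset of the dataset.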
Eugene Yurtsev	ce90b25313	core[patch]: Update error message in indexing code for unreachable code assertion (#28712 ) Minor update for error message that should never be triggered	2024-12-13 20:21:14 +00:00
Keiichi Hirobe	da28cf1f54	core[patch]: Reverts PR #25754 and add unit tests (#28702 ) I reported the bug 2 weeks ago here: https://github.com/langchain-ai/langchain/issues/28447 I believe this is a critical bug for the indexer, so I submitted a PR to revert the change and added unit tests to prevent similar bugs from being introduced in the future. @eyurtsev Could you check this?	2024-12-13 15:13:06 -05:00
ScriptShi	b0a298894d	community[minor]: Add TablestoreVectorStore (#25767 ) Thank you for contributing to LangChain! - [x] PR title: community: add TablestoreVectorStore - [x] PR message: - Description: add TablestoreVectorStore - Dependencies: none - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration: yes 2. an example notebook showing its use: yes If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-12-13 11:17:28 -08:00
Erick Friis	86b3c6e81c	community: make old stub for QuerySQLDataBaseTool private to skip api ref (#28711 )	2024-12-13 10:43:23 -08:00
Martin Triska	05ebe1e66b	Community: add `modified_since` argument to `O365BaseLoader` (#28708 ) ## What are we doing in this PR We're adding an optional `modified_since` argument to `O365BaseLoader`. When set, the O365 loader will only load documents newer than the `modified_since` datetime. ## Why? OneDrives / Sharepoints can contain a large number of documents. The current approach is to download and parse all files and let the indexer deal with duplicates. This can be prohibitively time-consuming, especially when using an OCR-based parser like [zerox](`fa06188834/libs/community/langchain_community/document_loaders/pdf.py (L948)`). This argument makes it possible to skip documents that are older than the known time of indexing. _Q: What if a file was modified during the last indexing process? A: Users can set `modified_since` conservatively and the indexer will still take care of duplicates._ --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-13 17:30:17 +00:00
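The filtering idea behind `modified_since` in the commit above can be sketched without any O365 dependency. `FileInfo` and `select_files` are hypothetical names, and the strict `>` comparison is an assumption based on the phrase "newer than"; the loader's actual comparison may differ.

```python
from datetime import datetime, timezone
from typing import Iterable, List, NamedTuple, Optional


class FileInfo(NamedTuple):
    """Hypothetical record for a remote file and its modification time."""

    name: str
    modified: datetime


def select_files(
    files: Iterable[FileInfo],
    modified_since: Optional[datetime] = None,
) -> List[FileInfo]:
    """Keep only files modified after modified_since (all files if None)."""
    if modified_since is None:
        return list(files)
    return [f for f in files if f.modified > modified_since]


files = [
    FileInfo("old.pdf", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    FileInfo("new.pdf", datetime(2024, 12, 1, tzinfo=timezone.utc)),
]
cutoff = datetime(2024, 6, 1, tzinfo=timezone.utc)
recent = select_files(files, modified_since=cutoff)
```

Setting the cutoff somewhat earlier than the last indexing run only causes a few extra files to be re-read, which the indexer can deduplicate, as the Q&A in the commit message notes.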
Bagatur	fa06188834	community[patch]: fix QuerySQLDatabaseTool name (#28659 ) Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-12 19:16:03 -08:00
Erick Friis	48ab91b520	docs: more useful vercel warnings (#28699 )	2024-12-13 03:07:24 +00:00
Michael Chin	28cb2cefc6	docs: Fix stack diagram in community README (#28685 ) - Description: The stack diagram illustration in the community README fails to render due to an invalid branch reference. This PR replaces the broken image link with a valid one referencing master branch.	2024-12-12 13:33:50 -08:00
Botong Zhu	13c3c4a210	community: fixes json loader not getting texts with json standard (#27327 ) This PR fixes JSONLoader._get_text not converting objects to JSON strings correctly. If an object is serializable and is not a dict, JSONLoader uses Python's built-in str() to convert it to a string. The resulting string may not be valid JSON. For example, a list is converted to a string with single quotes, and json.loads raises an error when it tries to load that string. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 19:33:45 +00:00
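The difference between `str()` and `json.dumps()` that the commit above refers to can be reproduced with the standard library alone. The `is_valid_json` helper is a hypothetical name used only for this demonstration.

```python
import json

value = ["a", "b"]

as_str = str(value)          # Python repr: single-quoted items
as_json = json.dumps(value)  # JSON text: double-quoted items


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(as_str, is_valid_json(as_str))
print(as_json, is_valid_json(as_json))
```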
Lorenzo	4149c0dd8d	community: add method to create branch and list files for gitlab tool (#27883 ) ### About - Description: The Gitlab utilities used by the Gitlab tool have no methods to create branches or to list branches and files, although these already exist for Github - Issue: None - Dependencies: None This pull request adds the methods: - create_branch - list_branches_in_repo - set_active_branch - list_files_in_main_branch - list_files_in_bot_branch - list_files_from_directory --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 19:11:35 +00:00
Prathamesh Nimkar	ca054ed1b1	community: ChatSnowflakeCortex - Add streaming functionality (#27753 ) Description: snowflake.py Add _stream and _stream_content methods to enable streaming functionality fix pydantic issues and added functionality with the overall langchain version upgrade added bind_tools method for agentic workflows support through langgraph updated the _generate method to account for agentic workflows support through langgraph cosmetic changes to comments and if conditions snowflake.ipynb Added _stream example cosmetic changes to comments fixed lint errors check_pydantic.sh Decreased counter from 126 to 125 as suggested when formatting --------- Co-authored-by: Prathamesh Nimkar <prathamesh.nimkar@snowflake.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-11 18:35:40 -08:00
Wang, Yi	d834c6b618	huggingface: fix tool argument serialization in _convert_TGI_message_to_LC_message (#26075 ) Currently `_convert_TGI_message_to_LC_message` replaces `'` in the tool arguments, so an argument like "It's" will be converted to `It"s` and could cause a json parser to fail. --------- Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Vadym Barda <vadym@langchain.dev>	2024-12-11 18:34:32 -08:00
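The failure mode described in the commit above is easy to reproduce: blindly replacing `'` with `"` in already-serialized arguments corrupts any value that contains an apostrophe. The snippet below is a standalone demonstration and does not use the Hugging Face integration; the variable names are illustrative.

```python
import json

# Tool arguments as a JSON string, containing an apostrophe in a value.
arguments = json.dumps({"text": "It's done"})

# Naive quote replacement turns the apostrophe into a stray double quote.
naive = arguments.replace("'", '"')

try:
    json.loads(naive)
    naive_parses = True
except json.JSONDecodeError:
    naive_parses = False

# Leaving the serialized string untouched round-trips correctly.
round_trip = json.loads(arguments)
```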
Lakindu Boteju	5a31792bf1	community: Add support for cross-region inference profile IDs in Bedrock Anthropic Claude token cost calculation (#28167 ) This change modifies the token cost calculation logic to support cross-region inference profile IDs for Anthropic Claude models. Instead of explicitly listing all regional variants of new inference profile IDs in the cost dictionaries, the code now extracts a base model ID from the input model ID (or inference profile ID), making it more maintainable and automatically supporting new regional variants. These inference profile IDs follow the format: `<region>.<vendor>.<model-name>` (e.g., `us.anthropic.claude-3-haiku-xxx`, `eu.anthropic.claude-3-sonnet-xxx`). Cross-region inference profiles are system-defined identifiers that enable distributing model inference requests across multiple AWS regions. They help manage unplanned traffic bursts and enhance resilience during peak demands without additional routing costs. References for Amazon Bedrock's cross-region inference profiles:- - https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html - https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 02:33:50 +00:00
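The base-model extraction described in the commit above can be sketched as a small string operation on the `<region>.<vendor>.<model-name>` format. `base_model_id` is a hypothetical helper, not the function in the callback code, and the ids below are the placeholder examples from the commit message.

```python
def base_model_id(model_id: str, vendor: str = "anthropic") -> str:
    """Strip a leading region prefix from a cross-region inference profile id.

    Ids that already start with the vendor, or that do not match the
    <region>.<vendor>.<model-name> shape, are returned unchanged.
    """
    prefix = vendor + "."
    if model_id.startswith(prefix):
        return model_id
    _region, sep, rest = model_id.partition(".")
    if sep and rest.startswith(prefix):
        return rest
    return model_id


print(base_model_id("us.anthropic.claude-3-haiku-xxx"))
print(base_model_id("eu.anthropic.claude-3-sonnet-xxx"))
```

Looking costs up by the extracted base id means a new regional variant needs no new dictionary entry, which is the maintainability point the commit makes.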
fatmelon	d1e0ec7b55	community: VectorStores: Azure Cosmos DB Mongo vCore with DiskANN (#27329 ) # Description Add a new vector index type `diskann` to Azure Cosmos DB Mongo vCore vector store. Paper of DiskANN can be found here [DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node](https://proceedings.neurips.cc/paper_files/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf). ## Sample Usage ```python from pymongo import MongoClient # INDEX_NAME = "izzy-test-index-2" # NAMESPACE = "izzy_test_db.izzy_test_collection" # DB_NAME, COLLECTION_NAME = NAMESPACE.split(".") client: MongoClient = MongoClient(CONNECTION_STRING) collection = client[DB_NAME][COLLECTION_NAME] model_deployment = os.getenv( "OPENAI_EMBEDDINGS_DEPLOYMENT", "smart-agent-embedding-ada" ) model_name = os.getenv("OPENAI_EMBEDDINGS_MODEL_NAME", "text-embedding-ada-002") vectorstore = AzureCosmosDBVectorSearch.from_documents( docs, openai_embeddings, collection=collection, index_name=INDEX_NAME, ) # Read more about these variables in detail here. https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/vector-search maxDegree = 40 dimensions = 1536 similarity_algorithm = CosmosDBSimilarityType.COS kind = CosmosDBVectorSearchType.VECTOR_DISKANN lBuild = 20 vectorstore.create_index( dimensions=dimensions, similarity=similarity_algorithm, kind=kind , max_degree=maxDegree, l_build=lBuild, ) ``` ## Dependencies No additional dependencies were added --------- Co-authored-by: Yang Qiao (from Dev Box) <yangqiao@microsoft.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 01:54:04 +00:00
manukychen	ba9b95cd23	Community: Adding bulk_size as a setable param for OpenSearchVectorSearch (#28325 ) Description: When using langchain.retrievers.parent_document_retriever.py with OpenSearchVectorSearch as the vectorstore, I found that the bulk_size param I passed to the OpenSearchVectorSearch class was not applied in ParentDocumentRetriever.add_documents(): it was overwritten with the default of 500 used by the OpenSearchVectorSearch methods (e.g., add_texts(), add_embeddings()...). This PR fixes that. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 01:45:22 +00:00
xintoteai	45f9c9ae88	langchain: fixed weaviate (v4) vectorstore import for self-query retriever (#28675 ) Co-authored-by: Xin Heng <xin.heng@gmail.com>	2024-12-11 15:53:41 -08:00
Thomas van Dongen	ee640d6bd3	community: fixed bug in model2vec embedding code (#28670 ) This PR fixes a bug with the current implementation for Model2Vec embeddings where `embed_documents` does not work as expected. - Description: the current implementation uses `encode_as_sequence` for encoding documents. This is incorrect, as `encode_as_sequence` creates token embeddings and not mean embeddings. The normal `encode` function handles both single and batched inputs and should be used instead. The return type was also incorrect, as encode returns a NumPy array. This PR converts the embedding to a list so that the output is consistent with the Embeddings ABC.	2024-12-11 15:50:56 -08:00
Brian Sharon	b20230c800	community: use correct `id_key` when deleting by id in LanceDB wrapper (#28655 ) - Description: The current version of the `delete` method assumes that the id field will always be called `id`. - Issue: n/a - Dependencies: n/a - Twitter handle: ugh, Twitter :D --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-11 23:49:35 +00:00
Mohammad Mohtashim	fa155a422f	[Community]: `requests_kwargs` not being used in _fetch (#28646 ) - Description: `requests_kwargs` was not passed to `_fetch`, which fetches pages asynchronously. This PR passes `requests_kwargs` to `_fetch`, just as `_scrape` already does. - Issue: #28634 --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-11 23:46:54 +00:00
Mohammad Mohtashim	a37afbe353	mistral[minor]: Added Retrying Mechanism in case of Request Rate Limit Error for `MistralAIEmbeddings` (#27818 ) - Description:: In the event of a Rate Limit Error from the MistralAI server, the response JSON raises a KeyError. To address this, a simple retry mechanism has been implemented to handle cases where the request limit is exceeded. - Issue: #27790 --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-12-11 17:53:42 -05:00
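A minimal sketch of the retry idea described above, assuming a rate-limited response is a JSON payload without the expected `data` key. `RateLimitError`, `extract_embeddings` and `with_retries` are hypothetical names for illustration; this is not the `MistralAIEmbeddings` implementation.

```python
import time


class RateLimitError(Exception):
    """Raised when the payload lacks embeddings (assumed to mean rate limiting)."""


def extract_embeddings(payload):
    # A rate-limited response is assumed to lack the "data" key, which
    # would otherwise surface as a bare KeyError.
    try:
        return [item["embedding"] for item in payload["data"]]
    except KeyError as exc:
        raise RateLimitError("response has no embeddings") from exc


def with_retries(func, max_retries=3, base_delay=0.0, sleep=time.sleep):
    """Wrap `func` so RateLimitError triggers a retry with exponential backoff."""

    def wrapper(*args, **kwargs):
        for attempt in range(max_retries + 1):
            try:
                return func(*args, **kwargs)
            except RateLimitError:
                if attempt == max_retries:
                    raise  # out of retries: propagate the error
                sleep(base_delay * 2 ** attempt)

    return wrapper


# Simulated server: the first response is rate limited, the second succeeds.
responses = iter([
    {"message": "Requests rate limit exceeded"},
    {"data": [{"embedding": [0.1, 0.2]}]},
])
fetch = with_retries(lambda: extract_embeddings(next(responses)), max_retries=2)
result = fetch()
print(result)
```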
Vincent Zhang	df5008fe55	community[minor]: FAISS Filter Function Enhancement with Advanced Query Operators (#28207 )

## Description

We are submitting as a team of four for a project. Other team members are @RuofanChen03, @LikeWang10067, @TANYAL77.

This pull request expands the filtering capabilities of the FAISS vectorstore by adding the MongoDB-style query operators listed below, along with comprehensive tests for the added functionality.

- $eq (equals)
- $neq (not equals)
- $gt (greater than)
- $lt (less than)
- $gte (greater than or equal)
- $lte (less than or equal)
- $in (membership in list)
- $nin (not in list)
- $and (all conditions must match)
- $or (any condition must match)
- $not (negation of condition)

## Issue

This closes https://github.com/langchain-ai/langchain/issues/26379.

## Sample Usage

```python
import faiss
import asyncio
from langchain_community.vectorstores import FAISS
from langchain.schema import Document
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

documents = [
    Document(page_content="Process customer refund request", metadata={"schema_type": "financial", "handler_type": "refund"}),
    Document(page_content="Update customer shipping address", metadata={"schema_type": "customer", "handler_type": "update"}),
    Document(page_content="Process payment transaction", metadata={"schema_type": "financial", "handler_type": "payment"}),
    Document(page_content="Handle customer complaint", metadata={"schema_type": "customer", "handler_type": "complaint"}),
    Document(page_content="Process invoice payment", metadata={"schema_type": "financial", "handler_type": "payment"}),
]


async def search(vectorstore, query, schema_type, handler_type, k=2):
    schema_filter = {"schema_type": {"$eq": schema_type}}
    handler_filter = {"handler_type": {"$eq": handler_type}}
    combined_filter = {"$and": [schema_filter, handler_filter]}
    base_retriever = vectorstore.as_retriever(
        search_kwargs={"k": k, "filter": combined_filter}
    )
    return await base_retriever.ainvoke(query)


async def main():
    vectorstore = FAISS.from_texts(
        texts=[doc.page_content for doc in documents],
        embedding=embeddings,
        metadatas=[doc.metadata for doc in documents],
    )

    def printt(title, documents):
        print(title)
        if not documents:
            print("\tNo documents found.")
            return
        for doc in documents:
            print(f"\t{doc.page_content}. {doc.metadata}")

    printt("Documents:", documents)
    printt('\nquery="process payment", schema_type="financial", handler_type="payment":',
           await search(vectorstore, query="process payment", schema_type="financial", handler_type="payment", k=2))
    printt('\nquery="customer update", schema_type="customer", handler_type="update":',
           await search(vectorstore, query="customer update", schema_type="customer", handler_type="update", k=2))
    printt('\nquery="refund process", schema_type="financial", handler_type="refund":',
           await search(vectorstore, query="refund process", schema_type="financial", handler_type="refund", k=2))
    printt('\nquery="refund process", schema_type="financial", handler_type="foobar":',
           await search(vectorstore, query="refund process", schema_type="financial", handler_type="foobar", k=2))
    print()


if __name__ == "__main__":
    asyncio.run(main())
```

## Output

```
Documents:
	Process customer refund request. {'schema_type': 'financial', 'handler_type': 'refund'}
	Update customer shipping address. {'schema_type': 'customer', 'handler_type': 'update'}
	Process payment transaction. {'schema_type': 'financial', 'handler_type': 'payment'}
	Handle customer complaint. {'schema_type': 'customer', 'handler_type': 'complaint'}
	Process invoice payment. {'schema_type': 'financial', 'handler_type': 'payment'}

query="process payment", schema_type="financial", handler_type="payment":
	Process payment transaction. {'schema_type': 'financial', 'handler_type': 'payment'}
	Process invoice payment. {'schema_type': 'financial', 'handler_type': 'payment'}

query="customer update", schema_type="customer", handler_type="update":
	Update customer shipping address. {'schema_type': 'customer', 'handler_type': 'update'}

query="refund process", schema_type="financial", handler_type="refund":
	Process customer refund request. {'schema_type': 'financial', 'handler_type': 'refund'}

query="refund process", schema_type="financial", handler_type="foobar":
	No documents found.
```

---------
Co-authored-by: ruofan chen <ruofan.is.awesome@gmail.com>
Co-authored-by: RickyCowboy <like.wang@mail.utoronto.ca>
Co-authored-by: Shanni Li <tanya.li@mail.utoronto.ca>
Co-authored-by: RuofanChen03 <114096642+ruofanchen03@users.noreply.github.com>
Co-authored-by: Like Wang <102838708+likewang10067@users.noreply.github.com>	2024-12-11 17:52:22 -05:00
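To make the operator semantics in the FAISS filter commit above concrete, here is a dependency-free sketch of how a MongoDB-style filter could be evaluated against a metadata dict. The `matches` function is a hypothetical illustration, not the code added to the FAISS vectorstore. It assumes a missing metadata key never matches, which the real implementation may treat differently.

```python
import operator

# Comparison operators from the commit message, mapped to Python callables.
COMPARATORS = {
    "$eq": operator.eq,
    "$neq": operator.ne,
    "$gt": operator.gt,
    "$lt": operator.lt,
    "$gte": operator.ge,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(metadata, flt):
    """Return True if `metadata` satisfies every condition in `flt`."""
    for key, condition in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$not":
            if matches(metadata, condition):
                return False
        elif key not in metadata:
            return False  # assumption: a missing field never matches
        elif isinstance(condition, dict):
            value = metadata[key]
            for op, operand in condition.items():
                if not COMPARATORS[op](value, operand):
                    return False
        elif metadata[key] != condition:
            return False  # plain value means equality
    return True


docs = [
    {"schema_type": "financial", "handler_type": "refund"},
    {"schema_type": "customer", "handler_type": "update"},
    {"schema_type": "financial", "handler_type": "payment"},
    {"schema_type": "customer", "handler_type": "complaint"},
    {"schema_type": "financial", "handler_type": "payment"},
]
combined = {"$and": [
    {"schema_type": {"$eq": "financial"}},
    {"handler_type": {"$eq": "payment"}},
]}
hits = [d for d in docs if matches(d, combined)]
print(len(hits))
```

The metadata dicts are the ones from the sample usage in the commit message, so the combined filter selects the two financial payment documents.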
like	3048a9a26d	community: tongyi multimodal response format fix to support langchain (#28645 ) Description: The multimodal(tongyi) response format "message": {"role": "assistant", "content": [{"text": "图像"}]}}]} is not compatible with LangChain. Dependencies: No --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-10 21:13:26 +00:00
Bagatur	d0e662e43b	community[patch]: Release 0.3.11 (#28658 )	2024-12-10 20:51:13 +00:00
Bagatur	91227ad7fd	langchain[patch]: Release 0.3.11 (#28657 )	2024-12-10 12:28:14 -08:00
Bagatur	1fbd86a155	core[patch]: Release 0.3.24 (#28656 )	2024-12-10 20:19:21 +00:00
Bagatur	e6a62d8422	core,langchain,community[patch]: allow langsmith 0.2 (#28598 )	2024-12-10 18:50:58 +00:00
ccurme	bc4dc7f4b1	ollama[patch]: permit streaming for tool calls (#28654 ) Resolves https://github.com/langchain-ai/langchain/issues/28543 Ollama recently [released](https://github.com/ollama/ollama/releases/tag/v0.4.6) support for streaming tool calls. Previously we would override the `stream` parameter if tools were passed in. Covered in standard tests here: `c1d348e95d/libs/standard-tests/langchain_tests/integration_tests/chat_models.py (L893-L897)` Before, the test generates one message chunk: ```python [ AIMessageChunk( content='', additional_kwargs={}, response_metadata={ 'model': 'llama3.1', 'created_at': '2024-12-10T17:49:04.468487Z', 'done': True, 'done_reason': 'stop', 'total_duration': 525471208, 'load_duration': 19701000, 'prompt_eval_count': 170, 'prompt_eval_duration': 31000000, 'eval_count': 17, 'eval_duration': 473000000, 'message': Message( role='assistant', content='', images=None, tool_calls=[ ToolCall( function=Function(name='magic_function', arguments={'input': 3}) ) ] ) }, id='run-552bbe0f-8fb2-4105-ada1-fa38c1db444d', tool_calls=[ { 'name': 'magic_function', 'args': {'input': 3}, 'id': 'b0a4dc07-7d7a-487b-bd7b-ad062c2363a2', 'type': 'tool_call', }, ], usage_metadata={ 'input_tokens': 170, 'output_tokens': 17, 'total_tokens': 187 }, tool_call_chunks=[ { 'name': 'magic_function', 'args': '{"input": 3}', 'id': 'b0a4dc07-7d7a-487b-bd7b-ad062c2363a2', 'index': None, 'type': 'tool_call_chunk', } ] ) ] ``` After, it generates two (tool call in one, response metadata in another): ```python [ AIMessageChunk( content='', additional_kwargs={}, response_metadata={}, id='run-9a3f0860-baa1-4bae-9562-13a61702de70', tool_calls=[ { 'name': 'magic_function', 'args': {'input': 3}, 'id': '5bbaee2d-c335-4709-8d67-0783c74bd2e0', 'type': 'tool_call', }, ], tool_call_chunks=[ { 'name': 'magic_function', 'args': '{"input": 3}', 'id': '5bbaee2d-c335-4709-8d67-0783c74bd2e0', 'index': None, 'type': 'tool_call_chunk', }, ], ), AIMessageChunk( content='', 
additional_kwargs={}, response_metadata={ 'model': 'llama3.1', 'created_at': '2024-12-10T17:46:43.278436Z', 'done': True, 'done_reason': 'stop', 'total_duration': 514282750, 'load_duration': 16894458, 'prompt_eval_count': 170, 'prompt_eval_duration': 31000000, 'eval_count': 17, 'eval_duration': 464000000, 'message': Message( role='assistant', content='', images=None, tool_calls=None ), }, id='run-9a3f0860-baa1-4bae-9562-13a61702de70', usage_metadata={ 'input_tokens': 170, 'output_tokens': 17, 'total_tokens': 187 } ), ] ```	2024-12-10 12:54:37 -05:00
Johannes Mohren	c1d348e95d	doc-loader: retain Azure Doc Intelligence API metadata in Document parser (#28382 ) Description: This PR modifies the doc_intelligence.py parser in the community package to include all metadata returned by the Azure Doc Intelligence API in the Document object. Previously, only the parsed content (markdown) was retained, while other important metadata such as bounding boxes (bboxes) for images and tables was discarded. These image bboxes are crucial for supporting use cases like multi-modal RAG workflows when using Azure Doc Intelligence. The change ensures that all information returned by the Azure Doc Intelligence API is preserved by setting the metadata attribute of the Document object to the entire result returned by the API, rather than an empty dictionary. This extends the parser's utility for complex use cases without breaking existing functionality. Issue: This change does not address a specific issue number, but it resolves a critical limitation in supporting multimodal workflows when using the LangChain wrapper for the Azure API. Dependencies: No additional dependencies are required for this change. --------- Co-authored-by: jmohren <johannes.mohren@aol.de>	2024-12-10 11:22:58 -05:00
Alex Tonkonozhenko	0d20c314dd	Confluence Loader: Fix CQL loading (#27620 ) fix #12082	2024-12-10 11:05:23 -05:00
Katarina Supe	aba2711e7f	community: update Memgraph integration (#27017 ) Description: - Memgraph no longer relies on `Neo4jGraphStore` but implements `GraphStore`, just like other graph databases. - Memgraph no longer relies on `GraphQAChain`, but implements `MemgraphQAChain`, just like other graph databases. - The refresh schema procedure has been updated to try using `SHOW SCHEMA INFO`. The fallback uses Cypher queries (a combination of schema and Cypher) → LangChain integration no longer relies on MAGE library. - The schema structure has been reformatted. Regardless of the procedures used to get schema, schema structure is the same. - The `add_graph_documents()` method has been implemented. It transforms `GraphDocument` into Cypher queries and creates a graph in Memgraph. It implements the ability to use `baseEntityLabel` to improve speed (`baseEntityLabel` has an index on the `id` property). It also implements the ability to include sources by creating a `MENTIONS` relationship to the source document. - Jupyter Notebook for Memgraph has been updated. - Issue: / - Dependencies: / - Twitter handle: supe_katarina (DX Engineer @ Memgraph) Closes #25606	2024-12-10 10:57:21 -05:00
ccurme	5c6e2cbcda	ollama[patch]: support structured output (#28629 ) - Bump minimum version of `ollama` to 0.4.4 (which also addresses https://github.com/langchain-ai/langchain/issues/28607). - Support recently-released [structured output](https://ollama.com/blog/structured-outputs) feature. This can be accessed by calling `.with_structured_output` with `method="json_schema"` (choice of name [mirrors](https://python.langchain.com/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html#langchain_openai.chat_models.base.ChatOpenAI.with_structured_output) what we have for OpenAI's structured output feature). `ChatOllama` previously implemented `.with_structured_output` via the [base implementation](`ec9b41431e/libs/core/langchain_core/language_models/chat_models.py (L1117)`).	2024-12-10 10:36:00 -05:00
Bagatur	24292c4a31	core[patch]: Release 0.3.23 (#28648 )	2024-12-10 10:01:16 +00:00
Bagatur	e24f86e55f	core[patch]: return ToolMessage from tool (#28605 )	2024-12-10 09:59:38 +00:00
Erick Friis	ef2f875dfb	core: deprecate PipelinePromptTemplate (#28644 )	2024-12-10 03:56:48 +00:00
TamagoTorisugi	0f0df2df60	fix: Set default search_type to 'similarity' in as_retriever method of AzureSearch (#28376 ) Description This PR updates the `as_retriever` method in the `AzureSearch` to ensure that the `search_type` parameter defaults to 'similarity' when not explicitly provided. Previously, if the `search_type` was omitted, it did not default to any specific value. So it was inherited from `AzureSearchVectorStoreRetriever`, which defaults to 'hybrid'. This change ensures that the intended default behavior aligns with the expected usage. Issue No specific issue was found related to this change. Dependencies No new dependencies are introduced with this change. --------- Co-authored-by: prrao87 <prrao87@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-10 03:40:04 +00:00
Prashanth Rao	8c6eec5f25	community: KuzuGraph needs allow_dangerous_requests, add graph documents via LLMGraphTransformer (#27949 ) - This PR adds a new method `add_graph_documents` to use the `GraphDocument`s extracted by `LLMGraphTransformer` and store in a Kùzu graph backend. - This allows users to transform unstructured text into a graph that uses Kùzu as the graph store. --------- Co-authored-by: pookam90 <pookam@microsoft.com> Co-authored-by: Pooja Kamath <60406274+Pookam90@users.noreply.github.com> Co-authored-by: hsm207 <hsm207@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-10 03:15:28 +00:00
Filip Ratajczak	4e743b5427	Core: google docstring parsing fix (#28404 ) - Description: Fixed incorrect parsing of Google-style docstrings such as: Args: net_annual_income (float): The user's net annual income (in current year dollars). - Issue: The previous code would return arg = "net_annual_income (float)", which would cause an exception in _validate_docstring_args_against_annotations - Dependencies: None Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-10 00:27:25 +00:00
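The parsing problem described above can be sketched with a small regular expression that separates the argument name from an optional `(type)` annotation. `parse_google_args` is a hypothetical helper for illustration, not the parser in `langchain_core`. It handles only single-line argument entries.

```python
import re

# name, optional "(type)", then ":" and the description.
_ARG_LINE = re.compile(
    r"^\s*(?P<name>\*{0,2}\w+)\s*(?:\((?P<type>[^)]*)\))?\s*:\s*(?P<desc>.*)$"
)
_OTHER_SECTIONS = {"Returns:", "Raises:", "Yields:"}


def parse_google_args(docstring):
    """Map argument names to descriptions from a Google-style `Args:` section."""
    args = {}
    in_args = False
    for line in docstring.splitlines():
        stripped = line.strip()
        if stripped == "Args:":
            in_args = True
            continue
        if not in_args:
            continue
        if not stripped or stripped in _OTHER_SECTIONS:
            break  # a blank line or a new section ends the Args block
        match = _ARG_LINE.match(line)
        if match:
            # Key on the bare name, without the "(float)" annotation.
            args[match.group("name")] = match.group("desc").strip()
    return args


doc = """Estimate tax.

Args:
    net_annual_income (float): The user's net annual income (in current year dollars).
    year: Tax year.
"""
parsed = parse_google_args(doc)
print(parsed)
```

Keying on the bare name means later validation against the function's annotations sees `net_annual_income` rather than `net_annual_income (float)`.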
Arnav Priyadarshi	b78b2f7a28	community[fix]: Update Perplexity to pass parameters into API calls (#28421 ) - Description: I realized the invocation parameters were not being passed into `_generate` so I added those in but then realized that the parameters contained some old fields designed for an older openai client which I removed. Parameters work fine now. - Issue: Fixes #28229 - Dependencies: No new dependencies. - Twitter handle: @arch_plane Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-10 00:23:31 +00:00
Clément Jumel	cf6d1c0ae7	docs: add Linkup integration documentation (#28366 ) ## Description First of all, thanks for the great framework that is LangChain! At [Linkup](https://www.linkup.so/) we're working on an API to connect LLMs and agents to the internet and our partner sources. We'd be super excited to see our API integrated in LangChain! This essentially consists in adding a LangChain retriever and tool, which is done in our own [package](https://pypi.org/project/langchain-linkup/). Here we're simply following the [integration documentation](https://python.langchain.com/docs/contributing/how_to/integrations/) and update the documentation of LangChain to mention the Linkup integration. We do have tests (both units & integration) in our [source code](https://github.com/LinkupPlatform/langchain-linkup), and tried to follow as close as possible the [integration documentation](https://python.langchain.com/docs/contributing/how_to/integrations/) which specifically requests to focus on documentation changes for an integration PR, so I'm not adding tests here, even though the PR checklist seems to suggest so. Feel free to correct me if I got this wrong! By the way, we would be thrilled by being mentioned in the list of providers which have standalone packages [here](https://langchain-git-fork-linkupplatform-cj-doc-langchain.vercel.app/docs/integrations/providers/), is there something in particular for us to do for that? 🙂 ## Twitter handle Linkup_platform	2024-12-09 14:36:25 -08:00
Amir Sadeghi	2c49f587aa	community[fix]: could not locate runnable browser (#28289 ) set open_browser to false to resolve "could not locate runnable browser" error while default browser is None Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 21:05:52 +00:00
Martin Triska	75bc6bb191	community: [bugfix] fix source path for office files in O365 (#28260 ) # What problem are we fixing? Currently documents loaded using `O365BaseLoader` fetch their source from `file.web_url` (where `file` is `<class 'O365.drive.File'>`). This works well for `.pdf` documents. Unfortunately, office documents (`.xlsx`, `.docx` ...) pass their `web_url` in the following format: `https://sharepoint_address/sites/path/to/library/root/Doc.aspx?sourcedoc=%XXXXXXXX-1111-1111-XXXX-XXXXXXXXXX%7D&file=filename.xlsx&action=default&mobileredirect=true` This obfuscates the path to the file. This PR uses the parent folder's path and the file name to reconstruct the actual location of the file. Knowing the file's location can be crucial for some RAG applications (the path to the file can carry information we don't want to lose). @vbarda Could you please look at this one? I'm @-mentioning you since we've already closed some PRs together :-) Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 12:34:59 -08:00
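The path reconstruction described above might look roughly like the sketch below. Everything here is an assumption for illustration: `FakeFile`, its `parent_web_url` field and `source_for` are hypothetical stand-ins, not attributes of `O365.drive.File` or code from `O365BaseLoader`.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class FakeFile:
    """Hypothetical stand-in for a drive file object."""
    name: str
    web_url: str
    parent_web_url: str  # assumed: URL of the containing folder


def source_for(file):
    """Prefer a folder-based path when web_url points at the Office viewer."""
    if "Doc.aspx?sourcedoc=" in file.web_url:
        # Viewer URLs hide the real location, so rebuild it from folder + name.
        return file.parent_web_url.rstrip("/") + "/" + quote(file.name)
    return file.web_url


sheet = FakeFile(
    name="Q3 report.xlsx",
    web_url="https://sharepoint_address/sites/lib/Doc.aspx?sourcedoc=%7Babc%7D&file=Q3%20report.xlsx",
    parent_web_url="https://sharepoint_address/sites/lib/Finance/",
)
pdf = FakeFile(
    name="notes.pdf",
    web_url="https://sharepoint_address/sites/lib/Finance/notes.pdf",
    parent_web_url="https://sharepoint_address/sites/lib/Finance/",
)
print(source_for(sheet))
print(source_for(pdf))
```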
Erick Friis	534b8f4364	standard-tests: release 0.3.7 (#28637 )	2024-12-09 15:12:18 -05:00
Naka Masato	ce3b69aa05	community: add include_labels option to ConfluenceLoader (#28259 ) ## Description: Enable `ConfluenceLoader` to include labels via the `include_labels` option (`false` by default for backward compatibility). The labels are stored in the `metadata` of each `Document`, e.g. `{"labels": ["l1", "l2"]}` ## Notes The Confluence API returns labels when `metadata.labels` is passed in the `expand` query parameter. All of the following functions support `expand` in the same way: - confluence.get_page_by_id - confluence.get_all_pages_by_label - confluence.get_all_pages_from_space - cql (internally using [/api/content/search](https://developer.atlassian.com/cloud/confluence/rest/v1/api-group-content/#api-wiki-rest-api-content-search-get)) ## Issue: No issue related to this PR. ## Dependencies: No changes. ## Twitter handle: [@gymnstcs](https://x.com/gymnstcs) --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 19:35:01 +00:00
Rajendra Kadam	242fee11be	community[minor] Pebblo: Support for new Pinecone class PineconeVectorStore (#28253 ) - Description: Support for new Pinecone class PineconeVectorStore in PebbloRetrievalQA. - Issue: NA - Dependencies: NA - Tests: - Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 19:33:54 +00:00
nikitajoyn	9fcd203556	partners/mistralai: Fix KeyError in Vertex AI stream (#28624 ) - Description: Streaming response from Mistral model using Vertex AI raises KeyError when trying to access `choices` key, that the last chunk doesn't have. The fix is to access the key safely using `get()`. - Issue: https://github.com/langchain-ai/langchain/issues/27886 - Dependencies: - Twitter handle:	2024-12-09 14:14:58 -05:00
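A minimal sketch of the safe-access pattern described above, assuming streamed chunks are plain dicts and that the final chunk carries only usage data. `iter_deltas` and the chunk layout are illustrative assumptions, not the `langchain-mistralai` implementation.

```python
def iter_deltas(chunks):
    """Yield text deltas, skipping chunks that carry no `choices` key."""
    for chunk in chunks:
        # chunk["choices"] would raise KeyError on the final chunk;
        # .get() returns None instead, which we normalise to an empty list.
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


stream = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo"}}]},
    {"usage": {"prompt_tokens": 3, "completion_tokens": 2}},  # no "choices"
]
text = "".join(iter_deltas(stream))
print(text)
```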
maang-h	b64d846347	docs: Standardize MoonshotChat docstring (#28159 ) - Description: Add docstring Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 18:46:25 +00:00
Erick Friis	4c70ffff01	standard-tests: sync/async vectorstore tests conditional (#28636 ) Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-09 18:02:55 +00:00
ccurme	ffb5c1905a	openai[patch]: release 0.2.12 (#28633 )	2024-12-09 12:38:13 -05:00
ccurme	6e6061fe73	openai[patch]: bump minimum SDK version (#28632 ) Resolves https://github.com/langchain-ai/langchain/issues/28625	2024-12-09 11:28:05 -05:00
Mohammad Mohtashim	ec9b41431e	[Core]: Small Docstring Clarification for `BaseTool` (#28148 ) - Description: `kwargs` are not being passed to `run` of the `BaseTool` which has been fixed - Issue: #28114 --------- Co-authored-by: Stevan Kapicic <kapicic.ste1@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 06:10:19 +00:00
Erick Friis	cef21a0b49	cli: warning on app add (#28619 ) instead of #28128	2024-12-09 06:07:14 +00:00
Ankit Dangi	90f162efb6	text-splitters: add pydocstyle linting (#28127 ) As seen in #23188, turned on Google-style docstrings by enabling `pydocstyle` linting in the `text-splitters` package. Each resulting linting error was addressed differently: ignored, resolved, suppressed, and missing docstrings were added. Fixes one of the checklist items from #25154, similar to #25939 in `core` package. Ran `make format`, `make lint` and `make test` from the root of the package `text-splitters` to ensure no issues were found. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 06:01:03 +00:00
WGNW_MG	eabe587787	community[patch]:Fix for get_openai_callback() return token_cost=0.0 when model is gpt-4o-11-20 (#28408 ) - Description: update MODEL_COST_PER_1K_TOKENS for new gpt-4o-11-20. - Issue: with latest gpt-4o-11-20, openai callback return token_cost=0.0 - Dependencies: None (just simple dict fix.) - Twitter handle: I Don't Use Twitter. - (However..., I have a YouTube channel. Could you upload this there, by any chance? https://www.youtube.com/@%EA%B2%9C%EC%B0%BD%EB%B6%80%EA%B3%A0%EB%AC%B8AI%EC%9E%90%EB%AC%B8%EC%84%BC%EC%84%B8)	2024-12-08 20:46:50 -08:00
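The zero-cost symptom described above follows from a table lookup that finds no entry for a new model name. The sketch below shows the mechanism only. The model names and prices are placeholders, not real pricing. The `-completion` key suffix and the `token_cost` helper are assumptions for illustration rather than the callback's actual code.

```python
# Placeholder entries: NOT real model names or real prices.
MODEL_COST_PER_1K_TOKENS = {
    "example-model": 0.0025,             # prompt tokens
    "example-model-completion": 0.01,    # completion tokens
}


def token_cost(model, prompt_tokens, completion_tokens):
    """Return the cost for a call, or 0.0 when the model is not in the table."""
    prompt_rate = MODEL_COST_PER_1K_TOKENS.get(model)
    completion_rate = MODEL_COST_PER_1K_TOKENS.get(f"{model}-completion")
    if prompt_rate is None or completion_rate is None:
        # An unknown model silently costs nothing, which is the reported symptom.
        return 0.0
    return (prompt_tokens / 1000) * prompt_rate + (completion_tokens / 1000) * completion_rate


known = token_cost("example-model", 1000, 1000)
unknown = token_cost("example-model-new", 1000, 1000)
print(known, unknown)
```

Adding the new model's two entries to the table is enough to make the lookup succeed, which matches the "just simple dict fix" described in the commit.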
Fahim Zaman	481c4bfaba	core[patch]: Fixed trim functions, and added corresponding unit test for the solved issue (#28429 ) - Description: - Trim functions were incorrectly deleting nodes with more than 1 outgoing/incoming edge, so an extra condition was added to check for this directly. A unit test "test_trim_multi_edge" was written to test this test case specifically. - Issue: - Fixes #28411 - Fixes https://github.com/langchain-ai/langgraph/issues/1676 - Dependencies: - No changes were made to the dependencies - [x] Unit tests were added to verify the changes. - [x] Updated documentation where necessary. - [x] Ran make format, make lint, and make test to ensure compliance with project standards. --------- Co-authored-by: Tasif Hussain <tasif006@gmail.com>	2024-12-08 20:45:28 -08:00
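The extra condition described above can be sketched on a toy graph of node names and `(source, target)` edges. `trim_first_node` here is a simplified illustration of the rule "only trim the first node when it has exactly one outgoing edge". It is not the `langchain_core` graph implementation.

```python
def trim_first_node(nodes, edges):
    """Drop the unique root node only if it has exactly one outgoing edge.

    `nodes` is a set of names and `edges` a list of (source, target) pairs.
    Returns the (possibly unchanged) nodes and edges.
    """
    targets = {dst for _, dst in edges}
    roots = [n for n in nodes if n not in targets]
    if len(roots) != 1:
        return nodes, edges  # no unambiguous first node
    first = roots[0]
    outgoing = [e for e in edges if e[0] == first]
    if len(outgoing) != 1:
        # Removing a root with several outgoing edges would leave the graph
        # with multiple entry points, so keep it.
        return nodes, edges
    return nodes - {first}, [e for e in edges if e[0] != first]


# The root fans out to two nodes: it must be kept.
fan_out = trim_first_node({"start", "a", "b"}, [("start", "a"), ("start", "b")])
# The root has a single successor: it can be trimmed.
chain = trim_first_node({"start", "a", "b"}, [("start", "a"), ("a", "b")])
print(fan_out)
print(chain)
```

A symmetric check on incoming edges would apply when trimming the last node.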
Marco Perini	2354bb7bfa	partners: 🕷️🦜 ScrapeGraph API Integration (#28559 ) Hi Langchain team! I'm the co-founder and maintainer at [ScrapeGraphAI](https://scrapegraphai.com/). By following the integration [guide](https://python.langchain.com/docs/contributing/how_to/integrations/publish/) on your site, I have created a new lib called [langchain-scrapegraph](https://github.com/ScrapeGraphAI/langchain-scrapegraph). With this PR I would like to integrate Scrapegraph as a provider in Langchain, adding the required documentation files. Let me know if any changes are needed for it to be properly integrated, both in the lib and in the documentation. Thank you 🕷️🦜 --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 02:38:21 +00:00
Abhinav	317a38b83e	community[minor]: Add support for model2vec embeddings (#28507 ) This PR adds an embeddings integration for model2vec, the `Model2vecEmbeddings` class. - Description: [Model2Vec](https://github.com/MinishLab/model2vec) lets you turn any sentence transformer into a really small static model and makes running the model faster. - Issue: - Dependencies: model2vec ([pypi](https://pypi.org/project/model2vec/)) - Twitter handle: - [x] Add tests and docs: - [Test](https://github.com/blacksmithop/langchain/blob/model2vec_embeddings/libs/community/langchain_community/embeddings/model2vec.py), [docs](https://github.com/blacksmithop/langchain/blob/model2vec_embeddings/docs/docs/integrations/text_embedding/model2vec.ipynb) - [x] Lint and test: --------- Co-authored-by: Abhinav KM <abhinav.m@zerone-consulting.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-12-09 02:17:22 +00:00
Mohammad Mohtashim	524ee6d9ac	Invalid `tool_choice` being passed to `ChatLiteLLM` (#28198 ) - Description: An invalid `tool_choice` was given to `bind_tools` of `ChatLiteLLM` because its parent class's default value was being passed through `with_structured_output`. - Issue: #28176	2024-12-07 14:33:40 -05:00
Erick Friis	dd0085a9ff	docs: standard tests to markdown, load templates from files (#28603 )	2024-12-07 01:37:21 +00:00
Erick Friis	5e8553c31a	standard-tests: retriever docstrings (#28596 )	2024-12-07 00:32:19 +00:00
ccurme	d801c6ffc7	tests[patch]: nits (#28601 )	2024-12-07 00:13:04 +00:00
Erick Friis	07c2ac765a	community: release 0.3.10 (#28600 )	2024-12-07 00:07:13 +00:00
Erick Friis	4a7dc6ec4c	standard-tests: release 0.3.6 (#28599 )	2024-12-07 00:05:04 +00:00
ccurme	80a88f8f04	tests[patch]: update API ref for chat models (#28594 )	2024-12-06 19:00:14 -05:00
Erick Friis	0eb7ab65f1	multiple: fix xfailed signatures (#28597 )	2024-12-06 15:39:47 -08:00
Erick Friis	b7c2029e84	standard-tests: root docstrings (#28595 )	2024-12-06 15:14:52 -08:00
Erick Friis	9e2abcd152	standard-tests: show right classes in api docs (#28591 )	2024-12-06 14:48:13 -08:00
Erick Friis	246c10a1cc	standard-tests: private members and tools unit troubleshoot (#28590 )	2024-12-06 13:52:58 -08:00
Erick Friis	e6663b69f3	langchain: release 0.3.10 (#28585 )	2024-12-06 20:20:24 +00:00
Erick Friis	c38b845d7e	core: fix path test (#28584 )	2024-12-06 20:05:18 +00:00
ccurme	2c6bc74cb1	multiple: combine sync/async vector store standard test suites (#28580 ) Breaking change in `langchain-tests`.	2024-12-06 14:55:06 -05:00
Bagatur	dda9f90047	core[patch]: Release 0.3.22 (#28582 )	2024-12-06 19:36:53 +00:00
ccurme	f3dc142d3c	cli[patch]: implement minimal starter vector store (#28577 ) Basically the same as core's in-memory vector store. Removed some optional methods.	2024-12-06 13:10:22 -05:00
Erick Friis	5277a021c1	docs: raw loader codeblock (#28548 ) Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-06 09:26:34 -08:00
Erick Friis	18386c16c7	core, tests: more tolerant _aget_relevant_documents function (#28462 )	2024-12-06 00:49:30 +00:00


6328 Commits