langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-06-22 06:39:52 +00:00

Author	SHA1	Message	Date
Erick Friis	be738aa7de	packages: enable vertex api build (#28773 )	2024-12-17 11:31:14 -08:00
Bagatur	ac278cbe8b	core[patch]: export InjectedToolCallId (#28772 )	2024-12-17 19:29:20 +00:00
Bagatur	e4d3ccf62f	json mode standard test (#25497 ) Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-17 18:47:34 +00:00
Frank Dai	e81433497b	community: support Confluence cookies (#28760 ) Description: Some confluence instances don't support personal access token, then cookie is a convenient way to authenticate. This PR adds support for Confluence cookies. Twitter handle: soulmachine	2024-12-17 12:16:36 -05:00
ccurme	b745281eec	anthropic[patch]: increase timeouts for integration tests (#28767 ) Some tests consistently ran into the 10s limit in CI.	2024-12-17 15:47:17 +00:00
Vinit Kudva	a00258ec12	chroma: fix persistence if client_settings is passed in (#25199 ) …ent path given. --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-17 10:03:02 -05:00
Omri Eliyahu Levy	f8883a1321	partners/voyageai: enable setting output dimension (#28740 ) Voyage has introduced voyage-3-large and voyage-code-3, which feature different output dimensions by leveraging a technique called "Matryoshka Embeddings" (see blog - https://blog.voyageai.com/2024/12/04/voyage-code-3/). These two models are available in various sizes: [256, 512, 1024, 2048] (https://docs.voyageai.com/docs/embeddings#model-choices). This PR adds the option to set the required output dimension.	2024-12-17 10:02:00 -05:00
German Martin	3a1d05394d	community: Apache AGE wrapper. Ensure Node Uniqueness by ID. (#28759 ) Description: The Apache AGE graph integration incorrectly handled node merging, allowing duplicate nodes with different IDs but the same type and other properties. Unlike [Neo4j](`cdf6202156/libs/community/langchain_community/graphs/neo4j_graph.py (L47)`), [Memgraph](`cdf6202156/libs/community/langchain_community/graphs/memgraph_graph.py (L50)`), [Kuzu](`cdf6202156/libs/community/langchain_community/graphs/kuzu_graph.py (L253)`), and [Gremlin](`cdf6202156/libs/community/langchain_community/graphs/gremlin_graph.py (L165)`), it did not use the node ID as the primary identifier for merging. This inconsistency caused data integrity issues and unexpected behavior when users expected updates to specific nodes by ID. Solution: This PR modifies the `node_insert_query` to `MERGE` nodes based on label and ID only and updates properties with `SET`, aligning the behavior with other graph database integrations. The `_format_properties` method was also modified to handle id overrides. Impact: This fix ensures data integrity by preventing duplicate nodes, and provides a consistent behavior across graph database integrations.	2024-12-17 09:21:59 -05:00
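The merge-by-ID behavior described in the Apache AGE commit above can be illustrated with a small in-memory model. This is only a sketch: the `merge_node` helper and the dictionary store are hypothetical stand-ins, not the integration's `node_insert_query`, and no Cypher is executed.

```python
def merge_node(store, label, node_id, properties):
    """MERGE on (label, id) only, then SET the remaining properties."""
    # Hypothetical stand-in for a graph store keyed by label and node ID.
    node = store.setdefault((label, node_id), {"id": node_id})
    # Any "id" inside the properties is ignored; the merge key wins.
    node.update({k: v for k, v in properties.items() if k != "id"})
    return node


store = {}
merge_node(store, "Person", "p1", {"name": "Ada"})
# Same label and ID with different properties: the existing node is updated
# in place instead of a second node being created.
merge_node(store, "Person", "p1", {"name": "Ada Lovelace", "born": 1815})
```

Keying on the label and ID alone is what prevents a changed property from producing a duplicate node.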
gsa9989	cdf6202156	cosmosdbnosql: Added Cosmos DB NoSQL Semantic Cache Integration with tests and jupyter notebook (#24424 ) * Added Cosmos DB NoSQL Semantic Cache Integration with tests and jupyter notebook --------- Co-authored-by: Aayush Kataria <aayushkataria3011@gmail.com> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-16 21:57:05 -05:00
Brian Burgin	27a9056725	community: Fix ChatLiteLLMRouter runtime issues (#28163 ) Description: Fix ChatLiteLLMRouter ctor validation and model_name parameter Issue: #19356, #27455, #28077 Twitter handle: @bburgin_0	2024-12-16 18:17:39 -05:00
Mikhail Khludnev	00deacc67e	docs, external: introduce `langchain-localai` (#28751 ) Referring to https://github.com/mkhludnev/langchain-localai --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 22:22:37 +00:00
Erick Friis	d4b5e7ef22	community: recommend RedisVectorStore over Redis (#28749 )	2024-12-16 21:08:30 +00:00
Hiros	8f5e72de05	community: Correctly handle multi-element rich text (#25762 ) Description: - Add _concatenate_rich_text method to combine all elements in rich text arrays - Update load_page method to use _concatenate_rich_text for rich text properties - Ensure all text content is captured, including inline code and formatted text - Add unit tests to verify correct handling of multi-element rich text This fix prevents truncation of content after backticks or other formatting elements. Issue: Using the Notion DB Loader, the text for `richtext` and `title` properties was truncated after the first element, because the loader only read the first element. Dependencies: None. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 20:20:27 +00:00
Antonio Lanza	b2102b8cc4	text-splitters: Inconsistent results with `NLTKTextSplitter`'s `add_start_index=True` (#27782 ) This PR closes #27781 # Problem The current implementation of `NLTKTextSplitter` is using `sent_tokenize`. However, this `sent_tokenize` doesn't handle chars between 2 tokenized sentences... hence, this behavior throws errors when we are using `add_start_index=True`, as described in issue #27781. In particular: ```python from nltk.tokenize import sent_tokenize output1 = sent_tokenize("Innovation drives our success. Collaboration fosters creative solutions. Efficiency enhances data management.", language="english") print(output1) output2 = sent_tokenize("Innovation drives our success. Collaboration fosters creative solutions. Efficiency enhances data management.", language="english") print(output2) >>> ['Innovation drives our success.', 'Collaboration fosters creative solutions.', 'Efficiency enhances data management.'] >>> ['Innovation drives our success.', 'Collaboration fosters creative solutions.', 'Efficiency enhances data management.'] ``` # Solution With this new `use_span_tokenize` parameter, we can use NLTK to create sentences (with `span_tokenize`), but also add extra chars to be sure that we still can map the chunks to the original text. --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Erick Friis <erickfriis@gmail.com>	2024-12-16 19:53:15 +00:00
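The offset problem described in the `NLTKTextSplitter` commit above can be reproduced without NLTK. The sketch below is an assumption-laden illustration: `sentence_spans` is a hypothetical regex stand-in for a span-based tokenizer (it is not NLTK's `span_tokenize`), and the sample text uses irregular whitespace between sentences.

```python
import re


def sentence_spans(text):
    """Yield (start, end) offsets of sentences (regex stand-in, not NLTK)."""
    for match in re.finditer(r"\S[^.!?]*[.!?]", text):
        yield match.start(), match.end()


text = (
    "Innovation drives our success.\n\n"
    "Collaboration fosters creative solutions.  "
    "Efficiency enhances data management."
)

spans = list(sentence_spans(text))
sentences = [text[start:end] for start, end in spans]

# Rejoining sentences with a single space loses the original separators,
# so the chunk can no longer be located in the source text.
rejoined = " ".join(sentences[:2])
lost_index = text.find(rejoined)

# Slicing by spans keeps the original characters, so the start index is known.
chunk_start, chunk_end = spans[0][0], spans[1][1]
span_chunk = text[chunk_start:chunk_end]
found_index = text.find(span_chunk)
```

Keeping the spans rather than the bare sentence strings is what lets a chunk be mapped back to its position in the original text.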
Tari Yekorogha	d262d41cc0	community: added FalkorDB vector store support i.e implementation, test, docs an… (#26245 ) Description: Added support for FalkorDB Vector Store, including its implementation, unit tests, documentation, and an example notebook. The FalkorDB integration allows users to efficiently manage and query embeddings in a vector database, with relevance scoring and maximal marginal relevance search. The following components were implemented: - Core implementation for FalkorDBVector store. - Unit tests ensuring proper functionality and edge case coverage. - Example notebook demonstrating an end-to-end setup, search, and retrieval using FalkorDB. Twitter handle: @tariyekorogha --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 19:37:55 +00:00
Aaron Pham	12fced13f4	chore(community): update to OpenLLM 0.6 (#24609 ) Update to OpenLLM 0.6, in which we decided to use OpenLLM's OpenAI-compatible endpoint. Thus, OpenLLM now becomes a thin wrapper around the OpenAI wrapper. Signed-off-by: Aaron Pham <contact@aarnphm.xyz> --------- Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: ccurme <chester.curme@gmail.com>	2024-12-16 14:30:07 -05:00
Lvlvko	5c17a4ace9	community: support Hunyuan Embedding (#23160 ) ## description - I refactored `Chathunyuan` using the tencentcloud SDK because the original did not work in my application - I added `HunyuanEmbeddings` using the tencentcloud SDK - Both extend the LangChain base classes. I have fully tested them in my application ## Dependencies - tencentcloud-sdk-python --------- Co-authored-by: centonhuang <centonhuang@tencent.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 19:27:19 +00:00
Harrison Chase	de7996c2ca	core: add kwargs support to VectorStore (#25934 ) has been missing the passthrough until now --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 18:57:57 +00:00
Lorenzo	b79a1156ed	community: correct return type of get_files_from_directory in github tool (#27885 ) ### About: - Description: the _get_files_from_directory_ method returns a string, but it is used in other methods that expect a List[str] - Issue: None - Dependencies: None This pull request adds a new method _list_files_ with the old logic of _get_files_from_directory_, but it returns a List[str] at the end. The behavior of _get_files_from_directory_ is not changed.	2024-12-16 10:30:33 -08:00
Sheepsta300	580a8d53f9	community: Add configurable `VisualFeatures` to the `AzureAiServicesImageAnalysisTool` (#27444 ) - Description: The `AzureAiServicesImageAnalysisTool` is a good service and utilises the Azure AI Vision package under the hood. However, since the creation of this tool, new `VisualFeatures` have been added to allow the user to request other image-specific information to be returned. Currently, the tool neither allows configuring which features should be returned nor offers any of the newer feature types. The aim of this PR is to address this and expose more of the Azure Service in this integration. - Dependencies: no new dependencies in the main class file, azure.ai.vision.imageanalysis added to extra test dependencies file. - Tests and docs: Although no tests exist for already implemented Azure Service tools, I've created 3 unit tests for this class that test initialisation and credentials, local file analysis and the new features option. - Lint and test: All linting has passed. --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-16 18:30:04 +00:00
Erick Friis	1c120e9615	core: xml output parser tags docstring (#28745 )	2024-12-16 18:25:16 +00:00
Ana	ebab2ea81b	Fix Azure National Cloud authentication using token (RBAC) (Generated by Ana - AI SDE) (#25843 ) This pull request addresses the issue with authenticating Azure National Cloud using token (RBAC) in the AzureSearch vectorstore implementation. ## Changes - Modified the `_get_search_client` method in `azuresearch.py` to pass `additional_search_client_options` to the `SearchIndexClient` instance. ## Implementation Details The patch updates the `SearchIndexClient` initialization to include the `additional_search_client_options` parameter: ```python index_client: SearchIndexClient = SearchIndexClient( endpoint=endpoint, credential=credential, user_agent=user_agent, **additional_search_client_options ) ``` This change allows the `audience` parameter to be correctly passed when using Azure National Cloud, fixing the authentication issues with GovCloud & RBAC. This patch was generated by [Ana - AI SDE](https://openana.ai/), an AI-powered software development assistant. This is a fix for [Issue 25823](https://github.com/langchain-ai/langchain/issues/25823) --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-12-16 18:22:24 +00:00
chenzimin	169d419581	community: Remove all other keys in ChatLiteLLM and add api_key (#28097 ) Currently, no API keys are passed to LiteLLM, and LiteLLM only takes one `api_key` parameter. Therefore I removed all current `*_api_key` attributes (they are not used) and added an `api_key` that is passed to ChatLiteLLM. - Should fix issue #27826 --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-16 17:54:29 +00:00
German Martin	d5d18c62b3	community: Apache AGE wrapper additional edge cases. (#28151 ) Description: The current AGEGraph() implementation does some custom wrapping for graph queries. The method involved is _wrap_query(), which parses the fields from the original query to add some SQL context to it. This PR improves the parsing logic to cover additional edge cases, which are added to the test coverage: if any node property name or value contains the literal "return", it breaks the graph / SQL query. We discovered this while dealing with real-world datasets; it is not an uncommon scenario and I think it needs to be covered.	2024-12-16 11:28:01 -05:00
Rock2z	768e4a7fd4	[community][fix] Compatibility support to bump up wikibase-rest-api-client version (#27316 ) Description: This PR addresses the `TypeError: sequence item 0: expected str instance, FluentValue found` error when invoking `WikidataQueryRun`. The root cause was an incompatible version of the `wikibase-rest-api-client`, which caused the tool to fail when handling `FluentValue` objects instead of strings. The current implementation only supports `wikibase-rest-api-client<0.2`, but the latest version is `0.2.1`, where the current implementation breaks. Additionally, the error message advises users to install the latest version: [code reference](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/utilities/wikidata.py#L125C25-L125C32). Therefore, this PR updates the tool to support the latest version of `wikibase-rest-api-client`. Key changes: - Updated the handling of `FluentValue` objects to ensure compatibility with the latest `wikibase-rest-api-client`. - Removed the restriction to `wikibase-rest-api-client<0.2` and updated to support the latest version (`0.2.1`). Issue: Fixes [#24093](https://github.com/langchain-ai/langchain/issues/24093) – `TypeError: sequence item 0: expected str instance, FluentValue found`. Dependencies: - Upgraded `wikibase-rest-api-client` to the latest version to resolve the issue. --------- Co-authored-by: peiwen_zhang <peiwen_zhang@email.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-16 16:22:18 +00:00
André Quintino	a26c786bc5	community: refactor opensearch query constructor to use wildcard instead of match in the contain comparator (#26653 ) - Description: Changed the comparator to use a wildcard query instead of match. This modification allows for partial text matching on analyzed fields, which improves the flexibility of the search by performing full-text searches that aren't limited to exact matches. - Issue: The previous implementation used a match query, which performs exact matches on analyzed fields. This approach limited the search capabilities by requiring the query terms to align with the indexed text. The modification to use a wildcard query instead addresses this limitation. The wildcard query allows for partial text matching, which means the search can return results even if only a portion of the term matches the text. This makes the search more flexible and suitable for use cases where exact matches aren't necessary or expected, enabling broader full-text searches across analyzed fields. In short, the problem was that match queries were too restrictive, and the change to wildcard queries enhances the ability to perform partial matches. - Dependencies: none - Twitter handle: @Andre_Q_Pereira --------- Co-authored-by: André Quintino <andre.quintino@tui.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-16 11:16:34 -05:00
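The difference between the two query shapes discussed in the OpenSearch commit above can be sketched as plain query-DSL dictionaries. The helper names `match_clause` and `wildcard_clause` are hypothetical, and the exact clause the LangChain query constructor emits may differ; the sketch only shows the general `match` and `wildcard` forms.

```python
def match_clause(field, value):
    """Full-text match clause: the value is analyzed and compared to indexed terms."""
    return {"match": {field: {"query": value}}}


def wildcard_clause(field, value):
    """Wildcard clause: `*` on both sides allows the value to match part of a term."""
    return {"wildcard": {field: {"value": f"*{value}*"}}}


# Hypothetical field name and search value, for illustration only.
before = match_clause("metadata.title", "lang")
after = wildcard_clause("metadata.title", "lang")
```

With the wildcard form, a value such as `lang` can match a longer indexed term, which is the partial matching the commit message describes.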
Davi Schumacher	0f9b4bf244	community[patch]: update dynamodb chat history to update instead of overwrite (#22397 ) Description: The current implementation of `DynamoDBChatMessageHistory` updates the `History` attribute for a given chat history record by first extracting the existing contents into memory, appending the new message, and then using the `put_item` method to put the record back. This has the effect of overwriting any additional attributes someone may want to include in the record, like chat session metadata. This PR suggests changing from using `put_item` to using `update_item` instead which will keep any other attributes in the record untouched. The change is backward compatible since 1. `update_item` is an "upsert" operation, creating the record if it doesn't already exist, otherwise updating it 2. It only touches the db insert call and passes the exact same information. The rest of the class is left untouched Dependencies: None Tests and docs: No unit tests currently exist for the `DynamoDBChatMessageHistory` class. This PR adds the file `libs/community/tests/unit_tests/chat_message_histories/test_dynamodb_chat_message_history.py` to test the `add_message` and `clear` methods. I wanted to use the moto library to mock DynamoDB calls but I could not get poetry to resolve it so I mocked those calls myself in the test. Therefore, no test dependencies were added. The change was tested on a test DynamoDB table as well. The first three images below show the current behavior. First a message is added to chat history, then a value is inserted in the record in some other attribute, and finally another message is added to the record, destroying the other attribute. 
![using_put_1_first_message](https://github.com/langchain-ai/langchain/assets/29493541/426acd62-fe29-42f4-b75f-863fb8b3fb21) ![using_put_2_add_attribute](https://github.com/langchain-ai/langchain/assets/29493541/f8a1c864-7114-4fe3-b487-d6f9252f8f92) ![using_put_3_second_message](https://github.com/langchain-ai/langchain/assets/29493541/8b691e08-755e-4877-8969-0e9769e5d28a) The next three images show the new behavior. Once again a value is added to an attribute other than the History attribute, but now when the followup message is added it does not destroy that other attribute. The History attribute itself is unaffected by this change. ![using_update_1_first_message](https://github.com/langchain-ai/langchain/assets/29493541/3e0d76ed-637e-41cd-82c7-01a86c468634) ![using_update_2_add_attribute](https://github.com/langchain-ai/langchain/assets/29493541/52585f9b-71a2-43f0-9dfc-9935aa59c729) ![using_update_3_second_message](https://github.com/langchain-ai/langchain/assets/29493541/f94c8147-2d6f-407a-9a0f-86b94341abff) The doc located at `docs/docs/integrations/memory/aws_dynamodb.ipynb` required no changes and was tested as well.	2024-12-16 10:38:00 -05:00
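The overwrite-versus-update behavior described in the DynamoDB commit above can be modeled with a plain dictionary. This is a minimal sketch under stated assumptions: `put_item` and `update_history` here are hypothetical in-memory functions that mimic the semantics of a full-record write and an attribute-level upsert; they are not boto3 calls, and `Meta` is an invented extra attribute.

```python
from copy import deepcopy


def put_item(table, key, item):
    """Replace the whole record, dropping attributes not present in `item`."""
    table[key] = deepcopy(item)


def update_history(table, key, history):
    """Upsert only the History attribute; other attributes stay untouched."""
    record = table.setdefault(key, {"SessionId": key})
    record["History"] = list(history)


# First message is stored, then another process adds a session attribute.
table = {}
put_item(table, "s1", {"SessionId": "s1", "History": ["hi"]})
table["s1"]["Meta"] = {"user": "alice"}

# Old behavior: writing the full record back destroys the extra attribute.
overwritten = deepcopy(table)
put_item(overwritten, "s1", {"SessionId": "s1", "History": ["hi", "hello"]})

# New behavior: only History changes; the extra attribute survives.
updated = deepcopy(table)
update_history(updated, "s1", ["hi", "hello"])

# The update also creates the record when it does not exist yet.
update_history(updated, "s2", ["first"])
```

The same message history is written in both cases; only the fate of the other attributes differs.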
Christophe Bornet	6ddd5dbb1e	community: Add FewShotSQLTool (#28232 ) The `FewShotSQLTool` gets some SQL query examples from a `BaseExampleSelector` for a given question. This is useful to provide [few-shot examples](https://python.langchain.com/docs/how_to/sql_prompting/#few-shot-examples) capability to an SQL agent. Example usage: ```python from langchain.agents.agent_toolkits.sql.prompt import SQL_PREFIX embeddings = OpenAIEmbeddings() example_selector = SemanticSimilarityExampleSelector.from_examples( examples, embeddings, AstraDB, k=5, input_keys=["input"], collection_name="lc_few_shots", token=ASTRA_DB_APPLICATION_TOKEN, api_endpoint=ASTRA_DB_API_ENDPOINT, ) few_shot_sql_tool = FewShotSQLTool( example_selector=example_selector, description="Input to this tool is the input question, output is a few SQL query examples related to the input question. Always use this tool before checking the query with sql_db_query_checker!" ) agent = create_sql_agent( llm=llm, db=db, prefix=SQL_PREFIX + "\nYou MUST get some example queries before creating the query.", extra_tools=[few_shot_sql_tool] ) result = agent.invoke({"input": "How many artists are there?"}) ``` --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-16 15:37:21 +00:00
Mohammad Mohtashim	8d746086ab	Added `bind_tools` support for `ChatMLX` along with small fix in `_stream` (#28743 ) - Description: Added support for `bind_tools` as requested in the issue. Two issues in `_stream` were also fixed: - Corrected the positional argument passing for `generate_step` - Accounted for the case where the `token` returned by `generate_step` is an integer. - Issue: #28692	2024-12-16 09:52:49 -05:00
Jorge Piedrahita Ortiz	558b65ea32	community: SambaStudio Tool Calling and Structured Output (#28025 ) Description: Add tool calling and structured output support for SambaStudio chat models, docs included --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 06:15:19 +00:00
clairebehue	fb44e74ca4	community: fix AzureSearch Oauth with azure_ad_access_token (#26995 ) Description: AzureSearch vector store: create a wrapper class on `azure.core.credentials.TokenCredential` (which is not-instantiable) to fix Oauth usage with `azure_ad_access_token` argument Issue: [the issue it fixes](https://github.com/langchain-ai/langchain/issues/26216) Dependencies: None - [x] Lint and test --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 05:56:45 +00:00
SirSmokeAlot	29305cd948	community: O365Toolkit - send_event - fixed timezone error (#25876 ) Description: Fixed formatting of start and end time Issue: The old formatting always resulted in a timezone error Dependencies: / Twitter handle: / --------- Co-authored-by: Yannick Opitz <yannick.opitz@gob.de> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 05:32:28 +00:00
Erick Friis	4f6ccb7080	text-splitters: extended-tests without socket (#28736 )	2024-12-16 05:19:50 +00:00
Erick Friis	8ec1c72e03	text-splitters: test without socket (#28732 )	2024-12-15 22:10:35 +00:00
Aayush Kataria	d417e4b372	Community: Azure CosmosDB No Sql Vector Store: Full Text and Hybrid Search Support (#28716 ) - Added [full text](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/full-text-search) and [hybrid search](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/hybrid-search) support for Azure CosmosDB NoSql Vector Store - Added a new enum called CosmosDBQueryType which supports the following values: - VECTOR = "vector" - FULL_TEXT_SEARCH = "full_text_search" - FULL_TEXT_RANK = "full_text_rank" - HYBRID = "hybrid" - User now needs to provide this query_type to the similarity_search method for the vectorStore to make the correct query api call. - Added a couple of work arounds as for the FULL_TEXT_RANK and HYBRID query functions we don't support parameterized queries right now. I have added TODO's in place, and will remove these work arounds by end of January. - Added necessary test cases and updated the --------- Co-authored-by: Erick Friis <erickfriis@gmail.com>	2024-12-15 13:26:32 -08:00
Mohammad Mohtashim	4c1871d9a8	community: Passing the `model_kwargs` correctly while maintaining backward compatibility (#28439 ) - Description: `model_kwargs` was not being passed correctly to `sentence_transformers.SentenceTransformer`; this has been corrected while maintaining backward compatibility - Issue: #28436 --------- Co-authored-by: MoosaTae <sadhis.tae@gmail.com> Co-authored-by: Sadit Wongprayon <101176694+MoosaTae@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-15 20:34:29 +00:00
nhols	a3851cb3bc	community: FAISS vectorstore - consistent Document id field (#28728 ) make sure id field of Documents in `FAISS` docstore have the same id as values in `index_to_docstore_id`, implement `get_by_ids` method	2024-12-15 12:23:49 -08:00
Bagatur	a0534ae62a	community[patch]: Release 0.3.12 (#28725 )	2024-12-14 22:13:20 +00:00
Bagatur	089e659e03	langchain[patch]: Release 0.3.12 (#28724 )	2024-12-14 20:02:18 +00:00
Bagatur	679e3a9970	text-splitters[patch]: Release 0.3.3 (#28723 )	2024-12-14 19:20:22 +00:00
Erick Friis	387284c259	core: release 0.3.25 (#28718 )	2024-12-14 02:22:28 +00:00
Nawaf Alharbi	decd77c515	community: fix an issue with deepinfra integration (#28715 ) - PR title: langchain: add URL parameter to ChatDeepInfra class - Description: This PR introduces a url parameter to the ChatDeepInfra class in LangChain, allowing users to specify a custom URL. Previously, the URL for the DeepInfra API was hardcoded to "https://stage.api.deepinfra.com/v1/openai/chat/completions", which caused issues when the staging endpoint was not functional. The _url method was updated to return the value from the url parameter, enabling greater flexibility and addressing the problem. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-14 02:15:29 +00:00
Ben Chambers	008efada2c	[community]: Render documents to graphviz (#24830 ) - Description: Adds a helper that renders documents with the GraphVectorStore metadata fields to Graphviz for visualization. This is helpful for understanding and debugging. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-14 02:02:09 +00:00
Erick Friis	288f204758	docs, community: aerospike docs update (#28717 ) Co-authored-by: Jesse Schumacher <jschumacher@aerospike.com> Co-authored-by: Jesse S <jschmidt@aerospike.com> Co-authored-by: dylan <dwelch@aerospike.com>	2024-12-14 00:27:37 +00:00
Vimpas	337fed80a5	community: 🐛 PDF Filter Type Error (#27154 ) PR title: "community: fix PDF Filter Type Error" - Description: fix PDF Filter Type Error - Issue: #27153 - Dependencies: no --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-13 23:30:29 +00:00
Ryan Parker	12111cb922	community: fallback on core async atransform_documents method for `MarkdownifyTransformer` (#27866 ) # Description Implements the `atransform_documents` method for `MarkdownifyTransformer` using the `asyncio` built-in library for concurrency. Note that this is mainly for API completeness when working with async frameworks rather than for performance, since the `markdownify` function is not I/O bound because it works with `Document` objects already in memory. # Issue Fixes #27865 # Dependencies No new dependencies added, but [`markdownify`](https://github.com/matthewwithanm/python-markdownify) is required since this PR updates the `markdownify` integration. # Tests and docs - Tests added - I did not modify the docstrings since they already described the basic functionality, and [the API docs also already included a description](https://python.langchain.com/api_reference/community/document_transformers/langchain_community.document_transformers.markdownify.MarkdownifyTransformer.html#langchain_community.document_transformers.markdownify.MarkdownifyTransformer.atransform_documents). If it would be helpful, I would be happy to update the docstrings and/or the API docs. # Lint and test - [x] format - [x] lint - [x] test I ran formatting with `make format`, linting with `make lint`, and confirmed that tests pass using `make test`. Note that some unit tests pass in CI but may fail when running `make_test`. Those unit tests are: - `test_extract_html` (and `test_extract_html_async`) - `test_strip_tags` (and `test_strip_tags_async`) - `test_convert_tags` (and `test_convert_tags_async`) The reason for the difference is that there are trailing spaces when the tests are run in the CI checks, and no trailing spaces when run with `make test`. I ensured that the tests pass in CI, but they may fail with `make test` due to the addition of trailing spaces. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-13 22:32:22 +00:00
Manuel	af2e0a7ede	partners: add 'model' alias for consistency in embedding classes (#28374 ) Description: This PR introduces a `model` alias for the embedding classes that contain the attribute `model_name`, to ensure consistency across the codebase, as suggested by a moderator in a previous PR. The change aligns the usage of attribute names across the project (see for example [here](`65deeddd5d/libs/partners/groq/langchain_groq/chat_models.py (L304)`)). Issue: This PR addresses the suggestion from the review of issue #28269. Dependencies: None --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-13 22:30:00 +00:00
Erick Friis	3107d78517	huggingface: fix standard test lint (#28714 )	2024-12-13 22:18:54 +00:00
Kaiwei Zhang	b909d54e70	chroma[patch]: Update logic for assigning ids	2024-12-13 21:58:34 +00:00
Karthik Bharadhwaj	498f0249e2	community[minor]: Opensearch hybridsearch implementation (#25375 ) community: add hybrid search in opensearch # Langchain OpenSearch Hybrid Search Implementation ## Implementation of Hybrid Search: I have taken LangChain's OpenSearch integration to the next level by adding hybrid search capabilities. Building on the existing OpenSearchVectorSearch class, I have implemented Hybrid Search functionality (which combines the best of both keyword and semantic search). This new functionality allows users to harness the power of OpenSearch's advanced hybrid search features without leaving the familiar LangChain ecosystem. By blending traditional text matching with vector-based similarity, the enhanced class delivers more accurate and contextually relevant results. It's designed to seamlessly fit into existing LangChain workflows, making it easy for developers to upgrade their search capabilities. In implementing the hybrid search for OpenSearch within the LangChain framework, I also incorporated filtering capabilities. It's important to note that according to the OpenSearch hybrid search documentation, only post-filtering is supported for hybrid queries. This means that the filtering is applied after the hybrid search results are obtained, rather than during the initial search process. Note: For the implementation of hybrid search, I strictly followed the official OpenSearch Hybrid search documentation and I took inspiration from https://github.com/AndreasThinks/langchain/tree/feature/opensearch_hybrid_search Thanks Mate! ### Experiments I conducted few experiments to verify that the hybrid search implementation is accurate and capable of reproducing the results of both plain keyword search and vector search. Experiment - 1 Hybrid Search Keyword_weight: 1, vector_weight: 0 I conducted an experiment to verify the accuracy of my hybrid search implementation by comparing it to a plain keyword search. 
For this test, I set the keyword_weight to 1 and the vector_weight to 0 in the hybrid search, effectively giving full weightage to the keyword component. The results from this hybrid search configuration matched those of a plain keyword search, confirming that my implementation can accurately reproduce keyword-only search results when needed. It's important to note that while the results were the same, the scores differed between the two methods. This difference is expected because the plain keyword search in OpenSearch uses the BM25 algorithm for scoring, whereas the hybrid search still performs both keyword and vector searches before normalizing the scores, even when the vector component is given zero weight. This experiment validates that my hybrid search solution correctly handles the keyword search component and properly applies the weighting system, demonstrating its accuracy and flexibility in emulating different search scenarios. Experiment - 2 Hybrid Search keyword_weight = 0.0, vector_weight = 1.0 For experiment-2, I took the inverse approach to further validate my hybrid search implementation. I set the keyword_weight to 0 and the vector_weight to 1, effectively giving full weightage to the vector search component (KNN search). I then compared these results with a pure vector search. The outcome was consistent with my expectations: the results from the hybrid search with these settings exactly matched those from a standalone vector search. This confirms that my implementation accurately reproduces vector search results when configured to do so. As with the first experiment, I observed that while the results were identical, the scores differed between the two methods. This difference in scoring is expected and can be attributed to the normalization process in hybrid search, which still considers both components even when one is given zero weight. 
This experiment further validates the accuracy and flexibility of my hybrid search solution, demonstrating its ability to effectively emulate pure vector search when needed while maintaining the underlying hybrid search structure. Experiment - 3 Hybrid Search - balanced keyword_weight = 0.5, vector_weight = 0.5 For experiment-3, I adopted a balanced approach to further evaluate the effectiveness of my hybrid search implementation. In this test, I set both the keyword_weight and vector_weight to 0.5, giving equal importance to keyword-based and vector-based search components. This configuration aims to leverage the strengths of both search methods simultaneously. By setting both weights to 0.5, I intended to create a scenario where the hybrid search would consider lexical matches and semantic similarity equally. This balanced approach is often ideal for many real-world applications, as it can capture both exact keyword matches and contextually relevant results that might not contain the exact search terms. Kindly verify the notebook for the experiments conducted! Notebook: https://github.com/karthikbharadhwajKB/Langchain_OpenSearch_Hybrid_search/blob/main/Opensearch_Hybridsearch.ipynb ### Instructions to follow for Performing Hybrid Search: Step-1: Instantiating OpenSearchVectorSearch Class: ```python opensearch_vectorstore = OpenSearchVectorSearch( index_name=os.getenv("INDEX_NAME"), embedding_function=embedding_model, opensearch_url=os.getenv("OPENSEARCH_URL"), http_auth=(os.getenv("OPENSEARCH_USERNAME"),os.getenv("OPENSEARCH_PASSWORD")), use_ssl=False, verify_certs=False, ssl_assert_hostname=False, ssl_show_warn=False ) ``` Parameters: 1. index_name: The name of the OpenSearch index to use. 2. embedding_function: The function or model used to generate embeddings for the documents. It's assumed that embedding_model is defined elsewhere in the code. 3. opensearch_url: The URL of the OpenSearch instance. 4. 
http_auth: A tuple containing the username and password for authentication. 5. use_ssl: Set to False, indicating that the connection to OpenSearch is not using SSL/TLS encryption. 6. verify_certs: Set to False, which means the SSL certificates are not being verified. This is often used in development environments but is not recommended for production. 7. ssl_assert_hostname: Set to False, disabling hostname verification in SSL certificates. 8. ssl_show_warn: Set to False, suppressing SSL-related warnings. Step-2: Configure Search Pipeline: To enable hybrid search, you first need to configure a search pipeline. Implementation Details: This method configures a search pipeline in OpenSearch that: 1. Normalizes the scores from both keyword and vector searches using the min-max technique. 2. Applies the specified weights to the normalized scores. 3. Calculates the final score using an arithmetic mean of the weighted, normalized scores. Parameters: * pipeline_name (str): A unique identifier for the search pipeline. It's recommended to use a descriptive name that indicates the weights used for keyword and vector searches. * keyword_weight (float): The weight assigned to the keyword search component. This should be a float value between 0 and 1. In this example, 0.3 gives 30% importance to traditional text matching. * vector_weight (float): The weight assigned to the vector search component. This should be a float value between 0 and 1. In this example, 0.7 gives 70% importance to semantic similarity. ```python opensearch_vectorstore.configure_search_pipelines( pipeline_name="search_pipeline_keyword_0.3_vector_0.7", keyword_weight=0.3, vector_weight=0.7, ) ``` Step-3: Performing Hybrid Search: After creating the search pipeline, you can perform a hybrid search using the `similarity_search()` method or any other search method supported by `langchain`. 
This method combines both `keyword-based and semantic similarity` searches on your OpenSearch index, leveraging the strengths of both traditional information retrieval and vector embedding techniques. parameters: * query: The search query string. * k: The number of top results to return (in this case, 3). * search_type: Set to `hybrid_search` to use both keyword and vector search capabilities. * search_pipeline: The name of the previously created search pipeline. ```python query = "what are the country named in our database?" top_k = 3 pipeline_name = "search_pipeline_keyword_0.3_vector_0.7" matched_docs = opensearch_vectorstore.similarity_search_with_score( query=query, k=top_k, search_type="hybrid_search", search_pipeline = pipeline_name ) matched_docs ``` twitter handle: @iamkarthik98 --------- Co-authored-by: Karthik Kolluri <karthik.kolluri@eidosmedia.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-12-13 16:34:12 -05:00
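The scoring steps described in the commit above (min-max normalization, weighting, arithmetic mean) can be sketched in plain Python. This is a minimal illustration, not the OpenSearch implementation: the function names `min_max` and `hybrid_scores` are hypothetical, and the combination is assumed to be the weighted sum divided by the sum of the weights. It also shows why Experiment 1 can return the same ranking as a plain keyword search while the scores differ.

```python
from typing import Dict


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores into [0, 1] (min-max normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: avoid division by zero (a choice made for this sketch).
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(
    keyword: Dict[str, float],
    vector: Dict[str, float],
    keyword_weight: float,
    vector_weight: float,
) -> Dict[str, float]:
    """Weighted arithmetic mean of the normalized keyword and vector scores."""
    kw, vec = min_max(keyword), min_max(vector)
    total = keyword_weight + vector_weight
    return {
        doc: (keyword_weight * kw.get(doc, 0.0) + vector_weight * vec.get(doc, 0.0)) / total
        for doc in set(kw) | set(vec)
    }


# Illustrative raw scores (invented for this sketch, not taken from the notebook).
keyword_raw = {"a": 12.0, "b": 7.0, "c": 2.0}
vector_raw = {"a": 0.2, "b": 0.9, "c": 0.5}

combined = hybrid_scores(keyword_raw, vector_raw, keyword_weight=1.0, vector_weight=0.0)
ranking = sorted(combined, key=combined.get, reverse=True)
keyword_ranking = sorted(keyword_raw, key=keyword_raw.get, reverse=True)
```

With `keyword_weight=1.0` and `vector_weight=0.0`, `ranking` equals `keyword_ranking`, but the combined scores are the normalized values rather than the raw keyword scores.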
Philippe PRADOS	f3fb5a9c68	community[minor]: Fix json._validate_metadata_func() (#22842 ) JSONparse, in _validate_metadata_func(), checks the consistency of the _metadata_func() function. To do this, it invokes it and makes sure it receives a dictionary in response. However, during the call, it does not respect future calls, as shown on line 100. This generates errors if, for example, the function is like this: ```python def generate_metadata(json_node:Dict[str,Any],kwargs:Dict[str,Any]) -> Dict[str,Any]: return { "source": url, "row": kwargs['seq_num'], "question":json_node.get("question"), } loader = JSONLoader( file_path=file_path, content_key="answer", jq_schema='.[]', metadata_func=generate_metadata, text_content=False) ``` To avoid this, the verification must comply with the specifications. This patch does just that. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-12-13 21:24:20 +00:00
Keiichi Hirobe	67fd554512	core[patch]: throw exception indexing code if deletion fails in vectorstore (#28103 ) The delete methods in the VectorStore and DocumentIndex interfaces return a status indicating the result. Therefore, we can assume that their implementations don't throw exceptions but instead return a result indicating whether the delete operations have failed. The current implementation doesn't check the returned value, so I modified it to throw an exception when the operation fails. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-12-13 16:14:27 -05:00
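The check described in the commit above can be sketched as follows. This is a hedged illustration only: `IndexingError`, `delete_or_raise`, and the stand-in store classes are hypothetical names, and the sketch assumes a `delete` method that returns `True`, `False`, or `None` rather than raising.

```python
from typing import List, Optional


class IndexingError(Exception):
    """Hypothetical exception type used only in this sketch."""


def delete_or_raise(store, ids: List[str]) -> None:
    """Call store.delete(ids) and raise if it explicitly reports failure."""
    result: Optional[bool] = store.delete(ids)
    # Only an explicit False counts as failure; None means "no status reported".
    if result is False:
        raise IndexingError(f"Deleting {len(ids)} document(s) failed.")


class FailingStore:
    """Stand-in store whose delete always reports failure."""

    def delete(self, ids: List[str]) -> Optional[bool]:
        return False


class SilentStore:
    """Stand-in store whose delete reports no status."""

    def delete(self, ids: List[str]) -> Optional[bool]:
        return None


try:
    delete_or_raise(FailingStore(), ["doc-1"])
    raised = False
except IndexingError:
    raised = True
```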
Keiichi Hirobe	258b3be5ec	core[minor]: add new clean up strategy "scoped_full" to indexing (#28505 ) ~Note that this PR is now Draft, so I didn't add change to `aindex` function and didn't add test codes for my change. After we have an agreement on the direction, I will add commits.~ `batch_size` is very difficult to choose: a large value (for example, more than 10000) will impact the VectorDB and RecordManager, while a small value will delete records unnecessarily, leading to redundant work, as the `IMPORTANT` section says. On the other hand, we can't use `full` because the loader returns just a subset of the dataset in our use case. I guess many people are in the same situation as us. So, as one possible solution, I would like to introduce a new argument, `scoped_full_cleanup`. This argument will be valid only when `cleanup` is Full. If True, Full cleanup deletes all documents that haven't been updated AND that are associated with source ids that were seen during indexing. Default is False. This change keeps backward compatibility. --------- Co-authored-by: Eugene Yurtsev <eugene@langchain.dev> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-12-13 20:35:25 +00:00
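The deletion rule in the commit above ("haven't been updated AND associated with source ids that were seen during indexing") reduces to a set filter. The sketch below uses a hypothetical `Record` type and function names; it is not the indexing API, only the selection logic as the message describes it, next to the unscoped full cleanup for contrast.

```python
from typing import Dict, NamedTuple, Set


class Record(NamedTuple):
    """Hypothetical bookkeeping entry for one indexed document."""

    source_id: str
    updated: bool  # True if the document was (re)written during this indexing run


def full_deletions(records: Dict[str, Record]) -> Set[str]:
    """Unscoped full cleanup: every document not updated in this run is deleted."""
    return {doc_id for doc_id, rec in records.items() if not rec.updated}


def scoped_full_deletions(records: Dict[str, Record], seen_source_ids: Set[str]) -> Set[str]:
    """Scoped cleanup: only stale documents whose source was seen in this run."""
    return {
        doc_id
        for doc_id, rec in records.items()
        if not rec.updated and rec.source_id in seen_source_ids
    }


records = {
    "d1": Record(source_id="a.txt", updated=True),
    "d2": Record(source_id="a.txt", updated=False),  # stale chunk of a seen source
    "d3": Record(source_id="b.txt", updated=False),  # source not loaded this run
}
scoped = scoped_full_deletions(records, seen_source_ids={"a.txt"})
unscoped = full_deletions(records)
```

Here `scoped` contains only `d2`, while `unscoped` also removes `d3`, which is the behavior the commit wants to avoid when the loader returns just a subset of the dataset.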
Eugene Yurtsev	ce90b25313	core[patch]: Update error message in indexing code for unreachable code assertion (#28712 ) Minor update for error message that should never be triggered	2024-12-13 20:21:14 +00:00
Keiichi Hirobe	da28cf1f54	core[patch]: Reverts PR #25754 and add unit tests (#28702 ) I reported the bug 2 weeks ago here: https://github.com/langchain-ai/langchain/issues/28447 I believe this is a critical bug for the indexer, so I submitted a PR to revert the change and added unit tests to prevent similar bugs from being introduced in the future. @eyurtsev Could you check this?	2024-12-13 15:13:06 -05:00
ScriptShi	b0a298894d	community[minor]: Add TablestoreVectorStore (#25767 ) Thank you for contributing to LangChain! - [x] PR title: community: add TablestoreVectorStore - [x] PR message: - Description: add TablestoreVectorStore - Dependencies: none - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration: yes 2. an example notebook showing its use: yes If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-12-13 11:17:28 -08:00
Erick Friis	86b3c6e81c	community: make old stub for QuerySQLDataBaseTool private to skip api ref (#28711 )	2024-12-13 10:43:23 -08:00
Martin Triska	05ebe1e66b	Community: add `modified_since` argument to `O365BaseLoader` (#28708 ) ## What are we doing in this PR We're adding an optional `modified_since` argument to `O365BaseLoader`. When set, the O365 loader will only load documents newer than the `modified_since` datetime. ## Why? OneDrives / Sharepoints can contain a large number of documents. The current approach is to download and parse all files and let the indexer deal with duplicates. This can be prohibitively time-consuming, especially when using an OCR-based parser like [zerox](`fa06188834/libs/community/langchain_community/document_loaders/pdf.py (L948)`). This argument allows skipping documents that are older than the known time of indexing. _Q: What if a file was modified during the last indexing process? A: Users can set `modified_since` conservatively and the indexer will still take care of duplicates._ If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-13 17:30:17 +00:00
Bagatur	fa06188834	community[patch]: fix QuerySQLDatabaseTool name (#28659 ) Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-12 19:16:03 -08:00
Erick Friis	48ab91b520	docs: more useful vercel warnings (#28699 )	2024-12-13 03:07:24 +00:00
Michael Chin	28cb2cefc6	docs: Fix stack diagram in community README (#28685 ) - Description: The stack diagram illustration in the community README fails to render due to an invalid branch reference. This PR replaces the broken image link with a valid one referencing master branch.	2024-12-12 13:33:50 -08:00
Botong Zhu	13c3c4a210	community: fixes json loader not getting texts with json standard (#27327 ) This PR fixes JSONLoader._get_text not converting objects to JSON strings correctly. If an object is serializable and is not a dict, JSONLoader uses Python's built-in str() to convert it to a string. The resulting string may not follow the JSON standard. For example, a list is converted to a string with single quotes, and json.loads raises an error when it tries to load that string. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 19:33:45 +00:00
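The difference the commit above describes can be reproduced with the standard library alone. This is a small demonstration of `str()` versus `json.dumps()` on a list, not the loader's code.

```python
import json

value = ["a", "b"]

as_str = str(value)          # Python repr: uses single quotes
as_json = json.dumps(value)  # JSON text: uses double quotes

try:
    json.loads(as_str)
    str_round_trips = True
except json.JSONDecodeError:
    # Single-quoted strings are not valid JSON.
    str_round_trips = False

json_round_trips = json.loads(as_json) == value
```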
Lorenzo	4149c0dd8d	community: add method to create branch and list files for gitlab tool (#27883 ) ### About - Description: The Gitlab utilities used by the Gitlab tool have no methods to create branches or to list branches and files, although these already exist for Github. - Issue: None - Dependencies: None This pull request adds the methods: - create_branch - list_branches_in_repo - set_active_branch - list_files_in_main_branch - list_files_in_bot_branch - list_files_from_directory --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 19:11:35 +00:00
Prathamesh Nimkar	ca054ed1b1	community: ChatSnowflakeCortex - Add streaming functionality (#27753 ) Description: snowflake.py Add _stream and _stream_content methods to enable streaming functionality fix pydantic issues and added functionality with the overall langchain version upgrade added bind_tools method for agentic workflows support through langgraph updated the _generate method to account for agentic workflows support through langgraph cosmetic changes to comments and if conditions snowflake.ipynb Added _stream example cosmetic changes to comments fixed lint errors check_pydantic.sh Decreased counter from 126 to 125 as suggested when formatting --------- Co-authored-by: Prathamesh Nimkar <prathamesh.nimkar@snowflake.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-11 18:35:40 -08:00
Wang, Yi	d834c6b618	huggingface: fix tool argument serialization in _convert_TGI_message_to_LC_message (#26075 ) Currently `_convert_TGI_message_to_LC_message` replaces `'` in the tool arguments, so an argument like "It's" will be converted to `It"s` and could cause a json parser to fail. --------- Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Vadym Barda <vadym@langchain.dev>	2024-12-11 18:34:32 -08:00
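The failure mode in the commit above is easy to reproduce. The snippet below is a standalone illustration using only the `json` module; the argument payload is invented for the example and no HuggingFace code is involved.

```python
import json

# Valid JSON tool arguments containing an apostrophe.
arguments = '{"text": "It\'s"}'

# Blindly swapping single quotes for double quotes corrupts the payload.
replaced = arguments.replace("'", '"')

original_parsed = json.loads(arguments)

try:
    json.loads(replaced)
    replaced_is_valid = True
except json.JSONDecodeError:
    # The stray double quote ends the string early, so parsing fails.
    replaced_is_valid = False
```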
Lakindu Boteju	5a31792bf1	community: Add support for cross-region inference profile IDs in Bedrock Anthropic Claude token cost calculation (#28167 ) This change modifies the token cost calculation logic to support cross-region inference profile IDs for Anthropic Claude models. Instead of explicitly listing all regional variants of new inference profile IDs in the cost dictionaries, the code now extracts a base model ID from the input model ID (or inference profile ID), making it more maintainable and automatically supporting new regional variants. These inference profile IDs follow the format: `<region>.<vendor>.<model-name>` (e.g., `us.anthropic.claude-3-haiku-xxx`, `eu.anthropic.claude-3-sonnet-xxx`). Cross-region inference profiles are system-defined identifiers that enable distributing model inference requests across multiple AWS regions. They help manage unplanned traffic bursts and enhance resilience during peak demands without additional routing costs. References for Amazon Bedrock's cross-region inference profiles:- - https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html - https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 02:33:50 +00:00
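One way to derive a base model ID from the `<region>.<vendor>.<model-name>` format described in the commit above is to keep the last two dot-separated components. This is a sketch under that assumption, with a hypothetical function name; it is not necessarily how the PR implements the extraction, and it assumes the model name itself contains no dots.

```python
def base_model_id(model_id: str) -> str:
    """Drop a leading region prefix, keeping '<vendor>.<model-name>'."""
    parts = model_id.split(".")
    # Assumption: the last two components are always vendor and model name.
    return ".".join(parts[-2:])


# Example IDs taken from the commit message (the "-xxx" suffix is elided there).
with_region = base_model_id("us.anthropic.claude-3-haiku-xxx")
other_region = base_model_id("eu.anthropic.claude-3-sonnet-xxx")
without_region = base_model_id("anthropic.claude-3-haiku-xxx")
```

Because both the regional and the plain ID map to the same key, a cost table only needs one entry per base model.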
fatmelon	d1e0ec7b55	community: VectorStores: Azure Cosmos DB Mongo vCore with DiskANN (#27329 ) # Description Add a new vector index type `diskann` to Azure Cosmos DB Mongo vCore vector store. Paper of DiskANN can be found here [DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node](https://proceedings.neurips.cc/paper_files/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf). ## Sample Usage ```python from pymongo import MongoClient # INDEX_NAME = "izzy-test-index-2" # NAMESPACE = "izzy_test_db.izzy_test_collection" # DB_NAME, COLLECTION_NAME = NAMESPACE.split(".") client: MongoClient = MongoClient(CONNECTION_STRING) collection = client[DB_NAME][COLLECTION_NAME] model_deployment = os.getenv( "OPENAI_EMBEDDINGS_DEPLOYMENT", "smart-agent-embedding-ada" ) model_name = os.getenv("OPENAI_EMBEDDINGS_MODEL_NAME", "text-embedding-ada-002") vectorstore = AzureCosmosDBVectorSearch.from_documents( docs, openai_embeddings, collection=collection, index_name=INDEX_NAME, ) # Read more about these variables in detail here. https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/vector-search maxDegree = 40 dimensions = 1536 similarity_algorithm = CosmosDBSimilarityType.COS kind = CosmosDBVectorSearchType.VECTOR_DISKANN lBuild = 20 vectorstore.create_index( dimensions=dimensions, similarity=similarity_algorithm, kind=kind , max_degree=maxDegree, l_build=lBuild, ) ``` ## Dependencies No additional dependencies were added --------- Co-authored-by: Yang Qiao (from Dev Box) <yangqiao@microsoft.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 01:54:04 +00:00
manukychen	ba9b95cd23	Community: Adding bulk_size as a setable param for OpenSearchVectorSearch (#28325 ) Description: When using langchain.retrievers.parent_document_retriever.py with OpenSearchVectorSearch as the vectorstore, I found that the bulk_size param I passed into the OpenSearchVectorSearch class was not applied in my ParentDocumentRetriever.add_documents() call. It was overwritten with the integer 500 by the functions of the OpenSearchVectorSearch class (e.g., add_texts(), add_embeddings()...). So I made this PR to fix this, thanks! --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 01:45:22 +00:00
xintoteai	45f9c9ae88	langchain: fixed weaviate (v4) vectorstore import for self-query retriever (#28675 ) Co-authored-by: Xin Heng <xin.heng@gmail.com>	2024-12-11 15:53:41 -08:00
Thomas van Dongen	ee640d6bd3	community: fixed bug in model2vec embedding code (#28670 ) This PR fixes a bug with the current implementation for Model2Vec embeddings where `embed_documents` does not work as expected. - Description: the current implementation uses `encode_as_sequence` for encoding documents. This is incorrect, as `encode_as_sequence` creates token embeddings and not mean embeddings. The normal `encode` function handles both single and batched inputs and should be used instead. The return type was also incorrect, as encode returns a NumPy array. This PR converts the embedding to a list so that the output is consistent with the Embeddings ABC.	2024-12-11 15:50:56 -08:00
Brian Sharon	b20230c800	community: use correct `id_key` when deleting by id in LanceDB wrapper (#28655 ) - Description: The current version of the `delete` method assumes that the id field will always be called `id`. - Issue: n/a - Dependencies: n/a - Twitter handle: ugh, Twitter :D --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-11 23:49:35 +00:00
Mohammad Mohtashim	fa155a422f	[Community]: `requests_kwargs` not being used in _fetch (#28646 ) - Description: `requests_kwargs` is not being passed to `_fetch` which is fetching pages asynchronously. In this PR, making sure that we are passing `requests_kwargs` to `_fetch` just like `_scrape`. - Issue: #28634 --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-11 23:46:54 +00:00
Mohammad Mohtashim	a37afbe353	mistral[minor]: Added Retrying Mechanism in case of Request Rate Limit Error for `MistralAIEmbeddings` (#27818 ) - Description: In the event of a Rate Limit Error from the MistralAI server, the response JSON raises a KeyError. To address this, a simple retry mechanism has been implemented to handle cases where the request limit is exceeded. - Issue: #27790 --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-12-11 17:53:42 -05:00
Vincent Zhang	df5008fe55	community[minor]: FAISS Filter Function Enhancement with Advanced Query Operators (#28207 ) ## Description We are submitting as a team of four for a project. Other team members are @RuofanChen03, @LikeWang10067, @TANYAL77. This pull requests expands the filtering capabilities of the FAISS vectorstore by adding MongoDB-style query operators indicated as follows, while including comprehensive testing for the added functionality. - $eq (equals) - $neq (not equals) - $gt (greater than) - $lt (less than) - $gte (greater than or equal) - $lte (less than or equal) - $in (membership in list) - $nin (not in list) - $and (all conditions must match) - $or (any condition must match) - $not (negation of condition) ## Issue This closes https://github.com/langchain-ai/langchain/issues/26379. ## Sample Usage ```python import faiss import asyncio from langchain_community.vectorstores import FAISS from langchain.schema import Document from langchain_huggingface import HuggingFaceEmbeddings embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2") documents = [ Document(page_content="Process customer refund request", metadata={"schema_type": "financial", "handler_type": "refund",}), Document(page_content="Update customer shipping address", metadata={"schema_type": "customer", "handler_type": "update",}), Document(page_content="Process payment transaction", metadata={"schema_type": "financial", "handler_type": "payment",}), Document(page_content="Handle customer complaint", metadata={"schema_type": "customer","handler_type": "complaint",}), Document(page_content="Process invoice payment", metadata={"schema_type": "financial","handler_type": "payment",}) ] async def search(vectorstore, query, schema_type, handler_type, k=2): schema_filter = {"schema_type": {"$eq": schema_type}} handler_filter = {"handler_type": {"$eq": handler_type}} combined_filter = { "$and": [ schema_filter, handler_filter, ] } base_retriever = 
vectorstore.as_retriever( search_kwargs={"k":k, "filter":combined_filter} ) return await base_retriever.ainvoke(query) async def main(): vectorstore = FAISS.from_texts( texts=[doc.page_content for doc in documents], embedding=embeddings, metadatas=[doc.metadata for doc in documents] ) def printt(title, documents): print(title) if not documents: print("\tNo documents found.") return for doc in documents: print(f"\t{doc.page_content}. {doc.metadata}") printt("Documents:", documents) printt('\nquery="process payment", schema_type="financial", handler_type="payment":', await search(vectorstore, query="process payment", schema_type="financial", handler_type="payment", k=2)) printt('\nquery="customer update", schema_type="customer", handler_type="update":', await search(vectorstore, query="customer update", schema_type="customer", handler_type="update", k=2)) printt('\nquery="refund process", schema_type="financial", handler_type="refund":', await search(vectorstore, query="refund process", schema_type="financial", handler_type="refund", k=2)) printt('\nquery="refund process", schema_type="financial", handler_type="foobar":', await search(vectorstore, query="refund process", schema_type="financial", handler_type="foobar", k=2)) print() if __name__ == "__main__":asyncio.run(main()) ``` ## Output ``` Documents: Process customer refund request. {'schema_type': 'financial', 'handler_type': 'refund'} Update customer shipping address. {'schema_type': 'customer', 'handler_type': 'update'} Process payment transaction. {'schema_type': 'financial', 'handler_type': 'payment'} Handle customer complaint. {'schema_type': 'customer', 'handler_type': 'complaint'} Process invoice payment. {'schema_type': 'financial', 'handler_type': 'payment'} query="process payment", schema_type="financial", handler_type="payment": Process payment transaction. {'schema_type': 'financial', 'handler_type': 'payment'} Process invoice payment. 
{'schema_type': 'financial', 'handler_type': 'payment'} query="customer update", schema_type="customer", handler_type="update": Update customer shipping address. {'schema_type': 'customer', 'handler_type': 'update'} query="refund process", schema_type="financial", handler_type="refund": Process customer refund request. {'schema_type': 'financial', 'handler_type': 'refund'} query="refund process", schema_type="financial", handler_type="foobar": No documents found. ``` --------- Co-authored-by: ruofan chen <ruofan.is.awesome@gmail.com> Co-authored-by: RickyCowboy <like.wang@mail.utoronto.ca> Co-authored-by: Shanni Li <tanya.li@mail.utoronto.ca> Co-authored-by: RuofanChen03 <114096642+ruofanchen03@users.noreply.github.com> Co-authored-by: Like Wang <102838708+likewang10067@users.noreply.github.com>	2024-12-11 17:52:22 -05:00
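The operator semantics listed in the commit above can be sketched as a small recursive matcher over a metadata dict. This is an illustrative approximation with hypothetical names (`COMPARATORS`, `matches`), not the FAISS vectorstore's code; in particular, treating a missing key as `None` and an uncomparable pair as a non-match are choices made for this sketch.

```python
import operator
from typing import Any, Dict

# Field-level operators, keyed by the names listed in the commit message.
COMPARATORS = {
    "$eq": operator.eq,
    "$neq": operator.ne,
    "$gt": operator.gt,
    "$lt": operator.lt,
    "$gte": operator.ge,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if metadata satisfies every condition in flt."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in cond)
        elif key == "$not":
            ok = not matches(metadata, cond)
        elif isinstance(cond, dict):
            value = metadata.get(key)  # missing keys are treated as None
            try:
                ok = all(COMPARATORS[op](value, arg) for op, arg in cond.items())
            except TypeError:
                ok = False  # e.g. ordering comparison against None
        else:
            ok = metadata.get(key) == cond  # bare value means equality
        if not ok:
            return False
    return True


combined_filter = {
    "$and": [
        {"schema_type": {"$eq": "financial"}},
        {"handler_type": {"$eq": "payment"}},
    ]
}
payment = {"schema_type": "financial", "handler_type": "payment"}
refund = {"schema_type": "financial", "handler_type": "refund"}
```

Applied to the sample metadata from the commit, `matches(payment, combined_filter)` holds and `matches(refund, combined_filter)` does not, which mirrors the output shown above.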
like	3048a9a26d	community: tongyi multimodal response format fix to support langchain (#28645 ) Description: The multimodal(tongyi) response format "message": {"role": "assistant", "content": [{"text": "图像"}]}}]} is not compatible with LangChain. Dependencies: No --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-10 21:13:26 +00:00
Bagatur	d0e662e43b	community[patch]: Release 0.3.11 (#28658 )	2024-12-10 20:51:13 +00:00
Bagatur	91227ad7fd	langchain[patch]: Release 0.3.11 (#28657 )	2024-12-10 12:28:14 -08:00
Bagatur	1fbd86a155	core[patch]: Release 0.3.24 (#28656 )	2024-12-10 20:19:21 +00:00
Bagatur	e6a62d8422	core,langchain,community[patch]: allow langsmith 0.2 (#28598 )	2024-12-10 18:50:58 +00:00
ccurme	bc4dc7f4b1	ollama[patch]: permit streaming for tool calls (#28654 ) Resolves https://github.com/langchain-ai/langchain/issues/28543 Ollama recently [released](https://github.com/ollama/ollama/releases/tag/v0.4.6) support for streaming tool calls. Previously we would override the `stream` parameter if tools were passed in. Covered in standard tests here: `c1d348e95d/libs/standard-tests/langchain_tests/integration_tests/chat_models.py (L893-L897)` Before, the test generates one message chunk: ```python [ AIMessageChunk( content='', additional_kwargs={}, response_metadata={ 'model': 'llama3.1', 'created_at': '2024-12-10T17:49:04.468487Z', 'done': True, 'done_reason': 'stop', 'total_duration': 525471208, 'load_duration': 19701000, 'prompt_eval_count': 170, 'prompt_eval_duration': 31000000, 'eval_count': 17, 'eval_duration': 473000000, 'message': Message( role='assistant', content='', images=None, tool_calls=[ ToolCall( function=Function(name='magic_function', arguments={'input': 3}) ) ] ) }, id='run-552bbe0f-8fb2-4105-ada1-fa38c1db444d', tool_calls=[ { 'name': 'magic_function', 'args': {'input': 3}, 'id': 'b0a4dc07-7d7a-487b-bd7b-ad062c2363a2', 'type': 'tool_call', }, ], usage_metadata={ 'input_tokens': 170, 'output_tokens': 17, 'total_tokens': 187 }, tool_call_chunks=[ { 'name': 'magic_function', 'args': '{"input": 3}', 'id': 'b0a4dc07-7d7a-487b-bd7b-ad062c2363a2', 'index': None, 'type': 'tool_call_chunk', } ] ) ] ``` After, it generates two (tool call in one, response metadata in another): ```python [ AIMessageChunk( content='', additional_kwargs={}, response_metadata={}, id='run-9a3f0860-baa1-4bae-9562-13a61702de70', tool_calls=[ { 'name': 'magic_function', 'args': {'input': 3}, 'id': '5bbaee2d-c335-4709-8d67-0783c74bd2e0', 'type': 'tool_call', }, ], tool_call_chunks=[ { 'name': 'magic_function', 'args': '{"input": 3}', 'id': '5bbaee2d-c335-4709-8d67-0783c74bd2e0', 'index': None, 'type': 'tool_call_chunk', }, ], ), AIMessageChunk( content='', 
additional_kwargs={}, response_metadata={ 'model': 'llama3.1', 'created_at': '2024-12-10T17:46:43.278436Z', 'done': True, 'done_reason': 'stop', 'total_duration': 514282750, 'load_duration': 16894458, 'prompt_eval_count': 170, 'prompt_eval_duration': 31000000, 'eval_count': 17, 'eval_duration': 464000000, 'message': Message( role='assistant', content='', images=None, tool_calls=None ), }, id='run-9a3f0860-baa1-4bae-9562-13a61702de70', usage_metadata={ 'input_tokens': 170, 'output_tokens': 17, 'total_tokens': 187 } ), ] ```	2024-12-10 12:54:37 -05:00
Johannes Mohren	c1d348e95d	doc-loader: retain Azure Doc Intelligence API metadata in Document parser (#28382 ) Description: This PR modifies the doc_intelligence.py parser in the community package to include all metadata returned by the Azure Doc Intelligence API in the Document object. Previously, only the parsed content (markdown) was retained, while other important metadata such as bounding boxes (bboxes) for images and tables was discarded. These image bboxes are crucial for supporting use cases like multi-modal RAG workflows when using Azure Doc Intelligence. The change ensures that all information returned by the Azure Doc Intelligence API is preserved by setting the metadata attribute of the Document object to the entire result returned by the API, rather than an empty dictionary. This extends the parser's utility for complex use cases without breaking existing functionality. Issue: This change does not address a specific issue number, but it resolves a critical limitation in supporting multimodal workflows when using the LangChain wrapper for the Azure API. Dependencies: No additional dependencies are required for this change. --------- Co-authored-by: jmohren <johannes.mohren@aol.de>	2024-12-10 11:22:58 -05:00
Alex Tonkonozhenko	0d20c314dd	Confluence Loader: Fix CQL loading (#27620 ) fix #12082	2024-12-10 11:05:23 -05:00
Katarina Supe	aba2711e7f	community: update Memgraph integration (#27017 ) Description: - Memgraph no longer relies on `Neo4jGraphStore` but implements `GraphStore`, just like other graph databases. - Memgraph no longer relies on `GraphQAChain`, but implements `MemgraphQAChain`, just like other graph databases. - The refresh schema procedure has been updated to try using `SHOW SCHEMA INFO`. The fallback uses Cypher queries (a combination of schema and Cypher) → LangChain integration no longer relies on MAGE library. - The schema structure has been reformatted. Regardless of the procedures used to get schema, schema structure is the same. - The `add_graph_documents()` method has been implemented. It transforms `GraphDocument` into Cypher queries and creates a graph in Memgraph. It implements the ability to use `baseEntityLabel` to improve speed (`baseEntityLabel` has an index on the `id` property). It also implements the ability to include sources by creating a `MENTIONS` relationship to the source document. - Jupyter Notebook for Memgraph has been updated. - Issue: / - Dependencies: / - Twitter handle: supe_katarina (DX Engineer @ Memgraph) Closes #25606	2024-12-10 10:57:21 -05:00
ccurme	5c6e2cbcda	ollama[patch]: support structured output (#28629 ) - Bump minimum version of `ollama` to 0.4.4 (which also addresses https://github.com/langchain-ai/langchain/issues/28607). - Support recently-released [structured output](https://ollama.com/blog/structured-outputs) feature. This can be accessed by calling `.with_structured_output` with `method="json_schema"` (choice of name [mirrors](https://python.langchain.com/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html#langchain_openai.chat_models.base.ChatOpenAI.with_structured_output) what we have for OpenAI's structured output feature). `ChatOllama` previously implemented `.with_structured_output` via the [base implementation](`ec9b41431e/libs/core/langchain_core/language_models/chat_models.py (L1117)`).	2024-12-10 10:36:00 -05:00
Bagatur	24292c4a31	core[patch]: Release 0.3.23 (#28648 )	2024-12-10 10:01:16 +00:00
Bagatur	e24f86e55f	core[patch]: return ToolMessage from tool (#28605 )	2024-12-10 09:59:38 +00:00
Erick Friis	ef2f875dfb	core: deprecate PipelinePromptTemplate (#28644 )	2024-12-10 03:56:48 +00:00
TamagoTorisugi	0f0df2df60	fix: Set default search_type to 'similarity' in as_retriever method of AzureSearch (#28376 ) Description This PR updates the `as_retriever` method in the `AzureSearch` to ensure that the `search_type` parameter defaults to 'similarity' when not explicitly provided. Previously, if the `search_type` was omitted, it did not default to any specific value. So it was inherited from `AzureSearchVectorStoreRetriever`, which defaults to 'hybrid'. This change ensures that the intended default behavior aligns with the expected usage. Issue No specific issue was found related to this change. Dependencies No new dependencies are introduced with this change. --------- Co-authored-by: prrao87 <prrao87@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-10 03:40:04 +00:00
Prashanth Rao	8c6eec5f25	community: KuzuGraph needs allow_dangerous_requests, add graph documents via LLMGraphTransformer (#27949 ) - This PR adds a new method `add_graph_documents` to use the `GraphDocument`s extracted by `LLMGraphTransformer` and store in a Kùzu graph backend. - This allows users to transform unstructured text into a graph that uses Kùzu as the graph store. --------- Co-authored-by: pookam90 <pookam@microsoft.com> Co-authored-by: Pooja Kamath <60406274+Pookam90@users.noreply.github.com> Co-authored-by: hsm207 <hsm207@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-10 03:15:28 +00:00
Filip Ratajczak	4e743b5427	Core: google docstring parsing fix (#28404 ) - Description: Added a solution for invalid parsing of google docstring such as: Args: net_annual_income (float): The user's net annual income (in current year dollars). - Issue: Previous code would return arg = "net_annual_income (float)" which would cause exception in _validate_docstring_args_against_annotations - Dependencies: None Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-10 00:27:25 +00:00
Arnav Priyadarshi	b78b2f7a28	community[fix]: Update Perplexity to pass parameters into API calls (#28421 ) - Description: I realized the invocation parameters were not being passed into `_generate` so I added those in but then realized that the parameters contained some old fields designed for an older openai client which I removed. Parameters work fine now. - Issue: Fixes #28229 - Dependencies: No new dependencies. - Twitter handle: @arch_plane Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-10 00:23:31 +00:00
Clément Jumel	cf6d1c0ae7	docs: add Linkup integration documentation (#28366 ) ## Description First of all, thanks for the great framework that is LangChain! At [Linkup](https://www.linkup.so/) we're working on an API to connect LLMs and agents to the internet and our partner sources. We'd be super excited to see our API integrated in LangChain! This essentially consists in adding a LangChain retriever and tool, which is done in our own [package](https://pypi.org/project/langchain-linkup/). Here we're simply following the [integration documentation](https://python.langchain.com/docs/contributing/how_to/integrations/) and update the documentation of LangChain to mention the Linkup integration. We do have tests (both units & integration) in our [source code](https://github.com/LinkupPlatform/langchain-linkup), and tried to follow as close as possible the [integration documentation](https://python.langchain.com/docs/contributing/how_to/integrations/) which specifically requests to focus on documentation changes for an integration PR, so I'm not adding tests here, even though the PR checklist seems to suggest so. Feel free to correct me if I got this wrong! By the way, we would be thrilled by being mentioned in the list of providers which have standalone packages [here](https://langchain-git-fork-linkupplatform-cj-doc-langchain.vercel.app/docs/integrations/providers/), is there something in particular for us to do for that? 🙂 ## Twitter handle Linkup_platform	2024-12-09 14:36:25 -08:00
Amir Sadeghi	2c49f587aa	community[fix]: could not locate runnable browser (#28289 ) set open_browser to false to resolve "could not locate runnable browser" error while default browser is None Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 21:05:52 +00:00
Martin Triska	75bc6bb191	community: [bugfix] fix source path for office files in O365 (#28260 ) # What problem are we fixing? Currently documents loaded using `O365BaseLoader` fetch source from `file.web_url` (where `file` is `<class 'O365.drive.File'>`). This works well for `.pdf` documents. Unfortunately office documents (`.xlsx`, `.docx` ...) pass their `web_url` in the following format: `https://sharepoint_address/sites/path/to/library/root/Doc.aspx?sourcedoc=%XXXXXXXX-1111-1111-XXXX-XXXXXXXXXX%7D&file=filename.xlsx&action=default&mobileredirect=true` This obfuscates the path to the file. This PR utilizes the parent folder's path and file name to reconstruct the actual location of the file. Knowing the file's location can be crucial for some RAG applications (path to the file can carry information we don't want to lose). @vbarda Could you please look at this one? I'm @-mentioning you since we've already closed some PRs together :-) Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 12:34:59 -08:00
Erick Friis	534b8f4364	standard-tests: release 0.3.7 (#28637 )	2024-12-09 15:12:18 -05:00
Naka Masato	ce3b69aa05	community: add include_labels option to ConfluenceLoader (#28259 ) ## Description: Enable `ConfluenceLoader` to include labels with the `include_labels` option (`false` by default for backward compatibility). The labels are set in the `metadata` of the `Document`, e.g. `{"labels": ["l1", "l2"]}` ## Notes The Confluence API supports retrieving labels by passing `metadata.labels` to the `expand` query parameter. All of the following functions support `expand` in the same way: - confluence.get_page_by_id - confluence.get_all_pages_by_label - confluence.get_all_pages_from_space - cql (internally using [/api/content/search](https://developer.atlassian.com/cloud/confluence/rest/v1/api-group-content/#api-wiki-rest-api-content-search-get)) ## Issue: No issue related to this PR. ## Dependencies: No changes. ## Twitter handle: [@gymnstcs](https://x.com/gymnstcs) --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 19:35:01 +00:00
Rajendra Kadam	242fee11be	community[minor] Pebblo: Support for new Pinecone class PineconeVectorStore (#28253 ) - Description: Support for new Pinecone class PineconeVectorStore in PebbloRetrievalQA. - Issue: NA - Dependencies: NA - Tests: - Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 19:33:54 +00:00
nikitajoyn	9fcd203556	partners/mistralai: Fix KeyError in Vertex AI stream (#28624 ) - Description: Streaming response from Mistral model using Vertex AI raises KeyError when trying to access `choices` key, that the last chunk doesn't have. The fix is to access the key safely using `get()`. - Issue: https://github.com/langchain-ai/langchain/issues/27886 - Dependencies: - Twitter handle:	2024-12-09 14:14:58 -05:00
maang-h	b64d846347	docs: Standardize MoonshotChat docstring (#28159 ) - Description: Add docstring Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 18:46:25 +00:00
Erick Friis	4c70ffff01	standard-tests: sync/async vectorstore tests conditional (#28636 ) Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-09 18:02:55 +00:00
ccurme	ffb5c1905a	openai[patch]: release 0.2.12 (#28633 )	2024-12-09 12:38:13 -05:00
ccurme	6e6061fe73	openai[patch]: bump minimum SDK version (#28632 ) Resolves https://github.com/langchain-ai/langchain/issues/28625	2024-12-09 11:28:05 -05:00
Mohammad Mohtashim	ec9b41431e	[Core]: Small Docstring Clarification for `BaseTool` (#28148 ) - Description: `kwargs` are not being passed to `run` of the `BaseTool` which has been fixed - Issue: #28114 --------- Co-authored-by: Stevan Kapicic <kapicic.ste1@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 06:10:19 +00:00
Erick Friis	cef21a0b49	cli: warning on app add (#28619 ) instead of #28128	2024-12-09 06:07:14 +00:00
Ankit Dangi	90f162efb6	text-splitters: add pydocstyle linting (#28127 ) As seen in #23188, turned on Google-style docstrings by enabling `pydocstyle` linting in the `text-splitters` package. Each resulting linting error was addressed differently: ignored, resolved, suppressed, and missing docstrings were added. Fixes one of the checklist items from #25154, similar to #25939 in `core` package. Ran `make format`, `make lint` and `make test` from the root of the package `text-splitters` to ensure no issues were found. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 06:01:03 +00:00
WGNW_MG	eabe587787	community[patch]:Fix for get_openai_callback() return token_cost=0.0 when model is gpt-4o-11-20 (#28408 ) - Description: update MODEL_COST_PER_1K_TOKENS for new gpt-4o-11-20. - Issue: with latest gpt-4o-11-20, openai callback return token_cost=0.0 - Dependencies: None (just simple dict fix.) - Twitter handle: I Don't Use Twitter. - (However..., I have a YouTube channel. Could you upload this there, by any chance? https://www.youtube.com/@%EA%B2%9C%EC%B0%BD%EB%B6%80%EA%B3%A0%EB%AC%B8AI%EC%9E%90%EB%AC%B8%EC%84%BC%EC%84%B8)	2024-12-08 20:46:50 -08:00
Fahim Zaman	481c4bfaba	core[patch]: Fixed trim functions, and added corresponding unit test for the solved issue (#28429 ) - Description: - Trim functions were incorrectly deleting nodes with more than 1 outgoing/incoming edge, so an extra condition was added to check for this directly. A unit test "test_trim_multi_edge" was written to test this test case specifically. - Issue: - Fixes #28411 - Fixes https://github.com/langchain-ai/langgraph/issues/1676 - Dependencies: - No changes were made to the dependencies - [x] Unit tests were added to verify the changes. - [x] Updated documentation where necessary. - [x] Ran make format, make lint, and make test to ensure compliance with project standards. --------- Co-authored-by: Tasif Hussain <tasif006@gmail.com>	2024-12-08 20:45:28 -08:00
Marco Perini	2354bb7bfa	partners: 🕷️🦜 ScrapeGraph API Integration (#28559 ) Hi Langchain team! I'm the co-founder and maintainer at [ScrapeGraphAI](https://scrapegraphai.com/). By following the integration [guide](https://python.langchain.com/docs/contributing/how_to/integrations/publish/) on your site, I have created a new lib called [langchain-scrapegraph](https://github.com/ScrapeGraphAI/langchain-scrapegraph). With this PR I would like to integrate Scrapegraph as a provider in Langchain, adding the required documentation files. Let me know if any changes are needed for it to be properly integrated both in the lib and in the documentation. Thank you 🕷️🦜 --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 02:38:21 +00:00
Abhinav	317a38b83e	community[minor]: Add support for model2vec embeddings (#28507 ) This PR adds an embeddings integration for model2vec, the `Model2vecEmbeddings` class. - Description: [Model2Vec](https://github.com/MinishLab/model2vec) lets you turn any sentence transformer into a really small static model and makes running the model faster. - Issue: - Dependencies: model2vec ([pypi](https://pypi.org/project/model2vec/)) - Twitter handle: - [x] Add tests and docs: - [Test](https://github.com/blacksmithop/langchain/blob/model2vec_embeddings/libs/community/langchain_community/embeddings/model2vec.py), [docs](https://github.com/blacksmithop/langchain/blob/model2vec_embeddings/docs/docs/integrations/text_embedding/model2vec.ipynb) - [x] Lint and test: --------- Co-authored-by: Abhinav KM <abhinav.m@zerone-consulting.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-12-09 02:17:22 +00:00
Mohammad Mohtashim	524ee6d9ac	Invalid `tool_choice` being passed to `ChatLiteLLM` (#28198 ) - Description: An invalid `tool_choice` is passed to `bind_tools` of `ChatLiteLLM` because its parent class's default value is passed through `with_structured_output`. - Issue: #28176	2024-12-07 14:33:40 -05:00
Erick Friis	dd0085a9ff	docs: standard tests to markdown, load templates from files (#28603 )	2024-12-07 01:37:21 +00:00
Erick Friis	5e8553c31a	standard-tests: retriever docstrings (#28596 )	2024-12-07 00:32:19 +00:00
ccurme	d801c6ffc7	tests[patch]: nits (#28601 )	2024-12-07 00:13:04 +00:00
Erick Friis	07c2ac765a	community: release 0.3.10 (#28600 )	2024-12-07 00:07:13 +00:00
Erick Friis	4a7dc6ec4c	standard-tests: release 0.3.6 (#28599 )	2024-12-07 00:05:04 +00:00
ccurme	80a88f8f04	tests[patch]: update API ref for chat models (#28594 )	2024-12-06 19:00:14 -05:00
Erick Friis	0eb7ab65f1	multiple: fix xfailed signatures (#28597 )	2024-12-06 15:39:47 -08:00
Erick Friis	b7c2029e84	standard-tests: root docstrings (#28595 )	2024-12-06 15:14:52 -08:00
Erick Friis	9e2abcd152	standard-tests: show right classes in api docs (#28591 )	2024-12-06 14:48:13 -08:00
Erick Friis	246c10a1cc	standard-tests: private members and tools unit troubleshoot (#28590 )	2024-12-06 13:52:58 -08:00
Erick Friis	e6663b69f3	langchain: release 0.3.10 (#28585 )	2024-12-06 20:20:24 +00:00
Erick Friis	c38b845d7e	core: fix path test (#28584 )	2024-12-06 20:05:18 +00:00
ccurme	2c6bc74cb1	multiple: combine sync/async vector store standard test suites (#28580 ) Breaking change in `langchain-tests`.	2024-12-06 14:55:06 -05:00
Bagatur	dda9f90047	core[patch]: Release 0.3.22 (#28582 )	2024-12-06 19:36:53 +00:00
ccurme	f3dc142d3c	cli[patch]: implement minimal starter vector store (#28577 ) Basically the same as core's in-memory vector store. Removed some optional methods.	2024-12-06 13:10:22 -05:00
Erick Friis	5277a021c1	docs: raw loader codeblock (#28548 ) Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-06 09:26:34 -08:00
Erick Friis	18386c16c7	core, tests: more tolerant _aget_relevant_documents function (#28462 )	2024-12-06 00:49:30 +00:00
Erick Friis	bc636ccc60	cli: release 0.0.35 (#28557 )	2024-12-05 16:40:52 -08:00
Erick Friis	7ecf38f4fa	cli: create specific files from template (#28556 )	2024-12-06 00:32:47 +00:00
Erick Friis	478def8dcc	core: deprecation doc removal (#28553 ) ![ScreenShot 2024-12-05 at 02 33 43PM@2x](https://github.com/user-attachments/assets/e1ce495b-90ca-41c7-9a65-b403a934675c)	2024-12-05 15:35:28 -08:00
cinqisap	482e8a7855	community: Add support for SAP HANA Vector hnsw index creation (#27884 ) Issue: Added support for creating indexes in the SAP HANA Vector engine. Changes: 1. Introduced a new function `create_hnsw_index` in `hanavector.py` that enables the creation of indexes for SAP HANA Vector. 2. Added integration tests for the index creation function to ensure functionality. 3. Updated the documentation to reflect the new index creation feature, including examples and output from the notebook. 4. Fix the operator issue in ` _process_filter_object` function and change the array argument to a placeholder in the similarity search SQL statement. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-05 23:29:08 +00:00
blaufink	28f8d436f6	mistral: fix of issue #26029 (#28233 ) - Description: Azure AI takes issue with the safe_mode parameter being set to False instead of None. Therefore, this PR changes the default value of safe_mode from False to None. This results in it being filtered out before the request is sent, avoiding the extra-parameter issue described below. - Issue: #26029 - Dependencies: / --------- Co-authored-by: blaufink <sebastian.brueckner@outlook.de> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-05 23:28:12 +00:00
ccurme	ecdfc98ef6	tests[patch]: run standard tests for embeddings and populate embeddings API ref (#28545 ) plus minor updates to chat models and vector store API refs	2024-12-05 19:39:03 +00:00
ccurme	b8e861a63b	openai[patch]: add standard tests for embeddings (#28540 )	2024-12-05 17:00:27 +00:00
ZhangShenao	d26555c682	[VectorStore] Improvement: Improve chroma vector store (#28524 ) - Complete unit test - Fix spelling error	2024-12-05 11:58:32 -05:00
ccurme	8f9b3b7498	chroma[patch]: fix bug (#28538 ) Fix bug introduced in https://github.com/langchain-ai/langchain/pull/27995 If all document IDs are `""`, the chroma SDK will raise ``` DuplicateIDError: Expected IDs to be unique ``` Caught by [docs tests](https://github.com/langchain-ai/langchain/actions/runs/12180395579/job/33974633950), but added a test to langchain-chroma as well.	2024-12-05 15:37:19 +00:00
Erick Friis	ecff9a01e4	cli: release 0.0.34 (#28525 )	2024-12-05 15:35:49 +00:00
ccurme	d9e42a1517	langchain[patch]: fix deprecation warning (#28535 )	2024-12-05 14:49:10 +00:00
Erick Friis	0f539f0246	standard-tests: release 0.3.5 (#28526 )	2024-12-05 00:41:07 -08:00
Erick Friis	43c35d19d4	cli: standard tests in cli, test that they run, skip vectorstore tests (#28521 )	2024-12-05 00:38:32 -08:00
Erick Friis	c5acedddc2	anthropic: timeout in tests (10s) (#28488 )	2024-12-04 16:03:38 -08:00
ccurme	f459754470	tests[patch]: populate API reference for vector stores (#28520 )	2024-12-05 00:02:31 +00:00
ccurme	8bc2c912b8	chroma[patch]: (nit) simplify test (#28517 ) Use `self.get_embeddings` on test class instead of importing embeddings separately.	2024-12-04 20:22:55 +00:00
ccurme	eec55c2550	chroma[patch]: add `get_by_ids` and fix bug (#28516 ) - Run standard integration tests in Chroma - Add `get_by_ids` method - Fix bug in `add_texts`: if a list of `ids` is passed but any of them are None, Chroma will raise an exception. Here we assign a uuid.	2024-12-04 14:00:36 -05:00
Erick Friis	e6a08355a3	docs: more api ref links, add linting step to prevent more (#28495 )	2024-12-04 04:19:42 +00:00
wlleiiwang	6151ea78d5	community: implement _select_relevance_score_fn for tencent vectordb (#28036 ) implement _select_relevance_score_fn for tencent vectordb fix use external embedding for tencent vectordb Co-authored-by: wlleiiwang <wlleiiwang@tencent.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-04 03:03:00 +00:00
Asi Greenholts	d34bf78f3b	community: BM25Retriever preservation of document id (#27019 ) Currently this retriever discards document ids --------- Co-authored-by: asi-cider <88270351+asi-cider@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-04 00:36:00 +00:00
peterdhp	bc5ec63d67	community : allow using apikey for PubMedAPIWrapper (#27246 ) Description: > Without an API key, any site (IP address) posting more than 3 requests per second to the E-utilities will receive an error message. By including an API key, a site can post up to 10 requests per second by default. quoted from A General Introduction to the E-utilities, NCBI: https://www.ncbi.nlm.nih.gov/books/NBK25497/ I have simply added an `api_key` parameter to the PubMedAPIWrapper that can be used to increase the number of requests per second from 3 to 10. Twitter handle : @KORmaori --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-03 16:21:22 -08:00
Eric Pinzur	eff8a54756	langchain_chroma: added document.id support (#27995 ) Description: * Added internal `Document.id` support to Chroma VectorStore Dependencies: * https://github.com/langchain-ai/langchain/pull/27968 should be merged first and this PR should be re-based on top of those changes. Tests: * Modified/Added tests for `Document.id` support. All tests are passing. Note: I am not a member of the Chroma team. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-04 00:04:27 +00:00
William Smith	15e7353168	langchain_community: updated query constructor for Databricks Vector Search due to LangChainDeprecationWarning: `filters` was deprecated since langchain-community 0.2.11 and will be removed in 0.3. Please use `filter` instead. (#27974 ) - Description: Updated the kwargs for the structured query from filters to filter due to deprecation of 'filters' for Databricks Vector Search. Also changed the error messages as the allowed operators and comparators are different which can cause issues with functions such as get_query_constructor_prompt() - Issue: Fixes the Key Error for filters due to deprecation in favor for 'filter': LangChainDeprecationWarning: DatabricksVectorSearch received a key `filters` in search_kwargs. `filters` was deprecated since langchain-community 0.2.11 and will be removed in 0.3. Please use `filter` instead. - Dependencies: N/A - Twitter handle: N/A --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-03 16:03:53 -08:00
Jan Heimes	ef365543cb	community: add Needle retriever and document loader integration (#28157 ) - Description: This PR adds a new integration for Needle, which includes: - NeedleRetriever: A retriever for fetching documents from Needle collections. - NeedleLoader: A document loader for managing and loading documents into Needle collections. - Example notebooks demonstrating usage have been added in: - `docs/docs/integrations/retrievers/needle.ipynb` - `docs/docs/integrations/document_loaders/needle.ipynb`. - Dependencies: The `needle-python` package is required as an external dependency for accessing Needle's API. It has been added to the extended testing dependencies list. - Twitter handle: Feel free to mention me if this PR gets announced: [needlexai](https://x.com/NeedlexAI). - [x] Add tests and docs: 1. Unit tests have been added for both `NeedleRetriever` and `NeedleLoader` in `libs/community/tests/unit_tests`. These tests mock API calls to avoid relying on network access. 2. Example notebooks have been added to `docs/docs/integrations/`, showcasing both retriever and loader functionality. - [x] Lint and test: - `make format`: Passed - `make lint`: Passed - `make test`: Passed (requires `needle-python` to be installed locally; this package is not added to LangChain dependencies). Additional guidelines: - [x] Optional dependencies are imported only within functions. - [x] No dependencies have been added to pyproject.toml files except for those required for unit tests. - [x] The PR does not touch more than one package. - [x] Changes are fully backwards compatible. - [x] Community additions are not re-imported into LangChain core. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-03 22:06:25 +00:00
ccurme	ab831ce05c	tests[patch]: populate API reference for chat models (#28487 ) Populate API reference for test class properties and test methods for chat models. Also: - Make `standard_chat_model_params` private. - `pytest.skip` some tests that were previously passed if features are not supported.	2024-12-03 15:24:54 -05:00
Erick Friis	c74f34cb41	pinecone: release 0.2.1 (version sequence) (#28485 )	2024-12-03 10:22:16 -08:00
Audrey Sage Lorberfeld	926e452f44	partners: update version header for Pinecone integration (#28481 ) Just need to update the version header used with Pinecone in recently-merged method (from [this PR](https://github.com/langchain-ai/langchain/pull/28320/files#r1867820929)). Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-03 18:08:56 +00:00
Erick Friis	7315360907	openai: dont populate logit_bias if None (#28482 )	2024-12-03 17:54:53 +00:00
Erick Friis	ff675c11f6	partners/pinecone: release 0.2.2 (#28466 )	2024-12-03 06:49:35 +00:00
Audrey Sage Lorberfeld	6b7e93d4c7	pinecone: update pinecone client (#28320 ) This PR updates the Pinecone client to `5.4.0`, as well as its dependencies (`pinecone-plugin-inference` and `pinecone-plugin-interface`). Note: `pinecone-client` is now simply called `pinecone`. Question for reviewer(s): should this PR also update the `pinecone` dep in [the root dir's `poetry.lock` file](https://github.com/langchain-ai/langchain/blob/master/poetry.lock#L6729)? Was unsure. (I don't believe so b/c it seems pinned to a lower version likely based on 3rd-party deps (e.g. Unstructured).) -- TW: @audrey_sage_	2024-12-02 22:47:09 -08:00
Erick Friis	000be1f32c	tests: init retriever standard tests (#28459 )	2024-12-02 23:36:09 +00:00
Erick Friis	42d40d694b	partners/openai: release 0.2.11 (#28461 )	2024-12-02 23:35:18 +00:00
Erick Friis	9f04416768	openai: set logit_bias to none instead of empty dict by default (#28460 )	2024-12-02 15:30:32 -08:00
William FH	ecee41ab72	fix: Handle response metadata in merge_messages_runs (#28453 )	2024-12-02 13:56:23 -08:00
lucasiscovici	60021e54b5	community: Add the additional kwarg 'context' for openai (#28351 ) - Description: Add the additional kwarg 'context' for openai to the `convert_dict_to_message` and `convert_message_to_dict` functions.	2024-12-02 16:43:30 -05:00
ccurme	28487597b2	ollama[patch]: release 0.2.1 (#28458 ) We inadvertently skipped 0.2.1, so release pipeline [failed](https://github.com/langchain-ai/langchain/actions/runs/12126964367/job/33810204551).	2024-12-02 21:17:51 +00:00
ccurme	88d6d02b59	ollama[patch]: release 0.2.2 (#28456 )	2024-12-02 14:57:30 -05:00
Bagatur	47433485e7	mistral[patch]: Release 0.2.3 (#28452 )	2024-12-02 08:26:28 -08:00
Bagatur	49914e959a	community[patch]: Release 0.3.9 (#28451 )	2024-12-02 16:23:37 +00:00
ccurme	c2f1d022a2	mistral[patch]: ensure tool call IDs in tool messages are correctly formatted (#28422 ) Fixes tests for cross-provider compatibility: https://github.com/langchain-ai/langchain/actions/runs/12085358877/job/33702420504#step:10:376	2024-11-29 13:56:06 +00:00
Alex Thomas	2813e86407	docs: Adds the langchain-neo4j package to the API docs (#28386 ) This PR adds the `langchain-neo4j` package to the `libs/packages.yml` so the API docs can be built.	2024-11-27 12:41:12 -08:00
Bagatur	b7e10bb199	langchain[patch]: Release 0.3.9 (#28399 )	2024-11-27 20:06:11 +00:00
ccurme	a8b21afc08	qdrant[patch]: run python 3.13 in CI (#28394 )	2024-11-27 12:22:17 -05:00
ccurme	ee6fc3f3f6	nomic[patch]: run python 3.13 in CI (#28393 )	2024-11-27 17:08:15 +00:00
Massimiliano Pronesti	83586661d6	partners[chroma]: add retrieval of embedding vectors (#28290 ) This PR adds an additional method to `Chroma` to retrieve the embedding vectors, besides the most relevant Documents. This is sometimes of use when you need to run a postprocessing algorithm on the retrieved results based on the vectors, which has been the case for me lately. Example issue (discussion) requesting this change: https://github.com/langchain-ai/langchain/discussions/20383 --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-11-27 16:34:02 +00:00
ccurme	733a6ad328	mistral[patch]: run python 3.13 in CI (#28392 )	2024-11-27 11:29:04 -05:00
ccurme	b9bf7fd797	couchbase[patch]: run python 3.13 in CI (#28391 )	2024-11-27 11:28:21 -05:00
Greg Hinch	5141f25a20	community[patch]: support numpy2 (#28184 ) Follows on from #27991, updates the langchain-community package to support numpy 2 versions --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-11-27 11:10:58 -05:00
LuisMSotamba	0901f11b0f	community: add truncation params when an openai assistant's run is created (#28158 ) Description: When an OpenAI assistant is invoked, it creates a run by default, allowing users to set only a few request fields. The truncation strategy is set to auto, which includes previous messages in the thread along with the current question until the context length is reached. This causes token usage to grow incrementally: consumed_tokens = previous_consumed_tokens + current_consumed_tokens. This PR adds support for user-defined truncation strategies, giving better control over token consumption. Issue: High token consumption.	2024-11-27 10:53:53 -05:00
TheDannyG	607c60a594	partners/ollama: fix tool calling with nested schemas (#28225 ) ## Description This PR addresses the following: Fixes Issue #25343: - Adds additional logic to parse shallowly nested JSON-encoded strings in tool call arguments, allowing for proper parsing of responses like that of Llama3.1 and 3.2 with nested schemas. Adds Integration Test for Fix: - Adds an Ollama-specific integration test to ensure the issue is resolved and to prevent regressions in the future. Fixes Failing Integration Tests: - Fixes failing integration tests (even prior to changes) caused by the `llama3-groq-tool-use` model. Previously, the tests `test_structured_output_async` and `test_structured_output_optional_param` failed due to the model not issuing a tool call in the response. Resolved by switching to `llama3.1`. ## Issue Fixes #25343. ## Dependencies No dependencies. ____ Done in collaboration with @ishaan-upadhyay @mirajismail @ZackSteine.	2024-11-27 10:32:02 -05:00
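The "shallowly nested JSON-encoded strings" handling described in 607c60a594 can be illustrated with a minimal sketch. This is not the code from the PR: the function name `parse_nested_json_strings` is hypothetical, and the choice to unwrap only one level, and only values that decode to a dict or list, is an assumption made for illustration.

```python
import json
from typing import Any


def parse_nested_json_strings(arguments: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: decode top-level string values that hold JSON containers."""
    parsed: dict[str, Any] = {}
    for key, value in arguments.items():
        if isinstance(value, str):
            try:
                decoded = json.loads(value)
            except json.JSONDecodeError:
                decoded = value  # plain string, keep as-is
            # Assumption: only unwrap objects and arrays, so a string such as
            # "42" stays a string rather than silently becoming an int.
            parsed[key] = decoded if isinstance(decoded, (dict, list)) else value
        else:
            parsed[key] = value
    return parsed


example = parse_nested_json_strings(
    {"address": '{"city": "Toronto"}', "name": "Ada", "zip": "42", "age": 3}
)
```

Only the `address` value is decoded here; the other values pass through unchanged.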
ccurme	bb83abd037	community[patch]: remove sqlalchemy cap (#28389 )	2024-11-27 10:20:36 -05:00
ccurme	42b8ad067d	chroma[patch]: test python 3.13 in CI (#28387 )	2024-11-27 15:02:40 +00:00
William FH	585da22752	Init embeddings (#28370 )	2024-11-27 08:25:10 +00:00
Bagatur	ffe7bd4832	langchain[patch]: init_chat_model provider in model string (#28367 ) ```python llm = init_chat_model("openai:gpt-4o") ```	2024-11-27 00:20:25 -08:00
ccurme	8adc4a5bcc	langchain[patch]: update deprecation message for agent classes and constructors (#28369 )	2024-11-26 16:07:13 -05:00
Mohammad Mohtashim	06fafc6651	Community: Marqo Index Setting GET Request Updated according to `2.x` API version while keep backward compatability for 1.5.x (#28342 ) - Description: `add_texts` called the marqo client's `get_setting` assuming the 1.5.x API response format. This PR updates `add_texts` to handle the updated response payload of 2.x and later while maintaining backward compatibility. I have verified this was the only place where the marqo client usage did not account for the updated API version. - Issue: #28323 --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-11-26 18:26:56 +00:00
willtai	7d95a10ada	langchain: Fix Neo4jVector vector store reference from partner package for self query (#28292 ) _This should only be merged once neo4j is included under libs/partners._ # Description: Neo4jVector from langchain-community is being moved to langchain-neo4j: [see link](https://github.com/langchain-ai/langchain-neo4j/blob/main/libs/neo4j/langchain_neo4j/vectorstores/neo4j_vector.py#L436). To solve the issue below, this PR adds an attempt to import `Neo4jVector` from the partner package `langchain-neo4j`, similarly to the other partner packages. # Issue: When initializing `SelfQueryRetriever`, the following error is raised: ``` ValueError: Self query retriever with Vector Store type <class 'langchain_neo4j.vectorstores.neo4j_vector.Neo4jVector'> not supported. ``` [See related issue](https://github.com/langchain-ai/langchain/issues/19748). # Dependencies: - langchain-neo4j	2024-11-26 13:21:04 -05:00
ccurme	a1c90794e1	ollama[patch]: bump to 0.4.1 in lock file (#28365 )	2024-11-26 18:19:31 +00:00
ccurme	74d9d2cba1	ollama[patch]: support ollama 0.4 (#28364 ) v0.4 of the Python SDK is already installed via the lock file in CI, but our current implementation is not compatible with it. This also addresses an issue introduced in https://github.com/langchain-ai/langchain/pull/28299. @RyanMagnuson would you mind explaining the motivation for that change? From what I can tell the Ollama SDK [does not support kwargs](`6c44bb2729/ollama/_client.py (L286)`). Previously, unsupported kwargs were ignored, but they currently raise `TypeError`. Some of LangChain's standard test suite expects `tool_choice` to be supported, so here we catch it in `bind_tools` so it is ignored and not passed through to the client.	2024-11-26 12:45:59 -05:00
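The approach in 74d9d2cba1 of catching `tool_choice` so it never reaches a client that rejects unknown kwargs can be sketched roughly as follows. The helper `build_client_kwargs` is hypothetical and stands in for whatever `bind_tools` does internally; only the idea of naming the parameter so it is captured and dropped comes from the commit message.

```python
from typing import Any, Optional, Sequence


def build_client_kwargs(
    tools: Sequence[dict],
    *,
    tool_choice: Optional[Any] = None,
    **kwargs: Any,
) -> dict:
    """Hypothetical helper: collect the kwargs to forward to a strict client."""
    # tool_choice is named in the signature so it is captured here instead of
    # landing in **kwargs; it is intentionally not forwarded.
    del tool_choice
    return {"tools": list(tools), **kwargs}


forwarded = build_client_kwargs([{"name": "search"}], tool_choice="any", temperature=0)
```

Callers that pass `tool_choice` keep working, while the client only sees arguments it accepts.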
Bagatur	e9c16552fa	openai[patch]: bump core dep (#28361 )	2024-11-26 08:37:05 -08:00
Bagatur	e7dc26aefb	openai[patch]: Release 0.2.10 (#28360 )	2024-11-26 08:30:29 -08:00
ccurme	42b18824c2	openai[patch]: use max_completion_tokens in place of max_tokens (#26917 ) `max_tokens` is deprecated: https://platform.openai.com/docs/api-reference/chat/create#chat-create-max_tokens --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-11-26 16:30:19 +00:00
Greg Hinch	869c8f5879	langchain[patch]: support numpy 2 (#28183 ) Follows on from #27991, updates the langchain package to support numpy 2 versions --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-11-26 11:20:02 -05:00
ccurme	7b9a0d9ed8	docs: update tutorials (#28219 )	2024-11-26 10:43:12 -05:00
Richard Hao	c161f7d46f	docs(create_sql_agent): fix reStructured Text Markup (#28356 ) - Description: Lines of code must be indented beneath `.. code-block::` for proper formatting. https://devguide.python.org/documentation/markup/#showing-code-examples - Issue: The example code block on the `create_sql_agent` document page is not properly rendered. https://python.langchain.com/api_reference/community/agent_toolkits/langchain_community.agent_toolkits.sql.base.create_sql_agent.html#langchain_community.agent_toolkits.sql.base.create_sql_agent	2024-11-26 09:52:16 -05:00
Mohammad Mohtashim	195ae7baa3	Community: Adding citations in AIMessage for ChatPerplexity (#28321 ) Description: Adding Citation in response payload of ChatPerplexity Issue: #28108	2024-11-26 09:45:47 -05:00
ccurme	a5374952f8	community[patch]: fix import in test (#28339 ) Library name was updated after https://github.com/langchain-ai/langchain/pull/27879 branched off master.	2024-11-25 19:28:01 +00:00
Alex Thomas	5867f25ff3	community[patch]: Neo4j community deprecation (#28130 ) Adds deprecation notices for Neo4j components moving to the `langchain_neo4j` partner package. - Adds deprecation warnings to all Neo4j-related classes and functions that have been migrated to the new `langchain_neo4j` partner package - Updates documentation to reference the new `langchain_neo4j` package instead of `langchain_community`	2024-11-25 10:34:22 -08:00
Yan	c60695a1c7	community: fixed critical bugs at Writer provider (#27879 )	2024-11-25 12:03:37 -05:00
Yelin Zhang	6ed2d387bb	docs: fix GOOGLE_API_KEY typo (#28322 ) fix small GOOGLE_API_KEY markdown formatting typo	2024-11-25 09:45:22 -05:00
ccurme	a83357dc5a	community[patch]: release 0.3.8 (#28316 )	2024-11-23 08:21:21 -05:00
ccurme	82bb0cdfff	langchain[patch]: release 0.3.8 (#28315 )	2024-11-23 13:02:10 +00:00
ccurme	f5f1149257	core[patch]: release 0.3.21 (#28314 )	2024-11-23 12:46:56 +00:00
Eugene Yurtsev	563587e14f	langchain[patch]: Compat with pydantic 2.10 (#28307 ) pydantic compat 2.10 for langchain	2024-11-23 03:21:27 +00:00
Eugene Yurtsev	a813d11c14	core[patch]: Compat pydantic 2.10 (#28308 ) pydantic 2.10 compat for langchain-core	2024-11-22 21:44:55 -05:00
ccurme	25a636c597	langchain[patch]: update deprecation message for MapReduceChain (#28304 ) Link migration guide first.	2024-11-23 00:47:52 +00:00
ccurme	203d20caa5	community[patch]: fix errors introduced by pydantic 2.10 (#28297 )	2024-11-22 17:50:13 -05:00
Erick Friis	aa7fa80e1e	partners/ollama: release 0.2.2rc1 (#28300 )	2024-11-22 22:25:05 +00:00
Erick Friis	7277794a59	ollama: include kwargs in requests (#28299 ) courtesy of @ryanmagnuson	2024-11-22 14:15:42 -08:00
Pat Patterson	2ee37a1c7b	community: list valid values for LanceDB constructor's `mode` argument (#28296 ) Description: Currently, the docstring for `LanceDB.__init__()` provides the default value for `mode`, but not the list of valid values. This PR adds that list to the docstring. Issue: N/A Dependencies: N/A Twitter handle: `@metadaddy`	2024-11-22 15:40:06 -05:00
ccurme	697dda5052	core[patch]: release 0.3.20 (#28293 )	2024-11-22 14:04:29 -05:00
ccurme	a433039a56	core[patch]: support final AIMessage responses in `tool_example_to_messages` (#28267 ) We have a test [test_structured_few_shot_examples](`ad4333ca03/libs/standard-tests/langchain_tests/integration_tests/chat_models.py (L546)`) in standard integration tests that implements a version of tool-calling few shot examples that works with ~all tested providers. The formulation supported by ~all providers is: `human message, tool call, tool message, AI response`. Here we update `langchain_core.utils.function_calling.tool_example_to_messages` to support this formulation. The `tool_example_to_messages` util is undocumented outside of our API reference. IMO, if we are testing that this function works across all providers, it can be helpful to feature it in our guides. The structured few-shot examples we document at the moment require users to implement this function and can be simplified.	2024-11-22 15:38:49 +00:00
Erick Friis	29f8a79ebe	groq,openai,mistralai: fix unit tests (#28279 )	2024-11-22 04:54:01 +00:00
Erick Friis	9a717c9b32	docs: poetry publish 2 (#28277 ) - docs: poetry publish - x - x - x - x - x - x - x - x - x	2024-11-21 20:49:38 -08:00
Erick Friis	4ccb3e64c7	cli: release 0.0.33 (#28278 )	2024-11-21 20:13:37 -08:00
Erick Friis	49254cde70	docs: poetry publish (#28275 )	2024-11-22 03:10:03 +00:00
Erick Friis	b3ee1f8713	core: add space at end of error message link (#28270 )	2024-11-21 22:19:59 +00:00
Erick Friis	5bc2df3060	standard-tests: troubleshooting docstrings (#28268 )	2024-11-21 22:05:31 +00:00
Erick Friis	ad4333ca03	infra: disable vertex api build (#28266 )	2024-11-21 10:37:17 -08:00
ccurme	56499cf58b	openai[patch]: unskip test and relax tolerance in embeddings comparison (#28262 ) From what I can tell, the response from the SDK is not deterministic:
```python
import numpy as np
import openai

documents = ["disallowed special token '<|endoftext|>'"]
model = "text-embedding-ada-002"

# Reference embedding from a first request.
direct_output_1 = (
    openai.OpenAI()
    .embeddings.create(input=documents, model=model)
    .data[0]
    .embedding
)

# Repeat the same request and compare each result to the reference.
for i in range(10):
    direct_output_2 = (
        openai.OpenAI()
        .embeddings.create(input=documents, model=model)
        .data[0]
        .embedding
    )
    print(f"{i}: {np.isclose(direct_output_1, direct_output_2).all()}")
```
```
0: True
1: True
2: True
3: True
4: False
5: True
6: True
7: True
8: True
9: True
```
See related discussion here: https://community.openai.com/t/can-text-embedding-ada-002-be-made-deterministic/318054 Found the same result using `"text-embedding-3-small"`.	2024-11-21 10:23:10 -08:00
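A relaxed-tolerance comparison of the kind 56499cf58b refers to might look like the sketch below. It uses only the standard library rather than numpy, and both the helper name `embeddings_close` and the default tolerance of `1e-3` are assumptions for illustration, not values taken from the test suite.

```python
import math
from typing import Sequence


def embeddings_close(
    a: Sequence[float], b: Sequence[float], abs_tol: float = 1e-3
) -> bool:
    """Hypothetical helper: element-wise comparison with an absolute tolerance."""
    if len(a) != len(b):
        return False
    # rel_tol=0.0 so that only the absolute tolerance decides the outcome.
    return all(math.isclose(x, y, rel_tol=0.0, abs_tol=abs_tol) for x, y in zip(a, b))


same_within_tolerance = embeddings_close([0.1, -0.25], [0.1004, -0.2497])
clearly_different = embeddings_close([0.1, -0.25], [0.2, -0.25])
```

An absolute tolerance is used because embedding components are small values near zero, where a purely relative tolerance would be very strict.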
Priyanshi Garg	f5f53d1101	community: fix compatibility issue in kinetica chat model integration for Pydantic 2 (#28252 ) Fixed a compatibility issue in the `load_messages_from_context()` function for the Kinetica chat model integration. The issue was caused by stricter validation introduced in Pydantic 2.	2024-11-21 09:33:00 -05:00
Erick Friis	d1108607f4	multiple: push deprecation removals to 1.0 (#28236 )	2024-11-20 19:56:29 -08:00
Erick Friis	4f76246cf2	standard-tests: release 0.3.4 (#28245 )	2024-11-20 19:35:58 -08:00
Erick Friis	4bdf1d7d1a	standard-tests: fix decorator init test (#28246 )	2024-11-21 03:35:43 +00:00
Erick Friis	60e572f591	standard-tests: tool tests (#28244 )	2024-11-20 19:26:16 -08:00
Erick Friis	35e6052df5	infra: remove stale dockerfiles from repo (#28243 ) deleting the following docker things from monorepo. they aren't currently usable because of old dependencies, and I'd rather avoid people using them / having to maintain them - /docker - this folder has a compose file that spins up postgres,pgvector (separate from postgres and very stale version),mongo instance with default user/password that we've gotten security pings about before. not worth having - also spins up a custom dockerfile with onttotext/graphdb - not even sure what that is - /libs/langchain/dockerfile + dev.dockerfile - super old poetry version, doesn't implement the right thing anymore - .github/workflows/_release_docker.yml, langchain_release_docker.yml - not used anymore, not worth having an alternate release path	2024-11-21 00:05:01 +00:00
Erick Friis	161ab736ce	standard-tests: release 0.3.3 (#28242 )	2024-11-20 23:47:02 +00:00
Eugene Yurtsev	2acc83f146	mistralai[patch]: 0.2.2 release (#28240 ) mistralai 0.2.2 release	2024-11-20 22:18:15 +00:00
Eugene Yurtsev	1a66175e38	mistral[patch]: Propagate tool call id (#28238 ) mistralai-large-2411 requires tool call id Older models accept tool call id if its provided mistral-large-2407 mistral-large-2402	2024-11-20 17:02:30 -05:00
shroominic	dee72c46c1	community: Outlines integration (#27449 ) In collaboration with @rlouf I built an [outlines](https://dottxt-ai.github.io/outlines/latest/) integration for langchain! I think this is really useful for doing any type of structured output locally. [Dottxt](https://dottxt.co) spent a lot of work optimising this process at a lower level ([outlines-core](https://pypi.org/project/outlines-core/0.1.14/) written in rust) so I think this is a better alternative over all current approaches in langchain to do structured output. It also implements the `.with_structured_output` method so it should be a drop in replacement for a lot of applications. The integration includes: - Outlines LLM class - ChatOutlines class - Tutorial Cookbooks - Documentation Page - Validation and error messages - Exposes Outlines Structured output features - Support for multiple backends - Integration and Unit Tests Dependencies: `outlines` + additional (depending on backend used) I am not sure if the unit-tests comply with all requirements, if not I suggest to just remove them since I don't see a useful way to do it differently. --------- Co-authored-by: Vadym Barda <vadym@langchain.dev>	2024-11-20 16:31:31 -05:00
Mikelarg	2901fa20cc	community: Add deprecation warning for GigaChat integration in langchain-community (#28022 ) - Description: We have released [langchain-gigachat](https://github.com/ai-forever/langchain-gigachat?tab=readme-ov-file) with a new GigaChat integration that supports function/tool calling. This PR deprecates the legacy GigaChat class in the community package. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-20 21:03:47 +00:00
Renzo-vS	567dc1e422	community: fix duplicate content (#28003 ) Thank you for reading my first PR! Description: Deduplicate content in AzureSearch vectorstore. Currently, by default, the content of the retrieval is placed both in metadata and page_content of a Document. This PR removes the content from metadata, and leaves it in page_content. Issue: Previously, the content was popped from result before metadata was populated. In #25828 , the order was changed, which leads to a response with duplicated content. This was not the intention of that PR and seems undesirable. Looking forward to seeing my contribution in the next version! Cheers, Renzo	2024-11-20 12:49:03 -08:00
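The ordering issue described in 567dc1e422 (pop the content before building metadata, so it is not stored twice) can be shown with a small sketch. The function name `split_result` and the `"content"` key are hypothetical; the actual field names in the AzureSearch vectorstore may differ.

```python
def split_result(result: dict, content_key: str = "content") -> tuple[str, dict]:
    """Hypothetical helper: separate page content from the remaining metadata."""
    remaining = dict(result)  # copy so the caller's dict is not mutated
    # Pop first: whatever is left afterwards becomes metadata, so the content
    # cannot end up in both places.
    page_content = remaining.pop(content_key)
    return page_content, remaining


page_content, metadata = split_result({"content": "hello", "id": "1", "source": "a.txt"})
```

If the metadata were built before the pop, it would still contain the `content` entry, which is the duplication the commit removes.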
Jorge Piedrahita Ortiz	abaea28417	community: SamabanovaCloud tool calling and Structured output (#27967 ) Description: Add tool calling and structured output support for SambaNovaCloud chat models, docs included --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-20 19:12:08 +00:00
af su	7c7ee07d30	huggingface[fix]: HuggingFaceEndpointEmbeddings model parameter passing error when async embed (#27953 ) This change refines the handling of _model_kwargs in POST requests. Instead of nesting _model_kwargs as a dictionary under the parameters key, it is now directly unpacked and merged into the request's JSON payload. This ensures that the model parameters are passed correctly and avoids unnecessary nesting. E.g.:
```python
import asyncio

from langchain_huggingface.embeddings import HuggingFaceEndpointEmbeddings

embedding_input = ["This input will get multiplied" * 10000]

embeddings = HuggingFaceEndpointEmbeddings(
    model="http://127.0.0.1:8081/embed",
    model_kwargs={"truncate": True},
)

# The truncate parameter is handled correctly in the synchronous method.
embeddings.embed_documents(texts=embedding_input)

# The truncate parameter is not handled correctly in the asynchronous method,
# and 413 Request Entity Too Large is returned.
asyncio.run(embeddings.aembed_documents(texts=embedding_input))
```
Co-authored-by: af su <saf@zjuici.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-20 19:08:56 +00:00
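The difference between nesting and unpacking described in 7c7ee07d30 can be shown on plain dictionaries. The helper names are hypothetical, and the `"inputs"` key is an assumption about the payload shape; only the `parameters` key and the `truncate` option appear in the commit message.

```python
def build_payload_nested(texts: list[str], model_kwargs: dict) -> dict:
    """Previous behaviour as described: kwargs nested under a 'parameters' key."""
    return {"inputs": texts, "parameters": model_kwargs}


def build_payload_flat(texts: list[str], model_kwargs: dict) -> dict:
    """Behaviour after the fix as described: kwargs merged into the top level."""
    return {"inputs": texts, **model_kwargs}


nested = build_payload_nested(["some text"], {"truncate": True})
flat = build_payload_flat(["some text"], {"truncate": True})
```

With the flat form, a server that reads `truncate` from the top level of the JSON body sees the option; with the nested form it would not.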
Eric Pinzur	923ef85105	langchain_chroma: fixed integration tests (#27968 ) Description: * I'm planning to add `Document.id` support to the Chroma VectorStore, but first I wanted to make sure all the integration tests were passing. They weren't. This PR fixes the broken tests. * I found 2 issues: * This change (from a year ago, exactly :) ) for supporting multi-modal embeddings: https://docs.trychroma.com/deployment/migration#migration-to-0.4.16---november-7,-2023 * This change https://github.com/langchain-ai/langchain/pull/27827 due to an update in the chroma client. Also ran `format` and `lint` on the changes. Note: I am not a member of the Chroma team.	2024-11-20 11:05:02 -08:00
CLOVA Studio 개발	218b4e073e	community: fix some features on Naver ChatModel & embedding model (#28228 ) # Description - adding stopReason to response_metadata to call stream and astream - excluding NCP_APIGW_API_KEY input required validation - to remove warning Field "model_name" has conflict with protected namespace "model_". cc. @vbarda	2024-11-20 10:35:41 -08:00
Erick Friis	43e24cd4a1	docs, standard-tests: property tags, support tool decorator (#28234 )	2024-11-20 17:19:03 +00:00
William FH	197b885911	[CLI] Relax constraints (#28218 )	2024-11-19 09:31:56 -08:00
Eugene Yurtsev	5599a0a537	core[minor]: Add other langgraph packages to sys_info (#28190 ) Add other langgraph packages to sys_info output	2024-11-19 09:20:25 -05:00
Erick Friis	0dbaf05bb7	standard-tests: rename langchain_standard_tests to langchain_tests, release 0.3.2 (#28203 )	2024-11-18 19:10:39 -08:00
Erick Friis	d9d689572a	openai: release 0.2.9, o1 streaming (#28197 )	2024-11-18 23:54:38 +00:00
DreamOfStars	22a8652ecc	langchain: add missing punctuation in react_single_input.py (#28161 ) Description: Add missing single quote to line 12: "Invalid Format: Missing 'Action:' after 'Thought:"	2024-11-18 09:38:48 -05:00
Eric Pinzur	0a57fc0016	community: OpenSearchVectorStore: use engine set at init() time by default (#28147 ) Description: * Updated the OpenSearchVectorStore to use the `engine` parameter captured at `init()` time as the default when adding documents to the store. Formatted, Linted, and Tested.	2024-11-16 17:07:42 -05:00
Erick Friis	6d2004ee7d	multiple: langchain-standard-tests -> langchain-tests (#28139 )	2024-11-15 11:32:04 -08:00
Erick Friis	409c7946ac	docs, standard-tests: how to standard test a custom tool, imports (#27931 )	2024-11-15 10:49:14 -08:00
alex shengzhi li	39fcb476fd	community: add reka chat model integration (#27379 )	2024-11-15 13:37:14 -05:00
Erick Friis	d3252b7417	core: release 0.3.19 (#28137 )	2024-11-15 18:15:28 +00:00
Jorge Piedrahita Ortiz	39956a3ef0	community: sambanovacloud llm integration (#27526 ) - Description: SambaNovaCloud llm integration added, previously only chat model integration --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-11-15 16:58:11 +00:00
Elham Badri	d696728278	partners/ollama: Enabled Token Level Streaming when Using Bind Tools for ChatOllama (#27689 ) Description: The issue concerns the unexpected behavior observed using the bind_tools method in LangChain's ChatOllama. When tools are not bound, the llm.stream() method works as expected, returning incremental chunks of content, which is crucial for real-time applications such as conversational agents and live feedback systems. However, when bind_tools([]) is used, the streaming behavior changes, causing the output to be delivered in full chunks rather than incrementally. This change negatively impacts the user experience by breaking the real-time nature of the streaming mechanism. Issue: #26971 --------- Co-authored-by: 4meyDam1e <amey.damle@mail.utoronto.ca> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-11-15 11:36:27 -05:00
ccurme	776e3271e3	standard-tests[patch]: add test for async tool calling (#28133 )	2024-11-15 16:09:50 +00:00
Vadym Barda	ed4952e475	core[patch]: add caching to get_function_nonlocals (#28131 )	2024-11-15 07:53:53 -08:00
ccurme	f1222739f8	core[patch]: support numpy 2 (#27991 )	2024-11-14 13:08:57 -05:00
Vadym Barda	6ec688cf2b	xai[patch]: update core (#28092 )	2024-11-13 17:51:51 +00:00
Bharat Ramanathan	3e972faf81	community: chore warn deprecate the tracer (#27159 ) - Description: This PR deprecates the wandb tracer in favor of the new [WeaveTracer](https://weave-docs.wandb.ai/guides/integrations/langchain#using-weavetracer) in W&B - Dependencies: No dependencies, just a deprecation warning. - Twitter handle: @parambharat @baskaryan	2024-11-13 11:33:34 -05:00
Erick Friis	76e0127539	core: release 0.3.18 (#28070 )	2024-11-13 16:19:13 +00:00
Eric Pinzur	eadc2f6a90	core: added DeleteResponse to the module (#28069 ) Description: * added `DeleteResponse` to the `langchain_core.indexing` module, for implementing DocumentIndex classes.	2024-11-13 11:08:08 -05:00
ZhangShenao	c89e7ce8b5	core[patch]: Update doc-strings in callbacks (#28073 ) - Fix api docs	2024-11-13 11:07:15 -05:00
Vadym Barda	09e85c7c4b	xai[patch]: update dependencies (#28067 )	2024-11-12 16:15:17 -05:00
am-kinetica	a646f1c383	Handled empty search result handling and updated the notebook (#27914 ) - [ ] PR title: "community: updated Kinetica vectorstore" - Description: Handled empty search results - Issue: used to throw error if the search results were empty @efriis	2024-11-12 13:03:49 -08:00
ccurme	00e7b2dada	anthropic[patch]: add examples to API ref (#28065 )	2024-11-12 20:17:02 +00:00
Vadym Barda	48ee322a78	partners: add xAI chat integration (#28032 )	2024-11-12 15:11:29 -05:00
ccurme	2898b95ca7	anthropic[major]: release 0.3.0 (#28063 )	2024-11-12 14:58:00 -05:00
ccurme	5eaa0e8c45	openai[patch]: release 0.2.8 (#28062 )	2024-11-12 14:57:11 -05:00
ccurme	15b7dd3ad7	community[patch]: release 0.3.7 (#28061 )	2024-11-12 19:54:58 +00:00
ccurme	5460096086	core[patch]: release 0.3.17 (#28060 )	2024-11-12 19:38:56 +00:00
ccurme	1538ee17f9	anthropic[major]: support python 3.13 (#27916 ) Last week Anthropic released version 0.39.0 of its python sdk, which enabled support for Python 3.13. This release deleted a legacy `client.count_tokens` method, which we currently access during init of the `Anthropic` LLM. Anthropic has replaced this functionality with the [client.beta.messages.count_tokens() API](https://github.com/anthropics/anthropic-sdk-python/pull/726). To enable support for `anthropic >= 0.39.0` and Python 3.13, here we drop support for the legacy token counting method, and add support for the new method via `ChatAnthropic.get_num_tokens_from_messages`. To fully support the token counting API, we update the signature of `get_num_tokens_from_messages` to accept tools everywhere. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-11-12 14:31:07 -05:00
ZhangShenao	ca7375ac20	Improvement[Community]Improve Embeddings API (#28038 ) - Fix `BaichuanTextEmbeddings` api url - Remove unused params in api doc - Fix word spelling	2024-11-12 13:57:35 -05:00
Bagatur	139881b108	openai[patch]: fix azure oai stream check (#28048 )	2024-11-12 15:42:06 +00:00
Bagatur	9611f0b55d	openai[patch]: Release 0.2.7 (#28047 )	2024-11-12 15:16:15 +00:00
Bagatur	5c14e1f935	community[patch]: Release 0.3.6 (#28046 )	2024-11-12 15:15:07 +00:00
Bagatur	9ebd7ebed8	core[patch]: Release 0.3.16 (#28045 )	2024-11-12 14:57:15 +00:00
Bagatur	33dbfba08b	openai[patch]: default to invoke on o1 stream() (#27983 )	2024-11-08 19:12:59 -08:00
Eric Pinzur	c421997caa	community[patch]: Added type hinting to OpenSearch clients (#27946 ) Description: * When working with OpenSearchVectorSearch to make OpenSearchGraphVectorStore (coming soon), I noticed that there wasn't type hinting for the underlying OpenSearch clients. This fixes that issue. * Confirmed tests are still passing with code changes. Note that there is some additional code duplication now, but I think this approach is cleaner overall.	2024-11-08 11:04:57 -08:00
Saad Makrod	b509747c7f	Community: Google Books API Tool (#27307 ) ## Description As proposed in our earlier discussion #26977 we have introduced a Google Books API Tool that leverages the Google Books API found at [https://developers.google.com/books/docs/v1/using](https://developers.google.com/books/docs/v1/using) to generate book recommendations.
### Sample Usage
```python
from langchain_community.tools import GoogleBooksQueryRun
from langchain_community.utilities import GoogleBooksAPIWrapper

api_wrapper = GoogleBooksAPIWrapper()
tool = GoogleBooksQueryRun(api_wrapper=api_wrapper)
tool.run('ai')
```
### Sample Output
```txt
Here are 5 suggestions based off your search for books related to ai: 1. "AI's Take on the Stigma Against AI-Generated Content" by Sandy Y. Greenleaf: In a world where artificial intelligence (AI) is rapidly advancing and transforming various industries, a new form of content creation has emerged: AI-generated content. However, despite its potential to revolutionize the way we produce and consume information, AI-generated content often faces a significant stigma. "AI's Take on the Stigma Against AI-Generated Content" is a groundbreaking book that delves into the heart of this issue, exploring the reasons behind the stigma and offering a fresh, unbiased perspective on the topic. Written from the unique viewpoint of an AI, this book provides readers with a comprehensive understanding of the challenges and opportunities surrounding AI-generated content. Through engaging narratives, thought-provoking insights, and real-world examples, this book challenges readers to reconsider their preconceptions about AI-generated content. It explores the potential benefits of embracing this technology, such as increased efficiency, creativity, and accessibility, while also addressing the concerns and drawbacks that contribute to the stigma. 
As you journey through the pages of this book, you'll gain a deeper understanding of the complex relationship between humans and AI in the realm of content creation. You'll discover how AI can be used as a tool to enhance human creativity, rather than replace it, and how collaboration between humans and machines can lead to unprecedented levels of innovation. Whether you're a content creator, marketer, business owner, or simply someone curious about the future of AI and its impact on our society, "AI's Take on the Stigma Against AI-Generated Content" is an essential read. With its engaging writing style, well-researched insights, and practical strategies for navigating this new landscape, this book will leave you equipped with the knowledge and tools needed to embrace the AI revolution and harness its potential for success. Prepare to have your assumptions challenged, your mind expanded, and your perspective on AI-generated content forever changed. Get ready to embark on a captivating journey that will redefine the way you think about the future of content creation. Read more at https://play.google.com/store/books/details?id=4iH-EAAAQBAJ&source=gbs_api 2. "AI Strategies For Web Development" by Anderson Soares Furtado Oliveira: From fundamental to advanced strategies, unlock useful insights for creating innovative, user-centric websites while navigating the evolving landscape of AI ethics and security Key Features Explore AI's role in web development, from shaping projects to architecting solutions Master advanced AI strategies to build cutting-edge applications Anticipate future trends by exploring next-gen development environments, emerging interfaces, and security considerations in AI web development Purchase of the print or Kindle book includes a free PDF eBook Book Description If you're a web developer looking to leverage the power of AI in your projects, then this book is for you. 
Written by an AI and ML expert with more than 15 years of experience, AI Strategies for Web Development takes you on a transformative journey through the dynamic intersection of AI and web development, offering a hands-on learning experience.The first part of the book focuses on uncovering the profound impact of AI on web projects, exploring fundamental concepts, and navigating popular frameworks and tools. As you progress, you'll learn how to build smart AI applications with design intelligence, personalized user journeys, and coding assistants. Later, you'll explore how to future-proof your web development projects using advanced AI strategies and understand AI's impact on jobs. Toward the end, you'll immerse yourself in AI-augmented development, crafting intelligent web applications and navigating the ethical landscape.Packed with insights into next-gen development environments, AI-augmented practices, emerging realities, interfaces, and security governance, this web development book acts as your roadmap to staying ahead in the AI and web development domain. What you will learn Build AI-powered web projects with optimized models Personalize UX dynamically with AI, NLP, chatbots, and recommendations Explore AI coding assistants and other tools for advanced web development Craft data-driven, personalized experiences using pattern recognition Architect effective AI solutions while exploring the future of web development Build secure and ethical AI applications following TRiSM best practices Explore cutting-edge AI and web development trends Who this book is for This book is for web developers with experience in programming languages and an interest in keeping up with the latest trends in AI-powered web development. Full-stack, front-end, and back-end developers, UI/UX designers, software engineers, and web development enthusiasts will also find valuable information and practical guidelines for developing smarter websites with AI. 
To get the most out of this book, it is recommended that you have basic knowledge of programming languages such as HTML, CSS, and JavaScript, as well as a familiarity with machine learning concepts. Read more at https://play.google.com/store/books/details?id=FzYZEQAAQBAJ&source=gbs_api 3. "Artificial Intelligence for Students" by Vibha Pandey: A multifaceted approach to develop an understanding of AI and its potential applications KEY FEATURES ● AI-informed focuses on AI foundation, applications, and methodologies. ● AI-inquired focuses on computational thinking and bias awareness. ● AI-innovate focuses on creative and critical thinking and the Capstone project. DESCRIPTION AI is a discipline in Computer Science that focuses on developing intelligent machines, machines that can learn and then teach themselves. If you are interested in AI, this book can definitely help you prepare for future careers in AI and related fields. The book is aligned with the CBSE course, which focuses on developing employability and vocational competencies of students in skill subjects. The book is an introduction to the basics of AI. It is divided into three parts – AI-informed, AI-inquired and AI-innovate. It will help you understand AI's implications on society and the world. You will also develop a deeper understanding of how it works and how it can be used to solve complex real-world problems. Additionally, the book will also focus on important skills such as problem scoping, goal setting, data analysis, and visualization, which are essential for success in AI projects. Lastly, you will learn how decision trees, neural networks, and other AI concepts are commonly used in real-world applications. By the end of the book, you will develop the skills and competencies required to pursue a career in AI. WHAT YOU WILL LEARN ● Get familiar with the basics of AI and Machine Learning. ● Understand how and where AI can be applied. ● Explore different applications of mathematical methods in AI. 
● Get tips for improving your skills in Data Storytelling. ● Understand what is AI bias and how it can affect human rights. WHO THIS BOOK IS FOR This book is for CBSE class XI and XII students who want to learn and explore more about AI. Basic knowledge of Statistical concepts, Algebra, and Plotting of equations is a must. TABLE OF CONTENTS 1. Introduction: AI for Everyone 2. AI Applications and Methodologies 3. Mathematics in Artificial Intelligence 4. AI Values (Ethical Decision-Making) 5. Introduction to Storytelling 6. Critical and Creative Thinking 7. Data Analysis 8. Regression 9. Classification and Clustering 10. AI Values (Bias Awareness) 11. Capstone Project 12. Model Lifecycle (Knowledge) 13. Storytelling Through Data 14. AI Applications in Use in Real-World Read more at https://play.google.com/store/books/details?id=ptq1EAAAQBAJ&source=gbs_api 4. "The AI Book" by Ivana Bartoletti, Anne Leslie and Shân M. Millie: Written by prominent thought leaders in the global fintech space, The AI Book aggregates diverse expertise into a single, informative volume and explains what artifical intelligence really means and how it can be used across financial services today. Key industry developments are explained in detail, and critical insights from cutting-edge practitioners offer first-hand information and lessons learned. Coverage includes: · Understanding the AI Portfolio: from machine learning to chatbots, to natural language processing (NLP); a deep dive into the Machine Intelligence Landscape; essentials on core technologies, rethinking enterprise, rethinking industries, rethinking humans; quantum computing and next-generation AI · AI experimentation and embedded usage, and the change in business model, value proposition, organisation, customer and co-worker experiences in today’s Financial Services Industry · The future state of financial services and capital markets – what’s next for the real-world implementation of AITech? 
· The innovating customer – users are not waiting for the financial services industry to work out how AI can re-shape their sector, profitability and competitiveness · Boardroom issues created and magnified by AI trends, including conduct, regulation & oversight in an algo-driven world, cybersecurity, diversity & inclusion, data privacy, the ‘unbundled corporation’ & the future of work, social responsibility, sustainability, and the new leadership imperatives · Ethical considerations of deploying Al solutions and why explainable Al is so important Read more at http://books.google.ca/books?id=oE3YDwAAQBAJ&dq=ai&hl=&source=gbs_api 5. "Artificial Intelligence in Society" by OECD: The artificial intelligence (AI) landscape has evolved significantly from 1950 when Alan Turing first posed the question of whether machines can think. Today, AI is transforming societies and economies. It promises to generate productivity gains, improve well-being and help address global challenges, such as climate change, resource scarcity and health crises. Read more at https://play.google.com/store/books/details?id=eRmdDwAAQBAJ&source=gbs_api ``` ## Issue This closes #27276 ## Dependencies No additional dependencies were added --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-07 15:29:35 -08:00
Erick Friis	733e43eed0	docs: new stack diagram (#27972 )	2024-11-07 22:46:56 +00:00
Erick Friis	a073c4c498	templates,docs: leave templates in v0.2 (#27952 ) all template installs will now have to declare `--branch v0.2` to make clear they aren't compatible with langchain 0.3 (most have a pydantic v1 setup). e.g. ``` langchain-cli app add pirate-speak --branch v0.2 ```	2024-11-07 22:23:48 +00:00
Shawn Lee	6f368e9eab	community: handle chatdeepinfra jsondecode error (#27603 ) Fixes #27602 Added error handling to return empty dict if args is empty string or None. Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-07 13:47:19 -08:00
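A minimal sketch of the guard described in the commit above, assuming the tool-call arguments arrive as a JSON string. The helper name `parse_tool_args` is hypothetical and is not taken from the ChatDeepInfra code.

```python
import json
from typing import Any, Dict, Optional


def parse_tool_args(raw_args: Optional[str]) -> Dict[str, Any]:
    """Hypothetical helper: decode tool-call arguments, tolerating empty input."""
    # An empty string or None would make json.loads raise, so return {} instead.
    if raw_args is None or raw_args.strip() == "":
        return {}
    return json.loads(raw_args)
```

Returning an empty dict keeps downstream code that iterates over the arguments working without a special case for missing input.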
Akshata	05fd6a16a9	Add ChatModels wrapper for Cloudflare Workers AI (#27645 ) - PR title: "community: chat models wrapper for Cloudflare Workers AI" - Description: Add chat models wrapper for Cloudflare Workers AI. Enables LangGraph integration via ChatModel for tool usage, agentic usage. --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-11-07 15:34:24 -05:00
Erick Friis	8a5b9bf2ad	box: migrate to repo (#27969 )	2024-11-07 10:19:22 -08:00
ccurme	a747dbd24b	anthropic[patch]: remove retired model from tests (#27965 ) `claude-instant` was [retired yesterday](https://docs.anthropic.com/en/docs/resources/model-deprecations).	2024-11-07 16:16:29 +00:00
Aksel Joonas Reedi	2cb39270ec	community: bytes as a source to `AzureAIDocumentIntelligenceLoader` (#26618 ) - Description: This PR adds functionality to pass in in-memory bytes as a source to `AzureAIDocumentIntelligenceLoader`. - Issue: I needed the functionality, so I added it. - Dependencies: NA - Twitter handle: @akseljoonas if this is a big enough change :) --------- Co-authored-by: Aksel Joonas Reedi <aksel@klippa.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-07 03:40:21 +00:00
Martin Triska	7a9149f5dd	community: ZeroxPDFLoader (#27800 ) # OCR-based PDF loader This implements the [Zerox](https://github.com/getomni-ai/zerox) PDF document loader. Zerox uses a simple but very powerful (though slower and more costly) approach to parsing PDF documents: it converts the PDF to a series of images and passes them to a vision model, requesting the contents in markdown. It is especially suitable for complex PDFs that are not parsed well by other alternatives. ## Example use:
```python
import os

from langchain_community.document_loaders.pdf import ZeroxPDFLoader

os.environ["OPENAI_API_KEY"] = ""  # your API key
model = "gpt-4o-mini"  # OpenAI model
pdf_url = "https://assets.ctfassets.net/f1df9zr7wr1a/soP1fjvG1Wu66HJhu3FBS/034d6ca48edb119ae77dec5ce01a8612/OpenAI_Sacra_Teardown.pdf"

loader = ZeroxPDFLoader(file_path=pdf_url, model=model)
docs = loader.load()
```
The Zerox library supports a wide range of providers/models. See the Zerox documentation for details. - Dependencies: `zerox` - Twitter handle: @martintriska1 --------- Co-authored-by: Erick Friis <erickfriis@gmail.com>	2024-11-07 03:14:57 +00:00
Dmitriy Prokopchuk	53b0a99f37	community: Memcached LLM Cache Integration (#27323 ) ## Description This PR adds support for Memcached as a usable LLM model cache by adding the ```MemcachedCache``` implementation relying on the [pymemcache](https://github.com/pinterest/pymemcache) client. Unit test-wise, the new integration is generally covered under existing import testing. All new functionality depends on pymemcache if instantiated and used, so to comply with the other cache implementations the PR also adds optional integration tests for ```MemcachedCache```. Since this is a new integration, documentation is added for Memcached as an integration and as an LLM Cache. ## Issue This PR closes #27275 which was originally raised as a discussion in #27035 ## Dependencies There are no new required dependencies for langchain, but [pymemcache](https://github.com/pinterest/pymemcache) is required to instantiate the new ```MemcachedCache```. ## Example Usage
```python3
from langchain.globals import set_llm_cache
from langchain_openai import OpenAI
from langchain_community.cache import MemcachedCache
from pymemcache.client.base import Client

llm = OpenAI(model="gpt-3.5-turbo-instruct", n=2, best_of=2)
set_llm_cache(MemcachedCache(Client('localhost')))

# The first time, it is not yet in cache, so it should take longer
llm.invoke("Which city is the most crowded city in the USA?")

# The second time it is, so it goes faster
llm.invoke("Which city is the most crowded city in the USA?")
```
--------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-07 03:07:59 +00:00
ZhangShenao	c2072d909a	Improvement[Partner] Improve qdrant vector store (#27251 ) - Add static method decorator - Add args for api doc - Fix word spelling Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-07 02:42:41 +00:00
Baptiste Pasquier	81f7daa458	community: add InfinityRerank (#27043 ) Description: - Add a Reranker for Infinity server. Dependencies: This wrapper uses [infinity_client](https://github.com/michaelfeil/infinity/tree/main/libs/client_infinity/infinity_client) to connect to an Infinity server. Tests and docs - integration test: test_infinity_rerank.py - example notebook: infinity_rerank.ipynb [here](https://github.com/baptiste-pasquier/langchain/blob/feat/infinity-rerank/docs/docs/integrations/document_transformers/infinity_rerank.ipynb) --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-06 17:26:30 -08:00
Martin Triska	90189f5639	community: Allow other than default parsers in SharePointLoader and OneDriveLoader (#27716 ) ## What this PR does? ### Currently `O365BaseLoader` (and consequently both derived loaders) are limited to `pdf`, `doc`, `docx` files. - Solution: here we introduce a _handlers_ attribute that allows custom handlers to be passed in. This is done in _dict_ form: Example:
```python
# PR for DocumentLoaderAsParser here: https://github.com/langchain-ai/langchain/pull/27749
from langchain_community.document_loaders.parsers.documentloader_adapter import DocumentLoaderAsParser
from langchain_community.document_loaders.excel import UnstructuredExcelLoader
# The import paths below are not in the original message; they are assumed
# from the langchain_community module layout.
from langchain_community.document_loaders.parsers.msword import MsWordParser
from langchain_community.document_loaders.parsers.pdf import PDFMinerParser
from langchain_community.document_loaders.parsers.txt import TextParser
from langchain_community.document_loaders.sharepoint import SharePointLoader
from langchain_community.document_loaders.onedrive import OneDriveLoader

xlsx_parser = DocumentLoaderAsParser(UnstructuredExcelLoader, mode="paged")

# create dictionary mapping file types to handlers (parsers)
handlers = {
    "doc": MsWordParser(),
    "pdf": PDFMinerParser(),
    "txt": TextParser(),
    "xlsx": xlsx_parser,
}

loader = SharePointLoader(
    document_library_id="...",
    handlers=handlers,  # pass handlers to SharePointLoader
)
documents = loader.load()

# works the same in OneDriveLoader
loader = OneDriveLoader(
    document_library_id="...",
    handlers=handlers,
)
```
This dictionary is then passed to `MimeTypeBasedParser` same as in the [current implementation](`5a2cfb49e0/libs/community/langchain_community/document_loaders/parsers/registry.py (L13)`). ### Currently `SharePointLoader` and `OneDriveLoader` are separate loaders that both inherit from `O365BaseLoader` However both of these implement the same functionality. The only differences are: - `SharePointLoader` requires argument `document_library_id` whereas `OneDriveLoader` requires `drive_id`. These are just different names for the same thing. - `SharePointLoader` implements significantly more features. 
- Solution: `OneDriveLoader` is replaced with an empty shell just renaming `drive_id` to `document_library_id` and inheriting from `SharePointLoader` Dependencies: None Twitter handle: @martintriska1 If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.	2024-11-06 17:44:34 -05:00
takahashi	482c168b3e	langchain_core: add `file_type` option to make file type default as `png` (#27855 ) Description: `langchain_core.runnables.graph_mermaid.draw_mermaid_png` calls this function, but the Mermaid API returns JPEG by default. To be consistent, add the option `file_type` with the default `png` type. Tests and docs: with this small change, I didn't add tests and docs. One long sentence was divided into two.	2024-11-06 22:37:07 +00:00
Roman Solomatin	0f85dea8c8	langchain-huggingface: use separate kwargs for queries and docs (#27857 ) Currently `encode_kwargs` is used for both documents and queries, which leads to wrong embeddings. E.g.:
```python
import numpy as np
from langchain_huggingface import HuggingFaceEmbeddings

model_kwargs = {"device": "cuda", "trust_remote_code": True}
encode_kwargs = {"normalize_embeddings": False, "prompt_name": "s2p_query"}

model = HuggingFaceEmbeddings(
    model_name="dunzhang/stella_en_400M_v5",
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
)

query_embedding = np.array(
    model.embed_query("What are some ways to reduce stress?")
)
document_embedding = np.array(
    model.embed_documents(
        [
            "There are many effective ways to reduce stress. Some common techniques include deep breathing, meditation, and physical activity. Engaging in hobbies, spending time in nature, and connecting with loved ones can also help alleviate stress. Additionally, setting boundaries, practicing self-care, and learning to say no can prevent stress from building up.",
            "Green tea has been consumed for centuries and is known for its potential health benefits. It contains antioxidants that may help protect the body against damage caused by free radicals. Regular consumption of green tea has been associated with improved heart health, enhanced cognitive function, and a reduced risk of certain types of cancer. The polyphenols in green tea may also have anti-inflammatory and weight loss properties.",
        ]
    )
)
print(model._client.similarity(query_embedding, document_embedding))
# output: tensor([[0.8421, 0.3317]], dtype=torch.float64)
```
But according to the [model card](https://huggingface.co/dunzhang/stella_en_400M_v5#sentence-transformers), the expected usage looks like this:
```python
import numpy as np
from langchain_huggingface import HuggingFaceEmbeddings

model_kwargs = {"device": "cuda", "trust_remote_code": True}
encode_kwargs = {"normalize_embeddings": False}
query_encode_kwargs = {"normalize_embeddings": False, "prompt_name": "s2p_query"}

model = HuggingFaceEmbeddings(
    model_name="dunzhang/stella_en_400M_v5",
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
    query_encode_kwargs=query_encode_kwargs,
)

query_embedding = np.array(
    model.embed_query("What are some ways to reduce stress?")
)
document_embedding = np.array(
    model.embed_documents(
        [
            "There are many effective ways to reduce stress. Some common techniques include deep breathing, meditation, and physical activity. Engaging in hobbies, spending time in nature, and connecting with loved ones can also help alleviate stress. Additionally, setting boundaries, practicing self-care, and learning to say no can prevent stress from building up.",
            "Green tea has been consumed for centuries and is known for its potential health benefits. It contains antioxidants that may help protect the body against damage caused by free radicals. Regular consumption of green tea has been associated with improved heart health, enhanced cognitive function, and a reduced risk of certain types of cancer. The polyphenols in green tea may also have anti-inflammatory and weight loss properties.",
        ]
    )
)
print(model._client.similarity(query_embedding, document_embedding))
# tensor([[0.8398, 0.2990]], dtype=torch.float64)
```
	2024-11-06 17:35:39 -05:00
Bagatur	60123bef67	docs: fix trim_messages docstring (#27948 )	2024-11-06 22:25:13 +00:00
Eric Pinzur	ea0ad917b0	community: added Document.id support to opensearch vectorstore (#27945 ) Description: * Added support of Document.id on OpenSearch vector store * Added tests cases to match	2024-11-06 15:04:09 -05:00
Bagatur	67ce05a0a7	core[patch]: make oai tool description optional (#27756 )	2024-11-06 18:06:47 +00:00
Bagatur	b2da3115ed	docs: document init_chat_model standard params (#27812 )	2024-11-06 09:50:07 -08:00
Dobiichi-Origami	395674d503	community: re-arrange function call message parse logic for Qianfan (#27935 ) The [PR](https://github.com/langchain-ai/langchain/pull/26208) from two months ago has a potential bug that causes `tool_call` for `QianfanChatEndpoint` to malfunction and was awaiting a fix.	2024-11-06 09:58:16 -05:00
ccurme	66966a6e72	openai[patch]: release 0.2.6 (#27924 ) Some additions in support of [predicted outputs](https://platform.openai.com/docs/guides/latency-optimization#use-predicted-outputs) feature: - Bump openai sdk version - Add integration test - Add example to integration docs The `prediction` kwarg is already plumbed through model invocation.	2024-11-05 23:02:24 +00:00
Erick Friis	a8c473e114	standard-tests: ci pipeline (#27923 )	2024-11-05 20:55:38 +00:00
Erick Friis	bff2a8b772	standard-tests: add tools standard tests (#27899 )	2024-11-05 11:44:34 -08:00
SHJUN	f6b2f82099	community: chroma error patch(attribute changed on chroma) (#27827 ) The attribute `max_batch_size` was renamed in Chroma; it is now the method `get_max_batch_size`. I want to use `create_batches`, which is right below it. Please check this PR link. reference: https://github.com/chroma-core/chroma/pull/2305 --------- Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> Co-authored-by: Prithvi Kannan <46332835+prithvikannan@users.noreply.github.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Jun Yamog <jkyamog@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: ono-hiroki <86904208+ono-hiroki@users.noreply.github.com> Co-authored-by: Dobiichi-Origami <56953648+Dobiichi-Origami@users.noreply.github.com> Co-authored-by: Chester Curme <chester.curme@gmail.com> Co-authored-by: Duy Huynh <vndee.huynh@gmail.com> Co-authored-by: Rashmi Pawar <168514198+raspawar@users.noreply.github.com> Co-authored-by: sifatj <26035630+sifatj@users.noreply.github.com> Co-authored-by: Eric Pinzur <2641606+epinzur@users.noreply.github.com> Co-authored-by: Daniel Vu Dao <danielvdao@users.noreply.github.com> Co-authored-by: Ofer Mendelevitch <ofermend@gmail.com> Co-authored-by: Stéphane Philippart <wildagsx@gmail.com>	2024-11-05 19:43:11 +00:00
Erick Friis	31f4fb790d	standard-tests: release 0.3.0 (#27900 )	2024-11-04 17:29:15 -08:00
Stéphane Philippart	4b8cd7a09a	community: ✨ Use new OVHcloud batch embedding (#26209 ) - Description: change to do the batch embedding server side and not client side - Twitter handle: @wildagsx --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-11-04 16:40:30 -05:00
Ofer Mendelevitch	d7c39e6dbb	community: update Vectara integration (#27869 ) Thank you for contributing to LangChain! - Description: Updated Vectara integration - Issue: refresh on descriptions across all demos and added UDF reranker - Dependencies: None - Twitter handle: @ofermend --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-04 20:40:39 +00:00
Eric Pinzur	8eb38622a6	community: fixed bug in GraphVectorStoreRetriever (#27846 ) Description: This fixes an issue that was mistakenly introduced in https://github.com/langchain-ai/langchain/pull/27253. The issue currently exists only in `langchain-community==0.3.4`. Test cases were added to prevent this issue in the future. Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-04 20:27:17 +00:00
Bagatur	dfa83531ad	qdrant,nomic[minor]: bump core deps (#27849 )	2024-11-04 20:19:50 +00:00
Erick Friis	0c62684ce1	Revert "infra: add neo4j to package list" (#27887 ) Reverts langchain-ai/langchain#27833 Wait for release	2024-11-04 18:18:38 +00:00
Erick Friis	bcf499df16	infra: add neo4j to package list (#27833 )	2024-11-04 09:24:04 -08:00
Duy Huynh	a487ec47f4	community: set default `output_token_limit` value for `PowerBIToolkit` to fix validation error (#26308 ) ### Description: This PR sets a default value of `output_token_limit = 4000` for the `PowerBIToolkit` to fix an unintentional validation error. ### Problem: When attempting to run a code snippet from [Langchain's PowerBI toolkit documentation](https://python.langchain.com/v0.1/docs/integrations/toolkits/powerbi/) to interact with a `PowerBIDataset`, the following error occurs: ``` pydantic.v1.error_wrappers.ValidationError: 1 validation error for QueryPowerBITool output_token_limit none is not an allowed value (type=type_error.none.not_allowed) ``` ### Root Cause: The issue arises because when creating a `QueryPowerBITool`, the `output_token_limit` parameter is unintentionally set to `None`, which is the current default for `PowerBIToolkit`. However, `QueryPowerBITool` expects a default value of `4000` for `output_token_limit`. This unintended override causes the error. `17659ca2cd/libs/community/langchain_community/agent_toolkits/powerbi/toolkit.py (L63)` `17659ca2cd/libs/community/langchain_community/agent_toolkits/powerbi/toolkit.py (L72-L79)` `17659ca2cd/libs/community/langchain_community/tools/powerbi/tool.py (L39)` ### Solution: To resolve this, the default value of `output_token_limit` is now explicitly set to `4000` in `PowerBIToolkit` to prevent the accidental assignment of `None`. Co-authored-by: ccurme <chester.curme@gmail.com>	2024-11-04 14:34:27 +00:00
Dobiichi-Origami	f7ced5b211	community: read function call from `tool_calls` for Qianfan (#26208 ) I added one more 'elif' to read tool call message from `tool_calls` --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-11-04 14:33:32 +00:00
Bagatur	3b0b7cfb74	chroma[minor]: release 0.2.0 (#27840 )	2024-11-01 18:12:00 -07:00
Jun Yamog	830cad7bc0	core: fix CommaSeparatedListOutputParser to handle columns that may contain commas in it (#26365 ) - Description: Currently CommaSeparatedListOutputParser can't handle strings that may contain commas within a column. It would parse any commas as the delimiter. Ex. "foo, foo2", "bar", "baz" It will create 4 columns: "foo", "foo2", "bar", "baz" This should be 3 columns: "foo, foo2", "bar", "baz" - Dependencies: Added 2 additional imports, but they are built in python packages. import csv from io import StringIO - Twitter handle: @jkyamog - [ ] Add tests and docs: 1. added simple unit test test_multiple_items_with_comma --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-11-01 22:42:24 +00:00
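The quoting behaviour described in the commit above can be illustrated with the standard-library `csv` and `io.StringIO` imports the message mentions. This is a minimal sketch under that assumption, not the parser's actual code; the function name `parse_comma_separated` is hypothetical.

```python
import csv
from io import StringIO
from typing import List


def parse_comma_separated(text: str) -> List[str]:
    """Hypothetical helper: split on commas while honoring double-quoted items."""
    # skipinitialspace lets a quote that follows ", " still open a quoted field.
    reader = csv.reader(StringIO(text), quotechar='"', skipinitialspace=True)
    return [item.strip() for row in reader for item in row]


print(parse_comma_separated('"foo, foo2", "bar", "baz"'))
# ['foo, foo2', 'bar', 'baz']
```

A plain `text.split(",")` on the same input would yield four items, which is the behaviour the commit describes as the bug.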
Erick Friis	03a3670a5e	infra: remove some special cases (#27839 )	2024-11-01 21:13:43 +00:00
Bagatur	002e1c9055	airbyte: remove from master (#27837 )	2024-11-01 13:59:34 -07:00
Bagatur	ee63d21915	many: use core 0.3.15 (#27834 )	2024-11-01 20:35:55 +00:00
William FH	b4cb2089a2	langchain[patch]: Add warning in react agent (#26980 )	2024-10-31 22:29:34 +00:00
Ant White	e3ea365725	core: use friendlier names for duplicated nodes in mermaid output (#27747 ) - Description: When generating the Mermaid visualization of a chain, if the chain had multiple nodes of the same type, the reid function would replace their names with the UUID node_id. This made the generated graph difficult to understand. This change deduplicates the nodes in a chain by appending an index to their names. - Issue: None - Discussion: https://github.com/langchain-ai/langchain/discussions/27714 - Dependencies: None - Add tests and docs: Currently this functionality is not covered by unit tests, happy to add tests if you'd like 
# Example Code:
```python
from langchain_core.runnables import RunnablePassthrough


def fake_llm(prompt: str) -> str:  # Fake LLM for the example
    return "completion"


runnable = {
    'llm1': fake_llm,
    'llm2': fake_llm,
} | RunnablePassthrough.assign(
    total_chars=lambda inputs: len(inputs['llm1'] + inputs['llm2'])
)

print(runnable.get_graph().draw_mermaid(with_styles=False))
```
# Before
```mermaid
graph TD;
    Parallel_llm1_llm2_Input --> 0b01139db5ed4587ad37964e3a40c0ec;
    0b01139db5ed4587ad37964e3a40c0ec --> Parallel_llm1_llm2_Output;
    Parallel_llm1_llm2_Input --> a98d4b56bd294156a651230b9293347f;
    a98d4b56bd294156a651230b9293347f --> Parallel_llm1_llm2_Output;
    Parallel_total_chars_Input --> Lambda;
    Lambda --> Parallel_total_chars_Output;
    Parallel_total_chars_Input --> Passthrough;
    Passthrough --> Parallel_total_chars_Output;
    Parallel_llm1_llm2_Output --> Parallel_total_chars_Input;
```
# After
```mermaid
graph TD;
    Parallel_llm1_llm2_Input --> fake_llm_1;
    fake_llm_1 --> Parallel_llm1_llm2_Output;
    Parallel_llm1_llm2_Input --> fake_llm_2;
    fake_llm_2 --> Parallel_llm1_llm2_Output;
    Parallel_total_chars_Input --> Lambda;
    Lambda --> Parallel_total_chars_Output;
    Parallel_total_chars_Input --> Passthrough;
    Passthrough --> Parallel_total_chars_Output;
    Parallel_llm1_llm2_Output --> Parallel_total_chars_Input;
```
	2024-10-31 16:52:00 -04:00
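The naming scheme in the "After" graph (an index appended only to names that occur more than once) can be sketched as follows. This is an illustration under stated assumptions, not the code in `langchain_core`; the function name `dedupe_names` is hypothetical.

```python
from collections import Counter
from typing import List


def dedupe_names(names: List[str]) -> List[str]:
    """Hypothetical helper: append a 1-based index to names that repeat."""
    totals = Counter(names)  # how often each name occurs overall
    seen = Counter()         # how many times each name has been emitted so far
    result = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            result.append(f"{name}_{seen[name]}")
        else:
            result.append(name)  # unique names are left unchanged
    return result


print(dedupe_names(["fake_llm", "fake_llm", "Lambda", "Passthrough"]))
# ['fake_llm_1', 'fake_llm_2', 'Lambda', 'Passthrough']
```

Unique names such as `Lambda` keep their plain label, which matches the "After" graph where only the two `fake_llm` nodes carry a suffix.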
L	8ef0df3539	feat: add batch request support for text-embedding-v3 model (#26375 ) PR title: “langchain: add batch request support for text-embedding-v3 model” PR message: • Description: This PR introduces batch request support for the text-embedding-v3 model within LangChain. The new functionality allows users to process multiple text inputs in a single request, improving efficiency and performance for high-volume applications. • Issue: This PR addresses #<issue_number> (if applicable). • Dependencies: No new external dependencies are required for this change. • Twitter handle: If announced on Twitter, please mention me at @yourhandle. Add tests and docs: 1. Added unit tests to cover the batch request functionality, ensuring it operates without requiring network access. 2. Included an example notebook demonstrating the batch request feature, located in docs/docs/integrations. Lint and test: All required formatting and linting checks have been performed using make format and make lint. The changes have been verified with make test to ensure compatibility. Additional notes: • The changes are fully backwards compatible. • No modifications were made to pyproject.toml, ensuring no new dependencies were added. • The update only affects the langchain package and does not involve other packages. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-10-31 18:56:22 +00:00
putao520	2545fbe709	fix "WARNING: Received notification from DBMS server: {severity: WARN… (#27112 ) …ING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: CALL subquery without a variable scope clause is now deprecated." this warning Co-authored-by: putao520 <putao520@putao282.com>	2024-10-31 18:47:25 +00:00
Ankan Mahapatra	905f43377b	Update word_document.py \| Fixed metadata["source"] for web paths (#27220 ) The metadata["source"] value for the web paths was being set to temporary path (/tmp). Fixed it by creating a new variable self.original_file_path, which will store the original path.	2024-10-31 18:37:41 +00:00
Daniel Birn	389771ccc0	community: fix @embeddingKey in azure cosmos db no sql (#27377 ) I will keep this PR as small as the changes made. Description: fixes a fatal bug syntax error in AzureCosmosDBNoSqlVectorSearch Issue: #27269 #25468	2024-10-31 18:36:02 +00:00
Bagatur	06420de2e7	integrations[patch]: bump core to 0.3.15 (#27805 )	2024-10-31 11:27:05 -07:00
W. Gustavo Cevallos	f94125a325	community: Update Polygon.io API (#27552 ) Description: Update the wrapper to support the Polygon API; otherwise you get an error. I kept `STOCKBUSINESS` for backward compatibility with older endpoints / other uses. Old Code: ``` if status not in ("OK", "STOCKBUSINESS"): raise ValueError(f"API Error: {data}") ``` API Response: ``` API Error: {'results': {'P': 0.22, 'S': 0, 'T': 'ZOM', 'X': 5, 'p': 0.123, 'q': 0, 's': 200, 't': 1729614422813395456, 'x': 1, 'z': 1}, 'status': 'STOCKSBUSINESS', 'request_id': 'XXXXXX'} ``` - Issue: N/A Polygon API update - Dependencies: N/A - Twitter handle: @wgcv --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-10-31 18:14:06 +00:00
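The status check quoted in f94125a325 can be sketched as follows. This is a minimal illustration, assuming the fix accepts the new `STOCKSBUSINESS` status alongside the two older values; the names `ACCEPTED_STATUSES` and `check_status` are hypothetical and are not taken from the Polygon wrapper.

```python
# Assumption: the updated check accepts the new status next to the old ones.
ACCEPTED_STATUSES = ("OK", "STOCKBUSINESS", "STOCKSBUSINESS")


def check_status(data: dict) -> dict:
    """Return the payload unchanged, or raise if its status is not accepted."""
    status = data.get("status")
    if status not in ACCEPTED_STATUSES:
        raise ValueError(f"API Error: {data}")
    return data


# The response shape from the commit message, trimmed to two fields.
payload = {"results": {"T": "ZOM"}, "status": "STOCKSBUSINESS"}
checked = check_status(payload)
```

Keeping the accepted values in one tuple means a future status rename only touches a single line.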
Wang	621f78babd	community: [fix] add missing tool_calls kwargs of delta message in openai adapter (#27492 ) - Description: add missing tool_calls kwargs of delta message in openai adapter, then tool call will work correctly via adapter's stream chat completion - Issue: Fixes https://github.com/langchain-ai/langchain/issues/25436 - Dependencies: None	2024-10-31 14:07:17 -04:00
Tao Wang	25a1031871	community: Fix a validation error for MoonshotChat (#27801 ) - Description: Change `MoonshotCommon.client` type from `_MoonshotClient` to `Any`. - Issue: Fix the issue #27058 - Dependencies: No - Twitter handle: TaoWang2218 In PR #17100, the implementation for Moonshot was added, which defined two classes: - `MoonshotChat(MoonshotCommon, ChatOpenAI)` in `langchain_community.chat_models.moonshot`; - Here, `validate_environment()` assigns client as `openai.OpenAI().chat.completions` - Note that client here is actually a member variable defined in `ChatOpenAI`; - `MoonshotCommon` in `langchain_community.llms.moonshot`; - And here, `validate_environment()` assigns _client as `_MoonshotClient`; - Note that this is the underscored _client, which is defined within `MoonshotCommon` itself; At this time, there was no conflict between the two, one being `client` and the other `_client`. However, in PR #25878 which fixed #24390, `_client` in `MoonshotCommon` was changed to `client`. Since then, a conflict in the definition of `client` has arisen between `MoonshotCommon` and `MoonshotChat`, which caused `pydantic` validation error. To fix this issue, the type of `client` in `MoonshotCommon` should be changed to `Any`. Signed-off-by: Tao Wang <twang2218@gmail.com>	2024-10-31 14:00:16 -04:00
Bagatur	e4e2aa0b78	core[patch]: update image util err msg (#27803 )	2024-10-31 10:56:43 -07:00
Bagatur	181bcd0577	core[patch]: Release 0.3.15 (#27802 )	2024-10-31 10:35:02 -07:00
Bagatur	c1e742347f	core[patch]: rm image loading (#27797 )	2024-10-31 10:34:51 -07:00
ZhangShenao	ad0387ac97	Improvement [docs] Improve api docs (#27787 ) - Add missing param - Remove unused param --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-10-31 16:56:44 +00:00
ccurme	0172d938b4	community: add AzureOpenAIWhisperParser (#27796 ) Commandeered from https://github.com/langchain-ai/langchain/pull/26757. --------- Co-authored-by: Sheepsta300 <128811766+Sheepsta300@users.noreply.github.com>	2024-10-31 12:37:41 -04:00
ccurme	b631b0a596	community[patch]: cap SQLAlchemy and update deps (#27792 ) SQLAlchemy 2.0.36 introduces a regression when creating a table in DuckDB. Relevant issues: - In SQLAlchemy repo (resolution is to update DuckDB): https://github.com/sqlalchemy/sqlalchemy/discussions/12011 - In DuckDB repo (PR is open): https://github.com/Mause/duckdb_engine/issues/1128 Plan is to track these issues and remove cap when resolved.	2024-10-31 14:19:09 +00:00
Erick Friis	8ad7adad87	infra: build api docs from package listing (#27774 )	2024-10-30 21:31:01 -07:00
JiaranI	3952ee31b8	ollama: add pydocstyle linting for ollama (#27686 ) Description: add lint docstrings for ollama module Issue: the issue https://github.com/langchain-ai/langchain/issues/23188 @baskaryan test: ruff check passed. Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-31 03:06:55 +00:00
Aayush Kataria	a8a33b2dc6	LangChain-Community - AzureCosmos Mongo vCore: Bug Fix when the data doesn't contain metadata field (#27772 ) - Description: Adding an empty metadata field when metadata is not present in the data - Issue: This PR fixes the issue when the data items don't contain the metadata field. This happens when there is already data in the container, or cx uses CosmosDB Python SDK to insert data. - Dependencies: No dependencies required	2024-10-30 20:05:25 -07:00
Rave Harpaz	8d8d85379f	community: OCI Generative AI tool calling bug fix (#26910 ) - [x] PR title: "community: OCI Generative AI tool calling bug fix" - [x] PR message: - Description: bug fix for streaming chat responses with tool calls. Update to PR 24693 - Issue: chat response content is repeated when streaming - Dependencies: NA - Twitter handle: NA - [x] Add tests and docs: NA - [x] Lint and test: make format, make lint and make test were run successfully --------- Co-authored-by: Arthur Cheng <arthur.cheng@oracle.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-31 02:35:25 +00:00
Erick Friis	128b07208e	community: release 0.3.4 (#27769 )	2024-10-30 17:48:03 -07:00
Bagatur	6691202998	anthropic[patch]: allow multiple sys not at start (#27725 )	2024-10-30 23:56:47 +00:00
Erick Friis	1ed3cd252e	langchain: release 0.3.6 (#27768 )	2024-10-30 23:50:42 +00:00
Sergey Ryabov	8180637345	community[patch]: Fix Playwright Tools bug with Pydantic schemas (#27050 ) - Add tests for Playwright tools schema serialization - Introduce base empty args Input class for BaseBrowserTool Test Plan: `poetry run pytest tests/unit_tests/tools/playwright/test_all.py` Fixes #26758 --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-10-30 23:45:36 +00:00
Bagatur	deb4320d29	core[patch]: Release 0.3.14 (#27764 )	2024-10-30 21:47:33 +00:00
Bagatur	5d337326b0	core[patch]: make get_all_basemodel_annotations public (#27761 )	2024-10-30 14:43:29 -07:00
Bagatur	94ea950c6c	core[patch]: support bedrock converse -> openai tool (#27754 )	2024-10-30 12:20:39 -07:00
Lorenzo	3dfdb3e6fb	community: prevent gitlab commit on main branch for Gitlab tool (#27750 ) ### About - Description: In the Gitlab utilities used for the Gitlab tool there is no check to prevent pushing to the main branch, as this is already done for Github (for example here: `5a2cfb49e0/libs/community/langchain_community/utilities/github.py (L587)`). This PR adds this check, as already done for Github. - Issue: None - Dependencies: None	2024-10-30 18:50:13 +00:00
Sam Julien	0a472e2a2d	community: Add Writer integration (#27646 ) Description: Add support for Writer chat models Issue: N/A Dependencies: Add `writer-sdk` to optional dependencies. Twitter handle: Please tag `@samjulien` and `@Get_Writer` Tests and docs - [x] Unit test - [x] Example notebook in `docs/docs/integrations` directory. Lint and test - [x] Run `make format` - [x] Run `make lint` - [x] Run `make test` --------- Co-authored-by: Johannes <tolstoy.work@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-30 18:06:05 +00:00
ccurme	88bfd60b03	infra: specify python max version of 3.12 for some integration packages (#27740 )	2024-10-30 12:24:48 -04:00
fayvor	3b956b3a97	community: Update Replicate LLM and fix tests (#27655 ) Description: - Fix bug in Replicate LLM class, where it was looking for parameter names in a place where they no longer exist in pydantic 2, resulting in the "Field required" validation error described in the issue. - Fix Replicate LLM integration tests to: - Use active models on Replicate. - Use the correct model parameter `max_new_tokens` as shown in the [Replicate docs](https://replicate.com/docs/guides/language-models/how-to-use#minimum-and-maximum-new-tokens). - Use callbacks instead of deprecated callback_manager. Issue: #26937 Dependencies: n/a Twitter handle: n/a --------- Signed-off-by: Fayvor Love <fayvor@gmail.com> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-10-30 16:07:08 +00:00
ccurme	bd5ea18a6c	groq[patch]: update standard tests (#27744 ) - Add xfail on integration test (fails [> 50% of the time](https://github.com/langchain-ai/langchain/actions/workflows/scheduled_test.yml)); - Remove xfail on passing unit test.	2024-10-30 15:50:51 +00:00
hmn falahi	98bb3a02bd	docs: Add OpenAIAssistantV2Runnable docstrings (#27402 ) - Description: add/improve docstrings of OpenAIAssistantV2Runnable - Issue: the issue #21983 Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-10-30 15:35:51 +00:00
Luiz F. G. dos Santos	7a29ca6200	community: add new parameters to pass to OpenAIAssistantV2Runnable (#27372 ) Description: Added the model parameters to be passed in the OpenAI Assistant. Enabled it at the `OpenAIAssistantV2Runnable` class. Issue: NA Dependencies: None Twitter handle: luizf0992	2024-10-30 10:51:03 -04:00
随风枫叶	18cfb4c067	community: Add token_usage and model_name metadata to ChatZhipuAI stream() and astream() response (#27677 ) - Description: Add token_usage and model_name metadata to ChatZhipuAI stream() and astream() response - Issue: None - Dependencies: None - Twitter handle: None Co-authored-by: jianfehuang <jianfehuang@tencent.com>	2024-10-30 10:34:33 -04:00
tkubo-heroz	028e0253d8	community: Added anthropic.claude-3-5-sonnet-20241022-v2:0 cost details (#27728 ) Added anthropic.claude-3-5-sonnet-20241022-v2:0 cost details	2024-10-30 14:01:01 +00:00
Changyong Um	dc171221b3	community[patch]: Fix vLLM integration to apply lora_request (#27731 ) Description: - Add the `lora_request` parameter to the VLLM class to support LoRA model configurations. This enhancement allows users to specify LoRA requests directly when using VLLM, enabling more flexible and efficient model customization. Issue: - No existing issue for `lora_adapter` in VLLM. This PR addresses the need for configuring LoRA requests within the VLLM framework. - Reference : [Using LoRA Adapters in vLLM](https://docs.vllm.ai/en/stable/models/lora.html#using-lora-adapters) Example Code : Before this change, the `lora_request` parameter was not applied correctly: ```python ADAPTER_PATH = "/path/of/lora_adapter" llm = VLLM(model="Bllossom/llama-3.2-Korean-Bllossom-3B", max_new_tokens=512, top_k=2, top_p=0.90, temperature=0.1, vllm_kwargs={ "gpu_memory_utilization":0.5, "enable_lora":True, "max_model_len":1024, } ) print(llm.invoke( ["...prompt_content..."], lora_request=LoRARequest("lora_adapter", 1, ADAPTER_PATH) )) ``` Before Change Output: ```bash response was not applied lora_request ``` So, I attempted to apply the lora_adapter to langchain_community.llms.vllm.VLLM. current output: ```bash response applied lora_request ``` Dependencies: - None Lint and test: - All tests and lint checks have passed. --------- Co-authored-by: Um Changyong <changyong.um@sfa.co.kr>	2024-10-30 13:59:34 +00:00
Qier LU	8d8e38b090	community[patch]: Add missing custom content_key handling in Redis vector store (#27736 ) This fixes an error caused by missing custom content_key handling in the Redis vector store function similarity_search_with_score.	2024-10-30 13:57:20 +00:00
William FH	5a2cfb49e0	Support message trimming on single messages (#27729 ) Permit trimming message lists of length 1	2024-10-30 04:27:52 +00:00
Bagatur	5111063af2	langchain[patch]: Release 0.3.5 (#27727 )	2024-10-29 17:06:23 -07:00
Bagatur	8f4423e042	text-splitters[patch]: Release 0.3.1 (#27726 )	2024-10-30 00:04:48 +00:00
Harsimran-19	c1d8c33df6	core: JsonOutputParser UTF characters bug (#27306 ) Description: This PR fixes an issue where non-ASCII characters in Pydantic field descriptions were being escaped to their Unicode representations when using `JsonOutputParser`. The change allows non-ASCII characters to be preserved in the output, which is especially important for multilingual support and when working with non-English languages. Issue: Fixes #27256 Example Code: ```python from pydantic import BaseModel, Field from langchain_core.output_parsers import JsonOutputParser class Article(BaseModel): title: str = Field(description="科学文章的标题") output_data_structure = Article parser = JsonOutputParser(pydantic_object=output_data_structure) print(parser.get_format_instructions()) ``` Previous Output: ```... "title": {"description": "\\u79d1\\u5b66\\u6587\\u7ae0\\u7684\\u6807\\u9898", "title": "Title", "type": "string"}} ...``` Current Output: ```... "title": {"description": "科学文章的标题", "title": "Title", "type": "string"}} ...``` Changes made: - Modified `json.dumps()` call in `langchain_core/output_parsers/json.py` to use `ensure_ascii=False` - Added a unit test to verify Unicode handling Co-authored-by: Harsimran-19 <harsimran1869@gmail.com>	2024-10-29 14:48:53 +00:00
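The effect of `ensure_ascii=False` described in c1d8c33df6 can be reproduced with the standard library alone. A minimal sketch, assuming a hand-written schema dict (hypothetical, not the schema `JsonOutputParser` actually generates):

```python
import json

# Hypothetical schema fragment standing in for a Pydantic-generated one.
schema = {"title": {"description": "科学文章的标题", "type": "string"}}

# Default behaviour (ensure_ascii=True) escapes every non-ASCII character.
escaped = json.dumps(schema)

# ensure_ascii=False keeps the characters as written.
preserved = json.dumps(schema, ensure_ascii=False)

print(escaped)
print(preserved)
```

Both strings decode to the same object, so the change affects only how the format instructions read, not what they mean.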
Andrew Effendi	49517cc1e7	partners/huggingface[patch]: fix HuggingFacePipeline model_id parameter (#27514 ) Description: Fixes issue with model parameter not getting initialized correctly when passing transformers pipeline Issue: https://github.com/langchain-ai/langchain/issues/25915	2024-10-29 14:34:46 +00:00
Jeong-Minju	0a465b8032	docs: Fix typo in _action_agent docs section (#27698 ) PR Title: docs: Fix typo in _action_agent function docs section Description: In line 1185, _action_agent function's docs, changing ".agent" to "self.agent". Issue: N/A Dependencies: None --------- Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>	2024-10-29 14:16:42 +00:00
Neil Vachharajani	eec35672a4	core[patch]: Improve type checking for the tool decorator (#27460 ) Description: When annotating a function with the @tool decorator, the symbol should have type BaseTool. The previous type annotations did not convey that to type checkers. This patch creates 4 overloads for the tool function for the 4 different use cases. 1. @tool decorator with no arguments 2. @tool decorator with only keyword arguments 3. @tool decorator with a name argument (and possibly keyword arguments) 4. Invoking tool as function with a name and runnable positional arguments The main function is updated to match the overloads. The changes are 100% backwards compatible (all existing calls should continue to work, just with better type annotations). Twitter handle: @nvachhar --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-10-29 13:59:56 +00:00
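The overload pattern described in eec35672a4 can be illustrated with a simplified decorator. This sketch covers only the first three use cases (the runnable case is left out), and `SketchTool`, `Decorator`, and the `description` keyword are hypothetical stand-ins rather than the `langchain_core` definitions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union, overload


@dataclass
class SketchTool:
    """Hypothetical stand-in for BaseTool; not the langchain_core class."""

    name: str
    func: Callable[..., Any]
    description: str


Decorator = Callable[[Callable[..., Any]], SketchTool]


@overload
def tool(func: Callable[..., Any], /) -> SketchTool: ...  # bare @tool


@overload
def tool(*, description: Optional[str] = None) -> Decorator: ...  # keywords only


@overload
def tool(name: str, /, *, description: Optional[str] = None) -> Decorator: ...  # name first


def tool(
    name_or_func: Union[str, Callable[..., Any], None] = None,
    /,
    *,
    description: Optional[str] = None,
) -> Union[SketchTool, Decorator]:
    """Single runtime implementation that serves all three overloads."""

    def build(func: Callable[..., Any], name: Optional[str] = None) -> SketchTool:
        return SketchTool(name or func.__name__, func, description or func.__doc__ or "")

    if callable(name_or_func):
        # Bare @tool: the decorated function itself was passed in.
        return build(name_or_func)
    name = name_or_func  # a str or None at this point
    return lambda func: build(func, name)


@tool
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b


@tool("multiply", description="Multiply two integers.")
def mul(a: int, b: int) -> int:
    return a * b
```

With the overloads in place, a type checker sees `add` and `mul` as `SketchTool` instead of a union, which is the kind of improvement the commit describes for `BaseTool`.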
Erick Friis	583808a7b8	partners/huggingface: release 0.1.1 (#27691 )	2024-10-28 13:39:38 -07:00
Erick Friis	6d524e9566	partners/box: release 0.2.2 (#27690 )	2024-10-28 12:54:20 -07:00
yahya-mouman	6803cb4f34	openai[patch]: add check for none values when summing token usage (#27585 ) Description: Fixes None addition issues when an empty value is passed on	2024-10-28 12:49:43 -07:00
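The kind of guard described in 6803cb4f34 can be sketched as below. This is a minimal illustration, assuming token usage arrives as flat dicts whose values may be `None`; `add_counts` and `merge_usage` are hypothetical names, not the `langchain-openai` implementation.

```python
from typing import Dict, Optional


def add_counts(left: Optional[int], right: Optional[int]) -> int:
    """Sum two token counts, treating None as zero."""
    return (left or 0) + (right or 0)


def merge_usage(
    a: Dict[str, Optional[int]], b: Dict[str, Optional[int]]
) -> Dict[str, int]:
    """Combine two usage dicts key by key without failing on None values."""
    return {key: add_counts(a.get(key), b.get(key)) for key in sorted(a.keys() | b.keys())}


first = {"prompt_tokens": 3, "completion_tokens": None}
second = {"prompt_tokens": None, "completion_tokens": 5, "total_tokens": 8}
merged = merge_usage(first, second)
```

Without the `or 0` fallback, `3 + None` raises `TypeError`, which matches the "None addition" failure named in the description.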
Bagatur	ede953d617	openai[patch]: fix schema formatting util (#27685 )	2024-10-28 15:46:47 +00:00
Baptiste Pasquier	440c162b8b	community: Fix closed session in Infinity (#26933 ) Description: The `aiohttp.ClientSession` is closed at the end of the with statement, which causes an error during a second call. The implemented fix is to define the session directly within the with block, exactly like in the textembed code: `c6350d636e/libs/community/langchain_community/embeddings/textembed.py (L335-L346)` Issue: Fix #26932 Co-authored-by: ccurme <chester.curme@gmail.com>	2024-10-27 11:37:21 -04:00
Jorge Piedrahita Ortiz	8895d468cb	community: sambastudio llm refactor (#27215 ) Description: - Sambastudio LLM refactor - Sambastudio openai compatible API support added - docs updated	2024-10-27 11:08:15 -04:00
ccurme	fe87e411f2	groq: fix unit test (#27660 )	2024-10-26 14:57:23 -04:00
Erick Friis	fbfc6bdade	core: test runner improvements (#27654 ) when running core tests locally this - prevents langsmith tracing from being enabled by env vars - prevents network calls	2024-10-25 15:06:59 -07:00
Vincent Min	7bc4e320f1	core[patch]: improve performance of InMemoryVectorStore (#27538 ) Description: We improve the performance of the InMemoryVectorStore. Issue: Originally, similarity was computed document by document: ``` for doc in self.store.values(): vector = doc["vector"] similarity = float(cosine_similarity([embedding], [vector]).item(0)) ``` This is inefficient and does not make use of numpy vectorization. This PR computes the similarity in one vectorized go: ``` docs = list(self.store.values()) similarity = cosine_similarity([embedding], [doc["vector"] for doc in docs]) ``` Dependencies: None Twitter handle: @b12_consulting, @Vincent_Min --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-10-25 17:07:04 -04:00
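The difference between the per-document loop and the single vectorized call in 7bc4e320f1 can be shown with plain NumPy. A minimal sketch: `cosine_similarity_batch` is a hypothetical helper written for this example (not the `cosine_similarity` utility the commit calls), and it assumes no vector has zero norm.

```python
import numpy as np


def cosine_similarity_batch(query, vectors):
    """Cosine similarity of one query vector against each row of `vectors`."""
    q = np.asarray(query, dtype=float)
    m = np.asarray(vectors, dtype=float)
    # One matrix-vector product replaces a Python-level loop over documents.
    # Assumption: no zero-norm vectors, so the division is always defined.
    return (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))


# Toy store shaped like the one in the commit message.
store = {
    "a": {"vector": [1.0, 0.0]},
    "b": {"vector": [0.0, 1.0]},
    "c": {"vector": [1.0, 1.0]},
}
embedding = [1.0, 0.0]
docs = list(store.values())

# Vectorized: a single call over all stored vectors.
scores = cosine_similarity_batch(embedding, [doc["vector"] for doc in docs])

# Document by document, for comparison with the original approach.
looped = [float(cosine_similarity_batch(embedding, [doc["vector"]])[0]) for doc in docs]
```

Both paths give the same scores; the vectorized form does the work in compiled NumPy code, so the gap widens as the store grows.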
Bagatur	d5306899d3	openai[patch]: Release 0.2.4 (#27652 )	2024-10-25 20:26:21 +00:00
Erick Friis	600b7bdd61	all: test 3.13 ci (#27197 ) Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-10-25 12:56:58 -07:00
Bagatur	06df15c9c0	core[patch]: Release 0.3.13 (#27651 )	2024-10-25 19:22:44 +00:00
Steve Moss	24605bcdb6	community[patch]: Fix missing protected_namespaces(). (#27610 ) - [x] PR message: - Description: Fixes warning messages raised due to missing `protected_namespaces` parameter in `ConfigDict`. - Issue: https://github.com/langchain-ai/langchain/issues/27609 - Dependencies: No dependencies - Twitter handle: @gawbul	2024-10-25 02:16:26 +00:00
Eugene Yurtsev	7667ee126f	core: remove mustache in extended deps (#27629 ) Remove mustache from extended deps -- we vendor the mustache implementation	2024-10-24 22:12:49 -04:00
Erick Friis	265e0a164a	core: add flake8-bandit (S) ruff rules to core (#27368 ) Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-10-24 22:33:41 +00:00
Nithish Raghunandanan	0623c74560	couchbase: Add document id to vector search results (#27622 ) Description: Returns the document id along with the Vector Search results Issue: Fixes https://github.com/langchain-ai/langchain/issues/26860 for CouchbaseVectorStore - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-24 21:47:36 +00:00
ZhangShenao	455ab7d714	Improvement[Community] Improve Document Loaders and Splitters (#27568 ) - Fix word spelling error - Add static method decorator - Fix language splitter Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-24 21:42:16 +00:00
CLOVA Studio 개발	846a75284f	community: Add Naver chat model & embeddings (#25162 ) Reopened as a personal repo outside the organization. ## Description - Naver HyperCLOVA X community package - Add chat model & embeddings - Add unit test & integration test - Add chat model & embeddings docs - I changed partner package(https://github.com/langchain-ai/langchain/pull/24252) to community package on this PR - Could this embeddings(https://github.com/langchain-ai/langchain/pull/21890) be deprecated? We are trying to replace it with embedding model(ClovaXEmbeddings) in this PR. Twitter handle: None. (if needed, contact with joonha.jeon@navercorp.com) --- you can check our previous discussion below: > one question on namespaces - would it make sense to have these in .clova namespaces instead of .naver? I would like to keep it as is, unless it is essential to unify the package name. (ClovaX is a branding for the model, and I plan to add other models and components. They need to be managed as separate classes.) > also, could you clarify the difference between ClovaEmbeddings and ClovaXEmbeddings? There are 3 models that are being serviced by embedding, and all are supported in the current PR. In addition, all the functionality of CLOVA Studio that serves actual models, such as distinguishing between test apps and service apps, is supported. The existing PR does not support this content because it is hard-coded. --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Vadym Barda <vadym@langchain.dev>	2024-10-24 20:54:13 +00:00
Hyejun An	6227396e20	partners/HuggingFacePipeline[stream]: Change to use `pipeline` instead of `pipeline.model.generate` in stream() (#26531 ) ## Description I encountered an error while using the` gemma-2-2b-it model` with the `HuggingFacePipeline` class and have implemented a fix to resolve this issue. ### What is Problem ```python model_id="google/gemma-2-2b-it" gemma_2_model = AutoModelForCausalLM.from_pretrained(model_id) gemma_2_tokenizer = AutoTokenizer.from_pretrained(model_id) gen = pipeline( task='text-generation', model=gemma_2_model, tokenizer=gemma_2_tokenizer, max_new_tokens=1024, device=0 if torch.cuda.is_available() else -1, temperature=.5, top_p=0.7, repetition_penalty=1.1, do_sample=True, ) llm = HuggingFacePipeline(pipeline=gen) for chunk in llm.stream("Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World."): print(chunk, end="", flush=True) ``` This code outputs the following error message: ``` /usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1258: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation. 
warnings.warn( Exception in thread Thread-19 (generate): Traceback (most recent call last): File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner self.run() File "/usr/lib/python3.10/threading.py", line 953, in run self._target(self._args, self._kwargs) File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context return func(args, *kwargs) File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1874, in generate self._validate_generated_length(generation_config, input_ids_length, has_default_max_length) File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1266, in _validate_generated_length raise ValueError( ValueError: Input length of input_ids is 31, but `max_length` is set to 20. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`. ``` In addition, the following error occurs when the number of tokens is reduced. ```python for chunk in llm.stream("Hello World"): print(chunk, end="", flush=True) ``` ``` /usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1258: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation. warnings.warn( /usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1885: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`. 
warnings.warn( Exception in thread Thread-20 (generate): Traceback (most recent call last): File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner self.run() File "/usr/lib/python3.10/threading.py", line 953, in run self._target(self._args, *self._kwargs) File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context return func(args, kwargs) File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2024, in generate result = self._sample( File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2982, in _sample outputs = self(model_inputs, return_dict=True) File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl return self._call_impl(args, kwargs) File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl return forward_call(args, *kwargs) File "/usr/local/lib/python3.10/dist-packages/transformers/models/gemma2/modeling_gemma2.py", line 994, in forward outputs = self.model( File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl return self._call_impl(args, *kwargs) File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl return forward_call(args, *kwargs) File "/usr/local/lib/python3.10/dist-packages/transformers/models/gemma2/modeling_gemma2.py", line 803, in forward inputs_embeds = self.embed_tokens(input_ids) File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl return self._call_impl(args, *kwargs) File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl return forward_call(args, kwargs) File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/sparse.py", line 164, in forward return F.embedding( File 
"/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 2267, in embedding return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse) RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select) ``` On the other hand, in the case of invoke, the output is normal: ``` llm.invoke("Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World.") ``` ``` 'Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World.\n\nThis is a simple program that prints the phrase "Hello World" to the console. \n\nHere\'s how it works:*\n\n `print("Hello World")`: This line of code uses the `print()` function, which is a built-in function in most programming languages (like Python). The `print()` function takes whatever you put inside its parentheses and displays it on the screen.\n* `"Hello World"`: The text within the double quotes (`"`) is called a string. It represents the message we want to print.\n\n\nLet me know if you\'d like to explore other programming concepts or see more examples! \n' ``` ### Problem Analysis - Apparently, I put kwargs in while generating pipelines and it applied to `invoke()`, but it's not applied in the `stream()`. - When using the stream, `inputs = self.pipeline.tokenizer (prompt, return_tensors = "pt")` enters cpu. - This can crash when the model is in gpu. ### Solution Just use `self.pipeline` instead of `self.pipeline.model.generate`. 
- Original Code ```python stopping_criteria = StoppingCriteriaList([StopOnTokens()]) inputs = self.pipeline.tokenizer(prompt, return_tensors="pt") streamer = TextIteratorStreamer( self.pipeline.tokenizer, timeout=60.0, skip_prompt=skip_prompt, skip_special_tokens=True, ) generation_kwargs = dict( inputs, streamer=streamer, stopping_criteria=stopping_criteria, **pipeline_kwargs, ) t1 = Thread(target=self.pipeline.model.generate, kwargs=generation_kwargs) t1.start() ``` - Updated Code ```python stopping_criteria = StoppingCriteriaList([StopOnTokens()]) streamer = TextIteratorStreamer( self.pipeline.tokenizer, timeout=60.0, skip_prompt=skip_prompt, skip_special_tokens=True, ) generation_kwargs = dict( text_inputs=prompt, streamer=streamer, stopping_criteria=stopping_criteria, **pipeline_kwargs, ) t1 = Thread(target=self.pipeline, kwargs=generation_kwargs) t1.start() ``` By using the `pipeline` directly, the `kwargs` of the pipeline are applied, and there is no need to consider the `device` of the `tensor` made with the `tokenizer`. > To match the switch to `pipeline`, `text_inputs=prompt` is now passed directly in `generation_kwargs`. ## Issue None ## Dependencies None ## Twitter handle None --------- Co-authored-by: Vadym Barda <vadym@langchain.dev>	2024-10-24 16:49:43 -04:00
Bagatur	655ced84d7	openai[patch]: accept json schema response format directly (#27623 ) fix #25460 --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-24 18:19:15 +00:00
Tibor Reiss	20b56a0233	core[patch]: fix repr and str for Serializable (#26786 ) Fixes #26499 --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-10-24 08:36:35 -07:00
Lei Zhang	f203229b51	community: Fix the failure of ChatSparkLLM after upgrading to Pydantic V2 (#27418 ) Description: The test_sparkllm.py can reproduce this issue. https://github.com/langchain-ai/langchain/blob/master/libs/community/tests/integration_tests/chat_models/test_sparkllm.py#L66 ``` Testing started at 18:27 ... Launching pytest with arguments test_sparkllm.py::test_chat_spark_llm --no-header --no-summary -q in /Users/zhanglei/Work/github/langchain/libs/community/tests/integration_tests/chat_models ============================= test session starts ============================== collecting ... collected 1 item test_sparkllm.py::test_chat_spark_llm ============================== 1 failed in 0.45s =============================== FAILED [100%] tests/integration_tests/chat_models/test_sparkllm.py:65 (test_chat_spark_llm) def test_chat_spark_llm() -> None: > chat = ChatSparkLLM( spark_app_id="your spark_app_id", spark_api_key="your spark_api_key", spark_api_secret="your spark_api_secret", ) # type: ignore[call-arg] test_sparkllm.py:67: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ ../../../../core/langchain_core/load/serializable.py:111: in __init__ super().__init__(args, kwargs) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ cls = <class 'langchain_community.chat_models.sparkllm.ChatSparkLLM'> values = {'spark_api_key': 'your spark_api_key', 'spark_api_secret': 'your spark_api_secret', 'spark_api_url': 'wss://spark-api.xf-yun.com/v3.5/chat', 'spark_app_id': 'your spark_app_id', ...} @model_validator(mode="before") @classmethod def validate_environment(cls, values: Dict) -> Any: values["spark_app_id"] = get_from_dict_or_env( values, ["spark_app_id", "app_id"], "IFLYTEK_SPARK_APP_ID", ) values["spark_api_key"] = get_from_dict_or_env( values, ["spark_api_key", "api_key"], "IFLYTEK_SPARK_API_KEY", ) values["spark_api_secret"] = get_from_dict_or_env( values, ["spark_api_secret", "api_secret"], 
"IFLYTEK_SPARK_API_SECRET", ) values["spark_api_url"] = get_from_dict_or_env( values, "spark_api_url", "IFLYTEK_SPARK_API_URL", SPARK_API_URL, ) values["spark_llm_domain"] = get_from_dict_or_env( values, "spark_llm_domain", "IFLYTEK_SPARK_LLM_DOMAIN", SPARK_LLM_DOMAIN, ) # put extra params into model_kwargs default_values = { name: field.default for name, field in get_fields(cls).items() if field.default is not None } > values["model_kwargs"]["temperature"] = default_values.get("temperature") E KeyError: 'model_kwargs' ../../../langchain_community/chat_models/sparkllm.py:368: KeyError ``` I found that when upgrading to Pydantic v2, @root_validator was changed to @model_validator. When a class declares multiple @model_validator(model=before), the execution order in V1 and V2 is opposite. This is the reason for ChatSparkLLM's failure. The correct execution order is to execute build_extra first. https://github.com/langchain-ai/langchain/blob/langchain%3D%3D0.2.16/libs/community/langchain_community/chat_models/sparkllm.py#L302 And then execute validate_environment. https://github.com/langchain-ai/langchain/blob/langchain%3D%3D0.2.16/libs/community/langchain_community/chat_models/sparkllm.py#L329 The Pydantic community also discusses it, but there hasn't been a conclusion yet. https://github.com/pydantic/pydantic/discussions/7434 Issus:* #27416 Twitter handle: coolbeevip --------- Co-authored-by: vbarda <vadym@langchain.dev>	2024-10-23 21:17:10 -04:00
Andrew Effendi	8f151223ad	Community: Fix DuckDuckGo search tool Output Format (#27479 ) Issue: https://github.com/langchain-ai/langchain/issues/22961 Description: Previously, the documentation for `DuckDuckGoSearchResults` said that it returns a JSON string; however, the code returns a regular string that can't be parsed as is. For example, running ```python from langchain_community.tools import DuckDuckGoSearchResults # Create a DuckDuckGo search instance search = DuckDuckGoSearchResults() # Invoke the search result = search.invoke("Obama") # Print the result print(result) # Print the type of the result print("Result Type:", type(result)) ``` will return ``` snippet: Harris will hold a campaign event with former President Barack Obama in Georgia next Thursday, the first time the pair has campaigned side by side, a senior campaign official said. A week from ..., title: Obamas to hit the campaign trail in first joint appearances with Harris, link: https://www.nbcnews.com/politics/2024-election/obamas-hit-campaign-trail-first-joint-appearances-harris-rcna176034, snippet: Item 1 of 3 Former U.S. first lady Michelle Obama and her husband, former U.S. President Barack Obama, stand on stage during Day 2 of the Democratic National Convention (DNC) in Chicago, Illinois ..., title: Obamas set to hit campaign trail with Kamala Harris for first time, link: https://www.reuters.com/world/us/obamas-set-hit-campaign-trail-with-kamala-harris-first-time-2024-10-18/, snippet: Barack and Michelle Obama will make their first campaign appearances alongside Kamala Harris at rallies in Georgia and Michigan. By Reid J. Epstein Reporting from Ashwaubenon, Wis. Here come the ..., title: Harris Will Join Michelle Obama and Barack Obama on Campaign Trail, link: https://www.nytimes.com/2024/10/18/us/politics/kamala-harris-michelle-obama-barack-obama.html, snippet: Obama's leaving office was "a turning point," Mirsky said. "That was the last time anybody felt normal." 
A few feet over, a 64-year-old physics professor named Eric Swanson who had grown ..., title: Obama's reemergence on the campaign trail for Harris comes as he ..., link: https://www.cnn.com/2024/10/13/politics/obama-campaign-trail-harris-biden/index.html Result Type: <class 'str'> ``` After the change in this PR, `DuckDuckGoSearchResults` takes an additional parameter `output_format = "list" | "json" | "string"` ("string" = current behavior, default). For example, invoking `DuckDuckGoSearchResults(output_format="list")` returns a list of dictionaries in the format ``` [{'snippet': '...', 'title': '...', 'link': '...'}, ...] ``` e.g. ``` [{'snippet': "Obama has in a sense been wrestling with Trump's impact since the real estate magnate broke onto the political stage in 2015. Trump's victory the next year, defeating Obama's secretary of ...", 'title': "Obama's fears about Trump drive his stepped-up campaigning", 'link': 'https://www.washingtonpost.com/politics/2024/10/18/obama-trump-anxiety-harris-campaign/'}, {'snippet': 'Harris will hold a campaign event with former President Barack Obama in Georgia next Thursday, the first time the pair has campaigned side by side, a senior campaign official said. A week from ...', 'title': 'Obamas to hit the campaign trail in first joint appearances with Harris', 'link': 'https://www.nbcnews.com/politics/2024-election/obamas-hit-campaign-trail-first-joint-appearances-harris-rcna176034'}, {'snippet': 'Item 1 of 3 Former U.S. first lady Michelle Obama and her husband, former U.S. President Barack Obama, stand on stage during Day 2 of the Democratic National Convention (DNC) in Chicago, Illinois ...', 'title': 'Obamas set to hit campaign trail with Kamala Harris for first time', 'link': 'https://www.reuters.com/world/us/obamas-set-hit-campaign-trail-with-kamala-harris-first-time-2024-10-18/'}, {'snippet': 'Barack and Michelle Obama will make their first campaign appearances alongside Kamala Harris at rallies in Georgia and Michigan. 
By Reid J. Epstein Reporting from Ashwaubenon, Wis. Here come the ...', 'title': 'Harris Will Join Michelle Obama and Barack Obama on Campaign Trail', 'link': 'https://www.nytimes.com/2024/10/18/us/politics/kamala-harris-michelle-obama-barack-obama.html'}] Result Type: <class 'list'> ``` --------- Co-authored-by: vbarda <vadym@langchain.dev>	2024-10-23 20:18:11 -04:00
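The three output formats described in the commit above can be sketched as a small dispatcher. This is only an illustration of the idea, assuming each result is a dict with `snippet`, `title`, and `link` keys; `format_results` is a hypothetical helper, not the function used inside `DuckDuckGoSearchResults`.

```python
import json
from typing import Dict, List, Union


def format_results(
    results: List[Dict[str, str]], output_format: str = "string"
) -> Union[str, List[Dict[str, str]]]:
    """Render search results as a list, a JSON string, or a flat string."""
    if output_format == "list":
        # Hand the dictionaries back untouched.
        return results
    if output_format == "json":
        # A string that json.loads can parse back into the same list.
        return json.dumps(results)
    if output_format == "string":
        # The legacy flat form: "key: value" pairs joined by commas.
        return ", ".join(
            f"{key}: {value}" for item in results for key, value in item.items()
        )
    raise ValueError(f"Unsupported output_format: {output_format!r}")
```

Only the `"json"` and `"list"` forms can be consumed programmatically without custom parsing, which is the gap the PR describes.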
Bagatur	968dccee04	core[patch]: convert_to_openai_tool Anthropic support (#27591 )	2024-10-23 12:27:06 -07:00
Bagatur	217de4e6a6	langchain[patch]: de-beta init_chat_model (#27558 )	2024-10-23 08:35:15 -07:00
Kwan Kin Chan	6d2a76ac05	langchain_huggingface: Fix multiple GPU usage bug in from_model_id function (#23628 ) Description: - pass the device_map into model_kwargs - remove the unused device_map variable in the hf_pipeline function call Issue: issue #13128 When using the from_model_id function to load a Hugging Face model for text generation across multiple GPUs, the model defaults to loading on the CPU despite multiple GPUs being available using the expected format ```python llm = HuggingFacePipeline.from_model_id( model_id="model-id", task="text-generation", device_map="auto", ) ``` Currently, to enable multiple GPUs, we have to pass the variable in this format instead ```python llm = HuggingFacePipeline.from_model_id( model_id="model-id", task="text-generation", device=None, model_kwargs={ "device_map": "auto", } ) ``` This issue arises due to improper handling of the device and device_map parameters. Explanation: 1. In from_model_id, the model is created using model_kwargs and passed as the model variable of the pipeline function. So at this moment, to load the model with multiple GPUs, "device_map" needs to be set to "auto" within model_kwargs. Otherwise, the model defaults to loading on the CPU. 2. The device_map variable in from_model_id is not utilized correctly. In the source code of the transformers pipeline function: - The device_map variable is stored in the model_kwargs dictionary (lines 867-878 of transformers/src/transformers/pipelines/__init__.py). ```python if device_map is not None: ...... model_kwargs["device_map"] = device_map ``` - The model is constructed with model_kwargs containing the device_map value ONLY IF the model is passed as a string (lines 893-903 of transformers/src/transformers/pipelines/__init__.py). ```python if isinstance(model, str) or framework is None: model_classes = {"tf": targeted_task["tf"], "pt": targeted_task["pt"]} framework, model = infer_framework_load_model( ... 
, model_kwargs, ) ``` - Consequently, since a model object is already passed to the pipeline function, the device_map variable from from_model_id is never used. 3. The device_map variable in from_model_id not only appears unused but also causes errors. Without explicitly setting device=None, attempting to load the model on multiple GPUs may result in the following error: ``` Device has 2 GPUs available. Provide device={deviceId} to `from_model_id` to use available GPUs for execution. deviceId is -1 (default) for CPU and can be a positive integer associated with CUDA device id. Traceback (most recent call last): File "foo.py", line 15, in <module> llm = HuggingFacePipeline.from_model_id( File "foo\site-packages\langchain_huggingface\llms\huggingface_pipeline.py", line 217, in from_model_id pipeline = hf_pipeline( File "foo\lib\site-packages\transformers\pipelines\__init__.py", line 1108, in pipeline return pipeline_class(model=model, framework=framework, task=task, **kwargs) File "foo\lib\site-packages\transformers\pipelines\text_generation.py", line 96, in __init__ super().__init__(*args, **kwargs) File "foo\lib\site-packages\transformers\pipelines\base.py", line 835, in __init__ raise ValueError( ValueError: The model has been loaded with `accelerate` and therefore cannot be moved to a specific device. Please discard the `device` argument when creating your pipeline object. ``` This error occurs because, in from_model_id, the default values for device and device_map are -1 and None, respectively. It would pass the statement (`device_map is not None and device < 0`) and keep the device as -1, so the pipeline function later raises an error when trying to move a GPU-loaded model back to the CPU. `19eb82e68b/libs/community/langchain_community/llms/huggingface_pipeline.py (L204-L213)` 
--------- Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: vbarda <vadym@langchain.dev>	2024-10-22 21:41:47 -04:00
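The conflict described in the commit above (a model placed by `device_map` must not also receive a concrete `device`) can be sketched as a small resolution step. This is a hedged illustration under stated assumptions, not the logic merged in the PR: `resolve_device` is a hypothetical helper, and the convention that `-1` means CPU is taken from the warning text quoted in the commit.

```python
from typing import Any, Dict, Optional


def resolve_device(device: Optional[int], model_kwargs: Dict[str, Any]) -> Optional[int]:
    """Decide which `device` value to hand to the pipeline.

    Assumption: when the model is loaded with a `device_map`, the pipeline
    must receive device=None so it does not try to move the model afterwards.
    """
    device_map = model_kwargs.get("device_map")
    if device_map is None:
        # No placement map: keep whatever the caller asked for (-1 means CPU).
        return device
    if device is not None and device >= 0:
        # An explicit GPU index and a device_map contradict each other.
        raise ValueError("Pass either `device` or `device_map`, not both.")
    # device_map decides placement; drop the CPU default.
    return None
```

With this shape, a caller who sets only `device_map="auto"` never reaches the pipeline with the leftover default `device=-1`.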
Fernando de Oliveira	ab205e7389	partners/openai + community: Async Azure AD token provider support for Azure OpenAI (#27488 ) This PR introduces a new `azure_ad_async_token_provider` attribute to the `AzureOpenAI` and `AzureChatOpenAI` classes in the `partners/openai` and `community` packages, given that it is currently supported in the `openai` package as the [AsyncAzureADTokenProvider](https://github.com/openai/openai-python/blob/main/src/openai/lib/azure.py#L33) type. The reason for creating a new attribute is to avoid breaking changes. Say you have existing code that uses an `AzureOpenAI` or `AzureChatOpenAI` instance to perform both sync and async operations. The `azure_ad_token_provider` will work exactly as it does today, while `azure_ad_async_token_provider` will override it for async requests.	2024-10-22 21:43:06 +00:00
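The override behaviour described in the commit above can be sketched as follows. This is a minimal illustration, not the LangChain or `openai` implementation: `token_for_async_request` and both provider functions are hypothetical, and the fallback to the sync provider when no async provider is set is an assumption drawn from "will work exactly as it does today".

```python
import asyncio
from typing import Awaitable, Callable, Optional


def sync_provider() -> str:
    """Hypothetical sync token provider."""
    return "sync-token"


async def async_provider() -> str:
    """Hypothetical async token provider."""
    return "async-token"


async def token_for_async_request(
    sync_p: Optional[Callable[[], str]],
    async_p: Optional[Callable[[], Awaitable[str]]],
) -> Optional[str]:
    """Pick the token source for an async request."""
    if async_p is not None:
        # The async provider takes precedence on the async path.
        return await async_p()
    if sync_p is not None:
        # Assumed fallback: existing sync-only configurations keep working.
        return sync_p()
    return None
```

Keeping the two attributes separate means code that only sets the sync provider is unaffected, which is the non-breaking property the PR is after.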
orkhank	9a277cbe00	community: Update `file_path` type in `JSONLoader.__init__()` signature (#27535 ) - Description: Change the type of the `file_path` argument from `str \| pathlib.Path` to `str \| os.PathLike`, since the latter is more widely used: https://stackoverflow.com/a/58541858 This is a very minor fix. I was just annoyed to see the red underline displayed by Pylance in VS Code: `reportArgumentType`. ![image](https://github.com/user-attachments/assets/719a7f8e-acca-4dfa-89df-925e1d938c71) The changes do not affect the behavior of the code.	2024-10-22 11:18:36 -07:00
Eric Pinzur	f636c83321	community: Cassandra Vector Store: modernize implementation (#27253 ) Description: This PR updates `CassandraGraphVectorStore` to be based off `CassandraVectorStore`, instead of using a custom CQL implementation. This allows users using a `CassandraVectorStore` to upgrade to a `GraphVectorStore` without having to change their database schema or re-embed documents. This PR also updates the documentation of the `GraphVectorStore` base class and contains native async implementations for the standard graph methods: `traversal_search` and `mmr_traversal_search` in `CassandraVectorStore`. Issue: No issue number. Dependencies: https://github.com/langchain-ai/langchain/pull/27078 (already merged) Lint and test: - Lint and tests all pass, including existing `CassandraGraphVectorStore` tests. - Also added numerous additional tests based on the tests in `langchain-astradb`, which cover many more scenarios than the existing tests for `Cassandra` and `CassandraGraphVectorStore`. BREAKING CHANGE: Note that this is a breaking change for existing users of `CassandraGraphVectorStore`. They will need to wipe their database table and restart. However: - The interfaces have not changed, just the underlying storage mechanism. - Anyone using `langchain_community.vectorstores.Cassandra` can instead use `langchain_community.graph_vectorstores.CassandraGraphVectorStore` and they will gain Graph capabilities without having to re-embed their existing documents. This is the primary goal of this PR. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-22 18:11:11 +00:00
Vadym Barda	0640cbf2f1	huggingface[patch]: hide client field in HuggingFaceEmbeddings (#27522 )	2024-10-21 17:37:07 -04:00
Chun Kang Lu	380449a7a9	core: fix Image prompt template hardcoded template format (#27495 ) Fixes #27411 Description: Adds `template_format` to the `ImagePromptTemplate` class and updates passing in the `template_format` parameter from ChatPromptTemplate instead of the hardcoded "f-string". Also updated docs and typing related to `template_format` to be more up-to-date and specific. Dependencies: None Add tests and docs: Added unit tests to validate fix. Needed to update `test_chat` snapshot due to adding new attribute `template_format` in `ImagePromptTemplate`. --------- Co-authored-by: Vadym Barda <vadym@langchain.dev>	2024-10-21 17:31:40 -04:00
bbaltagi-dtsl	403c0ea801	community: fix DallE hidden open_api_key (#26996 ) Issue: #26941 Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-21 19:46:56 +00:00
nodfans	cfcf783cb5	community: fix a typo in planner_prompt.py (#27489 ) Description: Fix typo in planner_prompt.py.	2024-10-21 14:59:33 +00:00
Erick Friis	97a819d578	community: fix lint from new mypy (#27474 )	2024-10-18 20:08:03 +00:00
Erick Friis	c397baa85f	community: release 0.3.3 (#27472 )	2024-10-18 12:52:15 -07:00
Erick Friis	4ceb28009a	mongodb: migrate to repo (#27467 )	2024-10-18 12:35:12 -07:00
Erick Friis	a562c54f7d	azure-dynamic-sessions: migrate to repo (#27468 )	2024-10-18 12:30:48 -07:00
Erick Friis	30660786b3	langchain: release 0.3.4 (#27458 )	2024-10-18 11:59:54 -07:00
Erick Friis	2cf2cefe39	partners/openai: release 0.2.3 (#27457 )	2024-10-18 08:16:01 -07:00
Erick Friis	7d65a32ee0	openai: audio modality, remove sockets from unit tests (#27436 )	2024-10-18 08:02:09 -07:00
Erick Friis	f9cc9bdcf3	core: release 0.3.12 (#27410 )	2024-10-17 06:32:40 -07:00
Erick Friis	0ebddabf7d	docs, core: error messaging [wip] (#27397 )	2024-10-17 03:39:36 +00:00
Eugene Yurtsev	202d7f6c4a	core[patch]: 0.3.11 release (#27403 ) Core bump to 0.3.11	2024-10-16 15:39:37 -04:00
Bagatur	a4392b070d	core[patch]: add convert_to_openai_messages util (#27263 ) Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-16 17:10:10 +00:00
sByteman	31e7664afd	community[minor]: add proxy support to RecursiveUrlLoader (#27364 ) Description: This PR introduces the proxies parameter to the RecursiveUrlLoader class, allowing the user to specify proxy servers for requests. This update enables crawling through proxy servers, providing enhanced flexibility for network configurations. The key changes include: 1. Added an optional proxies parameter to the constructor (__init__). 2. Updated the documentation to explain the proxies parameter usage with an example. 3. Modified the _get_child_links_recursive method to pass the proxies parameter to the requests.get function. Sample Usage: ```python from bs4 import BeautifulSoup as Soup from langchain_community.document_loaders.recursive_url_loader import RecursiveUrlLoader proxies = { "http": "http://localhost:1080", "https": "http://localhost:1080", } url = "https://python.langchain.com/docs/concepts/#langchain-expression-language-lcel" loader = RecursiveUrlLoader( url=url, max_depth=1, extractor=lambda x: Soup(x, "html.parser").text, proxies=proxies ) docs = loader.load() ``` --------- Co-authored-by: root <root@thb>	2024-10-16 16:29:59 +00:00
Yuki Watanabe	b8bfebd382	community: Add deprecation notice for Databricks integration in langchain-community (#27355 ) We have released the [langchain-databricks](https://github.com/langchain-ai/langchain-databricks) package for Databricks integration. This PR deprecates the legacy classes within `langchain-community`. --------- Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-16 02:20:40 +00:00
xsai9101	15c1ddaf99	community: Add support for clob datatype in oracle database (#27330 ) Description: This PR adds support for the clob/blob data types in the Oracle document loader. clob/blob values can only be read by the oracledb package while the connection is open, so the code was reworked to process the data before the connection closes. Dependencies: the oracledb package, same as before (`pip install oracledb`). Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-16 02:19:20 +00:00
Enes Bol	3f74dfc3d8	community[patch]: Fix vLLM integration to filter SamplingParams (#27367 ) Description: - This pull request addresses a bug in LangChain's VLLM integration, where the use_beam_search parameter was erroneously passed to SamplingParams. The SamplingParams class in vLLM does not support the use_beam_search argument, which caused a TypeError. - This PR introduces logic to filter out unsupported parameters, ensuring that only valid parameters are passed to SamplingParams. As a result, the integration now functions as expected without errors. - The bug was reproduced by running the code sample from LangChain's documentation, which triggered the error due to the invalid parameter. This fix resolves that error by implementing proper parameter filtering. VLLM Sampling Params Class: https://github.com/vllm-project/vllm/blob/main/vllm/sampling_params.py Issue: I could not find an existing issue for this. Fixes the "TypeError: Unexpected keyword argument 'use_beam_search'" error when using VLLM from LangChain. Dependencies: None. Tests and Documentation: Tests: No new functionality was added, but I tested the changes by running multiple prompts through the VLLM integration with various parameter configurations. All tests passed successfully without breaking compatibility. Docs: No documentation changes were necessary as this is a bug fix. Reproducing the Error: https://python.langchain.com/docs/integrations/llms/vllm/ The code sample from the original documentation can be used to reproduce the error I got: from langchain_community.llms import VLLM llm = VLLM( model="mosaicml/mpt-7b", trust_remote_code=True, # mandatory for hf models max_new_tokens=128, top_k=10, top_p=0.95, temperature=0.8, ) print(llm.invoke("What is the capital of France ?")) ![image](https://github.com/user-attachments/assets/3782d6ac-1f7b-4acc-bf2c-186216149de5)	2024-10-15 21:57:50 +00:00
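The parameter filtering described in the commit above can be sketched with `inspect.signature`, which reports the keyword arguments a constructor accepts. This is one possible approach under stated assumptions, not necessarily what the PR does: `FakeSamplingParams` is a hypothetical stand-in for vLLM's `SamplingParams`, and `filter_supported_kwargs` is an illustrative helper name.

```python
import inspect
from typing import Any, Dict


class FakeSamplingParams:
    """Hypothetical stand-in for a params class that rejects unknown keyword arguments."""

    def __init__(self, temperature: float = 1.0, top_p: float = 1.0,
                 top_k: int = -1, max_tokens: int = 16):
        self.temperature = temperature
        self.top_p = top_p
        self.top_k = top_k
        self.max_tokens = max_tokens


def filter_supported_kwargs(target: type, params: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the entries that `target`'s constructor accepts by name."""
    accepted = set(inspect.signature(target).parameters)
    return {key: value for key, value in params.items() if key in accepted}


# Usage: the unsupported key is dropped before the constructor is called.
requested = {"temperature": 0.8, "top_k": 10, "use_beam_search": False}
sampling = FakeSamplingParams(**filter_supported_kwargs(FakeSamplingParams, requested))
```

Filtering against the live signature, rather than a hard-coded allow-list, keeps working when the target class adds or removes arguments between versions.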
Erick Friis	edf6d0a0fb	partners/couchbase: release 0.2.0 (attempt 2) (#27375 )	2024-10-15 14:51:05 -07:00
Jorge Piedrahita Ortiz	12fea5b868	community: sambastudio chat model integration minor fix (#27238 ) Description: sambastudio chat model integration minor fix: - fix default params - fix usage metadata when streaming	2024-10-15 13:24:36 -04:00
ZhangShenao	f3925d71b9	community: Fix word spelling in `Text2vecEmbeddings` (#27183 ) Fix word spelling in `Text2vecEmbeddings`	2024-10-15 09:28:48 -07:00
Erick Friis	92ae61bcc8	multiple: rely on asyncio_mode auto in tests (#27200 )	2024-10-15 16:26:38 +00:00
William FH	0a3e089827	[Anthropic] Shallow Copy (#27105 ) Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-15 15:50:48 +00:00
Matthew Peveler	c6533616b6	docs: fix community pgvector deprecation warning formatting (#27094 ) Description: PR fixes some formatting errors in deprecation message in the `langchain_community.vectorstores.pgvector` module, where it was missing spaces between a few words, and one word was misspelled. Issue: n/a Dependencies: n/a Signed-off-by: mpeveler@timescale.com Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-15 15:45:53 +00:00
Erick Friis	3fa5ce3e5f	community: clear mypy syntax warning in openapi (#27370 ) not completely clear the regex is functional	2024-10-15 15:43:53 +00:00
Ahmet Yasin Aytar	443b37403d	community: refactor Arxiv search logic (#27084 ) PR message: Description: This PR refactors the Arxiv API wrapper by extracting the Arxiv search logic into a helper function (_fetch_results) to reduce code duplication and improve maintainability. The helper function is used in methods like get_summaries_as_docs, run, and lazy_load, streamlining the code and making it easier to maintain in the future. Issue: This is a minor refactor, so no specific issue is being fixed. Dependencies: No new dependencies are introduced with this change. Add tests and docs: No new integrations were added, so no additional tests or docs are necessary for this PR. Lint and test: I have run make format, make lint, and make test to ensure all checks pass successfully. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-15 08:43:03 -07:00
Qiu Qin	57fbc6bdf1	community: Update OCI data science integration (#27083 ) This PR updates the integration with OCI data science model deployment service. - Update LLM to support streaming and async calls. - Added chat model. - Updated tests and docs. - Updated `libs/community/scripts/check_pydantic.sh` since the use of `@pre_init` is removed from existing integration. - Updated `libs/community/extended_testing_deps.txt` as this integration requires `langchain_openai`. --------- Co-authored-by: MING KANG <ming.kang@oracle.com> Co-authored-by: Dmitrii Cherkasov <dmitrii.cherkasov@oracle.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-15 08:32:54 -07:00
Rafael Miller	fc14f675f1	Community: Updated Firecrawl Document Loader to v1 (#26548 ) This PR updates the Firecrawl Document Loader to use the recently released V1 API of Firecrawl. Key Updates: Firecrawl V1 Integration: Updated the document loader to leverage the new Firecrawl V1 API for improved performance, reliability, and developer experience. Map Functionality Added: Introduced the map mode for more flexible document loading options. These updates enhance the integration and provide access to the latest features of Firecrawl. --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2024-10-15 13:13:28 +00:00
Max Tran	8fea07f92e	community: fixed KeyError: 'client' (#27345 ) Twitter handle: @MaxHTran Tests and docs: not needed due to small change. --------- Co-authored-by: Max Tran <maxtra@amazon.com>	2024-10-14 20:51:13 +00:00
Martin Triska	8dc4bec947	[community] [Bugfix] base_o365 document loader metadata needs to be JSON serializable (#26322 ) In order for indexer to work, all metadata in the documents need to be JSON serializable. Timestamps are not. See here: https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/indexing/api.py#L83-L89 @eyurtsev could you please review? It's a tiny PR :-)	2024-10-14 12:48:31 -04:00
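The constraint in the commit above (timestamps are not JSON serializable) can be shown with the standard library. This is a minimal sketch, assuming timestamps arrive as `datetime` objects and that an ISO 8601 string is an acceptable replacement; `to_json_safe` is a hypothetical helper, not the code from the PR.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def to_json_safe(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Replace datetime values with ISO 8601 strings so json.dumps accepts the dict."""
    return {
        key: value.isoformat() if isinstance(value, datetime) else value
        for key, value in metadata.items()
    }


raw = {"name": "report.docx", "created": datetime(2024, 10, 14, 12, 0, tzinfo=timezone.utc)}
safe = to_json_safe(raw)
encoded = json.dumps(safe)  # json.dumps(raw) would raise TypeError
```

Here `safe["created"]` is the string `"2024-10-14T12:00:00+00:00"`, which can be parsed back with `datetime.fromisoformat` if the original value is needed.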
Trayan Azarov	59bbda9ba3	chroma: Deprecating versions 0.5.7 thru 0.5.12 (#27305 ) Description: Deprecated versions of Chroma >=0.5.5 <0.5.12 due to a serious correctness issue that caused some embeddings for deployments with multiple collections to be lost (read more on the issue in the Chroma repo) Issue: chroma-core/chroma#2922 (fixed by chroma-core/chroma#2923 and released in [0.5.13](https://github.com/chroma-core/chroma/releases/tag/0.5.13)) Dependencies: N/A Twitter handle: `@t_azarov`	2024-10-14 11:56:05 -04:00
Marcelo Nunes Alves	5647276998	community: Problem with embeddings in new versions of clickhouse. (#26041 ) Starting with ClickHouse version 24.8, a different type of configuration has been introduced for vectorized data ingestion, and when this configuration is present, an error occurs when generating the table, as can be seen below: ![Screenshot from 2024-09-04 11-48-00](https://github.com/user-attachments/assets/70840a93-1001-490c-921a-26924c51d9eb) --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-11 18:54:50 +00:00
Eugene Yurtsev	5b9b8fe80f	core[patch]: Ignore ASYNC110 to upgrade to newest ruff version (#27229 ) Ignoring ASYNC110 with explanation	2024-10-09 11:25:58 -04:00
Vittorio Rigamonti	7da2efd9d3	community[minor]: VectorStore Infinispan. Adding TLS and authentication (#23522 ) Description: This PR enables TLS and authentication (digest, basic) with HTTP/2 for the Infinispan VectorStore, based on httpx. - Added docker-compose facilities for testing - Added documentation Dependencies: requires `pip install httpx[http2]` if HTTP/2 is needed Twitter handle: https://twitter.com/infinispan	2024-10-09 10:51:39 -04:00
Diao Zihao	4553573acb	core[patch],langchain[patch],community[patch]: Bump version dependency of tenacity to >=8.1.0,!=8.4.0,<10 (#27201 ) This should fix the compatibility issue with graphrag, as described in https://github.com/langchain-ai/langchain/discussions/25595 Here are the release notes for tenacity 9 (https://github.com/jd/tenacity/releases/tag/9.0.0) --------- Signed-off-by: Zihao Diao <hi@ericdiao.com> Co-authored-by: Eugene Yurtsev <eugene@langchain.dev> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-10-09 14:00:45 +00:00
Stefano Lottini	d05fdd97dd	community: Cassandra Vector Store: extend metadata-related methods (#27078 ) Description: this PR adds a set of methods to deal with metadata associated to the vector store entries. These, while essential to the Graph-related extension of the `Cassandra` vector store, are also useful in themselves. These are (all come in their sync+async versions): - `[a]delete_by_metadata_filter` - `[a]replace_metadata` - `[a]get_by_document_id` - `[a]metadata_search` Additionally, a `[a]similarity_search_with_embedding_id_by_vector` method is introduced to better serve the store's internal working (esp. related to reranking logic). Issue: no issue number, but now all Document's returned bear their `.id` consistently (as a consequence of a slight refactoring in how the raw entries read from DB are made back into `Document` instances). Dependencies: (no new deps: packaging comes through langchain-core already; `cassio` is now required to be version 0.1.10+) Add tests and docs Added integration tests for the relevant newly-introduced methods. (Docs will be updated in a separate PR). Lint and test Lint and (updated) test all pass. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-09 06:41:34 +00:00
Erick Friis	84c05b031d	community: release 0.3.2 (#27214 )	2024-10-08 23:33:55 -07:00
Serena Ruan	a7c1ce2b3f	[community] Add timeout control and retry for UC tool execution (#26645 ) Add timeout at client side for UCFunctionToolkit and add retry logic. Users could specify environment variable `UC_TOOL_CLIENT_EXECUTION_TIMEOUT` to increase the timeout value for retrying to get the execution response if the status is pending. Default timeout value is 120s. Tested in Databricks. --------- Signed-off-by: serena-ruan <serena.rxy@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-09 06:31:48 +00:00
Tomaz Bratanic	481bd25d29	community: Fix database connections for neo4j (#27190 ) Fixes https://github.com/langchain-ai/langchain/issues/27185 Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-08 23:47:55 +00:00
Erick Friis	cedf4d9462	langchain: release 0.3.3 (#27213 )	2024-10-08 16:39:42 -07:00
Erick Friis	7264fb254c	core: release 0.3.10 (#27209 )	2024-10-08 16:21:42 -07:00
Bagatur	ce33c4fa40	openai[patch]: default temp=1 for o1 (#27206 )	2024-10-08 15:45:21 -07:00
RIdham Golakiya	73ad7f2e7a	langchain_chroma[patch]: updated example for get documents with where clause (#26767 ) Example updated for vectorstore ChromaDB. If we want to apply multiple filters then ChromaDB supports filters like this: Reference: [ChromaDB filters](https://cookbook.chromadb.dev/core/filters/) Thank you.	2024-10-08 20:21:58 +00:00
Bagatur	e3e9ee8398	core[patch]: utils for adding/subtracting usage metadata (#27203 )	2024-10-08 13:15:33 -07:00
ccurme	e3920f2320	community[patch]: fix structured_output in llamacpp integration (#27202 ) Resolves https://github.com/langchain-ai/langchain/issues/25318.	2024-10-08 15:16:59 -04:00
Erick Friis	b84e00283f	standard-tests: test that only one chunk sets input_tokens (#27177 )	2024-10-08 11:35:32 -07:00
Ajayeswar Reddy	9b7bdf1a26	Fixed typo in libs/community/langchain_community/storage/sql.py (#27029 ) PR title: docs: fix typo in SQLStore import path. Description: This PR corrects a typo in the docstrings for the class SQLStore(BaseStore[str, bytes]). The import path in the docstring currently reads `from langchain_rag.storage import SQLStore`, which should be changed to `from langchain_community.storage import SQLStore`. This typo is also reflected in the official documentation. Issue: N/A Dependencies: None Twitter handle: N/A Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-08 17:51:26 +00:00
Vadym Barda	8d27325dbc	core[patch]: support ValidationError from pydantic v1 in tools (#27194 )	2024-10-08 10:19:04 -04:00
Christophe Bornet	16f5fdb38b	core: Add various ruff rules (#26836 ) Adds - ASYNC - COM - DJ - EXE - FLY - FURB - ICN - INT - LOG - NPY - PD - Q - RSE - SLOT - T10 - TID - YTT Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-07 22:30:27 +00:00
Erick Friis	5c826faece	core: update make format to fix all autofixable things (#27174 )	2024-10-07 15:20:47 -07:00
Christophe Bornet	d31ec8810a	core: Add ruff rules for error messages (EM) (#26965 ) All auto-fixes Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-07 22:12:28 +00:00
Oleksii Pokotylo	37ca468d03	community: AzureSearch: fix reranking for empty lists (#27104 ) Description: Fix reranking for empty lists Issue: ``` ValueError: not enough values to unpack (expected 3, got 0) documents, scores, vectors = map(list, zip(*docs)) File langchain_community/vectorstores/azuresearch.py", line 1680, in _reorder_results_with_maximal_marginal_relevance ``` Co-authored-by: Oleksii Pokotylo <oleksii.pokotylo@pwc.com>	2024-10-07 15:27:09 -04:00
Christophe Bornet	c4ebccfec2	core[minor]: Improve support for id in VectorStore (#26660 ) Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-10-07 15:01:08 -04:00
Bharat Ramanathan	931ce8d026	core[patch]: Update `AsyncCallbackManager` to honor `run_inline` attribute and prevent context loss (#26885 ) ## Description This PR fixes the context loss issue in `AsyncCallbackManager`, specifically in `on_llm_start` and `on_chat_model_start` methods. It properly honors the `run_inline` attribute of callback handlers, preventing race conditions and ordering issues. Key changes: 1. Separate handlers into inline and non-inline groups. 2. Execute inline handlers sequentially for each prompt. 3. Execute non-inline handlers concurrently across all prompts. 4. Preserve context for stateful handlers. 5. Maintain performance benefits for non-inline handlers. These changes are implemented in `AsyncCallbackManager` rather than `ahandle_event` because the issue occurs at the prompt and message_list levels, not within individual events. ## Testing - Test case implemented in #26857 now passes, verifying execution order for inline handlers. ## Related Issues - Fixes issue discussed in #23909 ## Dependencies No new dependencies are required. --- @eyurtsev: This PR implements the discussed changes to respect `run_inline` in `AsyncCallbackManager`. Please review and advise on any needed changes. Twitter handle: @parambharat --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-10-07 14:59:29 -04:00
João Carlos Ferra de Almeida	780ce00dea	core[minor]: add kwargs to index and aindex functions for custom vector_field support (#26998 ) Added `kwargs` parameters to the `index` and `aindex` functions in `libs/core/langchain_core/indexing/api.py`. This allows users to pass additional arguments to the `add_documents` and `aadd_documents` methods, enabling the specification of a custom `vector_field`. For example, users can now use `vector_field="embedding"` when indexing documents in `OpenSearchVectorStore` --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-10-07 14:52:50 -04:00
Jorge Piedrahita Ortiz	14de81b140	community: sambastudio chat model (#27056 ) Description: sambastudio chat model integration added, previously only LLM integration included docs and tests --------- Co-authored-by: luisfucros <luisfucros@gmail.com> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-10-07 14:31:39 -04:00
Aditya Anand	f70650f67d	core[patch]: correct typo doc-string for astream_events method (#27108 ) This commit addresses a typographical error in the documentation for the async astream_events method. The word 'evens' was incorrectly used in the introductory sentence for the reference table, which could lead to confusion for users. ### Changes Made: - Corrected 'Below is a table that illustrates some evens that might be emitted by various chains.' to 'Below is a table that illustrates some events that might be emitted by various chains.' This enhancement improves the clarity of the documentation and ensures accurate terminology is used throughout the reference material. Issue Reference: #27107	2024-10-07 14:12:42 -04:00
Bagatur	38099800cc	docs: fix anthropic max_tokens docstring (#27166 )	2024-10-07 16:51:42 +00:00
ogawa	07dd8dd3d7	community[patch]: update gpt-4o cost (#27038 ) updated OpenAI cost definition according to the following: https://openai.com/api/pricing/	2024-10-07 09:06:30 -04:00
Bagatur	06ce5d1d5c	anthropic[patch]: Release 0.2.3 (#27126 )	2024-10-04 22:38:03 +00:00
Bagatur	0b8416bd2e	anthropic[patch]: fix input_tokens when cached (#27125 )	2024-10-04 22:35:51 +00:00
Bagatur	bd5b335cb4	standard-tests[patch]: fix oai usage metadata test (#27122 )	2024-10-04 20:00:48 +00:00
Bagatur	827bdf4f51	fireworks[patch]: Release 0.2.1 (#27120 )	2024-10-04 18:59:15 +00:00
Bagatur	98942edcc9	openai[patch]: Release 0.2.2 (#27119 )	2024-10-04 11:54:01 -07:00
Bagatur	414fe16071	anthropic[patch]: Release 0.2.2 (#27118 )	2024-10-04 11:53:53 -07:00
Bagatur	11df1b2b8d	core[patch]: Release 0.3.9 (#27117 )	2024-10-04 18:35:33 +00:00
Scott Hurrey	558fb4d66d	box: Add citation support to langchain_box.retrievers.BoxRetriever when used with Box AI (#27012 ) Description: Box AI can return responses, but it can also be configured to return citations. This change allows the developer to decide if they want the answer, the citations, or both. Regardless of the combination, this is returned as a single List[Document] object. Dependencies: Updated to the latest Box Python SDK, v1.5.1 Twitter handle: BoxPlatform Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-04 18:32:34 +00:00


6605 Commits