Commit Graph

676 Commits

Author SHA1 Message Date
Hiros
8f5e72de05
community: Correctly handle multi-element rich text (#25762)
**Description:**

- Add _concatenate_rich_text method to combine all elements in rich text
arrays
- Update load_page method to use _concatenate_rich_text for rich text
properties
- Ensure all text content is captured, including inline code and
formatted text
- Add unit tests to verify correct handling of multi-element rich text
This fix prevents truncation of content after backticks or other
formatting elements.
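
Conceptually, the fix boils down to joining every element's rendered text instead of reading only the first one. A minimal sketch of the idea (the real `_concatenate_rich_text` is a loader method and may differ):

```python
from typing import Any, Dict, List

def concatenate_rich_text(rich_text: List[Dict[str, Any]]) -> str:
    # Notion returns rich text as an array of elements; each element
    # carries its rendered text in the "plain_text" field.
    return "".join(element.get("plain_text", "") for element in rich_text)
```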

 **Issue:**

When using the Notion DB Loader, the text for `richtext` and `title`
properties was truncated after the first element, because the loader
only read the first element of the rich text array.

**Dependencies:** None.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 20:20:27 +00:00
Tari Yekorogha
d262d41cc0
community: added FalkorDB vector store support i.e implementation, test, docs an… (#26245)
**Description:** Added support for FalkorDB Vector Store, including its
implementation, unit tests, documentation, and an example notebook. The
FalkorDB integration allows users to efficiently manage and query
embeddings in a vector database, with relevance scoring and maximal
marginal relevance search. The following components were implemented:

- Core implementation for FalkorDBVector store.
- Unit tests ensuring proper functionality and edge case coverage.
- Example notebook demonstrating an end-to-end setup, search, and
retrieval using FalkorDB.

**Twitter handle:** @tariyekorogha

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 19:37:55 +00:00
Aaron Pham
12fced13f4
chore(community): update to OpenLLM 0.6 (#24609)
Update to OpenLLM 0.6, which makes use of OpenLLM's OpenAI-compatible
endpoint. OpenLLM thus becomes a thin wrapper around the OpenAI client.

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

---------

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-12-16 14:30:07 -05:00
Lvlvko
5c17a4ace9
community: support Hunyuan Embedding (#23160)
## description

- I refactored `ChatHunyuan` using the tencentcloud sdk, because I found
the original implementation didn't work in my application
- I added `HunyuanEmbeddings` using the tencentcloud sdk
- Both extend the corresponding LangChain base classes. I have fully
tested them in my application
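
A minimal usage sketch (assuming the class is exported from `langchain_community.embeddings` and Tencent Cloud credentials are configured in the environment):

```python
from langchain_community.embeddings import HunyuanEmbeddings

# Credentials are picked up from the environment / Tencent Cloud SDK config.
embeddings = HunyuanEmbeddings()
vector = embeddings.embed_query("hello")
vectors = embeddings.embed_documents(["hello", "world"])
```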

## Dependencies
- tencentcloud-sdk-python

---------

Co-authored-by: centonhuang <centonhuang@tencent.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 19:27:19 +00:00
Harrison Chase
de7996c2ca
core: add kwargs support to VectorStore (#25934)
has been missing the passthrough until now

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 18:57:57 +00:00
Sheepsta300
580a8d53f9
community: Add configurable VisualFeatures to the AzureAiServicesImageAnalysisTool (#27444)
**PR title**: community: Add configurable `VisualFeatures` to the
`AzureAiServicesImageAnalysisTool`

**Description:** The `AzureAiServicesImageAnalysisTool` is a useful
service that utilises the Azure AI Vision package under the hood.
However, since the creation of this tool, new `VisualFeatures` have been
added to allow the user to request other image-specific information to
be returned. Currently, the tool offers neither configuration of which
features should be returned nor any of the newer feature types. The aim
of this PR is to address this and expose more of the Azure service in
this integration.

**Dependencies:** no new dependencies in the main class file;
azure.ai.vision.imageanalysis added to the extra test dependencies file.

**Tests and docs:** Although no tests exist for the already implemented
Azure Service tools, I've created 3 unit tests for this class that cover
initialisation and credentials, local file analysis, and the new
features option.

**Lint and test**: All linting has passed.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 18:30:04 +00:00
German Martin
d5d18c62b3
community: Apache AGE wrapper additional edge cases. (#28151)
Description:
The current AGEGraph() implementation does some custom wrapping for
graph queries. The relevant method is _wrap_query(), which parses fields
from the original query in order to add some SQL context to it.
This PR improves the parsing logic to cover additional edge cases, which
are added to the test coverage: if any node property name or value
contains the literal "return", the resulting graph / SQL query breaks.
We discovered this while dealing with real-world datasets; it is not an
uncommon scenario and needs to be covered.
2024-12-16 11:28:01 -05:00
Davi Schumacher
0f9b4bf244
community[patch]: update dynamodb chat history to update instead of overwrite (#22397)
**Description:**
The current implementation of `DynamoDBChatMessageHistory` updates the
`History` attribute for a given chat history record by first extracting
the existing contents into memory, appending the new message, and then
using the `put_item` method to put the record back. This has the effect
of overwriting any additional attributes someone may want to include in
the record, like chat session metadata.

This PR suggests changing from using `put_item` to using `update_item`
instead which will keep any other attributes in the record untouched.
The change is backward compatible since
1. `update_item` is an "upsert" operation, creating the record if it
doesn't already exist, otherwise updating it
2. It only touches the db insert call and passes the exact same
information. The rest of the class is left untouched
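
As a hedged sketch of the new write path (assuming a boto3 Table resource; the table, key, and attribute names here are illustrative, not the class's actual configuration):

```python
import boto3

# Illustrative names: the real class derives table, key, and attribute
# names from its own configuration.
table = boto3.resource("dynamodb").Table("SessionTable")

def append_history(session_id: str, messages: list) -> None:
    # update_item upserts: it creates the record if it doesn't exist,
    # otherwise it updates only the History attribute and leaves any
    # other attributes on the record untouched.
    table.update_item(
        Key={"SessionId": session_id},
        UpdateExpression="SET #history = :messages",
        ExpressionAttributeNames={"#history": "History"},
        ExpressionAttributeValues={":messages": messages},
    )
```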

**Dependencies:**
None

**Tests and docs:**
No unit tests currently exist for the `DynamoDBChatMessageHistory`
class. This PR adds the file
`libs/community/tests/unit_tests/chat_message_histories/test_dynamodb_chat_message_history.py`
to test the `add_message` and `clear` methods. I wanted to use the moto
library to mock DynamoDB calls but I could not get poetry to resolve it
so I mocked those calls myself in the test. Therefore, no test
dependencies were added.

The change was tested on a test DynamoDB table as well. The first three
images below show the current behavior. First a message is added to chat
history, then a value is inserted in the record in some other attribute,
and finally another message is added to the record, destroying the other
attribute.

![using_put_1_first_message](https://github.com/langchain-ai/langchain/assets/29493541/426acd62-fe29-42f4-b75f-863fb8b3fb21)

![using_put_2_add_attribute](https://github.com/langchain-ai/langchain/assets/29493541/f8a1c864-7114-4fe3-b487-d6f9252f8f92)

![using_put_3_second_message](https://github.com/langchain-ai/langchain/assets/29493541/8b691e08-755e-4877-8969-0e9769e5d28a)

The next three images show the new behavior. Once again a value is added
to an attribute other than the History attribute, but now when the
followup message is added it does not destroy that other attribute. The
History attribute itself is unaffected by this change.

![using_update_1_first_message](https://github.com/langchain-ai/langchain/assets/29493541/3e0d76ed-637e-41cd-82c7-01a86c468634)

![using_update_2_add_attribute](https://github.com/langchain-ai/langchain/assets/29493541/52585f9b-71a2-43f0-9dfc-9935aa59c729)

![using_update_3_second_message](https://github.com/langchain-ai/langchain/assets/29493541/f94c8147-2d6f-407a-9a0f-86b94341abff)

The doc located at `docs/docs/integrations/memory/aws_dynamodb.ipynb`
required no changes and was tested as well.
2024-12-16 10:38:00 -05:00
Christophe Bornet
6ddd5dbb1e
community: Add FewShotSQLTool (#28232)
The `FewShotSQLTool` gets some SQL query examples from a
`BaseExampleSelector` for a given question.
This is useful to provide [few-shot
examples](https://python.langchain.com/docs/how_to/sql_prompting/#few-shot-examples)
capability to an SQL agent.

Example usage:
```python
from langchain.agents.agent_toolkits.sql.prompt import SQL_PREFIX
from langchain_community.agent_toolkits import create_sql_agent
from langchain_community.vectorstores import AstraDB
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_openai import OpenAIEmbeddings

# FewShotSQLTool is the class added in this PR; import it from wherever
# it lands in langchain_community.

embeddings = OpenAIEmbeddings()

example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    embeddings,
    AstraDB,
    k=5,
    input_keys=["input"],
    collection_name="lc_few_shots",
    token=ASTRA_DB_APPLICATION_TOKEN,
    api_endpoint=ASTRA_DB_API_ENDPOINT,
)

few_shot_sql_tool = FewShotSQLTool(
    example_selector=example_selector,
    description="Input to this tool is the input question, output is a few SQL query examples related to the input question. Always use this tool before checking the query with sql_db_query_checker!"
)

agent = create_sql_agent(
    llm=llm, 
    db=db, 
    prefix=SQL_PREFIX + "\nYou MUST get some example queries before creating the query.", 
    extra_tools=[few_shot_sql_tool]
)

result = agent.invoke({"input": "How many artists are there?"})
```

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 15:37:21 +00:00
Aayush Kataria
d417e4b372
Community: Azure CosmosDB No Sql Vector Store: Full Text and Hybrid Search Support (#28716)

- Added [full
text](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/full-text-search)
and [hybrid
search](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/hybrid-search)
support for Azure CosmosDB NoSql Vector Store
- Added a new enum called CosmosDBQueryType which supports the following
values:
    - VECTOR = "vector"
    - FULL_TEXT_SEARCH = "full_text_search"
    - FULL_TEXT_RANK = "full_text_rank"
    - HYBRID = "hybrid"
- Users now need to provide this query_type to the similarity_search
method so the vector store makes the correct query API call.
- Added a couple of workarounds, as the FULL_TEXT_RANK and HYBRID query
functions don't support parameterized queries right now. I have added
TODOs in place and will remove these workarounds by the end of
January.
- Added necessary test cases and updated the 
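
A hedged usage sketch of the new parameter (enum and argument names taken from the description above; the import path and the store's construction are assumed):

```python
from langchain_community.vectorstores.azure_cosmos_db_no_sql import (
    AzureCosmosDBNoSqlVectorStore,
    CosmosDBQueryType,  # new enum from this PR; import path assumed
)

def hybrid_search(store: AzureCosmosDBNoSqlVectorStore, query: str):
    # query_type tells the store which Cosmos DB query API call to make:
    # VECTOR, FULL_TEXT_SEARCH, FULL_TEXT_RANK, or HYBRID.
    return store.similarity_search(query, k=4, query_type=CosmosDBQueryType.HYBRID)
```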



---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
2024-12-15 13:26:32 -08:00
nhols
a3851cb3bc
community: FAISS vectorstore - consistent Document id field (#28728)
Make sure the id field of Documents in the `FAISS` docstore matches the
values in `index_to_docstore_id`, and implement the `get_by_ids` method.
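
A minimal sketch of the behavior this guarantees (using `FakeEmbeddings` for brevity):

```python
from langchain_community.embeddings import FakeEmbeddings
from langchain_community.vectorstores import FAISS

store = FAISS.from_texts(["hello"], FakeEmbeddings(size=8), ids=["doc-1"])
# After this change, the stored Document's id field matches the docstore
# key, and the new get_by_ids lookup resolves it:
(doc,) = store.get_by_ids(["doc-1"])
assert doc.id == "doc-1"
```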
2024-12-15 12:23:49 -08:00
Ben Chambers
008efada2c
[community]: Render documents to graphviz (#24830)
- **Description:** Adds a helper that renders documents with the
GraphVectorStore metadata fields to Graphviz for visualization. This is
helpful for understanding and debugging.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-14 02:02:09 +00:00
Ryan Parker
12111cb922
community: fallback on core async atransform_documents method for MarkdownifyTransformer (#27866)
# Description
Implements the `atransform_documents` method for
`MarkdownifyTransformer` using the `asyncio` built-in library for
concurrency.

Note that this is mainly for API completeness when working with async
frameworks rather than for performance, since the `markdownify` function
is not I/O bound because it works with `Document` objects already in
memory.
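
A short usage sketch of the async path (the HTML sample is illustrative):

```python
import asyncio

from langchain_community.document_transformers import MarkdownifyTransformer
from langchain_core.documents import Document

async def main() -> None:
    md = MarkdownifyTransformer()
    docs = [Document(page_content="<h1>Title</h1><p>Hello <b>world</b>.</p>")]
    # New in this PR: the async counterpart of transform_documents.
    converted = await md.atransform_documents(docs)
    print(converted[0].page_content)  # prints the Markdown conversion

asyncio.run(main())
```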

# Issue
Fixes #27865

# Dependencies
No new dependencies added, but
[`markdownify`](https://github.com/matthewwithanm/python-markdownify) is
required since this PR updates the `markdownify` integration.

# Tests and docs
- Tests added
- I did not modify the docstrings since they already described the basic
functionality, and [the API docs also already included a
description](https://python.langchain.com/api_reference/community/document_transformers/langchain_community.document_transformers.markdownify.MarkdownifyTransformer.html#langchain_community.document_transformers.markdownify.MarkdownifyTransformer.atransform_documents).
If it would be helpful, I would be happy to update the docstrings and/or
the API docs.

# Lint and test
- [x] format
- [x] lint
- [x] test

I ran formatting with `make format`, linting with `make lint`, and
confirmed that tests pass using `make test`. Note that some unit tests
pass in CI but may fail when running `make test`. Those unit tests are:
- `test_extract_html` (and `test_extract_html_async`)
- `test_strip_tags` (and `test_strip_tags_async`)
- `test_convert_tags` (and `test_convert_tags_async`)

The reason for the difference is that there are trailing spaces when the
tests are run in the CI checks, and no trailing spaces when run with
`make test`. I ensured that the tests pass in CI, but they may fail with
`make test` due to the addition of trailing spaces.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-13 22:32:22 +00:00
Karthik Bharadhwaj
498f0249e2
community[minor]: Opensearch hybridsearch implementation (#25375)
community: add hybrid search in opensearch

# Langchain OpenSearch Hybrid Search Implementation

## Implementation of Hybrid Search: 

I have taken LangChain's OpenSearch integration to the next level by
adding hybrid search capabilities. Building on the existing
OpenSearchVectorSearch class, I have implemented Hybrid Search
functionality (which combines the best of both keyword and semantic
search). This new functionality allows users to harness the power of
OpenSearch's advanced hybrid search features without leaving the
familiar LangChain ecosystem. By blending traditional text matching with
vector-based similarity, the enhanced class delivers more accurate and
contextually relevant results. It's designed to seamlessly fit into
existing LangChain workflows, making it easy for developers to upgrade
their search capabilities.

In implementing the hybrid search for OpenSearch within the LangChain
framework, I also incorporated filtering capabilities. It's important to
note that according to the OpenSearch hybrid search documentation, only
post-filtering is supported for hybrid queries. This means that the
filtering is applied after the hybrid search results are obtained,
rather than during the initial search process.

**Note:** For the implementation of hybrid search, I strictly followed
the official OpenSearch Hybrid search documentation and I took
inspiration from
https://github.com/AndreasThinks/langchain/tree/feature/opensearch_hybrid_search
Thanks Mate!  

### Experiments

I conducted a few experiments to verify that the hybrid search
implementation is accurate and capable of reproducing the results of
both plain keyword search and vector search.

Experiment - 1
Hybrid Search
Keyword_weight: 1, vector_weight: 0

I conducted an experiment to verify the accuracy of my hybrid search
implementation by comparing it to a plain keyword search. For this test,
I set the keyword_weight to 1 and the vector_weight to 0 in the hybrid
search, effectively giving full weightage to the keyword component. The
results from this hybrid search configuration matched those of a plain
keyword search, confirming that my implementation can accurately
reproduce keyword-only search results when needed. It's important to
note that while the results were the same, the scores differed between
the two methods. This difference is expected because the plain keyword
search in OpenSearch uses the BM25 algorithm for scoring, whereas the
hybrid search still performs both keyword and vector searches before
normalizing the scores, even when the vector component is given zero
weight. This experiment validates that my hybrid search solution
correctly handles the keyword search component and properly applies the
weighting system, demonstrating its accuracy and flexibility in
emulating different search scenarios.


Experiment - 2
Hybrid Search
keyword_weight = 0.0, vector_weight = 1.0

For experiment-2, I took the inverse approach to further validate my
hybrid search implementation. I set the keyword_weight to 0 and the
vector_weight to 1, effectively giving full weightage to the vector
search component (KNN search). I then compared these results with a pure
vector search. The outcome was consistent with my expectations: the
results from the hybrid search with these settings exactly matched those
from a standalone vector search. This confirms that my implementation
accurately reproduces vector search results when configured to do so. As
with the first experiment, I observed that while the results were
identical, the scores differed between the two methods. This difference
in scoring is expected and can be attributed to the normalization
process in hybrid search, which still considers both components even
when one is given zero weight. This experiment further validates the
accuracy and flexibility of my hybrid search solution, demonstrating its
ability to effectively emulate pure vector search when needed while
maintaining the underlying hybrid search structure.



Experiment - 3
Hybrid Search - balanced

keyword_weight = 0.5, vector_weight = 0.5

For experiment-3, I adopted a balanced approach to further evaluate the
effectiveness of my hybrid search implementation. In this test, I set
both the keyword_weight and vector_weight to 0.5, giving equal
importance to keyword-based and vector-based search components. This
configuration aims to leverage the strengths of both search methods
simultaneously. By setting both weights to 0.5, I intended to create a
scenario where the hybrid search would consider lexical matches and
semantic similarity equally. This balanced approach is often ideal for
many real-world applications, as it can capture both exact keyword
matches and contextually relevant results that might not contain the
exact search terms.

Kindly verify the notebook for the experiments conducted!  

**Notebook:**
https://github.com/karthikbharadhwajKB/Langchain_OpenSearch_Hybrid_search/blob/main/Opensearch_Hybridsearch.ipynb

### Instructions to follow for Performing Hybrid Search:

**Step-1: Instantiating OpenSearchVectorSearch Class:**
```python
import os

from langchain_community.vectorstores import OpenSearchVectorSearch

opensearch_vectorstore = OpenSearchVectorSearch(
    index_name=os.getenv("INDEX_NAME"),
    embedding_function=embedding_model,
    opensearch_url=os.getenv("OPENSEARCH_URL"),
    http_auth=(os.getenv("OPENSEARCH_USERNAME"), os.getenv("OPENSEARCH_PASSWORD")),
    use_ssl=False,
    verify_certs=False,
    ssl_assert_hostname=False,
    ssl_show_warn=False
)
```

**Parameters:**
1. **index_name:** The name of the OpenSearch index to use.
2. **embedding_function:** The function or model used to generate
embeddings for the documents. It's assumed that embedding_model is
defined elsewhere in the code.
3. **opensearch_url:** The URL of the OpenSearch instance.
4. **http_auth:** A tuple containing the username and password for
authentication.
5. **use_ssl:** Set to False, indicating that the connection to
OpenSearch is not using SSL/TLS encryption.
6. **verify_certs:** Set to False, which means the SSL certificates are
not being verified. This is often used in development environments but
is not recommended for production.
7. **ssl_assert_hostname:** Set to False, disabling hostname
verification in SSL certificates.
8. **ssl_show_warn:** Set to False, suppressing SSL-related warnings.

**Step-2: Configure Search Pipeline:**

To initiate hybrid search functionality, you need to configure a search
pipeline first.

**Implementation Details:**

This method configures a search pipeline in OpenSearch that:
1. Normalizes the scores from both keyword and vector searches using the
min-max technique.
2. Applies the specified weights to the normalized scores.
3. Calculates the final score using an arithmetic mean of the weighted,
normalized scores.


**Parameters:**

* **pipeline_name (str):** A unique identifier for the search pipeline.
It's recommended to use a descriptive name that indicates the weights
used for keyword and vector searches.
* **keyword_weight (float):** The weight assigned to the keyword search
component. This should be a float value between 0 and 1. In this
example, 0.3 gives 30% importance to traditional text matching.
* **vector_weight (float):** The weight assigned to the vector search
component. This should be a float value between 0 and 1. In this
example, 0.7 gives 70% importance to semantic similarity.

```python
opensearch_vectorstore.configure_search_pipelines(
    pipeline_name="search_pipeline_keyword_0.3_vector_0.7",
    keyword_weight=0.3,
    vector_weight=0.7,
)
```

**Step-3: Performing Hybrid Search:**

After creating the search pipeline, you can perform a hybrid search
using the `similarity_search()` method or any other method supported by
`langchain`. This method combines both keyword-based and semantic
similarity searches on your OpenSearch index, leveraging the strengths
of both traditional information retrieval and vector embedding
techniques.

**Parameters:**
* **query:** The search query string.
* **k:** The number of top results to return (in this case, 3).
* **search_type:** Set to `hybrid_search` to use both keyword and vector
search capabilities.
* **search_pipeline:** The name of the previously created search
pipeline.

```python
query = "what are the country named in our database?"

top_k = 3

pipeline_name = "search_pipeline_keyword_0.3_vector_0.7"

matched_docs = opensearch_vectorstore.similarity_search_with_score(
    query=query,
    k=top_k,
    search_type="hybrid_search",
    search_pipeline=pipeline_name,
)

matched_docs
```

twitter handle: @iamkarthik98

---------

Co-authored-by: Karthik Kolluri <karthik.kolluri@eidosmedia.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-12-13 16:34:12 -05:00
Philippe PRADOS
f3fb5a9c68
community[minor]: Fix json._validate_metadata_func() (#22842)
JSONLoader, in _validate_metadata_func(), checks the consistency of the
user-provided _metadata_func(). To do this, it invokes the function and
makes sure it receives a dictionary in response. However, the validation
call does not match the later real calls (see line 100), which generates
errors if, for example, the function is like this:
```python
from typing import Any, Dict

from langchain_community.document_loaders import JSONLoader

def generate_metadata(json_node: Dict[str, Any], kwargs: Dict[str, Any]) -> Dict[str, Any]:
    return {
        "source": url,  # url is defined elsewhere in the calling code
        "row": kwargs["seq_num"],
        "question": json_node.get("question"),
    }

loader = JSONLoader(
    file_path=file_path,
    content_key="answer",
    jq_schema=".[]",
    metadata_func=generate_metadata,
    text_content=False,
)
```
To avoid this, the verification must comply with the specifications.
This patch does just that.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-12-13 21:24:20 +00:00
ScriptShi
b0a298894d
community[minor]: Add TablestoreVectorStore (#25767)
- [x] **PR title**: community: add TablestoreVectorStore

- [x] **PR message**:
    - **Description:** add TablestoreVectorStore
    - **Dependencies:** none

- [x] **Add tests and docs**:
  1. a test for the integration: yes
  2. an example notebook showing its use: yes

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-12-13 11:17:28 -08:00
Bagatur
fa06188834
community[patch]: fix QuerySQLDatabaseTool name (#28659)
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-12 19:16:03 -08:00
Botong Zhu
13c3c4a210
community: fixes json loader not getting texts with json standard (#27327)
This PR fixes JSONLoader._get_text not converting objects to JSON
strings correctly.
If an object is serializable and is not a dict, JSONLoader uses the
built-in str() to convert it to a string. This may produce strings that
don't follow the JSON standard. For example, a list is converted to a
string with single quotes, and if json.loads tries to load this string,
it raises an error.
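
A small illustration of the failure mode (not the loader's actual code):

```python
import json

value = ["a", "b"]
print(str(value))         # ['a', 'b']  -> single quotes, not valid JSON
print(json.dumps(value))  # ["a", "b"]  -> valid JSON
json.loads(json.dumps(value))  # round-trips fine
# json.loads(str(value)) raises json.JSONDecodeError
```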

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 19:33:45 +00:00
fatmelon
d1e0ec7b55
community: VectorStores: Azure Cosmos DB Mongo vCore with DiskANN (#27329)
# Description
Add a new vector index type `diskann` to Azure Cosmos DB Mongo vCore
vector store. Paper of DiskANN can be found here [DiskANN: Fast Accurate
Billion-point Nearest Neighbor Search on a Single
Node](https://proceedings.neurips.cc/paper_files/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf).

## Sample Usage
```python
import os

from pymongo import MongoClient

from langchain_community.vectorstores.azure_cosmos_db import (
    AzureCosmosDBVectorSearch,
    CosmosDBSimilarityType,
    CosmosDBVectorSearchType,
)

# INDEX_NAME = "izzy-test-index-2"
# NAMESPACE = "izzy_test_db.izzy_test_collection"
# DB_NAME, COLLECTION_NAME = NAMESPACE.split(".")

client: MongoClient = MongoClient(CONNECTION_STRING)
collection = client[DB_NAME][COLLECTION_NAME]

model_deployment = os.getenv(
    "OPENAI_EMBEDDINGS_DEPLOYMENT", "smart-agent-embedding-ada"
)
model_name = os.getenv("OPENAI_EMBEDDINGS_MODEL_NAME", "text-embedding-ada-002")

vectorstore = AzureCosmosDBVectorSearch.from_documents(
    docs,
    openai_embeddings,
    collection=collection,
    index_name=INDEX_NAME,
)

# Read more about these variables in detail here. https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/vector-search
maxDegree = 40
dimensions = 1536
similarity_algorithm = CosmosDBSimilarityType.COS
kind = CosmosDBVectorSearchType.VECTOR_DISKANN
lBuild = 20

vectorstore.create_index(
    dimensions=dimensions,
    similarity=similarity_algorithm,
    kind=kind,
    max_degree=maxDegree,
    l_build=lBuild,
)
```

## Dependencies
No additional dependencies were added

---------

Co-authored-by: Yang Qiao (from Dev Box) <yangqiao@microsoft.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 01:54:04 +00:00
Brian Sharon
b20230c800
community: use correct id_key when deleting by id in LanceDB wrapper (#28655)
- **Description:** The current version of the `delete` method assumes
that the id field will always be called `id`.
- **Issue:** n/a
- **Dependencies:** n/a
- **Twitter handle:** ugh, Twitter :D 

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-11 23:49:35 +00:00
Vincent Zhang
df5008fe55
community[minor]: FAISS Filter Function Enhancement with Advanced Query Operators (#28207)
## Description
We are submitting as a team of four for a project. Other team members
are @RuofanChen03, @LikeWang10067, @TANYAL77.

This pull request expands the filtering capabilities of the FAISS
vectorstore by adding the MongoDB-style query operators listed below,
and includes comprehensive testing for the added functionality.
- $eq (equals)
- $neq (not equals)
- $gt (greater than)
- $lt (less than)
- $gte (greater than or equal)
- $lte (less than or equal)
- $in (membership in list)
- $nin (not in list)
- $and (all conditions must match)
- $or (any condition must match)
- $not (negation of condition)


## Issue
This closes https://github.com/langchain-ai/langchain/issues/26379.


## Sample Usage
```python
import faiss
import asyncio
from langchain_community.vectorstores import FAISS
from langchain.schema import Document
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
documents = [
    Document(page_content="Process customer refund request", metadata={"schema_type": "financial", "handler_type": "refund",}),
    Document(page_content="Update customer shipping address", metadata={"schema_type": "customer", "handler_type": "update",}),
    Document(page_content="Process payment transaction", metadata={"schema_type": "financial", "handler_type": "payment",}),
    Document(page_content="Handle customer complaint", metadata={"schema_type": "customer","handler_type": "complaint",}),
    Document(page_content="Process invoice payment", metadata={"schema_type": "financial","handler_type": "payment",})
]

async def search(vectorstore, query, schema_type, handler_type, k=2):
    schema_filter = {"schema_type": {"$eq": schema_type}}
    handler_filter = {"handler_type": {"$eq": handler_type}}
    combined_filter = {
        "$and": [
            schema_filter,
            handler_filter,
        ]
    }
    base_retriever = vectorstore.as_retriever(
        search_kwargs={"k":k, "filter":combined_filter}
    )
    return await base_retriever.ainvoke(query)

async def main():
    vectorstore = FAISS.from_texts(
        texts=[doc.page_content for doc in documents],
        embedding=embeddings,
        metadatas=[doc.metadata for doc in documents]
    )
    
    def printt(title, documents):
        print(title)
        if not documents:
            print("\tNo documents found.")
            return
        for doc in documents:
            print(f"\t{doc.page_content}. {doc.metadata}")

    printt("Documents:", documents)
    printt('\nquery="process payment", schema_type="financial", handler_type="payment":', await search(vectorstore, query="process payment", schema_type="financial", handler_type="payment", k=2))
    printt('\nquery="customer update", schema_type="customer", handler_type="update":', await search(vectorstore, query="customer update", schema_type="customer", handler_type="update", k=2))
    printt('\nquery="refund process", schema_type="financial", handler_type="refund":', await search(vectorstore, query="refund process", schema_type="financial", handler_type="refund", k=2))
    printt('\nquery="refund process", schema_type="financial", handler_type="foobar":', await search(vectorstore, query="refund process", schema_type="financial", handler_type="foobar", k=2))
    print()

if __name__ == "__main__":
    asyncio.run(main())
```

## Output
```
Documents:
	Process customer refund request. {'schema_type': 'financial', 'handler_type': 'refund'}
	Update customer shipping address. {'schema_type': 'customer', 'handler_type': 'update'}
	Process payment transaction. {'schema_type': 'financial', 'handler_type': 'payment'}
	Handle customer complaint. {'schema_type': 'customer', 'handler_type': 'complaint'}
	Process invoice payment. {'schema_type': 'financial', 'handler_type': 'payment'}

query="process payment", schema_type="financial", handler_type="payment":
	Process payment transaction. {'schema_type': 'financial', 'handler_type': 'payment'}
	Process invoice payment. {'schema_type': 'financial', 'handler_type': 'payment'}

query="customer update", schema_type="customer", handler_type="update":
	Update customer shipping address. {'schema_type': 'customer', 'handler_type': 'update'}

query="refund process", schema_type="financial", handler_type="refund":
	Process customer refund request. {'schema_type': 'financial', 'handler_type': 'refund'}

query="refund process", schema_type="financial", handler_type="foobar":
	No documents found.

```

---------

Co-authored-by: ruofan chen <ruofan.is.awesome@gmail.com>
Co-authored-by: RickyCowboy <like.wang@mail.utoronto.ca>
Co-authored-by: Shanni Li <tanya.li@mail.utoronto.ca>
Co-authored-by: RuofanChen03 <114096642+ruofanchen03@users.noreply.github.com>
Co-authored-by: Like Wang <102838708+likewang10067@users.noreply.github.com>
2024-12-11 17:52:22 -05:00
Bagatur
e6a62d8422
core,langchain,community[patch]: allow langsmith 0.2 (#28598) 2024-12-10 18:50:58 +00:00
Katarina Supe
aba2711e7f
community: update Memgraph integration (#27017)
**Description:**
- **Memgraph** no longer relies on `Neo4jGraphStore` but **implements
`GraphStore`**, just like other graph databases.
- **Memgraph** no longer relies on `GraphQAChain`, but implements
`MemgraphQAChain`, just like other graph databases.
- The refresh schema procedure has been updated to try using `SHOW
SCHEMA INFO`. The fallback uses Cypher queries (a combination of schema
and Cypher) → **LangChain integration no longer relies on MAGE
library**.
- The **schema structure** has been reformatted. Regardless of the
procedures used to get schema, schema structure is the same.
- The `add_graph_documents()` method has been implemented. It transforms
`GraphDocument` into Cypher queries and creates a graph in Memgraph. It
implements the ability to use `baseEntityLabel` to improve speed
(`baseEntityLabel` has an index on the `id` property). It also
implements the ability to include sources by creating a `MENTIONS`
relationship to the source document.
- Jupyter Notebook for Memgraph has been updated.
- **Issue:** /
- **Dependencies:** /
- **Twitter handle:** supe_katarina (DX Engineer @ Memgraph)

Closes #25606
2024-12-10 10:57:21 -05:00
Naka Masato
ce3b69aa05
community: add include_labels option to ConfluenceLoader (#28259)
## **Description:**

Enable `ConfluenceLoader` to include labels with the `include_labels`
option (`false` by default for backward compatibility). The labels are
set in the `Document` `metadata`, e.g. `{"labels": ["l1", "l2"]}`.
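
A hedged usage sketch (connection arguments are illustrative):

```python
from langchain_community.document_loaders import ConfluenceLoader

loader = ConfluenceLoader(
    url="https://yoursite.atlassian.net/wiki",
    username="me@example.com",
    api_key="...",
    space_key="SPACE",
    include_labels=True,  # new option; labels land in metadata["labels"]
)
docs = loader.load()
```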

## Notes

The Confluence API supports fetching labels by adding `metadata.labels`
to the `expand` query parameter

All of the following functions support `expand` in the same way:
- confluence.get_page_by_id
- confluence.get_all_pages_by_label
- confluence.get_all_pages_from_space
- cql (internally using
[/api/content/search](https://developer.atlassian.com/cloud/confluence/rest/v1/api-group-content/#api-wiki-rest-api-content-search-get))

## **Issue:**

No issue related to this PR.

## **Dependencies:** 

No changes.

## **Twitter handle:** 

[@gymnstcs](https://x.com/gymnstcs)


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 19:35:01 +00:00
Abhinav
317a38b83e
community[minor]: Add support for modle2vec embeddings (#28507)
This PR adds an embeddings integration for model2vec, the
`Model2vecEmbeddings` class.

- **Description**: [Model2Vec](https://github.com/MinishLab/model2vec)
lets you turn any sentence transformer into a very small static model,
making the model much faster to run.
- **Dependencies**: model2vec
([pypi](https://pypi.org/project/model2vec/))
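
A minimal usage sketch (assuming the class is exported from `langchain_community.embeddings`; the model name is illustrative):

```python
from langchain_community.embeddings import Model2vecEmbeddings

embeddings = Model2vecEmbeddings("minishlab/potion-base-8M")
query_vector = embeddings.embed_query("hello world")
doc_vectors = embeddings.embed_documents(["hello", "world"])
```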

- [x] **Add tests and docs**: 
-
[Test](https://github.com/blacksmithop/langchain/blob/model2vec_embeddings/libs/community/langchain_community/embeddings/model2vec.py),
[docs](https://github.com/blacksmithop/langchain/blob/model2vec_embeddings/docs/docs/integrations/text_embedding/model2vec.ipynb)

- [x] **Lint and test**:

---------

Co-authored-by: Abhinav KM <abhinav.m@zerone-consulting.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-12-09 02:17:22 +00:00
ccurme
2c6bc74cb1
multiple: combine sync/async vector store standard test suites (#28580)
Breaking change in `langchain-tests`.
2024-12-06 14:55:06 -05:00
cinqisap
482e8a7855
community: Add support for SAP HANA Vector hnsw index creation (#27884)
**Issue:** Added support for creating indexes in the SAP HANA Vector
engine.
 
**Changes**: 
1. Introduced a new function `create_hnsw_index` in `hanavector.py` that
enables the creation of indexes for SAP HANA Vector.
2. Added integration tests for the index creation function to ensure
functionality.
3. Updated the documentation to reflect the new index creation feature,
including examples and output from the notebook.
4. Fixed the operator issue in the `_process_filter_object` function and
changed the array argument to a placeholder in the similarity search SQL
statement.
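
A hedged sketch of the new index creation (connection details are illustrative; `create_hnsw_index` arguments are left at their defaults, which this sketch assumes exist):

```python
from hdbcli import dbapi
from langchain_community.vectorstores import HanaDB
from langchain_openai import OpenAIEmbeddings

connection = dbapi.connect(
    address="<hostname>", port=30015, user="<user>", password="<password>"
)
db = HanaDB(connection=connection, embedding=OpenAIEmbeddings(), table_name="LC_DOCS")
# New in this PR: build an HNSW index over the vector column.
db.create_hnsw_index()
```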

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-05 23:29:08 +00:00
Asi Greenholts
d34bf78f3b
community: BM25Retriever preservation of document id (#27019)
Currently this retriever discards document ids

---------

Co-authored-by: asi-cider <88270351+asi-cider@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-04 00:36:00 +00:00
William Smith
15e7353168
langchain_community: updated query constructor for Databricks Vector Search due to LangChainDeprecationWarning: filters was deprecated since langchain-community 0.2.11 and will be removed in 0.3. Please use filter instead. (#27974)
- **Description:** Updated the kwargs for the structured query from
`filters` to `filter` due to the deprecation of `filters` for Databricks
Vector Search. Also changed the error messages, as the allowed operators
and comparators are different, which can cause issues with functions
such as get_query_constructor_prompt()

- **Issue:** Fixes the Key Error for filters due to deprecation in favor
for 'filter':

LangChainDeprecationWarning: DatabricksVectorSearch received a key
`filters` in search_kwargs. `filters` was deprecated since
langchain-community 0.2.11 and will be removed in 0.3. Please use
`filter` instead.

- **Dependencies:** N/A
- **Twitter handle:** N/A

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-03 16:03:53 -08:00
Jan Heimes
ef365543cb
community: add Needle retriever and document loader integration (#28157)
- [x] **PR title**: "community: add Needle retriever and document loader
integration"

- [x] **PR message**:
- **Description:** This PR adds a new integration for Needle, which
includes:
- **NeedleRetriever**: A retriever for fetching documents from Needle
collections.
- **NeedleLoader**: A document loader for managing and loading documents
into Needle collections.
      - Example notebooks demonstrating usage have been added in:
        - `docs/docs/integrations/retrievers/needle.ipynb`
        - `docs/docs/integrations/document_loaders/needle.ipynb`.
- **Dependencies:** The `needle-python` package is required as an
external dependency for accessing Needle's API. It has been added to the
extended testing dependencies list.
- **Twitter handle:** Feel free to mention me if this PR gets announced:
[needlexai](https://x.com/NeedlexAI).

- [x] **Add tests and docs**:
1. Unit tests have been added for both `NeedleRetriever` and
`NeedleLoader` in `libs/community/tests/unit_tests`. These tests mock
API calls to avoid relying on network access.
2. Example notebooks have been added to `docs/docs/integrations/`,
showcasing both retriever and loader functionality.

- [x] **Lint and test**: Run `make format`, `make lint`, and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
  - `make format`: Passed
  - `make lint`: Passed
- `make test`: Passed (requires `needle-python` to be installed locally;
this package is not added to LangChain dependencies).

Additional guidelines:
- [x] Optional dependencies are imported only within functions.
- [x] No dependencies have been added to pyproject.toml files except for
those required for unit tests.
- [x] The PR does not touch more than one package.
- [x] Changes are fully backwards compatible.
- [x] Community additions are not re-imported into LangChain core.


---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-03 22:06:25 +00:00
LuisMSotamba
0901f11b0f
community: add truncation params when an openai assistant's run is created (#28158)
**Description:** When an OpenAI assistant is invoked, it creates a run
by default, allowing users to set only a few request fields. The
truncation strategy is set to auto, which includes previous messages in
the thread along with the current question until the context length is
reached. This causes token usage to grow incrementally:
consumed_tokens = previous_consumed_tokens + current_consumed_tokens.

This PR adds support for user-defined truncation strategies, giving
better control over token consumption.

**Issue:** High token consumption.
2024-11-27 10:53:53 -05:00
ccurme
a5374952f8
community[patch]: fix import in test (#28339)
Library name was updated after
https://github.com/langchain-ai/langchain/pull/27879 branched off
master.
2024-11-25 19:28:01 +00:00
Yan
c60695a1c7
community: fixed critical bugs at Writer provider (#27879) 2024-11-25 12:03:37 -05:00
shroominic
dee72c46c1
community: Outlines integration (#27449)
In collaboration with @rlouf I built an
[outlines](https://dottxt-ai.github.io/outlines/latest/) integration for
langchain!

I think this is really useful for doing any type of structured output
locally.
[Dottxt](https://dottxt.co) spent a lot of work optimising this process
at a lower level
([outlines-core](https://pypi.org/project/outlines-core/0.1.14/) written
in Rust), so I think this is a better alternative to all current
approaches in langchain for doing structured output.
It also implements the `.with_structured_output` method, so it should be
a drop-in replacement for a lot of applications.

The integration includes:
- **Outlines LLM class**
- **ChatOutlines class**
- **Tutorial Cookbooks**
- **Documentation Page**
- **Validation and error messages** 
- **Exposes Outlines Structured output features**
- **Support for multiple backends**
- **Integration and Unit Tests**

Dependencies: `outlines` + additional (depending on backend used)

I am not sure if the unit tests comply with all requirements; if not, I
suggest just removing them, since I don't see a useful way to do it
differently.
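
A minimal usage sketch (assuming the chat class is exported from `langchain_community.chat_models`; the model id is illustrative):

```python
from langchain_community.chat_models import ChatOutlines

chat = ChatOutlines(model="microsoft/Phi-3-mini-4k-instruct")
chat.invoke("Write a haiku about the sea.")
```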

### Quick overview:

Chat Models:
<img width="698" alt="image"
src="https://github.com/user-attachments/assets/05a499b9-858c-4397-a9ff-165c2b3e7acc">

Structured Output:
<img width="955" alt="image"
src="https://github.com/user-attachments/assets/b9fcac11-d3e5-4698-b1ae-8c4cb3d54c45">

---------

Co-authored-by: Vadym Barda <vadym@langchain.dev>
2024-11-20 16:31:31 -05:00
Erick Friis
0dbaf05bb7
standard-tests: rename langchain_standard_tests to langchain_tests, release 0.3.2 (#28203) 2024-11-18 19:10:39 -08:00
Erick Friis
6d2004ee7d
multiple: langchain-standard-tests -> langchain-tests (#28139) 2024-11-15 11:32:04 -08:00
alex shengzhi li
39fcb476fd
community: add reka chat model integration (#27379) 2024-11-15 13:37:14 -05:00
Jorge Piedrahita Ortiz
39956a3ef0
community: sambanovacloud llm integration (#27526)
- **Description:** SambaNovaCloud LLM integration added; previously only
the chat model integration existed

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-11-15 16:58:11 +00:00
Saad Makrod
b509747c7f
Community: Google Books API Tool (#27307)
## Description

As proposed in our earlier discussion #26977 we have introduced a Google
Books API Tool that leverages the Google Books API found at
[https://developers.google.com/books/docs/v1/using](https://developers.google.com/books/docs/v1/using)
to generate book recommendations.

### Sample Usage

```python
from langchain_community.tools import GoogleBooksQueryRun
from langchain_community.utilities import GoogleBooksAPIWrapper

api_wrapper = GoogleBooksAPIWrapper()
tool = GoogleBooksQueryRun(api_wrapper=api_wrapper)

tool.run('ai')
```

### Sample Output

```txt
Here are 5 suggestions based off your search for books related to ai:

1. "AI's Take on the Stigma Against AI-Generated Content" by Sandy Y. Greenleaf: In a world where artificial intelligence (AI) is rapidly advancing and transforming various industries, a new form of content creation has emerged: AI-generated content. However, despite its potential to revolutionize the way we produce and consume information, AI-generated content often faces a significant stigma. "AI's Take on the Stigma Against AI-Generated Content" is a groundbreaking book that delves into the heart of this issue, exploring the reasons behind the stigma and offering a fresh, unbiased perspective on the topic. Written from the unique viewpoint of an AI, this book provides readers with a comprehensive understanding of the challenges and opportunities surrounding AI-generated content. Through engaging narratives, thought-provoking insights, and real-world examples, this book challenges readers to reconsider their preconceptions about AI-generated content. It explores the potential benefits of embracing this technology, such as increased efficiency, creativity, and accessibility, while also addressing the concerns and drawbacks that contribute to the stigma. As you journey through the pages of this book, you'll gain a deeper understanding of the complex relationship between humans and AI in the realm of content creation. You'll discover how AI can be used as a tool to enhance human creativity, rather than replace it, and how collaboration between humans and machines can lead to unprecedented levels of innovation. Whether you're a content creator, marketer, business owner, or simply someone curious about the future of AI and its impact on our society, "AI's Take on the Stigma Against AI-Generated Content" is an essential read. With its engaging writing style, well-researched insights, and practical strategies for navigating this new landscape, this book will leave you equipped with the knowledge and tools needed to embrace the AI revolution and harness its potential for success. Prepare to have your assumptions challenged, your mind expanded, and your perspective on AI-generated content forever changed. Get ready to embark on a captivating journey that will redefine the way you think about the future of content creation.
Read more at https://play.google.com/store/books/details?id=4iH-EAAAQBAJ&source=gbs_api

2. "AI Strategies For Web Development" by Anderson Soares Furtado Oliveira: From fundamental to advanced strategies, unlock useful insights for creating innovative, user-centric websites while navigating the evolving landscape of AI ethics and security Key Features Explore AI's role in web development, from shaping projects to architecting solutions Master advanced AI strategies to build cutting-edge applications Anticipate future trends by exploring next-gen development environments, emerging interfaces, and security considerations in AI web development Purchase of the print or Kindle book includes a free PDF eBook Book Description If you're a web developer looking to leverage the power of AI in your projects, then this book is for you. Written by an AI and ML expert with more than 15 years of experience, AI Strategies for Web Development takes you on a transformative journey through the dynamic intersection of AI and web development, offering a hands-on learning experience.The first part of the book focuses on uncovering the profound impact of AI on web projects, exploring fundamental concepts, and navigating popular frameworks and tools. As you progress, you'll learn how to build smart AI applications with design intelligence, personalized user journeys, and coding assistants. Later, you'll explore how to future-proof your web development projects using advanced AI strategies and understand AI's impact on jobs. Toward the end, you'll immerse yourself in AI-augmented development, crafting intelligent web applications and navigating the ethical landscape.Packed with insights into next-gen development environments, AI-augmented practices, emerging realities, interfaces, and security governance, this web development book acts as your roadmap to staying ahead in the AI and web development domain. What you will learn Build AI-powered web projects with optimized models Personalize UX dynamically with AI, NLP, chatbots, and recommendations Explore AI coding assistants and other tools for advanced web development Craft data-driven, personalized experiences using pattern recognition Architect effective AI solutions while exploring the future of web development Build secure and ethical AI applications following TRiSM best practices Explore cutting-edge AI and web development trends Who this book is for This book is for web developers with experience in programming languages and an interest in keeping up with the latest trends in AI-powered web development. Full-stack, front-end, and back-end developers, UI/UX designers, software engineers, and web development enthusiasts will also find valuable information and practical guidelines for developing smarter websites with AI. To get the most out of this book, it is recommended that you have basic knowledge of programming languages such as HTML, CSS, and JavaScript, as well as a familiarity with machine learning concepts.
Read more at https://play.google.com/store/books/details?id=FzYZEQAAQBAJ&source=gbs_api

3. "Artificial Intelligence for Students" by Vibha Pandey: A multifaceted approach to develop an understanding of AI and its potential applications KEY FEATURES ● AI-informed focuses on AI foundation, applications, and methodologies. ● AI-inquired focuses on computational thinking and bias awareness. ● AI-innovate focuses on creative and critical thinking and the Capstone project. DESCRIPTION AI is a discipline in Computer Science that focuses on developing intelligent machines, machines that can learn and then teach themselves. If you are interested in AI, this book can definitely help you prepare for future careers in AI and related fields. The book is aligned with the CBSE course, which focuses on developing employability and vocational competencies of students in skill subjects. The book is an introduction to the basics of AI. It is divided into three parts – AI-informed, AI-inquired and AI-innovate. It will help you understand AI's implications on society and the world. You will also develop a deeper understanding of how it works and how it can be used to solve complex real-world problems. Additionally, the book will also focus on important skills such as problem scoping, goal setting, data analysis, and visualization, which are essential for success in AI projects. Lastly, you will learn how decision trees, neural networks, and other AI concepts are commonly used in real-world applications. By the end of the book, you will develop the skills and competencies required to pursue a career in AI. WHAT YOU WILL LEARN ● Get familiar with the basics of AI and Machine Learning. ● Understand how and where AI can be applied. ● Explore different applications of mathematical methods in AI. ● Get tips for improving your skills in Data Storytelling. ● Understand what is AI bias and how it can affect human rights. WHO THIS BOOK IS FOR This book is for CBSE class XI and XII students who want to learn and explore more about AI. Basic knowledge of Statistical concepts, Algebra, and Plotting of equations is a must. TABLE OF CONTENTS 1. Introduction: AI for Everyone 2. AI Applications and Methodologies 3. Mathematics in Artificial Intelligence 4. AI Values (Ethical Decision-Making) 5. Introduction to Storytelling 6. Critical and Creative Thinking 7. Data Analysis 8. Regression 9. Classification and Clustering 10. AI Values (Bias Awareness) 11. Capstone Project 12. Model Lifecycle (Knowledge) 13. Storytelling Through Data 14. AI Applications in Use in Real-World
Read more at https://play.google.com/store/books/details?id=ptq1EAAAQBAJ&source=gbs_api

4. "The AI Book" by Ivana Bartoletti, Anne Leslie and Shân M. Millie: Written by prominent thought leaders in the global fintech space, The AI Book aggregates diverse expertise into a single, informative volume and explains what artifical intelligence really means and how it can be used across financial services today. Key industry developments are explained in detail, and critical insights from cutting-edge practitioners offer first-hand information and lessons learned. Coverage includes: · Understanding the AI Portfolio: from machine learning to chatbots, to natural language processing (NLP); a deep dive into the Machine Intelligence Landscape; essentials on core technologies, rethinking enterprise, rethinking industries, rethinking humans; quantum computing and next-generation AI · AI experimentation and embedded usage, and the change in business model, value proposition, organisation, customer and co-worker experiences in today’s Financial Services Industry · The future state of financial services and capital markets – what’s next for the real-world implementation of AITech? · The innovating customer – users are not waiting for the financial services industry to work out how AI can re-shape their sector, profitability and competitiveness · Boardroom issues created and magnified by AI trends, including conduct, regulation & oversight in an algo-driven world, cybersecurity, diversity & inclusion, data privacy, the ‘unbundled corporation’ & the future of work, social responsibility, sustainability, and the new leadership imperatives · Ethical considerations of deploying Al solutions and why explainable Al is so important
Read more at http://books.google.ca/books?id=oE3YDwAAQBAJ&dq=ai&hl=&source=gbs_api

5. "Artificial Intelligence in Society" by OECD: The artificial intelligence (AI) landscape has evolved significantly from 1950 when Alan Turing first posed the question of whether machines can think. Today, AI is transforming societies and economies. It promises to generate productivity gains, improve well-being and help address global challenges, such as climate change, resource scarcity and health crises.
Read more at https://play.google.com/store/books/details?id=eRmdDwAAQBAJ&source=gbs_api
```

## Issue 

This closes #27276 

## Dependencies

No additional dependencies were added

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-07 15:29:35 -08:00
Akshata
05fd6a16a9
Add ChatModels wrapper for Cloudflare Workers AI (#27645)

- [x] **PR title**: "community: chat models wrapper for Cloudflare
Workers AI"


- [x] **PR message**:
- **Description:** Add a chat model wrapper for Cloudflare Workers AI.
Enables LangGraph integration via ChatModel for tool use and agentic
workflows.


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/


---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-11-07 15:34:24 -05:00
Dmitriy Prokopchuk
53b0a99f37
community: Memcached LLM Cache Integration (#27323)
## Description
This PR adds support for Memcached as a usable LLM model cache by adding
the ```MemcachedCache``` implementation relying on the
[pymemcache](https://github.com/pinterest/pymemcache) client.

Unit test-wise, the new integration is generally covered under existing
import testing. All new functionality depends on pymemcache if
instantiated and used, so to comply with the other cache implementations
the PR also adds optional integration tests for ```MemcachedCache```.

Since this is a new integration, documentation is added for Memcached as
an integration and as an LLM Cache.

## Issue
This PR closes #27275 which was originally raised as a discussion in
#27035

## Dependencies
There are no new required dependencies for langchain, but
[pymemcache](https://github.com/pinterest/pymemcache) is required to
instantiate the new ```MemcachedCache```.

## Example Usage
```python3
from langchain.globals import set_llm_cache
from langchain_openai import OpenAI

from langchain_community.cache import MemcachedCache
from pymemcache.client.base import Client

llm = OpenAI(model="gpt-3.5-turbo-instruct", n=2, best_of=2)
set_llm_cache(MemcachedCache(Client('localhost')))

# The first time, it is not yet in cache, so it should take longer
llm.invoke("Which city is the most crowded city in the USA?")

# The second time it is, so it goes faster
llm.invoke("Which city is the most crowded city in the USA?")
```

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-07 03:07:59 +00:00
Baptiste Pasquier
81f7daa458
community: add InfinityRerank (#27043)
**Description:** 

- Add a Reranker for Infinity server.

**Dependencies:** 

This wrapper uses
[infinity_client](https://github.com/michaelfeil/infinity/tree/main/libs/client_infinity/infinity_client)
to connect to an Infinity server.

**Tests and docs**

- integration test: test_infinity_rerank.py
- example notebook: infinity_rerank.ipynb
[here](https://github.com/baptiste-pasquier/langchain/blob/feat/infinity-rerank/docs/docs/integrations/document_transformers/infinity_rerank.ipynb)
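For orientation, a minimal usage sketch (the import path, constructor
arguments, and model name below are assumptions rather than taken from
this PR; see the example notebook above for the authoritative usage):

```python
from langchain_core.documents import Document
from langchain_community.document_compressors.infinity_rerank import InfinityRerank

# Point the reranker at a running Infinity server (URL and model are
# illustrative placeholders).
reranker = InfinityRerank(
    model="mixedbread-ai/mxbai-rerank-xsmall-v1",
    infinity_api_url="http://localhost:7997",
)

docs = [
    Document(page_content="The capital of France is Paris."),
    Document(page_content="Infinity serves embedding and reranking models."),
]
ranked = reranker.compress_documents(documents=docs, query="What does Infinity do?")
for doc in ranked:
    print(doc.metadata.get("relevance_score"), doc.page_content)
```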

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-06 17:26:30 -08:00
Eric Pinzur
ea0ad917b0
community: added Document.id support to opensearch vectorstore (#27945)
Description:
* Added support of Document.id on OpenSearch vector store
* Added tests cases to match
2024-11-06 15:04:09 -05:00
Ofer Mendelevitch
d7c39e6dbb
community: update Vectara integration (#27869)
Thank you for contributing to LangChain!

- **Description:** Updated Vectara integration
- **Issue:** refresh on descriptions across all demos and added UDF
reranker
- **Dependencies:** None
- **Twitter handle:** @ofermend

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-04 20:40:39 +00:00
Eric Pinzur
8eb38622a6
community: fixed bug in GraphVectorStoreRetriever (#27846)
Description:

This fixes an issue that was mistakenly introduced in
https://github.com/langchain-ai/langchain/pull/27253. The issue
currently exists only in `langchain-community==0.3.4`.

Test cases were added to prevent this issue in the future.

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-04 20:27:17 +00:00
ccurme
0172d938b4
community: add AzureOpenAIWhisperParser (#27796)
Commandeered from https://github.com/langchain-ai/langchain/pull/26757.

---------

Co-authored-by: Sheepsta300 <128811766+Sheepsta300@users.noreply.github.com>
2024-10-31 12:37:41 -04:00
Sergey Ryabov
8180637345
community[patch]: Fix Playwright Tools bug with Pydantic schemas (#27050)
- Add tests for Playwright tools schema serialization
- Introduce base empty args Input class for BaseBrowserTool

Test Plan: `poetry run pytest
tests/unit_tests/tools/playwright/test_all.py`

Fixes #26758

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-10-30 23:45:36 +00:00
Sam Julien
0a472e2a2d
community: Add Writer integration (#27646)
**Description:** Add support for Writer chat models   
**Issue:** N/A
**Dependencies:** Add `writer-sdk` to optional dependencies.
**Twitter handle:** Please tag `@samjulien` and `@Get_Writer`

**Tests and docs**
- [x] Unit test
- [x] Example notebook in `docs/docs/integrations` directory.

**Lint and test**
- [x] Run `make format` 
- [x] Run `make lint`
- [x] Run `make test`

---------

Co-authored-by: Johannes <tolstoy.work@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-30 18:06:05 +00:00
fayvor
3b956b3a97
community: Update Replicate LLM and fix tests (#27655)
**Description:** 
- Fix bug in Replicate LLM class, where it was looking for parameter
names in a place where they no longer exist in pydantic 2, resulting in
the "Field required" validation error described in the issue.
- Fix Replicate LLM integration tests to:
  - Use active models on Replicate.
- Use the correct model parameter `max_new_tokens` as shown in the
[Replicate
docs](https://replicate.com/docs/guides/language-models/how-to-use#minimum-and-maximum-new-tokens).
  - Use callbacks instead of deprecated callback_manager.
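
As a sketch of the corrected parameter usage (the model slug is
illustrative; `max_new_tokens` is the parameter named above):

```python
from langchain_community.llms import Replicate

# max_new_tokens is forwarded to the Replicate model through model_kwargs.
llm = Replicate(
    model="meta/meta-llama-3-8b-instruct",
    model_kwargs={"max_new_tokens": 250, "temperature": 0.75},
)
print(llm.invoke("Say hello in one sentence."))
```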

**Issue:** #26937 

**Dependencies:** n/a

**Twitter handle:** n/a

---------

Signed-off-by: Fayvor Love <fayvor@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-10-30 16:07:08 +00:00
Erick Friis
600b7bdd61
all: test 3.13 ci (#27197)
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-10-25 12:56:58 -07:00
CLOVA Studio 개발
846a75284f
community: Add Naver chat model & embeddings (#25162)
Reopened as a personal repo outside the organization.

## Description
- Naver HyperCLOVA X community package 
  - Add chat model & embeddings
  - Add unit test & integration test
  - Add chat model & embeddings docs
- I changed the partner
package (https://github.com/langchain-ai/langchain/pull/24252) to a
community package in this PR
- Could this embeddings
integration (https://github.com/langchain-ai/langchain/pull/21890) be
deprecated? We are trying to replace it with the embedding
model (**ClovaXEmbeddings**) in this PR.

Twitter handle: None. (if needed, contact with
joonha.jeon@navercorp.com)

---
you can check our previous discussion below:

> one question on namespaces - would it make sense to have these in
.clova namespaces instead of .naver?

I would like to keep it as is, unless it is essential to unify the
package name.
(ClovaX is a branding for the model, and I plan to add other models and
components. They need to be managed as separate classes.)

> also, could you clarify the difference between ClovaEmbeddings and
ClovaXEmbeddings?

There are 3 embedding models being serviced, and all are supported in
the current PR. In addition, all the functionality of CLOVA Studio that
serves actual models, such as distinguishing between test apps and
service apps, is supported. The existing PR does not support this
because it is hard-coded.
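
A hypothetical usage sketch (the credential environment variable and
model names here are assumptions; see the docs added in this PR for the
authoritative values):

```python
import os

from langchain_community.chat_models import ChatClovaX
from langchain_community.embeddings import ClovaXEmbeddings

os.environ["NCP_CLOVASTUDIO_API_KEY"] = "..."  # assumed env var name

chat = ChatClovaX(model="HCX-003")  # assumed model name
print(chat.invoke("Hello, CLOVA X!").content)

embeddings = ClovaXEmbeddings(model="clir-emb-dolphin")  # assumed model name
vector = embeddings.embed_query("Hello")
```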

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Vadym Barda <vadym@langchain.dev>
2024-10-24 20:54:13 +00:00
Lei Zhang
f203229b51
community: Fix the failure of ChatSparkLLM after upgrading to Pydantic V2 (#27418)
**Description:**

The test_sparkllm.py can reproduce this issue.


https://github.com/langchain-ai/langchain/blob/master/libs/community/tests/integration_tests/chat_models/test_sparkllm.py#L66

```
Testing started at 18:27 ...
Launching pytest with arguments test_sparkllm.py::test_chat_spark_llm --no-header --no-summary -q in /Users/zhanglei/Work/github/langchain/libs/community/tests/integration_tests/chat_models

============================= test session starts ==============================
collecting ... collected 1 item

test_sparkllm.py::test_chat_spark_llm 

============================== 1 failed in 0.45s ===============================
FAILED                             [100%]
tests/integration_tests/chat_models/test_sparkllm.py:65 (test_chat_spark_llm)
def test_chat_spark_llm() -> None:
>       chat = ChatSparkLLM(
            spark_app_id="your spark_app_id",
            spark_api_key="your spark_api_key",
            spark_api_secret="your spark_api_secret",
        )  # type: ignore[call-arg]

test_sparkllm.py:67: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../../core/langchain_core/load/serializable.py:111: in __init__
    super().__init__(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'langchain_community.chat_models.sparkllm.ChatSparkLLM'>
values = {'spark_api_key': 'your spark_api_key', 'spark_api_secret': 'your spark_api_secret', 'spark_api_url': 'wss://spark-api.xf-yun.com/v3.5/chat', 'spark_app_id': 'your spark_app_id', ...}

    @model_validator(mode="before")
    @classmethod
    def validate_environment(cls, values: Dict) -> Any:
        values["spark_app_id"] = get_from_dict_or_env(
            values,
            ["spark_app_id", "app_id"],
            "IFLYTEK_SPARK_APP_ID",
        )
        values["spark_api_key"] = get_from_dict_or_env(
            values,
            ["spark_api_key", "api_key"],
            "IFLYTEK_SPARK_API_KEY",
        )
        values["spark_api_secret"] = get_from_dict_or_env(
            values,
            ["spark_api_secret", "api_secret"],
            "IFLYTEK_SPARK_API_SECRET",
        )
        values["spark_api_url"] = get_from_dict_or_env(
            values,
            "spark_api_url",
            "IFLYTEK_SPARK_API_URL",
            SPARK_API_URL,
        )
        values["spark_llm_domain"] = get_from_dict_or_env(
            values,
            "spark_llm_domain",
            "IFLYTEK_SPARK_LLM_DOMAIN",
            SPARK_LLM_DOMAIN,
        )
    
        # put extra params into model_kwargs
        default_values = {
            name: field.default
            for name, field in get_fields(cls).items()
            if field.default is not None
        }
>       values["model_kwargs"]["temperature"] = default_values.get("temperature")
E       KeyError: 'model_kwargs'

../../../langchain_community/chat_models/sparkllm.py:368: KeyError
``` 

I found that when upgrading to Pydantic v2, @root_validator was changed
to @model_validator. When a class declares multiple
@model_validator(mode="before") validators, the execution order in V1
and V2 is opposite. This is the reason for ChatSparkLLM's failure.

The correct execution order is to execute build_extra first.


https://github.com/langchain-ai/langchain/blob/langchain%3D%3D0.2.16/libs/community/langchain_community/chat_models/sparkllm.py#L302

And then execute validate_environment.


https://github.com/langchain-ai/langchain/blob/langchain%3D%3D0.2.16/libs/community/langchain_community/chat_models/sparkllm.py#L329

The Pydantic community also discusses it, but there hasn't been a
conclusion yet. https://github.com/pydantic/pydantic/discussions/7434
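
A minimal standalone sketch (not from the PR) of the ordering
difference described above, with validator names mirroring the ones in
ChatSparkLLM:

```python
from pydantic import BaseModel, model_validator


class Demo(BaseModel):
    x: int = 0

    @model_validator(mode="before")
    @classmethod
    def build_extra(cls, values: dict) -> dict:
        # In ChatSparkLLM, this is the validator that creates
        # values["model_kwargs"].
        print("build_extra runs")
        return values

    @model_validator(mode="before")
    @classmethod
    def validate_environment(cls, values: dict) -> dict:
        # In ChatSparkLLM, this validator reads values["model_kwargs"]
        # and therefore must run second.
        print("validate_environment runs")
        return values


# Pydantic V1 ran these in definition order (build_extra first); per the
# report above, V2 reverses the order, so validate_environment runs
# first and the KeyError appears.
Demo()
```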

**Issue:** #27416

**Twitter handle:** coolbeevip

---------

Co-authored-by: vbarda <vadym@langchain.dev>
2024-10-23 21:17:10 -04:00
Eric Pinzur
f636c83321
community: Cassandra Vector Store: modernize implementation (#27253)
**Description:** 

This PR updates `CassandraGraphVectorStore` to be based off
`CassandraVectorStore`, instead of using a custom CQL implementation.
This allows users using a `CassandraVectorStore` to upgrade to a
`GraphVectorStore` without having to change their database schema or
re-embed documents.

This PR also updates the documentation of the `GraphVectorStore` base
class and contains native async implementations for the standard graph
methods: `traversal_search` and `mmr_traversal_search` in
`CassandraVectorStore`.

**Issue:** No issue number.

**Dependencies:** https://github.com/langchain-ai/langchain/pull/27078
(already-merged)

**Lint and test**: 
- Lint and tests all pass, including existing
`CassandraGraphVectorStore` tests.
- Also added numerous additional tests based on the tests in
`langchain-astradb`, which cover many more scenarios than the existing
tests for `Cassandra` and `CassandraGraphVectorStore`.

**BREAKING CHANGE**

Note that this is a breaking change for existing users of
`CassandraGraphVectorStore`. They will need to wipe their database table
and restart.

However:
- The interfaces have not changed. Just the underlying storage
mechanism.
- Anyone using `langchain_community.vectorstores.Cassandra` can instead
use `langchain_community.graph_vectorstores.CassandraGraphVectorStore`
and will gain graph capabilities without having to re-embed their
existing documents. This is the primary goal of this PR.
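
A hedged sketch of that upgrade path (constructor arguments and the
placeholders `my_embeddings` / `session` are assumptions standing in for
your existing objects):

```python
from langchain_community.graph_vectorstores import CassandraGraphVectorStore

# Same keyspace/table as the existing Cassandra vector store, so the
# previously embedded documents are reused rather than re-embedded.
graph_store = CassandraGraphVectorStore(
    embedding=my_embeddings,       # the same Embeddings used originally
    session=session,               # an open Cassandra session
    keyspace="my_keyspace",
    table_name="my_existing_table",
)
for doc in graph_store.traversal_search("query about linked content"):
    print(doc.id, doc.page_content[:60])
```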

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-22 18:11:11 +00:00
Erick Friis
92ae61bcc8
multiple: rely on asyncio_mode auto in tests (#27200) 2024-10-15 16:26:38 +00:00
Qiu Qin
57fbc6bdf1
community: Update OCI data science integration (#27083)
This PR updates the integration with OCI data science model deployment
service.

- Update LLM to support streaming and async calls.
- Added chat model.
- Updated tests and docs.
- Updated `libs/community/scripts/check_pydantic.sh` since the use of
`@pre_init` is removed from existing integration.
- Updated `libs/community/extended_testing_deps.txt` as this integration
requires `langchain_openai`.

---------

Co-authored-by: MING KANG <ming.kang@oracle.com>
Co-authored-by: Dmitrii Cherkasov <dmitrii.cherkasov@oracle.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-15 08:32:54 -07:00
Vittorio Rigamonti
7da2efd9d3
community[minor]: VectorStore Infinispan. Adding TLS and authentication (#23522)
**Description**:
this PR enables VectorStore TLS and authentication (digest, basic) with
HTTP/2 for the Infinispan server.
Based on httpx.

Added docker-compose facilities for testing
Added documentation

**Dependencies:**
requires `pip install httpx[http2]` if HTTP2 is needed

**Twitter handle:**
https://twitter.com/infinispan
2024-10-09 10:51:39 -04:00
Stefano Lottini
d05fdd97dd
community: Cassandra Vector Store: extend metadata-related methods (#27078)
**Description:** this PR adds a set of methods to deal with metadata
associated with the vector store entries. These, while essential to the
Graph-related extension of the `Cassandra` vector store, are also useful
in themselves. These are (all come in their sync+async versions):

- `[a]delete_by_metadata_filter`
- `[a]replace_metadata`
- `[a]get_by_document_id`
- `[a]metadata_search`

Additionally, a `[a]similarity_search_with_embedding_id_by_vector`
method is introduced to better serve the store's internal working (esp.
related to reranking logic).
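
A hypothetical usage sketch of the sync variants (argument names are
assumptions inferred from the method names above; `my_embeddings` and
`session` are placeholders):

```python
from langchain_community.vectorstores import Cassandra

store = Cassandra(
    embedding=my_embeddings,   # placeholder: any Embeddings implementation
    session=session,           # placeholder: an open Cassandra session
    keyspace="demo_keyspace",
    table_name="demo_table",
)

# Search entries by metadata alone (no vector involved):
docs = store.metadata_search(filter={"source": "manual"}, n=5)

# Look a document up by its id:
doc = store.get_by_document_id(document_id=docs[0].id)

# Overwrite the metadata of selected documents:
store.replace_metadata({doc.id: {"source": "manual", "reviewed": True}})

# Drop every entry matching a metadata filter:
store.delete_by_metadata_filter(filter={"stale": True})
```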

**Issue:** no issue number, but now all Documents returned bear their
`.id` consistently (as a consequence of a slight refactoring in how the
raw entries read from the DB are made back into `Document` instances).

**Dependencies:** (no new deps: packaging comes through langchain-core
already; `cassio` is now required to be version 0.1.10+)


**Add tests and docs**
Added integration tests for the relevant newly-introduced methods.
(Docs will be updated in a separate PR).

**Lint and test** Lint and (updated) test all pass.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-09 06:41:34 +00:00
ccurme
e3920f2320
community[patch]: fix structured_output in llamacpp integration (#27202)
Resolves https://github.com/langchain-ai/langchain/issues/25318.
2024-10-08 15:16:59 -04:00
Jorge Piedrahita Ortiz
14de81b140
community: sambastudio chat model (#27056)
**Description:** SambaStudio chat model integration added (previously
only an LLM integration existed); docs and tests included.

---------

Co-authored-by: luisfucros <luisfucros@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-10-07 14:31:39 -04:00
Bagatur
4935a14314
core,integrations[minor]: Dont error on fields in model_kwargs (#27110)
Given the current erroring behavior, every time we've moved a kwarg out
of model_kwargs and made it its own field, that was a breaking change.
This updates the behavior to support the old instantiations /
serializations.

This assumes build_extra_kwargs is not itself being used externally and
so does not need to be kept backwards compatible.
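
A sketch of the now-tolerated instantiation (using `ChatOpenAI` purely
as an illustration of the pattern):

```python
from langchain_openai import ChatOpenAI

# `temperature` is a declared field, so passing it inside model_kwargs
# used to raise at instantiation time; with this change it should be
# accepted and routed to the field, keeping old serializations loadable.
llm = ChatOpenAI(model="gpt-4o-mini", model_kwargs={"temperature": 0.2})
```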
2024-10-04 11:30:27 -07:00
ZhangShenao
e317d457cf
Bug-Fix[Community] Fix FastEmbedEmbeddings (#26764)
#26759 

- Fix https://github.com/langchain-ai/langchain/issues/26759 
- Change the `model` param from private to public, as it may not be
initialized otherwise.
- Add test case
2024-09-30 21:23:08 -04:00
Ben Chambers
29bf89db25
community: Add conversions from GVS to networkx (#26906)
These allow converting linked documents (such as those used with
GraphVectorStore) to networkx for rendering and/or in-memory graph
algorithms such as community detection.
2024-09-27 16:48:55 -04:00
Abhi Agarwal
696114e145
community: add sqlite-vec vectorstore (#25003)
**Description**:

Adds a vector store integration with
[sqlite-vec](https://alexgarcia.xyz/sqlite-vec/), the successor to
sqlite-vss that is a single C file with no external dependencies.

Pretty straightforward: I copy-pasted the sqlite-vss integration, made a
few tweaks, and added integration tests. The only question is whether
all documentation should be directed away from sqlite-vss if it is
de facto deprecated (cc @asg017).
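
A minimal sketch mirroring the sqlite-vss usage it was copied from
(parameter names are assumptions; requires `pip install sqlite-vec`):

```python
from langchain_community.embeddings import FakeEmbeddings
from langchain_community.vectorstores import SQLiteVec

store = SQLiteVec.from_texts(
    texts=["Ketanji Brown Jackson was nominated to the Supreme Court."],
    embedding=FakeEmbeddings(size=128),  # stand-in for a real embedding model
    table="demo",
    db_file="/tmp/vec.db",
)
print(store.similarity_search("supreme court", k=1))
```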

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: philippe-oger <philippe.oger@adevinta.com>
2024-09-26 17:37:10 +00:00
Jorge Piedrahita Ortiz
408a930d55
community: Add Sambanova Cloud Chat model community integration (#26333)
**Description:** : Add SambaNova Cloud Chat model community integration
Includes 
- chat model integration (following Standardize ChatModel docstrings)
-  tests
- docs usage notebook (following Standardize ChatModel integration docs)

https://cloud.sambanova.ai/

---------

Co-authored-by: luisfucros <luisfucros@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-09-24 14:11:32 +00:00
ccurme
f2285376a5
community[patch]: add web loader tests (#26728) 2024-09-20 18:29:54 -04:00
Erick Friis
311f861547
core, community: move graph vectorstores to community (#26678)
remove beta namespace from core, add to community
2024-09-19 11:38:14 -07:00
ccurme
f91bdd12d2
community[patch]: add to pypdf tests and run in CI (#26663) 2024-09-19 14:45:49 +00:00
Rajendra Kadam
60dc19da30
[community] Added PebbloTextLoader for loading text data in PebbloSafeLoader (#26582)
- **Description:** Added PebbloTextLoader for loading text in
PebbloSafeLoader.
- Since PebbloSafeLoader wraps document loaders, this new loader enables
direct loading of text into Documents using PebbloSafeLoader.
- **Issue:** NA
- **Dependencies:** NA
- [x] **Tests**: Added/Updated tests
2024-09-19 09:59:04 -04:00
Jorge Piedrahita Ortiz
37b72023fe
community: remove sambaverse (#26265)
Removing the Sambaverse LLM model and references, given it is not
available after Sep 10, 2024.

<img width="1781" alt="image"
src="https://github.com/user-attachments/assets/4dcdb5f7-5264-4a03-b8e5-95c88304e059">
2024-09-19 09:56:30 -04:00
Martin Triska
3fc0ea510e
community: [bugfix] Use document ids as keys in AzureSearch vectorstore (#25486)
# Description
[Vector store base
class](4cdaca67dc/libs/core/langchain_core/vectorstores/base.py (L65))
currently expects `ids` to be passed in and that is what it passes along
to the AzureSearch vector store when attempting to `add_texts()`.
However AzureSearch expects `keys` to be passed in. When they are not
present, AzureSearch `add_embeddings()` makes up new uuids. This is a
problem when trying to run indexing. [Indexing code
expects](b297af5482/libs/core/langchain_core/indexing/api.py (L371))
the documents to be uploaded using provided ids. Currently AzureSearch
ignores `ids` passed from `indexing` and makes up new ones. Later, when
the `indexer` attempts to delete a removed file, it uses the `id` it had
stored when uploading the document; however, the document was uploaded
under a different `id`.

**Twitter handle: @martintriska1**
2024-09-19 09:37:18 -04:00
Nuno Campos
5fc44989bf
core[patch]: Fix "argument of type 'NoneType' is not iterable" error in LangChainTracer (#26576)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-09-17 10:29:46 -07:00
RUO
0a177ec2cc
community: Enhance MongoDBLoader with flexible metadata and optimized field extraction (#23376)
### Description:
This pull request significantly enhances the MongodbLoader class in the
LangChain community package by adding robust metadata customization and
improved field extraction capabilities. The updated class now allows
users to specify additional metadata fields through the metadata_names
parameter, enabling the extraction of both top-level and deeply nested
document attributes as metadata. This flexibility is crucial for users
who need to include detailed contextual information without altering the
database schema.

Moreover, the include_db_collection_in_metadata flag offers optional
inclusion of database and collection names in the metadata, allowing for
even greater customization depending on the user's needs.

The loader's field extraction logic has been refined to handle missing
or nested fields more gracefully. It now employs a safe access mechanism
that avoids the KeyError previously encountered when a specified nested
field was absent in a document. This update ensures that the loader can
handle diverse and complex data structures without failure, making it
more resilient and user-friendly.

### Issue:
This pull request addresses a critical issue where the MongodbLoader
class in the LangChain community package could throw a KeyError when
attempting to access nested fields that may not exist in some documents.
The previous implementation did not handle the absence of specified
nested fields gracefully, leading to runtime errors and interruptions in
data processing workflows.

This enhancement ensures robust error handling by safely accessing
nested document fields, using default values for missing data, thus
preventing KeyError and ensuring smoother operation across various data
structures in MongoDB. This improvement is crucial for users working
with diverse and complex data sets, ensuring the loader can adapt to
documents with varying structures without failing.

### Dependencies: 
Requires motor for asynchronous MongoDB interaction.

### Twitter handle: 
N/A

### Add tests and docs
Tests: Unit tests have been added to verify that the metadata inclusion
toggle works as expected and that the field extraction correctly handles
nested fields.
Docs: An example notebook demonstrating the use of the enhanced
MongodbLoader is included in the docs/docs/integrations directory. This
notebook includes setup instructions, example usage, and outputs.
(Here is the notebook link : [colab
link](https://colab.research.google.com/drive/1tp7nyUnzZa3dxEFF4Kc3KS7ACuNF6jzH?usp=sharing))
### Lint and test
Before submitting, I ran make format, make lint, and make test as per
the contribution guidelines. All tests pass, and the code style adheres
to the LangChain standards.

```python
import unittest
from unittest.mock import patch, MagicMock
import asyncio
from langchain_community.document_loaders.mongodb import MongodbLoader

class TestMongodbLoader(unittest.TestCase):
    def setUp(self):
        """Setup the MongodbLoader test environment by mocking the motor client 
        and database collection interactions."""
        # Mocking the AsyncIOMotorClient
        self.mock_client = MagicMock()
        self.mock_db = MagicMock()
        self.mock_collection = MagicMock()

        self.mock_client.get_database.return_value = self.mock_db
        self.mock_db.get_collection.return_value = self.mock_collection

        # Initialize the MongodbLoader with test data
        self.loader = MongodbLoader(
            connection_string="mongodb://localhost:27017",
            db_name="testdb",
            collection_name="testcol"
        )

    @patch('langchain_community.document_loaders.mongodb.AsyncIOMotorClient', return_value=MagicMock())
    def test_constructor(self, mock_motor_client):
        """Test if the constructor properly initializes with the correct database and collection names."""
        loader = MongodbLoader(
            connection_string="mongodb://localhost:27017",
            db_name="testdb",
            collection_name="testcol"
        )
        self.assertEqual(loader.db_name, "testdb")
        self.assertEqual(loader.collection_name, "testcol")

    def test_aload(self):
        """Test the aload method to ensure it correctly queries and processes documents."""
        # Setup mock data and responses for the database operations
        self.mock_collection.count_documents.return_value = asyncio.Future()
        self.mock_collection.count_documents.return_value.set_result(1)
        self.mock_collection.find.return_value = [
            {"_id": "1", "content": "Test document content"}
        ]

        # Run the aload method and check responses
        loop = asyncio.get_event_loop()
        results = loop.run_until_complete(self.loader.aload())
        self.assertEqual(len(results), 1)
        self.assertEqual(results[0].page_content, "Test document content")

    def test_construct_projection(self):
        """Verify that the projection dictionary is constructed correctly based on field names."""
        self.loader.field_names = ['content', 'author']
        self.loader.metadata_names = ['timestamp']
        expected_projection = {'content': 1, 'author': 1, 'timestamp': 1}
        projection = self.loader._construct_projection()
        self.assertEqual(projection, expected_projection)

if __name__ == '__main__':
    unittest.main()
```


### Additional Example for Documentation
Sample Data:

```json
[
    {
        "_id": "1",
        "title": "Artificial Intelligence in Medicine",
        "content": "AI is transforming the medical industry by providing personalized medicine solutions.",
        "author": {
            "name": "John Doe",
            "email": "john.doe@example.com"
        },
        "tags": ["AI", "Healthcare", "Innovation"]
    },
    {
        "_id": "2",
        "title": "Data Science in Sports",
        "content": "Data science provides insights into player performance and strategic planning in sports.",
        "author": {
            "name": "Jane Smith",
            "email": "jane.smith@example.com"
        },
        "tags": ["Data Science", "Sports", "Analytics"]
    }
]
```
Example Code:

```python
loader = MongodbLoader(
    connection_string="mongodb://localhost:27017",
    db_name="example_db",
    collection_name="articles",
    filter_criteria={"tags": "AI"},
    field_names=["title", "content"],
    metadata_names=["author.name", "author.email"],
    include_db_collection_in_metadata=True
)

documents = loader.load()

for doc in documents:
    print("Page Content:", doc.page_content)
    print("Metadata:", doc.metadata)
```
Expected Output:

```
Page Content: Artificial Intelligence in Medicine AI is transforming the medical industry by providing personalized medicine solutions.
Metadata: {'author_name': 'John Doe', 'author_email': 'john.doe@example.com', 'database': 'example_db', 'collection': 'articles'}
```

Thank you.


---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-09-17 10:23:17 -04:00
Erick Friis
c2a3021bb0
multiple: pydantic 2 compatibility, v0.3 (#26443)
Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Dan O'Donovan <dan.odonovan@gmail.com>
Co-authored-by: Tom Daniel Grande <tomdgrande@gmail.com>
Co-authored-by: Grande <Tom.Daniel.Grande@statsbygg.no>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com>
Co-authored-by: ZhangShenao <15201440436@163.com>
Co-authored-by: Friso H. Kingma <fhkingma@gmail.com>
Co-authored-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Nuno Campos <nuno@langchain.dev>
Co-authored-by: Morgante Pell <morgantep@google.com>
2024-09-13 14:38:45 -07:00
Bagatur
e32adad17a
community[patch]: Release 0.2.17 (#26432) 2024-09-13 09:56:39 -07:00
Nuno Campos
212c688ee0
core[minor]: Remove serialized manifest from tracing requests for non-llm runs (#26270)
- This takes a long time to compute, isn't used, and is currently called
on every invocation of every chain/retriever/etc.
2024-09-10 12:58:24 -07:00
William FH
262e19b15d
infra: Clear cache for env-var checks (#26073) 2024-09-06 21:29:29 +00:00
Bagatur
1241a004cb fmt 2024-09-04 11:44:59 -07:00
Bagatur
4ba14ae9e5 fmt 2024-09-04 11:34:59 -07:00
Bagatur
dba308447d fmt 2024-09-04 11:28:04 -07:00
Yash Parmar
51dae57357
community[minor]: jina search tools integrating (jina reader) (#23339)
- **PR title**: "community: add Jina Search tool"
- **Description:** Added the Jina Search tool for querying the Jina
search API. This includes the implementation of the JinaSearchAPIWrapper
and the JinaSearch tool, along with a Jupyter notebook example
demonstrating its usage.
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** [Twitter
handle](https://x.com/yashp3020?t=7wM0gQ7XjGciFoh9xaBtqA&s=09)


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-09-02 14:52:14 -07:00
Alexander KIRILOV
6a8f8a56ac
community[patch]: added content_columns option to CSVLoader (#23809)
**Description:** 
Adding a new option to the CSVLoader that allows us to explicitly
specify the columns used for generating the Document content. Currently
these are implicitly set as "all fields not part of the
metadata_columns".

In some cases, however, it is useful to have a field both in the
metadata and in the document content.
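
A short sketch of the new option (file and column names are
illustrative); note that `author` here contributes to the page content
and is also kept as metadata:

```python
from langchain_community.document_loaders import CSVLoader

loader = CSVLoader(
    file_path="articles.csv",
    content_columns=["title", "body", "author"],  # build page_content from these
    metadata_columns=["author"],                  # also keep author as metadata
)
docs = loader.load()
```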
2024-09-02 20:25:53 +00:00
Bruno Alvisio
ab527027ac
community: Resolve refs recursively when generating openai_fn from OpenAPI spec (#19002)
- **Description:** This PR is intended to improve the generation of
payloads for OpenAI functions when converting from an OpenAPI spec file.
The solution is to recursively resolve `$refs`.
Currently, when converting OpenAPI specs into OpenAI functions using
`openapi_spec_to_openai_fn`, if the schemas have nested references, the
generated functions contain `$ref` entries that cause the LLM to
generate payloads with an incorrect schema.

For example, for the following OpenAPI spec:

```
text = """
{
  "openapi": "3.0.3",
  "info": {
    "title": "Swagger Petstore - OpenAPI 3.0",
    "termsOfService": "http://swagger.io/terms/",
    "contact": {
      "email": "apiteam@swagger.io"
    },
    "license": {
      "name": "Apache 2.0",
      "url": "http://www.apache.org/licenses/LICENSE-2.0.html"
    },
    "version": "1.0.11"
  },
  "externalDocs": {
    "description": "Find out more about Swagger",
    "url": "http://swagger.io"
  },
  "servers": [
    {
      "url": "https://petstore3.swagger.io/api/v3"
    }
  ],
  "tags": [
    {
      "name": "pet",
      "description": "Everything about your Pets",
      "externalDocs": {
        "description": "Find out more",
        "url": "http://swagger.io"
      }
    },
    {
      "name": "store",
      "description": "Access to Petstore orders",
      "externalDocs": {
        "description": "Find out more about our store",
        "url": "http://swagger.io"
      }
    },
    {
      "name": "user",
      "description": "Operations about user"
    }
  ],
  "paths": {
    "/pet": {
      "post": {
        "tags": [
          "pet"
        ],
        "summary": "Add a new pet to the store",
        "description": "Add a new pet to the store",
        "operationId": "addPet",
        "requestBody": {
          "description": "Create a new pet in the store",
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/Pet"
              }
            }
          },
          "required": true
        },
        "responses": {
          "200": {
            "description": "Successful operation",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/Pet"
                }
              }
            }
          }
        }
      }
    }
  },
  "components": {
    "schemas": {
      "Tag": {
        "type": "object",
        "properties": {
          "id": {
            "type": "integer",
            "format": "int64"
          },
          "model_type": {
            "type": "number"
          }
        }
      },
      "Category": {
        "type": "object",
        "required": [
          "model",
          "year",
          "age"
        ],
        "properties": {
          "year": {
            "type": "integer",
            "format": "int64",
            "example": 1
          },
          "model": {
            "type": "string",
            "example": "Ford"
          },
          "age": {
            "type": "integer",
            "example": 42
          }
        }
      },
      "Pet": {
        "required": [
          "name"
        ],
        "type": "object",
        "properties": {
          "id": {
            "type": "integer",
            "format": "int64",
            "example": 10
          },
          "name": {
            "type": "string",
            "example": "doggie"
          },
          "category": {
            "$ref": "#/components/schemas/Category"
          },
          "tags": {
            "type": "array",
            "items": {
              "$ref": "#/components/schemas/Tag"
            }
          },
          "status": {
            "type": "string",
            "description": "pet status in the store",
            "enum": [
              "available",
              "pending",
              "sold"
            ]
          }
        }
      }
    }
  }
}
"""
```

Executing:
```
spec = OpenAPISpec.from_text(text)
pet_openai_functions, pet_callables = openapi_spec_to_openai_fn(spec)
response = model.invoke("Create a pet named Scott", functions=pet_openai_functions)
```

`pet_openai_functions` contains unresolved `$refs`:

```
[
  {
    "name": "addPet",
    "description": "Add a new pet to the store",
    "parameters": {
      "type": "object",
      "properties": {
        "json": {
          "properties": {
            "id": {
              "type": "integer",
              "schema_format": "int64",
              "example": 10
            },
            "name": {
              "type": "string",
              "example": "doggie"
            },
            "category": {
              "ref": "#/components/schemas/Category"
            },
            "tags": {
              "items": {
                "ref": "#/components/schemas/Tag"
              },
              "type": "array"
            },
            "status": {
              "type": "string",
              "enum": [
                "available",
                "pending",
                "sold"
              ],
              "description": "pet status in the store"
            }
          },
          "type": "object",
          "required": [
            "name",
            "photoUrls"
          ]
        }
      }
    }
  }
]
```

and the generated JSON has an incorrect schema (e.g. category is filled
with `id` and `name` instead of `model`, `year` and `age`):

```
{
  "id": 1,
  "name": "Scott",
  "category": {
    "id": 1,
    "name": "Dogs"
  },
  "tags": [
    {
      "id": 1,
      "name": "tag1"
    }
  ],
  "status": "available"
}
```

With this change, `pet_openai_functions` becomes:

```
[
  {
    "name": "addPet",
    "description": "Add a new pet to the store",
    "parameters": {
      "type": "object",
      "properties": {
        "json": {
          "properties": {
            "id": {
              "type": "integer",
              "schema_format": "int64",
              "example": 10
            },
            "name": {
              "type": "string",
              "example": "doggie"
            },
            "category": {
              "properties": {
                "year": {
                  "type": "integer",
                  "schema_format": "int64",
                  "example": 1
                },
                "model": {
                  "type": "string",
                  "example": "Ford"
                },
                "age": {
                  "type": "integer",
                  "example": 42
                }
              },
              "type": "object",
              "required": [
                "model",
                "year",
                "age"
              ]
            },
            "tags": {
              "items": {
                "properties": {
                  "id": {
                    "type": "integer",
                    "schema_format": "int64"
                  },
                  "model_type": {
                    "type": "number"
                  }
                },
                "type": "object"
              },
              "type": "array"
            },
            "status": {
              "type": "string",
              "enum": [
                "available",
                "pending",
                "sold"
              ],
              "description": "pet status in the store"
            }
          },
          "type": "object",
          "required": [
            "name"
          ]
        }
      }
    }
  }
]
```

and the JSON generated by the LLM is:
```
{
  "id": 1,
  "name": "Scott",
  "category": {
    "year": 2022,
    "model": "Dog",
    "age": 42
  },
  "tags": [
    {
      "id": 1,
      "model_type": 1
    }
  ],
  "status": "available"
}
```

which has the intended schema.

    - **Twitter handle:** @brunoalvisio

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-09-02 13:17:39 -07:00
xander-art
6cd452d985
Feature/update hunyuan (#25779)
Description: 
    - Add system templates and user templates in integration testing
    - initialize the response id field value to request_id
    - Adjust the default model to hunyuan-pro
    - Remove the default values of Temperature and TopP
    - Add SystemMessage

All the integration tests have passed.
1. Execute integration tests for the first time
<img width="1359" alt="71ca77a2-e9be-4af6-acdc-4d665002bd9b"
src="https://github.com/user-attachments/assets/9298dc3a-aa26-4bfa-968b-c011a4e699c9">

2. Run the integration test a second time
<img width="1501" alt="image"
src="https://github.com/user-attachments/assets/61335416-4a67-4840-bb89-090ba668e237">

Issue: None
Dependencies: None
Twitter handle: None

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-09-02 12:55:08 +00:00
Yuwen Hu
566e9ba164
community: add Intel GPU support to ipex-llm llm integration (#22458)
**Description:** [IPEX-LLM](https://github.com/intel-analytics/ipex-llm)
is a PyTorch library for running LLMs on Intel CPUs and GPUs (e.g., a
local PC with an iGPU, or a discrete GPU such as Arc, Flex or Max) with
very low latency. This PR adds Intel GPU support to the `ipex-llm` LLM
integration.
**Dependencies:** `ipex-llm`
**Contribution maintainer**: @ivy-lv11 @Oscilloscope98
**tests and docs**: 
- Add: langchain/docs/docs/integrations/llms/ipex_llm_gpu.ipynb
- Update: langchain/docs/docs/integrations/llms/ipex_llm_gpu.ipynb
- Update: langchain/libs/community/tests/llms/test_ipex_llm.py

---------

Co-authored-by: ivy-lv11 <zhicunlv@gmail.com>
2024-09-02 08:49:08 -04:00
ZhangShenao
fd0f147df3
Improvement[Community] Add tool-calling test case for ChatZhipuAI (#25884)
- Add tool-calling test case for `ChatZhipuAI`
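
For context, a hedged sketch of the flow being tested (model name and
tool are illustrative, not from this PR):

```python
from langchain_community.chat_models import ChatZhipuAI
from langchain_core.tools import tool


@tool
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b


llm = ChatZhipuAI(model="glm-4", api_key="...")  # model name illustrative
msg = llm.bind_tools([add]).invoke("What is 2 + 3? Use the add tool.")
print(msg.tool_calls)
```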
2024-08-30 12:05:43 -04:00
默奕
6377185291
add neo4j query constructor for self query (#25288)
- [x] **PR title - community: add neo4j query constructor for self
query**

- [x] **PR message**
- **Description:** adding a Neo4jTranslator so that the Neo4j vector
database can use SelfQueryRetriever
    - **Issue:** this issue had been raised before in #19748
    - **Dependencies:** none. 
    - **Twitter handle:** @moyi_dang
- p.s. I have not added the query constructor to BUILTIN_TRANSLATORS in
this PR; I want to make changes to only one package at a time.
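
Since the translator is not registered in BUILTIN_TRANSLATORS yet, a
hypothetical wiring (the import path is an assumption; `llm` and
`neo4j_store` are placeholders) would pass it explicitly:

```python
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_community.query_constructors.neo4j import Neo4jTranslator

retriever = SelfQueryRetriever.from_llm(
    llm=llm,                  # placeholder chat model
    vectorstore=neo4j_store,  # placeholder Neo4jVector instance
    document_contents="Movie descriptions",
    metadata_field_info=[
        AttributeInfo(name="year", description="Release year", type="integer"),
    ],
    structured_query_translator=Neo4jTranslator(),
)
```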

- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/


---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-30 14:54:33 +00:00
Allan Ascencio
a8af396a82
added octoai test (#21793)
- [ ] **PR title**: community: add tests for ChatOctoAI

- [ ] **PR message**: 
Description: Added unit tests for the ChatOctoAI class in the community
package to ensure proper validation and default values. These tests
verify the correct initialization of fields, the handling of missing
required parameters, and the proper setting of aliases.
Issue: N/A
Dependencies: None

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2024-08-29 15:07:27 +00:00
Param Singh
69f9acb60f
premai[patch]: Standardize premai params (#21513)
Thank you for contributing to LangChain!

community:premai[patch]: standardize init args

- updated `temperature` with Pydantic Field, updated the unit test.
- updated `max_tokens` with Pydantic Field, updated the unit test.
- updated `max_retries` with Pydantic Field, updated the unit test.

Related to #20085

---------

Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-08-29 11:01:28 -04:00
Guangdong Liu
fcf9230257
community(sparkllm): Add function call support in Sparkllm chat model. (#20607)
- **Description:** Add function call support in Sparkllm chat model.
Related documents
https://www.xfyun.cn/doc/spark/Web.html#_2-function-call%E8%AF%B4%E6%98%8E
- @baskaryan

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-08-29 14:38:39 +00:00
Mikhail Khludnev
a017f49fd3
community[patch]: fix #25575 YandexGPTs for _grpc_metadata (#25617)
it fixes two issues:

### YGPTs are broken #25575

```
File ....conda/lib/python3.11/site-packages/langchain_community/embeddings/yandex.py:211, in _make_request(self, texts, **kwargs)
..
--> 211 res = stub.TextEmbedding(request, metadata=self._grpc_metadata)  # type: ignore[attr-defined]

AttributeError: 'YandexGPTEmbeddings' object has no attribute '_grpc_metadata'
```
My gut feeling is that #23841 is the cause.

I had to drop the leading underscore from `_grpc_metadata` as a quick
fix, but I just don't know how to do it in a _pydantic_ enough way.

### minor issue:

if we use `api_key`, which is not the best practice, the code fails with

```
File ~/git/...../python3.11/site-packages/langchain_community/embeddings/yandex.py:119, in YandexGPTEmbeddings.validate_environment(cls, values)
...

AttributeError: 'tuple' object has no attribute 'append'
```

- Added a new integration test, but it requires an available YGPT
environment and an active account. I don't know how integration tests
are enabled/disabled in CI.
 - Added small unit tests with mocks. Should be fine.

---------

Co-authored-by: mikhail-khludnev <mikhail_khludnev@rntgroup.com>
2024-08-28 18:48:10 -07:00
Serena Ruan
850bf89e48
community[patch]: Support passing extra params for executing functions in UCFunctionToolkit (#25652)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


Support passing extra params when executing UC functions:
the params should be a dictionary keyed by EXECUTE_FUNCTION_ARG_NAME.
The assumption is that the function itself doesn't use such a variable
name (starting and ending with double underscores), and if it does we
raise an exception.
If invalid params are passed to execute_statement, we raise an exception
as well.


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/


---------

Signed-off-by: Serena Ruan <serena.rxy@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-08-28 18:47:32 -07:00
Moritz Schlager
555f97becb
community[patch]: fix model initialization bug for deepinfra (#25727)
### Description
Adds an init method to ChatDeepInfra to set the model_name attribute
according to the argument.
### Issue
Currently, the model_name specified by the user during initialization of
the ChatDeepInfra class is never set, so it always chooses the default
model (meta-llama/Llama-2-70b-chat-hf; since that is deprecated, it
probably always uses meta-llama/Llama-3-70b-Instruct). We stumbled
across this issue and fixed it as proposed in this pull request. Feel
free to change the fix according to your coding guidelines and style;
this is just a proposal, and we want to draw attention to this problem.
### Dependencies
no additional dependencies required

Feel free to contact me or @timo282 and @finitearth if you have any
questions.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-08-28 02:02:35 -07:00
Tomaz Bratanic
f359e6b0a5
Add mmr to neo4j vector (#25765) 2024-08-27 08:55:19 -04:00
Luis Valencia
99f9a664a5
community: Azure Search Vector Store is missing Access Token Authentication (#24330)
Added Azure Search Access Token Authentication instead of API KEY auth.
Fixes Issue: https://github.com/langchain-ai/langchain/issues/24263
Dependencies: None
Twitter: @levalencia

@baskaryan

Could you please review? First time creating a PR that fixes some code.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-26 15:41:50 -07:00
maang-h
a566a15930
Fix MoonshotChat instantiate with alias (#25755)
- **Description:**
   - Fix `MoonshotChat` instantiation with an alias
   - Add `MoonshotChat` to `__init__.py`

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-26 17:33:22 +00:00
Sharmistha S. Gupta
90439b12f6
Added support for Nebula Chat model (#21925)
Description: Added support for Nebula Chat model in addition to Nebula
Instruct
Dependencies: N/A
Twitter handle: @Symbldotai

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-23 22:34:32 +00:00
Ian
64ace25eb8
<Community>: tidb vector support vector index (#19984)
This PR introduces adjustments to ensure compatibility with the recently
released preview version of [TiDB Serverless Vector
Search](https://tidb.cloud/ai), aiming to prevent user confusion.

- TiDB Vector now supports vector indexing with cosine and l2 distance
strategies, although inner_product remains unsupported.
- Changing the distance strategy is currently not supported, so the test
cases should be adjusted.
2024-08-23 13:59:23 -04:00
Austin Burdette
f355a98bb6
community:yuan2[patch]: standardize init args (#21462)
Updated stop and request_timeout so they are aliased to stop_sequences
and timeout, respectively. Added a test that both continue to set the
same underlying attributes.

Related to
[20085](https://github.com/langchain-ai/langchain/issues/20085)

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-08-23 17:56:19 +00:00
Christophe Bornet
7f1e444efa
partners: Use simsimd types (#25299)
The simsimd package [now has
types](https://github.com/ashvardanian/SimSIMD/releases/tag/v5.0.0)
2024-08-23 10:41:39 -04:00
Erik Lindgren
583b0449eb
community[patch]: Fix Hybrid Search for non-Databricks managed embeddings (#25590)
Description: Send both the query and query_embedding to the Databricks
index for hybrid search.

Issue: When using hybrid search with non-Databricks-managed embeddings,
we currently don't pass both the embedding and query_text to the index.
Hybrid search requires both of these. This change fixes the issue for
both `similarity_search` and `similarity_search_by_vector`.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-23 08:57:13 +00:00