langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-09-27 22:37:46 +00:00

Author	SHA1	Message	Date
yeounhak	f38fc89f35	community: Corrected aload func to be asynchronous from webBaseLoader (#28337 ) - Description: The aload function, contrary to its name, is not an asynchronous function, so it cannot work concurrently with other asynchronous functions. - Issue: #28336 - Test: : Done - Docs: [here](`e0a95e5646/docs/docs/integrations/document_loaders/web_base.ipynb (L201)`) - Lint: All checks passed If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-20 14:42:52 -05:00
ScriptShi	97f1e1d39f	community: tablestore vector store check the dimension of the embedding when writing it to store. (#28812 ) Added some restrictions to a vectorstore I released in the community before.	2024-12-19 09:30:43 -05:00
binhnd102	f723a8456e	Fixes: community: fix LanceDB return no metadata (#27024 ) - [ x ] Fix when lancedb return table without metadata column - Description: Check the table schema, if not has metadata column, init the Document with metadata argument equal to empty dict - Issue: https://github.com/langchain-ai/langchain/issues/27005 - [ x ] Add tests and docs --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-12-18 15:21:28 +00:00
gsa9989	cdf6202156	cosmosdbnosql: Added Cosmos DB NoSQL Semantic Cache Integration with tests and jupyter notebook (#24424 ) * Added Cosmos DB NoSQL Semantic Cache Integration with tests and jupyter notebook --------- Co-authored-by: Aayush Kataria <aayushkataria3011@gmail.com> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-16 21:57:05 -05:00
Tari Yekorogha	d262d41cc0	community: added FalkorDB vector store support i.e implementation, test, docs an… (#26245 ) Description: Added support for FalkorDB Vector Store, including its implementation, unit tests, documentation, and an example notebook. The FalkorDB integration allows users to efficiently manage and query embeddings in a vector database, with relevance scoring and maximal marginal relevance search. The following components were implemented: - Core implementation for FalkorDBVector store. - Unit tests ensuring proper functionality and edge case coverage. - Example notebook demonstrating an end-to-end setup, search, and retrieval using FalkorDB. Twitter handle: @tariyekorogha --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 19:37:55 +00:00
Aayush Kataria	d417e4b372	Community: Azure CosmosDB No Sql Vector Store: Full Text and Hybrid Search Support (#28716 ) Thank you for contributing to LangChain! - Added [full text](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/full-text-search) and [hybrid search](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/hybrid-search) support for Azure CosmosDB NoSql Vector Store - Added a new enum called CosmosDBQueryType which supports the following values: - VECTOR = "vector" - FULL_TEXT_SEARCH = "full_text_search" - FULL_TEXT_RANK = "full_text_rank" - HYBRID = "hybrid" - User now needs to provide this query_type to the similarity_search method for the vectorStore to make the correct query api call. - Added a couple of work arounds as for the FULL_TEXT_RANK and HYBRID query functions we don't support parameterized queries right now. I have added TODO's in place, and will remove these work arounds by end of January. - Added necessary test cases and updated the - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Erick Friis <erickfriis@gmail.com>	2024-12-15 13:26:32 -08:00
Karthik Bharadhwaj	498f0249e2	community[minor]: Opensearch hybridsearch implementation (#25375 ) community: add hybrid search in opensearch # Langchain OpenSearch Hybrid Search Implementation ## Implementation of Hybrid Search: I have taken LangChain's OpenSearch integration to the next level by adding hybrid search capabilities. Building on the existing OpenSearchVectorSearch class, I have implemented Hybrid Search functionality (which combines the best of both keyword and semantic search). This new functionality allows users to harness the power of OpenSearch's advanced hybrid search features without leaving the familiar LangChain ecosystem. By blending traditional text matching with vector-based similarity, the enhanced class delivers more accurate and contextually relevant results. It's designed to seamlessly fit into existing LangChain workflows, making it easy for developers to upgrade their search capabilities. In implementing the hybrid search for OpenSearch within the LangChain framework, I also incorporated filtering capabilities. It's important to note that according to the OpenSearch hybrid search documentation, only post-filtering is supported for hybrid queries. This means that the filtering is applied after the hybrid search results are obtained, rather than during the initial search process. Note: For the implementation of hybrid search, I strictly followed the official OpenSearch Hybrid search documentation and I took inspiration from https://github.com/AndreasThinks/langchain/tree/feature/opensearch_hybrid_search Thanks Mate! ### Experiments I conducted few experiments to verify that the hybrid search implementation is accurate and capable of reproducing the results of both plain keyword search and vector search. Experiment - 1 Hybrid Search Keyword_weight: 1, vector_weight: 0 I conducted an experiment to verify the accuracy of my hybrid search implementation by comparing it to a plain keyword search. For this test, I set the keyword_weight to 1 and the vector_weight to 0 in the hybrid search, effectively giving full weightage to the keyword component. The results from this hybrid search configuration matched those of a plain keyword search, confirming that my implementation can accurately reproduce keyword-only search results when needed. It's important to note that while the results were the same, the scores differed between the two methods. This difference is expected because the plain keyword search in OpenSearch uses the BM25 algorithm for scoring, whereas the hybrid search still performs both keyword and vector searches before normalizing the scores, even when the vector component is given zero weight. This experiment validates that my hybrid search solution correctly handles the keyword search component and properly applies the weighting system, demonstrating its accuracy and flexibility in emulating different search scenarios. Experiment - 2 Hybrid Search keyword_weight = 0.0, vector_weight = 1.0 For experiment-2, I took the inverse approach to further validate my hybrid search implementation. I set the keyword_weight to 0 and the vector_weight to 1, effectively giving full weightage to the vector search component (KNN search). I then compared these results with a pure vector search. The outcome was consistent with my expectations: the results from the hybrid search with these settings exactly matched those from a standalone vector search. This confirms that my implementation accurately reproduces vector search results when configured to do so. As with the first experiment, I observed that while the results were identical, the scores differed between the two methods. This difference in scoring is expected and can be attributed to the normalization process in hybrid search, which still considers both components even when one is given zero weight. This experiment further validates the accuracy and flexibility of my hybrid search solution, demonstrating its ability to effectively emulate pure vector search when needed while maintaining the underlying hybrid search structure. Experiment - 3 Hybrid Search - balanced keyword_weight = 0.5, vector_weight = 0.5 For experiment-3, I adopted a balanced approach to further evaluate the effectiveness of my hybrid search implementation. In this test, I set both the keyword_weight and vector_weight to 0.5, giving equal importance to keyword-based and vector-based search components. This configuration aims to leverage the strengths of both search methods simultaneously. By setting both weights to 0.5, I intended to create a scenario where the hybrid search would consider lexical matches and semantic similarity equally. This balanced approach is often ideal for many real-world applications, as it can capture both exact keyword matches and contextually relevant results that might not contain the exact search terms. Kindly verify the notebook for the experiments conducted! Notebook: https://github.com/karthikbharadhwajKB/Langchain_OpenSearch_Hybrid_search/blob/main/Opensearch_Hybridsearch.ipynb ### Instructions to follow for Performing Hybrid Search: Step-1: Instantiating OpenSearchVectorSearch Class: ```python opensearch_vectorstore = OpenSearchVectorSearch( index_name=os.getenv("INDEX_NAME"), embedding_function=embedding_model, opensearch_url=os.getenv("OPENSEARCH_URL"), http_auth=(os.getenv("OPENSEARCH_USERNAME"),os.getenv("OPENSEARCH_PASSWORD")), use_ssl=False, verify_certs=False, ssl_assert_hostname=False, ssl_show_warn=False ) ``` Parameters: 1. index_name: The name of the OpenSearch index to use. 2. embedding_function: The function or model used to generate embeddings for the documents. It's assumed that embedding_model is defined elsewhere in the code. 3. opensearch_url: The URL of the OpenSearch instance. 4. http_auth: A tuple containing the username and password for authentication. 5. use_ssl: Set to False, indicating that the connection to OpenSearch is not using SSL/TLS encryption. 6. verify_certs: Set to False, which means the SSL certificates are not being verified. This is often used in development environments but is not recommended for production. 7. ssl_assert_hostname: Set to False, disabling hostname verification in SSL certificates. 8. ssl_show_warn: Set to False, suppressing SSL-related warnings. Step-2: Configure Search Pipeline: To initiate hybrid search functionality, you need to configures a search pipeline first. Implementation Details: This method configures a search pipeline in OpenSearch that: 1. Normalizes the scores from both keyword and vector searches using the min-max technique. 2. Applies the specified weights to the normalized scores. 3. Calculates the final score using an arithmetic mean of the weighted, normalized scores. Parameters: * pipeline_name (str): A unique identifier for the search pipeline. It's recommended to use a descriptive name that indicates the weights used for keyword and vector searches. * keyword_weight (float): The weight assigned to the keyword search component. This should be a float value between 0 and 1. In this example, 0.3 gives 30% importance to traditional text matching. * vector_weight (float): The weight assigned to the vector search component. This should be a float value between 0 and 1. In this example, 0.7 gives 70% importance to semantic similarity. ```python opensearch_vectorstore.configure_search_pipelines( pipeline_name="search_pipeline_keyword_0.3_vector_0.7", keyword_weight=0.3, vector_weight=0.7, ) ``` Step-3: Performing Hybrid Search: After creating the search pipeline, you can perform a hybrid search using the `similarity_search()` method (or) any methods that are supported by `langchain`. This method combines both `keyword-based and semantic similarity` searches on your OpenSearch index, leveraging the strengths of both traditional information retrieval and vector embedding techniques. parameters: * query: The search query string. * k: The number of top results to return (in this case, 3). * search_type: Set to `hybrid_search` to use both keyword and vector search capabilities. * search_pipeline: The name of the previously created search pipeline. ```python query = "what are the country named in our database?" top_k = 3 pipeline_name = "search_pipeline_keyword_0.3_vector_0.7" matched_docs = opensearch_vectorstore.similarity_search_with_score( query=query, k=top_k, search_type="hybrid_search", search_pipeline = pipeline_name ) matched_docs ``` twitter handle: @iamkarthik98 --------- Co-authored-by: Karthik Kolluri <karthik.kolluri@eidosmedia.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-12-13 16:34:12 -05:00
ScriptShi	b0a298894d	community[minor]: Add TablestoreVectorStore (#25767 ) Thank you for contributing to LangChain! - [x] PR title: community: add TablestoreVectorStore - [x] PR message: - Description: add TablestoreVectorStore - Dependencies: none - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration: yes 2. an example notebook showing its use: yes If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-12-13 11:17:28 -08:00
fatmelon	d1e0ec7b55	community: VectorStores: Azure Cosmos DB Mongo vCore with DiskANN (#27329 ) # Description Add a new vector index type `diskann` to Azure Cosmos DB Mongo vCore vector store. Paper of DiskANN can be found here [DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node](https://proceedings.neurips.cc/paper_files/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf). ## Sample Usage ```python from pymongo import MongoClient # INDEX_NAME = "izzy-test-index-2" # NAMESPACE = "izzy_test_db.izzy_test_collection" # DB_NAME, COLLECTION_NAME = NAMESPACE.split(".") client: MongoClient = MongoClient(CONNECTION_STRING) collection = client[DB_NAME][COLLECTION_NAME] model_deployment = os.getenv( "OPENAI_EMBEDDINGS_DEPLOYMENT", "smart-agent-embedding-ada" ) model_name = os.getenv("OPENAI_EMBEDDINGS_MODEL_NAME", "text-embedding-ada-002") vectorstore = AzureCosmosDBVectorSearch.from_documents( docs, openai_embeddings, collection=collection, index_name=INDEX_NAME, ) # Read more about these variables in detail here. https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/vector-search maxDegree = 40 dimensions = 1536 similarity_algorithm = CosmosDBSimilarityType.COS kind = CosmosDBVectorSearchType.VECTOR_DISKANN lBuild = 20 vectorstore.create_index( dimensions=dimensions, similarity=similarity_algorithm, kind=kind , max_degree=maxDegree, l_build=lBuild, ) ``` ## Dependencies No additional dependencies were added --------- Co-authored-by: Yang Qiao (from Dev Box) <yangqiao@microsoft.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 01:54:04 +00:00
Brian Sharon	b20230c800	community: use correct `id_key` when deleting by id in LanceDB wrapper (#28655 ) - Description: The current version of the `delete` method assumes that the id field will always be called `id`. - Issue: n/a - Dependencies: n/a - Twitter handle: ugh, Twitter :D --- Thank you for contributing to LangChain! - [x] PR title: "package: description" - Where "package" is whichever of langchain, community, core, etc. is being modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [x] PR message: *Delete this entire checklist* and replace with - Description: a description of the change - Issue: the issue # it fixes, if applicable - Dependencies: any dependencies required for this change - Twitter handle: if your PR gets announced, and you'd like a mention, we'll gladly shout you out! - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-11 23:49:35 +00:00
Bagatur	e6a62d8422	core,langchain,community[patch]: allow langsmith 0.2 (#28598 )	2024-12-10 18:50:58 +00:00
ccurme	2c6bc74cb1	multiple: combine sync/async vector store standard test suites (#28580 ) Breaking change in `langchain-tests`.	2024-12-06 14:55:06 -05:00
cinqisap	482e8a7855	community: Add support for SAP HANA Vector hnsw index creation (#27884 ) Issue: Added support for creating indexes in the SAP HANA Vector engine. Changes: 1. Introduced a new function `create_hnsw_index` in `hanavector.py` that enables the creation of indexes for SAP HANA Vector. 2. Added integration tests for the index creation function to ensure functionality. 3. Updated the documentation to reflect the new index creation feature, including examples and output from the notebook. 4. Fix the operator issue in ` _process_filter_object` function and change the array argument to a placeholder in the similarity search SQL statement. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-05 23:29:08 +00:00
Erick Friis	0dbaf05bb7	standard-tests: rename langchain_standard_tests to langchain_tests, release 0.3.2 (#28203 )	2024-11-18 19:10:39 -08:00
Eric Pinzur	ea0ad917b0	community: added Document.id support to opensearch vectorstore (#27945 ) Description: * Added support of Document.id on OpenSearch vector store * Added tests cases to match	2024-11-06 15:04:09 -05:00
Ofer Mendelevitch	d7c39e6dbb	community: update Vectara integration (#27869 ) Thank you for contributing to LangChain! - Description: Updated Vectara integration - Issue: refresh on descriptions across all demos and added UDF reranker - Dependencies: None - Twitter handle: @ofermend --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-04 20:40:39 +00:00
Erick Friis	600b7bdd61	all: test 3.13 ci (#27197 ) Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-10-25 12:56:58 -07:00
Eric Pinzur	f636c83321	community: Cassandra Vector Store: modernize implementation (#27253 ) Description: This PR updates `CassandraGraphVectorStore` to be based off `CassandraVectorStore`, instead of using a custom CQL implementation. This allows users using a `CassandraVectorStore` to upgrade to a `GraphVectorStore` without having to change their database schema or re-embed documents. This PR also updates the documentation of the `GraphVectorStore` base class and contains native async implementations for the standard graph methods: `traversal_search` and `mmr_traversal_search` in `CassandraVectorStore`. Issue: No issue number. Dependencies: https://github.com/langchain-ai/langchain/pull/27078 (already-merged) Lint and test: - Lint and tests all pass, including existing `CassandraGraphVectorStore` tests. - Also added numerous additional tests based of the tests in `langchain-astradb` which cover many more scenarios than the existing tests for `Cassandra` and `CassandraGraphVectorStore` BREAKING CHANGE Note that this is a breaking change for existing users of `CassandraGraphVectorStore`. They will need to wipe their database table and restart. However: - The interfaces have not changed. Just the underlying storage mechanism. - Any one using `langchain_community.vectorstores.Cassandra` can instead use `langchain_community.graph_vectorstores.CassandraGraphVectorStore` and they will gain Graph capabilities without having to re-embed their existing documents. This is the primary goal of this PR. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-22 18:11:11 +00:00
Erick Friis	92ae61bcc8	multiple: rely on asyncio_mode auto in tests (#27200 )	2024-10-15 16:26:38 +00:00
Vittorio Rigamonti	7da2efd9d3	community[minor]: VectorStore Infinispan. Adding TLS and authentication (#23522 ) Description: this PR enable VectorStore TLS and authentication (digest, basic) with HTTP/2 for Infinispan server. Based on httpx. Added docker-compose facilities for testing Added documentation Dependencies: requires `pip install httpx[http2]` if HTTP2 is needed Twitter handle: https://twitter.com/infinispan	2024-10-09 10:51:39 -04:00
Stefano Lottini	d05fdd97dd	community: Cassandra Vector Store: extend metadata-related methods (#27078 ) Description: this PR adds a set of methods to deal with metadata associated to the vector store entries. These, while essential to the Graph-related extension of the `Cassandra` vector store, are also useful in themselves. These are (all come in their sync+async versions): - `[a]delete_by_metadata_filter` - `[a]replace_metadata` - `[a]get_by_document_id` - `[a]metadata_search` Additionally, a `[a]similarity_search_with_embedding_id_by_vector` method is introduced to better serve the store's internal working (esp. related to reranking logic). Issue: no issue number, but now all Document's returned bear their `.id` consistently (as a consequence of a slight refactoring in how the raw entries read from DB are made back into `Document` instances). Dependencies: (no new deps: packaging comes through langchain-core already; `cassio` is now required to be version 0.1.10+) Add tests and docs Added integration tests for the relevant newly-introduced methods. (Docs will be updated in a separate PR). Lint and test Lint and (updated) test all pass. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-09 06:41:34 +00:00
Abhi Agarwal	696114e145	community: add sqlite-vec vectorstore (#25003 ) Description: Adds a vector store integration with [sqlite-vec](https://alexgarcia.xyz/sqlite-vec/), the successor to sqlite-vss that is a single C file with no external dependencies. Pretty straightforward, just copy-pasted the sqlite-vss integration and made a few tweaks and added integration tests. Only question is whether all documentation should be directed away from sqlite-vss if it is defacto deprecated (cc @asg017). --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: philippe-oger <philippe.oger@adevinta.com>	2024-09-26 17:37:10 +00:00
Tomaz Bratanic	f359e6b0a5	Add mmr to neo4j vector (#25765 )	2024-08-27 08:55:19 -04:00
Luis Valencia	99f9a664a5	community: Azure Search Vector Store is missing Access Token Authentication (#24330 ) Added Azure Search Access Token Authentication instead of API KEY auth. Fixes Issue: https://github.com/langchain-ai/langchain/issues/24263 Dependencies: None Twitter: @levalencia @baskaryan Could you please review? First time creating a PR that fixes some code. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-08-26 15:41:50 -07:00
Ian	64ace25eb8	<Community>: tidb vector support vector index (#19984 ) This PR introduces adjustments to ensure compatibility with the recently released preview version of [TiDB Serverless Vector Search](https://tidb.cloud/ai), aiming to prevent user confusion. - TiDB Vector now supports vector indexing with cosine and l2 distance strategies, although inner_product remains unsupported. - Changing the distance strategy is currently not supported, so the test cased should be adjusted.	2024-08-23 13:59:23 -04:00
ccurme	ba167dc158	community[patch]: update connection string in azure cosmos integration test (#25438 )	2024-08-15 14:07:54 +00:00
Chaunte W. Lacewell	69eacaa887	Community[minor]: Update VDMS vectorstore (#23729 ) Description: - This PR exposes some functions in VDMS vectorstore, updates VDMS related notebooks, updates tests, and upgrade version of VDMS (>=0.0.20) Issue: N/A Dependencies: - Update vdms>=0.0.20	2024-07-25 22:13:04 -04:00
Aayush Kataria	0f45ac4088	LangChain Community: VectorStores: Azure Cosmos DB Filtered Vector Search (#24087 ) Thank you for contributing to LangChain! - This PR adds vector search filtering for Azure Cosmos DB Mongo vCore and NoSQL. - [ ] PR message: *Delete this entire checklist* and replace with - Description: a description of the change - Issue: the issue # it fixes, if applicable - Dependencies: any dependencies required for this change - Twitter handle: if your PR gets announced, and you'd like a mention, we'll gladly shout you out! - [ ] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [ ] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.	2024-07-23 16:59:23 -07:00
ZhangShenao	0f6737cbfe	[Vector Store] Fix function `add_texts` in `TencentVectorDB` (#24469 ) Regardless of whether `embedding_func` is set or not, the 'text' attribute of document should be assigned, otherwise the `page_content` in the document of the final search result will be lost	2024-07-22 09:50:22 -04:00
Rafael Pereira	cf28708e7b	Neo4j: Update with non-deprecated cypher methods, and new method to associate relationship embeddings (#23725 ) Description: At the moment neo4j wrapper is using setVectorProperty, which is deprecated ([link](https://neo4j.com/docs/operations-manual/5/reference/procedures/#procedure_db_create_setVectorProperty)). I replaced with the non-deprecated version. Neo4j recently introduced a new cypher method to associate embeddings into relations using "setRelationshipVectorProperty" method. In this PR I also implemented a new method to perform this association maintaining the same format used in the "add_embeddings" method which is used to associate embeddings into Nodes. I also included a test case for this new method.	2024-07-17 12:37:47 -04:00
Rafael Pereira	fc41730e28	neo4j: Fix test for order-insensitive comparison and floating-point precision issues (#24338 ) Description: This PR addresses two main issues in the `test_neo4jvector.py`: 1. Order-insensitive Comparison: Modified the `test_retrieval_dictionary` to ensure that it passes regardless of the order of returned values by parsing `page_content` into a structured format (dictionary) before comparison. 2. Floating-point Precision: Updated `test_neo4jvector_relevance_score` to handle minor floating-point precision differences by using the `isclose` function for comparing relevance scores with a relative tolerance. Errors addressed: - test_neo4jvector_relevance_score: ``` AssertionError: assert [(Document(page_content='foo', metadata={'page': '0'}), 1.0000014305114746), (Document(page_content='bar', metadata={'page': '1'}), 0.9998371005058289), (Document(page_content='baz', metadata={'page': '2'}), 0.9993508458137512)] == [(Document(page_content='foo', metadata={'page': '0'}), 1.0), (Document(page_content='bar', metadata={'page': '1'}), 0.9998376369476318), (Document(page_content='baz', metadata={'page': '2'}), 0.9993523359298706)] At index 0 diff: (Document(page_content='foo', metadata={'page': '0'}), 1.0000014305114746) != (Document(page_content='foo', metadata={'page': '0'}), 1.0) Full diff: - [(Document(page_content='foo', metadata={'page': '0'}), 1.0), + [(Document(page_content='foo', metadata={'page': '0'}), 1.0000014305114746), ? +++++++++++++++ - (Document(page_content='bar', metadata={'page': '1'}), 0.9998376369476318), ? ^^^ ------ + (Document(page_content='bar', metadata={'page': '1'}), 0.9998371005058289), ? ^^^^^^^^^ - (Document(page_content='baz', metadata={'page': '2'}), 0.9993523359298706), ? ---------- + (Document(page_content='baz', metadata={'page': '2'}), 0.9993508458137512), ? ++++++++++ ] ``` - test_retrieval_dictionary: ``` AssertionError: assert [Document(page_content='skills:\n- Python\n- Data Analysis\n- Machine Learning\nname: John\nage: 30\n')] == [Document(page_content='skills:\n- Python\n- Data Analysis\n- Machine Learning\nage: 30\nname: John\n')] At index 0 diff: Document(page_content='skills:\n- Python\n- Data Analysis\n- Machine Learning\nname: John\nage: 30\n') != Document(page_content='skills:\n- Python\n- Data Analysis\n- Machine Learning\nage: 30\nname: John\n') Full diff: - [Document(page_content='skills:\n- Python\n- Data Analysis\n- Machine Learning\nage: 30\nname: John\n')] ? --------- + [Document(page_content='skills:\n- Python\n- Data Analysis\n- Machine Learning\nage: John\nage: 30\n')] ? +++++++++ ```	2024-07-17 09:28:25 -04:00
bovlb	5caa381177	community[minor]: Add ApertureDB as a vectorstore (#24088 ) Thank you for contributing to LangChain! - [X] ApertureDB as vectorstore: "community: Add ApertureDB as a vectorestore" - Description:* this change provides a new community integration that uses ApertureData's ApertureDB as a vector store. - Issue: none - Dependencies: depends on ApertureDB Python SDK - Twitter handle: ApertureData - [X] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. Integration tests rely on a local run of a public docker image. Example notebook additionally relies on a local Ollama server. - [X] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ All lint tests pass. Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Gautam <gautam@aperturedata.io>	2024-07-16 09:32:59 -07:00
Eugene Yurtsev	2c180d645e	core[minor],community[minor]: Upgrade all @root_validator() to @pre_init (#23841 ) This PR introduces a @pre_init decorator that's a @root_validator(pre=True) but with all the defaults populated!	2024-07-08 16:09:29 -04:00
volodymyr-memsql	a4eb6d0fb1	community: add SingleStoreDB semantic cache (#23218 ) This PR adds a `SingleStoreDBSemanticCache` class that implements a cache based on SingleStoreDB vector store, integration tests, and a notebook example. Additionally, this PR contains minor changes to SingleStoreDB vector store: - change add texts/documents methods to return a list of inserted ids - implement delete(ids) method to delete documents by list of ids - added drop() method to drop a correspondent database table - updated integration tests to use and check functionality implemented above CC: @baskaryan, @hwchase17 --------- Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>	2024-07-05 09:26:06 -04:00
Bagatur	a0c2281540	infra: update mypy 1.10, ruff 0.5 (#23721 ) ```python """python scripts/update_mypy_ruff.py""" import glob import tomllib from pathlib import Path import toml import subprocess import re ROOT_DIR = Path(__file__).parents[1] def main(): for path in glob.glob(str(ROOT_DIR / "libs/*/pyproject.toml"), recursive=True): print(path) with open(path, "rb") as f: pyproject = tomllib.load(f) try: pyproject["tool"]["poetry"]["group"]["typing"]["dependencies"]["mypy"] = ( "^1.10" ) pyproject["tool"]["poetry"]["group"]["lint"]["dependencies"]["ruff"] = ( "^0.5" ) except KeyError: continue with open(path, "w") as f: toml.dump(pyproject, f) cwd = "/".join(path.split("/")[:-1]) completed = subprocess.run( "poetry lock --no-update; poetry install --with typing; poetry run mypy . --no-color", cwd=cwd, shell=True, capture_output=True, text=True, ) logs = completed.stdout.split("\n") to_ignore = {} for l in logs: if re.match("^(.)\:(\d+)\: error:.\[(.)\]", l): path, line_no, error_type = re.match( "^(.)\:(\d+)\: error:.\[(.*)\]", l ).groups() if (path, line_no) in to_ignore: to_ignore[(path, line_no)].append(error_type) else: to_ignore[(path, line_no)] = [error_type] print(len(to_ignore)) for (error_path, line_no), error_types in to_ignore.items(): all_errors = ", ".join(error_types) full_path = f"{cwd}/{error_path}" try: with open(full_path, "r") as f: file_lines = f.readlines() except FileNotFoundError: continue file_lines[int(line_no) - 1] = ( file_lines[int(line_no) - 1][:-1] + f" # type: ignore[{all_errors}]\n" ) with open(full_path, "w") as f: f.write("".join(file_lines)) subprocess.run( "poetry run ruff format .; poetry run ruff --select I --fix .", cwd=cwd, shell=True, capture_output=True, text=True, ) if __name__ == "__main__": main() ```	2024-07-03 10:33:27 -07:00
Raghav Dixit	55705c0f5e	LanceDB integration update (#22869 ) Added : - [x] relevance search (w/wo scores) - [x] maximal marginal search - [x] image ingestion - [x] filtering support - [x] hybrid search w reranking make test, lint_diff and format checked.	2024-06-17 20:54:26 -07:00
Aayush Kataria	71811e0547	community[minor]: Adds a vector store for Azure Cosmos DB for NoSQL (#21676 ) This PR add supports for Azure Cosmos DB for NoSQL vector store. Summary: Description: added vector store integration for Azure Cosmos DB for NoSQL Vector Store, Dependencies: azure-cosmos dependency, Tag maintainer: @hwchase17, @baskaryan @efriis @eyurtsev --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-06-11 10:34:01 -07:00
am-kinetica	ad101adec8	community[patch]: Kinetica Integrations handled error in querying; quotes in table names; updated gpudb API (#22724 ) - [ ] Miscellaneous updates and fixes: - Description: Handled error in querying; quotes in table names; updated gpudb API - Issue: Threw an error with an error message difficult to understand if a query failed or returned no records - Dependencies: Updated GPUDB API version to `7.2.0.9` @baskaryan @hwchase17	2024-06-11 10:01:26 -04:00
Erick Friis	a24a9c6427	multiple: get rid of pyproject extras (#22581 ) They cause `poetry lock` to take a ton of time, and `uv pip install` can resolve the constraints from these toml files in trivial time (addressing problem with #19153) This allows us to properly upgrade lockfile dependencies moving forward, which revealed some issues that were either fixed or type-ignored (see file comments)	2024-06-06 15:45:22 -07:00
Stefano Lottini	328d0c99f2	community[minor]: Add support for metadata indexing policy in Cassandra vector store (#22548 ) This PR adds a constructor `metadata_indexing` parameter to the Cassandra vector store to allow optional fine-tuning of which fields of the metadata are to be indexed. This is a feature supported by the underlying CassIO library. Indexing mode of "all", "none" or deny- and allow-list based choices are available. The rationale is, in some cases it's advisable to programmatically exclude some portions of the metadata from the index if one knows in advance they won't ever be used at search-time. this keeps the index more lightweight and performant and avoids limitations on the length of _indexed_ strings. I added a integration test of the feature. I also added the possibility of running the integration test with Cassandra on an arbitrary IP address (e.g. Dockerized), via `CASSANDRA_CONTACT_POINTS=10.1.1.5,10.1.1.6 poetry run pytest [...]` or similar. While I was at it, I added a line to the `.gitignore` since the mypy _test_ cache was not ignored yet. My X (Twitter) handle: @rsprrs.	2024-06-05 11:23:26 -04:00
Ofer Mendelevitch	ad502e8d50	community[minor]: Vectara Integration Update - Streaming, FCS, Chat, updates to documentation and example notebooks (#21334 ) Thank you for contributing to LangChain! Description: update to the Vectara / Langchain integration to integrate new Vectara capabilities: - Full RAG implemented as a Runnable with as_rag() - Vectara chat supported with as_chat() - Both support streaming response - Updated documentation and example notebook to reflect all the changes - Updated Vectara templates Twitter handle: ofermend Add tests and docs: no new tests or docs, but updated both existing tests and existing docs	2024-06-04 12:57:28 -07:00
Bagatur	50186da0a1	infra: rm unused # noqa violations (#22049 ) Updating #21137	2024-05-22 15:21:08 -07:00
SaschaStoll	709664a079	community[patch]: Performant filter columns option for Hanavector (#21971 ) Description: Backwards compatible extension of the initialisation interface of HanaDB to allow the user to specify specific_metadata_columns that are used for metadata storage of selected keys which yields increased filter performance. Any not-mentioned metadata remains in the general metadata column as part of a JSON string. Furthermore switched to executemany for batch inserts into HanaDB. Issue: N/A Dependencies: no new dependencies added Twitter handle: @sapopensource --------- Co-authored-by: Martin Kolb <martin.kolb@sap.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-05-22 13:21:21 -07:00
Jesse S	fc79b372cb	community[minor]: add aerospike vectorstore integration (#21735 ) Please let me know if you see any possible areas of improvement. I would very much appreciate your constructive criticism if time allows. Description: - Added a aerospike vector store integration that utilizes [Aerospike-Vector-Search](https://aerospike.com/products/vector-database-search-llm/) add-on. - Added both unit tests and integration tests - Added a docker compose file for spinning up a test environment - Added a notebook Dependencies: any dependencies required for this change - aerospike-vector-search Twitter handle: - No twitter, you can use my GitHub handle or LinkedIn if you'd like Thanks! --------- Co-authored-by: Jesse Schumacher <jschumacher@aerospike.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-05-21 01:01:47 +00:00
Eugene Yurtsev	25fbe356b4	community[patch]: upgrade to recent version of mypy (#21616 ) This PR upgrades community to a recent version of mypy. It inserts type: ignore on all existing failures.	2024-05-13 14:55:07 -04:00
Mark Cusack	060987d755	community[minor]: Add indexing via locality sensitive hashing to the Yellowbrick vector store (#20856 ) - Description: Add LSH-based indexing to the Yellowbrick vector store module - Twitter handle: @markcusack --------- Co-authored-by: markcusack <markcusack@markcusacksmac.lan> Co-authored-by: markcusack <markcusack@Mark-Cusack-sMac.local> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-05-06 20:18:02 +00:00
Rohan Aggarwal	8021d2a2ab	community[minor]: Oraclevs integration (#21123 ) Thank you for contributing to LangChain! - Oracle AI Vector Search Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords. One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system. This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems. - Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords. One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system. This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems. This Pull Requests Adds the following functionalities Oracle AI Vector Search : Vector Store Oracle AI Vector Search : Document Loader Oracle AI Vector Search : Document Splitter Oracle AI Vector Search : Summary Oracle AI Vector Search : Oracle Embeddings - We have added unit tests and have our own local unit test suite which verifies all the code is correct. We have made sure to add guides for each of the components and one end to end guide that shows how the entire thing runs. - We have made sure that make format and make lint run clean. Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: skmishraoracle <shailendra.mishra@oracle.com> Co-authored-by: hroyofc <harichandan.roy@oracle.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-05-04 03:15:35 +00:00
Cahid Arda Öz	cc6191cb90	community[minor]: Add support for Upstash Vector (#20824 ) ## Description Adding `UpstashVectorStore` to utilize [Upstash Vector](https://upstash.com/docs/vector/overall/getstarted)! #17012 was opened to add Upstash Vector to langchain but was closed to wait for filtering. Now filtering is added to Upstash vector and we open a new PR. Additionally, [embedding feature](https://upstash.com/docs/vector/features/embeddingmodels) was added and we add this to our vectorstore aswell. ## Dependencies [upstash-vector](https://pypi.org/project/upstash-vector/) should be installed to use `UpstashVectorStore`. Didn't update dependencies because of [this comment in the previous PR](https://github.com/langchain-ai/langchain/pull/17012#pullrequestreview-1876522450). ## Tests Tests are added and they pass. Tests are naturally network bound since Upstash Vector is offered through an API. There was [a discussion in the previous PR about mocking the unittests](https://github.com/langchain-ai/langchain/pull/17012#pullrequestreview-1891820567). We didn't make changes to this end yet. We can update the tests if you can explain how the tests should be mocked. --------- Co-authored-by: ytkimirti <yusuftaha9@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-29 17:25:01 -04:00
Mayank Solanki	8c085fc697	community[patch]: Added a function `from_existing_collection` in `Qdrant` vector database. (#20779 ) Issue: #20514 The current implementation of `construct_instance` expects a `texts: List[str]` that will call the embedding function. This might not be needed when we already have a client with collection and `path, you don't want to add any text. This PR adds a class method that returns a qdrant instance with an existing client. Here everytime `cb6e5e56c2/libs/community/langchain_community/vectorstores/qdrant.py (L1592)` `construct_instance` is called, this line sends some text for embedding generation. --------- Co-authored-by: Anush <anushshetty90@gmail.com>	2024-04-26 15:34:09 -07:00
Jingpan Xiong	1202017c56	community[minor]: Add relyt vector database (#20316 ) Co-authored-by: kaka <kaka@zbyte-inc.cloud> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: jingsi <jingsi@leadincloud.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-04-25 19:49:29 +00:00

1 2 3

109 Commits