langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-08-14 07:07:34 +00:00

Author	SHA1	Message	Date
Mohammad Mohtashim	8d746086ab	Added `bind_tools` support for `ChatMLX` along with small fix in `_stream` (#28743 ) - Description: Added Support for `bind_tool` as requested in the issue. Plus two issue in `_stream` were fixed: - Corrected the Positional Argument Passing for `generate_step` - Accountability if `token` returned by `generate_step` is integer. - Issue: #28692	2024-12-16 09:52:49 -05:00
Jorge Piedrahita Ortiz	558b65ea32	community: SamabaStudio Tool Calling and Structured Output (#28025 ) Description: Add tool calling and structured output support for SambaStudio chat models, docs included --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 06:15:19 +00:00
clairebehue	fb44e74ca4	community: fix AzureSearch Oauth with azure_ad_access_token (#26995 ) Description: AzureSearch vector store: create a wrapper class on `azure.core.credentials.TokenCredential` (which is not-instantiable) to fix Oauth usage with `azure_ad_access_token` argument Issue: [the issue it fixes](https://github.com/langchain-ai/langchain/issues/26216) Dependencies: None - [x] Lint and test --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 05:56:45 +00:00
SirSmokeAlot	29305cd948	community: O365Toolkit - send_event - fixed timezone error (#25876 ) Description: Fixed formatting start and end time Issue: The old formatting resulted everytime in an timezone error Dependencies: / Twitter handle: / --------- Co-authored-by: Yannick Opitz <yannick.opitz@gob.de> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-16 05:32:28 +00:00
Aayush Kataria	d417e4b372	Community: Azure CosmosDB No Sql Vector Store: Full Text and Hybrid Search Support (#28716 ) Thank you for contributing to LangChain! - Added [full text](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/full-text-search) and [hybrid search](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/hybrid-search) support for Azure CosmosDB NoSql Vector Store - Added a new enum called CosmosDBQueryType which supports the following values: - VECTOR = "vector" - FULL_TEXT_SEARCH = "full_text_search" - FULL_TEXT_RANK = "full_text_rank" - HYBRID = "hybrid" - User now needs to provide this query_type to the similarity_search method for the vectorStore to make the correct query api call. - Added a couple of work arounds as for the FULL_TEXT_RANK and HYBRID query functions we don't support parameterized queries right now. I have added TODO's in place, and will remove these work arounds by end of January. - Added necessary test cases and updated the - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Erick Friis <erickfriis@gmail.com>	2024-12-15 13:26:32 -08:00
Mohammad Mohtashim	4c1871d9a8	community: Passing the `model_kwargs` correctly while maintaing backward compatability (#28439 ) - Description: `Model_Kwargs` was not being passed correctly to `sentence_transformers.SentenceTransformer` which has been corrected while maintaing backward compatability - Issue: #28436 --------- Co-authored-by: MoosaTae <sadhis.tae@gmail.com> Co-authored-by: Sadit Wongprayon <101176694+MoosaTae@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-15 20:34:29 +00:00
nhols	a3851cb3bc	community: FAISS vectorstore - consistent Document id field (#28728 ) make sure id field of Documents in `FAISS` docstore have the same id as values in `index_to_docstore_id`, implement `get_by_ids` method	2024-12-15 12:23:49 -08:00
Bagatur	a0534ae62a	community[patch]: Release 0.3.12 (#28725 )	2024-12-14 22:13:20 +00:00
Nawaf Alharbi	decd77c515	community: fix an issue with deepinfra integration (#28715 ) Thank you for contributing to LangChain! - [x] PR title: langchain: add URL parameter to ChatDeepInfra class - [x] PR message: add URL parameter to ChatDeepInfra class - Description: This PR introduces a url parameter to the ChatDeepInfra class in LangChain, allowing users to specify a custom URL. Previously, the URL for the DeepInfra API was hardcoded to "https://stage.api.deepinfra.com/v1/openai/chat/completions", which caused issues when the staging endpoint was not functional. The _url method was updated to return the value from the url parameter, enabling greater flexibility and addressing the problem. out! --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-14 02:15:29 +00:00
Ben Chambers	008efada2c	[community]: Render documents to graphviz (#24830 ) - Description: Adds a helper that renders documents with the GraphVectorStore metadata fields to Graphviz for visualization. This is helpful for understanding and debugging. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-14 02:02:09 +00:00
Erick Friis	288f204758	docs, community: aerospike docs update (#28717 ) Co-authored-by: Jesse Schumacher <jschumacher@aerospike.com> Co-authored-by: Jesse S <jschmidt@aerospike.com> Co-authored-by: dylan <dwelch@aerospike.com>	2024-12-14 00:27:37 +00:00
Vimpas	337fed80a5	community: 🐛 PDF Filter Type Error (#27154 ) Thank you for contributing to LangChain! PR title: "community: fix PDF Filter Type Error" - Description: fix PDF Filter Type Error" - Issue: the issue #27153 it fixes, - Dependencies: no - Twitter handle: if your PR gets announced, and you'd like a mention, we'll gladly shout you out! - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-13 23:30:29 +00:00
Ryan Parker	12111cb922	community: fallback on core async atransform_documents method for `MarkdownifyTransformer` (#27866 ) # Description Implements the `atransform_documents` method for `MarkdownifyTransformer` using the `asyncio` built-in library for concurrency. Note that this is mainly for API completeness when working with async frameworks rather than for performance, since the `markdownify` function is not I/O bound because it works with `Document` objects already in memory. # Issue Fixes #27865 # Dependencies No new dependencies added, but [`markdownify`](https://github.com/matthewwithanm/python-markdownify) is required since this PR updates the `markdownify` integration. # Tests and docs - Tests added - I did not modify the docstrings since they already described the basic functionality, and [the API docs also already included a description](https://python.langchain.com/api_reference/community/document_transformers/langchain_community.document_transformers.markdownify.MarkdownifyTransformer.html#langchain_community.document_transformers.markdownify.MarkdownifyTransformer.atransform_documents). If it would be helpful, I would be happy to update the docstrings and/or the API docs. # Lint and test - [x] format - [x] lint - [x] test I ran formatting with `make format`, linting with `make lint`, and confirmed that tests pass using `make test`. Note that some unit tests pass in CI but may fail when running `make_test`. Those unit tests are: - `test_extract_html` (and `test_extract_html_async`) - `test_strip_tags` (and `test_strip_tags_async`) - `test_convert_tags` (and `test_convert_tags_async`) The reason for the difference is that there are trailing spaces when the tests are run in the CI checks, and no trailing spaces when run with `make test`. I ensured that the tests pass in CI, but they may fail with `make test` due to the addition of trailing spaces. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-13 22:32:22 +00:00
Karthik Bharadhwaj	498f0249e2	community[minor]: Opensearch hybridsearch implementation (#25375 ) community: add hybrid search in opensearch # Langchain OpenSearch Hybrid Search Implementation ## Implementation of Hybrid Search: I have taken LangChain's OpenSearch integration to the next level by adding hybrid search capabilities. Building on the existing OpenSearchVectorSearch class, I have implemented Hybrid Search functionality (which combines the best of both keyword and semantic search). This new functionality allows users to harness the power of OpenSearch's advanced hybrid search features without leaving the familiar LangChain ecosystem. By blending traditional text matching with vector-based similarity, the enhanced class delivers more accurate and contextually relevant results. It's designed to seamlessly fit into existing LangChain workflows, making it easy for developers to upgrade their search capabilities. In implementing the hybrid search for OpenSearch within the LangChain framework, I also incorporated filtering capabilities. It's important to note that according to the OpenSearch hybrid search documentation, only post-filtering is supported for hybrid queries. This means that the filtering is applied after the hybrid search results are obtained, rather than during the initial search process. Note: For the implementation of hybrid search, I strictly followed the official OpenSearch Hybrid search documentation and I took inspiration from https://github.com/AndreasThinks/langchain/tree/feature/opensearch_hybrid_search Thanks Mate! ### Experiments I conducted few experiments to verify that the hybrid search implementation is accurate and capable of reproducing the results of both plain keyword search and vector search. Experiment - 1 Hybrid Search Keyword_weight: 1, vector_weight: 0 I conducted an experiment to verify the accuracy of my hybrid search implementation by comparing it to a plain keyword search. For this test, I set the keyword_weight to 1 and the vector_weight to 0 in the hybrid search, effectively giving full weightage to the keyword component. The results from this hybrid search configuration matched those of a plain keyword search, confirming that my implementation can accurately reproduce keyword-only search results when needed. It's important to note that while the results were the same, the scores differed between the two methods. This difference is expected because the plain keyword search in OpenSearch uses the BM25 algorithm for scoring, whereas the hybrid search still performs both keyword and vector searches before normalizing the scores, even when the vector component is given zero weight. This experiment validates that my hybrid search solution correctly handles the keyword search component and properly applies the weighting system, demonstrating its accuracy and flexibility in emulating different search scenarios. Experiment - 2 Hybrid Search keyword_weight = 0.0, vector_weight = 1.0 For experiment-2, I took the inverse approach to further validate my hybrid search implementation. I set the keyword_weight to 0 and the vector_weight to 1, effectively giving full weightage to the vector search component (KNN search). I then compared these results with a pure vector search. The outcome was consistent with my expectations: the results from the hybrid search with these settings exactly matched those from a standalone vector search. This confirms that my implementation accurately reproduces vector search results when configured to do so. As with the first experiment, I observed that while the results were identical, the scores differed between the two methods. This difference in scoring is expected and can be attributed to the normalization process in hybrid search, which still considers both components even when one is given zero weight. This experiment further validates the accuracy and flexibility of my hybrid search solution, demonstrating its ability to effectively emulate pure vector search when needed while maintaining the underlying hybrid search structure. Experiment - 3 Hybrid Search - balanced keyword_weight = 0.5, vector_weight = 0.5 For experiment-3, I adopted a balanced approach to further evaluate the effectiveness of my hybrid search implementation. In this test, I set both the keyword_weight and vector_weight to 0.5, giving equal importance to keyword-based and vector-based search components. This configuration aims to leverage the strengths of both search methods simultaneously. By setting both weights to 0.5, I intended to create a scenario where the hybrid search would consider lexical matches and semantic similarity equally. This balanced approach is often ideal for many real-world applications, as it can capture both exact keyword matches and contextually relevant results that might not contain the exact search terms. Kindly verify the notebook for the experiments conducted! Notebook: https://github.com/karthikbharadhwajKB/Langchain_OpenSearch_Hybrid_search/blob/main/Opensearch_Hybridsearch.ipynb ### Instructions to follow for Performing Hybrid Search: Step-1: Instantiating OpenSearchVectorSearch Class: ```python opensearch_vectorstore = OpenSearchVectorSearch( index_name=os.getenv("INDEX_NAME"), embedding_function=embedding_model, opensearch_url=os.getenv("OPENSEARCH_URL"), http_auth=(os.getenv("OPENSEARCH_USERNAME"),os.getenv("OPENSEARCH_PASSWORD")), use_ssl=False, verify_certs=False, ssl_assert_hostname=False, ssl_show_warn=False ) ``` Parameters: 1. index_name: The name of the OpenSearch index to use. 2. embedding_function: The function or model used to generate embeddings for the documents. It's assumed that embedding_model is defined elsewhere in the code. 3. opensearch_url: The URL of the OpenSearch instance. 4. http_auth: A tuple containing the username and password for authentication. 5. use_ssl: Set to False, indicating that the connection to OpenSearch is not using SSL/TLS encryption. 6. verify_certs: Set to False, which means the SSL certificates are not being verified. This is often used in development environments but is not recommended for production. 7. ssl_assert_hostname: Set to False, disabling hostname verification in SSL certificates. 8. ssl_show_warn: Set to False, suppressing SSL-related warnings. Step-2: Configure Search Pipeline: To initiate hybrid search functionality, you need to configures a search pipeline first. Implementation Details: This method configures a search pipeline in OpenSearch that: 1. Normalizes the scores from both keyword and vector searches using the min-max technique. 2. Applies the specified weights to the normalized scores. 3. Calculates the final score using an arithmetic mean of the weighted, normalized scores. Parameters: * pipeline_name (str): A unique identifier for the search pipeline. It's recommended to use a descriptive name that indicates the weights used for keyword and vector searches. * keyword_weight (float): The weight assigned to the keyword search component. This should be a float value between 0 and 1. In this example, 0.3 gives 30% importance to traditional text matching. * vector_weight (float): The weight assigned to the vector search component. This should be a float value between 0 and 1. In this example, 0.7 gives 70% importance to semantic similarity. ```python opensearch_vectorstore.configure_search_pipelines( pipeline_name="search_pipeline_keyword_0.3_vector_0.7", keyword_weight=0.3, vector_weight=0.7, ) ``` Step-3: Performing Hybrid Search: After creating the search pipeline, you can perform a hybrid search using the `similarity_search()` method (or) any methods that are supported by `langchain`. This method combines both `keyword-based and semantic similarity` searches on your OpenSearch index, leveraging the strengths of both traditional information retrieval and vector embedding techniques. parameters: * query: The search query string. * k: The number of top results to return (in this case, 3). * search_type: Set to `hybrid_search` to use both keyword and vector search capabilities. * search_pipeline: The name of the previously created search pipeline. ```python query = "what are the country named in our database?" top_k = 3 pipeline_name = "search_pipeline_keyword_0.3_vector_0.7" matched_docs = opensearch_vectorstore.similarity_search_with_score( query=query, k=top_k, search_type="hybrid_search", search_pipeline = pipeline_name ) matched_docs ``` twitter handle: @iamkarthik98 --------- Co-authored-by: Karthik Kolluri <karthik.kolluri@eidosmedia.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-12-13 16:34:12 -05:00
Philippe PRADOS	f3fb5a9c68	community[minor]: Fix json._validate_metadata_func() (#22842 ) JSONparse, in _validate_metadata_func(), checks the consistency of the _metadata_func() function. To do this, it invokes it and makes sure it receives a dictionary in response. However, during the call, it does not respect future calls, as shown on line 100. This generates errors if, for example, the function is like this: ```python def generate_metadata(json_node:Dict[str,Any],kwargs:Dict[str,Any]) -> Dict[str,Any]: return { "source": url, "row": kwargs['seq_num'], "question":json_node.get("question"), } loader = JSONLoader( file_path=file_path, content_key="answer", jq_schema='.[]', metadata_func=generate_metadata, text_content=False) ``` To avoid this, the verification must comply with the specifications. This patch does just that. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-12-13 21:24:20 +00:00
ScriptShi	b0a298894d	community[minor]: Add TablestoreVectorStore (#25767 ) Thank you for contributing to LangChain! - [x] PR title: community: add TablestoreVectorStore - [x] PR message: - Description: add TablestoreVectorStore - Dependencies: none - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration: yes 2. an example notebook showing its use: yes If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-12-13 11:17:28 -08:00
Erick Friis	86b3c6e81c	community: make old stub for QuerySQLDataBaseTool private to skip api ref (#28711 )	2024-12-13 10:43:23 -08:00
Martin Triska	05ebe1e66b	Community: add `modified_since` argument to `O365BaseLoader` (#28708 ) ## What are we doing in this PR We're adding `modified_since` optional argument to `O365BaseLoader`. When set, O365 loader will only load documents newer than `modified_since` datetime. ## Why? OneDrives / Sharepoints can contain large number of documents. Current approach is to download and parse all files and let indexer to deal with duplicates. This can be prohibitively time-consuming. Especially when using OCR-based parser like [zerox](`fa06188834/libs/community/langchain_community/document_loaders/pdf.py (L948)`). This argument allows to skip documents that are older than known time of indexing. _Q: What if a file was modfied during last indexing process? A: Users can set the `modified_since` conservatively and indexer will still take care of duplicates._ If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-13 17:30:17 +00:00
Bagatur	fa06188834	community[patch]: fix QuerySQLDatabaseTool name (#28659 ) Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-12-12 19:16:03 -08:00
Michael Chin	28cb2cefc6	docs: Fix stack diagram in community README (#28685 ) - Description: The stack diagram illustration in the community README fails to render due to an invalid branch reference. This PR replaces the broken image link with a valid one referencing master branch.	2024-12-12 13:33:50 -08:00
Botong Zhu	13c3c4a210	community: fixes json loader not getting texts with json standard (#27327 ) This PR fixes JSONLoader._get_text not converting objects to json string correctly. If an object is serializable and is not a dict, JSONLoader will use python built-in str() method to convert it to string. This may cause object converted to strings not following json standard. For example, a list will be converted to string with single quotes, and if json.loads try to load this string, it will cause error. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 19:33:45 +00:00
Lorenzo	4149c0dd8d	community: add method to create branch and list files for gitlab tool (#27883 ) ### About - Description: In the Gitlab utilities used for the Gitlab tool there are no methods to create branches, list branches and files, as this is already done for Github - Issue: None - Dependencies: None This Pull request add the methods: - create_branch - list_branches_in_repo - set_active_branch - list_files_in_main_branch - list_files_in_bot_branch - list_files_from_directory --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 19:11:35 +00:00
Prathamesh Nimkar	ca054ed1b1	community: ChatSnowflakeCortex - Add streaming functionality (#27753 ) Description: snowflake.py Add _stream and _stream_content methods to enable streaming functionality fix pydantic issues and added functionality with the overall langchain version upgrade added bind_tools method for agentic workflows support through langgraph updated the _generate method to account for agentic workflows support through langgraph cosmetic changes to comments and if conditions snowflake.ipynb Added _stream example cosmetic changes to comments fixed lint errors check_pydantic.sh Decreased counter from 126 to 125 as suggested when formatting --------- Co-authored-by: Prathamesh Nimkar <prathamesh.nimkar@snowflake.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-11 18:35:40 -08:00
Lakindu Boteju	5a31792bf1	community: Add support for cross-region inference profile IDs in Bedrock Anthropic Claude token cost calculation (#28167 ) This change modifies the token cost calculation logic to support cross-region inference profile IDs for Anthropic Claude models. Instead of explicitly listing all regional variants of new inference profile IDs in the cost dictionaries, the code now extracts a base model ID from the input model ID (or inference profile ID), making it more maintainable and automatically supporting new regional variants. These inference profile IDs follow the format: `<region>.<vendor>.<model-name>` (e.g., `us.anthropic.claude-3-haiku-xxx`, `eu.anthropic.claude-3-sonnet-xxx`). Cross-region inference profiles are system-defined identifiers that enable distributing model inference requests across multiple AWS regions. They help manage unplanned traffic bursts and enhance resilience during peak demands without additional routing costs. References for Amazon Bedrock's cross-region inference profiles:- - https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html - https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 02:33:50 +00:00
fatmelon	d1e0ec7b55	community: VectorStores: Azure Cosmos DB Mongo vCore with DiskANN (#27329 ) # Description Add a new vector index type `diskann` to Azure Cosmos DB Mongo vCore vector store. Paper of DiskANN can be found here [DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node](https://proceedings.neurips.cc/paper_files/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf). ## Sample Usage ```python from pymongo import MongoClient # INDEX_NAME = "izzy-test-index-2" # NAMESPACE = "izzy_test_db.izzy_test_collection" # DB_NAME, COLLECTION_NAME = NAMESPACE.split(".") client: MongoClient = MongoClient(CONNECTION_STRING) collection = client[DB_NAME][COLLECTION_NAME] model_deployment = os.getenv( "OPENAI_EMBEDDINGS_DEPLOYMENT", "smart-agent-embedding-ada" ) model_name = os.getenv("OPENAI_EMBEDDINGS_MODEL_NAME", "text-embedding-ada-002") vectorstore = AzureCosmosDBVectorSearch.from_documents( docs, openai_embeddings, collection=collection, index_name=INDEX_NAME, ) # Read more about these variables in detail here. https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/vector-search maxDegree = 40 dimensions = 1536 similarity_algorithm = CosmosDBSimilarityType.COS kind = CosmosDBVectorSearchType.VECTOR_DISKANN lBuild = 20 vectorstore.create_index( dimensions=dimensions, similarity=similarity_algorithm, kind=kind , max_degree=maxDegree, l_build=lBuild, ) ``` ## Dependencies No additional dependencies were added --------- Co-authored-by: Yang Qiao (from Dev Box) <yangqiao@microsoft.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 01:54:04 +00:00
manukychen	ba9b95cd23	Community: Adding bulk_size as a setable param for OpenSearchVectorSearch (#28325 ) Description: When using langchain.retrievers.parent_document_retriever.py with vectorstore is OpenSearchVectorSearch, I found that the bulk_size param I passed into OpenSearchVectorSearch class did not work on my ParentDocumentRetriever.add_documents() function correctly, it will be overwrite with int 500 the function which OpenSearchVectorSearch class had (e.g., add_texts(), add_embeddings()...). So I made this PR requset to fix this, thanks! --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-12 01:45:22 +00:00
Thomas van Dongen	ee640d6bd3	community: fixed bug in model2vec embedding code (#28670 ) This PR fixes a bug with the current implementation for Model2Vec embeddings where `embed_documents` does not work as expected. - Description: the current implementation uses `encode_as_sequence` for encoding documents. This is incorrect, as `encode_as_sequence` creates token embeddings and not mean embeddings. The normal `encode` function handles both single and batched inputs and should be used instead. The return type was also incorrect, as encode returns a NumPy array. This PR converts the embedding to a list so that the output is consistent with the Embeddings ABC.	2024-12-11 15:50:56 -08:00
Brian Sharon	b20230c800	community: use correct `id_key` when deleting by id in LanceDB wrapper (#28655 ) - Description: The current version of the `delete` method assumes that the id field will always be called `id`. - Issue: n/a - Dependencies: n/a - Twitter handle: ugh, Twitter :D --- Thank you for contributing to LangChain! - [x] PR title: "package: description" - Where "package" is whichever of langchain, community, core, etc. is being modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [x] PR message: *Delete this entire checklist* and replace with - Description: a description of the change - Issue: the issue # it fixes, if applicable - Dependencies: any dependencies required for this change - Twitter handle: if your PR gets announced, and you'd like a mention, we'll gladly shout you out! - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-11 23:49:35 +00:00
Mohammad Mohtashim	fa155a422f	[Community]: `requests_kwargs` not being used in _fetch (#28646 ) - Description: `requests_kwargs` is not being passed to `_fetch` which is fetching pages asynchronously. In this PR, making sure that we are passing `requests_kwargs` to `_fetch` just like `_scrape`. - Issue: #28634 --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-11 23:46:54 +00:00
Vincent Zhang	df5008fe55	community[minor]: FAISS Filter Function Enhancement with Advanced Query Operators (#28207 ) ## Description We are submitting as a team of four for a project. Other team members are @RuofanChen03, @LikeWang10067, @TANYAL77. This pull requests expands the filtering capabilities of the FAISS vectorstore by adding MongoDB-style query operators indicated as follows, while including comprehensive testing for the added functionality. - $eq (equals) - $neq (not equals) - $gt (greater than) - $lt (less than) - $gte (greater than or equal) - $lte (less than or equal) - $in (membership in list) - $nin (not in list) - $and (all conditions must match) - $or (any condition must match) - $not (negation of condition) ## Issue This closes https://github.com/langchain-ai/langchain/issues/26379. ## Sample Usage ```python import faiss import asyncio from langchain_community.vectorstores import FAISS from langchain.schema import Document from langchain_huggingface import HuggingFaceEmbeddings embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2") documents = [ Document(page_content="Process customer refund request", metadata={"schema_type": "financial", "handler_type": "refund",}), Document(page_content="Update customer shipping address", metadata={"schema_type": "customer", "handler_type": "update",}), Document(page_content="Process payment transaction", metadata={"schema_type": "financial", "handler_type": "payment",}), Document(page_content="Handle customer complaint", metadata={"schema_type": "customer","handler_type": "complaint",}), Document(page_content="Process invoice payment", metadata={"schema_type": "financial","handler_type": "payment",}) ] async def search(vectorstore, query, schema_type, handler_type, k=2): schema_filter = {"schema_type": {"$eq": schema_type}} handler_filter = {"handler_type": {"$eq": handler_type}} combined_filter = { "$and": [ schema_filter, handler_filter, ] } base_retriever = vectorstore.as_retriever( search_kwargs={"k":k, "filter":combined_filter} ) return await base_retriever.ainvoke(query) async def main(): vectorstore = FAISS.from_texts( texts=[doc.page_content for doc in documents], embedding=embeddings, metadatas=[doc.metadata for doc in documents] ) def printt(title, documents): print(title) if not documents: print("\tNo documents found.") return for doc in documents: print(f"\t{doc.page_content}. {doc.metadata}") printt("Documents:", documents) printt('\nquery="process payment", schema_type="financial", handler_type="payment":', await search(vectorstore, query="process payment", schema_type="financial", handler_type="payment", k=2)) printt('\nquery="customer update", schema_type="customer", handler_type="update":', await search(vectorstore, query="customer update", schema_type="customer", handler_type="update", k=2)) printt('\nquery="refund process", schema_type="financial", handler_type="refund":', await search(vectorstore, query="refund process", schema_type="financial", handler_type="refund", k=2)) printt('\nquery="refund process", schema_type="financial", handler_type="foobar":', await search(vectorstore, query="refund process", schema_type="financial", handler_type="foobar", k=2)) print() if __name__ == "__main__":asyncio.run(main()) ``` ## Output ``` Documents: Process customer refund request. {'schema_type': 'financial', 'handler_type': 'refund'} Update customer shipping address. {'schema_type': 'customer', 'handler_type': 'update'} Process payment transaction. {'schema_type': 'financial', 'handler_type': 'payment'} Handle customer complaint. {'schema_type': 'customer', 'handler_type': 'complaint'} Process invoice payment. {'schema_type': 'financial', 'handler_type': 'payment'} query="process payment", schema_type="financial", handler_type="payment": Process payment transaction. {'schema_type': 'financial', 'handler_type': 'payment'} Process invoice payment. {'schema_type': 'financial', 'handler_type': 'payment'} query="customer update", schema_type="customer", handler_type="update": Update customer shipping address. {'schema_type': 'customer', 'handler_type': 'update'} query="refund process", schema_type="financial", handler_type="refund": Process customer refund request. {'schema_type': 'financial', 'handler_type': 'refund'} query="refund process", schema_type="financial", handler_type="foobar": No documents found. ``` --------- Co-authored-by: ruofan chen <ruofan.is.awesome@gmail.com> Co-authored-by: RickyCowboy <like.wang@mail.utoronto.ca> Co-authored-by: Shanni Li <tanya.li@mail.utoronto.ca> Co-authored-by: RuofanChen03 <114096642+ruofanchen03@users.noreply.github.com> Co-authored-by: Like Wang <102838708+likewang10067@users.noreply.github.com>	2024-12-11 17:52:22 -05:00
like	3048a9a26d	community: tongyi multimodal response format fix to support langchain (#28645 ) Description: The multimodal(tongyi) response format "message": {"role": "assistant", "content": [{"text": "图像"}]}}]} is not compatible with LangChain. Dependencies: No --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-10 21:13:26 +00:00
Bagatur	d0e662e43b	community[patch]: Release 0.3.11 (#28658 )	2024-12-10 20:51:13 +00:00
Bagatur	e6a62d8422	core,langchain,community[patch]: allow langsmith 0.2 (#28598 )	2024-12-10 18:50:58 +00:00
Johannes Mohren	c1d348e95d	doc-loader: retain Azure Doc Intelligence API metadata in Document parser (#28382 ) Description: This PR modifies the doc_intelligence.py parser in the community package to include all metadata returned by the Azure Doc Intelligence API in the Document object. Previously, only the parsed content (markdown) was retained, while other important metadata such as bounding boxes (bboxes) for images and tables was discarded. These image bboxes are crucial for supporting use cases like multi-modal RAG workflows when using Azure Doc Intelligence. The change ensures that all information returned by the Azure Doc Intelligence API is preserved by setting the metadata attribute of the Document object to the entire result returned by the API, rather than an empty dictionary. This extends the parser's utility for complex use cases without breaking existing functionality. Issue: This change does not address a specific issue number, but it resolves a critical limitation in supporting multimodal workflows when using the LangChain wrapper for the Azure API. Dependencies: No additional dependencies are required for this change. --------- Co-authored-by: jmohren <johannes.mohren@aol.de>	2024-12-10 11:22:58 -05:00
Alex Tonkonozhenko	0d20c314dd	Confluence Loader: Fix CQL loading (#27620 ) fix #12082 <!--- If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. -->	2024-12-10 11:05:23 -05:00
Katarina Supe	aba2711e7f	community: update Memgraph integration (#27017 ) Description: - Memgraph no longer relies on `Neo4jGraphStore` but implements `GraphStore`, just like other graph databases. - Memgraph no longer relies on `GraphQAChain`, but implements `MemgraphQAChain`, just like other graph databases. - The refresh schema procedure has been updated to try using `SHOW SCHEMA INFO`. The fallback uses Cypher queries (a combination of schema and Cypher) → LangChain integration no longer relies on MAGE library. - The schema structure has been reformatted. Regardless of the procedures used to get schema, schema structure is the same. - The `add_graph_documents()` method has been implemented. It transforms `GraphDocument` into Cypher queries and creates a graph in Memgraph. It implements the ability to use `baseEntityLabel` to improve speed (`baseEntityLabel` has an index on the `id` property). It also implements the ability to include sources by creating a `MENTIONS` relationship to the source document. - Jupyter Notebook for Memgraph has been updated. - Issue: / - Dependencies: / - Twitter handle: supe_katarina (DX Engineer @ Memgraph) Closes #25606	2024-12-10 10:57:21 -05:00
TamagoTorisugi	0f0df2df60	fix: Set default search_type to 'similarity' in as_retriever method of AzureSearch (#28376 ) Description This PR updates the `as_retriever` method in the `AzureSearch` to ensure that the `search_type` parameter defaults to 'similarity' when not explicitly provided. Previously, if the `search_type` was omitted, it did not default to any specific value. So it was inherited from `AzureSearchVectorStoreRetriever`, which defaults to 'hybrid'. This change ensures that the intended default behavior aligns with the expected usage. Issue No specific issue was found related to this change. Dependencies No new dependencies are introduced with this change. --------- Co-authored-by: prrao87 <prrao87@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-10 03:40:04 +00:00
Prashanth Rao	8c6eec5f25	community: KuzuGraph needs allow_dangerous_requests, add graph documents via LLMGraphTransformer (#27949 ) - [x] PR title: "community: Kuzu - Add graph documents via LLMGraphTransformer" - This PR adds a new method `add_graph_documents` to use the `GraphDocument`s extracted by `LLMGraphTransformer` and store in a Kùzu graph backend. - This allows users to transform unstructured text into a graph that uses Kùzu as the graph store. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ --------- Co-authored-by: pookam90 <pookam@microsoft.com> Co-authored-by: Pooja Kamath <60406274+Pookam90@users.noreply.github.com> Co-authored-by: hsm207 <hsm207@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-10 03:15:28 +00:00
Arnav Priyadarshi	b78b2f7a28	community[fix]: Update Perplexity to pass parameters into API calls (#28421 ) - [x] PR title: "package: description" - Where "package" is whichever of langchain, community, core, etc. is being modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - Description: I realized the invocation parameters were not being passed into `_generate` so I added those in but then realized that the parameters contained some old fields designed for an older openai client which I removed. Parameters work fine now. - Issue: Fixes #28229 - Dependencies: No new dependencies. - Twitter handle: @arch_plane - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-10 00:23:31 +00:00
Amir Sadeghi	2c49f587aa	community[fix]: could not locate runnable browser (#28289 ) set open_browser to false to resolve "could not locate runnable browser" error while default browser is None Thank you for contributing to LangChain! - [ ] PR title: "package: description" - Where "package" is whichever of langchain, community, core, etc. is being modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [ ] PR message: *Delete this entire checklist* and replace with - Description: a description of the change - Issue: the issue # it fixes, if applicable - Dependencies: any dependencies required for this change - Twitter handle: if your PR gets announced, and you'd like a mention, we'll gladly shout you out! - [ ] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [ ] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 21:05:52 +00:00
Martin Triska	75bc6bb191	community: [bugfix] fix source path for office files in O365 (#28260 ) # What problem are we fixing? Currently documents loaded using `O365BaseLoader` fetch source from `file.web_url` (where `file` is `<class 'O365.drive.File'>`). This works well for `.pdf` documents. Unfortunately office documents (`.xlsx`, `.docx` ...) pass their `web_url` in following format: `https://sharepoint_address/sites/path/to/library/root/Doc.aspx?sourcedoc=%XXXXXXXX-1111-1111-XXXX-XXXXXXXXXX%7D&file=filename.xlsx&action=default&mobileredirect=true` This obfuscates the path to the file. This PR utilizes the parrent folder's path and file name to reconstruct the actual location of the file. Knowing the file's location can be crucial for some RAG applications (path to the file can carry information we don't want to loose). @vbarda Could you please look at this one? I'm @-mentioning you since we've already closed some PRs together :-) Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 12:34:59 -08:00
Naka Masato	ce3b69aa05	community: add include_labels option to ConfluenceLoader (#28259 ) ## Description: Enable `ConfluenceLoader` to include labels with `include_labels` option (`false` by default for backward compatibility). and the labels are set to `metadata` in the `Document`. e.g. `{"labels": ["l1", "l2"]}` ## Notes Confluence API supports to get labels by providing `metadata.labels` to `expand` query parameter All of the following functions support `expand` in the same way: - confluence.get_page_by_id - confluence.get_all_pages_by_label - confluence.get_all_pages_from_space - cql (internally using [/api/content/search](https://developer.atlassian.com/cloud/confluence/rest/v1/api-group-content/#api-wiki-rest-api-content-search-get)) ## Issue: No issue related to this PR. ## Dependencies: No changes. ## Twitter handle: [@gymnstcs](https://x.com/gymnstcs) - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 19:35:01 +00:00
Rajendra Kadam	242fee11be	community[minor] Pebblo: Support for new Pinecone class PineconeVectorStore (#28253 ) - Description: Support for new Pinecone class PineconeVectorStore in PebbloRetrievalQA. - Issue: NA - Dependencies: NA - Tests: - Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 19:33:54 +00:00
maang-h	b64d846347	docs: Standardize MoonshotChat docstring (#28159 ) - Description: Add docstring Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-09 18:46:25 +00:00
WGNW_MG	eabe587787	community[patch]:Fix for get_openai_callback() return token_cost=0.0 when model is gpt-4o-11-20 (#28408 ) - Description: update MODEL_COST_PER_1K_TOKENS for new gpt-4o-11-20. - Issue: with latest gpt-4o-11-20, openai callback return token_cost=0.0 - Dependencies: None (just simple dict fix.) - Twitter handle: I Don't Use Twitter. - (However..., I have a YouTube channel. Could you upload this there, by any chance? https://www.youtube.com/@%EA%B2%9C%EC%B0%BD%EB%B6%80%EA%B3%A0%EB%AC%B8AI%EC%9E%90%EB%AC%B8%EC%84%BC%EC%84%B8)	2024-12-08 20:46:50 -08:00
Abhinav	317a38b83e	community[minor]: Add support for modle2vec embeddings (#28507 ) This PR add an embeddings integration for model2vec, the `Model2vecEmbeddings` class. - Description: [Model2Vec](https://github.com/MinishLab/model2vec) lets you turn any sentence transformer into a really small static model and makes running the model faster. - Issue: - Dependencies: model2vec ([pypi](https://pypi.org/project/model2vec/)) - Twitter handle:: - [x] Add tests and docs: - [Test](https://github.com/blacksmithop/langchain/blob/model2vec_embeddings/libs/community/langchain_community/embeddings/model2vec.py), [docs](https://github.com/blacksmithop/langchain/blob/model2vec_embeddings/docs/docs/integrations/text_embedding/model2vec.ipynb) - [x] Lint and test: --------- Co-authored-by: Abhinav KM <abhinav.m@zerone-consulting.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-12-09 02:17:22 +00:00
Mohammad Mohtashim	524ee6d9ac	Invalid `tool_choice` being passed to `ChatLiteLLM` (#28198 ) - Description: Invalid `tool_choice` is given to `ChatLiteLLM` to `bind_tools` due to it's parent's class default value being pass through `with_structured_output`. - Issue: #28176	2024-12-07 14:33:40 -05:00
Erick Friis	07c2ac765a	community: release 0.3.10 (#28600 )	2024-12-07 00:07:13 +00:00
ccurme	2c6bc74cb1	multiple: combine sync/async vector store standard test suites (#28580 ) Breaking change in `langchain-tests`.	2024-12-06 14:55:06 -05:00
cinqisap	482e8a7855	community: Add support for SAP HANA Vector hnsw index creation (#27884 ) Issue: Added support for creating indexes in the SAP HANA Vector engine. Changes: 1. Introduced a new function `create_hnsw_index` in `hanavector.py` that enables the creation of indexes for SAP HANA Vector. 2. Added integration tests for the index creation function to ensure functionality. 3. Updated the documentation to reflect the new index creation feature, including examples and output from the notebook. 4. Fix the operator issue in ` _process_filter_object` function and change the array argument to a placeholder in the similarity search SQL statement. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-05 23:29:08 +00:00
Erick Friis	e6a08355a3	docs: more api ref links, add linting step to prevent more (#28495 )	2024-12-04 04:19:42 +00:00
wlleiiwang	6151ea78d5	community: implement _select_relevance_score_fn for tencent vectordb (#28036 ) implement _select_relevance_score_fn for tencent vectordb fix use external embedding for tencent vectordb Co-authored-by: wlleiiwang <wlleiiwang@tencent.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-04 03:03:00 +00:00
Asi Greenholts	d34bf78f3b	community: BM25Retriever preservation of document id (#27019 ) Currently this retriever discards document ids --------- Co-authored-by: asi-cider <88270351+asi-cider@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-04 00:36:00 +00:00
peterdhp	bc5ec63d67	community : allow using apikey for PubMedAPIWrapper (#27246 ) Description: > Without an API key, any site (IP address) posting more than 3 requests per second to the E-utilities will receive an error message. By including an API key, a site can post up to 10 requests per second by default. quoted from A General Introduction to the E-utilities,NCBI : https://www.ncbi.nlm.nih.gov/books/NBK25497/ I have simply added a api_key parameter to the PubMedAPIWrapper that can be used to increase the number of requests per second from 3 to 10. Twitter handle : @KORmaori --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-03 16:21:22 -08:00
William Smith	15e7353168	langchain_community: updated query constructor for Databricks Vector Search due to LangChainDeprecationWarning: `filters` was deprecated since langchain-community 0.2.11 and will be removed in 0.3. Please use `filter` instead. (#27974 ) - Description: Updated the kwargs for the structured query from filters to filter due to deprecation of 'filters' for Databricks Vector Search. Also changed the error messages as the allowed operators and comparators are different which can cause issues with functions such as get_query_constructor_prompt() - Issue: Fixes the Key Error for filters due to deprecation in favor for 'filter': LangChainDeprecationWarning: DatabricksVectorSearch received a key `filters` in search_kwargs. `filters` was deprecated since langchain-community 0.2.11 and will be removed in 0.3. Please use `filter` instead. - Dependencies: N/A - Twitter handle: N/A --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-03 16:03:53 -08:00
Jan Heimes	ef365543cb	community: add Needle retriever and document loader integration (#28157 ) - [x] PR title: "community: add Needle retriever and document loader integration" - Where "package" is whichever of langchain, community, core, etc. is being modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [x] PR message: *Delete this entire checklist* and replace with - Description: This PR adds a new integration for Needle, which includes: - NeedleRetriever: A retriever for fetching documents from Needle collections. - NeedleLoader: A document loader for managing and loading documents into Needle collections. - Example notebooks demonstrating usage have been added in: - `docs/docs/integrations/retrievers/needle.ipynb` - `docs/docs/integrations/document_loaders/needle.ipynb`. - Dependencies: The `needle-python` package is required as an external dependency for accessing Needle's API. It has been added to the extended testing dependencies list. - Twitter handle: Feel free to mention me if this PR gets announced: [needlexai](https://x.com/NeedlexAI). - [x] Add tests and docs: If you're adding a new integration, please include 1. Unit tests have been added for both `NeedleRetriever` and `NeedleLoader` in `libs/community/tests/unit_tests`. These tests mock API calls to avoid relying on network access. 2. Example notebooks have been added to `docs/docs/integrations/`, showcasing both retriever and loader functionality. - [x] Lint and test: Run `make format`, `make lint`, and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ - `make format`: Passed - `make lint`: Passed - `make test`: Passed (requires `needle-python` to be installed locally; this package is not added to LangChain dependencies). Additional guidelines: - [x] Optional dependencies are imported only within functions. - [x] No dependencies have been added to pyproject.toml files except for those required for unit tests. - [x] The PR does not touch more than one package. - [x] Changes are fully backwards compatible. - [x] Community additions are not re-imported into LangChain core. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-03 22:06:25 +00:00
Audrey Sage Lorberfeld	6b7e93d4c7	pinecone: update pinecone client (#28320 ) This PR updates the Pinecone client to `5.4.0`, as well as its dependencies (`pinecone-plugin-inference` and `pinecone-plugin-interface`). Note: `pinecone-client` is now simply called `pinecone`. Question for reviewer(s): should this PR also update the `pinecone` dep in [the root dir's `poetry.lock` file](https://github.com/langchain-ai/langchain/blob/master/poetry.lock#L6729)? Was unsure. (I don't believe so b/c it seems pinned to a lower version likely based on 3rd-party deps (e.g. Unstructured).) -- TW: @audrey_sage_ --- - To see the specific tasks where the Asana app for GitHub is being used, see below: - https://app.asana.com/0/0/1208693659122374	2024-12-02 22:47:09 -08:00
lucasiscovici	60021e54b5	community: Add the additonnal kward 'context' for openai (#28351 ) - Description: Add the additonnal kward 'context' for openai into `convert_dict_to_message` and `convert_message_to_dict` functions.	2024-12-02 16:43:30 -05:00
Bagatur	49914e959a	community[patch]: Release 0.3.9 (#28451 )	2024-12-02 16:23:37 +00:00
Greg Hinch	5141f25a20	community[patch]: support numpy2 (#28184 ) Follows on from #27991, updates the langchain-community package to support numpy 2 versions --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-11-27 11:10:58 -05:00
LuisMSotamba	0901f11b0f	community: add truncation params when an openai assistant's run is created (#28158 ) Description: When an OpenAI assistant is invoked, it creates a run by default, allowing users to set only a few request fields. The truncation strategy is set to auto, which includes previous messages in the thread along with the current question until the context length is reached. This causes token usage to grow incrementally: consumed_tokens = previous_consumed_tokens + current_consumed_tokens. This PR adds support for user-defined truncation strategies, giving better control over token consumption. Issue: High token consumption.	2024-11-27 10:53:53 -05:00
ccurme	bb83abd037	community[patch]: remove sqlalchemy cap (#28389 )	2024-11-27 10:20:36 -05:00
Mohammad Mohtashim	06fafc6651	Community: Marqo Index Setting GET Request Updated according to `2.x` API version while keep backward compatability for 1.5.x (#28342 ) - Description: `add_texts` was using `get_setting` for marqo client which was being used according to 1.5.x API version. However, this PR updates the `add_text` accounting for updated response payload for 2.x and later while maintaining backward compatibility. Plus I have verified this was the only place where marqo client was not accounting for updated API version. - Issue: #28323 --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-11-26 18:26:56 +00:00
Richard Hao	c161f7d46f	docs(create_sql_agent): fix reStructured Text Markup (#28356 ) - Description: Lines of code must be indented beneath `.. code-block::` for proper formatting. https://devguide.python.org/documentation/markup/#showing-code-examples - Issue: The example code block on the `create_sql_agent` document page is not properly rendered. https://python.langchain.com/api_reference/community/agent_toolkits/langchain_community.agent_toolkits.sql.base.create_sql_agent.html#langchain_community.agent_toolkits.sql.base.create_sql_agent <img width="933" alt="image" src="https://github.com/user-attachments/assets/d764bcad-e412-408b-ab0b-9a78a11188ee">	2024-11-26 09:52:16 -05:00
Mohammad Mohtashim	195ae7baa3	Community: Adding citations in AIMessage for ChatPerplexity (#28321 ) Description: Adding Citation in response payload of ChatPerplexity Issue: #28108	2024-11-26 09:45:47 -05:00
ccurme	a5374952f8	community[patch]: fix import in test (#28339 ) Library name was updated after https://github.com/langchain-ai/langchain/pull/27879 branched off master.	2024-11-25 19:28:01 +00:00
Alex Thomas	5867f25ff3	community[patch]: Neo4j community deprecation (#28130 ) Adds deprecation notices for Neo4j components moving to the `langchain_neo4j` partner package. - Adds deprecation warnings to all Neo4j-related classes and functions that have been migrated to the new `langchain_neo4j` partner package - Updates documentation to reference the new `langchain_neo4j` package instead of `langchain_community`	2024-11-25 10:34:22 -08:00
Yan	c60695a1c7	community: fixed critical bugs at Writer provider (#27879 )	2024-11-25 12:03:37 -05:00
Yelin Zhang	6ed2d387bb	docs: fix GOOGLE_API_KEY typo (#28322 ) fix small GOOGLE_API_KEY markdown formatting typo	2024-11-25 09:45:22 -05:00
ccurme	a83357dc5a	community[patch]: release 0.3.8 (#28316 )	2024-11-23 08:21:21 -05:00
ccurme	203d20caa5	community[patch]: fix errors introduced by pydantic 2.10 (#28297 )	2024-11-22 17:50:13 -05:00
Pat Patterson	2ee37a1c7b	community: list valid values for LanceDB constructor's `mode` argument (#28296 ) Description: Currently, the docstring for `LanceDB.__init__()` provides the default value for `mode`, but not the list of valid values. This PR adds that list to the docstring. Issue: N/A Dependencies: N/A Twitter handle: `@metadaddy` [Leaving as a reminder: If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.]	2024-11-22 15:40:06 -05:00
Priyanshi Garg	f5f53d1101	community: fix compatibility issue in kinetica chat model integration for Pydantic 2 (#28252 ) Fixed a compatibility issue in the `load_messages_from_context()` function for the Kinetica chat model integration. The issue was caused by stricter validation introduced in Pydantic 2.	2024-11-21 09:33:00 -05:00
Erick Friis	d1108607f4	multiple: push deprecation removals to 1.0 (#28236 )	2024-11-20 19:56:29 -08:00
shroominic	dee72c46c1	community: Outlines integration (#27449 ) In collaboration with @rlouf I build an [outlines](https://dottxt-ai.github.io/outlines/latest/) integration for langchain! I think this is really useful for doing any type of structured output locally. [Dottxt](https://dottxt.co) spend alot of work optimising this process at a lower level ([outlines-core](https://pypi.org/project/outlines-core/0.1.14/) written in rust) so I think this is a better alternative over all current approaches in langchain to do structured output. It also implements the `.with_structured_output` method so it should be a drop in replacement for a lot of applications. The integration includes: - Outlines LLM class - ChatOutlines class - Tutorial Cookbooks - Documentation Page - Validation and error messages - Exposes Outlines Structured output features - Support for multiple backends - Integration and Unit Tests Dependencies: `outlines` + additional (depending on backend used) I am not sure if the unit-tests comply with all requirements, if not I suggest to just remove them since I don't see a useful way to do it differently. ### Quick overview: Chat Models: <img width="698" alt="image" src="https://github.com/user-attachments/assets/05a499b9-858c-4397-a9ff-165c2b3e7acc"> Structured Output: <img width="955" alt="image" src="https://github.com/user-attachments/assets/b9fcac11-d3e5-4698-b1ae-8c4cb3d54c45"> --------- Co-authored-by: Vadym Barda <vadym@langchain.dev>	2024-11-20 16:31:31 -05:00
Mikelarg	2901fa20cc	community: Add deprecation warning for GigaChat integration in langchain-community (#28022 ) - Description: We have released the [langchain-gigachat](https://github.com/ai-forever/langchain-gigachat?tab=readme-ov-file) with new GigaChat integration that support's function/tool calling. This PR deprecated legacy GigaChat class in community package. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-20 21:03:47 +00:00
Renzo-vS	567dc1e422	community: fix duplicate content (#28003 ) Thank you for reading my first PR! Description: Deduplicate content in AzureSearch vectorstore. Currently, by default, the content of the retrieval is placed both in metadata and page_content of a Document. This PR removes the content from metadata, and leaves it in page_content. Issue:: Previously, the content was popped from result before metadata was populated. In #25828 , the order was changed which leads to a response with duplicated content. This was not the intention of that PR and seems undesirable. Looking forward to seeing my contribution in the next version! Cheers, Renzo	2024-11-20 12:49:03 -08:00
Jorge Piedrahita Ortiz	abaea28417	community: SamabanovaCloud tool calling and Structured output (#27967 ) Description: Add tool calling and structured output support for SambaNovaCloud chat models, docs included --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-20 19:12:08 +00:00
CLOVA Studio 개발	218b4e073e	community: fix some features on Naver ChatModel & embedding model (#28228 ) # Description - adding stopReason to response_metadata to call stream and astream - excluding NCP_APIGW_API_KEY input required validation - to remove warning Field "model_name" has conflict with protected namespace "model_". cc. @vbarda	2024-11-20 10:35:41 -08:00
Erick Friis	0dbaf05bb7	standard-tests: rename langchain_standard_tests to langchain_tests, release 0.3.2 (#28203 )	2024-11-18 19:10:39 -08:00
Eric Pinzur	0a57fc0016	community: OpenSearchVectorStore: use engine set at init() time by default (#28147 ) Description: * Updated the OpenSearchVectorStore to use the `engine` parameter captured at `init()` time as the default when adding documents to the store. Formatted, Linted, and Tested.	2024-11-16 17:07:42 -05:00
Erick Friis	6d2004ee7d	multiple: langchain-standard-tests -> langchain-tests (#28139 )	2024-11-15 11:32:04 -08:00
alex shengzhi li	39fcb476fd	community: add reka chat model integration (#27379 )	2024-11-15 13:37:14 -05:00
Jorge Piedrahita Ortiz	39956a3ef0	community: sambanovacloud llm integration (#27526 ) - Description: SambaNovaCloud llm integration added, previously only chat model integration --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-11-15 16:58:11 +00:00
ccurme	f1222739f8	core[patch]: support numpy 2 (#27991 )	2024-11-14 13:08:57 -05:00
Bharat Ramanathan	3e972faf81	community: chore warn deprecate the tracer (#27159 ) - Description:: This PR deprecates the wandb tracer in favor of the new [WeaveTracer](https://weave-docs.wandb.ai/guides/integrations/langchain#using-weavetracer) in W&B - Dependencies: No dependencies, just a deprecation warning. - Twitter handle: @parambharat @baskaryan	2024-11-13 11:33:34 -05:00
am-kinetica	a646f1c383	Handled empty search result handling and updated the notebook (#27914 ) - [ ] PR title: "community: updated Kinetica vectorstore" - Description: Handled empty search results - Issue: used to throw error if the search results were empty @efriis	2024-11-12 13:03:49 -08:00
ccurme	15b7dd3ad7	community[patch]: release 0.3.7 (#28061 )	2024-11-12 19:54:58 +00:00
ccurme	1538ee17f9	anthropic[major]: support python 3.13 (#27916 ) Last week Anthropic released version 0.39.0 of its python sdk, which enabled support for Python 3.13. This release deleted a legacy `client.count_tokens` method, which we currently access during init of the `Anthropic` LLM. Anthropic has replaced this functionality with the [client.beta.messages.count_tokens() API](https://github.com/anthropics/anthropic-sdk-python/pull/726). To enable support for `anthropic >= 0.39.0` and Python 3.13, here we drop support for the legacy token counting method, and add support for the new method via `ChatAnthropic.get_num_tokens_from_messages`. To fully support the token counting API, we update the signature of `get_num_tokens_from_message` to accept tools everywhere. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-11-12 14:31:07 -05:00
ZhangShenao	ca7375ac20	Improvement[Community]Improve Embeddings API (#28038 ) - Fix `BaichuanTextEmbeddings` api url - Remove unused params in api doc - Fix word spelling	2024-11-12 13:57:35 -05:00
Bagatur	5c14e1f935	community[patch]: Release 0.3.6 (#28046 )	2024-11-12 15:15:07 +00:00
Eric Pinzur	c421997caa	community[patch]: Added type hinting to OpenSearch clients (#27946 ) Description: * When working with OpenSearchVectorSearch to make OpenSearchGraphVectorStore (coming soon), I noticed that there wasn't type hinting for the underlying OpenSearch clients. This fixes that issue. * Confirmed tests are still passing with code changes. Note that there is some additional code duplication now, but I think this approach is cleaner overall.	2024-11-08 11:04:57 -08:00
Saad Makrod	b509747c7f	Community: Google Books API Tool (#27307 ) ## Description As proposed in our earlier discussion #26977 we have introduced a Google Books API Tool that leverages the Google Books API found at [https://developers.google.com/books/docs/v1/using](https://developers.google.com/books/docs/v1/using) to generate book recommendations. ### Sample Usage ```python from langchain_community.tools import GoogleBooksQueryRun from langchain_community.utilities import GoogleBooksAPIWrapper api_wrapper = GoogleBooksAPIWrapper() tool = GoogleBooksQueryRun(api_wrapper=api_wrapper) tool.run('ai') ``` ### Sample Output ```txt Here are 5 suggestions based off your search for books related to ai: 1. "AI's Take on the Stigma Against AI-Generated Content" by Sandy Y. Greenleaf: In a world where artificial intelligence (AI) is rapidly advancing and transforming various industries, a new form of content creation has emerged: AI-generated content. However, despite its potential to revolutionize the way we produce and consume information, AI-generated content often faces a significant stigma. "AI's Take on the Stigma Against AI-Generated Content" is a groundbreaking book that delves into the heart of this issue, exploring the reasons behind the stigma and offering a fresh, unbiased perspective on the topic. Written from the unique viewpoint of an AI, this book provides readers with a comprehensive understanding of the challenges and opportunities surrounding AI-generated content. Through engaging narratives, thought-provoking insights, and real-world examples, this book challenges readers to reconsider their preconceptions about AI-generated content. It explores the potential benefits of embracing this technology, such as increased efficiency, creativity, and accessibility, while also addressing the concerns and drawbacks that contribute to the stigma. As you journey through the pages of this book, you'll gain a deeper understanding of the complex relationship between humans and AI in the realm of content creation. You'll discover how AI can be used as a tool to enhance human creativity, rather than replace it, and how collaboration between humans and machines can lead to unprecedented levels of innovation. Whether you're a content creator, marketer, business owner, or simply someone curious about the future of AI and its impact on our society, "AI's Take on the Stigma Against AI-Generated Content" is an essential read. With its engaging writing style, well-researched insights, and practical strategies for navigating this new landscape, this book will leave you equipped with the knowledge and tools needed to embrace the AI revolution and harness its potential for success. Prepare to have your assumptions challenged, your mind expanded, and your perspective on AI-generated content forever changed. Get ready to embark on a captivating journey that will redefine the way you think about the future of content creation. Read more at https://play.google.com/store/books/details?id=4iH-EAAAQBAJ&source=gbs_api 2. "AI Strategies For Web Development" by Anderson Soares Furtado Oliveira: From fundamental to advanced strategies, unlock useful insights for creating innovative, user-centric websites while navigating the evolving landscape of AI ethics and security Key Features Explore AI's role in web development, from shaping projects to architecting solutions Master advanced AI strategies to build cutting-edge applications Anticipate future trends by exploring next-gen development environments, emerging interfaces, and security considerations in AI web development Purchase of the print or Kindle book includes a free PDF eBook Book Description If you're a web developer looking to leverage the power of AI in your projects, then this book is for you. Written by an AI and ML expert with more than 15 years of experience, AI Strategies for Web Development takes you on a transformative journey through the dynamic intersection of AI and web development, offering a hands-on learning experience.The first part of the book focuses on uncovering the profound impact of AI on web projects, exploring fundamental concepts, and navigating popular frameworks and tools. As you progress, you'll learn how to build smart AI applications with design intelligence, personalized user journeys, and coding assistants. Later, you'll explore how to future-proof your web development projects using advanced AI strategies and understand AI's impact on jobs. Toward the end, you'll immerse yourself in AI-augmented development, crafting intelligent web applications and navigating the ethical landscape.Packed with insights into next-gen development environments, AI-augmented practices, emerging realities, interfaces, and security governance, this web development book acts as your roadmap to staying ahead in the AI and web development domain. What you will learn Build AI-powered web projects with optimized models Personalize UX dynamically with AI, NLP, chatbots, and recommendations Explore AI coding assistants and other tools for advanced web development Craft data-driven, personalized experiences using pattern recognition Architect effective AI solutions while exploring the future of web development Build secure and ethical AI applications following TRiSM best practices Explore cutting-edge AI and web development trends Who this book is for This book is for web developers with experience in programming languages and an interest in keeping up with the latest trends in AI-powered web development. Full-stack, front-end, and back-end developers, UI/UX designers, software engineers, and web development enthusiasts will also find valuable information and practical guidelines for developing smarter websites with AI. To get the most out of this book, it is recommended that you have basic knowledge of programming languages such as HTML, CSS, and JavaScript, as well as a familiarity with machine learning concepts. Read more at https://play.google.com/store/books/details?id=FzYZEQAAQBAJ&source=gbs_api 3. "Artificial Intelligence for Students" by Vibha Pandey: A multifaceted approach to develop an understanding of AI and its potential applications KEY FEATURES ● AI-informed focuses on AI foundation, applications, and methodologies. ● AI-inquired focuses on computational thinking and bias awareness. ● AI-innovate focuses on creative and critical thinking and the Capstone project. DESCRIPTION AI is a discipline in Computer Science that focuses on developing intelligent machines, machines that can learn and then teach themselves. If you are interested in AI, this book can definitely help you prepare for future careers in AI and related fields. The book is aligned with the CBSE course, which focuses on developing employability and vocational competencies of students in skill subjects. The book is an introduction to the basics of AI. It is divided into three parts – AI-informed, AI-inquired and AI-innovate. It will help you understand AI's implications on society and the world. You will also develop a deeper understanding of how it works and how it can be used to solve complex real-world problems. Additionally, the book will also focus on important skills such as problem scoping, goal setting, data analysis, and visualization, which are essential for success in AI projects. Lastly, you will learn how decision trees, neural networks, and other AI concepts are commonly used in real-world applications. By the end of the book, you will develop the skills and competencies required to pursue a career in AI. WHAT YOU WILL LEARN ● Get familiar with the basics of AI and Machine Learning. ● Understand how and where AI can be applied. ● Explore different applications of mathematical methods in AI. ● Get tips for improving your skills in Data Storytelling. ● Understand what is AI bias and how it can affect human rights. WHO THIS BOOK IS FOR This book is for CBSE class XI and XII students who want to learn and explore more about AI. Basic knowledge of Statistical concepts, Algebra, and Plotting of equations is a must. TABLE OF CONTENTS 1. Introduction: AI for Everyone 2. AI Applications and Methodologies 3. Mathematics in Artificial Intelligence 4. AI Values (Ethical Decision-Making) 5. Introduction to Storytelling 6. Critical and Creative Thinking 7. Data Analysis 8. Regression 9. Classification and Clustering 10. AI Values (Bias Awareness) 11. Capstone Project 12. Model Lifecycle (Knowledge) 13. Storytelling Through Data 14. AI Applications in Use in Real-World Read more at https://play.google.com/store/books/details?id=ptq1EAAAQBAJ&source=gbs_api 4. "The AI Book" by Ivana Bartoletti, Anne Leslie and Shân M. Millie: Written by prominent thought leaders in the global fintech space, The AI Book aggregates diverse expertise into a single, informative volume and explains what artifical intelligence really means and how it can be used across financial services today. Key industry developments are explained in detail, and critical insights from cutting-edge practitioners offer first-hand information and lessons learned. Coverage includes: · Understanding the AI Portfolio: from machine learning to chatbots, to natural language processing (NLP); a deep dive into the Machine Intelligence Landscape; essentials on core technologies, rethinking enterprise, rethinking industries, rethinking humans; quantum computing and next-generation AI · AI experimentation and embedded usage, and the change in business model, value proposition, organisation, customer and co-worker experiences in today’s Financial Services Industry · The future state of financial services and capital markets – what’s next for the real-world implementation of AITech? · The innovating customer – users are not waiting for the financial services industry to work out how AI can re-shape their sector, profitability and competitiveness · Boardroom issues created and magnified by AI trends, including conduct, regulation & oversight in an algo-driven world, cybersecurity, diversity & inclusion, data privacy, the ‘unbundled corporation’ & the future of work, social responsibility, sustainability, and the new leadership imperatives · Ethical considerations of deploying Al solutions and why explainable Al is so important Read more at http://books.google.ca/books?id=oE3YDwAAQBAJ&dq=ai&hl=&source=gbs_api 5. "Artificial Intelligence in Society" by OECD: The artificial intelligence (AI) landscape has evolved significantly from 1950 when Alan Turing first posed the question of whether machines can think. Today, AI is transforming societies and economies. It promises to generate productivity gains, improve well-being and help address global challenges, such as climate change, resource scarcity and health crises. Read more at https://play.google.com/store/books/details?id=eRmdDwAAQBAJ&source=gbs_api ``` ## Issue This closes #27276 ## Dependencies No additional dependencies were added --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-07 15:29:35 -08:00
Erick Friis	733e43eed0	docs: new stack diagram (#27972 )	2024-11-07 22:46:56 +00:00
Shawn Lee	6f368e9eab	community: handle chatdeepinfra jsondecode error (#27603 ) Fixes #27602 Added error handling to return empty dict if args is empty string or None. Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-07 13:47:19 -08:00
Akshata	05fd6a16a9	Add ChatModels wrapper for Cloudflare Workers AI (#27645 ) Thank you for contributing to LangChain! - [x] PR title: "community: chat models wrapper for Cloudflare Workers AI" - [x] PR message: - Description: Add chat models wrapper for Cloudflare Workers AI. Enables Langgraph intergration via ChatModel for tool usage, agentic usage. - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-11-07 15:34:24 -05:00
Aksel Joonas Reedi	2cb39270ec	community: bytes as a source to `AzureAIDocumentIntelligenceLoader` (#26618 ) - Description: This PR adds functionality to pass in in-memory bytes as a source to `AzureAIDocumentIntelligenceLoader`. - Issue: I needed the functionality, so I added it. - Dependencies: NA - Twitter handle: @akseljoonas if this is a big enough change :) --------- Co-authored-by: Aksel Joonas Reedi <aksel@klippa.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-07 03:40:21 +00:00
Martin Triska	7a9149f5dd	community: ZeroxPDFLoader (#27800 ) # OCR-based PDF loader This implements [Zerox](https://github.com/getomni-ai/zerox) PDF document loader. Zerox utilizes simple but very powerful (even though slower and more costly) approach to parsing PDF documents: it converts PDF to series of images and passes it to a vision model requesting the contents in markdown. It is especially suitable for complex PDFs that are not parsed well by other alternatives. ## Example use: ```python from langchain_community.document_loaders.pdf import ZeroxPDFLoader os.environ["OPENAI_API_KEY"] = "" ## your-api-key model = "gpt-4o-mini" ## openai model pdf_url = "https://assets.ctfassets.net/f1df9zr7wr1a/soP1fjvG1Wu66HJhu3FBS/034d6ca48edb119ae77dec5ce01a8612/OpenAI_Sacra_Teardown.pdf" loader = ZeroxPDFLoader(file_path=pdf_url, model=model) docs = loader.load() ``` The Zerox library supports wide range of provides/models. See Zerox documentation for details. - Dependencies: `zerox` - Twitter handle: @martintriska1 If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Erick Friis <erickfriis@gmail.com>	2024-11-07 03:14:57 +00:00
Dmitriy Prokopchuk	53b0a99f37	community: Memcached LLM Cache Integration (#27323 ) ## Description This PR adds support for Memcached as a usable LLM model cache by adding the ```MemcachedCache``` implementation relying on the [pymemcache](https://github.com/pinterest/pymemcache) client. Unit test-wise, the new integration is generally covered under existing import testing. All new functionality depends on pymemcache if instantiated and used, so to comply with the other cache implementations the PR also adds optional integration tests for ```MemcachedCache```. Since this is a new integration, documentation is added for Memcached as an integration and as an LLM Cache. ## Issue This PR closes #27275 which was originally raised as a discussion in #27035 ## Dependencies There are no new required dependencies for langchain, but [pymemcache](https://github.com/pinterest/pymemcache) is required to instantiate the new ```MemcachedCache```. ## Example Usage ```python3 from langchain.globals import set_llm_cache from langchain_openai import OpenAI from langchain_community.cache import MemcachedCache from pymemcache.client.base import Client llm = OpenAI(model="gpt-3.5-turbo-instruct", n=2, best_of=2) set_llm_cache(MemcachedCache(Client('localhost'))) # The first time, it is not yet in cache, so it should take longer llm.invoke("Which city is the most crowded city in the USA?") # The second time it is, so it goes faster llm.invoke("Which city is the most crowded city in the USA?") ``` --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-07 03:07:59 +00:00
Baptiste Pasquier	81f7daa458	community: add InfinityRerank (#27043 ) Description: - Add a Reranker for Infinity server. Dependencies: This wrapper uses [infinity_client](https://github.com/michaelfeil/infinity/tree/main/libs/client_infinity/infinity_client) to connect to an Infinity server. Tests and docs - integration test: test_infinity_rerank.py - example notebook: infinity_rerank.ipynb [here](https://github.com/baptiste-pasquier/langchain/blob/feat/infinity-rerank/docs/docs/integrations/document_transformers/infinity_rerank.ipynb) --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-06 17:26:30 -08:00

1 2 3 4 5 ...

1830 Commits