langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-09-22 11:00:37 +00:00

Author	SHA1	Message	Date
Juan Jose Miguel Ovalle Villamil	51baa1b5cf	langchain[patch]: fix-cohere-reranker-rerank-method with cohere v5 (#19486 ) #### Description Fixed the following error with `rerank` method from `CohereRerank`: ``` ---> 79 results = self.client.rerank( 80 query, docs, model, top_n=top_n, max_chunks_per_doc=max_chunks_per_doc 81 ) 82 result_dicts = [] 83 for res in results.results: TypeError: BaseCohere.rerank() takes 1 positional argument but 4 positional arguments (and 2 keyword-only arguments) were given ``` This was easily fixed going from this: ``` def rerank( self, documents: Sequence[Union[str, Document, dict]], query: str, *, model: Optional[str] = None, top_n: Optional[int] = -1, max_chunks_per_doc: Optional[int] = None, ) -> List[Dict[str, Any]]: ... 
if len(documents) == 0: # to avoid empty api call return [] docs = [ doc.page_content if isinstance(doc, Document) else doc for doc in documents ] model = model or self.model top_n = top_n if (top_n is None or top_n > 0) else self.top_n results = self.client.rerank( query, docs, model, top_n=top_n, max_chunks_per_doc=max_chunks_per_doc ) result_dicts = [] for res in results: result_dicts.append( {"index": res.index, "relevance_score": res.relevance_score} ) return result_dicts ``` to this: ``` def rerank( self, documents: Sequence[Union[str, Document, dict]], query: str, *, model: Optional[str] = None, top_n: Optional[int] = -1, max_chunks_per_doc: Optional[int] = None, ) -> List[Dict[str, Any]]: ... if len(documents) == 0: # to avoid empty api call return [] docs = [ doc.page_content if isinstance(doc, Document) else doc for doc in documents ] model = model or self.model top_n = top_n if (top_n is None or top_n > 0) else self.top_n results = self.client.rerank( query=query, documents=docs, model=model, top_n=top_n, max_chunks_per_doc=max_chunks_per_doc <------------- ) result_dicts = [] for res in results.results: <------------- result_dicts.append( {"index": res.index, "relevance_score": res.relevance_score} ) return result_dicts ``` #### Unit & Integration tests I added a unit test to check the behaviour of `rerank`. Also fixed the original integration test, which was failing. #### Format & Linting Everything worked properly with `make lint_diff`, `make format_diff` and `make format`. However, I noticed an error coming from another part of the library when running `make lint`: ``` (langchain-py3.9) ➜ langchain git:(master) make format [ "." = "" ] || poetry run ruff format . 1636 files left unchanged [ "." = "" ] || poetry run ruff --select I --fix . (langchain-py3.9) ➜ langchain git:(master) make lint ./scripts/check_pydantic.sh . ./scripts/lint_imports.sh poetry run ruff . [ "." = "" ] || poetry run ruff format . --diff 1636 files already formatted [ "." 
= "" ] || poetry run ruff --select I . [ "." = "" ] || mkdir -p .mypy_cache && poetry run mypy . --cache-dir .mypy_cache langchain/agents/openai_assistant/base.py:252: error: Argument "file_ids" to "create" of "Assistants" has incompatible type "Optional[Any]"; expected "Union[list[str], NotGiven]" [arg-type] langchain/agents/openai_assistant/base.py:374: error: Argument "file_ids" to "create" of "AsyncAssistants" has incompatible type "Optional[Any]"; expected "Union[list[str], NotGiven]" [arg-type] Found 2 errors in 1 file (checked 1634 source files) make: *** [Makefile:65: lint] Error 1 ``` --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 06:32:03 +00:00
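The failure mode described in the commit above (positional arguments passed to a method that only accepts keywords) can be reproduced without the Cohere SDK. The following is a minimal sketch, not the library's code: `FakeRerankClient`, `FakeResult`, `FakeResponse`, `rerank_sketch` and the toy word-overlap scoring are all hypothetical stand-ins, assumed only to mimic a keyword-only `rerank` that returns an object with a `.results` list.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Sequence


@dataclass
class FakeResult:
    index: int
    relevance_score: float


@dataclass
class FakeResponse:
    results: List[FakeResult]


class FakeRerankClient:
    """Hypothetical stand-in for a client whose rerank() is keyword-only."""

    def rerank(
        self,
        *,
        query: str,
        documents: Sequence[str],
        model: Optional[str] = None,
        top_n: Optional[int] = None,
    ) -> FakeResponse:
        # Toy scoring (an assumption): fraction of query words found in the document.
        words = query.lower().split()
        scored = [
            FakeResult(i, sum(w in doc.lower() for w in words) / max(len(words), 1))
            for i, doc in enumerate(documents)
        ]
        scored.sort(key=lambda r: r.relevance_score, reverse=True)
        return FakeResponse(results=scored[:top_n] if top_n else scored)


def rerank_sketch(
    client: FakeRerankClient,
    documents: Sequence[str],
    query: str,
    *,
    model: Optional[str] = None,
    top_n: Optional[int] = None,
) -> List[Dict[str, Any]]:
    if len(documents) == 0:  # avoid an empty API call
        return []
    # Keyword arguments only, and iterate over response.results rather than the response.
    response = client.rerank(query=query, documents=list(documents), model=model, top_n=top_n)
    return [{"index": r.index, "relevance_score": r.relevance_score} for r in response.results]
```

Calling `FakeRerankClient().rerank("q", ["a"])` positionally raises a `TypeError` of the same kind as the one quoted in the commit message, while `rerank_sketch` succeeds because every argument is named.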
William FH	5c41f4083e	[Evals] Fix function calling support (#19658 ) Current implementation is overzealous in validating chat datasets Fixes [#langsmith-sdk:557](https://github.com/langchain-ai/langsmith-sdk/issues/557)	2024-03-27 17:23:35 -07:00
Evgenii Zheltonozhskii	5b1f9c6d3a	infra: Consistent lxml requirements (#19520 ) Update the dependency for lxml to be consistent among different packages; should fix https://github.com/langchain-ai/langchain/issues/19040	2024-03-27 20:27:59 +00:00
Christophe Bornet	9954c6a38e	langchain[minor]: Add async methods to EncoderBackedStore (#19597 ) Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-03-27 17:36:36 +00:00
Christophe Bornet	7c2578bd55	langchain[patch]: Add async methods to EmbeddingRouterChain (#19603 )	2024-03-26 14:33:36 -07:00
Christophe Bornet	b3d7b5a653	langchain[patch]: Add async methods to TimeWeightedVectorStoreRetriever (#19606 )	2024-03-26 14:03:47 -07:00
Christophe Bornet	a7274f006e	langchain[patch]: Add async methods to VectorstoreIndexCreator (#19582 )	2024-03-26 13:57:13 -07:00
Yuki Watanabe	cfecbda48b	community[minor]: Allow passing `allow_dangerous_deserialization` when loading LLM chain (#18894 ) ### Issue Recently, the new `allow_dangerous_deserialization` flag was introduced to prevent unsafe model deserialization that relies on pickle without the user's knowledge (#18696). Since then, some LLMs such as Databricks require this flag to be set to true in order to instantiate the model. However, this breaks the existing ability to load such LLMs within a chain using the `load_chain` method, because the underlying loader function [load_llm_from_config](`f96dd57501/libs/langchain/langchain/chains/loading.py (L40)`) (and load_llm) ignores the keyword arguments passed in. ### Solution This PR fixes the issue by propagating the `allow_dangerous_deserialization` argument to the class loader if and only if the LLM class has that field. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-26 11:07:55 -04:00
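The "propagate the flag only if the class has that field" idea from the commit above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual loader: `PickleBackedLLM`, `PlainLLM` and `load_llm_sketch` are hypothetical names, and plain dataclasses stand in for the library's model classes, whose field lookup may work differently.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Type

FLAG = "allow_dangerous_deserialization"


@dataclass
class PickleBackedLLM:
    """Hypothetical LLM class that declares the opt-in flag."""

    endpoint: str
    allow_dangerous_deserialization: bool = False


@dataclass
class PlainLLM:
    """Hypothetical LLM class without the flag."""

    endpoint: str


def load_llm_sketch(cls: Type[Any], config: Dict[str, Any], **kwargs: Any) -> Any:
    # Forward the flag only when the caller supplied it AND the class declares it;
    # otherwise the constructor would reject an unexpected keyword argument.
    declared = {f.name for f in fields(cls)}
    extra = {}
    if FLAG in kwargs and FLAG in declared:
        extra[FLAG] = kwargs[FLAG]
    return cls(**config, **extra)
```

With this shape, `load_llm_sketch(PickleBackedLLM, {"endpoint": "x"}, allow_dangerous_deserialization=True)` sets the flag, while the same call with `PlainLLM` silently drops it instead of failing.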
Shotaro Sano	55c624a694	infra: Resolve the endless dependency resolution during the build of `dev.Dockerfile` by copying `poetry.lock` (#19465 ) ## Description This PR proposes a modification to the `libs/langchain/dev.Dockerfile` configuration to copy the `libs/langchain/poetry.lock` into the working directory. The change aims to address the issue where the Poetry install command, the last command in the `dev.Dockerfile`, takes an excessively long time, and to ensure the reproducibility of the poetry environment in the devcontainer. ## Problem The `dev.Dockerfile`, prepared for development environments such as `.devcontainer`, encounters an unending dependency resolution when attempting the Poetry installation. ### Steps to Reproduce Execute the following build command: ```bash docker build -f libs/langchain/dev.Dockerfile . ``` ### Current Behavior The Docker build process gets stuck at the following step, which, in my experience, did not conclude even after an entire night: ``` => [langchain-dev-dependencies 4/6] COPY libs/community/ ../community/ 0.9s => [langchain-dev-dependencies 5/6] COPY libs/text-splitters/ ../text-splitters/ 0.0s => [langchain-dev-dependencies 6/6] RUN poetry install --no-interaction --no-ansi --with dev,test,docs 12.3s => => # Updating dependencies => => # Resolving dependencies... ``` ### Expected Behavior The Docker build completes in a realistic timeframe. By applying this PR, the build finishes within a few minutes. ### Analysis The complexity of LangChain's dependencies has reached a point where Poetry is required to resolve dependencies akin to threading a needle. Consequently, `poetry install` fails to complete in a practical timeframe. ## Solution The solution for dependency resolution is already recorded in `libs/langchain/poetry.lock`, so we can use it. When copying `pyproject.toml` and `poetry.toml`, the `poetry.lock` located in the same directory should also be copied. 
```diff # Copy only the dependency files for installation -COPY libs/langchain/pyproject.toml libs/langchain/poetry.toml ./ +COPY libs/langchain/pyproject.toml libs/langchain/poetry.toml libs/langchain/poetry.lock ./ ``` ## Note I am not intimately familiar with the historical context of the `dev.Dockerfile` and thus do not know why `poetry.lock` has not been copied until now. It might have been an oversight, or perhaps dependency resolution used to complete quickly even without the `poetry.lock` file in the past. However, if there are deliberate reasons why copying `poetry.lock` is not advisable, please just close this PR.	2024-03-26 10:54:53 -04:00
Christophe Bornet	999365186b	langchain[major]: Use InMemoryVectorStore by default in VectorstoreIndexCreator (#19575 ) This is a small breaking change but I think it should be done as: * No external dependency needs to be installed anymore for the default to work * It is vendor-neutral	2024-03-26 10:01:23 -04:00
Aayush Kataria	03c38005cb	community[patch]: Fixing some caching issues for AzureCosmosDBSemanticCache (#18884 ) Fixing some issues for AzureCosmosDBSemanticCache - Added the entry for "AzureCosmosDBSemanticCache" which was missing in langchain/cache.py - Added application name when creating the MongoClient for the AzureCosmosDBVectorSearch, for tracking purposes. @baskaryan, can you please review this PR, we need this to go in asap. These are just small fixes which we found today in our testing.	2024-03-25 19:06:17 -07:00
billytrend-cohere	63343b4987	cohere[patch]: add cohere as a partner package (#19049 ) Description: adds support for langchain_cohere --------- Co-authored-by: Harry M <127103098+harry-cohere@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-03-25 20:23:47 +00:00
Zachary Wilkins	e1a6341940	langchain: Passthrough batch_size on index()/aindex() calls (#19443 ) Description: This change passes through `batch_size` to `add_documents()`/`aadd_documents()` on calls to `index()` and `aindex()` such that the documents are processed in the expected batch size. Issue: #19415 Dependencies: N/A Twitter handle: N/A	2024-03-25 11:58:29 -04:00
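A rough sketch of what "passing `batch_size` through" means in the commit above: the indexing loop splits documents into batches and hands the same `batch_size` on to the store's `add_documents`. The names `batched`, `RecordingStore` and `index_sketch` are hypothetical, and the store's keyword signature is an assumption made for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


class RecordingStore:
    """Hypothetical vector store that records how it was called."""

    def __init__(self) -> None:
        self.calls: List[Tuple[List[str], int]] = []

    def add_documents(self, docs: List[str], *, batch_size: int) -> None:
        self.calls.append((list(docs), batch_size))


def index_sketch(docs: Iterable[str], store: RecordingStore, batch_size: int = 100) -> int:
    total = 0
    for batch in batched(docs, batch_size):
        # Forward batch_size so the store does not fall back to its own default.
        store.add_documents(batch, batch_size=batch_size)
        total += len(batch)
    return total
```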
Christophe Bornet	63898dbda0	langchain[patch]: Use async memory in Chain when needed (#19429 )	2024-03-24 23:49:00 -07:00
Christophe Bornet	1b813fe6fe	langchain[patch]: Add async methods to VectorStoreRetrieverMemory (#19408 )	2024-03-22 15:44:24 -07:00
ccurme	8a2528c34a	[langchain] fix OpenAIAssistantRunnable.create_assistant (#19081 ) - Description: OpenAI assistants support some pre-built tools (e.g., `"retrieval"` and `"code_interpreter"`) and expect these as `{"type": "code_interpreter"}`. This may have been upset by https://github.com/langchain-ai/langchain/pull/18935 - Issue: https://github.com/langchain-ai/langchain/issues/19057	2024-03-22 13:23:19 -04:00
Bagatur	d95ea3550e	langchain[patch]: Release 0.1.13 (#19351 )	2024-03-20 18:25:12 +00:00
Eugene Yurtsev	aa9ccca775	langchain[patch]: Add tests for indexing (#19342 ) This PR adds tests for the indexing API	2024-03-20 13:00:22 -04:00
mackong	d9396bdec1	langchain[patch]: add stop for various non-openai agents (#19333 ) * Description: add stop for various non-openai agents. * Issue: N/A * Dependencies: N/A --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-03-20 11:34:10 -04:00
Chris Papademetrious	305d74c67a	core: implement a batch_size parameter for CacheBackedEmbeddings (#18070 ) Description: Currently, `CacheBackedEmbeddings` computes vectors for all uncached documents before updating the store. This pull request updates the embedding computation loop to compute embeddings in batches, updating the store after each batch. I noticed this when I tried `CacheBackedEmbeddings` on our 30k document set and the cache directory hadn't appeared on disk after 30 minutes. The motivation is to minimize compute/data loss when problems occur: * If there is a transient embedding failure (e.g. a network outage at the embedding endpoint triggers an exception), at least the completed vectors are written to the store instead of being discarded. * If there is an issue with the store (e.g. no write permissions), the condition is detected early without computing (and discarding!) all the vectors. Issue: Implements enhancement #18026. Testing: I was unable to run unit tests; details in [this post](https://github.com/langchain-ai/langchain/discussions/15019#discussioncomment-8576684). --------- Signed-off-by: chrispy <chrispy@synopsys.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-03-19 18:55:43 +00:00
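The batching behaviour described in the commit above (compute embeddings in batches and write each batch to the store before starting the next) can be sketched as follows. This is a simplified illustration, not the `CacheBackedEmbeddings` implementation: `embed_with_cache` is a hypothetical helper, a plain `dict` stands in for the byte store, and `embed_fn` is any callable that maps a list of texts to a list of vectors.

```python
from typing import Callable, Dict, List

Vector = List[float]


def embed_with_cache(
    texts: List[str],
    embed_fn: Callable[[List[str]], List[Vector]],
    store: Dict[str, Vector],
    batch_size: int,
) -> List[Vector]:
    # Only texts that are not cached yet need to be embedded (de-duplicated, order kept).
    missing = [t for t in dict.fromkeys(texts) if t not in store]
    for start in range(0, len(missing), batch_size):
        batch = missing[start : start + batch_size]
        vectors = embed_fn(batch)
        # Persist right away: if a later batch fails, this work is not lost.
        store.update(zip(batch, vectors))
    return [store[t] for t in texts]
```

If `embed_fn` raises on the second batch, the first batch is already in `store`, so a retry only recomputes what is still missing.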
William FH	89af30807b	Permit function eval on llm data type (#19287 )	2024-03-19 11:53:50 -07:00
Frederico Wu	f36418a5b0	langchain: creating assistants with file_ids (#19199 ) Changing OpenAIAssistantRunnable.create_assistant to send the `file_ids` parameter to openai.beta.assistants.create Co-authored-by: Frederico Wu <fred.diaswu@coxautoinc.com>	2024-03-18 21:34:03 -07:00
Simon Stone	58c7687174	langchain: preserve document metadata in `FlashrankRerank` (#19148 ) Description: Preserves document metadata in `FlashrankRerank` - Issue: #19142 - Dependencies: None - Twitter handle: n/a --------- Co-authored-by: Simon Stone <simon.stone@dartmouth.edu>	2024-03-19 04:15:18 +00:00
Shotaro Sano	ca9c8c58ea	text-splitters, infra: fix `libs/langchain/dev.Dockerfile` so that the `text-splitters` directory is copied before poetry installation (#19214 ) ## Description This PR modifies the settings in `libs/langchain/dev.Dockerfile` to ensure that the `text-splitters` directory is copied before the poetry installation process begins. Without this modification, the `docker build` command fails for `dev.Dockerfile`, preventing the setup of some development environments, including `.devcontainer`. ## Bug Details ### Repro Run the following command: ```bash docker build -f libs/langchain/dev.Dockerfile . ``` ### Current Behavior The docker build command fails, raising the following error: ``` ... => [langchain-dev-dependencies 4/5] COPY libs/community/ ../community/ 0.4s => ERROR [langchain-dev-dependencies 5/5] RUN poetry install --no-interaction --no-ansi --with dev,test,docs 1.1s ------ > [langchain-dev-dependencies 5/5] RUN poetry install --no-interaction --no-ansi --with dev,test,docs: #13 0.970 #13 0.970 Directory ../text-splitters does not exist ------ executor failed running [/bin/sh -c poetry install --no-interaction --no-ansi --with dev,test,docs]: exit code: 1 ``` ### Expected Behavior The `docker build` command successfully completes without the poetry error. ### Analysis The error occurs because the `text-splitters` directory is not copied into the build environment, unlike the other packages under the `libs` directory. I suspect that the `COPY` setting was overlooked when `text-splitters` was separated out in a recent PR. ## Fix Add the following lines to the `libs/langchain/dev.Dockerfile`: ```dockerfile # Copy the text-splitters library for installation COPY libs/text-splitters/ ../text-splitters/ ```	2024-03-18 20:45:35 -07:00
Erick Friis	95904fe443	langchain[patch]: update base imports to core (#19248 ) still deprecated, but was misleading before	2024-03-19 03:17:07 +00:00
Eugene Yurtsev	0ddfe7fc9d	langchain[patch]: make hub work with older langchainhub versions (#19076 ) Make it work with older clients	2024-03-15 15:37:52 -07:00
Eugene Yurtsev	745d2476a2	langchain: upgrade mypy (#19163 ) Update mypy in langchain	2024-03-15 16:37:09 -04:00
Erick Friis	781aee0068	community, langchain, infra: revert store extended test deps outside of poetry (#19153 ) Reverts langchain-ai/langchain#18995 Because it makes installing dependencies in python 3.11 extended testing take 80 minutes	2024-03-15 17:10:47 +00:00
Erick Friis	9e569d85a4	community, langchain, infra: store extended test deps outside of poetry (#18995 ) poetry can't reliably handle resolving the number of optional "extended test" dependencies we have. If we instead just rely on pip to install extended test deps in CI, this isn't an issue.	2024-03-15 05:55:30 +00:00
Nuno Campos	508f75853c	core[patch]: Change structured prompt lc id to match js (#19099 )	2024-03-14 20:02:52 -07:00
Erick Friis	873d06c009	langchain[patch]: release 0.1.12 (#18999 )	2024-03-13 00:22:21 +00:00
Roshan Santhosh	acf1ecc081	langchain[patch]: update llm_router.py (#18865 ) Issue : _call method of LLMRouterChain uses predict_and_parse, which is slated for deprecation. Description : Instead of using predict_and_parse, this replaces it with individual predict and parse functions.	2024-03-11 22:30:07 -07:00
Bagatur	e0e688a277	core[minor]: generation info on msg (#18592 ) related to #16403 #17188	2024-03-12 04:43:17 +00:00
Tomaz Bratanic	a28be31a96	Switch to md5 for deduplication in neo4j integrations (#18846 ) Deduplicate documents using MD5 of the page_content. Also allows for custom deduplication with graph ingestion method by providing metadata id attribute --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2024-03-09 13:28:55 -08:00
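The deduplication scheme in the commit above (an MD5 digest of `page_content` as the document key, unless the metadata supplies its own `id`) might look roughly like this. It is a sketch under assumptions: `doc_id` and `deduplicate` are hypothetical helpers, and documents are modelled as plain `(page_content, metadata)` tuples rather than the library's `Document` class.

```python
import hashlib
from typing import Any, Dict, List, Optional, Tuple

Doc = Tuple[str, Dict[str, Any]]


def doc_id(page_content: str, metadata: Optional[Dict[str, Any]] = None) -> str:
    # A caller-supplied metadata id takes precedence (custom deduplication).
    if metadata and "id" in metadata:
        return str(metadata["id"])
    # Otherwise identical content maps to the same MD5 hex digest.
    return hashlib.md5(page_content.encode("utf-8")).hexdigest()


def deduplicate(docs: List[Doc]) -> List[Doc]:
    seen: Dict[str, Doc] = {}
    for content, meta in docs:
        # Keep the first document seen for each id.
        seen.setdefault(doc_id(content, meta), (content, meta))
    return list(seen.values())
```

MD5 serves here only as a content fingerprint, not as a security measure.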
Erick Friis	b48865bf94	langchain[patch]: attach hub metadata (#18830 )	2024-03-08 18:40:49 -08:00
Erick Friis	a88f62ec3c	langchain[patch]: getattr import from langchain.chains (#18160 )	2024-03-08 10:36:14 -08:00
Bagatur	3e29c04213	core[minor]: add BaseMessage.response_metadata (#18699 )	2024-03-08 09:35:56 -08:00
Bagatur	bc6249c889	langchain[patch]: runnable agent streaming param (#18761 ) Usage: ```python agent = RunnableAgent(runnable=runnable, ..., stream_runnable=False) ``` or for convenience ```python agent_executor = AgentExecutor(agent=agent, ..., stream_runnable=False) ```	2024-03-07 20:53:53 -08:00
Eugene Yurtsev	e188d4ecb0	Add dangerous parameter to requests tool (#18697 ) The tools are already documented as dangerous. Not clear whether adding an opt-in parameter is necessary or not	2024-03-07 15:10:56 -05:00
Dounx	ad48f55357	community[minor]: add Yuque document loader (#17924 ) This pull request adds support for loading documents from Yuque with LangChain. Yuque is a professional cloud-based knowledge base for team collaboration in documentation. Website: https://www.yuque.com OpenAPI: https://www.yuque.com/yuque/developer/openapi	2024-03-05 15:54:07 -08:00
Hech	6a08134661	community[patch], langchain[minor]: Add retriever self_query and score_threshold in DingoDB (#18106 )	2024-03-05 15:47:29 -08:00
Bagatur	5fc67ca2c7	langchain[patch]: Release 0.1.11 (#18558 )	2024-03-04 23:58:34 -08:00
William FH	ca1d42785d	Evals wording (#18542 )	2024-03-04 16:32:33 -08:00
William FH	30ccc009e6	[Evals] Support list examples by dataset version tag (#18534 ) previously only supported by timestamp	2024-03-04 14:23:32 -08:00
William FH	1eec67e8fe	Evaluate on Version (#18471 )	2024-03-03 17:47:35 -08:00
Harrison Chase	73d653324f	[Evals] Session-level feedback (#18463 ) Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>	2024-03-03 17:18:29 -08:00
mackong	b89d9fc177	langchain[patch]: add tools renderer for various non-openai agents (#18307 ) - Description: add tools_renderer for various non-openai agents, so that tools can be rendered in different ways for your LLM. - Issue: N/A - Dependencies: N/A --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2024-03-03 14:25:12 -08:00
Aayush Kataria	7c2f3f6f95	community[minor]: Adding Azure Cosmos Mongo vCore Vector DB Cache (#16856 ) Description: This pull request introduces several enhancements for Azure Cosmos Vector DB, primarily focused on improving caching and search capabilities using Azure Cosmos MongoDB vCore Vector DB. Here's a summary of the changes: - AzureCosmosDBSemanticCache: Added a new cache implementation called AzureCosmosDBSemanticCache, which utilizes Azure Cosmos MongoDB vCore Vector DB for efficient caching of semantic data. Added comprehensive test cases for AzureCosmosDBSemanticCache to ensure its correctness and robustness. These tests cover various scenarios and edge cases to validate the cache's behavior. - HNSW Vector Search: Added HNSW vector search functionality in the CosmosDB Vector Search module. This enhancement enables more efficient and accurate vector searches by utilizing the HNSW (Hierarchical Navigable Small World) algorithm. Added corresponding test cases to validate the HNSW vector search functionality in both AzureCosmosDBSemanticCache and AzureCosmosDBVectorSearch. These tests ensure the correctness and performance of the HNSW search algorithm. - LLM Caching Notebook - The notebook now includes a comprehensive example showcasing the usage of the AzureCosmosDBSemanticCache. This example highlights how the cache can be employed to efficiently store and retrieve semantic data. Additionally, the example provides default values for all parameters used within the AzureCosmosDBSemanticCache, ensuring clarity and ease of understanding for users who are new to the cache implementation. @hwchase17,@baskaryan, @eyurtsev,	2024-03-03 14:04:15 -08:00
Erick Friis	f96dd57501	langchain[patch]: release 0.1.10 (#18410 )	2024-03-02 01:48:57 +00:00
Petteri Johansson	6c1989d292	community[minor], langchain[minor], docs: Gremlin Graph Store and QA Chain (#17683 ) - Description: New feature: Gremlin graph-store and QA chain (including docs). Compatible with Azure CosmosDB. - Dependencies: no changes	2024-03-01 12:21:14 -08:00

2427 Commits