langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-10-11 20:07:52 +00:00

Author	SHA1	Message	Date
Bagatur	b2ba4f4072	core[patch]: fix deprecated pydantic code (#26161)	2024-09-06 17:14:17 -04:00
ccurme	c27703a10f	core[patch]: resolve warnings (#26157) Resolve a batch of warnings	2024-09-06 15:00:53 -04:00
Eugene Yurtsev	ae5a574aa5	core[major]: Upgrade langchain-core to pydantic 2 (#25986)
This PR upgrades core to pydantic 2. It involves a combination of manual changes together with automated code mods using gritql.
Changes and known issues:
1. Current models override __repr__ to be consistent with pydantic 1 (this will be removed in a follow up PR) Related: https://github.com/langchain-ai/langchain/pull/25986/files#diff-e5bd296179b7a72fcd4ea5cfa28b145beaf787da057e6d122aa76ee0bb8132c9R74
2. Issue with decorator for BaseChatModel (https://github.com/langchain-ai/langchain/pull/25986/files#diff-932bf3b314b268754ef640a5b8f52da96f9024fb81dd388dcd166b5713ecdf66R202) -- cc @baskaryan
3. `name` attribute in Base Runnable does not have a default -- was raising a pydantic warning due to override. We need to see if there's a way to fix to avoid making a breaking change for folks with custom runnables. (https://github.com/langchain-ai/langchain/pull/25986/files#diff-836773d27f8565f4dd45e9d6cf828920f89991a880c098b7511e0d3bb78a8a0dR238)
4. Likely can remove hard-coded RunnableBranch name (https://github.com/langchain-ai/langchain/pull/25986/files#diff-72894b94f70b1bfc908eb4d53f5ff90bb33bf8a4240a5e34cae48ddc62ac313aR147)
5. `model_*` namespace is reserved in pydantic. We'll need to specify `protected_namespaces`
6. create_model does not have a cached path yet
7. get_input_schema() in many places has been updated to be explicit about whether parameters are required or optional
8. injected tool args aren't picked up properly (losing type annotation)
For posterity the following gritql migrations were used:
```
engine marzano(0.1)
language python
or {
    `from $IMPORT import $...` where {
        $IMPORT <: contains `pydantic_v1`,
        $IMPORT => `pydantic`
    },
    `$X.update_forward_refs` => `$X.model_rebuild`,
    // This pattern still needs fixing as it fails (populate_by_name vs.
    // allow_populate_by_name)
    class_definition($name, $body) as $C where {
        $name <: `Config`,
        $body <: block($statements),
        $t = "",
        $statements <: some bubble($t) assignment(left=$x, right=$y) as $A where {
            or {
                $x <: `allow_population_by_field_name` where {
                    $t += `populate_by_name=$y,`
                },
                $t += `$x=$y,`
            }
        },
        $C => `model_config = ConfigDict($t)`,
        add_import(source="pydantic", name="ConfigDict")
    }
}
```
```
engine marzano(0.1)
language python
`@root_validator(pre=True)` as $decorator where {
    $decorator <: before function_definition($body, $return_type),
    $decorator => `@model_validator(mode="before")\n@classmethod`,
    add_import(source="pydantic", name="model_validator"),
    $return_type => `Any`
}
```
```
engine marzano(0.1)
language python
`@root_validator(pre=False, skip_on_failure=True)` as $decorator where {
    $decorator <: before function_definition($body, $parameters, $return_type) where {
        $body <: contains bubble or {
            `values["$Q"]` => `self.$Q`,
            `values.get("$Q")` => `(self.$Q or None)`,
            `values.get($Q, $...)` as $V where {
                $Q <: contains `"$QName"`,
                $V => `self.$QName`,
            },
            `return $Q` => `return self`
        }
    },
    $decorator => `@model_validator(mode="after")`,
    // Silly work around a bug in grit
    // Adding Self to pydantic and then will replace it with one from typing
    add_import(source="pydantic", name="model_validator"),
    $parameters => `self`,
    $return_type => `Self`
}
```
```
grit apply --language python '`Self` where { add_import(source="typing_extensions", name="Self")}'
```	2024-09-03 16:30:44 -04:00
Christophe Bornet	ff0df5ea15	core[patch]: Add B(bugbear) ruff rules (#25520) Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-08-28 07:09:29 +00:00
Eugene Yurtsev	425f6ffa5b	core[patch]: Fix aindex API (#25155) A previous PR accidentally broke the aindex API by renaming the positional argument `vectorstore` to `vector_store`. This PR reverts that change.	2024-08-08 12:08:18 -04:00
Eugene Yurtsev	d283f452cc	core[minor]: Add support for DocumentIndex in the index api (#25100) Support document index in the index api.	2024-08-06 12:30:49 -07:00
Eugene Yurtsev	41dfad5104	core[minor]: Introduce DocumentIndex abstraction (#25062) This PR adds a minimal document indexer abstraction. The goal of this abstraction is to allow developers to create custom retrievers that also have a standard indexing API and allow updating the document content in them. The abstraction comes with a test suite that can verify that the indexer implements the correct semantics. This is an iteration on a previous PR (https://github.com/langchain-ai/langchain/pull/24364). The main difference is that we're sub-classing from BaseRetriever in this iteration and so have consolidated the sync and async interfaces. The main problem with the current design is that run-time search configuration has to be specified at init rather than provided at run time. We will likely resolve this issue in one of two ways: (1) Define a method (`get_retriever`) that will allow creating a retriever at run time with a specific configuration. If we do this, we will likely break the subclass on BaseRetriever. (2) Generalize base retriever so it can support structured queries. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-08-05 18:06:33 +00:00
Eugene Yurtsev	e0186df56b	core[patch]: Clarify upsert response semantics (#23921)	2024-07-05 15:59:47 -04:00
Eugene Yurtsev	6f08e11d7c	core[minor]: add upsert, streaming_upsert, aupsert, astreaming_upsert methods to the VectorStore abstraction (#23774) This PR rolls out part of the new proposed interface for vectorstores (https://github.com/langchain-ai/langchain/pull/23544) to existing store implementations. The PR makes the following changes: 1. Adds standard upsert, streaming_upsert, aupsert, astreaming_upsert methods to the vectorstore. 2. Updates `add_texts` and `aadd_texts` to be non-required, with a default implementation that delegates to `upsert` and `aupsert` if those have been implemented. The original `add_texts` and `aadd_texts` methods are problematic as they spread object-specific information across the document and `**kwargs` (e.g., ids are not a part of the document). 3. Adds a default implementation to `add_documents` and `aadd_documents` that delegates to `upsert` and `aupsert` respectively. 4. Adds standard unit tests to verify that a given vectorstore implements a correct read/write API. A downside of this implementation is that it creates `upsert` with a very similar signature to `add_documents`. The reasons for introducing `upsert` are to: (a) Remove any ambiguities about what information is allowed in `kwargs`. Specifically, kwargs should only be used for information common to all indexed data (e.g., indexing timeout). (b) Allow inheriting from an anticipated generalized interface for indexing that will allow indexing `BaseMedia` (i.e., allow making a vectorstore for images/audio etc.). `add_documents` can be deprecated in the future in favor of `upsert` to make sure that users have a single correct way of indexing content. --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-07-05 12:21:40 -04:00
Leonid Ganeline	716a316654	core: docstrings `indexing` (#23785) Added missing docstrings. Formatted docstrings consistently.	2024-07-03 11:27:34 -04:00
Philippe PRADOS	8711c61298	core[minor]: Adds an in-memory implementation of RecordManager (#13200) Description: langchain offers three technologies to save data: - [vectorstore](https://python.langchain.com/docs/modules/data_connection/vectorstores/) - [docstore](https://js.langchain.com/docs/api/schema/classes/Docstore) - [record manager](https://python.langchain.com/docs/modules/data_connection/indexing) If you want to combine these technologies in a sample persistence strategy, you need a common implementation for each. `DocStore` provides `InMemoryDocstore`. We propose the class `MemoryRecordManager` to complete the system. This is the prelude to another pull request, which needs a consistent combination of persistence components. Tag maintainer: @baskaryan Twitter handle: @pprados --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-06-20 12:19:10 -04:00
Eugene Yurtsev	4fe8403bfb	core[patch]: Expand documentation in the indexing namespace (#23134)	2024-06-19 10:11:44 -04:00
Leonid Ganeline	1a2ff56cd8	core[patch]: docstring update (#21036) Added missing docstrings. Updated docstrings to a consistent format.	2024-04-29 15:35:34 -04:00
Eugene Yurtsev	d8aa72f51d	core[minor],langchain[patch]: Move base indexing interface and logic to core (#20667) This PR moves the interface and the logic to core. It makes the following changes to namespaces: `indexes` -> `indexing`, `indexes._api` -> `indexing.api`. Testing code is intentionally duplicated for now since it's testing different implementations of the record manager (in-memory vs. SQL). Common logic will need to be pulled out into the test client. A follow-up PR will move the SQL-based implementation outside of LangChain.	2024-04-24 13:18:42 -04:00

14 Commits
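
The delegation described in 6f08e11d7c (`add_documents` falling back to `upsert` when a store implements it) can be sketched roughly as below. This is an illustrative stand-in only, not the langchain-core code: `Document`, `SketchVectorStore`, `InMemoryStore`, and the `succeeded`/`failed` response keys are assumptions made for the example.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    """Hypothetical document type; the id travels with the document itself."""
    page_content: str
    id: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


class SketchVectorStore:
    """Illustrative base class (an assumption, not the real VectorStore)."""

    def upsert(self, items: List[Document], **kwargs: Any) -> Dict[str, List[str]]:
        # Subclasses override this; kwargs carry only store-wide options.
        raise NotImplementedError

    def add_documents(self, documents: List[Document], **kwargs: Any) -> List[str]:
        # Default implementation: delegate to upsert if the subclass provides one.
        if type(self).upsert is SketchVectorStore.upsert:
            raise NotImplementedError("implement upsert or override add_documents")
        response = self.upsert(documents, **kwargs)
        return response["succeeded"]


class InMemoryStore(SketchVectorStore):
    """Toy store that only implements upsert and inherits add_documents."""

    def __init__(self) -> None:
        self.store: Dict[str, Document] = {}

    def upsert(self, items: List[Document], **kwargs: Any) -> Dict[str, List[str]]:
        succeeded: List[str] = []
        for doc in items:
            # Reuse the caller's id if given, otherwise generate one.
            doc_id = doc.id or str(uuid.uuid4())
            self.store[doc_id] = Document(doc.page_content, doc_id, dict(doc.metadata))
            succeeded.append(doc_id)
        return {"succeeded": succeeded, "failed": []}


store = InMemoryStore()
ids = store.add_documents([Document("first", id="1"), Document("second", id="2")])
# Writing the same id again overwrites the stored content instead of duplicating it.
store.add_documents([Document("first, revised", id="1")])
```

Because the ids live on the documents rather than in keyword arguments, a subclass only has to implement `upsert` to get a working `add_documents`, which is the design motivation the commit message gives.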