langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-07-09 14:35:50 +00:00

Author	SHA1	Message	Date
João Carlos Ferra de Almeida	780ce00dea	core[minor]: add kwargs to index and aindex functions for custom vector_field support (#26998 ) Added `kwargs` parameters to the `index` and `aindex` functions in `libs/core/langchain_core/indexing/api.py`. This allows users to pass additional arguments to the `add_documents` and `aadd_documents` methods, enabling the specification of a custom `vector_field`. For example, users can now use `vector_field="embedding"` when indexing documents in `OpenSearchVectorStore` --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-10-07 14:52:50 -04:00
federico-pisanu	2538963945	core[patch]: improve index/aindex api when batch_size<n_docs (#25754 ) - Description: prevent the index function from re-indexing an entire source document when nothing has changed. - Issue: #22135 I worked on a solution to this issue that is a compromise between being cheap and being fast. In the previous code, when batch_size is smaller than the number of docs from a given source, almost the entire source is deleted (all documents from that source except those in the first batch). My solution deletes documents from the vector store and record manager only if at least one document has changed for that source. Hope this can help! --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-09-30 20:57:41 +00:00
Christophe Bornet	db8845a62a	core: Add ruff rules for pycodestyle Warning (W) (#26964 ) All auto-fixes.	2024-09-30 09:31:43 -04:00
Christophe Bornet	7809b31b95	core[patch]: Add ruff rules for flake8-simplify (SIM) (#26848 ) Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-09-27 20:13:23 +00:00
Christophe Bornet	a47b332841	core: Put Python version as a project requirement so it is considered by ruff (#26608 ) Ruff doesn't know about the python version in `[tool.poetry.dependencies]`. It can get it from `project.requires-python`. Notes: * poetry seems to have issues getting the python constraints from `requires-python` and using `python` in per dependency constraints. So I had to duplicate the info. I will open an issue on poetry. * `inspect.isclass()` doesn't work correctly with `GenericAlias` (`list[...]`, `dict[..., ...]`) on Python <3.11 so I added some `not isinstance(type, GenericAlias)` checks: Python 3.11 ```pycon >>> import inspect >>> inspect.isclass(list) True >>> inspect.isclass(list[str]) False ``` Python 3.9 ```pycon >>> import inspect >>> inspect.isclass(list) True >>> inspect.isclass(list[str]) True ``` Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-09-18 14:37:57 +00:00
Christophe Bornet	3a99467ccb	core[patch]: Add ruff rule UP006 (use PEP585 annotations) (#26574 ) * Added rule `UP006` now that Pydantic is v2+ --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-09-17 21:22:50 +00:00
Erick Friis	c2a3021bb0	multiple: pydantic 2 compatibility, v0.3 (#26443 ) Signed-off-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Dan O'Donovan <dan.odonovan@gmail.com> Co-authored-by: Tom Daniel Grande <tomdgrande@gmail.com> Co-authored-by: Grande <Tom.Daniel.Grande@statsbygg.no> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: ccurme <chester.curme@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com> Co-authored-by: ZhangShenao <15201440436@163.com> Co-authored-by: Friso H. Kingma <fhkingma@gmail.com> Co-authored-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Nuno Campos <nuno@langchain.dev> Co-authored-by: Morgante Pell <morgantep@google.com>	2024-09-13 14:38:45 -07:00
Christophe Bornet	ff0df5ea15	core[patch]: Add B(bugbear) ruff rules (#25520 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-08-28 07:09:29 +00:00
Eugene Yurtsev	425f6ffa5b	core[patch]: Fix aindex API (#25155 ) A previous PR accidentally broke the aindex API by renaming a positional argument vectorstore into vector_store. This PR reverts this change.	2024-08-08 12:08:18 -04:00
Eugene Yurtsev	d283f452cc	core[minor]: Add support for DocumentIndex in the index api (#25100 ) Support document index in the index api.	2024-08-06 12:30:49 -07:00
Eugene Yurtsev	41dfad5104	core[minor]: Introduce DocumentIndex abstraction (#25062 ) This PR adds a minimal document indexer abstraction. The goal of this abstraction is to allow developers to create custom retrievers that also have a standard indexing API and allow updating the document content in them. The abstraction comes with a test suite that can verify that the indexer implements the correct semantics. This is an iteration over a previous PR (https://github.com/langchain-ai/langchain/pull/24364). The main difference is that we're sub-classing from BaseRetriever in this iteration and so have consolidated the sync and async interfaces. The main problem with the current design is that runtime search configuration has to be specified at init rather than provided at run time. We will likely resolve this issue in one of two ways: (1) Define a method (`get_retriever`) that will allow creating a retriever at run time with a specific configuration. If we do this, we will likely break the subclass on BaseRetriever (2) Generalize base retriever so it can support structured queries --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-08-05 18:06:33 +00:00
Eugene Yurtsev	e0186df56b	core[patch]: Clarify upsert response semantics (#23921 )	2024-07-05 15:59:47 -04:00
Eugene Yurtsev	6f08e11d7c	core[minor]: add upsert, streaming_upsert, aupsert, astreaming_upsert methods to the VectorStore abstraction (#23774 ) This PR rolls out part of the new proposed interface for vectorstores (https://github.com/langchain-ai/langchain/pull/23544) to existing store implementations. The PR makes the following changes: 1. Adds standard upsert, streaming_upsert, aupsert, astreaming_upsert methods to the vectorstore. 2. Updates `add_texts` and `aadd_texts` to be non-required with a default implementation that delegates to `upsert` and `aupsert` if those have been implemented. The original `add_texts` and `aadd_texts` methods are problematic as they spread object-specific information across document and **kwargs (e.g., ids are not a part of the document). 3. Adds a default implementation to `add_documents` and `aadd_documents` that delegates to `upsert` and `aupsert` respectively. 4. Adds standard unit tests to verify that a given vectorstore implements a correct read/write API. A downside of this implementation is that it creates `upsert` with a very similar signature to `add_documents`. The reasons for introducing `upsert` are to: * Remove any ambiguities about what information is allowed in `kwargs`. Specifically, kwargs should only be used for information common to all indexed data (e.g., indexing timeout). * Allow inheriting from an anticipated generalized interface for indexing that will allow indexing `BaseMedia` (i.e., allow making a vectorstore for images/audio etc.) `add_documents` can be deprecated in the future in favor of `upsert` to make sure that users have a single correct way of indexing content. --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-07-05 12:21:40 -04:00
Leonid Ganeline	716a316654	core: docstrings `indexing` (#23785 ) Added missing docstrings. Formatted docstrings to a consistent form.	2024-07-03 11:27:34 -04:00
Philippe PRADOS	8711c61298	core[minor]: Adds an in-memory implementation of RecordManager (#13200 ) Description: langchain offers three technologies to save data: - [vectorstore](https://python.langchain.com/docs/modules/data_connection/vectorstores/) - [docstore](https://js.langchain.com/docs/api/schema/classes/Docstore) - [record manager](https://python.langchain.com/docs/modules/data_connection/indexing) If you want to combine these technologies in a sample persistence strategy you need a common implementation for each. `DocStore` provides `InMemoryDocstore`. We propose the class `MemoryRecordManager` to complete the system. This is the prelude to another pull request, which needs a consistent combination of persistence components. Tag maintainer: @baskaryan Twitter handle: @pprados --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-06-20 12:19:10 -04:00
Eugene Yurtsev	4fe8403bfb	core[patch]: Expand documentation in the indexing namespace (#23134 )	2024-06-19 10:11:44 -04:00
Leonid Ganeline	1a2ff56cd8	core[patch]: docstring update (#21036 ) Added missing docstrings. Updated docstrings to a consistent format.	2024-04-29 15:35:34 -04:00
Eugene Yurtsev	d8aa72f51d	core[minor],langchain[patch]: Move base indexing interface and logic to core (#20667 ) This PR moves the interface and the logic to core. It makes the following changes to namespaces: `indexes` -> `indexing` `indexes._api` -> `indexing.api` Testing code is intentionally duplicated for now since it's testing different implementations of the record manager (in-memory vs. SQL). Common logic will need to be pulled out into the test client. A follow-up PR will move the SQL-based implementation outside of LangChain.	2024-04-24 13:18:42 -04:00
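
The `kwargs` pass-through described in commit 780ce00dea can be illustrated with a minimal sketch. `ToyVectorStore` and `toy_index` are hypothetical stand-ins invented for this illustration; they are not LangChain classes and do not reproduce the real `index` implementation. The sketch only shows extra keyword arguments such as `vector_field` reaching `add_documents` unchanged for every batch.

```python
from typing import Any


class ToyVectorStore:
    """Hypothetical stand-in for a vector store; not a LangChain class."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    def add_documents(self, documents: list[str], **kwargs: Any) -> None:
        # Record what was received so the forwarding is observable.
        self.calls.append({"documents": documents, "kwargs": kwargs})


def toy_index(
    docs: list[str], store: ToyVectorStore, batch_size: int = 2, **kwargs: Any
) -> int:
    """Hypothetical simplified indexer: batch the docs and forward kwargs as-is."""
    for start in range(0, len(docs), batch_size):
        store.add_documents(docs[start:start + batch_size], **kwargs)
    return len(docs)


store = ToyVectorStore()
toy_index(["a", "b", "c"], store, vector_field="embedding")
```

After the call, `store.calls` holds one entry per batch, and each entry carries the same `vector_field` keyword argument.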

18 Commits
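
The `GenericAlias` guard mentioned in commit a47b332841 can be sketched as below, assuming Python 3.9 or later (where `list[str]` is valid syntax). `is_plain_class` is a hypothetical helper name chosen for this illustration; the commit message only states that `not isinstance(type, GenericAlias)` checks were added, not how they are wrapped.

```python
import inspect
from types import GenericAlias


def is_plain_class(tp: object) -> bool:
    """Return True for real classes, False for parameterized generics.

    inspect.isclass(list[str]) differs between Python versions, so the
    explicit GenericAlias check makes the result version-independent.
    """
    return inspect.isclass(tp) and not isinstance(tp, GenericAlias)


results = {
    "list": is_plain_class(list),
    "list[str]": is_plain_class(list[str]),
    "dict[str, int]": is_plain_class(dict[str, int]),
}
```

With the guard in place, `list` is reported as a class while `list[str]` and `dict[str, int]` are not, regardless of what `inspect.isclass` alone returns on the running interpreter.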