langchain/libs/core/langchain_core/utils
Eugene Yurtsev 6f08e11d7c
core[minor]: add upsert, streaming_upsert, aupsert, astreaming_upsert methods to the VectorStore abstraction (#23774)
This PR rolls out part of the new proposed interface for vectorstores
(https://github.com/langchain-ai/langchain/pull/23544) to existing store
implementations.

The PR makes the following changes:

1. Adds standard upsert, streaming_upsert, aupsert, astreaming_upsert
methods to the `VectorStore` abstraction.
2. Updates `add_texts` and `aadd_texts` to be non-required, with a
default implementation that delegates to `upsert` and `aupsert` if those
have been implemented. The original `add_texts` and `aadd_texts` methods
are problematic because they spread object-specific information across
the document and `**kwargs` (e.g., ids are not part of the document).
3. Adds a default implementation to `add_documents` and `aadd_documents`
that delegates to `upsert` and `aupsert` respectively.
4. Adds standard unit tests to verify that a given vectorstore
implements a correct read/write API.
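The delegation described in items 1 to 3 can be sketched as follows. This is a minimal, self-contained illustration, not the langchain_core code. `Document`, `UpsertResponse`, `SketchVectorStore` and `InMemorySketchStore` are hypothetical stand-ins. The `succeeded`/`failed` response shape and the thread-offloading async default (`asyncio.to_thread`) are assumptions. For simplicity the sketch makes `upsert` abstract, whereas the PR describes delegating only "if those have been implemented".

```python
import asyncio
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import (
    Any, AsyncIterable, AsyncIterator, Dict, Iterable, Iterator,
    List, Optional, Sequence, TypedDict,
)


@dataclass
class Document:
    # Hypothetical stand-in: the id travels with the document itself.
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    id: Optional[str] = None


class UpsertResponse(TypedDict):
    # Assumed shape: ids that were written and ids that failed.
    succeeded: List[str]
    failed: List[str]


class SketchVectorStore(ABC):
    @abstractmethod
    def upsert(self, items: Sequence[Document], **kwargs: Any) -> UpsertResponse:
        """Write a batch; kwargs carry only batch-wide options."""

    def streaming_upsert(
        self, items: Iterable[Document], batch_size: int, **kwargs: Any
    ) -> Iterator[UpsertResponse]:
        # Chunk an arbitrary iterable and upsert one batch at a time.
        batch: List[Document] = []
        for item in items:
            batch.append(item)
            if len(batch) == batch_size:
                yield self.upsert(batch, **kwargs)
                batch = []
        if batch:
            yield self.upsert(batch, **kwargs)

    async def aupsert(self, items: Sequence[Document], **kwargs: Any) -> UpsertResponse:
        # Assumed default: run the sync upsert in a worker thread.
        return await asyncio.to_thread(self.upsert, items, **kwargs)

    async def astreaming_upsert(
        self, items: AsyncIterable[Document], batch_size: int, **kwargs: Any
    ) -> AsyncIterator[UpsertResponse]:
        # Async counterpart of streaming_upsert over an async iterable.
        batch: List[Document] = []
        async for item in items:
            batch.append(item)
            if len(batch) == batch_size:
                yield await self.aupsert(batch, **kwargs)
                batch = []
        if batch:
            yield await self.aupsert(batch, **kwargs)

    def add_documents(self, documents: List[Document], **kwargs: Any) -> List[str]:
        # Default add_documents: delegate straight to upsert.
        return self.upsert(documents, **kwargs)["succeeded"]

    def add_texts(
        self,
        texts: Iterable[str],
        metadatas: Optional[List[Dict[str, Any]]] = None,
        ids: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> List[str]:
        # Fold per-object data (metadata, ids) back into Documents, then upsert.
        texts = list(texts)
        metas = metadatas or [{} for _ in texts]
        doc_ids: List[Optional[str]] = list(ids) if ids else [None] * len(texts)
        docs = [
            Document(page_content=t, metadata=m, id=i)
            for t, m, i in zip(texts, metas, doc_ids)
        ]
        return self.upsert(docs, **kwargs)["succeeded"]


class InMemorySketchStore(SketchVectorStore):
    def __init__(self) -> None:
        self.store: Dict[str, Document] = {}

    def upsert(self, items: Sequence[Document], **kwargs: Any) -> UpsertResponse:
        ids: List[str] = []
        for doc in items:
            doc_id = doc.id or str(uuid.uuid4())  # generate an id if missing
            self.store[doc_id] = Document(doc.page_content, dict(doc.metadata), doc_id)
            ids.append(doc_id)
        return {"succeeded": ids, "failed": []}
```

Because the default `add_texts` rebuilds `Document` objects before calling `upsert`, ids and metadata end up on the items themselves rather than in `**kwargs`. A concrete store then only has to implement `upsert` to get `add_texts`, `add_documents` and the streaming variants.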

A downside of this implementation is that it introduces `upsert` with a
signature very similar to that of `add_documents`.
The reasons for introducing `upsert` are to:
* Remove any ambiguity about what information is allowed in `kwargs`.
Specifically, `kwargs` should only be used for information common to all
indexed data (e.g., an indexing timeout).
* Allow inheriting from an anticipated generalized interface for indexing
that will allow indexing `BaseMedia` (i.e., allow making a vectorstore
for images, audio, etc.).
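Both motivations can be illustrated together. In this hypothetical sketch, `Media`, `TextDoc`, `ImageBlob`, `Indexer` and `DictIndexer` are invented for illustration and are not langchain_core classes (the PR only anticipates a `BaseMedia`-based interface). Each item carries its own id, and the only keyword argument, `timeout`, applies to the whole call rather than to individual items.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, Generic, List, Optional, Sequence, TypeVar


@dataclass
class Media:
    # Hypothetical BaseMedia-like base class: every item owns its id.
    id: str
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class TextDoc(Media):
    page_content: str = ""


@dataclass
class ImageBlob(Media):
    data: bytes = b""


T = TypeVar("T", bound=Media)


class Indexer(ABC, Generic[T]):
    """Hypothetical generalized indexing interface, generic over the media type."""

    @abstractmethod
    def upsert(self, items: Sequence[T], *, timeout: Optional[float] = None) -> List[str]:
        """Per-item data lives on the items; keyword options apply to the whole call."""


class DictIndexer(Indexer[T]):
    def __init__(self) -> None:
        self.items: Dict[str, T] = {}
        self.last_timeout: Optional[float] = None

    def upsert(self, items: Sequence[T], *, timeout: Optional[float] = None) -> List[str]:
        # The timeout is a single batch-wide setting, recorded once per call;
        # this in-memory sketch does not otherwise enforce it.
        self.last_timeout = timeout
        for item in items:
            self.items[item.id] = item
        return [item.id for item in items]
```

For example, `DictIndexer().upsert([TextDoc(id="t1", page_content="hi")], timeout=5.0)` returns `["t1"]`. Keeping the signature keyword-only for batch options leaves no place to smuggle per-item data such as ids.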
 
`add_documents` can be deprecated in the future in favor of `upsert` to
make sure that users have a single correct way of indexing content.

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-07-05 12:21:40 -04:00
__init__.py core[minor]: add upsert, streaming_upsert, aupsert, astreaming_upsert methods to the VectorStore abstraction (#23774) 2024-07-05 12:21:40 -04:00
_merge.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
aiter.py core[minor]: add upsert, streaming_upsert, aupsert, astreaming_upsert methods to the VectorStore abstraction (#23774) 2024-07-05 12:21:40 -04:00
env.py core[minor]: Support multiple keys in get_from_dict_or_env (#23086) 2024-06-18 14:13:28 -04:00
formatting.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
function_calling.py [Core] Unify function schema parsing (#23370) 2024-07-03 09:55:38 -07:00
html.py core[patch]: Enhance link extraction with query parameters (#20259) 2024-04-27 02:22:36 +00:00
image.py core[minor]: Image prompt template (#14263) 2024-01-27 17:04:29 -08:00
input.py infra: rm unused # noqa violations (#22049) 2024-05-22 15:21:08 -07:00
interactive_env.py core[patch]: simple prompt pretty printing (#15968) 2024-01-12 21:08:51 -05:00
iter.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
json_schema.py core[patch]: fixed circular dependency with json schema (#18657) 2024-03-12 05:42:45 +00:00
json.py core[minor], ...: add tool calls message (#18947) 2024-04-09 18:41:42 -05:00
loading.py docs: Fix URL formatting in deprecation warnings (#23075) 2024-06-18 14:49:58 -04:00
mustache.py core: fix mustache falsy cases (#22747) 2024-06-10 14:00:12 -07:00
pydantic.py Separate out langchain_core package (#13577) 2023-11-20 13:09:30 -08:00
strings.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
utils.py core[patch]: utils.guard_import fix (#21133) 2024-05-03 17:21:36 -04:00