langchain/libs/core/langchain_core/indexing
federico-pisanu 2538963945
core[patch]: improve index/aindex api when batch_size<n_docs (#25754)
- **Description:** prevent the index function from re-indexing the entire
source document when nothing has changed.
- **Issue:** #22135

I worked on a solution to this issue that is a compromise between being
cheap and being fast.
In the previous code, when batch_size is smaller than the number of docs
from a given source, almost the entire source is deleted (all documents
from that source except for the documents in the first batch).
My solution deletes documents from the vector store and the record manager
only if at least one document has changed for that source.
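
To illustrate the idea, here is a minimal, self-contained sketch. It is not the code in api.py: the function names, the `(source, text)` tuples, and the `stored_hashes` mapping are all hypothetical stand-ins for the record manager's bookkeeping. It only shows the decision rule, namely that a source is selected for cleanup when at least one of its documents has a hash that was not previously recorded.

```python
import hashlib


def doc_hash(text: str) -> str:
    """Content hash used to detect whether a document has changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sources_needing_cleanup(docs, stored_hashes):
    """Return the sources that contain at least one new or changed document.

    docs: iterable of (source_id, text) pairs (hypothetical input shape).
    stored_hashes: dict mapping source_id -> set of hashes recorded by an
        earlier indexing run (hypothetical stand-in for the record manager).
    """
    changed = set()
    for source, text in docs:
        if doc_hash(text) not in stored_hashes.get(source, set()):
            changed.add(source)
    return changed


# Earlier run: two documents from a.txt, one from b.txt.
stored = {
    "a.txt": {doc_hash("alpha"), doc_hash("beta")},
    "b.txt": {doc_hash("gamma")},
}

# New run: a.txt is unchanged, b.txt has been edited.
incoming = [("a.txt", "alpha"), ("a.txt", "beta"), ("b.txt", "gamma v2")]

to_clean = sources_needing_cleanup(incoming, stored)
```

In this sketch only `b.txt` ends up in `to_clean`, so the unchanged documents from `a.txt` would be left alone regardless of how the input is split into batches.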

Hope this can help!

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-09-30 20:57:41 +00:00
..
__init__.py core[minor]: Introduce DocumentIndex abstraction (#25062) 2024-08-05 18:06:33 +00:00
api.py core[patch]: improve index/aindex api when batch_size<n_docs (#25754) 2024-09-30 20:57:41 +00:00
base.py core: Add ruff rules for pycodestyle Warning (W) (#26964) 2024-09-30 09:31:43 -04:00
in_memory.py core: Put Python version as a project requirement so it is considered by ruff (#26608) 2024-09-18 14:37:57 +00:00