mirror of
https://github.com/hwchase17/langchain.git
synced 2025-06-08 16:06:30 +00:00
~Note that this PR is now Draft, so I didn't add change to `aindex` function and didn't add test codes for my change. After we have an agreement on the direction, I will add commits.~ `batch_size` is very difficult to decide because setting a large number like >10000 will impact VectorDB and RecordManager, while setting a small number will delete records unnecessarily, leading to redundant work, as the `IMPORTANT` section says. On the other hand, we can't use `full` because the loader returns just a subset of the dataset in our use case. I guess many people are in the same situation as us. So, as one of the possible solutions for it, I would like to introduce a new argument, `scoped_full_cleanup`. This argument will be valid only when `claneup` is Full. If True, Full cleanup deletes all documents that haven't been updated AND that are associated with source ids that were seen during indexing. Default is False. This change keeps backward compatibility. --------- Co-authored-by: Eugene Yurtsev <eugene@langchain.dev> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> |
||
---|---|---|
.. | ||
_templates | ||
additional_resources | ||
changes/changelog | ||
concepts | ||
contributing | ||
example_data | ||
how_to | ||
integrations | ||
troubleshooting/errors | ||
tutorials | ||
versions | ||
.gitignore | ||
introduction.mdx | ||
people.mdx |