Mirror of https://github.com/hwchase17/langchain.git, synced 2025-06-02 21:23:32 +00:00
~Note that this PR is now a draft, so I have not yet changed the `aindex` function or added tests for my change. Once we agree on the direction, I will add commits.~

Choosing `batch_size` is difficult: a large value (above 10000) will impact the VectorDB and RecordManager, while a small value will delete records unnecessarily, leading to redundant work, as the `IMPORTANT` section says. On the other hand, we can't use `full` cleanup because, in our use case, the loader returns only a subset of the dataset. I suspect many people are in the same situation.

As one possible solution, I would like to introduce a new argument, `scoped_full_cleanup`. This argument takes effect only when `cleanup` is `full`. If True, full cleanup deletes all documents that haven't been updated AND that are associated with source ids that were seen during indexing. The default is False, so this change keeps backward compatibility.

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>

| | | |
|---|---|---|
api_reference | ||
cassettes | ||
data | ||
docs | ||
scripts | ||
src | ||
static | ||
.gitignore | ||
.yarnrc.yml | ||
babel.config.js | ||
docusaurus.config.js | ||
ignore-step.sh | ||
Makefile | ||
package.json | ||
README.md | ||
sidebars.js | ||
vercel_requirements.txt | ||
vercel.json | ||
yarn.lock |
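
The difference between plain full cleanup and the scoped variant described in the commit message above can be illustrated with a minimal sketch. This is not LangChain's implementation: `Record`, `keys_to_delete`, and the timestamp fields are hypothetical names chosen for illustration, and the sketch assumes a record counts as "not updated" when its last write predates the start of the indexing run.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for one entry tracked by a record manager."""
    key: str
    source_id: str
    updated_at: float  # time of the last write for this record


def keys_to_delete(
    records: Iterable[Record],
    index_start_time: float,
    seen_source_ids: Optional[Set[str]] = None,
) -> List[str]:
    """Return the keys a cleanup pass would delete.

    With seen_source_ids=None this models plain full cleanup: every record
    not updated during the run is deleted. With a set of source ids it models
    the scoped variant: only stale records whose source id was seen during
    indexing are deleted.
    """
    stale = [r for r in records if r.updated_at < index_start_time]
    if seen_source_ids is not None:
        stale = [r for r in stale if r.source_id in seen_source_ids]
    return sorted(r.key for r in stale)


records = [
    Record("a1", "a.txt", updated_at=5.0),  # rewritten during this run
    Record("a2", "a.txt", updated_at=1.0),  # stale, and its source was seen
    Record("b1", "b.txt", updated_at=1.0),  # stale, but its source was not loaded
]
index_start_time = 3.0

full = keys_to_delete(records, index_start_time)
scoped = keys_to_delete(records, index_start_time, seen_source_ids={"a.txt"})

print(full)    # ['a2', 'b1']
print(scoped)  # ['a2']
```

In this sketch, plain full cleanup also removes `b1`, which belongs to a source the loader never returned. The scoped variant leaves it alone, which is the behavior wanted when the loader yields only a subset of the dataset.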
LangChain Documentation
For more information on contributing to our documentation, see the Documentation Contributing Guide.