langchain/docs
Keiichi Hirobe 258b3be5ec
core[minor]: add new clean up strategy "scoped_full" to indexing (#28505)
~Note that this PR is now Draft, so I didn't add change to `aindex`
function and didn't add test codes for my change.
After we have an agreement on the direction, I will add commits.~

`batch_size` is very difficult to choose: setting a large number
(for example, more than 10000) will impact the VectorDB and the
RecordManager, while setting a small number will delete records
unnecessarily, leading to redundant work, as the `IMPORTANT` section says.
On the other hand, we can't use `full` because the loader returns just a
subset of the dataset in our use case.

I guess many people are in the same situation as us.

So, as one of the possible solutions for it, I would like to introduce a
new argument, `scoped_full_cleanup`.
This argument will be valid only when `cleanup` is Full. If True, Full
cleanup deletes all documents that haven't been updated AND that are
associated with source ids that were seen during indexing. Default is
False.
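
To illustrate the difference, here is a minimal, library-independent sketch of the
deletion rule described above. It does not use the LangChain API: `Record`,
`keys_to_delete`, the integer timestamps, and the sample data are all hypothetical
names and values chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for one entry tracked by a record manager."""
    key: str         # identifier of an indexed document
    source_id: str   # source the document came from, e.g. a file path
    updated_at: int  # logical timestamp of the last run that touched it


def keys_to_delete(records, run_started_at, seen_source_ids, scoped):
    """Return the keys a cleanup pass would remove (illustrative model only).

    scoped=False models plain full cleanup: every record not updated in this run.
    scoped=True additionally requires that the record's source id was seen
    during this run, as the paragraph above describes.
    """
    stale = [r for r in records if r.updated_at < run_started_at]
    if scoped:
        stale = [r for r in stale if r.source_id in seen_source_ids]
    return sorted(r.key for r in stale)


# Index state left by an earlier run (timestamp 1).
records = [
    Record("a1", "a.txt", 1),
    Record("a2", "a.txt", 1),
    Record("b1", "b.txt", 1),
]

# Run at timestamp 2: the loader returns only a.txt, and only chunk a1 survives.
run_started_at = 2
records[0] = Record("a1", "a.txt", 2)
seen = {"a.txt"}

full_deleted = keys_to_delete(records, run_started_at, seen, scoped=False)
scoped_deleted = keys_to_delete(records, run_started_at, seen, scoped=True)
print(full_deleted)    # ['a2', 'b1']
print(scoped_deleted)  # ['a2']
```

In this model, plain full cleanup also removes `b1`, even though `b.txt` was simply
not part of the subset the loader returned, while the scoped variant removes only
the stale chunk of the source that was actually seen.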

This change keeps backward compatibility.

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-12-13 20:35:25 +00:00
api_reference docs: ganalytics in api ref (#28697) 2024-12-12 23:55:59 +00:00
cassettes docs: update tutorials (#28219) 2024-11-26 10:43:12 -05:00
data docs: 👥 Update LangChain people data (#27022) 2024-10-08 17:09:07 +00:00
docs core[minor]: add new clean up strategy "scoped_full" to indexing (#28505) 2024-12-13 20:35:25 +00:00
scripts docs: more useful vercel warnings (#28699) 2024-12-13 03:07:24 +00:00
src docs: update intro page (#28639) 2024-12-13 15:24:14 -05:00
static community: update Memgraph integration (#27017) 2024-12-10 10:57:21 -05:00
.gitignore infra: cleanup docs build (#21134) 2024-05-01 17:34:05 -07:00
.yarnrc.yml docs[minor]: Add thumbs up/down to all docs pages (#18526) 2024-03-04 15:14:28 -08:00
babel.config.js Restructure docs (#11620) 2023-10-10 12:55:19 -07:00
docusaurus.config.js docs: throw on broken anchors (#27773) 2024-11-13 14:29:27 -05:00
ignore-step.sh docs: ignore case production fork master (#27971) 2024-11-07 13:55:21 -08:00
Makefile docs: more useful vercel warnings (#28699) 2024-12-13 03:07:24 +00:00
package.json docs: raw loader codeblock (#28548) 2024-12-06 09:26:34 -08:00
README.md docs: reorganize contributing docs (#27649) 2024-10-25 22:41:54 +00:00
sidebars.js docs: reorg sidebar (#27978) 2024-11-15 14:28:18 -08:00
vercel_requirements.txt docs: more useful vercel warnings (#28699) 2024-12-13 03:07:24 +00:00
vercel.json langchain-weaviate: Remove outdated docs (#28058) 2024-12-10 05:00:07 +00:00
yarn.lock docs: raw loader codeblock (#28548) 2024-12-06 09:26:34 -08:00

LangChain Documentation

For more information on contributing to our documentation, see the Documentation Contributing Guide.