langchain/docs/docs
Keiichi Hirobe 258b3be5ec
core[minor]: add new clean up strategy "scoped_full" to indexing (#28505)
~Note that this PR is currently a draft, so I haven't added the change to
the `aindex` function or written tests for it yet.
Once we agree on the direction, I will add those commits.~

`batch_size` is very difficult to choose: a large value (for example,
more than 10000) puts load on the VectorDB and RecordManager, while a
small value deletes records unnecessarily, leading to redundant
work, as the `IMPORTANT` section says.
On the other hand, we can't use `full` because the loader returns only a
subset of the dataset in our use case.

I guess many people are in the same situation as us.

So, as one possible solution, I would like to introduce a
new argument, `scoped_full_cleanup`.
This argument is valid only when `cleanup` is `full`. If True, full
cleanup deletes all documents that haven't been updated AND that are
associated with source ids that were seen during indexing. The default is
False.
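
To make the difference concrete, here is a minimal toy model of the deletion rule described above. It is only a sketch and not the langchain implementation: `Record`, `keys_to_delete`, and the integer timestamps are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for one entry tracked by a record manager."""

    key: str         # unique id of an indexed document
    source_id: str   # source the document came from, e.g. a file path
    updated_at: int  # logical time of the last run that wrote this record


def keys_to_delete(records, run_started_at, seen_source_ids, scoped):
    """Return the keys a cleanup pass would delete (toy model).

    scoped=False models plain full cleanup: every record that was not
    updated in the current run is deleted.
    scoped=True models the proposed behaviour: a stale record is deleted
    only if its source id was seen during the current run.
    """
    stale = [r for r in records if r.updated_at < run_started_at]
    if scoped:
        stale = [r for r in stale if r.source_id in seen_source_ids]
    return sorted(r.key for r in stale)


records = [
    Record("a1", "a.txt", updated_at=2),  # rewritten in the current run
    Record("a2", "a.txt", updated_at=1),  # old chunk of a.txt, not produced again
    Record("b1", "b.txt", updated_at=1),  # b.txt was not returned by the loader
]
seen = {"a.txt"}  # the loader returned only a subset of the dataset

full_result = keys_to_delete(records, 2, seen, scoped=False)
scoped_result = keys_to_delete(records, 2, seen, scoped=True)
print(full_result)    # ['a2', 'b1']
print(scoped_result)  # ['a2']
```

In this model, plain full cleanup also removes `b1` even though `b.txt` was simply absent from this run, while the scoped variant removes only the stale chunk of the source that was actually re-indexed.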

This change keeps backward compatibility.

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-12-13 20:35:25 +00:00
_templates Docs: Fixed grammatical mistake (#16858) 2024-02-01 11:28:15 -08:00
additional_resources docs: more api ref links, add linting step to prevent more (#28495) 2024-12-04 04:19:42 +00:00
changes/changelog docs: Add langchain over time (#21434) 2024-05-10 00:34:35 +00:00
concepts docs: readme/intro nits (#28581) 2024-12-06 12:52:15 -08:00
contributing docs: integration contrib typo (#28642) 2024-12-09 23:46:31 +00:00
example_data docs[minor]: Add "Build a PDF ingestion and Question/Answering system" tutorial (#22570) 2024-06-05 17:09:28 -07:00
how_to core[minor]: add new clean up strategy "scoped_full" to indexing (#28505) 2024-12-13 20:35:25 +00:00
integrations community[minor]: Add TablestoreVectorStore (#25767) 2024-12-13 11:17:28 -08:00
troubleshooting/errors docs: Fixed wrong link redirect from JS ToolMessage to Python ToolMes… (#28083) 2024-11-13 10:05:19 -05:00
tutorials community[patch]: fix QuerySQLDatabaseTool name (#28659) 2024-12-12 19:16:03 -08:00
versions docs: more api ref links, add linting step to prevent more (#28495) 2024-12-04 04:19:42 +00:00
.gitignore
introduction.mdx docs: update intro page (#28639) 2024-12-13 15:24:14 -05:00
people.mdx 👥 Update LangChain people data (#17743) 2024-02-20 18:30:11 -08:00