Mirror of https://github.com/hwchase17/langchain.git, synced 2025-06-02 04:58:46 +00:00
**Description:** Currently, `CacheBackedEmbeddings` computes vectors for *all* uncached documents before updating the store. This pull request changes the embedding computation loop to compute embeddings in batches and update the store after each batch.

I noticed this when I ran `CacheBackedEmbeddings` on our 30k-document set and the cache directory still hadn't appeared on disk after 30 minutes.

The motivation is to minimize lost compute and data when problems occur:

* If there is a transient embedding failure (e.g. a network outage at the embedding endpoint raises an exception), at least the vectors already computed are written to the store instead of being discarded.
* If there is a problem with the store (e.g. no write permissions), it is detected early, before all the vectors have been computed (and discarded!).

**Issue:** Implements enhancement #18026.

**Testing:** I was unable to run unit tests; details in [this post](https://github.com/langchain-ai/langchain/discussions/15019#discussioncomment-8576684).

---------

Signed-off-by: chrispy <chrispy@synopsys.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
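The batch-then-persist loop described above can be sketched in plain Python. This is a minimal sketch, not LangChain's implementation: it assumes a `dict`-like store keyed by raw text and an `embed` callable that maps a list of texts to a list of vectors. The names `embed_with_cache`, `batched`, and `batch_size` are hypothetical, chosen for illustration.

```python
from typing import Callable, Iterator, List, MutableMapping, Optional, Sequence, TypeVar

# Hypothetical helper names for illustration; this is not the LangChain API.
T = TypeVar("T")
Vector = List[float]


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def embed_with_cache(
    texts: List[str],
    embed: Callable[[List[str]], List[Vector]],
    store: MutableMapping[str, Vector],
    batch_size: Optional[int] = None,
) -> List[Vector]:
    """Return one vector per text, embedding only texts missing from `store`.

    Misses are embedded batch by batch, and each batch is written to `store`
    before the next one is computed, so an exception in a later batch does
    not discard vectors that were already computed.
    """
    # Unique cache misses, in first-seen order (dict preserves insertion order).
    missing = [t for t in dict.fromkeys(texts) if t not in store]

    # batch_size=None mimics the old behavior: one batch for all misses.
    size = batch_size if batch_size is not None else max(len(missing), 1)

    for batch in batched(missing, size):
        vectors = embed(batch)
        if len(vectors) != len(batch):
            raise ValueError("embed() must return one vector per input text")
        # Persist this batch immediately; a failing store surfaces here, early.
        for text, vector in zip(batch, vectors):
            store[text] = vector

    # Every text is now cached, so the result is read back from the store.
    return [store[t] for t in texts]
```

With `batch_size=2`, a failure while embedding the third text leaves the first two vectors in the store, so a retry only has to embed what is still missing. A persistent store would typically also need to serialize vectors and derive keys from a hash of the text, which this sketch omits.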
* _api
* beta
* callbacks
* document_loaders
* documents
* embeddings
* example_selectors
* language_models
* load
* messages
* output_parsers
* outputs
* prompts
* pydantic_v1
* runnables
* tracers
* utils
* __init__.py
* agents.py
* caches.py
* chat_history.py
* chat_sessions.py
* env.py
* exceptions.py
* globals.py
* memory.py
* prompt_values.py
* py.typed
* retrievers.py
* stores.py
* sys_info.py
* tools.py
* vectorstores.py