github/langchain

mirror of https://github.com/hwchase17/langchain.git synced 2026-05-12 17:57:22 +00:00

Chris Papademetrious 305d74c67a core: implement a batch_size parameter for CacheBackedEmbeddings (#18070)

**Description:**

Currently, `CacheBackedEmbeddings` computes vectors for *all* uncached
documents before updating the store. This pull request updates the
embedding computation loop to compute embeddings in batches, updating
the store after each batch.

I noticed this when I tried `CacheBackedEmbeddings` on our 30k document
set and the cache directory hadn't appeared on disk after 30 minutes.

The motivation is to minimize compute/data loss when problems occur:

* If there is a transient embedding failure (e.g. a network outage at
the embedding endpoint triggers an exception), at least the completed
vectors are written to the store instead of being discarded.
* If there is an issue with the store (e.g. no write permissions), the
condition is detected early without computing (and discarding!) all the
vectors.
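
The batching behavior described above can be sketched as follows. This is a minimal illustration, not the code from this pull request: the function names `batch_iterate` and `embed_with_cache`, the `embed_fn` callable, and the plain `dict` standing in for the document store are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterator, List, Optional, Sequence


def batch_iterate(size: Optional[int], items: Sequence[str]) -> Iterator[List[str]]:
    """Yield successive slices of `items`; a size of None yields one batch."""
    if not items:
        return
    if size is None:
        yield list(items)
        return
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def embed_with_cache(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],  # hypothetical embedder
    store: Dict[str, List[float]],  # a dict stands in for the real store
    batch_size: Optional[int] = None,
) -> List[List[float]]:
    """Embed only uncached texts, writing to the store after every batch."""
    # Deduplicate while preserving order, then keep only the cache misses.
    missing = [t for t in dict.fromkeys(texts) if t not in store]
    for batch in batch_iterate(batch_size, missing):
        vectors = embed_fn(batch)          # may raise on a transient failure
        store.update(zip(batch, vectors))  # persist this batch before the next
    return [store[t] for t in texts]
```

With `batch_size=None` the sketch falls back to the old behavior of one large batch. With a finite `batch_size`, an exception raised by `embed_fn` partway through leaves every previously completed batch in `store`, and a store that cannot be written to fails after the first batch rather than after all vectors have been computed.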

**Issue:**
Implements enhancement #18026.

**Testing:**
I was unable to run unit tests; details in [this
post](https://github.com/langchain-ai/langchain/discussions/15019#discussioncomment-8576684).

---------

Signed-off-by: chrispy <chrispy@synopsys.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>

2024-03-19 18:55:43 +00:00

community[patch], langchain[minor]: Add retriever self_query and score_threshold in DingoDB (#18106)

2024-03-05 15:47:29 -08:00

👥 Update LangChain people data (#18473)

2024-03-03 19:58:58 -08:00

core: implement a batch_size parameter for CacheBackedEmbeddings (#18070)

2024-03-19 18:55:43 +00:00

ci[minor]: Bump LC scripts package, add retry option (#19285)

2024-03-19 10:42:59 -07:00

docs[patch]: properly load/use env vars (#18942)

2024-03-11 15:38:05 -07:00

docs: Add graph construction docs (#18904)

2024-03-13 12:27:58 -07:00

.gitignore

docs[minor]: Swap gtag for supabase (#18937)

2024-03-11 14:23:12 -07:00

.local_build.sh

docs: partner packages (#16960)

2024-02-02 15:12:21 -08:00

.yarnrc.yml

docs[minor]: Add thumbs up/down to all docs pages (#18526)

2024-03-04 15:14:28 -08:00

babel.config.js

…

code-block-loader.js

…

docusaurus.config.js

docs[patch]: properly load/use env vars (#18942)

2024-03-11 15:38:05 -07:00

package.json

ci[minor]: Bump LC scripts package, add retry option (#19285)

2024-03-19 10:42:59 -07:00

README.md

docs: developer docs (#14776)

2023-12-17 12:55:49 -08:00

settings.ini

…

sidebars.js

docs: Toolkits menu (#16217)

2024-02-08 14:52:26 -08:00

vercel_build.sh

docs: fix vercel build script (#19090)

2024-03-14 20:53:43 +00:00

vercel_requirements.txt

infra: docs build install community editable (#14739)

2023-12-14 16:13:09 -08:00

vercel.json

docs: providers update 4 (#18540)

2024-03-09 13:30:48 -08:00

yarn.lock

ci[minor]: Bump LC scripts package, add retry option (#19285)

2024-03-19 10:42:59 -07:00

README.md

LangChain Documentation

For more information on contributing to our documentation, see the Documentation Contributing Guide.