langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-06-03 21:54:04 +00:00

Chris Papademetrious 305d74c67a core: implement a batch_size parameter for CacheBackedEmbeddings (#18070)	2024-03-19 18:55:43 +00:00

Description: Currently, `CacheBackedEmbeddings` computes vectors for all uncached documents before updating the store. This pull request updates the embedding computation loop to compute embeddings in batches, updating the store after each batch.

I noticed this when I tried `CacheBackedEmbeddings` on our 30k document set and the cache directory hadn't appeared on disk after 30 minutes.

The motivation is to minimize compute/data loss when problems occur:

* If there is a transient embedding failure (e.g. a network outage at the embedding endpoint triggers an exception), at least the completed vectors are written to the store instead of being discarded.
* If there is an issue with the store (e.g. no write permissions), the condition is detected early without computing (and discarding!) all the vectors.

Issue: Implements enhancement #18026.

Testing: I was unable to run unit tests; details in [this post](https://github.com/langchain-ai/langchain/discussions/15019#discussioncomment-8576684).

---------

Signed-off-by: chrispy <chrispy@synopsys.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
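
The batching behaviour described in the commit message above can be illustrated with a small, library-free sketch. This is not the code from the pull request: `iter_batches`, `embed_with_cache`, the `embed_fn` callable, and the plain `dict` used as the store are all hypothetical names chosen for illustration. The sketch only shows the general idea of writing each batch of vectors to the store before computing the next one, so that a failure partway through keeps the completed work.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
Vector = List[float]


def iter_batches(size: int, iterable: Iterable[T]) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def embed_with_cache(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[Vector]],  # assumed embedding callable
    store: Dict[str, Vector],                       # a dict stands in for the cache store
    batch_size: int,
) -> List[Vector]:
    """Embed only uncached texts, persisting results after every batch."""
    # dict.fromkeys de-duplicates while preserving the input order.
    missing = [t for t in dict.fromkeys(texts) if t not in store]
    for batch in iter_batches(batch_size, missing):
        vectors = embed_fn(batch)
        # Write this batch immediately; if a later batch raises,
        # these vectors are already in the store.
        store.update(zip(batch, vectors))
    return [store[t] for t in texts]
```

With this shape, a retry after a transient embedding failure only recomputes the texts that are still missing from the store, and a store that cannot be written to fails on the first batch instead of after all vectors have been computed.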
__init__.py	core[minor]: Image prompt template (#14263)	2024-01-27 17:04:29 -08:00
_merge.py	core[minor]: generation info on msg (#18592)	2024-03-12 04:43:17 +00:00
aiter.py	Separate out langchain_core package (#13577)	2023-11-20 13:09:30 -08:00
env.py	Improve: remove extra spaces in get_from_env error (#15064)	2023-12-22 11:50:03 -08:00
formatting.py	core[patch]: docstring update (#16813)	2024-02-09 12:47:41 -08:00
function_calling.py	core: update _rm_titles to account for title argument name bug (#19036)	2024-03-18 21:25:06 -07:00
html.py	core[patch], community[patch]: link extraction continue on failure (#17200)	2024-02-07 14:15:30 -08:00
image.py	core[minor]: Image prompt template (#14263)	2024-01-27 17:04:29 -08:00
input.py	infra: add print rule to ruff (#16221)	2024-02-09 16:13:30 -08:00
interactive_env.py	core[patch]: simple prompt pretty printing (#15968)	2024-01-12 21:08:51 -05:00
iter.py	core: implement a batch_size parameter for CacheBackedEmbeddings (#18070)	2024-03-19 18:55:43 +00:00
json_schema.py	core[patch]: fixed circular dependency with json schema (#18657)	2024-03-12 05:42:45 +00:00
loading.py	core[patch]: deprecate hwchase17/langchain-hub, address path traversal (#18600)	2024-03-05 12:49:38 -08:00
pydantic.py	Separate out langchain_core package (#13577)	2023-11-20 13:09:30 -08:00
strings.py	community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463)	2023-12-11 13:53:30 -08:00
utils.py	Separate out langchain_core package (#13577)	2023-11-20 13:09:30 -08:00