Fix #29759: Use local chunk_size_ for looping in embed_documents (#29761)

This fix ensures that the chunk size is applied consistently when
processing text embeddings. Previously, the batching loop in
embed_documents stepped by self.chunk_size even when a caller supplied
chunk_size, so the loop stride could disagree with the slice width and
produce incorrect batches.

Now the loop steps by chunk_size_, which resolves to the provided
chunk_size or falls back to the default self.chunk_size, ensuring
consistent chunking. This update improves reliability when processing
large text inputs in batches and prevents unintended behavior when
chunk_size is not specified.
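As a minimal sketch of the fallback-and-loop pattern described above
(batch_texts and DEFAULT_CHUNK_SIZE are illustrative stand-ins, not the
library's actual names or default):

```python
from typing import List, Optional

DEFAULT_CHUNK_SIZE = 1000  # stand-in for self.chunk_size (illustrative only)

def batch_texts(texts: List[str], chunk_size: Optional[int] = None) -> List[List[str]]:
    """Split texts into batches, falling back to the default when
    chunk_size is None -- mirroring the fallback in the fix."""
    chunk_size_ = chunk_size or DEFAULT_CHUNK_SIZE
    # Stride and slice width use the same resolved value, so every
    # text lands in exactly one batch.
    return [texts[i : i + chunk_size_] for i in range(0, len(texts), chunk_size_)]
```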

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
Commit: 4b08a7e8e8 (parent 1fbc01c350)
Author: Chaymae El Aattabi
Date: 2025-02-13 02:28:26 +01:00
Committed by: GitHub
2 changed files with 22 additions and 1 deletion

@@ -573,7 +573,7 @@ class OpenAIEmbeddings(BaseModel, Embeddings):
         chunk_size_ = chunk_size or self.chunk_size
         if not self.check_embedding_ctx_length:
             embeddings: List[List[float]] = []
-            for i in range(0, len(texts), self.chunk_size):
+            for i in range(0, len(texts), chunk_size_):
                 response = self.client.create(
                     input=texts[i : i + chunk_size_], **self._invocation_params
                 )
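To illustrate why the one-line change matters, a small sketch (with
made-up sizes, not the library's defaults) shows how a loop stride taken
from self.chunk_size can disagree with the slice width chunk_size_ and
yield overlapping batches:

```python
texts = [f"t{i}" for i in range(6)]
default_chunk_size = 2  # stand-in for self.chunk_size
chunk_size_ = 3         # caller-provided chunk_size

# Pre-fix: the loop steps by the default while slices use chunk_size_,
# so batches overlap and some texts would be embedded twice.
buggy = [texts[i : i + chunk_size_]
         for i in range(0, len(texts), default_chunk_size)]

# Post-fix: stride and slice width agree, giving a clean partition.
fixed = [texts[i : i + chunk_size_]
         for i in range(0, len(texts), chunk_size_)]
```

With these sizes, the buggy loop emits 8 items for 6 inputs (t2 and t4
appear twice), while the fixed loop emits each text exactly once.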