Commit Graph

25 Commits

Author SHA1 Message Date
Chaymae El Aattabi
4b08a7e8e8
Fix #29759: Use local chunk_size_ for looping in embed_documents (#29761)
This fix ensures that the chunk size is correctly determined when
processing text embeddings. Previously, the code did not properly handle
cases where chunk_size was None, potentially leading to incorrect
chunking behavior.

Now, chunk_size_ is explicitly set to either the provided chunk_size or
the default self.chunk_size, ensuring consistent chunking. This update
improves reliability when processing large text inputs in batches and
prevents unintended behavior when chunk_size is not specified.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-02-13 01:28:26 +00:00
Anton Dubovik
3e2cb4e8a4
openai: embeddings: supported chunk_size when check_embedding_ctx_length is disabled (#23767)
Chunking of the input array controlled by `self.chunk_size` is being
ignored when `self.check_embedding_ctx_length` is disabled. Effectively,
the chunk size is assumed to be equal 1 in such a case. This is
suprising.

The PR takes into account `self.chunk_size` passed by the user.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-09-20 16:58:45 -07:00
Erick Friis
c2a3021bb0
multiple: pydantic 2 compatibility, v0.3 (#26443)
Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Dan O'Donovan <dan.odonovan@gmail.com>
Co-authored-by: Tom Daniel Grande <tomdgrande@gmail.com>
Co-authored-by: Grande <Tom.Daniel.Grande@statsbygg.no>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com>
Co-authored-by: ZhangShenao <15201440436@163.com>
Co-authored-by: Friso H. Kingma <fhkingma@gmail.com>
Co-authored-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Nuno Campos <nuno@langchain.dev>
Co-authored-by: Morgante Pell <morgantep@google.com>
2024-09-13 14:38:45 -07:00
Eugene Yurtsev
bc3b851f08
openai[patch]: Upgrade @root_validators in preparation for pydantic 2 migration (#25491)
* Upgrade @root_validator in openai pkg
* Ran notebooks for all but AzureAI embeddings

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-09-03 14:42:24 -07:00
Kyle Winkelman
09c2d8faca
langchain_openai: Cleanup OpenAIEmbeddings validate_environment. (#25855)
**Description:** [This portion of
code](https://github.com/langchain-ai/langchain/blob/v0.1.16/libs/partners/openai/langchain_openai/embeddings/base.py#L189-L196)
has no use as a couple lines later a [`ValueError` is
thrown](https://github.com/langchain-ai/langchain/blob/v0.1.16/libs/partners/openai/langchain_openai/embeddings/base.py#L209-L213).
**Issue:** A follow up to #25852.
2024-08-29 13:54:43 -04:00
Bagatur
2b4fbcb4b4
docs: format oai embeddings docstring (#25448) 2024-08-15 16:57:54 +00:00
Eugene Yurtsev
d00176e523
openai[patch]: Update extra to match pydantic 2 (#25382)
Backwards compatible change that converts pydantic extras to literals
which is consistent with pydantic 2 usage.
2024-08-14 09:55:18 -04:00
Eugene Yurtsev
0a3500808d
openai[patch]: Docs fix RST formatting in OpenAIEmbeddings (#25293) 2024-08-12 11:24:35 -04:00
Eugene Yurtsev
ee8a585791
openai[patch]: Add API Reference docs to OpenAIEmbeddings (#25290)
Issue: [24856](https://github.com/langchain-ai/langchain/issues/24856)
2024-08-12 14:53:51 +00:00
Pavel
7fcfe7c1f4
openai[patch]: openai proxy added to base embeddings (#24539)
- [ ] **PR title**: "langchain-openai: openai proxy added to base
embeddings"

- [ ] **PR message**: 
    - **Description:** 
    Dear langchain developers,
You've already supported proxy for ChatOpenAI implementation in your
package. At the same time, if somebody needed to use proxy for chat, it
also could be necessary to be able to use it for OpenAIEmbeddings.
That's why I think it's important to add proxy support for OpenAI
embeddings. That's what I've done in this PR.

@baskaryan

---------

Co-authored-by: karpov <karpov@dohod.ru>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-07-28 20:54:13 +00:00
Bagatur
8698cb9b28
infra: add more formatter rules to openai (#23189)
Turns on
https://docs.astral.sh/ruff/settings/#format_docstring-code-format and
https://docs.astral.sh/ruff/settings/#format_skip-magic-trailing-comma

```toml
[tool.ruff.format]
docstring-code-format = true
skip-magic-trailing-comma = true
```
2024-06-19 11:39:58 -07:00
seyf97
2904c50cd5
openai[patch]: correct grammar in exception message in embeddings/base.py (#22629)
Correct the grammar error for missing transformers package ValueError
2024-06-06 18:55:04 +00:00
Erick Friis
e41d801369
openai[patch]: fix embedding float precision issue (#21736)
also clean up + comment some of the embedding batching code
2024-05-16 02:06:51 +00:00
Bagatur
bef50ded63
openai[patch]: fix special token default behavior (#21131)
By default handle special sequences as regular text
2024-04-30 20:08:24 -04:00
Charlie Marsh
8f38b7a725
multiple: Remove unnecessary Ruff suppression comments (#21050)
## Summary

I ran `ruff check --extend-select RUF100 -n` to identify `# noqa`
comments that weren't having any effect in Ruff, and then `ruff check
--extend-select RUF100 -n --fix` on select files to remove all of the
unnecessary `# noqa: F401` violations. It's possible that these were
needed at some point in the past, but they're not necessary in Ruff
v0.1.15 (used by LangChain) or in the latest release.

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-04-30 17:13:48 +00:00
YISH
ed26149a29
openai[patch]: Allow disablling safe_len_embeddings(OpenAIEmbeddings) (#19743)
OpenAI API compatible server may not support `safe_len_embedding`, 

use `disable_safe_len_embeddings=True` to disable it.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-04-25 09:45:52 -07:00
Bagatur
611d5a1618
openai[patch]: fix async http client (#19164)
Fix #19116
2024-03-16 17:50:22 -07:00
aditya thomas
5c2f7e6b2b
partners[openai]: update the docstring of OpenAI, OpenAIEmbeddings and ChatOpenAI classes (#18908)
**Description:** Update the docstring of OpenAI, OpenAIEmbeddings and
ChatOpenAI classes
**Issue:** Update import module paths to the current LangChain API
**Dependencies:** None
**Lint and test**: `make format` and `make lint` were run

This incorporates the review comments from langchain-ai/langchain#18637
which I closed due to an issue I had in updating that pr branch

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-11 20:48:54 +00:00
kkdamowang
6782dac420
docs: remove duplicate quote in AzureOpenAIEmbeddings doc (#18315)
- **Description:** Remove duplicate quote in AzureOpenAIEmbeddings doc,
remove trailing spaces.
- **Issue:** No
- **Dependencies:** No
2024-02-29 11:25:50 -08:00
Erick Friis
a05fb19f42
openai[patch]: remove numpy dep (#18034) 2024-02-23 21:12:05 +00:00
Savvas Mantzouranidis
691ff67096
partners/openai: fix depracation errors of pydantic's .dict() function (reopen #16629) (#17404)
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-20 16:57:34 -08:00
Bagatur
35446c814e
openai[patch]: rm tiktoken model warning (#16964) 2024-02-03 16:36:57 -08:00
Erick Friis
bb3b6bde33
openai[minor]: change to secretstr (#16803) 2024-01-30 15:49:56 -08:00
Bagatur
61e876aad8
openai[patch]: Explicitly support embedding dimensions (#16596) 2024-01-25 15:16:04 -08:00
Erick Friis
ebc75c5ca7
openai[minor]: implement langchain-openai package (#15503)
Todo

- [x] copy over integration tests
- [x] update docs with new instructions in #15513 
- [x] add linear ticket to bump core -> community, community->langchain,
and core->openai deps
- [ ] (optional): add `pip install langchain-openai` command to each
notebook using it
- [x] Update docstrings to not need `openai` install
- [x] Add serialization
- [x] deprecate old models

Contributor steps:

- [x] Add secret names to manual integrations workflow in
.github/workflows/_integration_test.yml
- [x] Add secrets to release workflow (for pre-release testing) in
.github/workflows/_release.yml

Maintainer steps (Contributors should not do these):

- [x] set up pypi and test pypi projects
- [x] add credential secrets to Github Actions
- [ ] add package to conda-forge


Functional changes to existing classes:

- now relies on openai client v1 (1.6.1) via concrete dep in
langchain-openai package

Codebase organization

- some function calling stuff moved to
`langchain_core.utils.function_calling` in order to be used in both
community and langchain-openai
2024-01-05 15:03:28 -08:00