Commit Graph

195 Commits

Author SHA1 Message Date
Christophe Bornet
723031d548
community: Bump ruff version to 0.9 (#29206)
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-02-08 01:21:10 +00:00
Marlene
4fa3ef0d55
Community/Partner: Adding Azure community and partner user agent to better track usage in Python (#29561)
- This pull request includes various changes to add a `user_agent`
parameter to Azure OpenAI, Azure Search and Whisper in the Community and
Partner packages. This helps in identifying the source of API requests
so we can better track usage and help support the community better. I
will also be adding the user_agent to the new `langchain-azure` repo as
well.

- No issue connected or  updated dependencies. 
- Utilises existing tests and docs

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-02-07 23:28:30 +00:00
Sunish Sheth
25ce1e211a
docs: Updating the imports for langchain-databricks to databricks-langchain (#29646)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2025-02-06 13:28:07 -08:00
Vincent Emonet
08b9eaaa6f
community: improve FastEmbedEmbeddings support for ONNX execution provider (e.g. GPU) (#29645)
I made a change to how was implemented the support for GPU in
`FastEmbedEmbeddings` to be more consistent with the existing
implementation `langchain-qdrant` sparse embeddings implementation

It is directly enabling to provide the list of ONNX execution providers:
https://github.com/langchain-ai/langchain/blob/master/libs/partners/qdrant/langchain_qdrant/fastembed_sparse.py#L15

It is a bit less clear to a user that just wants to enable GPU, but
gives more capabilities to work with other execution providers that are
not the `CUDAExecutionProvider`, and is more future proof

Sorry for the disturbance @ccurme

> Nice to see you just moved to `uv`! It is so much nicer to run
format/lint/test! No need to manually rerun the `poetry install` with
all required extras now
2025-02-06 15:31:23 -05:00
Vincent Emonet
db8201d4da
community: fix typo in the module imported when using GPU with FastEmbedEmbeddings (#29631)
Made a mistake in the module to import (the module stay the same only
the installed package changes), fixed it and tested it

https://github.com/langchain-ai/langchain/pull/29627

@ccurme
2025-02-06 10:26:08 -05:00
Vincent Emonet
0ac5536f04
community: add support for using GPUs with FastEmbedEmbeddings (#29627)
- **Description:** add a `gpu: bool = False` field to the
`FastEmbedEmbeddings` class which enables to use GPU (through ONNX CUDA
provider) when generating embeddings with any fastembed model. It just
requires the user to install a different dependency and we use a
different provider when instantiating `fastembed.TextEmbedding`
- **Issue:** when generating embeddings for a really large amount of
documents this drastically increase performance (honestly that is a must
have in some situations, you can't just use CPU it is way too slow)
- **Dependencies:** no direct change to dependencies, but internally the
users will need to install `fastembed-gpu` instead of `fastembed`, I
made all the changes to the init function to properly let the user know
which dependency they should install depending on if they enabled `gpu`
or not
 
cf. fastembed docs about GPU for more details:
https://qdrant.github.io/fastembed/examples/FastEmbed_GPU/

I did not added test because it would require access to a GPU in the
testing environment
2025-02-06 08:04:19 -05:00
Ashutosh Kumar
65b404a2d1
[oci_generative_ai] Option to pass auth_file_location (#29481)
**PR title**: "community: Option to pass auth_file_location for
oci_generative_ai"

**Description:** Option to pass auth_file_location, to overwrite config
file default location "~/.oci/config" where profile name configs
present. This is not fixing any issues. Just added optional parameter
called "auth_file_location", which internally supported by any OCI
client including GenerativeAiInferenceClient.
2025-02-03 21:44:13 -05:00
Jorge Piedrahita Ortiz
3b886cdbb2
libs: add sambanova-lagchain integration package (#29417)
- **Description:**: Add sambanova-langchain integration package as
suggested in previous PRs

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-27 20:34:55 +00:00
CLOVA Studio 개발
7a95ffc775
community: fix some features on Naver ChatModel & embedding model 2 (#29243)
## Description
- Responding to `NCP API Key` changes.
- To fix `ChatClovaX` `astream` function to raise `SSEError` when an
error event occurs.
- To add `token length` and `ai_filter` to ChatClovaX's
`response_metadata`.
- To update document for apply NCP API Key changes.

cc. @efriis @vbarda
2025-01-20 11:01:03 -05:00
Inah Jeon
9d290abccd
partner: Update Upstage Model Names and Remove Deprecated Model (#29093)
This PR updates model names in the upstage library to reflect the latest
naming conventions and removes deprecated models.

Changes:

Renamed Models:
- `solar-1-mini-chat` -> `solar-mini`
- `solar-1-mini-embedding-query` -> `embedding-query`

Removed Deprecated Models:
- `layout-analysis` (replaced to `document-parse`)

Reference:
- https://console.upstage.ai/docs/getting-started/overview
-
https://github.com/langchain-ai/langchain-upstage/releases/tag/libs%2Fupstage%2Fv0.5.0

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2025-01-08 10:13:22 -05:00
Mohammad Mohtashim
41b6a86bbe
Community: LlamaCppEmbeddings embed_documents and embed_query (#28827)
- **Description:** `embed_documents` and `embed_query` was throwing off
the error as stated in the issue. The issue was that `Llama` client is
returning the embeddings in a nested list which is not being accounted
for in the current implementation and therefore the stated error is
being raised.
- **Issue:** #28813

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-23 09:50:22 -05:00
Lvlvko
5c17a4ace9
community: support Hunyuan Embedding (#23160)
## description

- I refactor `Chathunyuan` using tencentcloud sdk because I found the
original one can't work in my application
- I add `HunyuanEmbeddings` using tencentcloud sdk
- Both of them are extend the basic class of langchain. I have fully
tested them in my application

## Dependencies
- tencentcloud-sdk-python

---------

Co-authored-by: centonhuang <centonhuang@tencent.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 19:27:19 +00:00
Mohammad Mohtashim
4c1871d9a8
community: Passing the model_kwargs correctly while maintaing backward compatability (#28439)
- **Description:** `Model_Kwargs` was not being passed correctly to
`sentence_transformers.SentenceTransformer` which has been corrected
while maintaing backward compatability
- **Issue:** #28436

---------

Co-authored-by: MoosaTae <sadhis.tae@gmail.com>
Co-authored-by: Sadit Wongprayon <101176694+MoosaTae@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-15 20:34:29 +00:00
Thomas van Dongen
ee640d6bd3
community: fixed bug in model2vec embedding code (#28670)
This PR fixes a bug with the current implementation for Model2Vec
embeddings where `embed_documents` does not work as expected.

- **Description**: the current implementation uses `encode_as_sequence`
for encoding documents. This is incorrect, as `encode_as_sequence`
creates token embeddings and not mean embeddings. The normal `encode`
function handles both single and batched inputs and should be used
instead. The return type was also incorrect, as encode returns a NumPy
array. This PR converts the embedding to a list so that the output is
consistent with the Embeddings ABC.
2024-12-11 15:50:56 -08:00
Bagatur
e6a62d8422
core,langchain,community[patch]: allow langsmith 0.2 (#28598) 2024-12-10 18:50:58 +00:00
Abhinav
317a38b83e
community[minor]: Add support for modle2vec embeddings (#28507)
This PR add an embeddings integration for model2vec, the
`Model2vecEmbeddings` class.

- **Description**: [Model2Vec](https://github.com/MinishLab/model2vec)
lets you turn any sentence transformer into a really small static model
and makes running the model faster.
- **Issue**:
- **Dependencies**: model2vec
([pypi](https://pypi.org/project/model2vec/))
- **Twitter handle:**:

- [x] **Add tests and docs**: 
-
[Test](https://github.com/blacksmithop/langchain/blob/model2vec_embeddings/libs/community/langchain_community/embeddings/model2vec.py),
[docs](https://github.com/blacksmithop/langchain/blob/model2vec_embeddings/docs/docs/integrations/text_embedding/model2vec.ipynb)

- [x] **Lint and test**:

---------

Co-authored-by: Abhinav KM <abhinav.m@zerone-consulting.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-12-09 02:17:22 +00:00
ccurme
203d20caa5
community[patch]: fix errors introduced by pydantic 2.10 (#28297) 2024-11-22 17:50:13 -05:00
Erick Friis
d1108607f4
multiple: push deprecation removals to 1.0 (#28236) 2024-11-20 19:56:29 -08:00
Mikelarg
2901fa20cc
community: Add deprecation warning for GigaChat integration in langchain-community (#28022)
- **Description:** We have released the
[langchain-gigachat](https://github.com/ai-forever/langchain-gigachat?tab=readme-ov-file)
with new GigaChat integration that support's function/tool calling. This
PR deprecated legacy GigaChat class in community package.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-20 21:03:47 +00:00
CLOVA Studio 개발
218b4e073e
community: fix some features on Naver ChatModel & embedding model (#28228)
# Description

- adding stopReason to response_metadata to call stream and astream
- excluding NCP_APIGW_API_KEY input required validation
- to remove warning Field "model_name" has conflict with protected
namespace "model_".

cc. @vbarda
2024-11-20 10:35:41 -08:00
ZhangShenao
ca7375ac20
Improvement[Community]Improve Embeddings API (#28038)
- Fix `BaichuanTextEmbeddings` api url
- Remove unused params in api doc
- Fix word spelling
2024-11-12 13:57:35 -05:00
Stéphane Philippart
4b8cd7a09a
community: Use new OVHcloud batch embedding (#26209)
- **Description:** change to do the batch embedding server side and not
client side
- **Twitter handle:** @wildagsx

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-11-04 16:40:30 -05:00
L
8ef0df3539
feat: add batch request support for text-embedding-v3 model (#26375)
PR title: “langchain: add batch request support for text-embedding-v3
model”

PR message:

• Description: This PR introduces batch request support for the
text-embedding-v3 model within LangChain. The new functionality allows
users to process multiple text inputs in a single request, improving
efficiency and performance for high-volume applications.
	•	Issue: This PR addresses #<issue_number> (if applicable).
• Dependencies: No new external dependencies are required for this
change.
• Twitter handle: If announced on Twitter, please mention me at
@yourhandle.

Add tests and docs:

1. Added unit tests to cover the batch request functionality, ensuring
it operates without requiring network access.
2. Included an example notebook demonstrating the batch request feature,
located in docs/docs/integrations.

Lint and test: All required formatting and linting checks have been
performed using make format and make lint. The changes have been
verified with make test to ensure compatibility.

Additional notes:

	•	The changes are fully backwards compatible.
• No modifications were made to pyproject.toml, ensuring no new
dependencies were added.
• The update only affects the langchain package and does not involve
other packages.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-10-31 18:56:22 +00:00
Baptiste Pasquier
440c162b8b
community: Fix closed session in Infinity (#26933)
**Description:** 

The `aiohttp.ClientSession` is closed at the end of the with statement,
which causes an error during a second call.

The implemented fix is to define the session directly within the with
block, exactly like in the textembed code:


c6350d636e/libs/community/langchain_community/embeddings/textembed.py (L335-L346)
 
**Issue:** Fix #26932

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-10-27 11:37:21 -04:00
Erick Friis
600b7bdd61
all: test 3.13 ci (#27197)
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-10-25 12:56:58 -07:00
Steve Moss
24605bcdb6
community[patch]: Fix missing protected_namespaces(). (#27610)
- [x] **PR message**:
- **Description:** Fixes warning messages raised due to missing
`protected_namespaces` parameter in `ConfigDict`.
    - **Issue:** https://github.com/langchain-ai/langchain/issues/27609
    - **Dependencies:** No dependencies
    - **Twitter handle:** @gawbul
2024-10-25 02:16:26 +00:00
CLOVA Studio 개발
846a75284f
community: Add Naver chat model & embeddings (#25162)
Reopened as a personal repo outside the organization.

## Description
- Naver HyperCLOVA X community package 
  - Add chat model & embeddings
  - Add unit test & integration test
  - Add chat model & embeddings docs
- I changed partner
package(https://github.com/langchain-ai/langchain/pull/24252) to
community package on this PR
- Could this
embeddings(https://github.com/langchain-ai/langchain/pull/21890) be
deprecated? We are trying to replace it with embedding
model(**ClovaXEmbeddings**) in this PR.

Twitter handle: None. (if needed, contact with
joonha.jeon@navercorp.com)

---
you can check our previous discussion below:

> one question on namespaces - would it make sense to have these in
.clova namespaces instead of .naver?

I would like to keep it as is, unless it is essential to unify the
package name.
(ClovaX is a branding for the model, and I plan to add other models and
components. They need to be managed as separate classes.)

> also, could you clarify the difference between ClovaEmbeddings and
ClovaXEmbeddings?

There are 3 models that are being serviced by embedding, and all are
supported in the current PR. In addition, all the functionality of CLOVA
Studio that serves actual models, such as distinguishing between test
apps and service apps, is supported. The existing PR does not support
this content because it is hard-coded.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Vadym Barda <vadym@langchain.dev>
2024-10-24 20:54:13 +00:00
Fernando de Oliveira
ab205e7389
partners/openai + community: Async Azure AD token provider support for Azure OpenAI (#27488)
This PR introduces a new `azure_ad_async_token_provider` attribute to
the `AzureOpenAI` and `AzureChatOpenAI` classes in `partners/openai` and
`community` packages, given it's currently supported on `openai` package
as
[AsyncAzureADTokenProvider](https://github.com/openai/openai-python/blob/main/src/openai/lib/azure.py#L33)
type.

The reason for creating a new attribute is to avoid breaking changes.
Let's say you have an existing code that uses a `AzureOpenAI` or
`AzureChatOpenAI` instance to perform both sync and async operations.
The `azure_ad_token_provider` will work exactly as it is today, while
`azure_ad_async_token_provider` will override it for async requests.


If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-10-22 21:43:06 +00:00
Yuki Watanabe
b8bfebd382
community: Add deprecation notice for Databricks integration in langchain-community (#27355)
We have released the
[langchain-databricks](https://github.com/langchain-ai/langchain-databricks)
package for Databricks integration. This PR deprecates the legacy
classes within `langchain-community`.

---------

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-16 02:20:40 +00:00
ZhangShenao
f3925d71b9
community: Fix word spelling in Text2vecEmbeddings (#27183)
Fix word spelling in `Text2vecEmbeddings`
2024-10-15 09:28:48 -07:00
Erick Friis
95a87291fd
community: deprecate community ollama integrations (#26733) 2024-10-01 09:18:07 -07:00
ZhangShenao
e317d457cf
Bug-Fix[Community] Fix FastEmbedEmbeddings (#26764)
#26759 

- Fix https://github.com/langchain-ai/langchain/issues/26759 
- Change `model` param from private to public, which may not be
initiated.
- Add test case
2024-09-30 21:23:08 -04:00
Gabriel Altay
bb40a0fb32
Remove pydantic restricted namespaces from HuggingFaceInferenceAPIEmbedings (#26744)
without this `model_config` importing this package produces warnings
about "model_name" having conflicts with protected namespace "model_".

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-09-22 08:05:37 -04:00
Jorge Piedrahita Ortiz
55b641b761
community: fix error in sambastudio embeddings (#26260)
fix error in samba studio embeddings  result unpacking
2024-09-19 09:57:04 -04:00
Erick Friis
c2a3021bb0
multiple: pydantic 2 compatibility, v0.3 (#26443)
Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Dan O'Donovan <dan.odonovan@gmail.com>
Co-authored-by: Tom Daniel Grande <tomdgrande@gmail.com>
Co-authored-by: Grande <Tom.Daniel.Grande@statsbygg.no>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com>
Co-authored-by: ZhangShenao <15201440436@163.com>
Co-authored-by: Friso H. Kingma <fhkingma@gmail.com>
Co-authored-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Nuno Campos <nuno@langchain.dev>
Co-authored-by: Morgante Pell <morgantep@google.com>
2024-09-13 14:38:45 -07:00
JonZeolla
78ff51ce83
community[patch]: update the default hf bge embeddings (#22627)
**Description:** This updates the langchain_community > huggingface >
default bge embeddings ([the current default recommends this
change](https://huggingface.co/BAAI/bge-large-en))
**Issue:** None
**Dependencies:** None
**Twitter handle:** @jonzeolla

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-09-02 22:10:21 +00:00
Leonid Ganeline
150251fd49
docs: integrations reference updates 13 (#25711)
Added missed provider pages and links. Fixed inconsistent formatting.
Added arxiv references to docstirngs.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-09-02 22:08:50 +00:00
Emmanuel Leroy
654da27255
improve llamacpp embeddings (#12972)
- **Description:**
Improve llamacpp embedding class by adding the `device` parameter so it
can be passed to the model and used with `gpu`, `cpu` or Apple metal
(`mps`).
Improve performance by making use of the bulk client api to compute
embeddings in batches.
  
  - **Dependencies:** none
  - **Tag maintainer:** 
@hwchase17

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-31 18:27:59 +00:00
Djordje
862ef32fdc
community: Fixed infinity embeddings async request (#25882)
**Description:** Fix async infinity embeddings
**Issue:** #24942  

@baskaryan, @ccurme
2024-08-30 12:10:34 -04:00
Kyle Winkelman
201bdf7148
community: Cap AzureOpenAIEmbeddings chunk_size at 2048 instead of 16. (#25852)
**Description:** Within AzureOpenAIEmbeddings there is a validation to
cap `chunk_size` at 16. The value of 16 is either an old limitation or
was erroneously chosen. I have checked all of the `preview` and `stable`
releases to ensure that the `embeddings` endpoint can handle 2048
entries
[Azure/azure-rest-api-specs](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference).
I have also found many locations that confirm this limit should be 2048:
-
https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#embeddings
-
https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits

**Issue:** fixes #25462
2024-08-29 16:48:04 +00:00
Jorge Piedrahita Ortiz
9ac953a948
Community: sambastudio embeddings GenericV2 API support (#25064)
- **Description:** 
        SambaStudio GenericV2 API support 
        Minor changes for requests error handling
2024-08-29 09:52:49 -04:00
Mikhail Khludnev
a017f49fd3
comminity[patch]: fix #25575 YandexGPTs for _grpc_metadata (#25617)
it fixes two issues:

### YGPTs are broken #25575

```
File ....conda/lib/python3.11/site-packages/langchain_community/embeddings/yandex.py:211, in _make_request(self, texts, **kwargs)
..
--> 211 res = stub.TextEmbedding(request, metadata=self._grpc_metadata)  # type: ignore[attr-defined]

AttributeError: 'YandexGPTEmbeddings' object has no attribute '_grpc_metadata'
```
My gut feeling that #23841 is the cause.

I have to drop leading underscore from `_grpc_metadata` for quickfix,
but I just don't know how to do it _pydantic_ enough.

### minor issue:

if we use `api_key`, which is not the best practice the code fails with 

```
File ~/git/...../python3.11/site-packages/langchain_community/embeddings/yandex.py:119, in YandexGPTEmbeddings.validate_environment(cls, values)
...

AttributeError: 'tuple' object has no attribute 'append'
```

- Added new integration test. But it requires YGPT env available and
active account. I don't know how int tests dis\enabled in CI.
 - added small unit tests with mocks. Should be fine.

---------

Co-authored-by: mikhail-khludnev <mikhail_khludnev@rntgroup.com>
2024-08-28 18:48:10 -07:00
James Espichan Vilca
644e0d3463
Use extend method for embeddings concatenation in mlflow_gateway (#14358)
## Description
There is a bug in the concatenation of embeddings obtained from MLflow
that does not conform to the type hint requested by the function.
``` python  
def _query(self, texts: List[str]) -> List[List[float]]:
```
It is logical to expect a **List[List[float]]** for a **List[str]**.
However, the append method encapsulates the response in a global List.
To avoid this, the extend method should be used, which will add the
embeddings of all strings at the same list level.

## Testing
I have tried using OpenAI-ADA to obtain the embeddings, and the result
of executing this snippet is as follows:

``` python  
embeds = await MlflowAIGatewayEmbeddings().aembed_documents(texts=["hi", "how are you?"])
print(embeds)
```  

``` python  
[[[-0.03512698, -0.020624293, -0.015343423, ...], [-0.021260535, -0.011461929, -0.00033121882, ...]]]
```
When in reality, the expected result should be:

``` python  
[[-0.03512698, -0.020624293, -0.015343423, ...], [-0.021260535, -0.011461929, -0.00033121882, ...]]
```
The above result complies with the expected type hint:
**List[List[float]]** . As I mentioned, we can achieve that by using the
extend method instead of the append method.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-08-23 14:43:43 +00:00
ccurme
27690506d0
multiple: update removal targets (#25361) 2024-08-14 09:50:39 -04:00
ZhangShenao
43deed2a95
Improvement[Embeddings] Add dimension support to ZhipuAIEmbeddings (#25274)
- In the in ` embedding-3 ` and later models of Zhipu AI, it is
supported to specify the dimensions parameter of Embedding. Ref:
https://bigmodel.cn/dev/api#text_embedding-3 .
- Add test case for `embedding-3` model by assigning dimensions.
2024-08-11 16:20:37 -04:00
Eugene Yurtsev
bd6c31617e
community[patch]: Remove more @allow_reuse=True validators (#25236)
Remove some additional allow_reuse=True usage in @root_validators.
2024-08-09 11:10:27 -04:00
Eugene Yurtsev
6e57aa7c36
community[patch]: Remove usage of @root_validator(allow_reuse=True) (#25235)
Remove usage of @root_validator(allow_reuse=True)
2024-08-09 10:57:42 -04:00
Eugene Yurtsev
98779797fe
community[patch]: Use get_fields adapter for pydantic (#25191)
Change all usages of __fields__ with get_fields adapter merged into
langchain_core.

Code mod generated using the following grit pattern:

```
engine marzano(0.1)
language python


`$X.__fields__` => `get_fields($X)` where {
    add_import(source="langchain_core.utils.pydantic", name="get_fields")
}
```
2024-08-08 14:43:09 -04:00
Eugene Yurtsev
bf5193bb99
community[patch]: Upgrade pydantic extra (#25185)
Upgrade to using a literal for specifying the extra which is the
recommended approach in pydantic 2.

This works correctly also in pydantic v1.

```python
from pydantic.v1 import BaseModel

class Foo(BaseModel, extra="forbid"):
    x: int

Foo(x=5, y=1)
```

And 


```python
from pydantic.v1 import BaseModel

class Foo(BaseModel):
    x: int

    class Config:
      extra = "forbid"

Foo(x=5, y=1)
```


## Enum -> literal using grit pattern:

```
engine marzano(0.1)
language python
or {
    `extra=Extra.allow` => `extra="allow"`,
    `extra=Extra.forbid` => `extra="forbid"`,
    `extra=Extra.ignore` => `extra="ignore"`
}
```

Resorted attributes in config and removed doc-string in case we will
need to deal with going back and forth between pydantic v1 and v2 during
the 0.3 release. (This will reduce merge conflicts.)


## Sort attributes in Config:

```
engine marzano(0.1)
language python


function sort($values) js {
    return $values.text.split(',').sort().join("\n");
}


class_definition($name, $body) as $C where {
    $name <: `Config`,
    $body <: block($statements),
    $values = [],
    $statements <: some bubble($values) assignment() as $A where {
        $values += $A
    },
    $body => sort($values),
}

```
2024-08-08 17:20:39 +00:00
Pat Patterson
7e7fcf5b1f
community: Fix ValidationError on creating GPT4AllEmbeddings with no gpt4all_kwargs (#25124)
- **Description:** Instantiating `GPT4AllEmbeddings` with no
`gpt4all_kwargs` argument raised a `ValidationError`. Root cause: #21238
added the capability to pass `gpt4all_kwargs` through to the `GPT4All`
instance via `Embed4All`, but broke code that did not specify a
`gpt4all_kwargs` argument.
- **Issue:** #25119 
- **Dependencies:** None
- **Twitter handle:** [`@metadaddy`](https://twitter.com/metadaddy)
2024-08-07 13:34:01 +00:00