Commit Graph

1804 Commits

Author SHA1 Message Date
Jorge Piedrahita Ortiz
9ac953a948
Community: sambastudio embeddings GenericV2 API support (#25064)
- **Description:** 
        SambaStudio GenericV2 API support 
        Minor changes for requests error handling
2024-08-29 09:52:49 -04:00
Sam Jove
bdce9a47d0
community[patch]: callback before yield for _astream (gigachat) (#25834)
Description: Moves yield to after callback for _astream for gigachat in
the community package
Issue: #16913

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-29 13:29:28 +00:00
Jinoos Lee
703af9ffe3
Patch enable to use Amazon OpenSearch Serverless(aoss) for Semantic Cache store (#25833)
- [x] **PR title**: "community: Patch enable to use Amazon OpenSearch
Serverless for Semantic Cache store"

- [x] **PR message**: 
- **Description:** OpenSearchSemanticCache class support Amazon
OpenSearch Serverless for Semantic Cache store, it's only required to
pass auth(http_auth) parameter to initializer
    - **Dependencies:** none

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Jinoos Lee <jinoos@amazon.com>
2024-08-29 13:28:22 +00:00
Mikhail Khludnev
a017f49fd3
comminity[patch]: fix #25575 YandexGPTs for _grpc_metadata (#25617)
it fixes two issues:

### YGPTs are broken #25575

```
File ....conda/lib/python3.11/site-packages/langchain_community/embeddings/yandex.py:211, in _make_request(self, texts, **kwargs)
..
--> 211 res = stub.TextEmbedding(request, metadata=self._grpc_metadata)  # type: ignore[attr-defined]

AttributeError: 'YandexGPTEmbeddings' object has no attribute '_grpc_metadata'
```
My gut feeling that #23841 is the cause.

I have to drop leading underscore from `_grpc_metadata` for quickfix,
but I just don't know how to do it _pydantic_ enough.

### minor issue:

if we use `api_key`, which is not the best practice the code fails with 

```
File ~/git/...../python3.11/site-packages/langchain_community/embeddings/yandex.py:119, in YandexGPTEmbeddings.validate_environment(cls, values)
...

AttributeError: 'tuple' object has no attribute 'append'
```

- Added new integration test. But it requires YGPT env available and
active account. I don't know how int tests dis\enabled in CI.
 - added small unit tests with mocks. Should be fine.

---------

Co-authored-by: mikhail-khludnev <mikhail_khludnev@rntgroup.com>
2024-08-28 18:48:10 -07:00
Serena Ruan
850bf89e48
community[patch]: Support passing extra params for executing functions in UCFunctionToolkit (#25652)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


Support passing extra params when executing UC functions:
The params should be a dictionary with key EXECUTE_FUNCTION_ARG_NAME,
the assumption is that the function itself doesn't use such variable
name (starting and ending with double underscores), and if it does we
raise Exception.
If invalid params passing to the execute_statement, we raise Exception
as well.


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Signed-off-by: Serena Ruan <serena.rxy@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-08-28 18:47:32 -07:00
崔浩
3555882a0d
community[patch]: optimize xinference llm import (#25809)
Thank you for contributing to LangChain!

- [ ] **PR title**: "community: optimize xinference llm import"

- [ ] **PR message**: 
- **Description:** from xinferece_client import RESTfulClient when there
is no importing xinference.
    - **Dependencies:** xinferece_client
- **Why do so:** the total xinference(pip install xinference[all]) is
too heavy for installing, let alone it is useless for langchain user
except RESTfulClient. The modification has maintained consistency with
the xinference embeddings
[embeddings/xinference](../blob/master/libs/community/langchain_community/embeddings/xinference.py#L89).
2024-08-29 01:41:43 +00:00
ccurme
afe8ccaaa6
community[patch]: Add ID field back to Azure AI Search results (#25828)
Commandeering https://github.com/langchain-ai/langchain/pull/23243 as
maintainers don't have ability to modify that PR.

Fixes https://github.com/langchain-ai/langchain/issues/22827

---------

Co-authored-by: Ming Quah <fleetadmiralbutter@icloud.com>
2024-08-28 17:56:50 -04:00
Erick Friis
5db6c6d96d
community: release 0.2.14 (#25822) 2024-08-28 19:05:53 +00:00
Cillian Berragan
754f3c41f9
community: add score to PineconeHybridSearchRetriever (#25781)
**Description:**

Adds the 'score' returned by Pinecone to the
`PineconeHybridSearchRetriever` list of returned Documents.

There is currently no way to return the score when using Pinecone hybrid
search, so in this PR I include it by default.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-28 13:11:06 +00:00
ZhangShenao
3f1d652f15
Improvement[Community] Improve api doc for PineconeHybridSearchRetriever (#25803)
- Complete missing args in api doc
2024-08-28 08:38:56 -04:00
Moritz Schlager
555f97becb
community[patch]: fix model initialization bug for deepinfra (#25727)
### Description
adds an init method to ChatDeepInfra to set the model_name attribute
accordings to the argument
### Issue
currently, the model_name specified by the user during initialization of
the ChatDeepInfra class is never set. Therefore, it always chooses the
default model (meta-llama/Llama-2-70b-chat-hf, however probably since
this is deprecated it always uses meta-llama/Llama-3-70b-Instruct). We
stumbled across this issue and fixed it as proposed in this pull
request. Feel free to change the fix according to your coding guidelines
and style, this is just a proposal and we want to draw attention to this
problem.
### Dependencies
no additional dependencies required

Feel free to contact me or @timo282 and @finitearth if you have any
questions.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-08-28 02:02:35 -07:00
Bagatur
b0ac6fe8d3
community[patch]: Release 0.2.13 (#25806) 2024-08-28 08:57:49 +00:00
zysoong
25a6790e1a
community[patch]: Minor Improvement of extract hyperlinks tool output (#25728)
**Description:** Make the hyperlink only appear once in the
extract_hyperlinks tool output. (for some websites output contains
meaningless '#' hyperlinks multiple times which will extend the tokens
of context window without any advantage)
**Issue:** None
**Dependencies:** None
2024-08-28 08:02:40 +00:00
Isaac Francisco
d5ddaac1fc
docs minor fix (#25794) 2024-08-28 04:14:36 +00:00
Tomaz Bratanic
f359e6b0a5
Add mmr to neo4j vector (#25765) 2024-08-27 08:55:19 -04:00
Luis Valencia
99f9a664a5
community: Azure Search Vector Store is missing Access Token Authentication (#24330)
Added Azure Search Access Token Authentication instead of API KEY auth.
Fixes Issue: https://github.com/langchain-ai/langchain/issues/24263
Dependencies: None
Twitter: @levalencia

@baskaryan

Could you please review? First time creating a PR that fixes some code.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-26 15:41:50 -07:00
ZhangShenao
44e3e2391c
Improvement[Community] Improve methods in IMessageChatLoader (#25746)
- Add @staticmethod to static methods in `IMessageChatLoader`.
- Format args name.
2024-08-26 14:20:22 -04:00
maang-h
a566a15930
Fix MoonshotChat instantiate with alias (#25755)
- **Description:**
   -  Fix `MoonshotChat` instantiate with alias
   - Add `MoonshotChat` to `__init__.py`

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-26 17:33:22 +00:00
Ashvin
af3b3a4474
Update endpoint for AzureMLEndpointApiType class. (#25725)
This addresses the issue mentioned in #25702

I have updated the endpoint used in validating the endpoint API type in
the AzureMLBaseEndpoint class from `/v1/completions` to `/completions`
and `/v1/chat/completions` to `/chat/completions`.

Co-authored-by: = <=>
2024-08-26 08:50:02 -04:00
Dristy Srivastava
7205057c3e
[Community][minor]: Added langchain_version while calling discover API (#24428)
- **Description:** Added langchain version while calling discover API
during both ingestion and retrieval
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** NA
- **Docs** NA

---------

Co-authored-by: dristy.cd <dristy@clouddefense.io>
2024-08-26 08:47:48 -04:00
Dristy Srivastava
fbb4761199
[Community][minor]: Updating source path, and file path for SharePoint loader in PebbloSafeLoader (#25592)
- **Description:** Updating source path and file path in Pebblo safe
loader for SharePoint apps during loading
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** NA
- **Docs** NA

---------

Co-authored-by: dristy.cd <dristy@clouddefense.io>
2024-08-26 08:38:40 -04:00
Rajendra Kadam
745d1c2b8d
community[minor]: [Pebblo] Fix URL construction in newer Python versions (#25747)
- **PR message**: **Fix URL construction in newer Python versions**
- **Description:** 
- Update the URL construction logic to use the .value attribute for
Routes enum members.
- This adjustment resolves an issue where the code worked correctly in
Python 3.9 but failed in Python 3.11.
  - Clean up unused routes.
- **Issue:** NA
- **Dependencies:** NA
2024-08-26 07:27:30 -04:00
Rajendra Kadam
58a98c7d8a
community: [PebbloRetrievalQA] Implemented Async support for prompt APIs (#25748)
- **Description:** PebbloRetrievalQA: Implemented Async support for
prompt APIs (classification and governance)
- **Issue:** NA
- **Dependencies:** NA
2024-08-26 07:27:05 -04:00
Christophe Bornet
038c287b3a
all: Improve make lint command (#25344)
* Removed `ruff check --select I` as `I` is already selected and checked
in the main `ruff check` command
* Added checks for non-empty `PYTHON_FILES`
* Run `ruff check` only on `PYTHON_FILES`

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-23 18:23:52 -07:00
Erick Friis
f6491ceb7d
community: remove integration test deps (#24460)
they arent used
2024-08-23 23:25:17 +00:00
Sharmistha S. Gupta
90439b12f6
Added support for Nebula Chat model (#21925)
Description: Added support for Nebula Chat model in addition to Nebula
Instruct
Dependencies: N/A
Twitter handle: @Symbldotai

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-23 22:34:32 +00:00
Ian
64ace25eb8
<Community>: tidb vector support vector index (#19984)
This PR introduces adjustments to ensure compatibility with the recently
released preview version of [TiDB Serverless Vector
Search](https://tidb.cloud/ai), aiming to prevent user confusion.

- TiDB Vector now supports vector indexing with cosine and l2 distance
strategies, although inner_product remains unsupported.
- Changing the distance strategy is currently not supported, so the test
cased should be adjusted.
2024-08-23 13:59:23 -04:00
Austin Burdette
f355a98bb6
community:yuan2[patch]: standardize init args (#21462)
updated stop and request_timeout so they aliased to stop_sequences, and
timeout respectively. Added test that both continue to set the same
underlying attributes.

Related to
[20085](https://github.com/langchain-ai/langchain/issues/20085)

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-08-23 17:56:19 +00:00
Erick Friis
b365ee996b
community: remove unused verify_ssl kwarg from aiohttp request (#25707)
it's not a valid kwarg in aiohttp request
2024-08-23 17:14:04 +00:00
Ashvin
2cd77a53a3
docs: Add docstrings for CassandraChatMessageHistory class and package namespace function. (#24222)
- Modified docstring for CassandraChatMessageHistory in
libs/community/langchain_community/chat_message_history/cassandra.py.

- Added docstring for _package_namespace function in
docs/api_reference/create_api_rst.py

---------

Co-authored-by: ashvin <ashvin.anilkumar@qburst.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-08-23 15:49:41 +00:00
Leonid Ganeline
8788a34bfa
community: NeptuneGraph fix (#23281)
Issue: the `service` optional parameter was mentioned but not used.
Fix: added this parameter.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-23 15:34:26 +00:00
Djordje
22f9ae489f
community: Opensearch - added score function for similarity_score_threshold (#23928)
This PR resolves the NotImplemented error for the
similarity_score_threshold search type for OpenSearch.
2024-08-23 11:30:04 -04:00
ZhangShenao
b38c83ff93
patch[Community] Optimize methods in several ChatLoaders (#24806)
There are some static methods in ChatLoaders, try to add @staticmethod
decorator for them.
2024-08-23 11:00:41 -04:00
James Espichan Vilca
644e0d3463
Use extend method for embeddings concatenation in mlflow_gateway (#14358)
## Description
There is a bug in the concatenation of embeddings obtained from MLflow
that does not conform to the type hint requested by the function.
``` python  
def _query(self, texts: List[str]) -> List[List[float]]:
```
It is logical to expect a **List[List[float]]** for a **List[str]**.
However, the append method encapsulates the response in a global List.
To avoid this, the extend method should be used, which will add the
embeddings of all strings at the same list level.

## Testing
I have tried using OpenAI-ADA to obtain the embeddings, and the result
of executing this snippet is as follows:

``` python  
embeds = await MlflowAIGatewayEmbeddings().aembed_documents(texts=["hi", "how are you?"])
print(embeds)
```  

``` python  
[[[-0.03512698, -0.020624293, -0.015343423, ...], [-0.021260535, -0.011461929, -0.00033121882, ...]]]
```
When in reality, the expected result should be:

``` python  
[[-0.03512698, -0.020624293, -0.015343423, ...], [-0.021260535, -0.011461929, -0.00033121882, ...]]
```
The above result complies with the expected type hint:
**List[List[float]]** . As I mentioned, we can achieve that by using the
extend method instead of the append method.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-08-23 14:43:43 +00:00
Christophe Bornet
7f1e444efa
partners: Use simsimd types (#25299)
The simsimd package [now has
types](https://github.com/ashvardanian/SimSIMD/releases/tag/v5.0.0)
2024-08-23 10:41:39 -04:00
clement.l
642f9530cd
community: add supported blockchains to Blockchain Document Loader (#25428)
- Remove deprecated chains.
- Add more supported chains.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-23 14:39:42 +00:00
conjuncts
818267bbc3
community: allow chroma DB delete() to use "where" argument (#19826)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"

Description: Simply pass kwargs to allow arguments like "where" to be
propagated
Issue: Previously, db.delete(where={}) wouldn't work for chroma
vectorstores
Dependencies: N/A
Twitter handle: N/A

- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
2024-08-23 10:10:57 -04:00
Kevin Engelke
3c7f12cbf5
community[minor]: Fix missing 'keep_newlines' parameter forward-pass to 'process_pages' function in confluence loader (#20086) (#20087)
- **Description:** Fixed missing `keep_newlines` parameter forward-pass
in confluence-loader
- **Issue:** #20086 
- **Dependencies:** None

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-08-23 12:59:38 +00:00
Erik Lindgren
583b0449eb
community[patch]: Fix Hybrid Search for non-Databricks managed embeddings (#25590)
Description: Send both the query and query_embedding to the Databricks
index for hybrid search.

Issue: When using hybrid search with non-Databricks managed embedding we
currently don't pass both the embedding and query_text to the index.
Hybrid search requires both of these. This change fixes this issue for
both `similarity_search` and `similarity_search_by_vector`.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-23 08:57:13 +00:00
Alejandro Companioni
bcd5842b5d
community[patch]: Updating default PPLX model to supported llama-3.1 model. (#25643)
# Issue

As of late July, Perplexity [no longer supports Llama 3
models](https://docs.perplexity.ai/changelog/introducing-new-and-improved-sonar-models).

# Description

This PR updates the default model and doc examples to reflect their
latest supported model. (Mostly updating the same places changed by
#23723.)

# Twitter handle

`@acompa_` on behalf of the team at Not Diamond. Check us out
[here](https://notdiamond.ai).

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-08-23 08:33:30 +00:00
Jakub W.
b865ee49a0
community[patch]: Dynamodb history messages key (#25658)
- **Description:** adding the history_messages_key so you don't have to
use "History" as a key in langchain
2024-08-23 08:05:28 +00:00
Manuel Jaiczay
1c31234eed
community: fix HuggingFacePipeline pipeline_kwargs (#19920)
Fix handling of pipeline_kwargs to prioritize class attribute defaults.

#19770

Co-authored-by: jaizo <manuel.jaiczay@polygons.at>
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
2024-08-22 18:29:46 -04:00
Nobuhiko Otoba
4b63a217c2
"community: Fix GithubFileLoader source code", "docs: Fix GithubFileLoader code sample" (#19943)
This PR adds tiny improvements to the `GithubFileLoader` document loader
and its code sample, addressing the following issues:

1. Currently, the `file_extension` argument of `GithubFileLoader` does
not change its behavior at all.
1. The `GithubFileLoader` sample code in
`docs/docs/integrations/document_loaders/github.ipynb` does not work as
it stands.

The respective solutions I propose are the following:

1. Remove `file_extension` argument from `GithubFileLoader`.
1. Specify the branch as `master` (not the default `main`) and rename
`documents` as `document`.

---------

Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
2024-08-22 18:24:57 -04:00
Nada Amin
ac7b71e0d7
langchain_community.graphs: Neo4JGraph: prop min_size might be None (#23944)
When I used the Neo4JGraph enhanced_schema=True option, I ran into an
error because a prop min_size of None was compared numerically with an
int.

The fix I applied is similar to the pattern of skipping embeddings
elsewhere in the file.

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-08-22 20:29:52 +00:00
William FH
fad6fc866a
Rm DeepInfra Breakpoint Comment (#25206)
tbh should rm the print staement too
2024-08-22 14:43:44 -04:00
Eric Pinzur
01ded5e2f9
community: add metadata filter to CassandraGraphVectorStore (#25663)
- **Description:** 
- Added metadata filtering support to
`langchain_community.graph_vectorstores.cassandra.CassandraGraphVectorStore`
  - Also fixed type conversion issues highlighted by mypy.
- **Dependencies:** 
  - `ragstack-ai-knowledge-store 0.2.0` (released July 23, 2024)

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-22 14:27:16 -04:00
mschoenb97IL
e499caa9cd
community: Give more context on DeepInfra 500 errors (#25671)
Description: DeepInfra 500 errors have useful information in the text
field that isn't being exposed to the user. I updated the error message
to fix this.

As an example, this code

```
from langchain_community.chat_models import ChatDeepInfra
from langchain_core.messages import HumanMessage

model = "meta-llama/Meta-Llama-3-70B-Instruct"
deepinfra_api_token = "..."

model = ChatDeepInfra(model=model, deepinfra_api_token=deepinfra_api_token)

messages = [HumanMessage("All work and no play makes Jack a dull boy\n" * 9000)]
response = model.invoke(messages)
```

Currently gives this error:
```
langchain_community.chat_models.deepinfra.ChatDeepInfraException: DeepInfra Server: Error 500
```

This change would give the following error:
```
langchain_community.chat_models.deepinfra.ChatDeepInfraException: DeepInfra Server error status 500: {"error":{"message":"Requested input length 99009 exceeds maximum input length 8192"}}
```
2024-08-22 10:10:51 -07:00
Rajendra Kadam
4ff2f4499e
community: Refactor PebbloRetrievalQA (#25583)
**Refactor PebbloRetrievalQA**
  - Created `APIWrapper` and moved API logic into it.
  - Created smaller functions/methods for better readability.
  - Properly read environment variables.
  - Removed unused code.
  - Updated models

**Issue:** NA
**Dependencies:** NA
**tests**:  NA
2024-08-22 11:51:21 -04:00
Rajendra Kadam
1f1679e960
community: Refactor PebbloSafeLoader (#25582)
**Refactor PebbloSafeLoader**
  - Created `APIWrapper` and moved API logic into it.
  - Moved helper functions to the utility file.
  - Created smaller functions and methods for better readability.
  - Properly read environment variables.
  - Removed unused code.

**Issue:** NA
**Dependencies:** NA
**tests**:  Updated
2024-08-22 11:46:52 -04:00
maang-h
5e3a321f71
docs: Add ChatZhipuAI tool calling and structured output docstring (#25669)
- **Description:** Add `ChatZhipuAI` tool calling and structured output
docstring.
2024-08-22 10:34:41 -04:00
Noah Mayerhofer
0091947efd
community: add retry for session expired exception in neo4j (#25660)
Description: The neo4j driver can raise a SessionExpired error, which is
considered a retriable error. If a query fails with a SessionExpired
error, this change retries every query once. This change will make the
neo4j integration less flaky.
Twitter handle: noahmay_
2024-08-22 13:07:36 +00:00
Dristy Srivastava
b002702af6
[Community][minor]: Updating metadata with full_path in SharePoint loader (#25593)
- **Description:** Updating metadata for sharepoint loader with full
path i.e., webUrl
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** NA
- **Docs** NA

Co-authored-by: dristy.cd <dristy@clouddefense.io>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-08-21 13:10:14 +00:00
Jabir
12e490ea56
Update azuresearch.py (#25577)
This will allow complextype metadata to be returned. the current
implementation throws error when dealing with nested metadata

Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-20 12:53:30 +00:00
Erick Friis
e01c6789c4
core,community: add beta decorator to missed GraphVectorStore extensions (#25562) 2024-08-19 17:29:09 -07:00
maang-h
015ab91b83
community[patch]: Add ToolMessage for ChatZhipuAI (#25547)
- **Description:** Add ToolMessage for `ChatZhipuAI` to solve the issue
#25490
2024-08-19 11:26:38 -04:00
Mohammad Mohtashim
75c3c81b8c
[Community]: Fix - Open AI Whisper client.audio.transcriptions returning Text Object which raises error (#25271)
- **Description:** The following
[line](fd546196ef/libs/community/langchain_community/document_loaders/parsers/audio.py (L117))
in `OpenAIWhisperParser` returns a text object for some odd reason
despite the official documentation saying it should return `Transcript`
Instance which should have the text attribute. But for the example given
in the issue and even when I tried running on my own, I was directly
getting the text. The small PR accounts for that.
 - **Issue:** : #25218
 

I was able to replicate the error even without the GenericLoader as
shown below and the issue was with `OpenAIWhisperParser`

```python
parser = OpenAIWhisperParser(api_key="sk-fxxxxxxxxx",
                                            response_format="srt",
                                            temperature=0)

list(parser.lazy_parse(Blob.from_path('path_to_file.m4a')))
```
2024-08-19 09:36:42 -04:00
maang-h
32f5147523
docs: Fix QianfanLLMEndpoint and Tongyi input text (#25529)
- **Description:** Fix `QianfanLLMEndpoint` and `Tongyi` input text.
2024-08-19 09:23:09 -04:00
ZhangShenao
4255a30f20
Improvement[Community] Improve api doc for SingleFileFacebookMessengerChatLoader (#25536)
Delete redundant args in api doc
2024-08-19 09:00:21 -04:00
ccurme
b83f1eb0d5
core, partners: implement standard tracing params for LLMs (#25410) 2024-08-16 13:18:09 -04:00
Bagatur
253ceca76a
docs: fix mimetype parser docstring (#25463) 2024-08-15 16:16:52 -07:00
ccurme
8afbab4cf6
langchain[patch]: deprecate various chains (#25310)
- [x] NatbotChain: move to community, deprecate langchain version.
Update to use `prompt | llm | output_parser` instead of LLMChain.
- [x] LLMMathChain: deprecate + add langgraph replacement example to API
ref
- [x] HypotheticalDocumentEmbedder (retriever): update to use `prompt |
llm | output_parser` instead of LLMChain
- [x] FlareChain: update to use `prompt | llm | output_parser` instead
of LLMChain
- [x] ConstitutionalChain: deprecate + add langgraph replacement example
to API ref
- [x] LLMChainExtractor (document compressor): update to use `prompt |
llm | output_parser` instead of LLMChain
- [x] LLMChainFilter (document compressor): update to use `prompt | llm
| output_parser` instead of LLMChain
- [x] RePhraseQueryRetriever (retriever): update to use `prompt | llm |
output_parser` instead of LLMChain
2024-08-15 10:49:26 -04:00
ccurme
ba167dc158
community[patch]: update connection string in azure cosmos integration test (#25438) 2024-08-15 14:07:54 +00:00
Isaac Francisco
966b408634
[docs]: doc loader changes (#25417) 2024-08-14 19:46:33 -07:00
Werner van der Merwe
1d3f7231b8
fix: typo where github should be gitlab (#25397)
**PR title**: "GitLabToolkit: fix typo"
    - **Description:** fix typo where GitHub should have been GitLab
    - **Dependencies:** None
2024-08-14 18:36:25 +00:00
Bagatur
493e474063
docs: udpated api reference (#25172)
- Move the API reference into the vercel build
- Update api reference organization and styling
2024-08-14 07:00:17 -07:00
ccurme
27690506d0
multiple: update removal targets (#25361) 2024-08-14 09:50:39 -04:00
Harrison Chase
967b6f21f6
docs: improve document loaders index (#25365)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-14 01:48:48 +00:00
Isaac Francisco
f4ffd692a3
[docs]: standardize doc loader doc strings (#25325) 2024-08-13 23:18:56 +00:00
Isaac Francisco
e0bbb81d04
[docs]: standardize tool docstrings (#25351) 2024-08-13 16:10:00 -07:00
thedavgar
9d08369442
community: fix AzureSearch vectorstore asyncronous methods (#24921)
**Description**
Fix the asyncronous methods to retrieve documents from AzureSearch
VectorStore. The previous changes from [this
commit](ffe6ca986e)
create a similar code for the syncronous methods and the asyncronous
ones but the asyncronous client return an asyncronous iterator
"AsyncSearchItemPaged" as said in the issue #24740.
To solve this issue, the syncronous iterators in asyncronous methods
where changed to asyncronous iterators.

@chrislrobert said in [this
comment](https://github.com/langchain-ai/langchain/issues/24740#issuecomment-2254168302)
that there was a still a flaw due to `with` blocks that close the client
after each call. I removed this `with` blocks in the `async_client`
following the same pattern as the sync `client`.

In order to close up the connections, a __del__ method is included to
gently close up clients once the vectorstore object is destroyed.

**Issue:** #24740 and #24064
**Dependencies:** No new dependencies for this change

**Example notebook:** I created a notebook just to test the changes work
and gives the same results as the syncronous methods for vector and
hybrid search. With these changes, the asyncronous methods in the
retriever work as well.

![image](https://github.com/user-attachments/assets/697e431b-9d7f-4d0d-b205-59d051ac2b67)


**Lint and test**: Passes the tests and the linter
2024-08-13 14:20:51 -07:00
Fedor Nikolaev
2b15518c5f
community: add args_schema to SearxSearchResults tool (#25350)
This adds `args_schema` member to `SearxSearchResults` tool. This member
is already present in the `SearxSearchRun` tool in the same file.

I was having `TypeError: Type is not JSON serializable:
AsyncCallbackManagerForToolRun` being thrown in langserve playground
when I was using `SearxSearchResults` tool as a part of chain there.
This fixes the issue, so the error is not raised anymore.

This is a example langserve app that was giving me the error, but it
works properly after the proposed fix:
```python
#!/usr/bin/env python

from fastapi import FastAPI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_community.utilities import SearxSearchWrapper
from langchain_community.tools.searx_search.tool import SearxSearchResults
from langserve import add_routes

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()

s = SearxSearchWrapper(searx_host="http://localhost:8080")

search = SearxSearchResults(wrapper=s)

search_chain = (
    {"context": search, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

app = FastAPI()

add_routes(
    app,
    search_chain,
    path="/chain",
)

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="localhost", port=8000)
```
2024-08-13 18:26:09 +00:00
maang-h
089f5e6cad
Standardize SparkLLM (#25239)
- **Description:** Standardize SparkLLM, include:
  - docs, the issue #24803 
  - to support stream
  - update api url
  - model init arg names, the issue #20085
2024-08-13 09:50:12 -04:00
Chen Xiabin
24155aa1ac
qianfan generate/agenerate with usage_metadata (#25332) 2024-08-13 09:24:41 -04:00
Erick Friis
2907ab2297
community: release 0.2.12 (#25324) 2024-08-12 23:30:27 +00:00
Ben Chambers
1adc161642
community: kwargs for CassandraGraphVectorStore (#25300)
- **Description:** pass kwargs from CassandraGraphVectorStore to
underlying store

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-08-12 18:01:29 +00:00
ccurme
e77eeee6ee
core[patch]: add standard tracing params for retrievers (#25240) 2024-08-12 14:51:59 +00:00
Mohammad Mohtashim
9927a4866d
[Community] - Added bind_tools and with_structured_output for ChatZhipuAI (#23887)
- **Description:** This PR implements the `bind_tool` functionality for
ChatZhipuAI as requested by the user. ChatZhipuAI models support tool
calling according to the `OpenAI` tool format, as outlined in their
official documentation [here](https://open.bigmodel.cn/dev/api#glm-4).
- **Issue:**  ##23868

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-08-12 14:11:43 +00:00
maang-h
bc60cddc1b
docs: Fix ChatBaichuan, QianfanChatEndpoint, ChatSparkLLM, ChatZhipuAI docs (#25265)
- **Description:** Fix some chat models docs, include:
  - ChatBaichuan
  - QianfanChatEndpoint
  - ChatSparkLLM
  - ChatZhipuAI
2024-08-11 16:23:55 -04:00
ZhangShenao
43deed2a95
Improvement[Embeddings] Add dimension support to ZhipuAIEmbeddings (#25274)
- In the in ` embedding-3 ` and later models of Zhipu AI, it is
supported to specify the dimensions parameter of Embedding. Ref:
https://bigmodel.cn/dev/api#text_embedding-3 .
- Add test case for `embedding-3` model by assigning dimensions.
2024-08-11 16:20:37 -04:00
Eugene Yurtsev
6dd9f053e3
core[patch]: Deprecating beta upsert APIs in vectorstore (#25069)
This PR deprecates the beta upsert APIs in vectorstore.

We'll introduce them in a V2 abstraction instead to keep the existing
vectorstore implementations lighter weight.

The main problem with the existing APIs is that it's a bit more
challenging to
implement the correct behavior w/ respect to IDs since ID can be present
in
both the function signature and as an optional attribute on the document
object.

But VectorStores that pass the standard tests should have implemented
the semantics properly!

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-09 17:17:36 -04:00
Eugene Yurtsev
b6f0174bb9
community[patch],core[patch]: Update EdenaiTool root_validator and add unit test in core (#25233)
This PR gets rid `root_validators(allow_reuse=True)` logic used in
EdenAI Tool in preparation for pydantic 2 upgrade.
- add another test to secret_from_env_factory
2024-08-09 15:59:27 +00:00
Eugene Yurtsev
bd6c31617e
community[patch]: Remove more @allow_reuse=True validators (#25236)
Remove some additional allow_reuse=True usage in @root_validators.
2024-08-09 11:10:27 -04:00
Eugene Yurtsev
6e57aa7c36
community[patch]: Remove usage of @root_validator(allow_reuse=True) (#25235)
Remove usage of @root_validator(allow_reuse=True)
2024-08-09 10:57:42 -04:00
thiswillbeyourgithub
a2b4c33bd6
community[patch]: FAISS: ValueError mentions normalize_score_fn isntead of relevance_score_fn (#25225)
Thank you for contributing to LangChain!

- [X] **PR title**: "community: fix valueerror mentions wrong argument
missing"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [X] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** when faiss.py has a None relevance_score_fn it raises
a ValueError that says a normalize_fn_score argument is needed.

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-08-09 14:40:29 +00:00
Shivendra Soni
66b7206ab6
community: Add llm-extraction option to FireCrawl Document Loader (#25231)
**Description:** This minor PR aims to add `llm_extraction` to Firecrawl
loader. This feature is supported on API and PythonSDK, but the
langchain loader omits adding this to the response.
**Twitter handle:** [scalable_pizza](https://x.com/scalablepizza)

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-09 13:59:10 +00:00
ccurme
3b7437d184
docs: update integration api refs (#25195)
- [x] toolkits
- [x] retrievers (in this repo)
2024-08-09 12:27:32 +00:00
Eugene Yurtsev
98779797fe
community[patch]: Use get_fields adapter for pydantic (#25191)
Change all usages of __fields__ with get_fields adapter merged into
langchain_core.

Code mod generated using the following grit pattern:

```
engine marzano(0.1)
language python


`$X.__fields__` => `get_fields($X)` where {
    add_import(source="langchain_core.utils.pydantic", name="get_fields")
}
```
2024-08-08 14:43:09 -04:00
Rajendra Kadam
663638d6a8
community[minor]: [SharePointLoader] Load extended metadata for the root folder (#24872)
- **Title:** [SharePointLoader] Load extended metadata for the root
folder
- **Description:** 
    - Ensure extended metadata loads correctly for the root folder.
- Cleanup: Refactor SharePointLoader to remove unused fields(`file_id` &
`site_id`).
- **Dependencies:** NA
- **Add tests and docs:** NA
2024-08-08 14:39:16 -04:00
Eugene Yurtsev
bf5193bb99
community[patch]: Upgrade pydantic extra (#25185)
Upgrade to using a literal for specifying the extra which is the
recommended approach in pydantic 2.

This works correctly also in pydantic v1.

```python
from pydantic.v1 import BaseModel

class Foo(BaseModel, extra="forbid"):
    x: int

Foo(x=5, y=1)
```

And 


```python
from pydantic.v1 import BaseModel

class Foo(BaseModel):
    x: int

    class Config:
      extra = "forbid"

Foo(x=5, y=1)
```


## Enum -> literal using grit pattern:

```
engine marzano(0.1)
language python
or {
    `extra=Extra.allow` => `extra="allow"`,
    `extra=Extra.forbid` => `extra="forbid"`,
    `extra=Extra.ignore` => `extra="ignore"`
}
```

Resorted attributes in config and removed doc-string in case we will
need to deal with going back and forth between pydantic v1 and v2 during
the 0.3 release. (This will reduce merge conflicts.)


## Sort attributes in Config:

```
engine marzano(0.1)
language python


function sort($values) js {
    return $values.text.split(',').sort().join("\n");
}


class_definition($name, $body) as $C where {
    $name <: `Config`,
    $body <: block($statements),
    $values = [],
    $statements <: some bubble($values) assignment() as $A where {
        $values += $A
    },
    $body => sort($values),
}

```
2024-08-08 17:20:39 +00:00
ololand
249945a572
Update polygon.py for business subscription (#25085)
For business subscription the status is STOCKSBUSINESS not OK

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-08-08 15:28:41 +00:00
ogawa
d895db11d6
community[patch]: gpt-4o-2024-08-06 costs (#25164)
- **Description:** updated OpenAI cost definitions according to the
following:
  - https://openai.com/api/pricing/
- **Twitter handle:** `@ogawa65a`

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-08 13:22:11 +00:00
maang-h
0ba125c3cd
docs: Standardize QianfanLLMEndpoint LLM (#25139)
- **Description:** Standardize QianfanLLMEndpoint LLM,include:
  - docs, the issue #24803 
  - model init arg names, the issue #20085
2024-08-07 10:57:27 -04:00
Pat Patterson
7e7fcf5b1f
community: Fix ValidationError on creating GPT4AllEmbeddings with no gpt4all_kwargs (#25124)
- **Description:** Instantiating `GPT4AllEmbeddings` with no
`gpt4all_kwargs` argument raised a `ValidationError`. Root cause: #21238
added the capability to pass `gpt4all_kwargs` through to the `GPT4All`
instance via `Embed4All`, but broke code that did not specify a
`gpt4all_kwargs` argument.
- **Issue:** #25119 
- **Dependencies:** None
- **Twitter handle:** [`@metadaddy`](https://twitter.com/metadaddy)
2024-08-07 13:34:01 +00:00
Virat Singh
264ab96980
community: Add stock market tools from financialdatasets.ai (#25025)
**Description:**
In this PR, I am adding three stock market tools from
financialdatasets.ai (my API!):
- get balance sheets
- get cash flow statements
- get income statements

Twitter handle: [@virattt](https://twitter.com/virattt)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-06 18:28:12 +00:00
Naval Chand
71c0698ee4
Added bedrock 3-5 sonnet cost detials for BedrockAnthropicTokenUsageCallbackHandler (#25104)
Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Example: "community: Added bedrock 3-5 sonnet cost detials for
BedrockAnthropicTokenUsageCallbackHandler"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

Co-authored-by: Naval Chand <navalchand@192.168.1.36>
2024-08-06 17:28:47 +00:00
Isaac Francisco
a72fddbf8d
[docs]: vector store integration pages (#24858)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-06 17:20:27 +00:00
maang-h
1028af17e7
docs: Standardize Tongyi (#25103)
- **Description:** Standardize Tongyi LLM,include:
  - docs, the issue #24803
  - model init arg names, the issue #20085
2024-08-06 11:44:12 -04:00
Dobiichi-Origami
061ed250f6
delete the default model value from langchain and discard the need fo… (#24915)
- description: I remove the limitation of mandatory existence of
`QIANFAN_AK` and default model name which langchain uses cause there is
already a default model nama underlying `qianfan` SDK powering langchain
component.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-06 14:11:05 +00:00
Dominik Fladung
ffa0c838d8
Allow ConfluenceLoader authorization via Personal Access Tokens (#25096)
- community: Allow authorization to Confluence with bearer token

- **Description:** Allow authorization to Confluence with [Personal
Access
Token](https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html)
by checking for the keys `['client_id', token: ['access_token',
'token_type']]`

- **Issue:** 

Currently the following error occurs when using an personal access token
for authorization.

```python
loader = ConfluenceLoader(
    url=os.getenv('CONFLUENCE_URL'),
    oauth2={
        'token': {"access_token": os.getenv("CONFLUENCE_ACCESS_TOKEN"), "token_type": "bearer"},
        'client_id': 'client_id',
    },
    page_ids=['12345678'], 
)
```

```
ValueError: Error(s) while validating input: ["You have either omitted require keys or added extra keys to the oauth2 dictionary. key values should be `['access_token', 'access_token_secret', 'consumer_key', 'key_cert']`"]
```

With this PR the loader runs as expected.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-06 13:42:47 +00:00
jigsawlabs-student
427a04151c
community: fix neo4j from_existing_graph (#24912)
Fixes Neo4JVector.from_existing_graph integration with huggingface

Previously threw an error with existing databases, because
from_existing_graph query returns empty list of new nodes, which are
then passed to embedding function, and huggingface errors with empty
list.

Fixes [24401](https://github.com/langchain-ai/langchain/issues/24401)

---------

Co-authored-by: Jeff Katzy <jeffreyerickatz@gmail.com>
2024-08-05 21:01:46 +00:00
Jim Baldwin
6890daa90c
community: make AthenaLoader profile_name optional and fix type hint (#24958)
- **Description:** This PR makes the AthenaLoader profile_name optional
and fixes the type hint which says the type is `str` but it should be
`str` or `None` as None is handled in the loader init. This is a minor
problem but it just confused me when I was using the Athena Loader to
why we had to use a Profile, as I want that for local but not
production.
- **Issue:** #24957 
- **Dependencies:** None.
2024-08-05 14:28:58 +00:00
Dobiichi-Origami
c5cb52a3c6
community: fix issue of the existence of numeric object in additional_kwargs a… (#24863)
- **Description:** A previous PR breaks the code from
`baidu_qianfan_endpoint.py` which causes the malfunction of streaming
2024-08-05 10:15:55 -04:00
ZhangShenao
cda79dbb6c
community[patch]: Optimize test case for MoonshotChat (#25050)
Optimize test case for `MoonshotChat`. Use standard
ChatModelIntegrationTests.
2024-08-05 10:11:25 -04:00
Alex Sherstinsky
208042e0f2
community: Fix Predibase Integration for HuggingFace-hosted fine-tuned adapters (#25015)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-08-03 14:05:43 -07:00
maang-h
f5da0d6d87
docs: Standardize MiniMaxEmbeddings (#24983)
- **Description:** Standardize MiniMaxEmbeddings
  - docs, the issue #24856 
  - model init arg names, the issue #20085
2024-08-03 14:01:23 -04:00
maang-h
7de62abc91
docs: Standardize SparkLLMTextEmbeddings docstrings (#25021)
- **Description:** Standardize SparkLLMTextEmbeddings docstrings
- **Issue:** the issue #24856
2024-08-03 13:44:09 -04:00
Bagatur
e81ddb32a6
docs: fix kwargs docstring (#25010)
Fix:
![Screenshot 2024-08-02 at 5 33 37
PM](https://github.com/user-attachments/assets/7c56cdeb-ee81-454c-b3eb-86aa8a9bdc8d)
2024-08-02 19:54:54 -07:00
Isaac Francisco
73570873ab
docs: standardizing tavily tool docs (#24736)
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-08-02 22:25:27 +00:00
Bagatur
8e2316b8c2
community[patch]: Release 0.2.11 (#24989) 2024-08-02 20:08:44 +00:00
ccurme
22c1a4041b
community[patch]: support named arguments in github toolkit (#24986)
Parameters may be passed in by name if generated from tool calls.
2024-08-02 18:27:32 +00:00
ZhangShenao
71c0564c9f
community[patch]: Add test case for MoonshotChat (#24960)
Add test case for `MoonshotChat`.
2024-08-02 09:37:31 -04:00
Isaac Francisco
d7688a4328
community[patch]: adding artifact to Tavily search (#24376)
This allows you to get raw content as well as the answer, instead of
just getting the results.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-08-01 21:12:11 -07:00
maang-h
ea505985c4
docs: Standardize ZhipuAIEmbeddings docstrings (#24933)
- **Description:** Standardize ZhipuAIEmbeddings rich docstrings.
- **Issue:** the issue #24856
2024-08-01 14:06:53 -04:00
Anneli Samuel
2204d8cb7d
community[patch]: Invoke on_llm_new_token callback before yielding chunk (#24938)
**Description**: Invoke on_llm_new_token callback before yielding chunk
in streaming mode
**Issue**:
[#16913](https://github.com/langchain-ai/langchain/issues/16913)
2024-08-01 16:39:04 +00:00
Serena Ruan
1827bb4042
community[patch]: support bind_tools for ChatMlflow (#24547)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- **Description:** 
Support ChatMlflow.bind_tools method
Tested in Databricks:
<img width="836" alt="image"
src="https://github.com/user-attachments/assets/fa28ef50-0110-4698-8eda-4faf6f0b9ef8">



- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Signed-off-by: Serena Ruan <serena.rxy@gmail.com>
2024-08-01 08:43:07 -07:00
BottlePumpkin
bfc59c1d26
community: Fix KeyError in NotionDB loader when 'name' is missing (#24224)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.



**Description:** This PR fixes a KeyError in NotionDBLoader when the
"name" key is missing in the "people" property.

**Issue:** Fixes #24223 

**Dependencies:** None

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-01 13:55:40 +00:00
alexqiao
8eb0bdead3
community[patch]: Invoke callback prior to yielding token (#24917)
**Description: Invoke callback prior to yielding token in stream method
for chat_models .**
**Issue**: https://github.com/langchain-ai/langchain/issues/16913
#16913
2024-08-01 13:19:55 +00:00
Nikita Pakunov
c776471ac6
community: fix AttributeError: 'YandexGPT' object has no attribute '_grpc_metadata' (#24432)
Fixes #24049

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-07-31 21:18:33 +00:00
Eugene Yurtsev
add16111b9
community[patch]: Make the pydantic linter stricter (#24897)
Stricter linting of deprecated pydantic features.
2024-07-31 18:57:37 +00:00
Eugene Yurtsev
a4a444f73d
community[patch]: Fix arcee llm usage of root_validator(pre=False) (#24896)
Should be pre=True
2024-07-31 18:49:20 +00:00
Eugene Yurtsev
d24b82357f
community[patch]: Add missing annotations (#24890)
This PR adds annotations in comunity package.

Annotations are only strictly needed in subclasses of BaseModel for
pydantic 2 compatibility.

This PR adds some unnecessary annotations, but they're not bad to have
regardless for documentation pages.
2024-07-31 18:13:44 +00:00
ccurme
30f18c7b02
docs: add retriever integrations template (#24836) 2024-07-31 13:50:44 -04:00
Anirudh31415926535
4da3d4b18e
docs: Minor corrections and updates to Cohere docs (#22726)
- **Description:** Update the Cohere's provider and RagRetriever
documentations with latest updates.
    - **Twitter handle:** Anirudh1810
2024-07-31 10:16:26 -07:00
Nishan Jain
b00c0fc558
[Community][minor]: Added prompt governance in pebblo_retrieval (#24874)
Title: [pebblo_retrieval] Identifying entities in prompts given in
PebbloRetrievalQA leading to prompt governance
Description: Implemented identification of entities in the prompt using
Pebblo prompt governance API.
Issue: NA
Dependencies: NA
Add tests and docs: NA
2024-07-31 13:14:51 +00:00
Rajendra Kadam
a6add89bd4
community[minor]: [PebbloSafeLoader] Implement content-size-based batching (#24871)
- **Title:** [PebbloSafeLoader] Implement content-size-based batching in
the classification flow(loader/doc API)
- **Description:** 
- Implemented content-size-based batching in the loader/doc API, set to
100KB with no external configuration option, intentionally hard-coded to
prevent timeouts.
    - Remove unused field(pb_id) from doc_metadata
- **Issue:** NA
- **Dependencies:** NA
- **Add tests and docs:** Updated
2024-07-31 09:10:28 -04:00
TrumanYan
096b66db4a
community: replace it with Tencent Cloud SDK (#24172)
Description: The old method will be discontinued; use the official SDK
for more model options.
Issue: None
Dependencies: None
Twitter handle: None

Co-authored-by: trumanyan <trumanyan@tencent.com>
2024-07-31 09:05:38 -04:00
Erick Friis
1f5444817a
community: deprecate BedrockEmbeddings in favor of langchain-aws (#24846) 2024-07-30 23:13:17 +00:00
Shailendra Mishra
f2d810b3c0
clob_bugfix... (#24813)
Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-07-30 12:44:04 -04:00
Anush
51b15448cc
community: Fix FastEmbedEmbeddings (#24462)
## Description

This PR:
- Fixes the validation error in `FastEmbedEmbeddings`.
- Adds support for `batch_size`, `parallel` params.
- Removes support for very old FastEmbed versions.
- Updates the FastEmbed doc with the new params.

Associated Issues:
- Resolves #24039
- Resolves #https://github.com/qdrant/fastembed/issues/296
2024-07-30 12:42:46 -04:00
ccurme
73ec24fc56
docs[patch]: add toolkit template (#24791) 2024-07-30 12:36:09 -04:00
Igor Drozdov
c2706cfb9e
feat(community): add tools support for litellm (#23906)
I used the following example to validate the behavior

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import ConfigurableField
from langchain_anthropic import ChatAnthropic
from langchain_community.chat_models import ChatLiteLLM
from langchain_core.tools import tool
from langchain.agents import create_tool_calling_agent, AgentExecutor

@tool
def multiply(x: float, y: float) -> float:
    """Multiply 'x' times 'y'."""
    return x * y

@tool
def exponentiate(x: float, y: float) -> float:
    """Raise 'x' to the 'y'."""
    return x**y

@tool
def add(x: float, y: float) -> float:
    """Add 'x' and 'y'."""
    return x + y

prompt = ChatPromptTemplate.from_messages([
    ("system", "you're a helpful assistant"),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

tools = [multiply, exponentiate, add]

llm = ChatAnthropic(model="claude-3-sonnet-20240229", temperature=0)
# llm = ChatLiteLLM(model="claude-3-sonnet-20240229", temperature=0)

agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

agent_executor.invoke({"input": "what's 3 plus 5 raised to the 2.743. also what's 17.24 - 918.1241", })
```

`ChatAnthropic` version works:

```
> Entering new AgentExecutor chain...

Invoking: `exponentiate` with `{'x': 5, 'y': 2.743}`
responded: [{'text': 'To calculate 3 + 5^2.743, we can use the "exponentiate" and "add" tools:', 'type': 'text', 'index': 0}, {'id': 'toolu_01Gf54DFTkfLMJQX3TXffmxe', 'input': {}, 'name': 'exponentiate', 'type': 'tool_use', 'index': 1, 'partial_json': '{"x": 5, "y": 2.743}'}]

82.65606421491815
Invoking: `add` with `{'x': 3, 'y': 82.65606421491815}`
responded: [{'id': 'toolu_01XUq9S56GT3Yv2N1KmNmmWp', 'input': {}, 'name': 'add', 'type': 'tool_use', 'index': 0, 'partial_json': '{"x": 3, "y": 82.65606421491815}'}]

85.65606421491815
Invoking: `add` with `{'x': 17.24, 'y': -918.1241}`
responded: [{'text': '\n\nSo 3 + 5^2.743 = 85.66\n\nTo calculate 17.24 - 918.1241, we can use:', 'type': 'text', 'index': 0}, {'id': 'toolu_01BkXTwP7ec9JKYtZPy5JKjm', 'input': {}, 'name': 'add', 'type': 'tool_use', 'index': 1, 'partial_json': '{"x": 17.24, "y": -918.1241}'}]

-900.8841[{'text': '\n\nTherefore, 17.24 - 918.1241 = -900.88', 'type': 'text', 'index': 0}]

> Finished chain.
```

While `ChatLiteLLM` version doesn't.

But with the changes in this PR, along with:

- https://github.com/langchain-ai/langchain/pull/23823
- https://github.com/BerriAI/litellm/pull/4554

The result is _almost_ the same:

```
> Entering new AgentExecutor chain...

Invoking: `exponentiate` with `{'x': 5, 'y': 2.743}`
responded: To calculate 3 + 5^2.743, we can use the "exponentiate" and "add" tools:

82.65606421491815
Invoking: `add` with `{'x': 3, 'y': 82.65606421491815}`


85.65606421491815
Invoking: `add` with `{'x': 17.24, 'y': -918.1241}`
responded:

So 3 + 5^2.743 = 85.66

To calculate 17.24 - 918.1241, we can use:

-900.8841

Therefore, 17.24 - 918.1241 = -900.88

> Finished chain.
```

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-07-30 15:39:34 +00:00
David Robertson
bfb7f8d40a
Brave Search: Enhance search result details with extra snippets (#19209)
**Description:** 

This update significantly improves the Brave Search Tool's utility
within the LangChain library by enriching the search results it returns.
The tool previously returned title, link, and snippet, with the snippet
being a truncated 140-character description from the search engine. To
make the search results more informative, this update enables
extra_snippets by default and introduces additional result fields:
title, link, description (enhancing and renaming the former snippet
field), age, and snippets. The snippets field provides a list of strings
summarizing the webpage, utilizing Brave's capability for more detailed
search insights. This enhancement aims to make the search tool far more
informative and beneficial for users.

**Issue:** N/A

**Dependencies:** No additional dependencies introduced.

**Twitter handle:** @davidalexr987

**Code Changes Summary:**

- Changed the default setting to include extra_snippets in search
results.
- Renamed the snippet field to description to accurately reflect its
content and included an age field for search results.
- Introduced a snippets field that lists webpage summaries, providing
users with comprehensive search result insights.

**Backward Compatibility Note:**

The renaming of snippet to description improves the accuracy of the
returned data field but may impact existing users who have developed
integration's or analyses based on the snippet field. I believe this
change is essential for clarity and utility, and it aligns better with
the data provided by Brave Search.

**Additional Notes:**

This proposal focuses exclusively on the Brave Search package, without
affecting other LangChain packages or introducing new dependencies.
2024-07-30 15:29:38 +00:00
Ben Chambers
435771fe74
[community]: Fix package name mismatch (#24824)
- **Description:** fix a mismatch in pypi package names
2024-07-30 11:21:39 -04:00
maang-h
4bb1a11e02
community: Add MiniMaxChat bind_tools and structured output (#24310)
- **Description:** 
  - Add `bind_tools` method to support tool calling 
  - Add `with_structured_output` method to support structured output
2024-07-29 15:51:52 -04:00
maang-h
bf685c242f
docs: Standardize QianfanEmbeddingsEndpoint (#24786)
- **Description:** Standardize QianfanEmbeddingsEndpoint, include:
  - docstrings, the issue #21983 
  - model init arg names, the issue #20085
2024-07-29 13:19:24 -04:00
M. Ali
c086410677
fix docs typos (#23668)
Thank you for contributing to LangChain!

- [x] **PR title**: "docs: fix multiple typos"

Co-authored-by: mohblnk <mohamed.ali@blnk.ai>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-07-29 16:10:55 +00:00
Pere Pasamonte
98175860ad
community: Fix AWS DocumentDB similarity_search when filter is None (#24777)
**Description**

Fixes DocumentDBVectorSearch similarity_search when no filter is used;
it defaults to None but $match does not accept None, so changed default
to empty {} before pipeline is created.

**Issue**

AWS DocumentDB similarity search does not work when no filter is used.
Error msg: "the match filter must be an expression in an object" #24775

**Dependencies**

No dependencies

**Twitter handle**

https://x.com/perepasamonte

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-29 15:32:05 +00:00
AmosDinh
c113682328
community:Add support for specifying document_loaders.firecrawl api url. (#24747)
community:Add support for specifying document_loaders.firecrawl api url.


Add support for specifying document_loaders.firecrawl api url. 
This is mainly to support the
[self-hosting](https://github.com/mendableai/firecrawl/blob/main/SELF_HOST.md)
option firecrawl provides. Eg. now I can specify localhost:....

The corresponding firecrawl class already provides functionality to pass
the argument. See here:
4c9d62f6d3/apps/python-sdk/firecrawl/firecrawl.py (L29)

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-28 14:30:36 -04:00
Haijian Wang
cda3025ee1
Integrating the Yi family of models. (#24491)
Thank you for contributing to LangChain!

- [x] **PR title**: "community:add Yi LLM", "docs:add Yi Documentation"
                          
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** This PR adds support for the Yi model to LangChain.
- **Dependencies:**
[langchain_core,requests,contextlib,typing,logging,json,langchain_community]
    - **Twitter handle:** 01.AI


- [x] **Add tests and docs**: I've added the corresponding documentation
to the relevant paths

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
2024-07-26 10:57:33 -07:00
Marc Gibbons
cc451effd1
community[patch]: langchain_community.vectorstores.azuresearch Raise LangChainException instead of bare Exception (#23935)
Raise `LangChainException` instead of `Exception`. This alleviates the
need for library users to use bare try/except to handle exceptions
raised by `AzureSearch`.

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-07-26 15:59:06 +00:00
Diverrez morgan
c4d2a53f18
community: creation score_threshold in flashrank_rerank.py (#24016)
Description: 
add a optional score relevance threshold for select only coherent
document, it's in complement of top_n

Discussion:
add relevance score threshold in flashrank_rerank document compressors
#24013

Dependencies:
 no dependencies

---------

Co-authored-by: Benjamin BERNARD <benjamin.bernard@openpathview.fr>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-26 13:34:39 +00:00
Cong Peng
190988d93e
community: Add parameter allow_dangerous_requests to WebResearchRetriever.from_llm construct (#24712)
**Description:** To avoid ValueError when construct the retriever from
method `from_llm()`.
2024-07-26 06:24:58 -07:00
monysun
5f593c172a
community: fix dashcope embeddings embed_query func post too much req to api (#24707)
the fuc of embed_query of dashcope embeddings send a str param, and in
the embed_with_retry func will send error content to api
2024-07-26 12:44:07 +00:00
yonarw
b65ac8d39c
community[minor]: Self query retriever for HANA Cloud Vector Engine (#24494)
Description:

- This PR adds a self query retriever implementation for SAP HANA Cloud
Vector Engine. The retriever supports all operators except for contains.
- Issue: N/A
- Dependencies: no new dependencies added

**Add tests and docs:**
Added integration tests to:
libs/community/tests/unit_tests/query_constructors/test_hanavector.py

**Documentation for self query retriever:**
/docs/integrations/retrievers/self_query/hanavector_self_query.ipynb

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-07-26 06:56:51 +00:00
nobbbbby
4f3b4fc7fe
community[patch]: Extend Baichuan model with tool support (#24529)
**Description:** Expanded the chat model functionality to support tools
in the 'baichuan.py' file. Updated module imports and added tool object
handling in message conversions. Additional changes include the
implementation of tool binding and related unit tests. The alterations
offer enhanced model capabilities by enabling interaction with tool-like
objects.

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-07-25 23:20:44 -07:00
Rave Harpaz
ee399e3ec5
community[patch]: Add OCI Generative AI tool and structured output support (#24693)
- [x] **PR title**: 
  community: Add OCI Generative AI tool and structured output support


- [x] **PR message**: 
- **Description:** adding tool calling and structured output support for
chat models offered by OCI Generative AI services. This is an update to
our last PR 22880 with changes in
/langchain_community/chat_models/oci_generative_ai.py
    - **Issue:** NA
    - **Dependencies:** NA
    - **Twitter handle:** NA


- [x] **Add tests and docs**: 
  1. we have updated our unit tests
2. we have updated our documentation under
/docs/docs/integrations/chat/oci_generative_ai.ipynb


- [x] **Lint and test**: `make format`, `make lint` and `make test` we
run successfully

---------

Co-authored-by: RHARPAZ <RHARPAZ@RHARPAZ-5750.us.oracle.com>
Co-authored-by: Arthur Cheng <arthur.cheng@oracle.com>
2024-07-25 23:19:00 -07:00
Yuki Watanabe
2b6a262f84
community[patch]: Replace filters argument to filter in DatabricksVectorSearch (#24530)
The
[DatabricksVectorSearch](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/vectorstores/databricks_vector_search.py#L21)
class exposes similarity search APIs with argument `filters`, which is
inconsistent with other VS classes who uses `filter` (singular). This PR
updates the argument and add alias for backward compatibility.

---------

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
2024-07-25 21:20:18 -07:00
Sunish Sheth
59880a9147
community[patch]: mlflow handle empty chunk(#24689) 2024-07-25 20:36:29 -07:00
Chaunte W. Lacewell
69eacaa887
Community[minor]: Update VDMS vectorstore (#23729)
**Description:** 
- This PR exposes some functions in VDMS vectorstore, updates VDMS
related notebooks, updates tests, and upgrade version of VDMS (>=0.0.20)

**Issue:** N/A

**Dependencies:** 
- Update vdms>=0.0.20
2024-07-25 22:13:04 -04:00
KyrianC
0fdbaf4a8d
community: fix ChatEdenAI + EdenAI Tools (#23715)
Fixes for Eden AI Custom tools and ChatEdenAI:
- add missing import in __init__ of chat_models
- add `args_schema` to custom tools. otherwise '__arg1' would sometimes
be passed to the `run` method
- fix IndexError when no human msg is added in ChatEdenAI
2024-07-25 15:19:14 -04:00
maang-h
38d30e285a
docs: Standardize BaichuanTextEmbeddings docstrings (#24674)
- **Description:** Standardize BaichuanTextEmbeddings docstrings.
- **Issue:** the issue #21983
2024-07-25 12:12:00 -04:00
rick-SOPTIM
cd563fb628
community[minor]: passthrough auth parameter on requests to Ollama-LLMs (#24068)
Thank you for contributing to LangChain!

**Description:**
This PR allows users of `langchain_community.llms.ollama.Ollama` to
specify the `auth` parameter, which is then forwarded to all internal
calls of `requests.request`. This works in the same way as the existing
`headers` parameters. The auth parameter enables the usage of the given
class with Ollama instances, which are secured by more complex
authentication mechanisms, that do not only rely on static headers. An
example are AWS API Gateways secured by the IAM authorizer, which
expects signatures dynamically calculated on the specific HTTP request.

**Issue:**

Integrating a remote LLM running through Ollama using
`langchain_community.llms.ollama.Ollama` only allows setting static HTTP
headers with the parameter `headers`. This does not work, if the given
instance of Ollama is secured with an authentication mechanism that
makes use of dynamically created HTTP headers which for example may
depend on the content of a given request.

**Dependencies:**

None

**Twitter handle:**

None

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-07-25 15:48:35 +00:00
Luca Dorigo
5fdbdd6bec
community[patch]: Fix invalid iohttp verify parameter (#24655)
Should fix https://github.com/langchain-ai/langchain/issues/24654
2024-07-25 11:09:21 -04:00
Oleg Kulyk
4b1b7959a2
community[minor]: Add ScrapingAnt Loader Community Integration (#24514)
Added [ScrapingAnt](https://scrapingant.com/) Web Loader integration.
ScrapingAnt is a web scraping API that allows extracting web page data
into accessible and well-formatted markdown.

Description: Added ScrapingAnt web loader for retrieving web page data
as markdown
Dependencies: scrapingant-client
Twitter: @WeRunTheWorld3

---------

Co-authored-by: Oleg Kulyk <oleg@scrapingant.com>
2024-07-24 21:11:43 -04:00
John
d59c656ea5
unstructured, community, initialize langchain-unstructured package (#22779)
#### Update (2): 
A single `UnstructuredLoader` is added to handle both local and api
partitioning. This loader also handles single or multiple documents.

#### Changes in `community`:
Changes here do not affect users. In the initial process of using the
SDK for the API Loaders, the Loaders in community were refactored.
Other changes include:
The `UnstructuredBaseLoader` has a new check to see if both
`mode="paged"` and `chunking_strategy="by_page"`. It also now has
`Element.element_id` added to the `Document.metadata`.
`UnstructuredAPIFileLoader` and `UnstructuredAPIFileIOLoader`. As such,
now both directly inherit from `UnstructuredBaseLoader` and initialize
their `file_path`/`file` attributes respectively and implement their own
`_post_process_elements` methods.

--------
#### Update:
New SDK Loaders in a [partner
package](https://python.langchain.com/v0.1/docs/contributing/integrations/#partner-package-in-langchain-repo)
are introduced to prevent breaking changes for users (see discussion
below).

##### TODO:
- [x] Test docstring examples
--------
- **Description:** UnstructuredAPIFileIOLoader and
UnstructuredAPIFileLoader calls to the unstructured api are now made
using the unstructured-client sdk.
- **New Dependencies:** unstructured-client

- [x] **Add tests and docs**: If you're adding a new integration, please
include
- [x] a test for the integration, preferably unit tests that do not rely
on network access,
- [x] update the description in
`docs/docs/integrations/providers/unstructured.mdx`
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

TODO:
- [x] Update
https://python.langchain.com/v0.1/docs/integrations/document_loaders/unstructured_file/#unstructured-api
-
`langchain/docs/docs/integrations/document_loaders/unstructured_file.ipynb`
- The description here needs to indicate that users should install
`unstructured-client` instead of `unstructured`. Read over closely to
look for any other changes that need to be made.
- [x] Update the `lazy_load` method in `UnstructuredBaseLoader` to
handle json responses from the API instead of just lists of elements.
- This method may need to be overwritten by the API loaders instead of
changing it in the `UnstructuredBaseLoader`.
- [x] Update the documentation links in the class docstrings (the
Unstructured documents have moved)
- [x] Update Document.metadata to include `element_id` (see thread
[here](https://unstructuredw-kbe4326.slack.com/archives/C044N0YV08G/p1718187499818419))

---------

Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: ChengZi <chen.zhang@zilliz.com>
2024-07-24 23:21:20 +00:00
Eugene Yurtsev
b55f6105c6
community[patch]: Add linter to prevent further usage of root_validator and validator (#24613)
This linter is meant to move development to use __init__ instead of
root_validator and validator.

We need to investigate whether we need to lint some of the functionality
of Field (e.g., `lt` and `gt`, `alias`)

`alias` is the one that's most popular:

(community) ➜ community git:(eugene/add_linter_to_community) ✗ git grep
" Field(" | grep "alias=" | wc -l
144

(community) ➜ community git:(eugene/add_linter_to_community) ✗ git grep
" Field(" | grep "ge=" | wc -l
10

(community) ➜ community git:(eugene/add_linter_to_community) ✗ git grep
" Field(" | grep "gt=" | wc -l
4
2024-07-24 12:35:21 -04:00
Anindyadeep
12c3454fd9
[Community] PremAI Tool Calling Functionality (#23931)
This PR is under WIP and adds the following functionalities:

- [X] Supports tool calling across the langchain ecosystem. (However
streaming is not supported)
- [X] Update documentation
2024-07-24 09:53:58 -04:00
Vishnu Nandakumar
e271965d1e
community: retrievers: added capability for using Product Quantization as one of the retriever. (#22424)
- [ ] **Community**: "Retrievers: Product Quantization"
- [X] This PR adds Product Quantization feature to the retrievers to the
Langchain Community. PQ is one of the fastest retrieval methods if the
embeddings are rich enough in context due to the concepts of
quantization and representation through centroids
    - **Description:** Adding PQ as one of the retrievers
    - **Dependencies:** using the package nanopq for this PR
    - **Twitter handle:** vishnunkumar_


- [X] **Add tests and docs**: If you're adding a new integration, please
include
   - [X] Added unit tests for the same in the retrievers.
   - [] Will add an example notebook subsequently

- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/ -
done the same

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-24 13:52:15 +00:00
stydxm
b9bea36dd4
community: fix typo in warning message (#24597)
- **Description:** 
  This PR fixes a small typo in a warning message
- **Issue:**

![](https://github.com/user-attachments/assets/5aa57724-26c5-49f6-8bc1-5a54bb67ed49)
There were double `Use` and double `instead`
2024-07-24 13:19:07 +00:00
cüre
da06d4d7af
community: update finetuned model cost for 4o-mini (#24605)
- **Description:** adds model price for. reference:
https://openai.com/api/pricing/
- **Issue:** -
- **Dependencies:** -
- **Twitter handle:** cureef
2024-07-24 13:17:26 +00:00
ZhangShenao
ad18afc3ec
community[patch]: Fix param spelling error in ElasticsearchChatMessageHistory (#24589)
Fix param spelling error in `ElasticsearchChatMessageHistory`
2024-07-23 19:29:42 -07:00
Aayush Kataria
0f45ac4088
LangChain Community: VectorStores: Azure Cosmos DB Filtered Vector Search (#24087)
Thank you for contributing to LangChain!

- This PR adds vector search filtering for Azure Cosmos DB Mongo vCore
and NoSQL.


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-07-23 16:59:23 -07:00
Carlos André Antunes
325068bb53
community: Fix azure_openai.py (#24572)
In some lines its trying to read a key that do not exists yet. In this
cases I changed the direct access to dict.get() method


- [ x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
2024-07-23 16:22:21 -04:00
Bagatur
8691a5a37f
community[patch]: Release 0.2.10 (#24560) 2024-07-23 09:24:57 -07:00
Ben Chambers
e80b0932ee
community[patch]: small fixes to link extractors (#24528)
- **Description:** small fixes to imports / types in the link extraction
work

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-07-23 14:28:06 +00:00
Morteza Hosseini
9e06991aae
community[patch]: Update URL to the 2markdown API (#24546)
Update the URL to Markdown endpoint.

API information is available here: https://2markdown.com/docs#url2md
2024-07-23 14:27:55 +00:00
maang-h
378db2e1a5
docs: Add RedisChatMessageHistory docstrings (#24548)
- **Description:** Add `RedisChatMessageHistory ` rich docstrings.
- **Issue:** the issue #21983

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-07-23 14:23:46 +00:00
Ben Chambers
a5a3d28776
community[patch]: Remove targets_table from C* GraphVectorStore (#24502)
- **Description:** Remove the unnecessary `targets_table` parameter
2024-07-22 22:09:36 -04:00
Alexander Golodkov
2a70a07aad
community[minor]: added new document loaders based on dedoc library (#24303)
### Description
This pull request added new document loaders to load documents of
various formats using [Dedoc](https://github.com/ispras/dedoc):
  - `DedocFileLoader` (determine file types automatically and parse)
  - `DedocPDFLoader` (for `PDF` and images parsing)
- `DedocAPIFileLoader` (determine file types automatically and parse
using Dedoc API without library installation)

[Dedoc](https://dedoc.readthedocs.io) is an open-source library/service
that extracts texts, tables, attached files and document structure
(e.g., titles, list items, etc.) from files of various formats. The
library is actively developed and maintained by a group of developers.

`Dedoc` supports `DOCX`, `XLSX`, `PPTX`, `EML`, `HTML`, `PDF`, images
and more.
Full list of supported formats can be found
[here](https://dedoc.readthedocs.io/en/latest/#id1).
For `PDF` documents, `Dedoc` allows to determine textual layer
correctness and split the document into paragraphs.


### Issue
This pull request extends variety of document loaders supported by
`langchain_community` allowing users to choose the most suitable option
for raw documents parsing.

### Dependencies
The PR added a new (optional) dependency `dedoc>=2.2.5` ([library
documentation](https://dedoc.readthedocs.io)) to the
`extended_testing_deps.txt`

### Twitter handle
None

### Add tests and docs
1. Test for the integration:
`libs/community/tests/integration_tests/document_loaders/test_dedoc.py`
2. Example notebook:
`docs/docs/integrations/document_loaders/dedoc.ipynb`
3. Information about the library:
`docs/docs/integrations/providers/dedoc.mdx`

### Lint and test

Done locally:

  - `make format`
  - `make lint`
  - `make integration_tests`
  - `make docs_build` (from the project root)

---------

Co-authored-by: Nasty <bogatenkova.anastasiya@mail.ru>
2024-07-23 02:04:53 +00:00
Ben Chambers
5ac936a284
community[minor]: add document transformer for extracting links (#24186)
- **Description:** Add a DocumentTransformer for executing one or more
`LinkExtractor`s and adding the extracted links to each document.
- **Issue:** n/a
- **Depedencies:** none

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2024-07-22 22:01:21 -04:00
Erick Friis
3dce2e1d35
all: add release notes to pypi (#24519) 2024-07-22 13:59:13 -07:00
Bagatur
236e957abb
core,groq,openai,mistralai,robocorp,fireworks,anthropic[patch]: Update BaseModel subclass and instance checks to handle both v1 and proper namespaces (#24417)
After this PR chat models will correctly handle pydantic 2 with
bind_tools and with_structured_output.


```python
import pydantic
print(pydantic.__version__)
```
2.8.2

```python
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

class Add(BaseModel):
    x: int
    y: int

model = ChatOpenAI().bind_tools([Add])
print(model.invoke('2 + 5').tool_calls)

model = ChatOpenAI().with_structured_output(Add)
print(type(model.invoke('2 + 5')))
```

```
[{'name': 'Add', 'args': {'x': 2, 'y': 5}, 'id': 'call_PNUFa4pdfNOYXxIMHc6ps2Do', 'type': 'tool_call'}]
<class '__main__.Add'>
```


```python
from langchain_openai import ChatOpenAI
from pydantic.v1 import BaseModel, Field

class Add(BaseModel):
    x: int
    y: int

model = ChatOpenAI().bind_tools([Add])
print(model.invoke('2 + 5').tool_calls)

model = ChatOpenAI().with_structured_output(Add)
print(type(model.invoke('2 + 5')))
```

```python
[{'name': 'Add', 'args': {'x': 2, 'y': 5}, 'id': 'call_hhiHYP441cp14TtrHKx3Upg0', 'type': 'tool_call'}]
<class '__main__.Add'>
```

Addresses issues: https://github.com/langchain-ai/langchain/issues/22782

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-07-22 20:07:39 +00:00
Naka Masato
884f76e05a
fix: load google credentials properly in GoogleDriveLoader (#12871)
- **Description:** 
- Fix #12870: set scope in `default` func (ref:
https://google-auth.readthedocs.io/en/master/reference/google.auth.html)
- Moved the code to load default credentials to the bottom for clarity
of the logic
	- Add docstring and comment for each credential loading logic
- **Issue:** https://github.com/langchain-ai/langchain/issues/12870
- **Dependencies:** no dependencies change
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** @gymnstcs

<!-- If no one reviews your PR within a few days, please @-mention one
of @baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-22 17:43:33 +00:00
Jorge Piedrahita Ortiz
10e3982b59
community: sambanova integration minor changes (#24503)
- Minor changes in samabanova llm integration 
  - default api 
  - docstrings
- minor changes in docs
2024-07-22 17:06:35 +00:00
maang-h
721f709dec
community: Improve QianfanChatEndpoint tool result to model (#24466)
- **Description:** `QianfanChatEndpoint` When using tool result to
answer questions, the content of the tool is required to be in Dict
format. Of course, this can require users to return Dict format when
calling the tool, but in order to be consistent with other Chat Models,
I think such modifications are necessary.
2024-07-22 11:29:00 -04:00
ccurme
dcba7df2fe
community[patch]: deprecate langchain_community Chroma in favor of langchain_chroma (#24474) 2024-07-22 11:00:13 -04:00
Mohammad Mohtashim
5ade0187d0
[Commutiy]: Prompts Fixed for ZERO_SHOT_REACT React Agent Type in create_sql_agent function (#23693)
- **Description:** The correct Prompts for ZERO_SHOT_REACT were not
being used in the `create_sql_agent` function. They were not using the
specific `SQL_PREFIX` and `SQL_SUFFIX` prompts if client does not
provide any prompts. This is fixed.
- **Issue:** #23585

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-07-22 14:04:20 +00:00
ZhangShenao
0f6737cbfe
[Vector Store] Fix function add_texts in TencentVectorDB (#24469)
Regardless of whether `embedding_func` is set or not, the 'text'
attribute of document should be assigned, otherwise the `page_content`
in the document of the final search result will be lost
2024-07-22 09:50:22 -04:00
clement.l
d98b830e4b
community: add flag to toggle progress bar (#24463)
- **Description:** Add a flag to determine whether to show progress bar 
- **Issue:** n/a
- **Dependencies:** n/a
- **Twitter handle:** n/a

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-20 13:18:02 +00:00
chuanbei888
6b08a33fa4
community: fix QianfanChatEndpoint default model (#24464)
the baidu_qianfan_endpoint has been changed from ERNIE-Bot-turbo to
ERNIE-Lite-8K
2024-07-20 13:00:29 +00:00
maang-h
7b28359719
docs: Add ChatSparkLLM docstrings (#24449)
- **Description:** 
  - Add `ChatSparkLLM` docstrings, the issue #22296 
  - To support `stream` method
2024-07-19 20:19:14 -07:00
Erick Friis
f4ee3c8a22
infra: add min version testing to pr test flow (#24358)
xfailing some sql tests that do not currently work on sqlalchemy v1

#22207 was very much not sqlalchemy v1 compatible. 

Moving forward, implementations should be compatible with both to pass
CI
2024-07-19 22:03:19 +00:00
Bagatur
842065a9cc
community[patch]: Release 0.2.9 (#24453) 2024-07-19 12:50:22 -07:00
Bagatur
dda9438e87
community[patch]: gpt-4o-mini costs (#24421) 2024-07-19 19:02:44 +00:00
Eugene Yurtsev
604dfe2d99
community[patch]: Force opt-in for WebResearchRetriever (CVE-2024-3095) (#24451)
This PR addresses the issue raised by (CVE-2024-3095)

https://huntr.com/bounties/e62d4895-2901-405b-9559-38276b6a5273

Unfortunately, we didn't do a good job writing the initial report. It's
pointing at both the wrong package and the wrong code.

The affected code is the Web Retriever not the AsyncHTMLLoader, and the
WebRetriever lives in langchain-community

The vulnerable code lives here: 

0bd3f4e129/libs/community/langchain_community/retrievers/web_research.py (L233-L233)


This PR adds a forced opt-in for users to make sure they are aware of
the risk and can mitigate by configuring a proxy:


0bd3f4e129/libs/community/langchain_community/retrievers/web_research.py (L84-L84)
2024-07-19 18:51:35 +00:00
Asi Greenholts
372c27f2e5
community[minor]: [GoogleApiYoutubeLoader] Replace API used in _get_document_for_channel from search to playlistItem (#24034)
- **Description:** Search has a limit of 500 results, playlistItems
doesn't. Added a class in except clause to catch another common error.
- **Issue:** None
- **Dependencies:** None
- **Twitter handle:** @TupleType

---------

Co-authored-by: asi-cider <88270351+asi-cider@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-07-19 14:04:34 -04:00
Rafael Pereira
6a45bf9554
community[minor]: GraphCypherQAChain to accept additional inputs as provided by the user for cypher generation (#24300)
**Description:** This PR introduces a change to the
`cypher_generation_chain` to dynamically concatenate inputs. This
improvement aims to streamline the input handling process and make the
method more flexible. The change involves updating the arguments
dictionary with all elements from the `inputs` dictionary, ensuring that
all necessary inputs are dynamically appended. This will ensure that any
cypher generation template will not require a new `_call` method patch.

**Issue:** This PR fixes issue #24260.
2024-07-19 14:03:14 -04:00
Philippe PRADOS
f5856680fe
community[minor]: add mongodb byte store (#23876)
The `MongoDBStore` can manage only documents.
It's not possible to use MongoDB for an `CacheBackedEmbeddings`.

With this new implementation, it's possible to use:
```python
CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings=embeddings,
    document_embedding_cache=MongoDBByteStore(
      connection_string=db_uri,
      db_name=db_name,
      collection_name=collection_name,
  ),
)
```
and use MongoDB to cache the embeddings !
2024-07-19 13:54:12 -04:00
yabooung
07715f815b
community[minor]: Add ability to specify file encoding and json encoding for FileChatMessageHistory (#24258)
Description:
Add UTF-8 encoding support

Issue:
Inability to properly handle characters from certain languages (e.g.,
Korean)

Fix:
Implement UTF-8 encoding in FileChatMessageHistory

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-07-19 13:53:21 -04:00
Dristy Srivastava
020cc1cf3e
Community[minor]: Added checksum in while send data to pebblo-cloud (#23968)
- **Description:** 
            - Updated checksum in doc metadata
- Sending checksum and removing actual content, while sending data to
`pebblo-cloud` if `classifier-location `is `pebblo-cloud` in
`/loader/doc` API
            - Adding `pb_id` i.e. pebblo id to doc metadata
            - Refactoring as needed.
- Sending `content-checksum` and removing actual content, while sending
data to `pebblo-cloud` if `classifier-location `is `pebblo-cloud` in
`prmopt` API
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** Updated
- **Docs** NA

---------

Co-authored-by: dristy.cd <dristy@clouddefense.io>
2024-07-19 13:52:54 -04:00
keval dekivadiya
06f47678ae
community[minor]: Add TextEmbed Embedding Integration (#22946)
**Description:**

**TextEmbed** is a high-performance embedding inference server designed
to provide a high-throughput, low-latency solution for serving
embeddings. It supports various sentence-transformer models and includes
the ability to deploy image and text embedding models. TextEmbed offers
flexibility and scalability for diverse applications.

- **PyPI Package:** [TextEmbed on
PyPI](https://pypi.org/project/textembed/)
- **Docker Image:** [TextEmbed on Docker
Hub](https://hub.docker.com/r/kevaldekivadiya/textembed)
- **GitHub Repository:** [TextEmbed on
GitHub](https://github.com/kevaldekivadiya2415/textembed)

**PR Description**
This PR adds functionality for embedding documents and queries using the
`TextEmbedEmbeddings` class. The implementation allows for both
synchronous and asynchronous embedding requests to a TextEmbed API
endpoint. The class handles batching and permuting of input texts to
optimize the embedding process.

**Example Usage:**

```python
from langchain_community.embeddings import TextEmbedEmbeddings

# Initialise the embeddings class
embeddings = TextEmbedEmbeddings(model="your-model-id", api_key="your-api-key", api_url="your_api_url")

# Define a list of documents
documents = [
    "Data science involves extracting insights from data.",
    "Artificial intelligence is transforming various industries.",
    "Cloud computing provides scalable computing resources over the internet.",
    "Big data analytics helps in understanding large datasets.",
    "India has a diverse cultural heritage."
]

# Define a query
query = "What is the cultural heritage of India?"

# Embed all documents
document_embeddings = embeddings.embed_documents(documents)

# Embed the query
query_embedding = embeddings.embed_query(query)

# Print embeddings for each document
for i, embedding in enumerate(document_embeddings):
    print(f"Document {i+1} Embedding:", embedding)

# Print the query embedding
print("Query Embedding:", query_embedding)

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2024-07-19 17:30:25 +00:00
Andrew Benton
f9d64d22e5
community[minor]: Add Riza Python/JS code execution tool (#23995)
- **Description:** Add Riza Python/JS code execution tool
- **Issue:** N/A
- **Dependencies:** an optional dependency on the `rizaio` pypi package
- **Twitter handle:** [@rizaio](https://x.com/rizaio)

[Riza](https://riza.io) is a safe code execution environment for
agent-generated Python and JavaScript that's easy to integrate into
langchain apps. This PR adds two new tool classes to the community
package.
2024-07-19 17:03:22 +00:00
Ben Chambers
3691701d58
community[minor]: Add keybert-based link extractor (#24311)
- **Description:** Add a `KeybertLinkExtractor` for graph vectorstores.
This allows extracting links from keywords in a Document and linking
nodes that have common keywords.
- **Issue:** None
- **Dependencies:** None.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-07-19 12:25:07 -04:00
Ben Chambers
83f3d95ffa
community[minor]: GLiNER link extraction (#24314)
- **Description:** This allows extracting links between documents with
common named entities using [GLiNER](https://github.com/urchade/GLiNER).
- **Issue:** None
- **Dependencies:** None

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-07-19 15:34:54 +00:00
Anas Khan
b5acb91080
Mask API keys for various LLM/ChatModel Modules (#13885)
**Description:** 
- Added masking of the API Keys for the modules:
  - `langchain/chat_models/openai.py`
  - `langchain/llms/openai.py`
  - `langchain/llms/google_palm.py`
  - `langchain/chat_models/google_palm.py`
  - `langchain/llms/edenai.py`

- Updated the modules to utilize `SecretStr` from pydantic to securely
manage API key.
- Added unit/integration tests
- `langchain/chat_models/asure_openai.py` used the `open_api_key` that
is derived from the `ChatOpenAI` Class and it was assuming
`openai_api_key` is a str so we changed it to expect `SecretStr`
instead.

**Issue:** https://github.com/langchain-ai/langchain/issues/12165 ,
**Dependencies:** none,
**Tag maintainer:** @eyurtsev

---------

Co-authored-by: HassanA01 <anikeboss@gmail.com>
Co-authored-by: Aneeq Hassan <aneeq.hassan@utoronto.ca>
Co-authored-by: kristinspenc <kristinspenc2003@gmail.com>
Co-authored-by: faisalt14 <faisalt14@gmail.com>
Co-authored-by: Harshil-Patel28 <76663814+Harshil-Patel28@users.noreply.github.com>
Co-authored-by: kristinspenc <146893228+kristinspenc@users.noreply.github.com>
Co-authored-by: faisalt14 <90787271+faisalt14@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-19 15:23:34 +00:00
ccurme
f99369a54c
community[patch]: fix formatting (#24443)
Somehow this got through CI:
https://github.com/langchain-ai/langchain/pull/24363
2024-07-19 14:38:53 +00:00
Ben Chambers
242b085be7
Merge pull request #24315
* community: Add Hierarchy link extractor

* add example

* lint
2024-07-19 09:42:26 -04:00
Rhuan Barros
c3308f31bc
Merge pull request #24363
* important email fields
2024-07-19 09:41:20 -04:00
Han Sol Park
aade9bfde5
Mask API key for ChatOpenAI based chat_models (#14293)
- **Description**: Mask API key for ChatOpenAi based chat_models
(openai, azureopenai, anyscale, everlyai).
Made changes to all chat_models that are based on ChatOpenAI since all
of them assumes that openai_api_key is str rather than SecretStr.
  - **Issue:**: #12165 
  - **Dependencies:**  N/A
  - **Tag maintainer:** @eyurtsev
  - **Twitter handle:** N/A

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-19 02:25:38 +00:00
Eun Hye Kim
07c5c60f63
community: fix tool appending logic and update planner prompt in OpenAPI agent toolkit (#24384)
**Description:**
- Updated the format for the 'Action' section in the planner prompt to
ensure it must be one of the tools without additional words. Adjusted
the phrasing from "should be" to "must be" for clarity and
enforceability.
- Corrected the tool appending logic in the
`_create_api_controller_agent` function to ensure that
`RequestsDeleteToolWithParsing` and `RequestsPatchToolWithParsing` are
properly added to the tools list for "DELETE" and "PATCH" operations.

**Issue:** #24382

**Dependencies:** None

**Twitter handle:** @lunara_x

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-18 13:37:46 +00:00
Chen Xiabin
63c60a31f0
[fix] baidu qianfan AiMessage with usage_metadata (#24389)
make AIMessage usage_metadata has error
2024-07-18 09:28:16 -04:00
Rajendra Kadam
1c65529fd7
community[minor]: [PebbloSafeLoader] Rename loader type and add SharePointLoader to supported loaders (#24393)
Thank you for contributing to LangChain!

- [x] **PR title**: [PebbloSafeLoader] Rename loader type and add
SharePointLoader to supported loaders
    - **Description:** Minor fixes in the PebbloSafeLoader:
        - Renamed the loader type from `remote_db` to `cloud_folder`.
- Added `SharePointLoader` to the list of loaders supported by
PebbloSafeLoader.
    - **Issue:** NA
    - **Dependencies:** NA
- [x] **Add tests and docs**: NA
2024-07-18 08:23:12 -04:00
Paolo Ráez
0dec72cab0
Community[patch]: Missing "stream" parameter in cloudflare_workersai (#23987)
### Description
Missing "stream" parameter. Without it, you'd never receive a stream of
tokens when using stream() or astream()

### Issue
No existing issue available
2024-07-18 02:09:39 +00:00
Brice Fotzo
034a8c7c1b
community: support advanced text extraction options for pdf documents (#20265)
**Description:** 
- Updated constructors in PyPDFParser and PyPDFLoader to handle
`extraction_mode` and additional kwargs, aligning with the capabilities
of `PageObject.extract_text()` from pypdf.

- Added `test_pypdf_loader_with_layout` along with a corresponding
example text file to validate layout extraction from PDFs.

**Issue:** fixes #19735 

**Dependencies:** This change requires updating the pypdf dependency
from version 3.4.0 to at least 4.0.0.

Additional changes include the addition of a new test
test_pypdf_loader_with_layout and an example text file to ensure the
functionality of layout extraction from PDFs aligns with the new
capabilities.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-07-17 20:47:09 +00:00
Bagatur
b5360e2e5f
community[patch]: Release 0.2.8 (#24354) 2024-07-17 17:07:27 +00:00
Luis Moros
bcb5f354ad
community: Fix SQLDatabse.from_databricks issue when ran from Job (#24346)
- Description: When SQLDatabase.from_databricks is ran from a Databricks
Workflow job, line 205 (default_host = context.browserHostName) throws
an ``AttributeError`` as the ``context`` object has no
``browserHostName`` attribute. The fix handles the exception and sets
the ``default_host`` variable to null

---------

Co-authored-by: lmorosdb <lmorosdb>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2024-07-17 12:40:12 -04:00
Rafael Pereira
cf28708e7b
Neo4j: Update with non-deprecated cypher methods, and new method to associate relationship embeddings (#23725)
**Description:** At the moment neo4j wrapper is using setVectorProperty,
which is deprecated
([link](https://neo4j.com/docs/operations-manual/5/reference/procedures/#procedure_db_create_setVectorProperty)).
I replaced with the non-deprecated version.

Neo4j recently introduced a new cypher method to associate embeddings
into relations using "setRelationshipVectorProperty" method. In this PR
I also implemented a new method to perform this association maintaining
the same format used in the "add_embeddings" method which is used to
associate embeddings into Nodes.
I also included a test case for this new method.
2024-07-17 12:37:47 -04:00
maang-h
2a3288b15d
docs: Add ChatBaichuan docstrings (#24348)
- **Description:** Add ChatBaichuan rich docstrings.
- **Issue:** the issue #22296
2024-07-17 12:00:16 -04:00
Rafael Pereira
fc41730e28
neo4j: Fix test for order-insensitive comparison and floating-point precision issues (#24338)
**Description:** 
This PR addresses two main issues in the `test_neo4jvector.py`:
1. **Order-insensitive Comparison:** Modified the
`test_retrieval_dictionary` to ensure that it passes regardless of the
order of returned values by parsing `page_content` into a structured
format (dictionary) before comparison.
2. **Floating-point Precision:** Updated
`test_neo4jvector_relevance_score` to handle minor floating-point
precision differences by using the `isclose` function for comparing
relevance scores with a relative tolerance.

Errors addressed:

- **test_neo4jvector_relevance_score:**
  ```
AssertionError: assert [(Document(page_content='foo', metadata={'page':
'0'}), 1.0000014305114746), (Document(page_content='bar',
metadata={'page': '1'}), 0.9998371005058289),
(Document(page_content='baz', metadata={'page': '2'}),
0.9993508458137512)] == [(Document(page_content='foo', metadata={'page':
'0'}), 1.0), (Document(page_content='bar', metadata={'page': '1'}),
0.9998376369476318), (Document(page_content='baz', metadata={'page':
'2'}), 0.9993523359298706)]
At index 0 diff: (Document(page_content='foo', metadata={'page': '0'}),
1.0000014305114746) != (Document(page_content='foo', metadata={'page':
'0'}), 1.0)
  Full diff:
  - [(Document(page_content='foo', metadata={'page': '0'}), 1.0),
+ [(Document(page_content='foo', metadata={'page': '0'}),
1.0000014305114746),
? +++++++++++++++
- (Document(page_content='bar', metadata={'page': '1'}),
0.9998376369476318),
? ^^^ ------
+ (Document(page_content='bar', metadata={'page': '1'}),
0.9998371005058289),
? ^^^^^^^^^
- (Document(page_content='baz', metadata={'page': '2'}),
0.9993523359298706),
? ----------
+ (Document(page_content='baz', metadata={'page': '2'}),
0.9993508458137512),
? ++++++++++
  ]
  ```

- **test_retrieval_dictionary:**
  ```
AssertionError: assert [Document(page_content='skills:\n- Python\n- Data
Analysis\n- Machine Learning\nname: John\nage: 30\n')] ==
[Document(page_content='skills:\n- Python\n- Data Analysis\n- Machine
Learning\nage: 30\nname: John\n')]
At index 0 diff: Document(page_content='skills:\n- Python\n- Data
Analysis\n- Machine Learning\nname: John\nage: 30\n') !=
Document(page_content='skills:\n- Python\n- Data Analysis\n- Machine
Learning\nage: 30\nname: John\n')
  Full diff:
- [Document(page_content='skills:\n- Python\n- Data Analysis\n- Machine
Learning\nage: 30\nname: John\n')]
? ---------
+ [Document(page_content='skills:\n- Python\n- Data Analysis\n- Machine
Learning\nage: John\nage: 30\n')]
? +++++++++
  ```
2024-07-17 09:28:25 -04:00
bovlb
5caa381177
community[minor]: Add ApertureDB as a vectorstore (#24088)
Thank you for contributing to LangChain!

- [X] *ApertureDB as vectorstore**: "community: Add ApertureDB as a
vectorestore"

- **Description:** this change provides a new community integration that
uses ApertureData's ApertureDB as a vector store.
    - **Issue:** none
    - **Dependencies:** depends on ApertureDB Python SDK
    - **Twitter handle:** ApertureData

- [X] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

Integration tests rely on a local run of a public docker image.
Example notebook additionally relies on a local Ollama server.

- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

All lint tests pass.

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Gautam <gautam@aperturedata.io>
2024-07-16 09:32:59 -07:00
frob
c59e663365
community[patch]: Fix docstring for ollama parameter "keep_alive" (#23973)
Fix doc-string for ollama integration
2024-07-16 14:48:38 +00:00
Rahul Raghavendra Choudhury
f5a38772a8
community[patch]: Update TavilySearch to use TavilyClient instead of the deprecated Client (#24270)
On using TavilySearchAPIRetriever with any conversation chain getting
error :

`TypeError: Client.__init__() got an unexpected keyword argument
'api_key'`

It is because the retreiver class is using the depreciated `Client`
class, `TavilyClient` need to be used instead.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2024-07-16 13:35:28 +00:00
Chen Xiabin
69b1603173
baidu qianfan AiMessage with usage_metadata (#24288)
add usage_metadata to qianfan AIMessage. Thanks
2024-07-16 09:30:50 -04:00
Dobiichi-Origami
7aeaa1974d
community[patch]: change the class of qianfan_ak and qianfan_sk parameters (#24293)
- **Description:** we changed the class of two parameters to fix a bug,
which causes validation failure when using QianfanEmbeddingEndpoint
2024-07-16 09:17:48 -04:00
Lage Ragnarsson
a3c10fc6ce
community: Add support for specifying hybrid search for Databricks vector search (#23528)
**Description:**

Databricks Vector Search recently added support for hybrid
keyword-similarity search.
See [usage
examples](https://docs.databricks.com/en/generative-ai/create-query-vector-search.html#query-a-vector-search-endpoint)
from their documentation.

This PR updates the Langchain vectorstore interface for Databricks to
enable the user to pass the *query_type* parameter to
*similarity_search* to make use of this functionality.
By default, there will not be any changes for existing users of this
interface. To use the new hybrid search feature, it is now possible to
do

```python
# ...
dvs = DatabricksVectorSearch(index)
dvs.similarity_search("my search query", query_type="HYBRID")
```

Or using the retriever:

```python
retriever = dvs.as_retriever(
    search_kwargs={
        "query_type": "HYBRID",
    }
)
retriever.invoke("my search query")
```

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-07-15 22:14:08 +00:00
Christopher Tee
5171ffc026
community(you): Integrate You.com conversational APIs (#23046)
You.com is releasing two new conversational APIs — Smart and Research. 

This PR:
- integrates those APIs with Langchain, as an LLM
- streaming is supported

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-07-15 17:46:58 -04:00
maang-h
6c7d9f93b9
feat: Add ChatTongyi structured output (#24187)
- **Description:** Add `with_structured_output` method to ChatTongyi to
support structured output.
2024-07-15 15:57:21 -04:00
Chen Xiabin
8f4620f4b8
baidu qianfan streaming token_usage (#24117)
Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-15 19:52:31 +00:00
maang-h
9d97de34ae
community[patch]: Improve ChatBaichuan init args and role (#23878)
- **Description:** Improve ChatBaichuan init args and role
   -  ChatBaichuan adds `system` role
   - alias: `baichuan_api_base` -> `base_url`
   - `with_search_enhance` is deprecated
   - Add `max_tokens` argument
2024-07-15 15:17:00 -04:00
mrugank-wadekar
66bebeb76a
partners: add similarity search by image functionality to langchain_chroma partner package (#22982)
- **Description:** This pull request introduces two new methods to the
Langchain Chroma partner package that enable similarity search based on
image embeddings. These methods enhance the package's functionality by
allowing users to search for images similar to a given image URI. Also
introduces a notebook to demonstrate it's use.
  - **Issue:** N/A
  - **Dependencies:** None
  - **Twitter handle:** @mrugank9009

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-07-15 18:48:22 +00:00
Carlos André Antunes
20151384d7
fix azure_openai.py: some keys do not exists (#24158)
In some lines its trying to read a key that do not exists yet. In this
cases I changed the direct access to dict.get() method

Thank you for contributing to LangChain!

- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
2024-07-15 17:17:05 +00:00
Harold Martin
ccdaf14eff
docs: Spell check fixes (#24217)
**Description:** Spell check fixes for docs, comments, and a couple of
strings. No code change e.g. variable names.
**Issue:** none
**Dependencies:** none
**Twitter handle:** hmartin
2024-07-15 15:51:43 +00:00
thehunmonkgroup
e8a21146d3
community[patch]: upgrade default model for ChatAnyscale (#24232)
Old default `meta-llama/Llama-2-7b-chat-hf` no longer supported.
2024-07-15 11:34:59 -04:00
Bagatur
65321bf975
core[patch]: fix ToolCall "type" when streaming (#24218) 2024-07-13 08:59:03 -07:00
Miroslav
aee55eda39
community: Skip Login to HuggubgFaceHub when token is not set (#21561)
Thank you for contributing to LangChain!

- [ ] **HuggingFaceEndpoint**: "Skip Login to HuggingFaceHub"
  - Where:  langchain, community, llm, huggingface_endpoint
 


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Skip login to huggingface hub when when
`huggingfacehub_api_token` is not set. This is needed when using custom
`endpoint_url` outside of HuggingFaceHub.
- **Issue:** the issue # it fixes
https://github.com/langchain-ai/langchain/issues/20342 and
https://github.com/langchain-ai/langchain/issues/19685
    - **Dependencies:** None


- [ ] **Add tests and docs**: 
  1. Tested with locally available TGI endpoint
  2.  Example Usage
```python
from langchain_community.llms import HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    endpoint_url='http://localhost:8080',
    server_kwargs={
        "headers": {"Content-Type": "application/json"}
    }
)
resp = llm.invoke("Tell me a joke")
print(resp)
```
 Also tested against HF Endpoints
 ```python
 from langchain_community.llms import HuggingFaceEndpoint
huggingfacehub_api_token = "hf_xyz"
repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
llm = HuggingFaceEndpoint(
    huggingfacehub_api_token=huggingfacehub_api_token,
    repo_id=repo_id,
)
resp = llm.invoke("Tell me a joke")
print(resp)
 ```
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-07-12 22:10:32 +00:00
Jean Nshuti
d77d9bfc00
community[patch]: update typo document content returned from semanticscholar (#24175)
Update "astract" -> abstract
2024-07-12 15:40:47 +00:00
Tomaz Bratanic
d3a2b9fae0
Fix neo4j type error on missing constraint information (#24177)
If you use `refresh_schema=False`, then the metadata constraint doesn't
exist. ATM, we used default `None` in the constraint check, but then
`any` fails because it can't iterate over None value
2024-07-12 06:39:29 -04:00
thedavgar
ffe6ca986e
community: Fix Bug in Azure Search Vectorstore search asyncronously (#24081)
Thank you for contributing to LangChain!

**Description**:
This PR fixes a bug described in the issue in #24064, when using the
AzureSearch Vectorstore with the asyncronous methods to do search which
is also the method used for the retriever. The proposed change includes
just change the access of the embedding as optional because is it not
used anywhere to retrieve documents. Actually, the syncronous methods of
retrieval do not use the embedding neither.

With this PR the code given by the user in the issue works.

```python
vectorstore = AzureSearch(
    azure_search_endpoint=os.getenv("AI_SEARCH_ENDPOINT_SECRET"),
    azure_search_key=os.getenv("AI_SEARCH_API_KEY"),
    index_name=os.getenv("AI_SEARCH_INDEX_NAME_SECRET"),
    fields=fields,
    embedding_function=encoder,
)

retriever = vectorstore.as_retriever(search_type="hybrid", k=2)

await vectorstore.avector_search("what is the capital of France")
await retriever.ainvoke("what is the capital of France")
```

**Issue**:
The Azure Search Vectorstore is not working when searching for documents
with asyncronous methods, as described in issue #24064

**Dependencies**:
There are no extra dependencies required for this change.

---------

Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
2024-07-11 18:32:19 -07:00
Bagatur
5fd1e67808
core[minor], integrations...[patch]: Support ToolCall as Tool input and ToolMessage as Tool output (#24038)
Changes:
- ToolCall, InvalidToolCall and ToolCallChunk can all accept a "type"
parameter now
- LLM integration packages add "type" to all the above
- Tool supports ToolCall inputs that have "type" specified
- Tool outputs ToolMessage when a ToolCall is passed as input
- Tools can separately specify ToolMessage.content and
ToolMessage.raw_output
- Tools emit events for validation errors (using on_tool_error and
on_tool_end)

Example:
```python
@tool("structured_api", response_format="content_and_raw_output")
def _mock_structured_tool_with_raw_output(
    arg1: int, arg2: bool, arg3: Optional[dict] = None
) -> Tuple[str, dict]:
    """A Structured Tool"""
    return f"{arg1} {arg2}", {"arg1": arg1, "arg2": arg2, "arg3": arg3}


def test_tool_call_input_tool_message_with_raw_output() -> None:
    tool_call: Dict = {
        "name": "structured_api",
        "args": {"arg1": 1, "arg2": True, "arg3": {"img": "base64string..."}},
        "id": "123",
        "type": "tool_call",
    }
    expected = ToolMessage("1 True", raw_output=tool_call["args"], tool_call_id="123")
    tool = _mock_structured_tool_with_raw_output
    actual = tool.invoke(tool_call)
    assert actual == expected

    tool_call.pop("type")
    with pytest.raises(ValidationError):
        tool.invoke(tool_call)

    actual_content = tool.invoke(tool_call["args"])
    assert actual_content == expected.content
```

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-07-11 14:54:02 -07:00
Jacob Lee
f1f1f75782
community[patch]: Make AzureML endpoint return AI messages for type assistant (#24085) 2024-07-11 21:45:30 +02:00
Atul R
457677c1b7
community: Fixes use of ImagePromptTemplate with Ollama (#24140)
Description: ImagePromptTemplate for Multimodal llms like llava when
using Ollama
Twitter handle: https://x.com/a7ulr

Details:

When using llava models / any ollama multimodal llms and passing images
in the prompt as urls, langchain breaks with this error.

```python
image_url_components = image_url.split(",")
                           ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'split'
```

From the looks of it, there was bug where the condition did check for a
`url` field in the variable but missed to actually assign it.

This PR fixes ImagePromptTemplate for Multimodal llms like llava when
using Ollama specifically.

@hwchase17
2024-07-11 11:31:48 -07:00
Matt
8327925ab7
community:support additional Azure Search Options (#24134)
- **Description:** Support additional kwargs options for the Azure
Search client (Described here
https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/core/azure-core/README.md#configurations)
    - **Issue:** N/A
    - **Dependencies:** No additional Dependencies

---------
2024-07-11 18:22:36 +00:00
Eugene Yurtsev
08638ccc88
community[patch]: QianfanLLMEndpoint fix type information for the keys (#24128)
Fix for issue: https://github.com/langchain-ai/langchain/issues/24126
2024-07-11 16:24:26 +00:00
Eugene Yurtsev
1e7d8ba9a6
ci[patch]: Update community linter to provide a helpful error message (#24127)
Update community import linter to explain what's wrong
2024-07-11 16:22:08 +00:00
maang-h
16e178a8c2
docs: Add MiniMaxChat docstrings (#24026)
- **Description:** Add MiniMaxChat rich docstrings.
- **Issue:** the issue #22296
2024-07-11 10:55:02 -04:00
Christophe Bornet
5fc5ef2b52
community[minor]: Add graph store extractors (#24065)
This adds an extractor interface and an implementation for HTML pages.
Extractors are used to create GraphVectorStore Links on loaded content.

**Twitter handle:** cbornet_
2024-07-11 10:35:31 -04:00
maang-h
9bcf8f867d
docs: Add SQLChatMessageHistory docstring (#23978)
- **Description:** Add SQLChatMessageHistory docstring.
- **Issue:** the issue #21983

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-07-11 14:24:28 +00:00
Rafael Pereira
092e9ee0e6
community[minor]: Neo4j Fixed similarity docs (#23913)
**Description:** There was missing some documentation regarding the
`filter` and `params` attributes in similarity search methods.

---------

Co-authored-by: rpereira <rafael.pereira@criticalsoftware.com>
2024-07-11 10:16:48 -04:00
ccurme
975b6129f6
core[patch]: support conversion of runnables to tools (#23992)
Open to other thoughts on UX.

string input:
```python
as_tool = retriever.as_tool()
as_tool.invoke("cat")  # [Document(...), ...]
```

typed dict input:
```python
class Args(TypedDict):
    key: int

def f(x: Args) -> str:
    return str(x["key"] * 2)

as_tool = RunnableLambda(f).as_tool(
    name="my tool",
    description="description",  # name, description are inferred if not supplied
)
as_tool.invoke({"key": 3})  # "6"
```

for untyped dict input, allow specification of parameters + types
```python
def g(x: Dict[str, Any]) -> str:
    return str(x["key"] * 2)

as_tool = RunnableLambda(g).as_tool(arg_types={"key": int})
result = as_tool.invoke({"key": 3})  # "6"
```

Passing the `arg_types` is slightly awkward but necessary to ensure tool
calls populate parameters correctly:
```python
from typing import Any, Dict

from langchain_core.runnables import RunnableLambda
from langchain_openai import ChatOpenAI


def f(x: Dict[str, Any]) -> str:
    return str(x["key"] * 2)

runnable = RunnableLambda(f)
as_tool = runnable.as_tool(arg_types={"key": int})

llm = ChatOpenAI().bind_tools([as_tool])

result = llm.invoke("Use the tool on 3.")
tool_call = result.tool_calls[0]
args = tool_call["args"]
assert args == {"key": 3}

as_tool.run(args)
```

Contrived (?) example with langgraph agent as a tool:
```python
from typing import List, Literal
from typing_extensions import TypedDict

from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent


llm = ChatOpenAI(temperature=0)


def magic_function(input: int) -> int:
    """Applies a magic function to an input."""
    return input + 2


agent_1 = create_react_agent(llm, [magic_function])


class Message(TypedDict):
    role: Literal["human"]
    content: str

agent_tool = agent_1.as_tool(
    arg_types={"messages": List[Message]},
    name="Jeeves",
    description="Ask Jeeves.",
)

agent_2 = create_react_agent(llm, [agent_tool])
```

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-07-10 19:29:59 -04:00
Eugene Yurtsev
c4e149d4f1
community[patch]: Add linter to catch @root_validator (#24070)
- Add linter to prevent further usage of vanilla root validator
- Udpate remaining root validators
2024-07-10 14:51:03 +00:00
ccurme
9c6efadec3
community[patch]: propagate cost information to OpenAI callback (#23996)
This is enabled following
https://github.com/langchain-ai/langchain/pull/22716.
2024-07-10 14:50:35 +00:00
Ethan Yang
13855ef0c3
[HuggingFace Pipeline] add streaming support (#23852) 2024-07-09 17:02:00 -04:00
Erick Friis
e80c150c44
community: release 0.2.7 (prev was langchain) (#23997) 2024-07-08 23:43:32 +00:00
Eugene Yurtsev
f765e8fa9d
core[minor],community[patch],standard-tests[patch]: Move InMemoryImplementation to langchain-core (#23986)
This PR moves the in memory implementation to langchain-core.

* The implementation remains importable from langchain-community.
* Supporting utilities are marked as private for now.
2024-07-08 14:11:51 -07:00
Eugene Yurtsev
aa8c9bb4a9
community[patch]: Add constraint for pdfminer.six to unbreak CI (#23988)
Something changed in pdfminer six. This PR unreaks CI without
fixing the underlying PDF parser.
2024-07-08 20:55:19 +00:00
Eugene Yurtsev
2c180d645e
core[minor],community[minor]: Upgrade all @root_validator() to @pre_init (#23841)
This PR introduces a @pre_init decorator that's a @root_validator(pre=True) but with all the defaults populated!
2024-07-08 16:09:29 -04:00
Rajendra Kadam
8b84457b17
community[minor]: Support PGVector in PebbloRetrievalQA (#23874)
- **Description:** Support PGVector in PebbloRetrievalQA
  - Identity and Semantic Enforcement support for PGVector
  - Refactor Vectorstore validation and name check
  - Clear the overridden identity and semantic enforcement filters
- **Issue:** NA
- **Dependencies:** NA
- **Tests**: NA(already added)
-  **Docs**: Updated
- **Twitter handle:** [@Raj__725](https://twitter.com/Raj__725)
2024-07-05 16:02:25 -04:00
Rajendra Kadam
ee8aa54f53
community[patch]: Fix source path mismatch in PebbloSafeLoader (#23857)
**Description:** Fix for source path mismatch in PebbloSafeLoader. The
fix involves storing the full path in the doc metadata in VectorDB
**Issue:** NA, caught in internal testing
**Dependencies:** NA
**Add tests**:  Updated tests
2024-07-05 15:24:17 -04:00
Christophe Bornet
42d049f618
core[minor]: Add Graph Store component (#23092)
This PR introduces a GraphStore component. GraphStore extends
VectorStore with the concept of links between documents based on
document metadata. This allows linking documents based on a variety of
techniques, including common keywords, explicit links in the content,
and other patterns.

This works with existing Documents, so it’s easy to extend existing
VectorStores to be used as GraphStores. The interface can be implemented
for any Vector Store technology that supports metadata, not only graph
DBs.

When retrieving documents for a given query, the first level of search
is done using classical similarity search. Next, links may be followed
using various traversal strategies to get additional documents. This
allows documents to be retrieved that aren’t directly similar to the
query but contain relevant information.

2 retrieving methods are added to the VectorStore ones : 
* traversal_search which gets all linked documents up to a certain depth
* mmr_traversal_search which selects linked documents using an MMR
algorithm to have more diverse results.

If a depth of retrieval of 0 is used, GraphStore is effectively a
VectorStore. It enables an easy transition from a simple VectorStore to
GraphStore by adding links between documents as a second step.

An implementation for Apache Cassandra is also proposed.

See
https://github.com/datastax/ragstack-ai/blob/main/libs/knowledge-store/notebooks/astra_support.ipynb
for a notebook explaining how to use GraphStore and that shows that it
can answer correctly to questions that a simple VectorStore cannot.

**Twitter handle:** _cbornet
2024-07-05 12:24:10 -04:00
Eugene Yurtsev
6f08e11d7c
core[minor]: add upsert, streaming_upsert, aupsert, astreaming_upsert methods to the VectorStore abstraction (#23774)
This PR rolls out part of the new proposed interface for vectorstores
(https://github.com/langchain-ai/langchain/pull/23544) to existing store
implementations.

The PR makes the following changes:

1. Adds standard upsert, streaming_upsert, aupsert, astreaming_upsert
methods to the vectorstore.
2. Updates `add_texts` and `aadd_texts` to be non required with a
default implementation that delegates to `upsert` and `aupsert` if those
have been implemented. The original `add_texts` and `aadd_texts` methods
are problematic as they spread object specific information across
document and **kwargs. (e.g., ids are not a part of the document)
3. Adds a default implementation to `add_documents` and `aadd_documents`
that delegates to `upsert` and `aupsert` respectively.
4. Adds standard unit tests to verify that a given vectorstore
implements a correct read/write API.

A downside of this implementation is that it creates `upsert` with a
very similar signature to `add_documents`.
The reason for introducing `upsert` is to:
* Remove any ambiguities about what information is allowed in `kwargs`.
Specifically kwargs should only be used for information common to all
indexed data. (e.g., indexing timeout).
*Allow inheriting from an anticipated generalized interface for indexing
that will allow indexing `BaseMedia` (i.e., allow making a vectorstore
for images/audio etc.)
 
`add_documents` can be deprecated in the future in favor of `upsert` to
make sure that users have a single correct way of indexing content.

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-07-05 12:21:40 -04:00
Philippe PRADOS
289960bc60
community[patch]: Redis.delete should be a regular method not a static method (#23873)
The `langchain_common.vectostore.Redis.delete()` must not be a
`@staticmethod`.

With the current implementation, it's not possible to have multiple
instances of Redis vectorstore because all versions must share the
`REDIS_URL`.

It's not conform with the base class.
2024-07-05 12:04:58 -04:00
Klaudia Lemiec
a2082bc1f8
docs: Arxiv docs update (#23871)
- [X] **PR title**
- [X] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** Update of docstrings and docpages
- **Issue:**
[22866](https://github.com/langchain-ai/langchain/issues/22866)

- [X] **Add tests and docs**

- [X] **Lint and test**
2024-07-05 11:43:51 -04:00
André Quintino
99b1467b63
community: add support for 'cloud' parameter in JiraAPIWrapper (#23057)
- **Description:** Enhance JiraAPIWrapper to accept the 'cloud'
parameter through an environment variable. This update allows more
flexibility in configuring the environment for the Jira API.
 - **Twitter handle:** Andre_Q_Pereira

---------

Co-authored-by: André Quintino <andre.quintino@tui.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-05 15:11:10 +00:00
wenngong
b1e90b3075
community: add model_name param valid for GPT4AllEmbeddings (#23867)
Description: add model_name param valid for GPT4AllEmbeddings

Issue: #23863 #22819

---------

Co-authored-by: gongwn1 <gongwn1@lenovo.com>
2024-07-05 10:46:34 -04:00
volodymyr-memsql
a4eb6d0fb1
community: add SingleStoreDB semantic cache (#23218)
This PR adds a `SingleStoreDBSemanticCache` class that implements a
cache based on SingleStoreDB vector store, integration tests, and a
notebook example.

Additionally, this PR contains minor changes to SingleStoreDB vector
store:
 - change add texts/documents methods to return a list of inserted ids
 - implement delete(ids) method to delete documents by list of ids
 - added drop() method to drop a correspondent database table
- updated integration tests to use and check functionality implemented
above


CC: @baskaryan, @hwchase17

---------

Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
2024-07-05 09:26:06 -04:00
Igor Drozdov
bb597b1286
feat(community): add bind_tools function for ChatLiteLLM (#23823)
It's a follow-up to https://github.com/langchain-ai/langchain/pull/23765

Now the tools can be bound by calling `bind_tools`

```python
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_core.utils.function_calling import convert_to_openai_tool
from langchain_community.chat_models import ChatLiteLLM

class GetWeather(BaseModel):
    '''Get the current weather in a given location'''

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")

class GetPopulation(BaseModel):
    '''Get the current population in a given location'''

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")

prompt = "Which city is hotter today and which is bigger: LA or NY?"
# tools = [convert_to_openai_tool(GetWeather), convert_to_openai_tool(GetPopulation)]
tools = [GetWeather, GetPopulation]

llm = ChatLiteLLM(model="claude-3-sonnet-20240229").bind_tools(tools)
ai_msg = llm.invoke(prompt)
print(ai_msg.tool_calls)
```

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

Co-authored-by: Igor Drozdov <idrozdov@gitlab.com>
2024-07-05 09:19:41 -04:00
Jiejun Tan
2be66a38d8
huggingface: Fix huggingface tei support (#22653)
Update former pull request:
https://github.com/langchain-ai/langchain/pull/22595.

Modified
`libs/partners/huggingface/langchain_huggingface/embeddings/huggingface_endpoint.py`,
where the API call function does not match current [Text Embeddings
Inference
API](https://huggingface.github.io/text-embeddings-inference/#/Text%20Embeddings%20Inference/embed).
One example is:
```json
{
  "inputs": "string",
  "normalize": true,
  "truncate": false
}
```
Parameters in `_model_kwargs` are not passed properly in the latest
version. By the way, the issue *[why cause 413?
#50](https://github.com/huggingface/text-embeddings-inference/issues/50)*
might be solved.
2024-07-03 13:30:29 -07:00
Ikko Eltociear Ashimine
75734fbcf1
community: fix typo in unit tests for test_zenguard.py (#23819)
enviroment -> environment


- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"
2024-07-03 14:05:42 -04:00
Bagatur
a0c2281540
infra: update mypy 1.10, ruff 0.5 (#23721)
```python
"""python scripts/update_mypy_ruff.py"""
import glob
import tomllib
from pathlib import Path

import toml
import subprocess
import re

ROOT_DIR = Path(__file__).parents[1]


def main():
    for path in glob.glob(str(ROOT_DIR / "libs/**/pyproject.toml"), recursive=True):
        print(path)
        with open(path, "rb") as f:
            pyproject = tomllib.load(f)
        try:
            pyproject["tool"]["poetry"]["group"]["typing"]["dependencies"]["mypy"] = (
                "^1.10"
            )
            pyproject["tool"]["poetry"]["group"]["lint"]["dependencies"]["ruff"] = (
                "^0.5"
            )
        except KeyError:
            continue
        with open(path, "w") as f:
            toml.dump(pyproject, f)
        cwd = "/".join(path.split("/")[:-1])
        completed = subprocess.run(
            "poetry lock --no-update; poetry install --with typing; poetry run mypy . --no-color",
            cwd=cwd,
            shell=True,
            capture_output=True,
            text=True,
        )
        logs = completed.stdout.split("\n")

        to_ignore = {}
        for l in logs:
            if re.match("^(.*)\:(\d+)\: error:.*\[(.*)\]", l):
                path, line_no, error_type = re.match(
                    "^(.*)\:(\d+)\: error:.*\[(.*)\]", l
                ).groups()
                if (path, line_no) in to_ignore:
                    to_ignore[(path, line_no)].append(error_type)
                else:
                    to_ignore[(path, line_no)] = [error_type]
        print(len(to_ignore))
        for (error_path, line_no), error_types in to_ignore.items():
            all_errors = ", ".join(error_types)
            full_path = f"{cwd}/{error_path}"
            try:
                with open(full_path, "r") as f:
                    file_lines = f.readlines()
            except FileNotFoundError:
                continue
            file_lines[int(line_no) - 1] = (
                file_lines[int(line_no) - 1][:-1] + f"  # type: ignore[{all_errors}]\n"
            )
            with open(full_path, "w") as f:
                f.write("".join(file_lines))

        subprocess.run(
            "poetry run ruff format .; poetry run ruff --select I --fix .",
            cwd=cwd,
            shell=True,
            capture_output=True,
            text=True,
        )


if __name__ == "__main__":
    main()

```
2024-07-03 10:33:27 -07:00
Oguz Vuruskaner
2a2c0d1a94
community[deepinfra]: fix tool call parsing. (#23162)
This PR includes fix for DeepInfra tool call parsing.
2024-07-03 12:11:37 -04:00
maang-h
525109e506
feat: Implement ChatBaichuan asynchronous interface (#23589)
- **Description:** Add interface to `ChatBaichuan` to support
asynchronous requests
    - `_agenerate` method
    - `_astream` method

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-07-03 12:10:04 -04:00
Qingchuan Hao
5cd4083457
community: make bing web search as the only option (#23523)
This PR make bing web search as the option for BingSearchAPIWrapper to
facilitate and simply the user interface on Langchain.
This is a follow-up work of
https://github.com/langchain-ai/langchain/pull/23306.
2024-07-02 17:13:54 -04:00
maang-h
e4e28a6ff5
community[patch]: Fix MiniMaxChat validate_environment error (#23770)
- **Description:** Fix some issues in MiniMaxChat 
  - Fix `minimax_api_host` not in `values` error
- Remove `minimax_group_id` from reading environment variables, the
`minimax_group_id` no longer use in MiniMaxChat
  - Invoke callback prior to yielding token, the issus #16913
2024-07-02 13:23:32 -04:00
Eugene Yurtsev
46ff0f7a3c
community[patch]: Update @root_validators to use explicit pre=True or pre=False (#23737) 2024-07-02 10:47:21 -04:00
Igor Drozdov
b664dbcc36
feat(community): add support for tool_calls response (#23765)
When `model_kwargs={"tools": tools}` are passed to `ChatLiteLLM`, they
are executed, but the response is not recognized correctly

Let's add `tool_calls` to the `additional_kwargs`

Thank you for contributing to LangChain!

## ChatAnthropic

I used the following example to verify the output of llm with tools:

```python
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_anthropic import ChatAnthropic

class GetWeather(BaseModel):
    '''Get the current weather in a given location'''

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")

class GetPopulation(BaseModel):
    '''Get the current population in a given location'''

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")

llm = ChatAnthropic(model="claude-3-sonnet-20240229")
llm_with_tools = llm.bind_tools([GetWeather, GetPopulation])
ai_msg = llm_with_tools.invoke("Which city is hotter today and which is bigger: LA or NY?")
print(ai_msg.tool_calls)
```

I get the following response:

```json
[{'name': 'GetWeather', 'args': {'location': 'Los Angeles, CA'}, 'id': 'toolu_01UfDA89knrhw3vFV9X47neT'}, {'name': 'GetWeather', 'args': {'location': 'New York, NY'}, 'id': 'toolu_01NrYVRYae7m7z7tBgyPb3Gd'}, {'name': 'GetPopulation', 'args': {'location': 'Los Angeles, CA'}, 'id': 'toolu_01EPFEpDgzL6vV2dTpD9SVP5'}, {'name': 'GetPopulation', 'args': {'location': 'New York, NY'}, 'id': 'toolu_01B5J6tPJXgwwfhQX9BHP2dt'}]
```

## LiteLLM

Based on https://litellm.vercel.app/docs/completion/function_call

```python
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_core.utils.function_calling import convert_to_openai_tool
import litellm

class GetWeather(BaseModel):
    '''Get the current weather in a given location'''

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")

class GetPopulation(BaseModel):
    '''Get the current population in a given location'''

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")

prompt = "Which city is hotter today and which is bigger: LA or NY?"
tools = [convert_to_openai_tool(GetWeather), convert_to_openai_tool(GetPopulation)]

response = litellm.completion(model="claude-3-sonnet-20240229", messages=[{'role': 'user', 'content': prompt}], tools=tools)
print(response.choices[0].message.tool_calls)
```

```python
[ChatCompletionMessageToolCall(function=Function(arguments='{"location": "Los Angeles, CA"}', name='GetWeather'), id='toolu_01HeDWV5vP7BDFfytH5FJsja', type='function'), ChatCompletionMessageToolCall(function=Function(arguments='{"location": "New York, NY"}', name='GetWeather'), id='toolu_01EiLesUSEr3YK1DaE2jxsQv', type='function'), ChatCompletionMessageToolCall(function=Function(arguments='{"location": "Los Angeles, CA"}', name='GetPopulation'), id='toolu_01Xz26zvkBDRxEUEWm9pX6xa', type='function'), ChatCompletionMessageToolCall(function=Function(arguments='{"location": "New York, NY"}', name='GetPopulation'), id='toolu_01SDqKnsLjvUXuBsgAZdEEpp', type='function')]
```

## ChatLiteLLM

When I try the following

```python
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_core.utils.function_calling import convert_to_openai_tool
from langchain_community.chat_models import ChatLiteLLM

class GetWeather(BaseModel):
    '''Get the current weather in a given location'''

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")

class GetPopulation(BaseModel):
    '''Get the current population in a given location'''

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")

prompt = "Which city is hotter today and which is bigger: LA or NY?"
tools = [convert_to_openai_tool(GetWeather), convert_to_openai_tool(GetPopulation)]

llm = ChatLiteLLM(model="claude-3-sonnet-20240229", model_kwargs={"tools": tools})
ai_msg = llm.invoke(prompt)
print(ai_msg)
print(ai_msg.tool_calls)
```

```python
content="Okay, let's find out the current weather and populations for Los Angeles and New York City:" response_metadata={'token_usage': Usage(prompt_tokens=329, completion_tokens=193, total_tokens=522), 'model': 'claude-3-sonnet-20240229', 'finish_reason': 'tool_calls'} id='run-748b7a84-84f4-497e-bba1-320bd4823937-0'
[]
```

---

When I apply the changes of this PR, the output is

```json
[{'name': 'GetWeather', 'args': {'location': 'Los Angeles, CA'}, 'id': 'toolu_017D2tGjiaiakB1HadsEFZ4e'}, {'name': 'GetWeather', 'args': {'location': 'New York, NY'}, 'id': 'toolu_01WrDpJfVqLkPejWzonPCbLW'}, {'name': 'GetPopulation', 'args': {'location': 'Los Angeles, CA'}, 'id': 'toolu_016UKyYrVAV9Pz99iZGgGU7V'}, {'name': 'GetPopulation', 'args': {'location': 'New York, NY'}, 'id': 'toolu_01Sgv1imExFX1oiR1Cw88zKy'}]
```

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

Co-authored-by: Igor Drozdov <idrozdov@gitlab.com>
2024-07-02 10:42:08 -04:00
Eugene Yurtsev
338cef35b4
community[patch]: update @root_validator in utilities namespace (#23768)
Update all utilities to use `pre=True` or `pre=False`

https://github.com/langchain-ai/langchain/issues/22819
2024-07-02 14:33:01 +00:00
antonpibm
ffde8a6a09
Milvus vectorstore: fix pass ids as argument after upsert (#23761)
**Description**: Milvus vectorstore supports both `add_documents` via
the base class and `upsert` method which deletes and re-adds documents
based on their ids

**Issue**: Due to mismatch in the interfaces the ids used by `upsert`
are neglected in `add_documents`, as `ids` are passed as argument in
`upsert` but via `kwargs` is `add_documents`

This caused exceptions and inconsistency in the DB, tested with
`auto_id=False`

**Fix**: pass `ids` via `kwargs` to `add_documents`
2024-07-02 13:45:30 +00:00
Eugene Yurtsev
d084172b63
community[patch]: root validator set explicit pre=False or pre=True (#23764)
See issue: https://github.com/langchain-ai/langchain/issues/22819
2024-07-02 09:42:05 -04:00
mattthomps1
cc55823486
docs: updated PPLX model (#23723)
Description: updated pplx docs to reference a currently [supported
model](https://docs.perplexity.ai/docs/model-cards). pplx-70b-online
->llama-3-sonar-small-32k-online

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-02 08:48:49 -04:00
Jacob Lee
7791d92711
community[patch]: Fix requests alias for load_tools (#23734)
CC @baskaryan
2024-07-01 15:02:14 -07:00
Eugene Yurtsev
f24e38876a
community[patch]: Update root_validators to use explicit pre=True or pre=False (#23736) 2024-07-01 17:13:23 -04:00
Eugene Yurtsev
5d2262af34
community[patch]: Update root_validators to use pre=True or pre=False (#23731)
Update root_validators in preparation for pydantic 2 migration.
2024-07-01 20:10:15 +00:00
maang-h
96af8f31ae
community[patch]: Invoke callback prior to yielding token (#23638)
- **Description:** Invoke callback prior to yielding token in stream and
astream methods for ChatZhipuAI.
- **Issue:** the issue #16913
2024-07-01 18:12:24 +00:00
Valentin
bf402f902e
community: Fix LanceDB similarity search bug (#23591)
**Description:** LanceDB didn't allow querying the database using
similarity score thresholds because the metrics value was missing. This
PR simply fixes that bug.
**Issue:** not applicable
**Dependencies:** none
**Twitter handle:** not available

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-07-01 16:33:45 +00:00
Rafael Pereira
4b9517db85
Jira: Allow Jira access using only the token (#23708)
- **Description:** At the moment the Jira wrapper only accepts the the
usage of the Username and Password/Token at the same time. However Jira
allows the connection using only is useful for enterprise context.

Co-authored-by: rpereira <rafael.pereira@criticalsoftware.com>
2024-07-01 13:13:51 +00:00
Tim Van Wassenhove
24916c6703
community: Register pandas df in duckdb when creating vector_store (#23690)
- **Description:** Register pandas df in duckdb when creating
vector_store
- **Issue:** Resolves #23308
- **Dependencies:** None
- **Twitter handle:** @timvw

Co-authored-by: Tim Van Wassenhove <tim.van.wassenhove@telenetgroup.be>
2024-07-01 09:12:06 -04:00
Bagatur
fc8fd49328
openai, anthropic, ...: with_structured_output to pass in explicit tool choice (#23645)
...community, mistralai, groq, fireworks

part of #23644
2024-06-28 16:39:53 -07:00
Bagatur
381aedcc61
docs: standardize azure openai page (#23642)
part of #22296
2024-06-28 15:15:41 -07:00
Vadym Barda
e8d77002ea
core: add RemoveMessage (#23636)
This change adds a new message type `RemoveMessage`. This will enable
`langgraph` users to manually modify graph state (or have the graph
nodes modify the state) to remove messages by `id`

Examples:

* allow users to delete messages from state by calling

```python
graph.update_state(config, values=[RemoveMessage(id=state.values[-1].id)])
```

* allow nodes to delete messages

```python
graph.add_node("delete_messages", lambda state: [RemoveMessage(id=state[-1].id)])
```
2024-06-28 14:40:02 -07:00
ccurme
8fce8c6771
community: fix extended tests (#23640) 2024-06-28 16:35:38 -04:00
Jacob Lee
a032583b17
docs[patch]: Update diagrams (#23613) 2024-06-28 12:36:00 -07:00
j pradhan
5f21eab491
community:perplexity[patch]: standardize init args (#21794)
updated request_timeout default alias value per related docstring.

Related to
[20085](https://github.com/langchain-ai/langchain/issues/20085)

Thank you for contributing to LangChain!

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-06-28 13:26:12 +00:00
mackong
11483b0fb8
community[patch]: set tool name for tongyi&qianfan llm (#22889)
- **Description:** The name of ToolMessage is default to None, which
makes tool message send to LLM likes
 ```json
{"role": "tool",
   "tool_call_id": "",
   "content": "{\"time\": \"12:12\"}",
   "name": null}
```
But the name seems essential for some LLMs like TongYi Qwen. so we need to set the name use agent_action's tool value.
  - **Issue:** N/A
  - **Dependencies:** N/A
2024-06-28 09:17:05 -04:00
Leonid Ganeline
e4caa41aa9
community: docstrings toolkits (#23616)
Added missed docstrings. Formatted docstrings to the consistent form.
2024-06-28 08:40:52 -04:00
ccurme
adf2dc13de
community: fix lint (#23611) 2024-06-27 22:12:16 +00:00
Eugene Yurtsev
68f348357e
community[patch]: Test InMemoryVectorStore with RWAPI test suite (#23603)
Add standard test suite to InMemoryVectorStore implementation.
2024-06-27 16:43:43 -04:00
NG Sai Prasanth
5e6d23f27d
community: Standardise tool import for arxiv & semantic scholar (#23578)
- **Description:** Fixing the way users have to import Arxiv and
Semantic Scholar
- **Issue:** Changed to use `from langchain_community.tools.arxiv import
ArxivQueryRun` instead of `from langchain_community.tools.arxiv.tool
import ArxivQueryRun`
    - **Dependencies:** None
    - **Twitter handle:** Nope
2024-06-27 16:35:50 -04:00
Ayo Ayibiowu
c6f700b7cb
fix(community): allow support for disabling max_tokens args (#21534)
This PR fixes an issue with not able to use unlimited/infinity tokens
from the respective provider for the LiteLLM provider.

This is an issue when working in an agent environment that the token
usage can drastically increase beyond the initial value set causing
unexpected behavior.
2024-06-27 16:28:59 -04:00
mackong
70834cd741
community[patch]: support convert FunctionMessage for Tongyi (#23569)
**Description:** For function call agent with Tongyi, cause the
AgentAction will be converted to FunctionMessage by

47f69fe0d8/libs/core/langchain_core/agents.py (L188)
But now Tongyi's *convert_message_to_dict* doesn't support
FunctionMessage

47f69fe0d8/libs/community/langchain_community/chat_models/tongyi.py (L184-L207)
Then next round conversation will be failed by the *TypeError*
exception.

This patch adds the support to convert FunctionMessage for Tongyi.

**Issue:** N/A
**Dependencies:** N/A
2024-06-27 15:49:26 -04:00
Jacob Lee
60fc15a56b
docs[patch]: Update docs introduction and README (#23558)
CC @hwchase17 @baskaryan
2024-06-27 08:51:43 -07:00
maang-h
5070004e8a
docs: Update Tongyi ChatModel docstring (#23540)
- **Description:** Update Tongyi ChatModel rich docstring
- **Issue:** the issue #22296
2024-06-26 13:07:13 -04:00
yonarw
6d0ebbca1e
community: SAP HANA Vector Engine fix for latest HANA release (#23516)
- **Description:** This PR fixes an issue with SAP HANA Cloud QRC03
version. In that version the number to indicate no length being set for
a vector column changed from -1 to 0. The change in this PR support both
behaviours (old/new).
- **Dependencies:** No dependencies have been introduced.

- **Tests**:  The change is covered by previous unit tests.
2024-06-26 13:15:51 +00:00
Alireza Kashani
c39521b70d
Update grobid.py (#23399)
fixed potential `IndexError: list index out of range` in case there is
no title

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-06-26 09:11:02 -04:00
Qingchuan Hao
ee282a1d2e
community: add missing link (#23526) 2024-06-26 09:06:28 -04:00
ccurme
99ce84ef23
community: release 0.2.6 (#23501) 2024-06-25 21:29:52 +00:00
Nuradil
c93d9e66e4
Community: Update and fix ZenGuardTool docs and add ZenguardTool to init files (#23415)
Thank you for contributing to LangChain!

- [x] **PR title**: "community: update docs and add tool to init.py"

- [x] **PR message**: 
- **Description:** Fixed some errors and comments in the docs and added
our ZenGuardTool and additional classes to init.py for easy access when
importing
- **Question:** when will you update the langchain-community package in
pypi to make our tool available?


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Thank you for review!

---------

Co-authored-by: Baur <baur.krykpayev@gmail.com>
2024-06-25 19:26:32 +00:00
Qingchuan Hao
ad50702934
community: add default value to bing_search_url (#23306)
bing_search_url is an endpoint to requests bing search resource and is
normally invariant to users, we can give it the default value to simply
the uesages of this utility/tool
2024-06-25 08:08:41 -04:00
wenngong
b33d2346db
community: FlashrankRerank support loading customer client (#23350)
Description: FlashrankRerank Document compressor support loading
customer client
Issue: #23338

Co-authored-by: gongwn1 <gongwn1@lenovo.com>
2024-06-24 17:50:08 -04:00
maang-h
f58c40b4e3
docs: Update QianfanChatEndpoint ChatModel docstring (#23337)
- **Description:** Update QianfanChatEndpoint ChatModel rich docstring
- **Issue:** the issue #22296
2024-06-24 17:42:46 -04:00
Rahul Triptahi
9ef93ecd7c
community[minor]: Added classification_location parameter in PebbloSafeLoader. (#22565)
Description: Add classifier_location feature flag. This flag enables
Pebblo to decide the classifier location, local or pebblo-cloud.
Unit Tests: N/A
Documentation: N/A

---------

Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
2024-06-24 17:30:38 -04:00
yuncliu
398b2b9c51
community[minor]: Add Ascend NPU optimized Embeddings (#20260)
- **Description:** Add NPU support for embeddings

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-06-24 20:15:11 +00:00
RUO
2b87e330b0
community: fix issue with nested field extraction in MongodbLoader (#22801)
**Description:** 
This PR addresses an issue in the `MongodbLoader` where nested fields
were not being correctly extracted. The loader now correctly handles
nested fields specified in the `field_names` parameter.

**Issue:** 
Fixes an issue where attempting to extract nested fields from MongoDB
documents resulted in `KeyError`.

**Dependencies:** 
No new dependencies are required for this change.

**Twitter handle:** 
(Optional, your Twitter handle if you'd like a mention when the PR is
announced)

### Changes
1. **Field Name Parsing**:
- Added logic to parse nested field names and safely extract their
values from the MongoDB documents.

2. **Projection Construction**:
- Updated the projection dictionary to include nested fields correctly.

3. **Field Extraction**:
- Updated the `aload` method to handle nested field extraction using a
recursive approach to traverse the nested dictionaries.

### Example Usage
Updated usage example to demonstrate how to specify nested fields in the
`field_names` parameter:

```python
loader = MongodbLoader(
    connection_string=MONGO_URI,
    db_name=MONGO_DB,
    collection_name=MONGO_COLLECTION,
    filter_criteria={"data.job.company.industry_name": "IT", "data.job.detail": { "$exists": True }},
    field_names=[
        "data.job.detail.id",
        "data.job.detail.position",
        "data.job.detail.intro",
        "data.job.detail.main_tasks",
        "data.job.detail.requirements",
        "data.job.detail.preferred_points",
        "data.job.detail.benefits",
    ],
)

docs = loader.load()
print(len(docs))
for doc in docs:
    print(doc.page_content)
```
### Testing
Tested with a MongoDB collection containing nested documents to ensure
that the nested fields are correctly extracted and concatenated into a
single page_content string.
### Note
This change ensures backward compatibility for non-nested fields and
improves functionality for nested field extraction.
### Output Sample
```python
print(docs[:3])
```
```shell
# output sample:
[
    Document(
        # Here in this example, page_content is the combined text from the fields below
        # "position", "intro", "main_tasks", "requirements", "preferred_points", "benefits"
        page_content='all combined contents from the requested fields in the document',
        metadata={'database': 'Your Database name', 'collection': 'Your Collection name'}
    ),
    ...
]
```

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-06-24 19:29:11 +00:00
Tomaz Bratanic
aeeda370aa
Sanitize backticks from neo4j labels and types for import (#23367) 2024-06-24 19:05:31 +00:00
Rave Harpaz
f5ff7f178b
Add OCI Generative AI new model support (#22880)
- [x] PR title: 
community: Add OCI Generative AI new model support
 
- [x] PR message:
- Description: adding support for new models offered by OCI Generative
AI services. This is a moderate update of our initial integration PR
16548 and includes a new integration for our chat models under
/langchain_community/chat_models/oci_generative_ai.py
    - Issue: NA
- Dependencies: No new Dependencies, just latest version of our OCI sdk
    - Twitter handle: NA


- [x] Add tests and docs: 
  1. we have updated our unit tests
2. we have updated our documentation including a new ipynb for our new
chat integration


- [x] Lint and test: 
 `make format`, `make lint`, and `make test` run successfully

---------

Co-authored-by: RHARPAZ <RHARPAZ@RHARPAZ-5750.us.oracle.com>
Co-authored-by: Arthur Cheng <arthur.cheng@oracle.com>
2024-06-24 14:48:23 -04:00
Baur
aa358f2be4
community: Add ZenGuard tool (#22959)
** Description**
This is the community integration of ZenGuard AI - the fastest
guardrails for GenAI applications. ZenGuard AI protects against:

- Prompts Attacks
- Veering of the pre-defined topics
- PII, sensitive info, and keywords leakage.
- Toxicity
- Etc.

**Twitter Handle** : @zenguardai

- [x] **Add tests and docs**: If you're adding a new integration, please
include
  1. Added an integration test
  2. Added colab


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified.

---------

Co-authored-by: Nuradil <nuradil.maksut@icloud.com>
Co-authored-by: Nuradil <133880216+yaksh0nti@users.noreply.github.com>
2024-06-24 17:40:56 +00:00
Mathis Joffre
60103fc4a5
community: Fix OVHcloud 401 Unauthorized on embedding. (#23260)
They are now rejecting with code 401 calls from users with expired or
invalid tokens (while before they were being considered anonymous).
Thus, the authorization header has to be removed when there is no token.

Related to: #23178

---------

Signed-off-by: Joffref <mariusjoffre@gmail.com>
2024-06-24 12:58:32 -04:00
Leonid Ganeline
987099cfcd
community: toolkits docstrings (#23286)
Added missed docstrings. Formatted docstrings to the consistent form.

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-06-22 14:37:52 +00:00
Rahul Triptahi
0cd3f93361
Enhance metadata of sharepointLoader. (#22248)
Description: 2 feature flags added to SharePointLoader in this PR:

1. load_auth: if set to True, adds authorised identities to metadata
2. load_extended_metadata, adds source, owner and full_path to metadata

Unit tests:N/A
Documentation: To be done.

---------

Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
2024-06-21 17:03:38 -07:00
Rajendra Kadam
7ee2822ec2
community: Fix TypeError in PebbloRetrievalQA (#23170)
**Description:** 
Fix "`TypeError: 'NoneType' object is not iterable`" when the
auth_context is absent in PebbloRetrievalQA. The auth_context is
optional; hence, PebbloRetrievalQA should work without it, but it throws
an error at the moment. This PR fixes that issue.

**Issue:** NA
**Dependencies:** None
**Unit tests:** NA

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-06-21 17:04:00 -04:00
Iurii Umnov
3b7b933aa2
community[minor]: OpenAPI agent. Add support for PUT, DELETE and PATCH (#22962)
**Description**: Add PUT, DELETE and PATCH tools to tool list for
OpenAPI agent if dangerous requests are allowed.

**Issue**: https://github.com/langchain-ai/langchain/issues/20469
2024-06-21 20:44:23 +00:00
Guangdong Liu
3c42bf8d97
community(patch):Fix PineconeHynridSearchRetriever not having search_kwargs (#21577)
- close #21521
2024-06-21 16:27:52 -04:00
Rahul Triptahi
4bb3d5c488
[community][quick-fix]: changed from blob.path to blob.path.name in 0365BaseLoader. (#22287)
Description: file_metadata_ was not getting propagated to returned
documents. Changed the lookup key to the name of the blob's path.
Changed blob.path key to blob.path.name for metadata_dict key lookup.
Documentation: N/A
Unit tests: N/A

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-06-21 15:51:03 -04:00
mackong
360a70c8a8
core[patch]: fix no current event loop for sql history in async mode (#22933)
- **Description:** When use
RunnableWithMessageHistory/SQLChatMessageHistory in async mode, we'll
get the following error:
```
Error in RootListenersTracer.on_chain_end callback: RuntimeError("There is no current event loop in thread 'asyncio_3'.")
```
which throwed by
ddfbca38df/libs/community/langchain_community/chat_message_histories/sql.py (L259).
and no message history will be add to database.

In this patch, a new _aexit_history function which will'be called in
async mode is added, and in turn aadd_messages will be called.

In this patch, we use `afunc` attribute of a Runnable to check if the
end listener should be run in async mode or not.

  - **Issue:** #22021, #22022 
  - **Dependencies:** N/A
2024-06-21 10:39:47 -04:00
Zheng Robert Jia
a349fce880
docs[minor],community[patch]: Minor tutorial docs improvement, minor import error quick fix. (#22725)
minor changes to module import error handling and minor issues in
tutorial documents.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2024-06-20 15:36:49 -04:00
Luis Moros
d5be160af0
community[patch]: Fix sql_databse.from_databricks issue when ran from Job (#23224)
**Desscription**: When the ``sql_database.from_databricks`` is executed
from a Workflow Job, the ``context`` object does not have a
"browserHostName" property, resulting in an error. This change manages
the error so the "DATABRICKS_HOST" env variable value is used instead of
stoping the flow

Co-authored-by: lmorosdb <lmorosdb>
2024-06-20 19:34:15 +00:00
maang-h
bc4cd9c5cc
community[patch]: Update root_validators ChatModels: ChatBaichuan, QianfanChatEndpoint, MiniMaxChat, ChatSparkLLM, ChatZhipuAI (#22853)
This PR updates root validators for:

- ChatModels: ChatBaichuan, QianfanChatEndpoint, MiniMaxChat,
ChatSparkLLM, ChatZhipuAI

Issues #22819

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-06-20 16:36:41 +00:00
Leonid Ganeline
51e75cf59d
community: docstrings (#23202)
Added missed docstrings. Format docstrings to the consistent format
(used in the API Reference)
2024-06-20 11:08:13 -04:00
xyd
9b3a025f9c
fix https://github.com/langchain-ai/langchain/issues/23215 (#23216)
fix bug 
The ZhipuAIEmbeddings class is not working.

Co-authored-by: xu yandong <shaonian@acsx1.onexmail.com>
2024-06-20 13:04:50 +00:00
Michał Krassowski
710197e18c
community[patch]: restore compatibility with SQLAlchemy 1.x (#22546)
- **Description:** Restores compatibility with SQLAlchemy 1.4.x that was
broken since #18992 and adds a test run for this version on CI (only for
Python 3.11)
- **Issue:** fixes #19681
- **Dependencies:** None
- **Twitter handle:** `@krassowski_m`

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-06-19 17:58:57 +00:00
Jorge Piedrahita Ortiz
b3e53ffca0
community[patch]: sambanova llm integration improvement (#23137)
- **Description:** sambanova sambaverse integration improvement: removed
input parsing that was changing raw user input, and was making to use
process prompt parameter as true mandatory
2024-06-19 10:30:14 -07:00
Jorge Piedrahita Ortiz
e162893d7f
community[patch]: update sambastudio embeddings (#23133)
Description: update sambastudio embeddings integration, now compatible
with generic endpoints and CoE endpoints
2024-06-19 10:26:56 -07:00
chenxi
505a2e8743
fix: MoonshotChat fails when setting the moonshot_api_key through the OS environment. (#23176)
Close #23174

Co-authored-by: tianming <tianming@bytenew.com>
2024-06-19 16:28:24 +00:00
Eugene Yurtsev
1007a715a5
community[patch]: Prevent unit tests from making network requests (#23180)
* Prevent unit tests from making network requests
2024-06-19 14:56:30 +00:00
ccurme
ca798bc6ea
community: move test to integration tests (#23178)
Tests failing on master with

> FAILED
tests/unit_tests/embeddings/test_ovhcloud.py::test_ovhcloud_embed_documents
- ValueError: Request failed with status code: 401, {"message":"Bad
token; invalid JSON"}
2024-06-19 14:39:48 +00:00
鹿鹿鹿鲨
6b46b5e9ce
community: add **request_kwargs and expect TimeError AsyncHtmlLoader (#23068)
- **Description:** add `**request_kwargs` and expect `TimeError` in
`_fetch` function for AsyncHtmlLoader. This allows you to fill in the
kwargs parameter when using the `load()` method of the `AsyncHtmlLoader`
class.

Co-authored-by: Yucolu <yucolu@tencent.com>
2024-06-18 20:02:46 -07:00
Artem Mukhin
e271f75bee
docs: Fix URL formatting in deprecation warnings (#23075)
**Description**

Updated the URLs in deprecation warning messages. The URLs were
previously written as raw strings and are now formatted to be clickable
HTML links.

Example of a broken link in the current API Reference:
https://api.python.langchain.com/en/latest/chains/langchain.chains.openai_functions.extraction.create_extraction_chain_pydantic.html

<img width="942" alt="Screenshot 2024-06-18 at 13 21 07"
src="https://github.com/langchain-ai/langchain/assets/4854600/a1b1863c-cd03-4af2-a9bc-70375407fb00">
2024-06-18 14:49:58 -04:00
Gabriel Petracca
c6660df58e
community[minor]: Implement Doctran async execution (#22372)
**Description**

The DoctranTextTranslator has an async transform function that was not
implemented because [the doctran
library](https://github.com/psychic-api/doctran) uses a sync version of
the `execute` method.

- I implemented the `DoctranTextTranslator.atransform_documents()`
method using `asyncio.to_thread` to run the function in a separate
thread.
- I updated the example in the Notebook with the new async version.
- The performance improvements can be appreciated when a big document is
divided into multiple chunks.

Relates to:
- Issue #14645: https://github.com/langchain-ai/langchain/issues/14645
- Issue #14437: https://github.com/langchain-ai/langchain/issues/14437
- https://github.com/langchain-ai/langchain/pull/15264

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-06-18 18:17:37 +00:00
nold
226802f0c4
community: add args_schema to SearxSearch (#22954)
This change adds args_schema (pydantic BaseModel) to SearxSearchRun for
correct schema formatting on LLM function calls

Issue: currently using SearxSearchRun with OpenAI function calling
returns the following error "TypeError: SearxSearchRun._run() got an
unexpected keyword argument '__arg1' ".

This happens because the schema sent to the LLM is "input:
'{"__arg1":"foobar"}'" while the method should be called with the
"query" parameter.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-06-18 17:27:39 +00:00
Finlay Macklon
616d06d7fe
community: glob multiple patterns when using DirectoryLoader (#22852)
- **Description:** Updated
*community.langchain_community.document_loaders.directory.py* to enable
the use of multiple glob patterns in the `DirectoryLoader` class. Now,
the glob parameter is of type `list[str] | str` and still defaults to
the same value as before. I updated the docstring of the class to
reflect this, and added a unit test to
*community.tests.unit_tests.document_loaders.test_directory.py* named
`test_directory_loader_glob_multiple`. This test also shows an example
of how to use the new functionality.
- ~~Issue:~~**Discussion Thread:**
https://github.com/langchain-ai/langchain/discussions/18559
- **Dependencies:** None
- **Twitter handle:** N/a

- [x] **Add tests and docs**
    - Added test (described above)
    - Updated class docstring

- [x] **Lint and test**

---------

Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
2024-06-18 09:24:50 -07:00
Takuya Igei
9f791b6ad5
core[patch],community[patch],langchain[patch]: tenacity dependency to version >=8.1.0,<8.4.0 (#22973)
Fix https://github.com/langchain-ai/langchain/issues/22972.

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-06-18 10:34:28 -04:00
Raghav Dixit
55705c0f5e
LanceDB integration update (#22869)
Added : 

- [x] relevance search (w/wo scores)
- [x] maximal marginal search
- [x] image ingestion
- [x] filtering support
- [x] hybrid search w reranking 

make test, lint_diff and format checked.
2024-06-17 20:54:26 -07:00
Chang Liu
62c8a67f56
community: add KafkaChatMessageHistory (#22216)
Add chat history store based on Kafka.

Files added: 
`libs/community/langchain_community/chat_message_histories/kafka.py`
`docs/docs/integrations/memory/kafka_chat_message_history.ipynb`

New issue to be created for future improvement:
1. Async method implementation.
2. Message retrieval based on timestamp.
3. Support for other configs when connecting to cloud hosted Kafka (e.g.
add `api_key` field)
4. Improve unit testing & integration testing.
2024-06-17 20:34:01 -07:00
Lucas Tucker
e25a5966b5
docs: Standardize DocumentLoader docstrings (#22932)
**Standardizing DocumentLoader docstrings (of which there are many)**

This PR addresses issue #22866 and adds docstrings according to the
issue's specified format (in the appendix) for files csv_loader.py and
json_loader.py in langchain_community.document_loaders. In particular,
the following sections have been added to both CSVLoader and JSONLoader:
Setup, Instantiate, Load, Async load, and Lazy load. It may be worth
adding a 'Metadata' section to the JSONLoader docstring to clarify how
we want to extract the JSON metadata (using the `metadata_func`
argument). The files I used to walkthrough the various sections were
`example_2.json` from
[HERE](https://support.oneskyapp.com/hc/en-us/articles/208047697-JSON-sample-files)
and `hw_200.csv` from
[HERE](https://people.sc.fsu.edu/~jburkardt/data/csv/csv.html).

---------

Co-authored-by: lucast2021 <lucast2021@headroyce.org>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
2024-06-18 03:26:36 +00:00
Mohammad Mohtashim
60ba02f5db
[Community]: Fixed DDG DuckDuckGoSearchResults Docstring (#22968)
- **Description:** A very small fix in the Docstring of
`DuckDuckGoSearchResults` identified in the following issue.
- **Issue:** #22961

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-06-18 03:16:24 +00:00
Eun Hye Kim
70761af8cf
community: Fix #22975 (Add SSL Verification Option to Requests Class in langchain_community) (#22977)
- **PR title**: "community: Fix #22975 (Add SSL Verification Option to
Requests Class in langchain_community)"
- **PR message**: 
    - **Description:**
- Added an optional verify parameter to the Requests class with a
default value of True.
- Modified the get, post, patch, put, and delete methods to include the
verify parameter.
- Updated the _arequest async context manager to include the verify
parameter.
- Added the verify parameter to the GenericRequestsWrapper class and
passed it to the Requests class.
    - **Issue:** This PR fixes issue #22975.
- **Dependencies:** No additional dependencies are required for this
change.
    - **Twitter handle:** @lunara_x

You can check this change with below code.
```python
from langchain_openai.chat_models import ChatOpenAI
from langchain.requests import RequestsWrapper
from langchain_community.agent_toolkits.openapi import planner
from langchain_community.agent_toolkits.openapi.spec import reduce_openapi_spec

with open("swagger.yaml") as f:
    data = yaml.load(f, Loader=yaml.FullLoader)
swagger_api_spec = reduce_openapi_spec(data)

llm = ChatOpenAI(model='gpt-4o')
swagger_requests_wrapper = RequestsWrapper(verify=False) # modified point
superset_agent = planner.create_openapi_agent(swagger_api_spec, swagger_requests_wrapper, llm, allow_dangerous_requests=True, handle_parsing_errors=True)

superset_agent.run(
    "Tell me the number and types of charts and dashboards available."
)
```

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-06-18 03:12:40 +00:00
Mohammad Mohtashim
bf839676c7
[Community]: FIxed the DocumentDBVectorSearch _similarity_search_without_score (#22970)
- **Description:** The PR #22777 introduced a bug in
`_similarity_search_without_score` which was raising the
`OperationFailure` error. The mistake was syntax error for MongoDB
pipeline which has been corrected now.
    - **Issue:** #22770
2024-06-17 20:08:42 -07:00
Anders Swanson
aacc6198b9
community: OCI GenAI embedding batch size (#22986)
Thank you for contributing to LangChain!

- [x] **PR title**: "community: OCI GenAI embedding batch size"



- [x] **PR message**:
    - **Issue:** #22985 


- [ ] **Add tests and docs**: N/A


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Signed-off-by: Anders Swanson <anders.swanson@oracle.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-06-17 22:06:45 +00:00
Oguz Vuruskaner
dd25d08c06
community[minor]: add tool calling for DeepInfraChat (#22745)
DeepInfra now supports tool calling for supported models.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-06-17 15:21:49 -04:00
maang-h
c6b7db6587
community: Add Baichuan Embeddings batch size (#22942)
- **Support batch size** 
Baichuan updates the document, indicating that up to 16 documents can be
imported at a time

- **Standardized model init arg names**
    - baichuan_api_key -> api_key
    - model_name  -> model
2024-06-17 14:11:04 -04:00
Shubham Pandey
56ac94e014
community[minor]: add ChatSnowflakeCortex chat model (#21490)
**Description:** This PR adds a chat model integration for [Snowflake
Cortex](https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions),
which gives an instant access to industry-leading large language models
(LLMs) trained by researchers at companies like Mistral, Reka, Meta, and
Google, including [Snowflake
Arctic](https://www.snowflake.com/en/data-cloud/arctic/), an open
enterprise-grade model developed by Snowflake.

**Dependencies:** Snowflake's
[snowpark](https://pypi.org/project/snowflake-snowpark-python/) library
is required for using this integration.

**Twitter handle:** [@gethouseware](https://twitter.com/gethouseware)

- [x] **Add tests and docs**:
1. integration tests:
`libs/community/tests/integration_tests/chat_models/test_snowflake.py`
2. unit tests:
`libs/community/tests/unit_tests/chat_models/test_snowflake.py`
  3. example notebook: `docs/docs/integrations/chat/snowflake.ipynb`


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
2024-06-17 09:47:05 -07:00
Christopher Tee
ada03dd273
community(you): Better support for You.com News API (#22622)
## Description
While `YouRetriever` supports both You.com's Search and News APIs, news
is supported as an afterthought.
More specifically, not all of the News API parameters are exposed for
the user, only those that happen to overlap with the Search API.

This PR:
- improves support for both APIs, exposing the remaining News API
parameters while retaining backward compatibility
- refactor some REST parameter generation logic
- updates the docstring of `YouSearchAPIWrapper`
- add input validation and warnings to ensure parameters are properly
set by user
- 🚨 Breaking: Limit the news results to `k` items

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-06-15 20:05:19 +00:00
maang-h
7a0af56177
docs: update ZhipuAI ChatModel docstring (#22934)
- **Description:** Update ZhipuAI ChatModel rich docstring
- **Issue:** the issue #22296
2024-06-15 09:12:21 -04:00
Bitmonkey
570d45b2a1
Update ollama.py with optional raw setting. (#21486)
Ollama has a raw option now. 

https://github.com/ollama/ollama/blob/main/docs/api.md

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
2024-06-14 17:19:26 -07:00
caiyueliang
9944ad7f5f
community: 'Solve the issue where the _search function in ElasticsearchStore supports passing a query_vector parameter, but the parameter does not take effect. (#21532)
**Issue:**
When using the similarity_search_with_score function in
ElasticsearchStore, I expected to pass in the query_vector that I have
already obtained. I noticed that the _search function does support the
query_vector parameter, but it seems to be ineffective. I am attempting
to resolve this issue.

Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
2024-06-14 17:13:11 -07:00
Erick Friis
79a64207f5
community: release 0.2.5 (#22923) 2024-06-14 15:45:07 -07:00
Baskar Gopinath
c4f2bc9540
docs: Fix wrongly referenced class name in confluence.py (#22879)
Fixes #22542

Changed ConfluenceReader to ConfluenceLoader
2024-06-14 14:00:48 -07:00
Philippe PRADOS
b61de9728e
community[minor]: Fix long_context_reorder.py async (#22839)
Implement `async def atransform_documents( self, documents:
Sequence[Document], **kwargs: Any ) -> Sequence[Document]` for
`LongContextReorder`
2024-06-14 13:55:18 -04:00
Eugene Yurtsev
c72bcda4f2
community[major], experimental[patch]: Remove Python REPL from community (#22904)
Remove the REPL from community, and suggest an alternative import from
langchain_experimental.

Fix for this issue:
https://github.com/langchain-ai/langchain/issues/14345

This is not a bug in the code or an actual security risk. The python
REPL itself is behaving as expected.

The PR is done to appease blanket security policies that are just
looking for the presence of exec in the code.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-06-14 17:53:29 +00:00
Eugene Yurtsev
9a877c7adb
community[patch]: SitemapLoader restrict depth of parsing sitemap (CVE-2024-2965) (#22903)
This PR restricts the depth to which the sitemap can be parsed.

Fix for: CVE-2024-2965
2024-06-14 13:04:40 -04:00
Eugene Yurtsev
4a77a3ab19
core[patch]: fix validation of @deprecated decorator (#22513)
This PR moves the validation of the decorator to a better place to avoid
creating bugs while deprecating code.

Prevent issues like this from arising:
https://github.com/langchain-ai/langchain/issues/22510

we should replace with a linter at some point that just does static
analysis
2024-06-14 16:52:30 +00:00