Commit Graph

6605 Commits

Author SHA1 Message Date
Erick Friis
be738aa7de
packages: enable vertex api build (#28773) 2024-12-17 11:31:14 -08:00
Bagatur
ac278cbe8b
core[patch]: export InjectedToolCallId (#28772) 2024-12-17 19:29:20 +00:00
Bagatur
e4d3ccf62f
json mode standard test (#25497)
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-17 18:47:34 +00:00
Frank Dai
e81433497b
community: support Confluence cookies (#28760)
**Description**: Some confluence instances don't support personal access
token, then cookie is a convenient way to authenticate. This PR adds
support for Confluence cookies.

**Twitter handle**: soulmachine
2024-12-17 12:16:36 -05:00
ccurme
b745281eec
anthropic[patch]: increase timeouts for integration tests (#28767)
Some tests consistently ran into the 10s limit in CI.
2024-12-17 15:47:17 +00:00
Vinit Kudva
a00258ec12
chroma: fix persistence if client_settings is passed in (#25199)
…ent path given.

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-17 10:03:02 -05:00
Omri Eliyahu Levy
f8883a1321
partners/voyageai: enable setting output dimension (#28740)
Voyage has introduced voyage-3-large and voyage-code-3, which feature
different output dimensions by leveraging a technique called "Matryoshka
Embeddings" (see blog -
https://blog.voyageai.com/2024/12/04/voyage-code-3/).
These two models are available in various sizes: [256, 512, 1024, 2048]
(https://docs.voyageai.com/docs/embeddings#model-choices).

This PR adds the option to set the required output dimension.
2024-12-17 10:02:00 -05:00
German Martin
3a1d05394d
community: Apache AGE wrapper. Ensure Node Uniqueness by ID. (#28759)
**Description:**

The Apache AGE graph integration incorrectly handled node merging,
allowing duplicate nodes with different IDs but the same type and other
properties. Unlike
[Neo4j](cdf6202156/libs/community/langchain_community/graphs/neo4j_graph.py (L47)),
[Memgraph](cdf6202156/libs/community/langchain_community/graphs/memgraph_graph.py (L50)),
[Kuzu](cdf6202156/libs/community/langchain_community/graphs/kuzu_graph.py (L253)),
and
[Gremlin](cdf6202156/libs/community/langchain_community/graphs/gremlin_graph.py (L165)),
it did not use the node ID as the primary identifier for merging.

This inconsistency caused data integrity issues and unexpected behavior
when users expected updates to specific nodes by ID.

**Solution:**
This PR modifies the `node_insert_query` to `MERGE` nodes based on label
and ID *only* and updates properties with `SET`, aligning the behavior
with other graph database integrations. The `_format_properties` method
was also modified to handle id overrides.

**Impact:**

This fix ensures data integrity by preventing duplicate nodes, and
provides a consistent behavior across graph database integrations.
2024-12-17 09:21:59 -05:00
gsa9989
cdf6202156
cosmosdbnosql: Added Cosmos DB NoSQL Semantic Cache Integration with tests and jupyter notebook (#24424)
* Added Cosmos DB NoSQL Semantic Cache Integration with tests and
jupyter notebook

---------

Co-authored-by: Aayush Kataria <aayushkataria3011@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 21:57:05 -05:00
Brian Burgin
27a9056725
community: Fix ChatLiteLLMRouter runtime issues (#28163)
**Description:** Fix ChatLiteLLMRouter ctor validation and model_name
parameter
**Issue:** #19356, #27455, #28077
**Twitter handle:** @bburgin_0
2024-12-16 18:17:39 -05:00
Mikhail Khludnev
00deacc67e
docs, external: introduce langchain-localai (#28751)
Thank you for contributing to LangChain!

Referring to https://github.com/mkhludnev/langchain-localai

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 22:22:37 +00:00
Erick Friis
d4b5e7ef22
community: recommend RedisVectorStore over Redis (#28749) 2024-12-16 21:08:30 +00:00
Hiros
8f5e72de05
community: Correctly handle multi-element rich text (#25762)
**Description:**

- Add _concatenate_rich_text method to combine all elements in rich text
arrays
- Update load_page method to use _concatenate_rich_text for rich text
properties
- Ensure all text content is captured, including inline code and
formatted text
- Add unit tests to verify correct handling of multi-element rich text
This fix prevents truncation of content after backticks or other
formatting elements.

 **Issue:**

Using Notion DB Loader, the text for `richtext` and `title` is truncated
after 1st element was loaded as Notion Loader only read the first
element.

**Dependencies:** any dependencies required for this change
None.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 20:20:27 +00:00
Antonio Lanza
b2102b8cc4
text-splitters: Inconsistent results with NLTKTextSplitter's add_start_index=True (#27782)
This PR closes #27781

# Problem
The current implementation of `NLTKTextSplitter` is using
`sent_tokenize`. However, this `sent_tokenize` doesn't handle chars
between 2 tokenized sentences... hence, this behavior throws errors when
we are using `add_start_index=True`, as described in issue #27781. In
particular:
```python
from nltk.tokenize import sent_tokenize

output1 = sent_tokenize("Innovation drives our success. Collaboration fosters creative solutions. Efficiency enhances data management.", language="english")
print(output1)
output2 = sent_tokenize("Innovation drives our success.        Collaboration fosters creative solutions. Efficiency enhances data management.", language="english")
print(output2)
>>> ['Innovation drives our success.', 'Collaboration fosters creative solutions.', 'Efficiency enhances data management.']
>>> ['Innovation drives our success.', 'Collaboration fosters creative solutions.', 'Efficiency enhances data management.']
```

# Solution
With this new `use_span_tokenize` parameter, we can use NLTK to create
sentences (with `span_tokenize`), but also add extra chars to be sure
that we still can map the chunks to the original text.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Erick Friis <erickfriis@gmail.com>
2024-12-16 19:53:15 +00:00
Tari Yekorogha
d262d41cc0
community: added FalkorDB vector store support i.e implementation, test, docs an… (#26245)
**Description:** Added support for FalkorDB Vector Store, including its
implementation, unit tests, documentation, and an example notebook. The
FalkorDB integration allows users to efficiently manage and query
embeddings in a vector database, with relevance scoring and maximal
marginal relevance search. The following components were implemented:

- Core implementation for FalkorDBVector store.
- Unit tests ensuring proper functionality and edge case coverage.
- Example notebook demonstrating an end-to-end setup, search, and
retrieval using FalkorDB.

**Twitter handle:** @tariyekorogha

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 19:37:55 +00:00
Aaron Pham
12fced13f4
chore(community): update to OpenLLM 0.6 (#24609)
Update to OpenLLM 0.6, which we decides to make use of OpenLLM's
OpenAI-compatible endpoint. Thus, OpenLLM will now just become a thin
wrapper around OpenAI wrapper.

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

---------

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-12-16 14:30:07 -05:00
Lvlvko
5c17a4ace9
community: support Hunyuan Embedding (#23160)
## description

- I refactor `Chathunyuan` using tencentcloud sdk because I found the
original one can't work in my application
- I add `HunyuanEmbeddings` using tencentcloud sdk
- Both of them are extend the basic class of langchain. I have fully
tested them in my application

## Dependencies
- tencentcloud-sdk-python

---------

Co-authored-by: centonhuang <centonhuang@tencent.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 19:27:19 +00:00
Harrison Chase
de7996c2ca
core: add kwargs support to VectorStore (#25934)
has been missing the passthrough until now

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 18:57:57 +00:00
Lorenzo
b79a1156ed
community: correct return type of get_files_from_directory in github tool (#27885)
### About:
- **Description:** the _get_files_from_directory_ method return a
string, but it's used in other methods that expect a List[str]
- **Issue:** None
- **Dependencies:** None

This pull request import a new method _list_files_ with the old logic of
_get_files_from_directory_, but it return a List[str] at the end.
The behavior of _ get_files_from_directory_ is not changed.
2024-12-16 10:30:33 -08:00
Sheepsta300
580a8d53f9
community: Add configurable VisualFeatures to the AzureAiServicesImageAnalysisTool (#27444)
Thank you for contributing to LangChain!

- [ ] **PR title**: community: Add configurable `VisualFeatures` to the
`AzureAiServicesImageAnalysisTool`


- [ ] **PR message**:  
- **Description:** The `AzureAiServicesImageAnalysisTool` is a good
service and utilises the Azure AI Vision package under the hood.
However, since the creation of this tool, new `VisualFeatures` have been
added to allow the user to request other image specific information to
be returned. Currently, the tool offers neither configuration of which
features should be return nor does it offer any newer feature types. The
aim of this PR is to address this and expose more of the Azure Service
in this integration.
- **Dependencies:** no new dependencies in the main class file,
azure.ai.vision.imageanalysis added to extra test dependencies file.


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. Although no tests exist for already implemented Azure Service tools,
I've created 3 unit tests for this class that test initialisation and
credentials, local file analysis and a test for the new changes/
features option.


- [ ] **Lint and test**: All linting has passed.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 18:30:04 +00:00
Erick Friis
1c120e9615
core: xml output parser tags docstring (#28745) 2024-12-16 18:25:16 +00:00
Ana
ebab2ea81b
Fix Azure National Cloud authentication using token (RBAC) (Generated by Ana - AI SDE) (#25843)
This pull request addresses the issue with authenticating Azure National
Cloud using token (RBAC) in the AzureSearch vectorstore implementation.

## Changes

- Modified the `_get_search_client` method in `azuresearch.py` to pass
`additional_search_client_options` to the `SearchIndexClient` instance.

## Implementation Details

The patch updates the `SearchIndexClient` initialization to include the
`additional_search_client_options` parameter:

```python
index_client: SearchIndexClient = SearchIndexClient(
    endpoint=endpoint,
    credential=credential,
    user_agent=user_agent,
    **additional_search_client_options
)
```

This change allows the `audience` parameter to be correctly passed when
using Azure National Cloud, fixing the authentication issues with
GovCloud & RBAC.

This patch was generated by [Ana - AI SDE](https://openana.ai/), an
AI-powered software development assistant.

This is a fix for [Issue
25823](https://github.com/langchain-ai/langchain/issues/25823)

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-12-16 18:22:24 +00:00
chenzimin
169d419581
community: Remove all other keys in ChatLiteLLM and add api_key (#28097)
Thank you for contributing to LangChain!

- **PR title**: "community: Remove all other keys in ChatLiteLLM and add
api_key"


- **PR message**: Currently, no api_key are passed to LiteLLM, and
LiteLLM only takes on api_key parameter. Therefore I removed all current
`*_api_key` attributes (They are not used), and added `api_key` that is
passed to ChatLiteLLM.
  - Should fix issue #27826

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 17:54:29 +00:00
German Martin
d5d18c62b3
community: Apache AGE wrapper additional edge cases. (#28151)
Description: 
Current AGEGraph() implementation does some custom wrapping for graph
queries. The method here is _wrap_query() as it parse the field from the
original query to add some SQL context to it.
This improves the current parsing logic to cover additional edge cases
that are added to the test coverage, basically if any Node property name
or value has the "return" literal in it will break the graph / SQL
query.
We discovered this while dealing with real world datasets, is not an
uncommon scenario and I think it needs to be covered.
2024-12-16 11:28:01 -05:00
Rock2z
768e4a7fd4
[community][fix] Compatibility support to bump up wikibase-rest-api-client version (#27316)
**Description:**

This PR addresses the `TypeError: sequence item 0: expected str
instance, FluentValue found` error when invoking `WikidataQueryRun`. The
root cause was an incompatible version of the
`wikibase-rest-api-client`, which caused the tool to fail when handling
`FluentValue` objects instead of strings.

The current implementation only supports `wikibase-rest-api-client<0.2`,
but the latest version is `0.2.1`, where the current implementation
breaks. Additionally, the error message advises users to install the
latest version: [code
reference](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/utilities/wikidata.py#L125C25-L125C32).
Therefore, this PR updates the tool to support the latest version of
`wikibase-rest-api-client`.

Key changes:
- Updated the handling of `FluentValue` objects to ensure compatibility
with the latest `wikibase-rest-api-client`.
- Removed the restriction to `wikibase-rest-api-client<0.2` and updated
to support the latest version (`0.2.1`).

**Issue:**

Fixes [#24093](https://github.com/langchain-ai/langchain/issues/24093) –
`TypeError: sequence item 0: expected str instance, FluentValue found`.

**Dependencies:**

- Upgraded `wikibase-rest-api-client` to the latest version to resolve
the issue.

---------

Co-authored-by: peiwen_zhang <peiwen_zhang@email.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 16:22:18 +00:00
André Quintino
a26c786bc5
community: refactor opensearch query constructor to use wildcard instead of match in the contain comparator (#26653)
- **Description:** Changed the comparator to use a wildcard query
instead of match. This modification allows for partial text matching on
analyzed fields, which improves the flexibility of the search by
performing full-text searches that aren't limited to exact matches.
- **Issue:** The previous implementation used a match query, which
performs exact matches on analyzed fields. This approach limited the
search capabilities by requiring the query terms to align with the
indexed text. The modification to use a wildcard query instead addresses
this limitation. The wildcard query allows for partial text matching,
which means the search can return results even if only a portion of the
term matches the text. This makes the search more flexible and suitable
for use cases where exact matches aren't necessary or expected, enabling
broader full-text searches across analyzed fields.
In short, the problem was that match queries were too restrictive, and
the change to wildcard queries enhances the ability to perform partial
matches.
- **Dependencies:** none
- **Twitter handle:** @Andre_Q_Pereira

---------

Co-authored-by: André Quintino <andre.quintino@tui.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 11:16:34 -05:00
Davi Schumacher
0f9b4bf244
community[patch]: update dynamodb chat history to update instead of overwrite (#22397)
**Description:**
The current implementation of `DynamoDBChatMessageHistory` updates the
`History` attribute for a given chat history record by first extracting
the existing contents into memory, appending the new message, and then
using the `put_item` method to put the record back. This has the effect
of overwriting any additional attributes someone may want to include in
the record, like chat session metadata.

This PR suggests changing from using `put_item` to using `update_item`
instead which will keep any other attributes in the record untouched.
The change is backward compatible since
1. `update_item` is an "upsert" operation, creating the record if it
doesn't already exist, otherwise updating it
2. It only touches the db insert call and passes the exact same
information. The rest of the class is left untouched

**Dependencies:**
None

**Tests and docs:**
No unit tests currently exist for the `DynamoDBChatMessageHistory`
class. This PR adds the file
`libs/community/tests/unit_tests/chat_message_histories/test_dynamodb_chat_message_history.py`
to test the `add_message` and `clear` methods. I wanted to use the moto
library to mock DynamoDB calls but I could not get poetry to resolve it
so I mocked those calls myself in the test. Therefore, no test
dependencies were added.

The change was tested on a test DynamoDB table as well. The first three
images below show the current behavior. First a message is added to chat
history, then a value is inserted in the record in some other attribute,
and finally another message is added to the record, destroying the other
attribute.

![using_put_1_first_message](https://github.com/langchain-ai/langchain/assets/29493541/426acd62-fe29-42f4-b75f-863fb8b3fb21)

![using_put_2_add_attribute](https://github.com/langchain-ai/langchain/assets/29493541/f8a1c864-7114-4fe3-b487-d6f9252f8f92)

![using_put_3_second_message](https://github.com/langchain-ai/langchain/assets/29493541/8b691e08-755e-4877-8969-0e9769e5d28a)

The next three images show the new behavior. Once again a value is added
to an attribute other than the History attribute, but now when the
followup message is added it does not destroy that other attribute. The
History attribute itself is unaffected by this change.

![using_update_1_first_message](https://github.com/langchain-ai/langchain/assets/29493541/3e0d76ed-637e-41cd-82c7-01a86c468634)

![using_update_2_add_attribute](https://github.com/langchain-ai/langchain/assets/29493541/52585f9b-71a2-43f0-9dfc-9935aa59c729)

![using_update_3_second_message](https://github.com/langchain-ai/langchain/assets/29493541/f94c8147-2d6f-407a-9a0f-86b94341abff)

The doc located at `docs/docs/integrations/memory/aws_dynamodb.ipynb`
required no changes and was tested as well.
2024-12-16 10:38:00 -05:00
Christophe Bornet
6ddd5dbb1e
community: Add FewShotSQLTool (#28232)
The `FewShotSQLTool` gets some SQL query examples from a
`BaseExampleSelector` for a given question.
This is useful to provide [few-shot
examples](https://python.langchain.com/docs/how_to/sql_prompting/#few-shot-examples)
capability to an SQL agent.

Example usage:
```python
from langchain.agents.agent_toolkits.sql.prompt import SQL_PREFIX

embeddings = OpenAIEmbeddings()

example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    embeddings,
    AstraDB,
    k=5,
    input_keys=["input"],
    collection_name="lc_few_shots",
    token=ASTRA_DB_APPLICATION_TOKEN,
    api_endpoint=ASTRA_DB_API_ENDPOINT,
)

few_shot_sql_tool = FewShotSQLTool(
    example_selector=example_selector,
    description="Input to this tool is the input question, output is a few SQL query examples related to the input question. Always use this tool before checking the query with sql_db_query_checker!"
)

agent = create_sql_agent(
    llm=llm, 
    db=db, 
    prefix=SQL_PREFIX + "\nYou MUST get some example queries before creating the query.", 
    extra_tools=[few_shot_sql_tool]
)

result = agent.invoke({"input": "How many artists are there?"})
```

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 15:37:21 +00:00
Mohammad Mohtashim
8d746086ab
Added bind_tools support for ChatMLX along with small fix in _stream (#28743)
- **Description:** Added Support for `bind_tool` as requested in the
issue. Plus two issue in `_stream` were fixed:
    - Corrected the Positional Argument Passing for `generate_step`
    - Accountability if `token` returned by `generate_step` is integer.
- **Issue:** #28692
2024-12-16 09:52:49 -05:00
Jorge Piedrahita Ortiz
558b65ea32
community: SamabaStudio Tool Calling and Structured Output (#28025)
Description: Add tool calling and structured output support for
SambaStudio chat models, docs included

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 06:15:19 +00:00
clairebehue
fb44e74ca4
community: fix AzureSearch Oauth with azure_ad_access_token (#26995)
**Description:** 
AzureSearch vector store: create a wrapper class on
`azure.core.credentials.TokenCredential` (which is not-instantiable) to
fix Oauth usage with `azure_ad_access_token` argument

**Issue:** [the issue it
fixes](https://github.com/langchain-ai/langchain/issues/26216)

 **Dependencies:** None

- [x] **Lint and test**

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 05:56:45 +00:00
SirSmokeAlot
29305cd948
community: O365Toolkit - send_event - fixed timezone error (#25876)
**Description**: Fixed formatting start and end time
**Issue**: The old formatting resulted everytime in an timezone error
**Dependencies**: /
**Twitter handle**: /

---------

Co-authored-by: Yannick Opitz <yannick.opitz@gob.de>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 05:32:28 +00:00
Erick Friis
4f6ccb7080
text-splitters: extended-tests without socket (#28736) 2024-12-16 05:19:50 +00:00
Erick Friis
8ec1c72e03
text-splitters: test without socket (#28732) 2024-12-15 22:10:35 +00:00
Aayush Kataria
d417e4b372
Community: Azure CosmosDB No Sql Vector Store: Full Text and Hybrid Search Support (#28716)
Thank you for contributing to LangChain!

- Added [full
text](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/full-text-search)
and [hybrid
search](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/hybrid-search)
support for Azure CosmosDB NoSql Vector Store
- Added a new enum called CosmosDBQueryType which supports the following
values:
    - VECTOR = "vector"
    - FULL_TEXT_SEARCH = "full_text_search"
    - FULL_TEXT_RANK = "full_text_rank"
    - HYBRID = "hybrid"
- User now needs to provide this query_type to the similarity_search
method for the vectorStore to make the correct query api call.
- Added a couple of work arounds as for the FULL_TEXT_RANK and HYBRID
query functions we don't support parameterized queries right now. I have
added TODO's in place, and will remove these work arounds by end of
January.
- Added necessary test cases and updated the 


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
2024-12-15 13:26:32 -08:00
Mohammad Mohtashim
4c1871d9a8
community: Passing the model_kwargs correctly while maintaing backward compatability (#28439)
- **Description:** `Model_Kwargs` was not being passed correctly to
`sentence_transformers.SentenceTransformer` which has been corrected
while maintaing backward compatability
- **Issue:** #28436

---------

Co-authored-by: MoosaTae <sadhis.tae@gmail.com>
Co-authored-by: Sadit Wongprayon <101176694+MoosaTae@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-15 20:34:29 +00:00
nhols
a3851cb3bc
community: FAISS vectorstore - consistent Document id field (#28728)
make sure id field of Documents in `FAISS` docstore have the same id as
values in `index_to_docstore_id`, implement `get_by_ids` method
2024-12-15 12:23:49 -08:00
Bagatur
a0534ae62a
community[patch]: Release 0.3.12 (#28725) 2024-12-14 22:13:20 +00:00
Bagatur
089e659e03
langchain[patch]: Release 0.3.12 (#28724) 2024-12-14 20:02:18 +00:00
Bagatur
679e3a9970
text-splitters[patch]: Release 0.3.3 (#28723) 2024-12-14 19:20:22 +00:00
Erick Friis
387284c259
core: release 0.3.25 (#28718) 2024-12-14 02:22:28 +00:00
Nawaf Alharbi
decd77c515
community: fix an issue with deepinfra integration (#28715)
Thank you for contributing to LangChain!

- [x] **PR title**: langchain: add URL parameter to ChatDeepInfra class

- [x] **PR message**: add URL parameter to ChatDeepInfra class
- **Description:** This PR introduces a url parameter to the
ChatDeepInfra class in LangChain, allowing users to specify a custom
URL. Previously, the URL for the DeepInfra API was hardcoded to
"https://stage.api.deepinfra.com/v1/openai/chat/completions", which
caused issues when the staging endpoint was not functional. The _url
method was updated to return the value from the url parameter, enabling
greater flexibility and addressing the problem. out!

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-14 02:15:29 +00:00
Ben Chambers
008efada2c
[community]: Render documents to graphviz (#24830)
- **Description:** Adds a helper that renders documents with the
GraphVectorStore metadata fields to Graphviz for visualization. This is
helpful for understanding and debugging.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-14 02:02:09 +00:00
Erick Friis
288f204758
docs, community: aerospike docs update (#28717)
Co-authored-by: Jesse Schumacher <jschumacher@aerospike.com>
Co-authored-by: Jesse S <jschmidt@aerospike.com>
Co-authored-by: dylan <dwelch@aerospike.com>
2024-12-14 00:27:37 +00:00
Vimpas
337fed80a5
community: 🐛 PDF Filter Type Error (#27154)
Thank you for contributing to LangChain!

 **PR title**: "community: fix  PDF Filter Type Error"


  - **Description:** fix  PDF Filter Type Error"
  - **Issue:** the issue #27153 it fixes,
  - **Dependencies:** no
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!



- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-13 23:30:29 +00:00
Ryan Parker
12111cb922
community: fallback on core async atransform_documents method for MarkdownifyTransformer (#27866)
# Description
Implements the `atransform_documents` method for
`MarkdownifyTransformer` using the `asyncio` built-in library for
concurrency.

Note that this is mainly for API completeness when working with async
frameworks rather than for performance, since the `markdownify` function
is not I/O bound because it works with `Document` objects already in
memory.

# Issue
Fixes #27865

# Dependencies
No new dependencies added, but
[`markdownify`](https://github.com/matthewwithanm/python-markdownify) is
required since this PR updates the `markdownify` integration.

# Tests and docs
- Tests added
- I did not modify the docstrings since they already described the basic
functionality, and [the API docs also already included a
description](https://python.langchain.com/api_reference/community/document_transformers/langchain_community.document_transformers.markdownify.MarkdownifyTransformer.html#langchain_community.document_transformers.markdownify.MarkdownifyTransformer.atransform_documents).
If it would be helpful, I would be happy to update the docstrings and/or
the API docs.

# Lint and test
- [x] format
- [x] lint
- [x] test

I ran formatting with `make format`, linting with `make lint`, and
confirmed that tests pass using `make test`. Note that some unit tests
pass in CI but may fail when running `make_test`. Those unit tests are:
- `test_extract_html` (and `test_extract_html_async`)
- `test_strip_tags` (and `test_strip_tags_async`)
- `test_convert_tags` (and `test_convert_tags_async`)

The reason for the difference is that there are trailing spaces when the
tests are run in the CI checks, and no trailing spaces when run with
`make test`. I ensured that the tests pass in CI, but they may fail with
`make test` due to the addition of trailing spaces.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-13 22:32:22 +00:00
Manuel
af2e0a7ede
partners: add 'model' alias for consistency in embedding classes (#28374)
**Description:** This PR introduces a `model` alias for the embedding
classes that contain the attribute `model_name`, to ensure consistency
across the codebase, as suggested by a moderator in a previous PR. The
change aligns the usage of attribute names across the project (see for
example
[here](65deeddd5d/libs/partners/groq/langchain_groq/chat_models.py (L304))).
**Issue:** This PR addresses the suggestion from the review of issue
#28269.
**Dependencies:**  None

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-13 22:30:00 +00:00
Erick Friis
3107d78517
huggingface: fix standard test lint (#28714) 2024-12-13 22:18:54 +00:00
Kaiwei Zhang
b909d54e70
chroma[patch]: Update logic for assigning ids 2024-12-13 21:58:34 +00:00
Karthik Bharadhwaj
498f0249e2
community[minor]: Opensearch hybridsearch implementation (#25375)
community: add hybrid search in opensearch

# Langchain OpenSearch Hybrid Search Implementation

## Implementation of Hybrid Search: 

I have taken LangChain's OpenSearch integration to the next level by
adding hybrid search capabilities. Building on the existing
OpenSearchVectorSearch class, I have implemented Hybrid Search
functionality (which combines the best of both keyword and semantic
search). This new functionality allows users to harness the power of
OpenSearch's advanced hybrid search features without leaving the
familiar LangChain ecosystem. By blending traditional text matching with
vector-based similarity, the enhanced class delivers more accurate and
contextually relevant results. It's designed to seamlessly fit into
existing LangChain workflows, making it easy for developers to upgrade
their search capabilities.

In implementing the hybrid search for OpenSearch within the LangChain
framework, I also incorporated filtering capabilities. It's important to
note that according to the OpenSearch hybrid search documentation, only
post-filtering is supported for hybrid queries. This means that the
filtering is applied after the hybrid search results are obtained,
rather than during the initial search process.

**Note:** For the implementation of hybrid search, I strictly followed
the official OpenSearch Hybrid search documentation and I took
inspiration from
https://github.com/AndreasThinks/langchain/tree/feature/opensearch_hybrid_search
Thanks Mate!  

### Experiments

I conducted few experiments to verify that the hybrid search
implementation is accurate and capable of reproducing the results of
both plain keyword search and vector search.

Experiment - 1
Hybrid Search
Keyword_weight: 1, vector_weight: 0

I conducted an experiment to verify the accuracy of my hybrid search
implementation by comparing it to a plain keyword search. For this test,
I set the keyword_weight to 1 and the vector_weight to 0 in the hybrid
search, effectively giving full weightage to the keyword component. The
results from this hybrid search configuration matched those of a plain
keyword search, confirming that my implementation can accurately
reproduce keyword-only search results when needed. It's important to
note that while the results were the same, the scores differed between
the two methods. This difference is expected because the plain keyword
search in OpenSearch uses the BM25 algorithm for scoring, whereas the
hybrid search still performs both keyword and vector searches before
normalizing the scores, even when the vector component is given zero
weight. This experiment validates that my hybrid search solution
correctly handles the keyword search component and properly applies the
weighting system, demonstrating its accuracy and flexibility in
emulating different search scenarios.


Experiment - 2
Hybrid Search
keyword_weight = 0.0, vector_weight = 1.0

For experiment-2, I took the inverse approach to further validate my
hybrid search implementation. I set the keyword_weight to 0 and the
vector_weight to 1, effectively giving full weightage to the vector
search component (KNN search). I then compared these results with a pure
vector search. The outcome was consistent with my expectations: the
results from the hybrid search with these settings exactly matched those
from a standalone vector search. This confirms that my implementation
accurately reproduces vector search results when configured to do so. As
with the first experiment, I observed that while the results were
identical, the scores differed between the two methods. This difference
in scoring is expected and can be attributed to the normalization
process in hybrid search, which still considers both components even
when one is given zero weight. This experiment further validates the
accuracy and flexibility of my hybrid search solution, demonstrating its
ability to effectively emulate pure vector search when needed while
maintaining the underlying hybrid search structure.



Experiment - 3
Hybrid Search - balanced

keyword_weight = 0.5, vector_weight = 0.5

For experiment-3, I adopted a balanced approach to further evaluate the
effectiveness of my hybrid search implementation. In this test, I set
both the keyword_weight and vector_weight to 0.5, giving equal
importance to keyword-based and vector-based search components. This
configuration aims to leverage the strengths of both search methods
simultaneously. By setting both weights to 0.5, I intended to create a
scenario where the hybrid search would consider lexical matches and
semantic similarity equally. This balanced approach is often ideal for
many real-world applications, as it can capture both exact keyword
matches and contextually relevant results that might not contain the
exact search terms.

Kindly verify the notebook for the experiments conducted!  

**Notebook:**
https://github.com/karthikbharadhwajKB/Langchain_OpenSearch_Hybrid_search/blob/main/Opensearch_Hybridsearch.ipynb

### Instructions to follow for Performing Hybrid Search:

**Step-1: Instantiating OpenSearchVectorSearch Class:**
```python
opensearch_vectorstore = OpenSearchVectorSearch(
    index_name=os.getenv("INDEX_NAME"),
    embedding_function=embedding_model,
    opensearch_url=os.getenv("OPENSEARCH_URL"),
    http_auth=(os.getenv("OPENSEARCH_USERNAME"),os.getenv("OPENSEARCH_PASSWORD")),
    use_ssl=False,
    verify_certs=False,
    ssl_assert_hostname=False,
    ssl_show_warn=False
)
```

**Parameters:**
1. **index_name:** The name of the OpenSearch index to use.
2. **embedding_function:** The function or model used to generate
embeddings for the documents. It's assumed that embedding_model is
defined elsewhere in the code.
3. **opensearch_url:** The URL of the OpenSearch instance.
4. **http_auth:** A tuple containing the username and password for
authentication.
5. **use_ssl:** Set to False, indicating that the connection to
OpenSearch is not using SSL/TLS encryption.
6. **verify_certs:** Set to False, which means the SSL certificates are
not being verified. This is often used in development environments but
is not recommended for production.
7. **ssl_assert_hostname:** Set to False, disabling hostname
verification in SSL certificates.
8. **ssl_show_warn:** Set to False, suppressing SSL-related warnings.

**Step-2: Configure Search Pipeline:**

To initiate hybrid search functionality, you need to configures a search
pipeline first.

**Implementation Details:**

This method configures a search pipeline in OpenSearch that:
1. Normalizes the scores from both keyword and vector searches using the
min-max technique.
2. Applies the specified weights to the normalized scores.
3. Calculates the final score using an arithmetic mean of the weighted,
normalized scores.


**Parameters:**

* **pipeline_name (str):** A unique identifier for the search pipeline.
It's recommended to use a descriptive name that indicates the weights
used for keyword and vector searches.
* **keyword_weight (float):** The weight assigned to the keyword search
component. This should be a float value between 0 and 1. In this
example, 0.3 gives 30% importance to traditional text matching.
* **vector_weight (float):** The weight assigned to the vector search
component. This should be a float value between 0 and 1. In this
example, 0.7 gives 70% importance to semantic similarity.

```python
opensearch_vectorstore.configure_search_pipelines(
    pipeline_name="search_pipeline_keyword_0.3_vector_0.7",
    keyword_weight=0.3,
    vector_weight=0.7,
)
```

**Step-3: Performing Hybrid Search:**

After creating the search pipeline, you can perform a hybrid search
using the `similarity_search()` method (or) any methods that are
supported by `langchain`. This method combines both `keyword-based and
semantic similarity` searches on your OpenSearch index, leveraging the
strengths of both traditional information retrieval and vector embedding
techniques.

**parameters:**
* **query:** The search query string.
* **k:** The number of top results to return (in this case, 3).
* **search_type:** Set to `hybrid_search` to use both keyword and vector
search capabilities.
* **search_pipeline:** The name of the previously created search
pipeline.

```python
query = "what are the country named in our database?"

top_k = 3

pipeline_name = "search_pipeline_keyword_0.3_vector_0.7"

matched_docs = opensearch_vectorstore.similarity_search_with_score(
                query=query,
                k=top_k,
                search_type="hybrid_search",
                search_pipeline = pipeline_name
            )

matched_docs
```

twitter handle: @iamkarthik98

---------

Co-authored-by: Karthik Kolluri <karthik.kolluri@eidosmedia.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-12-13 16:34:12 -05:00
Philippe PRADOS
f3fb5a9c68
community[minor]: Fix json._validate_metadata_func() (#22842)
JSONparse, in _validate_metadata_func(), checks the consistency of the
_metadata_func() function. To do this, it invokes it and makes sure it
receives a dictionary in response. However, during the call, it does not
respect future calls, as shown on line 100. This generates errors if,
for example, the function is like this:
```python
        def generate_metadata(json_node:Dict[str,Any],kwargs:Dict[str,Any]) -> Dict[str,Any]:
             return {
                "source": url,
                "row": kwargs['seq_num'],
                "question":json_node.get("question"),
            }
        loader = JSONLoader(
            file_path=file_path,
            content_key="answer",
            jq_schema='.[]',
            metadata_func=generate_metadata,
            text_content=False)
```
To avoid this, the verification must comply with the specifications.
This patch does just that.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-12-13 21:24:20 +00:00
Keiichi Hirobe
67fd554512
core[patch]: throw exception indexing code if deletion fails in vectorstore (#28103)
The delete methods in the VectorStore and DocumentIndex interfaces
return a status indicating the result. Therefore, we can assume that
their implementations don't throw exceptions but instead return a result
indicating whether the delete operations have failed. The current
implementation doesn't check the returned value, so I modified it to
throw an exception when the operation fails.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-12-13 16:14:27 -05:00
Keiichi Hirobe
258b3be5ec
core[minor]: add new clean up strategy "scoped_full" to indexing (#28505)
~Note that this PR is now Draft, so I didn't add change to `aindex`
function and didn't add test codes for my change.
After we have an agreement on the direction, I will add commits.~

`batch_size` is very difficult to decide because setting a large number
like >10000 will impact VectorDB and RecordManager, while setting a
small number will delete records unnecessarily, leading to redundant
work, as the `IMPORTANT` section says.
On the other hand, we can't use `full` because the loader returns just a
subset of the dataset in our use case.

I guess many people are in the same situation as us.

So, as one of the possible solutions for it, I would like to introduce a
new argument, `scoped_full_cleanup`.
This argument will be valid only when `claneup` is Full. If True, Full
cleanup deletes all documents that haven't been updated AND that are
associated with source ids that were seen during indexing. Default is
False.

This change keeps backward compatibility.

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-12-13 20:35:25 +00:00
Eugene Yurtsev
ce90b25313
core[patch]: Update error message in indexing code for unreachable code assertion (#28712)
Minor update for error message that should never be triggered
2024-12-13 20:21:14 +00:00
Keiichi Hirobe
da28cf1f54
core[patch]: Reverts PR #25754 and add unit tests (#28702)
I reported the bug 2 weeks ago here:
https://github.com/langchain-ai/langchain/issues/28447

I believe this is a critical bug for the indexer, so I submitted a PR to
revert the change and added unit tests to prevent similar bugs from
being introduced in the future.

@eyurtsev Could you check this?
2024-12-13 15:13:06 -05:00
ScriptShi
b0a298894d
community[minor]: Add TablestoreVectorStore (#25767)
Thank you for contributing to LangChain!

- [x] **PR title**:  community: add TablestoreVectorStore



- [x] **PR message**: 
    - **Description:** add TablestoreVectorStore
    - **Dependencies:** none


- [x] **Add tests and docs**: If you're adding a new integration, please
include
  1. a test for the integration: yes
  2. an example notebook showing its use: yes

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-12-13 11:17:28 -08:00
Erick Friis
86b3c6e81c
community: make old stub for QuerySQLDataBaseTool private to skip api ref (#28711) 2024-12-13 10:43:23 -08:00
Martin Triska
05ebe1e66b
Community: add modified_since argument to O365BaseLoader (#28708)
## What are we doing in this PR
We're adding `modified_since` optional argument to `O365BaseLoader`.
When set, O365 loader will only load documents newer than
`modified_since` datetime.

## Why?
OneDrives / Sharepoints can contain large number of documents. Current
approach is to download and parse all files and let indexer to deal with
duplicates. This can be prohibitively time-consuming. Especially when
using OCR-based parser like
[zerox](fa06188834/libs/community/langchain_community/document_loaders/pdf.py (L948)).
This argument allows to skip documents that are older than known time of
indexing.

_Q: What if a file was modfied during last indexing process?
A: Users can set the `modified_since` conservatively and indexer will
still take care of duplicates._




If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-13 17:30:17 +00:00
Bagatur
fa06188834
community[patch]: fix QuerySQLDatabaseTool name (#28659)
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-12 19:16:03 -08:00
Erick Friis
48ab91b520
docs: more useful vercel warnings (#28699) 2024-12-13 03:07:24 +00:00
Michael Chin
28cb2cefc6
docs: Fix stack diagram in community README (#28685)
- **Description:** The stack diagram illustration in the community
README fails to render due to an invalid branch reference. This PR
replaces the broken image link with a valid one referencing master
branch.
2024-12-12 13:33:50 -08:00
Botong Zhu
13c3c4a210
community: fixes json loader not getting texts with json standard (#27327)
This PR fixes JSONLoader._get_text not converting objects to json string
correctly.
If an object is serializable and is not a dict, JSONLoader will use
python built-in str() method to convert it to string. This may cause
object converted to strings not following json standard. For example, a
list will be converted to string with single quotes, and if json.loads
try to load this string, it will cause error.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 19:33:45 +00:00
Lorenzo
4149c0dd8d
community: add method to create branch and list files for gitlab tool (#27883)
### About

- **Description:** In the Gitlab utilities used for the Gitlab tool
there are no methods to create branches, list branches and files, as
this is already done for Github
- **Issue:** None
- **Dependencies:** None

This Pull request add the methods:
- create_branch
- list_branches_in_repo
- set_active_branch
- list_files_in_main_branch
- list_files_in_bot_branch
- list_files_from_directory

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 19:11:35 +00:00
Prathamesh Nimkar
ca054ed1b1
community: ChatSnowflakeCortex - Add streaming functionality (#27753)
Description:
snowflake.py
Add _stream and _stream_content methods to enable streaming
functionality
fix pydantic issues and added functionality with the overall langchain
version upgrade
added bind_tools method for agentic workflows support through langgraph
updated the _generate method to account for agentic workflows support
through langgraph
cosmetic changes to comments and if conditions

snowflake.ipynb
Added _stream example
cosmetic changes to comments
fixed lint errors

check_pydantic.sh
Decreased counter from 126 to 125 as suggested when formatting

---------

Co-authored-by: Prathamesh Nimkar <prathamesh.nimkar@snowflake.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-11 18:35:40 -08:00
Wang, Yi
d834c6b618
huggingface: fix tool argument serialization in _convert_TGI_message_to_LC_message (#26075)
Currently `_convert_TGI_message_to_LC_message` replaces `'` in the tool
arguments, so an argument like "It's" will be converted to `It"s` and
could cause a json parser to fail.

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Vadym Barda <vadym@langchain.dev>
2024-12-11 18:34:32 -08:00
Lakindu Boteju
5a31792bf1
community: Add support for cross-region inference profile IDs in Bedrock Anthropic Claude token cost calculation (#28167)
This change modifies the token cost calculation logic to support
cross-region inference profile IDs for Anthropic Claude models. Instead
of explicitly listing all regional variants of new inference profile IDs
in the cost dictionaries, the code now extracts a base model ID from the
input model ID (or inference profile ID), making it more maintainable
and automatically supporting new regional variants.

These inference profile IDs follow the format:
`<region>.<vendor>.<model-name>` (e.g.,
`us.anthropic.claude-3-haiku-xxx`, `eu.anthropic.claude-3-sonnet-xxx`).

Cross-region inference profiles are system-defined identifiers that
enable distributing model inference requests across multiple AWS
regions. They help manage unplanned traffic bursts and enhance
resilience during peak demands without additional routing costs.

References for Amazon Bedrock's cross-region inference profiles:-
-
https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html
-
https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 02:33:50 +00:00
fatmelon
d1e0ec7b55
community: VectorStores: Azure Cosmos DB Mongo vCore with DiskANN (#27329)
# Description
Add a new vector index type `diskann` to Azure Cosmos DB Mongo vCore
vector store. Paper of DiskANN can be found here [DiskANN: Fast Accurate
Billion-point Nearest Neighbor Search on a Single
Node](https://proceedings.neurips.cc/paper_files/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf).

## Sample Usage
```python
from pymongo import MongoClient

# INDEX_NAME = "izzy-test-index-2"
# NAMESPACE = "izzy_test_db.izzy_test_collection"
# DB_NAME, COLLECTION_NAME = NAMESPACE.split(".")

client: MongoClient = MongoClient(CONNECTION_STRING)
collection = client[DB_NAME][COLLECTION_NAME]

model_deployment = os.getenv(
    "OPENAI_EMBEDDINGS_DEPLOYMENT", "smart-agent-embedding-ada"
)
model_name = os.getenv("OPENAI_EMBEDDINGS_MODEL_NAME", "text-embedding-ada-002")

vectorstore = AzureCosmosDBVectorSearch.from_documents(
    docs,
    openai_embeddings,
    collection=collection,
    index_name=INDEX_NAME,
)

# Read more about these variables in detail here. https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/vector-search
maxDegree = 40
dimensions = 1536
similarity_algorithm = CosmosDBSimilarityType.COS
kind = CosmosDBVectorSearchType.VECTOR_DISKANN
lBuild = 20

vectorstore.create_index(
            dimensions=dimensions,
            similarity=similarity_algorithm,
            kind=kind ,
            max_degree=maxDegree,
            l_build=lBuild,
        )
```

## Dependencies
No additional dependencies were added

---------

Co-authored-by: Yang Qiao (from Dev Box) <yangqiao@microsoft.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 01:54:04 +00:00
manukychen
ba9b95cd23
Community: Adding bulk_size as a setable param for OpenSearchVectorSearch (#28325)
Description:
When using langchain.retrievers.parent_document_retriever.py with
vectorstore is OpenSearchVectorSearch, I found that the bulk_size param
I passed into OpenSearchVectorSearch class did not work on my
ParentDocumentRetriever.add_documents() function correctly, it will be
overwrite with int 500 the function which OpenSearchVectorSearch class
had (e.g., add_texts(), add_embeddings()...).

So I made this PR requset to fix this, thanks!

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 01:45:22 +00:00
xintoteai
45f9c9ae88
langchain: fixed weaviate (v4) vectorstore import for self-query retriever (#28675)
Co-authored-by: Xin Heng <xin.heng@gmail.com>
2024-12-11 15:53:41 -08:00
Thomas van Dongen
ee640d6bd3
community: fixed bug in model2vec embedding code (#28670)
This PR fixes a bug with the current implementation for Model2Vec
embeddings where `embed_documents` does not work as expected.

- **Description**: the current implementation uses `encode_as_sequence`
for encoding documents. This is incorrect, as `encode_as_sequence`
creates token embeddings and not mean embeddings. The normal `encode`
function handles both single and batched inputs and should be used
instead. The return type was also incorrect, as encode returns a NumPy
array. This PR converts the embedding to a list so that the output is
consistent with the Embeddings ABC.
2024-12-11 15:50:56 -08:00
Brian Sharon
b20230c800
community: use correct id_key when deleting by id in LanceDB wrapper (#28655)
- **Description:** The current version of the `delete` method assumes
that the id field will always be called `id`.
- **Issue:** n/a
- **Dependencies:** n/a
- **Twitter handle:** ugh, Twitter :D 

---

Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-11 23:49:35 +00:00
Mohammad Mohtashim
fa155a422f
[Community]: requests_kwargs not being used in _fetch (#28646)
- **Description:** `requests_kwargs` is not being passed to `_fetch`
which is fetching pages asynchronously. In this PR, making sure that we
are passing `requests_kwargs` to `_fetch` just like `_scrape`.
- **Issue:** #28634

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-11 23:46:54 +00:00
Mohammad Mohtashim
a37afbe353
mistral[minor]: Added Retrying Mechanism in case of Request Rate Limit Error for MistralAIEmbeddings (#27818)
- **Description:**: In the event of a Rate Limit Error from the
MistralAI server, the response JSON raises a KeyError. To address this,
a simple retry mechanism has been implemented to handle cases where the
request limit is exceeded.
  - **Issue:** #27790

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-12-11 17:53:42 -05:00
Vincent Zhang
df5008fe55
community[minor]: FAISS Filter Function Enhancement with Advanced Query Operators (#28207)
## Description
We are submitting as a team of four for a project. Other team members
are @RuofanChen03, @LikeWang10067, @TANYAL77.

This pull requests expands the filtering capabilities of the FAISS
vectorstore by adding MongoDB-style query operators indicated as
follows, while including comprehensive testing for the added
functionality.
- $eq (equals)
- $neq (not equals)
- $gt (greater than)
- $lt (less than)
- $gte (greater than or equal)
- $lte (less than or equal)
- $in (membership in list)
- $nin (not in list)
- $and (all conditions must match)
- $or (any condition must match)
- $not (negation of condition)


## Issue
This closes https://github.com/langchain-ai/langchain/issues/26379.


## Sample Usage
```python
import faiss
import asyncio
from langchain_community.vectorstores import FAISS
from langchain.schema import Document
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
documents = [
    Document(page_content="Process customer refund request", metadata={"schema_type": "financial", "handler_type": "refund",}),
    Document(page_content="Update customer shipping address", metadata={"schema_type": "customer", "handler_type": "update",}),
    Document(page_content="Process payment transaction", metadata={"schema_type": "financial", "handler_type": "payment",}),
    Document(page_content="Handle customer complaint", metadata={"schema_type": "customer","handler_type": "complaint",}),
    Document(page_content="Process invoice payment", metadata={"schema_type": "financial","handler_type": "payment",})
]

async def search(vectorstore, query, schema_type, handler_type, k=2):
    schema_filter = {"schema_type": {"$eq": schema_type}}
    handler_filter = {"handler_type": {"$eq": handler_type}}
    combined_filter = {
        "$and": [
            schema_filter,
            handler_filter,
        ]
    }
    base_retriever = vectorstore.as_retriever(
        search_kwargs={"k":k, "filter":combined_filter}
    )
    return await base_retriever.ainvoke(query)

async def main():
    vectorstore = FAISS.from_texts(
        texts=[doc.page_content for doc in documents],
        embedding=embeddings,
        metadatas=[doc.metadata for doc in documents]
    )
    
    def printt(title, documents):
        print(title)
        if not documents:
            print("\tNo documents found.")
            return
        for doc in documents:
            print(f"\t{doc.page_content}. {doc.metadata}")

    printt("Documents:", documents)
    printt('\nquery="process payment", schema_type="financial", handler_type="payment":', await search(vectorstore, query="process payment", schema_type="financial", handler_type="payment", k=2))
    printt('\nquery="customer update", schema_type="customer", handler_type="update":', await search(vectorstore, query="customer update", schema_type="customer", handler_type="update", k=2))
    printt('\nquery="refund process", schema_type="financial", handler_type="refund":', await search(vectorstore, query="refund process", schema_type="financial", handler_type="refund", k=2))
    printt('\nquery="refund process", schema_type="financial", handler_type="foobar":', await search(vectorstore, query="refund process", schema_type="financial", handler_type="foobar", k=2))
    print()

if __name__ == "__main__":asyncio.run(main())
```

## Output
```
Documents:
	Process customer refund request. {'schema_type': 'financial', 'handler_type': 'refund'}
	Update customer shipping address. {'schema_type': 'customer', 'handler_type': 'update'}
	Process payment transaction. {'schema_type': 'financial', 'handler_type': 'payment'}
	Handle customer complaint. {'schema_type': 'customer', 'handler_type': 'complaint'}
	Process invoice payment. {'schema_type': 'financial', 'handler_type': 'payment'}

query="process payment", schema_type="financial", handler_type="payment":
	Process payment transaction. {'schema_type': 'financial', 'handler_type': 'payment'}
	Process invoice payment. {'schema_type': 'financial', 'handler_type': 'payment'}

query="customer update", schema_type="customer", handler_type="update":
	Update customer shipping address. {'schema_type': 'customer', 'handler_type': 'update'}

query="refund process", schema_type="financial", handler_type="refund":
	Process customer refund request. {'schema_type': 'financial', 'handler_type': 'refund'}

query="refund process", schema_type="financial", handler_type="foobar":
	No documents found.

```

---------

Co-authored-by: ruofan chen <ruofan.is.awesome@gmail.com>
Co-authored-by: RickyCowboy <like.wang@mail.utoronto.ca>
Co-authored-by: Shanni Li <tanya.li@mail.utoronto.ca>
Co-authored-by: RuofanChen03 <114096642+ruofanchen03@users.noreply.github.com>
Co-authored-by: Like Wang <102838708+likewang10067@users.noreply.github.com>
2024-12-11 17:52:22 -05:00
like
3048a9a26d
community: tongyi multimodal response format fix to support langchain (#28645)
Description: The multimodal(tongyi) response format "message": {"role":
"assistant", "content": [{"text": "图像"}]}}]} is not compatible with
LangChain.
Dependencies: No

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-10 21:13:26 +00:00
Bagatur
d0e662e43b
community[patch]: Release 0.3.11 (#28658) 2024-12-10 20:51:13 +00:00
Bagatur
91227ad7fd
langchain[patch]: Release 0.3.11 (#28657) 2024-12-10 12:28:14 -08:00
Bagatur
1fbd86a155
core[patch]: Release 0.3.24 (#28656) 2024-12-10 20:19:21 +00:00
Bagatur
e6a62d8422
core,langchain,community[patch]: allow langsmith 0.2 (#28598) 2024-12-10 18:50:58 +00:00
ccurme
bc4dc7f4b1
ollama[patch]: permit streaming for tool calls (#28654)
Resolves https://github.com/langchain-ai/langchain/issues/28543

Ollama recently
[released](https://github.com/ollama/ollama/releases/tag/v0.4.6) support
for streaming tool calls. Previously we would override the `stream`
parameter if tools were passed in.

Covered in standard tests here:
c1d348e95d/libs/standard-tests/langchain_tests/integration_tests/chat_models.py (L893-L897)

Before, the test generates one message chunk:
```python
[
    AIMessageChunk(
        content='',
        additional_kwargs={},
        response_metadata={
            'model': 'llama3.1',
            'created_at': '2024-12-10T17:49:04.468487Z',
            'done': True,
            'done_reason': 'stop',
            'total_duration': 525471208,
            'load_duration': 19701000,
            'prompt_eval_count': 170,
            'prompt_eval_duration': 31000000,
            'eval_count': 17,
            'eval_duration': 473000000,
            'message': Message(
                role='assistant',
                content='',
                images=None,
                tool_calls=[
                    ToolCall(
                        function=Function(name='magic_function', arguments={'input': 3})
                    )
                ]
            )
        },
        id='run-552bbe0f-8fb2-4105-ada1-fa38c1db444d',
        tool_calls=[
            {
                'name': 'magic_function',
                'args': {'input': 3},
                'id': 'b0a4dc07-7d7a-487b-bd7b-ad062c2363a2',
                'type': 'tool_call',
            },
        ],
        usage_metadata={
            'input_tokens': 170, 'output_tokens': 17, 'total_tokens': 187
        },
        tool_call_chunks=[
            {
                'name': 'magic_function',
                'args': '{"input": 3}',
                'id': 'b0a4dc07-7d7a-487b-bd7b-ad062c2363a2',
                'index': None,
                'type': 'tool_call_chunk',
            }
        ]
    )
]
```

After, it generates two (tool call in one, response metadata in
another):
```python
[
    AIMessageChunk(
        content='',
        additional_kwargs={},
        response_metadata={},
        id='run-9a3f0860-baa1-4bae-9562-13a61702de70',
        tool_calls=[
            {
                'name': 'magic_function',
                'args': {'input': 3},
                'id': '5bbaee2d-c335-4709-8d67-0783c74bd2e0',
                'type': 'tool_call',
            },
        ],
        tool_call_chunks=[
            {
                'name': 'magic_function',
                'args': '{"input": 3}',
                'id': '5bbaee2d-c335-4709-8d67-0783c74bd2e0',
                'index': None,
                'type': 'tool_call_chunk',
            },
        ],
    ),
    AIMessageChunk(
        content='',
        additional_kwargs={},
        response_metadata={
            'model': 'llama3.1',
            'created_at': '2024-12-10T17:46:43.278436Z',
            'done': True,
            'done_reason': 'stop',
            'total_duration': 514282750,
            'load_duration': 16894458,
            'prompt_eval_count': 170,
            'prompt_eval_duration': 31000000,
            'eval_count': 17,
            'eval_duration': 464000000,
            'message': Message(
                role='assistant', content='', images=None, tool_calls=None
            ),
        },
        id='run-9a3f0860-baa1-4bae-9562-13a61702de70',
        usage_metadata={
            'input_tokens': 170, 'output_tokens': 17, 'total_tokens': 187
        }
    ),
]
```
2024-12-10 12:54:37 -05:00
Johannes Mohren
c1d348e95d
doc-loader: retain Azure Doc Intelligence API metadata in Document parser (#28382)
**Description**:
This PR modifies the doc_intelligence.py parser in the community package
to include all metadata returned by the Azure Doc Intelligence API in
the Document object. Previously, only the parsed content (markdown) was
retained, while other important metadata such as bounding boxes (bboxes)
for images and tables was discarded. These image bboxes are crucial for
supporting use cases like multi-modal RAG workflows when using Azure Doc
Intelligence.

The change ensures that all information returned by the Azure Doc
Intelligence API is preserved by setting the metadata attribute of the
Document object to the entire result returned by the API, rather than an
empty dictionary. This extends the parser's utility for complex use
cases without breaking existing functionality.

**Issue**:
This change does not address a specific issue number, but it resolves a
critical limitation in supporting multimodal workflows when using the
LangChain wrapper for the Azure API.

**Dependencies**:
No additional dependencies are required for this change.

---------

Co-authored-by: jmohren <johannes.mohren@aol.de>
2024-12-10 11:22:58 -05:00
Alex Tonkonozhenko
0d20c314dd
Confluence Loader: Fix CQL loading (#27620)
fix #12082

<!---
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
-->
2024-12-10 11:05:23 -05:00
Katarina Supe
aba2711e7f
community: update Memgraph integration (#27017)
**Description:**
- **Memgraph** no longer relies on `Neo4jGraphStore` but **implements
`GraphStore`**, just like other graph databases.
- **Memgraph** no longer relies on `GraphQAChain`, but implements
`MemgraphQAChain`, just like other graph databases.
- The refresh schema procedure has been updated to try using `SHOW
SCHEMA INFO`. The fallback uses Cypher queries (a combination of schema
and Cypher) → **LangChain integration no longer relies on MAGE
library**.
- The **schema structure** has been reformatted. Regardless of the
procedures used to get schema, schema structure is the same.
- The `add_graph_documents()` method has been implemented. It transforms
`GraphDocument` into Cypher queries and creates a graph in Memgraph. It
implements the ability to use `baseEntityLabel` to improve speed
(`baseEntityLabel` has an index on the `id` property). It also
implements the ability to include sources by creating a `MENTIONS`
relationship to the source document.
- Jupyter Notebook for Memgraph has been updated.
- **Issue:** /
- **Dependencies:** /
- **Twitter handle:** supe_katarina (DX Engineer @ Memgraph)

Closes #25606
2024-12-10 10:57:21 -05:00
ccurme
5c6e2cbcda
ollama[patch]: support structured output (#28629)
- Bump minimum version of `ollama` to 0.4.4 (which also addresses
https://github.com/langchain-ai/langchain/issues/28607).
- Support recently-released [structured
output](https://ollama.com/blog/structured-outputs) feature. This can be
accessed by calling `.with_structured_output` with
`method="json_schema"` (choice of name
[mirrors](https://python.langchain.com/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html#langchain_openai.chat_models.base.ChatOpenAI.with_structured_output)
what we have for OpenAI's structured output feature).

`ChatOllama` previously implemented `.with_structured_output` via the
[base
implementation](ec9b41431e/libs/core/langchain_core/language_models/chat_models.py (L1117)).
2024-12-10 10:36:00 -05:00
Bagatur
24292c4a31
core[patch]: Release 0.3.23 (#28648) 2024-12-10 10:01:16 +00:00
Bagatur
e24f86e55f
core[patch]: return ToolMessage from tool (#28605) 2024-12-10 09:59:38 +00:00
Erick Friis
ef2f875dfb
core: deprecate PipelinePromptTemplate (#28644) 2024-12-10 03:56:48 +00:00
TamagoTorisugi
0f0df2df60
fix: Set default search_type to 'similarity' in as_retriever method of AzureSearch (#28376)
**Description**
This PR updates the `as_retriever` method in the `AzureSearch` to ensure
that the `search_type` parameter defaults to 'similarity' when not
explicitly provided.

Previously, if the `search_type` was omitted, it did not default to any
specific value. So it was inherited from
`AzureSearchVectorStoreRetriever`, which defaults to 'hybrid'.

This change ensures that the intended default behavior aligns with the
expected usage.

**Issue**
No specific issue was found related to this change.

**Dependencies**
No new dependencies are introduced with this change.

---------

Co-authored-by: prrao87 <prrao87@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-10 03:40:04 +00:00
Prashanth Rao
8c6eec5f25
community: KuzuGraph needs allow_dangerous_requests, add graph documents via LLMGraphTransformer (#27949)
- [x] **PR title**: "community: Kuzu - Add graph documents via
LLMGraphTransformer"
- This PR adds a new method `add_graph_documents` to use the
`GraphDocument`s extracted by `LLMGraphTransformer` and store in a Kùzu
graph backend.
- This allows users to transform unstructured text into a graph that
uses Kùzu as the graph store.

- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Co-authored-by: pookam90 <pookam@microsoft.com>
Co-authored-by: Pooja Kamath <60406274+Pookam90@users.noreply.github.com>
Co-authored-by: hsm207 <hsm207@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-10 03:15:28 +00:00
Filip Ratajczak
4e743b5427
Core: google docstring parsing fix (#28404)
Thank you for contributing to LangChain!

- [ ] **PR title**: "core: google docstring parsing fix"


- [x] **PR message**:
- **Description:** Added a solution for invalid parsing of google
docstring such as:
    Args:
net_annual_income (float): The user's net annual income (in current year
dollars).
- **Issue:** Previous code would return arg = "net_annual_income
(float)" which would cause exception in
_validate_docstring_args_against_annotations
    - **Dependencies:** None

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-10 00:27:25 +00:00
Arnav Priyadarshi
b78b2f7a28
community[fix]: Update Perplexity to pass parameters into API calls (#28421)
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"


- **Description:** I realized the invocation parameters were not being
passed into `_generate` so I added those in but then realized that the
parameters contained some old fields designed for an older openai client
which I removed. Parameters work fine now.
- **Issue:** Fixes #28229 
- **Dependencies:** No new dependencies.  
- **Twitter handle:** @arch_plane

- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-10 00:23:31 +00:00
Clément Jumel
cf6d1c0ae7
docs: add Linkup integration documentation (#28366)
## Description

First of all, thanks for the great framework that is LangChain!

At [Linkup](https://www.linkup.so/) we're working on an API to connect
LLMs and agents to the internet and our partner sources. We'd be super
excited to see our API integrated in LangChain! This essentially
consists in adding a LangChain retriever and tool, which is done in our
own [package](https://pypi.org/project/langchain-linkup/). Here we're
simply following the [integration
documentation](https://python.langchain.com/docs/contributing/how_to/integrations/)
and update the documentation of LangChain to mention the Linkup
integration.

We do have tests (both units & integration) in our [source
code](https://github.com/LinkupPlatform/langchain-linkup), and tried to
follow as close as possible the [integration
documentation](https://python.langchain.com/docs/contributing/how_to/integrations/)
which specifically requests to focus on documentation changes for an
integration PR, so I'm not adding tests here, even though the PR
checklist seems to suggest so. Feel free to correct me if I got this
wrong!

By the way, we would be thrilled by being mentioned in the list of
providers which have standalone packages
[here](https://langchain-git-fork-linkupplatform-cj-doc-langchain.vercel.app/docs/integrations/providers/),
is there something in particular for us to do for that? 🙂

## Twitter handle

Linkup_platform
<!--
## PR Checklist

Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
--!>
2024-12-09 14:36:25 -08:00
Amir Sadeghi
2c49f587aa
community[fix]: could not locate runnable browser (#28289)
set open_browser to false to resolve "could not locate runnable browser"
error while default browser is None

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 21:05:52 +00:00
Martin Triska
75bc6bb191
community: [bugfix] fix source path for office files in O365 (#28260)
# What problem are we fixing?

Currently documents loaded using `O365BaseLoader` fetch source from
`file.web_url` (where `file` is `<class 'O365.drive.File'>`). This works
well for `.pdf` documents. Unfortunately office documents (`.xlsx`,
`.docx` ...) pass their `web_url` in following format:

`https://sharepoint_address/sites/path/to/library/root/Doc.aspx?sourcedoc=%XXXXXXXX-1111-1111-XXXX-XXXXXXXXXX%7D&file=filename.xlsx&action=default&mobileredirect=true`

This obfuscates the path to the file. This PR utilizes the parrent
folder's path and file name to reconstruct the actual location of the
file. Knowing the file's location can be crucial for some RAG
applications (path to the file can carry information we don't want to
loose).

@vbarda Could you please look at this one? I'm @-mentioning you since
we've already closed some PRs together :-)

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 12:34:59 -08:00
Erick Friis
534b8f4364
standard-tests: release 0.3.7 (#28637) 2024-12-09 15:12:18 -05:00
Naka Masato
ce3b69aa05
community: add include_labels option to ConfluenceLoader (#28259)
## **Description:**

Enable `ConfluenceLoader` to include labels with `include_labels` option
(`false` by default for backward compatibility). and the labels are set
to `metadata` in the `Document`. e.g. `{"labels": ["l1", "l2"]}`

## Notes

Confluence API supports to get labels by providing `metadata.labels` to
`expand` query parameter

All of the following functions support `expand` in the same way:
- confluence.get_page_by_id
- confluence.get_all_pages_by_label
- confluence.get_all_pages_from_space
- cql (internally using
[/api/content/search](https://developer.atlassian.com/cloud/confluence/rest/v1/api-group-content/#api-wiki-rest-api-content-search-get))

## **Issue:**

No issue related to this PR.

## **Dependencies:** 

No changes.

## **Twitter handle:** 

[@gymnstcs](https://x.com/gymnstcs)


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 19:35:01 +00:00
Rajendra Kadam
242fee11be
community[minor] Pebblo: Support for new Pinecone class PineconeVectorStore (#28253)
- **Description:** Support for new Pinecone class PineconeVectorStore in
PebbloRetrievalQA.
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** -

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 19:33:54 +00:00
nikitajoyn
9fcd203556
partners/mistralai: Fix KeyError in Vertex AI stream (#28624)
- **Description:** Streaming response from Mistral model using Vertex AI
raises KeyError when trying to access `choices` key, that the last chunk
doesn't have. The fix is to access the key safely using `get()`.
  - **Issue:** https://github.com/langchain-ai/langchain/issues/27886
  - **Dependencies:**
  - **Twitter handle:**
2024-12-09 14:14:58 -05:00
maang-h
b64d846347
docs: Standardize MoonshotChat docstring (#28159)
- **Description:** Add docstring

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 18:46:25 +00:00
Erick Friis
4c70ffff01
standard-tests: sync/async vectorstore tests conditional (#28636)
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-09 18:02:55 +00:00
ccurme
ffb5c1905a
openai[patch]: release 0.2.12 (#28633) 2024-12-09 12:38:13 -05:00
ccurme
6e6061fe73
openai[patch]: bump minimum SDK version (#28632)
Resolves https://github.com/langchain-ai/langchain/issues/28625
2024-12-09 11:28:05 -05:00
Mohammad Mohtashim
ec9b41431e
[Core]: Small Docstring Clarification for BaseTool (#28148)
- **Description:** `kwargs` are not being passed to `run` of the
`BaseTool` which has been fixed
- **Issue:** #28114

---------

Co-authored-by: Stevan Kapicic <kapicic.ste1@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 06:10:19 +00:00
Erick Friis
cef21a0b49
cli: warning on app add (#28619)
instead of #28128
2024-12-09 06:07:14 +00:00
Ankit Dangi
90f162efb6
text-splitters: add pydocstyle linting (#28127)
As seen in #23188, turned on Google-style docstrings by enabling
`pydocstyle` linting in the `text-splitters` package. Each resulting
linting error was addressed differently: ignored, resolved, suppressed,
and missing docstrings were added.

Fixes one of the checklist items from #25154, similar to #25939 in
`core` package. Ran `make format`, `make lint` and `make test` from the
root of the package `text-splitters` to ensure no issues were found.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 06:01:03 +00:00
WGNW_MG
eabe587787
community[patch]:Fix for get_openai_callback() return token_cost=0.0 when model is gpt-4o-11-20 (#28408)
- **Description:** update MODEL_COST_PER_1K_TOKENS for new gpt-4o-11-20.
- **Issue:** with latest gpt-4o-11-20, openai callback return
token_cost=0.0
- **Dependencies:** None (just simple dict fix.)
- **Twitter handle:** I Don't Use Twitter. 
- (However..., I have a YouTube channel. Could you upload this there, by
any chance?
https://www.youtube.com/@%EA%B2%9C%EC%B0%BD%EB%B6%80%EA%B3%A0%EB%AC%B8AI%EC%9E%90%EB%AC%B8%EC%84%BC%EC%84%B8)
2024-12-08 20:46:50 -08:00
Fahim Zaman
481c4bfaba
core[patch]: Fixed trim functions, and added corresponding unit test for the solved issue (#28429)
- **Description:** 
- Trim functions were incorrectly deleting nodes with more than 1
outgoing/incoming edge, so an extra condition was added to check for
this directly. A unit test "test_trim_multi_edge" was written to test
this test case specifically.
- **Issue:** 
  - Fixes #28411 
  - Fixes https://github.com/langchain-ai/langgraph/issues/1676
- **Dependencies:** 
  - No changes were made to the dependencies

- [x] Unit tests were added to verify the changes.
- [x] Updated documentation where necessary.
- [x] Ran make format, make lint, and make test to ensure compliance
with project standards.

---------

Co-authored-by: Tasif Hussain <tasif006@gmail.com>
2024-12-08 20:45:28 -08:00
Marco Perini
2354bb7bfa
partners: 🕷️🦜 ScrapeGraph API Integration (#28559)
Hi Langchain team!

I'm the co-founder and mantainer at
[ScrapeGraphAI](https://scrapegraphai.com/).
By following the integration
[guide](https://python.langchain.com/docs/contributing/how_to/integrations/publish/)
on your site, I have created a new lib called
[langchain-scrapegraph](https://github.com/ScrapeGraphAI/langchain-scrapegraph).

With this PR I would like to integrate Scrapegraph as provider in
Langchain, adding the required documentation files.
Let me know if there are some changes to be made to be properly
integrated both in the lib and in the documentation.

Thank you 🕷️🦜

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 02:38:21 +00:00
Abhinav
317a38b83e
community[minor]: Add support for modle2vec embeddings (#28507)
This PR add an embeddings integration for model2vec, the
`Model2vecEmbeddings` class.

- **Description**: [Model2Vec](https://github.com/MinishLab/model2vec)
lets you turn any sentence transformer into a really small static model
and makes running the model faster.
- **Issue**:
- **Dependencies**: model2vec
([pypi](https://pypi.org/project/model2vec/))
- **Twitter handle:**:

- [x] **Add tests and docs**: 
-
[Test](https://github.com/blacksmithop/langchain/blob/model2vec_embeddings/libs/community/langchain_community/embeddings/model2vec.py),
[docs](https://github.com/blacksmithop/langchain/blob/model2vec_embeddings/docs/docs/integrations/text_embedding/model2vec.ipynb)

- [x] **Lint and test**:

---------

Co-authored-by: Abhinav KM <abhinav.m@zerone-consulting.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-12-09 02:17:22 +00:00
Mohammad Mohtashim
524ee6d9ac
Invalid tool_choice being passed to ChatLiteLLM (#28198)
- **Description:** Invalid `tool_choice` is given to `ChatLiteLLM` to
`bind_tools` due to it's parent's class default value being pass through
`with_structured_output`.
- **Issue:** #28176
2024-12-07 14:33:40 -05:00
Erick Friis
dd0085a9ff
docs: standard tests to markdown, load templates from files (#28603) 2024-12-07 01:37:21 +00:00
Erick Friis
5e8553c31a
standard-tests: retriever docstrings (#28596) 2024-12-07 00:32:19 +00:00
ccurme
d801c6ffc7
tests[patch]: nits (#28601) 2024-12-07 00:13:04 +00:00
Erick Friis
07c2ac765a
community: release 0.3.10 (#28600) 2024-12-07 00:07:13 +00:00
Erick Friis
4a7dc6ec4c
standard-tests: release 0.3.6 (#28599) 2024-12-07 00:05:04 +00:00
ccurme
80a88f8f04
tests[patch]: update API ref for chat models (#28594) 2024-12-06 19:00:14 -05:00
Erick Friis
0eb7ab65f1
multiple: fix xfailed signatures (#28597) 2024-12-06 15:39:47 -08:00
Erick Friis
b7c2029e84
standard-tests: root docstrings (#28595) 2024-12-06 15:14:52 -08:00
Erick Friis
9e2abcd152
standard-tests: show right classes in api docs (#28591) 2024-12-06 14:48:13 -08:00
Erick Friis
246c10a1cc
standard-tests: private members and tools unit troubleshoot (#28590) 2024-12-06 13:52:58 -08:00
Erick Friis
e6663b69f3
langchain: release 0.3.10 (#28585) 2024-12-06 20:20:24 +00:00
Erick Friis
c38b845d7e
core: fix path test (#28584) 2024-12-06 20:05:18 +00:00
ccurme
2c6bc74cb1
multiple: combine sync/async vector store standard test suites (#28580)
Breaking change in `langchain-tests`.
2024-12-06 14:55:06 -05:00
Bagatur
dda9f90047
core[patch]: Release 0.3.22 (#28582) 2024-12-06 19:36:53 +00:00
ccurme
f3dc142d3c
cli[patch]: implement minimal starter vector store (#28577)
Basically the same as core's in-memory vector store. Removed some
optional methods.
2024-12-06 13:10:22 -05:00
Erick Friis
5277a021c1
docs: raw loader codeblock (#28548)
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-06 09:26:34 -08:00
Erick Friis
18386c16c7
core, tests: more tolerant _aget_relevant_documents function (#28462) 2024-12-06 00:49:30 +00:00
Erick Friis
bc636ccc60
cli: release 0.0.35 (#28557) 2024-12-05 16:40:52 -08:00
Erick Friis
7ecf38f4fa
cli: create specific files from template (#28556) 2024-12-06 00:32:47 +00:00
Erick Friis
478def8dcc
core: deprecation doc removal (#28553)
![ScreenShot 2024-12-05 at 02 33
43PM@2x](https://github.com/user-attachments/assets/e1ce495b-90ca-41c7-9a65-b403a934675c)
2024-12-05 15:35:28 -08:00
cinqisap
482e8a7855
community: Add support for SAP HANA Vector hnsw index creation (#27884)
**Issue:** Added support for creating indexes in the SAP HANA Vector
engine.
 
**Changes**: 
1. Introduced a new function `create_hnsw_index` in `hanavector.py` that
enables the creation of indexes for SAP HANA Vector.
2. Added integration tests for the index creation function to ensure
functionality.
3. Updated the documentation to reflect the new index creation feature,
including examples and output from the notebook.
4. Fix the operator issue in ` _process_filter_object` function and
change the array argument to a placeholder in the similarity search SQL
statement.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-05 23:29:08 +00:00
blaufink
28f8d436f6
mistral: fix of issue #26029 (#28233)
- Description: Azure AI takes an issue with the safe_mode parameter
being set to False instead of None. Therefore, this PR changes the
default value of safe_mode from False to None. This results in it being
filtered out before the request is sent - avoind the extra-parameter
issue described below.

- Issue: #26029

- Dependencies: /

---------

Co-authored-by: blaufink <sebastian.brueckner@outlook.de>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-05 23:28:12 +00:00
ccurme
ecdfc98ef6
tests[patch]: run standard tests for embeddings and populate embeddings API ref (#28545)
plus minor updates to chat models and vector store API refs
2024-12-05 19:39:03 +00:00
ccurme
b8e861a63b
openai[patch]: add standard tests for embeddings (#28540) 2024-12-05 17:00:27 +00:00
ZhangShenao
d26555c682
[VectorStore] Improvement: Improve chroma vector store (#28524)
- Complete unit test
- Fix spelling error
2024-12-05 11:58:32 -05:00
ccurme
8f9b3b7498
chroma[patch]: fix bug (#28538)
Fix bug introduced in
https://github.com/langchain-ai/langchain/pull/27995

If all document IDs are `""`, the chroma SDK will raise
```
DuplicateIDError: Expected IDs to be unique
```

Caught by [docs
tests](https://github.com/langchain-ai/langchain/actions/runs/12180395579/job/33974633950),
but added a test to langchain-chroma as well.
2024-12-05 15:37:19 +00:00
Erick Friis
ecff9a01e4
cli: release 0.0.34 (#28525) 2024-12-05 15:35:49 +00:00
ccurme
d9e42a1517
langchain[patch]: fix deprecation warning (#28535) 2024-12-05 14:49:10 +00:00
Erick Friis
0f539f0246
standard-tests: release 0.3.5 (#28526) 2024-12-05 00:41:07 -08:00
Erick Friis
43c35d19d4
cli: standard tests in cli, test that they run, skip vectorstore tests (#28521) 2024-12-05 00:38:32 -08:00
Erick Friis
c5acedddc2
anthropic: timeout in tests (10s) (#28488) 2024-12-04 16:03:38 -08:00
ccurme
f459754470
tests[patch]: populate API reference for vector stores (#28520) 2024-12-05 00:02:31 +00:00
ccurme
8bc2c912b8
chroma[patch]: (nit) simplify test (#28517)
Use `self.get_embeddings` on test class instead of importing embeddings
separately.
2024-12-04 20:22:55 +00:00
ccurme
eec55c2550
chroma[patch]: add get_by_ids and fix bug (#28516)
- Run standard integration tests in Chroma
- Add `get_by_ids` method
- Fix bug in `add_texts`: if a list of `ids` is passed but any of them
are None, Chroma will raise an exception. Here we assign a uuid.
2024-12-04 14:00:36 -05:00
Erick Friis
e6a08355a3
docs: more api ref links, add linting step to prevent more (#28495) 2024-12-04 04:19:42 +00:00
wlleiiwang
6151ea78d5
community: implement _select_relevance_score_fn for tencent vectordb (#28036)
implement _select_relevance_score_fn for tencent vectordb
fix use external embedding for tencent vectordb

Co-authored-by: wlleiiwang <wlleiiwang@tencent.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-04 03:03:00 +00:00
Asi Greenholts
d34bf78f3b
community: BM25Retriever preservation of document id (#27019)
Currently this retriever discards document ids

---------

Co-authored-by: asi-cider <88270351+asi-cider@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-04 00:36:00 +00:00
peterdhp
bc5ec63d67
community : allow using apikey for PubMedAPIWrapper (#27246)
**Description**: 

> Without an API key, any site (IP address) posting more than 3 requests
per second to the E-utilities will receive an error message. By
including an API key, a site can post up to 10 requests per second by
default.

quoted from A General Introduction to the E-utilities,NCBI :
https://www.ncbi.nlm.nih.gov/books/NBK25497/

I have simply added a api_key parameter to the PubMedAPIWrapper that can
be used to increase the number of requests per second from 3 to 10.

**Twitter handle** : @KORmaori

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-03 16:21:22 -08:00
Eric Pinzur
eff8a54756
langchain_chroma: added document.id support (#27995)
Description:
* Added internal `Document.id` support to Chroma VectorStore

Dependencies:
* https://github.com/langchain-ai/langchain/pull/27968 should be merged
first and this PR should be re-based on top of those changes.

Tests:
* Modified/Added tests for `Document.id` support. All tests are passing.


Note: I am not a member of the Chroma team.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-04 00:04:27 +00:00
William Smith
15e7353168
langchain_community: updated query constructor for Databricks Vector Search due to LangChainDeprecationWarning: filters was deprecated since langchain-community 0.2.11 and will be removed in 0.3. Please use filter instead. (#27974)
- **Description:** Updated the kwargs for the structured query from
filters to filter due to deprecation of 'filters' for Databricks Vector
Search. Also changed the error messages as the allowed operators and
comparators are different which can cause issues with functions such as
get_query_constructor_prompt()

- **Issue:** Fixes the Key Error for filters due to deprecation in favor
for 'filter':

LangChainDeprecationWarning: DatabricksVectorSearch received a key
`filters` in search_kwargs. `filters` was deprecated since
langchain-community 0.2.11 and will be removed in 0.3. Please use
`filter` instead.

- **Dependencies:** N/A
- **Twitter handle:** N/A

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-03 16:03:53 -08:00
Jan Heimes
ef365543cb
community: add Needle retriever and document loader integration (#28157)
- [x] **PR title**: "community: add Needle retriever and document loader
integration"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"

- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** This PR adds a new integration for Needle, which
includes:
- **NeedleRetriever**: A retriever for fetching documents from Needle
collections.
- **NeedleLoader**: A document loader for managing and loading documents
into Needle collections.
      - Example notebooks demonstrating usage have been added in:
        - `docs/docs/integrations/retrievers/needle.ipynb`
        - `docs/docs/integrations/document_loaders/needle.ipynb`.
- **Dependencies:** The `needle-python` package is required as an
external dependency for accessing Needle's API. It has been added to the
extended testing dependencies list.
- **Twitter handle:** Feel free to mention me if this PR gets announced:
[needlexai](https://x.com/NeedlexAI).

- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. Unit tests have been added for both `NeedleRetriever` and
`NeedleLoader` in `libs/community/tests/unit_tests`. These tests mock
API calls to avoid relying on network access.
2. Example notebooks have been added to `docs/docs/integrations/`,
showcasing both retriever and loader functionality.

- [x] **Lint and test**: Run `make format`, `make lint`, and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
  - `make format`: Passed
  - `make lint`: Passed
- `make test`: Passed (requires `needle-python` to be installed locally;
this package is not added to LangChain dependencies).

Additional guidelines:
- [x] Optional dependencies are imported only within functions.
- [x] No dependencies have been added to pyproject.toml files except for
those required for unit tests.
- [x] The PR does not touch more than one package.
- [x] Changes are fully backwards compatible.
- [x] Community additions are not re-imported into LangChain core.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-03 22:06:25 +00:00
ccurme
ab831ce05c
tests[patch]: populate API reference for chat models (#28487)
Populate API reference for test class properties and test methods for
chat models.

Also:
- Make `standard_chat_model_params` private.
- `pytest.skip` some tests that were previously passed if features are
not supported.
2024-12-03 15:24:54 -05:00
Erick Friis
c74f34cb41
pinecone: release 0.2.1 (version sequence) (#28485) 2024-12-03 10:22:16 -08:00
Audrey Sage Lorberfeld
926e452f44
partners: update version header for Pinecone integration (#28481)
Just need to update the version header used with Pinecone in
recently-merged method (from [this
PR](https://github.com/langchain-ai/langchain/pull/28320/files#r1867820929)).

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-03 18:08:56 +00:00
Erick Friis
7315360907
openai: dont populate logit_bias if None (#28482) 2024-12-03 17:54:53 +00:00
Erick Friis
ff675c11f6
partners/pinecone: release 0.2.2 (#28466) 2024-12-03 06:49:35 +00:00
Audrey Sage Lorberfeld
6b7e93d4c7
pinecone: update pinecone client (#28320)
This PR updates the Pinecone client to `5.4.0`, as well as its
dependencies (`pinecone-plugin-inference` and
`pinecone-plugin-interface`).

Note: `pinecone-client` is now simply called `pinecone`.

**Question for reviewer(s):** should this PR also update the `pinecone`
dep in [the root dir's `poetry.lock`
file](https://github.com/langchain-ai/langchain/blob/master/poetry.lock#L6729)?
Was unsure. (I don't believe so b/c it seems pinned to a lower version
likely based on 3rd-party deps (e.g. Unstructured).)

--
TW: @audrey_sage_


---
- To see the specific tasks where the Asana app for GitHub is being
used, see below:
  - https://app.asana.com/0/0/1208693659122374
2024-12-02 22:47:09 -08:00
Erick Friis
000be1f32c
tests: init retriever standard tests (#28459) 2024-12-02 23:36:09 +00:00
Erick Friis
42d40d694b
partners/openai: release 0.2.11 (#28461) 2024-12-02 23:35:18 +00:00
Erick Friis
9f04416768
openai: set logit_bias to none instead of empty dict by default (#28460) 2024-12-02 15:30:32 -08:00
William FH
ecee41ab72
fix: Handle response metadata in merge_messages_runs (#28453) 2024-12-02 13:56:23 -08:00
lucasiscovici
60021e54b5
community: Add the additonnal kward 'context' for openai (#28351)
- **Description:** 
Add the additonnal kward 'context' for openai into
`convert_dict_to_message` and `convert_message_to_dict` functions.
2024-12-02 16:43:30 -05:00
ccurme
28487597b2
ollama[patch]: release 0.2.1 (#28458)
We inadvertently skipped 0.2.1, so release pipeline
[failed](https://github.com/langchain-ai/langchain/actions/runs/12126964367/job/33810204551).
2024-12-02 21:17:51 +00:00
ccurme
88d6d02b59
ollama[patch]: release 0.2.2 (#28456) 2024-12-02 14:57:30 -05:00
Bagatur
47433485e7
mistral[patch]: Release 0.2.3 (#28452) 2024-12-02 08:26:28 -08:00
Bagatur
49914e959a
community[patch]: Release 0.3.9 (#28451) 2024-12-02 16:23:37 +00:00
ccurme
c2f1d022a2
mistral[patch]: ensure tool call IDs in tool messages are correctly formatted (#28422)
Fixes tests for cross-provider compatibility:
https://github.com/langchain-ai/langchain/actions/runs/12085358877/job/33702420504#step:10:376
2024-11-29 13:56:06 +00:00
Alex Thomas
2813e86407
docs: Adds the langchain-neo4j package to the API docs (#28386)
This PR adds the `langchain-neo4j` package to the `libs/packages.yml` so
the API docs can be built.
2024-11-27 12:41:12 -08:00
Bagatur
b7e10bb199
langchain[patch]: Release 0.3.9 (#28399) 2024-11-27 20:06:11 +00:00
ccurme
a8b21afc08
qdrant[patch]: run python 3.13 in CI (#28394) 2024-11-27 12:22:17 -05:00
ccurme
ee6fc3f3f6
nomic[patch]: run python 3.13 in CI (#28393) 2024-11-27 17:08:15 +00:00
Massimiliano Pronesti
83586661d6
partners[chroma]: add retrieval of embedding vectors (#28290)
This PR adds an additional method to `Chroma` to retrieve the embedding
vectors, besides the most relevant Documents. This is sometimes of use
when you need to run a postprocessing algorithm on the retrieved results
based on the vectors, which has been the case for me lately.

Example issue (discussion) requesting this change:
https://github.com/langchain-ai/langchain/discussions/20383

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-11-27 16:34:02 +00:00
ccurme
733a6ad328
mistral[patch]: run python 3.13 in CI (#28392) 2024-11-27 11:29:04 -05:00
ccurme
b9bf7fd797
couchbase[patch]: run python 3.13 in CI (#28391) 2024-11-27 11:28:21 -05:00
Greg Hinch
5141f25a20
community[patch]: support numpy2 (#28184)
Follows on from #27991, updates the langchain-community package to
support numpy 2 versions

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-11-27 11:10:58 -05:00
LuisMSotamba
0901f11b0f
community: add truncation params when an openai assistant's run is created (#28158)
**Description:** When an OpenAI assistant is invoked, it creates a run
by default, allowing users to set only a few request fields. The
truncation strategy is set to auto, which includes previous messages in
the thread along with the current question until the context length is
reached. This causes token usage to grow incrementally:
consumed_tokens = previous_consumed_tokens + current_consumed_tokens.

This PR adds support for user-defined truncation strategies, giving
better control over token consumption.

**Issue:** High token consumption.
2024-11-27 10:53:53 -05:00
TheDannyG
607c60a594
partners/ollama: fix tool calling with nested schemas (#28225)
## Description

This PR addresses the following:

**Fixes Issue #25343:**
- Adds additional logic to parse shallowly nested JSON-encoded strings
in tool call arguments, allowing for proper parsing of responses like
that of Llama3.1 and 3.2 with nested schemas.
 
**Adds Integration Test for Fix:**
- Adds a Ollama specific integration test to ensure the issue is
resolved and to prevent regressions in the future.

**Fixes Failing Integration Tests:**
- Fixes failing integration tests (even prior to changes) caused by
`llama3-groq-tool-use` model. Previously,
tests`test_structured_output_async` and
`test_structured_output_optional_param` failed due to the model not
issuing a tool call in the response. Resolved by switching to
`llama3.1`.

## Issue
Fixes #25343.

## Dependencies
No dependencies.

____

Done in collaboration with @ishaan-upadhyay @mirajismail @ZackSteine.
2024-11-27 10:32:02 -05:00
ccurme
bb83abd037
community[patch]: remove sqlalchemy cap (#28389) 2024-11-27 10:20:36 -05:00
ccurme
42b8ad067d
chroma[patch]: test python 3.13 in CI (#28387) 2024-11-27 15:02:40 +00:00
William FH
585da22752
Init embeddings (#28370) 2024-11-27 08:25:10 +00:00
Bagatur
ffe7bd4832
langchain[patch]: init_chat_model provider in model string (#28367)
```python
llm = init_chat_model("openai:gpt-4o")
```
2024-11-27 00:20:25 -08:00
ccurme
8adc4a5bcc
langchain[patch]: update deprecation message for agent classes and constructors (#28369) 2024-11-26 16:07:13 -05:00
Mohammad Mohtashim
06fafc6651
Community: Marqo Index Setting GET Request Updated according to 2.x API version while keep backward compatability for 1.5.x (#28342)
- **Description:** `add_texts` was using `get_setting` for marqo client
which was being used according to 1.5.x API version. However, this PR
updates the `add_text` accounting for updated response payload for 2.x
and later while maintaining backward compatibility. Plus I have verified
this was the only place where marqo client was not accounting for
updated API version.
  - **Issue:** #28323

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-11-26 18:26:56 +00:00
willtai
7d95a10ada
langchain: Fix Neo4jVector vector store reference from partner package for self query (#28292)
_This should only be merged once neo4j is included under libs/partners._

# **Description:**

Neo4jVector from langchain-community is being moved to langchain-neo4j:
[see
link](https://github.com/langchain-ai/langchain-neo4j/blob/main/libs/neo4j/langchain_neo4j/vectorstores/neo4j_vector.py#L436).

To solve the issue below, this PR adds an attempt to import
`Neo4jVector` from the partner package `langchain-neo4j`, similarly to
the other partner packages.

# **Issue:**
When initializing `SelfQueryRetriever`, the following error is raised:

```
ValueError: Self query retriever with Vector Store type <class 'langchain_neo4j.vectorstores.neo4j_vector.Neo4jVector'> not supported.
```

[See related
issue](https://github.com/langchain-ai/langchain/issues/19748).

# **Dependencies:**
- langchain-neo4j
2024-11-26 13:21:04 -05:00
ccurme
a1c90794e1
ollama[patch]: bump to 0.4.1 in lock file (#28365) 2024-11-26 18:19:31 +00:00
ccurme
74d9d2cba1
ollama[patch]: support ollama 0.4 (#28364)
v0.4 of the Python SDK is already installed via the lock file in CI, but
our current implementation is not compatible with it.

This also addresses an issue introduced in
https://github.com/langchain-ai/langchain/pull/28299. @RyanMagnuson
would you mind explaining the motivation for that change? From what I
can tell the Ollama SDK [does not support
kwargs](6c44bb2729/ollama/_client.py (L286)).
Previously, unsupported kwargs were ignored, but they currently raise
`TypeError`.

Some of LangChain's standard test suite expects `tool_choice` to be
supported, so here we catch it in `bind_tools` so it is ignored and not
passed through to the client.
2024-11-26 12:45:59 -05:00
Bagatur
e9c16552fa
openai[patch]: bump core dep (#28361) 2024-11-26 08:37:05 -08:00
Bagatur
e7dc26aefb
openai[patch]: Release 0.2.10 (#28360) 2024-11-26 08:30:29 -08:00
ccurme
42b18824c2
openai[patch]: use max_completion_tokens in place of max_tokens (#26917)
`max_tokens` is deprecated:
https://platform.openai.com/docs/api-reference/chat/create#chat-create-max_tokens

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-11-26 16:30:19 +00:00
Greg Hinch
869c8f5879
langchain[patch]: support numpy 2 (#28183)
Follows on from #27991, updates the langchain package to support numpy 2
versions

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-11-26 11:20:02 -05:00
ccurme
7b9a0d9ed8
docs: update tutorials (#28219) 2024-11-26 10:43:12 -05:00
Richard Hao
c161f7d46f
docs(create_sql_agent): fix reStructured Text Markup (#28356)
- **Description:** Lines of code must be indented beneath `..
code-block::` for proper formatting.
https://devguide.python.org/documentation/markup/#showing-code-examples
- **Issue:** The example code block on the `create_sql_agent` document
page is not properly rendered.

https://python.langchain.com/api_reference/community/agent_toolkits/langchain_community.agent_toolkits.sql.base.create_sql_agent.html#langchain_community.agent_toolkits.sql.base.create_sql_agent

<img width="933" alt="image"
src="https://github.com/user-attachments/assets/d764bcad-e412-408b-ab0b-9a78a11188ee">
2024-11-26 09:52:16 -05:00
Mohammad Mohtashim
195ae7baa3
Community: Adding citations in AIMessage for ChatPerplexity (#28321)
**Description**: Adding Citation in response payload of ChatPerplexity
**Issue**: #28108
2024-11-26 09:45:47 -05:00
ccurme
a5374952f8
community[patch]: fix import in test (#28339)
Library name was updated after
https://github.com/langchain-ai/langchain/pull/27879 branched off
master.
2024-11-25 19:28:01 +00:00
Alex Thomas
5867f25ff3
community[patch]: Neo4j community deprecation (#28130)
Adds deprecation notices for Neo4j components moving to the
`langchain_neo4j` partner package.

- Adds deprecation warnings to all Neo4j-related classes and functions
that have been migrated to the new `langchain_neo4j` partner package
- Updates documentation to reference the new `langchain_neo4j` package
instead of `langchain_community`
2024-11-25 10:34:22 -08:00
Yan
c60695a1c7
community: fixed critical bugs at Writer provider (#27879) 2024-11-25 12:03:37 -05:00
Yelin Zhang
6ed2d387bb
docs: fix GOOGLE_API_KEY typo (#28322)
fix small GOOGLE_API_KEY markdown formatting typo
2024-11-25 09:45:22 -05:00
ccurme
a83357dc5a
community[patch]: release 0.3.8 (#28316) 2024-11-23 08:21:21 -05:00
ccurme
82bb0cdfff
langchain[patch]: release 0.3.8 (#28315) 2024-11-23 13:02:10 +00:00
ccurme
f5f1149257
core[patch]: release 0.3.21 (#28314) 2024-11-23 12:46:56 +00:00
Eugene Yurtsev
563587e14f
langchain[patch]: Compat with pydantic 2.10 (#28307)
pydantic compat 2.10 for langchain
2024-11-23 03:21:27 +00:00
Eugene Yurtsev
a813d11c14
core[patch]: Compat pydantic 2.10 (#28308)
pydantic 2.10 compat for langchain-core
2024-11-22 21:44:55 -05:00
ccurme
25a636c597
langchain[patch]: update deprecation message for MapReduceChain (#28304)
Link migration guide first.
2024-11-23 00:47:52 +00:00
ccurme
203d20caa5
community[patch]: fix errors introduced by pydantic 2.10 (#28297) 2024-11-22 17:50:13 -05:00
Erick Friis
aa7fa80e1e
partners/ollama: release 0.2.2rc1 (#28300) 2024-11-22 22:25:05 +00:00
Erick Friis
7277794a59
ollama: include kwargs in requests (#28299)
courtesy of @ryanmagnuson
2024-11-22 14:15:42 -08:00
Pat Patterson
2ee37a1c7b
community: list valid values for LanceDB constructor's mode argument (#28296)
**Description:**

Currently, the docstring for `LanceDB.__init__()` provides the default
value for `mode`, but not the list of valid values. This PR adds that
list to the docstring.

**Issue:**

N/A

**Dependencies:**

N/A

**Twitter handle:**

`@metadaddy`

[Leaving as a reminder: If no one reviews your PR within a few days,
please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda,
hwchase17.]
2024-11-22 15:40:06 -05:00
ccurme
697dda5052
core[patch]: release 0.3.20 (#28293) 2024-11-22 14:04:29 -05:00
ccurme
a433039a56
core[patch]: support final AIMessage responses in tool_example_to_messages (#28267)
We have a test
[test_structured_few_shot_examples](ad4333ca03/libs/standard-tests/langchain_tests/integration_tests/chat_models.py (L546))
in standard integration tests that implements a version of tool-calling
few shot examples that works with ~all tested providers. The formulation
supported by ~all providers is: `human message, tool call, tool message,
AI reponse`.

Here we update
`langchain_core.utils.function_calling.tool_example_to_messages` to
support this formulation.

The `tool_example_to_messages` util is undocumented outside of our API
reference. IMO, if we are testing that this function works across all
providers, it can be helpful to feature it in our guides. The structured
few-shot examples we document at the moment require users to implement
this function and can be simplified.
2024-11-22 15:38:49 +00:00
Erick Friis
29f8a79ebe
groq,openai,mistralai: fix unit tests (#28279) 2024-11-22 04:54:01 +00:00
Erick Friis
9a717c9b32
docs: poetry publish 2 (#28277)
- **docs: poetry publish**
- **x**
- **x**
- **x**
- **x**
- **x**
- **x**
- **x**
- **x**
- **x**
2024-11-21 20:49:38 -08:00
Erick Friis
4ccb3e64c7
cli: release 0.0.33 (#28278) 2024-11-21 20:13:37 -08:00
Erick Friis
49254cde70
docs: poetry publish (#28275) 2024-11-22 03:10:03 +00:00
Erick Friis
b3ee1f8713
core: add space at end of error message link (#28270) 2024-11-21 22:19:59 +00:00
Erick Friis
5bc2df3060
standard-tests: troubleshooting docstrings (#28268) 2024-11-21 22:05:31 +00:00
Erick Friis
ad4333ca03
infra: disable vertex api build (#28266) 2024-11-21 10:37:17 -08:00
ccurme
56499cf58b
openai[patch]: unskip test and relax tolerance in embeddings comparison (#28262)
From what I can tell response using SDK is not deterministic:
```python
import numpy as np
import openai

documents = ["disallowed special token '<|endoftext|>'"]
model = "text-embedding-ada-002"

direct_output_1 = (
    openai.OpenAI()
    .embeddings.create(input=documents, model=model)
    .data[0]
    .embedding
)

for i in range(10):
    direct_output_2 = (
        openai.OpenAI()
        .embeddings.create(input=documents, model=model)
        .data[0]
        .embedding
    )
    print(f"{i}: {np.isclose(direct_output_1, direct_output_2).all()}")
```
```
0: True
1: True
2: True
3: True
4: False
5: True
6: True
7: True
8: True
9: True
```

See related discussion here:
https://community.openai.com/t/can-text-embedding-ada-002-be-made-deterministic/318054

Found the same result using `"text-embedding-3-small"`.
2024-11-21 10:23:10 -08:00
Priyanshi Garg
f5f53d1101
community: fix compatibility issue in kinetica chat model integration for Pydantic 2 (#28252)
Fixed a compatibility issue in the `load_messages_from_context()`
function for the Kinetica chat model integration. The issue was caused
by stricter validation introduced in Pydantic 2.
2024-11-21 09:33:00 -05:00
Erick Friis
d1108607f4
multiple: push deprecation removals to 1.0 (#28236) 2024-11-20 19:56:29 -08:00
Erick Friis
4f76246cf2
standard-tests: release 0.3.4 (#28245) 2024-11-20 19:35:58 -08:00
Erick Friis
4bdf1d7d1a
standard-tests: fix decorator init test (#28246) 2024-11-21 03:35:43 +00:00
Erick Friis
60e572f591
standard-tests: tool tests (#28244) 2024-11-20 19:26:16 -08:00
Erick Friis
35e6052df5
infra: remove stale dockerfiles from repo (#28243)
deleting the following docker things from monorepo. they aren't
currently usable because of old dependencies, and I'd rather avoid
people using them / having to maintain them

- /docker
- this folder has a compose file that spins up postgres,pgvector
(separate from postgres and very stale version),mongo instance with
default user/password that we've gotten security pings about before. not
worth having
- also spins up a custom dockerfile with onttotext/graphdb - not even
sure what that is
- /libs/langchain/dockerfile + dev.dockerfile
  - super old poetry version, doesn't implement the right thing anymore
- .github/workflows/_release_docker.yml, langchain_release_docker.yml
  - not used anymore, not worth having an alternate release path
2024-11-21 00:05:01 +00:00
Erick Friis
161ab736ce
standard-tests: release 0.3.3 (#28242) 2024-11-20 23:47:02 +00:00
Eugene Yurtsev
2acc83f146
mistralai[patch]: 0.2.2 release (#28240)
mistralai 0.2.2 release
2024-11-20 22:18:15 +00:00
Eugene Yurtsev
1a66175e38
mistral[patch]: Propagate tool call id (#28238)
mistralai-large-2411 requires tool call id

Older models accept tool call id if its provided

mistral-large-2407 
mistral-large-2402
2024-11-20 17:02:30 -05:00
shroominic
dee72c46c1
community: Outlines integration (#27449)
In collaboration with @rlouf I build an
[outlines](https://dottxt-ai.github.io/outlines/latest/) integration for
langchain!

I think this is really useful for doing any type of structured output
locally.
[Dottxt](https://dottxt.co) spend alot of work optimising this process
at a lower level
([outlines-core](https://pypi.org/project/outlines-core/0.1.14/) written
in rust) so I think this is a better alternative over all current
approaches in langchain to do structured output.
It also implements the `.with_structured_output` method so it should be
a drop in replacement for a lot of applications.

The integration includes:
- **Outlines LLM class**
- **ChatOutlines class**
- **Tutorial Cookbooks**
- **Documentation Page**
- **Validation and error messages** 
- **Exposes Outlines Structured output features**
- **Support for multiple backends**
- **Integration and Unit Tests**

Dependencies: `outlines` + additional (depending on backend used)

I am not sure if the unit-tests comply with all requirements, if not I
suggest to just remove them since I don't see a useful way to do it
differently.

### Quick overview:

Chat Models:
<img width="698" alt="image"
src="https://github.com/user-attachments/assets/05a499b9-858c-4397-a9ff-165c2b3e7acc">

Structured Output:
<img width="955" alt="image"
src="https://github.com/user-attachments/assets/b9fcac11-d3e5-4698-b1ae-8c4cb3d54c45">

---------

Co-authored-by: Vadym Barda <vadym@langchain.dev>
2024-11-20 16:31:31 -05:00
Mikelarg
2901fa20cc
community: Add deprecation warning for GigaChat integration in langchain-community (#28022)
- **Description:** We have released the
[langchain-gigachat](https://github.com/ai-forever/langchain-gigachat?tab=readme-ov-file)
with new GigaChat integration that support's function/tool calling. This
PR deprecated legacy GigaChat class in community package.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-20 21:03:47 +00:00
Renzo-vS
567dc1e422
community: fix duplicate content (#28003)
Thank you for reading my first PR!

**Description:**
Deduplicate content in AzureSearch vectorstore.
Currently, by default, the content of the retrieval is placed both in
metadata and page_content of a Document.
This PR removes the content from metadata, and leaves it in
page_content.

**Issue:**:
Previously, the content was popped from result before metadata was
populated.
In #25828 , the order was changed which leads to a response with
duplicated content.
This was not the intention of that PR and seems undesirable.

Looking forward to seeing my contribution in the next version!

Cheers, 
Renzo
2024-11-20 12:49:03 -08:00
Jorge Piedrahita Ortiz
abaea28417
community: SamabanovaCloud tool calling and Structured output (#27967)
**Description:** Add tool calling and structured output support for
SambaNovaCloud chat models, docs included

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-20 19:12:08 +00:00
af su
7c7ee07d30
huggingface[fix]: HuggingFaceEndpointEmbeddings model parameter passing error when async embed (#27953)
This change refines the handling of _model_kwargs in POST requests.
Instead of nesting _model_kwargs as a dictionary under the parameters
key, it is now directly unpacked and merged into the request's JSON
payload. This ensures that the model parameters are passed correctly and
avoids unnecessary nesting.E. g.:

```python
import asyncio

from langchain_huggingface.embeddings import HuggingFaceEndpointEmbeddings

embedding_input = ["This input will get multiplied" * 10000]

embeddings = HuggingFaceEndpointEmbeddings(
    model="http://127.0.0.1:8081/embed",
    model_kwargs={"truncate": True},
)

# Truncated parameters in synchronized methods are handled correctly
embeddings.embed_documents(texts=embedding_input)
# The truncate parameter is not handled correctly in the asynchronous method,
# and 413 Request Entity Too Large is returned.
asyncio.run(embeddings.aembed_documents(texts=embedding_input))
```

Co-authored-by: af su <saf@zjuici.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-20 19:08:56 +00:00
Eric Pinzur
923ef85105
langchain_chroma: fixed integration tests (#27968)
Description:
* I'm planning to add `Document.id` support to the Chroma VectorStore,
but first I wanted to make sure all the integration tests were passing
first. They weren't. This PR fixes the broken tests.
* I found 2 issues:
* This change (from a year ago, exactly :) ) for supporting multi-modal
embeddings:
https://docs.trychroma.com/deployment/migration#migration-to-0.4.16---november-7,-2023
* This change https://github.com/langchain-ai/langchain/pull/27827 due
to an update in the chroma client.
  
Also ran `format` and `lint` on the changes.

Note: I am not a member of the Chroma team.
2024-11-20 11:05:02 -08:00
CLOVA Studio 개발
218b4e073e
community: fix some features on Naver ChatModel & embedding model (#28228)
# Description

- adding stopReason to response_metadata to call stream and astream
- excluding NCP_APIGW_API_KEY input required validation
- to remove warning Field "model_name" has conflict with protected
namespace "model_".

cc. @vbarda
2024-11-20 10:35:41 -08:00
Erick Friis
43e24cd4a1
docs, standard-tests: property tags, support tool decorator (#28234) 2024-11-20 17:19:03 +00:00
William FH
197b885911
[CLI] Relax constraints (#28218) 2024-11-19 09:31:56 -08:00
Eugene Yurtsev
5599a0a537
core[minor]: Add other langgraph packages to sys_info (#28190)
Add other langgraph packages to sys_info output
2024-11-19 09:20:25 -05:00
Erick Friis
0dbaf05bb7
standard-tests: rename langchain_standard_tests to langchain_tests, release 0.3.2 (#28203) 2024-11-18 19:10:39 -08:00
Erick Friis
d9d689572a
openai: release 0.2.9, o1 streaming (#28197) 2024-11-18 23:54:38 +00:00
DreamOfStars
22a8652ecc
langchain: add missing punctuation in react_single_input.py (#28161)
- [x] **PR title**: "langchain: add missing punctuation in
react_single_input.py"

- [x] **PR message**: 
- **Description:** Add missing single quote to line 12: "Invalid Format:
Missing 'Action:' after 'Thought:"
2024-11-18 09:38:48 -05:00
Eric Pinzur
0a57fc0016
community: OpenSearchVectorStore: use engine set at init() time by default (#28147)
Description:
* Updated the OpenSearchVectorStore to use the `engine` parameter
captured at `init()` time as the default when adding documents to the
store.

Formatted, Linted, and Tested.
2024-11-16 17:07:42 -05:00
Erick Friis
6d2004ee7d
multiple: langchain-standard-tests -> langchain-tests (#28139) 2024-11-15 11:32:04 -08:00
Erick Friis
409c7946ac
docs, standard-tests: how to standard test a custom tool, imports (#27931) 2024-11-15 10:49:14 -08:00
alex shengzhi li
39fcb476fd
community: add reka chat model integration (#27379) 2024-11-15 13:37:14 -05:00
Erick Friis
d3252b7417
core: release 0.3.19 (#28137) 2024-11-15 18:15:28 +00:00
Jorge Piedrahita Ortiz
39956a3ef0
community: sambanovacloud llm integration (#27526)
- **Description:** SambaNovaCloud llm integration added, previously only
chat model integration

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-11-15 16:58:11 +00:00
Elham Badri
d696728278
partners/ollama: Enabled Token Level Streaming when Using Bind Tools for ChatOllama (#27689)
**Description:** The issue concerns the unexpected behavior observed
using the bind_tools method in LangChain's ChatOllama. When tools are
not bound, the llm.stream() method works as expected, returning
incremental chunks of content, which is crucial for real-time
applications such as conversational agents and live feedback systems.
However, when bind_tools([]) is used, the streaming behavior changes,
causing the output to be delivered in full chunks rather than
incrementally. This change negatively impacts the user experience by
breaking the real-time nature of the streaming mechanism.
**Issue:** #26971

---------

Co-authored-by: 4meyDam1e <amey.damle@mail.utoronto.ca>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-11-15 11:36:27 -05:00
ccurme
776e3271e3
standard-tests[patch]: add test for async tool calling (#28133) 2024-11-15 16:09:50 +00:00
Vadym Barda
ed4952e475
core[patch]: add caching to get_function_nonlocals (#28131) 2024-11-15 07:53:53 -08:00
ccurme
f1222739f8
core[patch]: support numpy 2 (#27991) 2024-11-14 13:08:57 -05:00
Vadym Barda
6ec688cf2b
xai[patch]: update core (#28092) 2024-11-13 17:51:51 +00:00
Bharat Ramanathan
3e972faf81
community: chore warn deprecate the tracer (#27159)
- **Description:**: This PR deprecates the wandb tracer in favor of the
new
[WeaveTracer](https://weave-docs.wandb.ai/guides/integrations/langchain#using-weavetracer)
in W&B
- **Dependencies:** No dependencies, just a deprecation warning.
- **Twitter handle:** @parambharat


@baskaryan
2024-11-13 11:33:34 -05:00
Erick Friis
76e0127539
core: release 0.3.18 (#28070) 2024-11-13 16:19:13 +00:00
Eric Pinzur
eadc2f6a90
core: added DeleteResponse to the module (#28069)
Description:
* added `DeleteResponse` to the `langchain_core.indexing` module, for
implementing DocumentIndex classes.
2024-11-13 11:08:08 -05:00
ZhangShenao
c89e7ce8b5
core[patch]: Update doc-strings in callbacks (#28073)
- Fix api docs
2024-11-13 11:07:15 -05:00
Vadym Barda
09e85c7c4b
xai[patch]: update dependencies (#28067) 2024-11-12 16:15:17 -05:00
am-kinetica
a646f1c383
Handled empty search result handling and updated the notebook (#27914)
- [ ] **PR title**: "community: updated Kinetica vectorstore"

  - **Description:** Handled empty search results
  - **Issue:** used to throw error if the search results were empty

@efriis
2024-11-12 13:03:49 -08:00
ccurme
00e7b2dada
anthropic[patch]: add examples to API ref (#28065) 2024-11-12 20:17:02 +00:00
Vadym Barda
48ee322a78
partners: add xAI chat integration (#28032) 2024-11-12 15:11:29 -05:00
ccurme
2898b95ca7
anthropic[major]: release 0.3.0 (#28063) 2024-11-12 14:58:00 -05:00
ccurme
5eaa0e8c45
openai[patch]: release 0.2.8 (#28062) 2024-11-12 14:57:11 -05:00
ccurme
15b7dd3ad7
community[patch]: release 0.3.7 (#28061) 2024-11-12 19:54:58 +00:00
ccurme
5460096086
core[patch]: release 0.3.17 (#28060) 2024-11-12 19:38:56 +00:00
ccurme
1538ee17f9
anthropic[major]: support python 3.13 (#27916)
Last week Anthropic released version 0.39.0 of its python sdk, which
enabled support for Python 3.13. This release deleted a legacy
`client.count_tokens` method, which we currently access during init of
the `Anthropic` LLM. Anthropic has replaced this functionality with the
[client.beta.messages.count_tokens()
API](https://github.com/anthropics/anthropic-sdk-python/pull/726).

To enable support for `anthropic >= 0.39.0` and Python 3.13, here we
drop support for the legacy token counting method, and add support for
the new method via `ChatAnthropic.get_num_tokens_from_messages`.

To fully support the token counting API, we update the signature of
`get_num_tokens_from_message` to accept tools everywhere.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-11-12 14:31:07 -05:00
ZhangShenao
ca7375ac20
Improvement[Community]Improve Embeddings API (#28038)
- Fix `BaichuanTextEmbeddings` api url
- Remove unused params in api doc
- Fix word spelling
2024-11-12 13:57:35 -05:00
Bagatur
139881b108
openai[patch]: fix azure oai stream check (#28048) 2024-11-12 15:42:06 +00:00
Bagatur
9611f0b55d
openai[patch]: Release 0.2.7 (#28047) 2024-11-12 15:16:15 +00:00
Bagatur
5c14e1f935
community[patch]: Release 0.3.6 (#28046) 2024-11-12 15:15:07 +00:00
Bagatur
9ebd7ebed8
core[patch]: Release 0.3.16 (#28045) 2024-11-12 14:57:15 +00:00
Bagatur
33dbfba08b
openai[patch]: default to invoke on o1 stream() (#27983) 2024-11-08 19:12:59 -08:00
Eric Pinzur
c421997caa
community[patch]: Added type hinting to OpenSearch clients (#27946)
Description:
* When working with OpenSearchVectorSearch to make
OpenSearchGraphVectorStore (coming soon), I noticed that there wasn't
type hinting for the underlying OpenSearch clients. This fixes that
issue.
* Confirmed tests are still passing with code changes.

Note that there is some additional code duplication now, but I think
this approach is cleaner overall.
2024-11-08 11:04:57 -08:00
Saad Makrod
b509747c7f
Community: Google Books API Tool (#27307)
## Description

As proposed in our earlier discussion #26977 we have introduced a Google
Books API Tool that leverages the Google Books API found at
[https://developers.google.com/books/docs/v1/using](https://developers.google.com/books/docs/v1/using)
to generate book recommendations.

### Sample Usage

```python
from langchain_community.tools import GoogleBooksQueryRun
from langchain_community.utilities import GoogleBooksAPIWrapper

api_wrapper = GoogleBooksAPIWrapper()
tool = GoogleBooksQueryRun(api_wrapper=api_wrapper)

tool.run('ai')
```

### Sample Output

```txt
Here are 5 suggestions based off your search for books related to ai:

1. "AI's Take on the Stigma Against AI-Generated Content" by Sandy Y. Greenleaf: In a world where artificial intelligence (AI) is rapidly advancing and transforming various industries, a new form of content creation has emerged: AI-generated content. However, despite its potential to revolutionize the way we produce and consume information, AI-generated content often faces a significant stigma. "AI's Take on the Stigma Against AI-Generated Content" is a groundbreaking book that delves into the heart of this issue, exploring the reasons behind the stigma and offering a fresh, unbiased perspective on the topic. Written from the unique viewpoint of an AI, this book provides readers with a comprehensive understanding of the challenges and opportunities surrounding AI-generated content. Through engaging narratives, thought-provoking insights, and real-world examples, this book challenges readers to reconsider their preconceptions about AI-generated content. It explores the potential benefits of embracing this technology, such as increased efficiency, creativity, and accessibility, while also addressing the concerns and drawbacks that contribute to the stigma. As you journey through the pages of this book, you'll gain a deeper understanding of the complex relationship between humans and AI in the realm of content creation. You'll discover how AI can be used as a tool to enhance human creativity, rather than replace it, and how collaboration between humans and machines can lead to unprecedented levels of innovation. Whether you're a content creator, marketer, business owner, or simply someone curious about the future of AI and its impact on our society, "AI's Take on the Stigma Against AI-Generated Content" is an essential read. With its engaging writing style, well-researched insights, and practical strategies for navigating this new landscape, this book will leave you equipped with the knowledge and tools needed to embrace the AI revolution and harness its potential for success. Prepare to have your assumptions challenged, your mind expanded, and your perspective on AI-generated content forever changed. Get ready to embark on a captivating journey that will redefine the way you think about the future of content creation.
Read more at https://play.google.com/store/books/details?id=4iH-EAAAQBAJ&source=gbs_api

2. "AI Strategies For Web Development" by Anderson Soares Furtado Oliveira: From fundamental to advanced strategies, unlock useful insights for creating innovative, user-centric websites while navigating the evolving landscape of AI ethics and security Key Features Explore AI's role in web development, from shaping projects to architecting solutions Master advanced AI strategies to build cutting-edge applications Anticipate future trends by exploring next-gen development environments, emerging interfaces, and security considerations in AI web development Purchase of the print or Kindle book includes a free PDF eBook Book Description If you're a web developer looking to leverage the power of AI in your projects, then this book is for you. Written by an AI and ML expert with more than 15 years of experience, AI Strategies for Web Development takes you on a transformative journey through the dynamic intersection of AI and web development, offering a hands-on learning experience.The first part of the book focuses on uncovering the profound impact of AI on web projects, exploring fundamental concepts, and navigating popular frameworks and tools. As you progress, you'll learn how to build smart AI applications with design intelligence, personalized user journeys, and coding assistants. Later, you'll explore how to future-proof your web development projects using advanced AI strategies and understand AI's impact on jobs. Toward the end, you'll immerse yourself in AI-augmented development, crafting intelligent web applications and navigating the ethical landscape.Packed with insights into next-gen development environments, AI-augmented practices, emerging realities, interfaces, and security governance, this web development book acts as your roadmap to staying ahead in the AI and web development domain. What you will learn Build AI-powered web projects with optimized models Personalize UX dynamically with AI, NLP, chatbots, and recommendations Explore AI coding assistants and other tools for advanced web development Craft data-driven, personalized experiences using pattern recognition Architect effective AI solutions while exploring the future of web development Build secure and ethical AI applications following TRiSM best practices Explore cutting-edge AI and web development trends Who this book is for This book is for web developers with experience in programming languages and an interest in keeping up with the latest trends in AI-powered web development. Full-stack, front-end, and back-end developers, UI/UX designers, software engineers, and web development enthusiasts will also find valuable information and practical guidelines for developing smarter websites with AI. To get the most out of this book, it is recommended that you have basic knowledge of programming languages such as HTML, CSS, and JavaScript, as well as a familiarity with machine learning concepts.
Read more at https://play.google.com/store/books/details?id=FzYZEQAAQBAJ&source=gbs_api

3. "Artificial Intelligence for Students" by Vibha Pandey: A multifaceted approach to develop an understanding of AI and its potential applications KEY FEATURES ● AI-informed focuses on AI foundation, applications, and methodologies. ● AI-inquired focuses on computational thinking and bias awareness. ● AI-innovate focuses on creative and critical thinking and the Capstone project. DESCRIPTION AI is a discipline in Computer Science that focuses on developing intelligent machines, machines that can learn and then teach themselves. If you are interested in AI, this book can definitely help you prepare for future careers in AI and related fields. The book is aligned with the CBSE course, which focuses on developing employability and vocational competencies of students in skill subjects. The book is an introduction to the basics of AI. It is divided into three parts – AI-informed, AI-inquired and AI-innovate. It will help you understand AI's implications on society and the world. You will also develop a deeper understanding of how it works and how it can be used to solve complex real-world problems. Additionally, the book will also focus on important skills such as problem scoping, goal setting, data analysis, and visualization, which are essential for success in AI projects. Lastly, you will learn how decision trees, neural networks, and other AI concepts are commonly used in real-world applications. By the end of the book, you will develop the skills and competencies required to pursue a career in AI. WHAT YOU WILL LEARN ● Get familiar with the basics of AI and Machine Learning. ● Understand how and where AI can be applied. ● Explore different applications of mathematical methods in AI. ● Get tips for improving your skills in Data Storytelling. ● Understand what is AI bias and how it can affect human rights. WHO THIS BOOK IS FOR This book is for CBSE class XI and XII students who want to learn and explore more about AI. Basic knowledge of Statistical concepts, Algebra, and Plotting of equations is a must. TABLE OF CONTENTS 1. Introduction: AI for Everyone 2. AI Applications and Methodologies 3. Mathematics in Artificial Intelligence 4. AI Values (Ethical Decision-Making) 5. Introduction to Storytelling 6. Critical and Creative Thinking 7. Data Analysis 8. Regression 9. Classification and Clustering 10. AI Values (Bias Awareness) 11. Capstone Project 12. Model Lifecycle (Knowledge) 13. Storytelling Through Data 14. AI Applications in Use in Real-World
Read more at https://play.google.com/store/books/details?id=ptq1EAAAQBAJ&source=gbs_api

4. "The AI Book" by Ivana Bartoletti, Anne Leslie and Shân M. Millie: Written by prominent thought leaders in the global fintech space, The AI Book aggregates diverse expertise into a single, informative volume and explains what artifical intelligence really means and how it can be used across financial services today. Key industry developments are explained in detail, and critical insights from cutting-edge practitioners offer first-hand information and lessons learned. Coverage includes: · Understanding the AI Portfolio: from machine learning to chatbots, to natural language processing (NLP); a deep dive into the Machine Intelligence Landscape; essentials on core technologies, rethinking enterprise, rethinking industries, rethinking humans; quantum computing and next-generation AI · AI experimentation and embedded usage, and the change in business model, value proposition, organisation, customer and co-worker experiences in today’s Financial Services Industry · The future state of financial services and capital markets – what’s next for the real-world implementation of AITech? · The innovating customer – users are not waiting for the financial services industry to work out how AI can re-shape their sector, profitability and competitiveness · Boardroom issues created and magnified by AI trends, including conduct, regulation & oversight in an algo-driven world, cybersecurity, diversity & inclusion, data privacy, the ‘unbundled corporation’ & the future of work, social responsibility, sustainability, and the new leadership imperatives · Ethical considerations of deploying Al solutions and why explainable Al is so important
Read more at http://books.google.ca/books?id=oE3YDwAAQBAJ&dq=ai&hl=&source=gbs_api

5. "Artificial Intelligence in Society" by OECD: The artificial intelligence (AI) landscape has evolved significantly from 1950 when Alan Turing first posed the question of whether machines can think. Today, AI is transforming societies and economies. It promises to generate productivity gains, improve well-being and help address global challenges, such as climate change, resource scarcity and health crises.
Read more at https://play.google.com/store/books/details?id=eRmdDwAAQBAJ&source=gbs_api
```

## Issue 

This closes #27276 

## Dependencies

No additional dependencies were added

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-07 15:29:35 -08:00
Erick Friis
733e43eed0
docs: new stack diagram (#27972) 2024-11-07 22:46:56 +00:00
Erick Friis
a073c4c498
templates,docs: leave templates in v0.2 (#27952)
all template installs will now have to declare `--branch v0.2` to make
clear they aren't compatible with langchain 0.3 (most have a pydantic v1
setup). e.g.

```
langchain-cli app add pirate-speak --branch v0.2
```
2024-11-07 22:23:48 +00:00
Shawn Lee
6f368e9eab
community: handle chatdeepinfra jsondecode error (#27603)
Fixes #27602 

Added error handling to return empty dict if args is empty string or
None.

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-07 13:47:19 -08:00
Akshata
05fd6a16a9
Add ChatModels wrapper for Cloudflare Workers AI (#27645)
Thank you for contributing to LangChain!

- [x] **PR title**: "community: chat models wrapper for Cloudflare
Workers AI"


- [x] **PR message**:
- **Description:** Add chat models wrapper for Cloudflare Workers AI.
Enables Langgraph intergration via ChatModel for tool usage, agentic
usage.


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-11-07 15:34:24 -05:00
Erick Friis
8a5b9bf2ad
box: migrate to repo (#27969) 2024-11-07 10:19:22 -08:00
ccurme
a747dbd24b
anthropic[patch]: remove retired model from tests (#27965)
`claude-instant` was [retired
yesterday](https://docs.anthropic.com/en/docs/resources/model-deprecations).
2024-11-07 16:16:29 +00:00
Aksel Joonas Reedi
2cb39270ec
community: bytes as a source to AzureAIDocumentIntelligenceLoader (#26618)
- **Description:** This PR adds functionality to pass in in-memory bytes
as a source to `AzureAIDocumentIntelligenceLoader`.
- **Issue:** I needed the functionality, so I added it.
- **Dependencies:** NA
- **Twitter handle:** @akseljoonas if this is a big enough change :)

---------

Co-authored-by: Aksel Joonas Reedi <aksel@klippa.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-07 03:40:21 +00:00
Martin Triska
7a9149f5dd
community: ZeroxPDFLoader (#27800)
# OCR-based PDF loader

This implements [Zerox](https://github.com/getomni-ai/zerox) PDF
document loader.
Zerox utilizes simple but very powerful (even though slower and more
costly) approach to parsing PDF documents: it converts PDF to series of
images and passes it to a vision model requesting the contents in
markdown.

It is especially suitable for complex PDFs that are not parsed well by
other alternatives.

## Example use:
```python
from langchain_community.document_loaders.pdf import ZeroxPDFLoader

os.environ["OPENAI_API_KEY"] = "" ## your-api-key

model = "gpt-4o-mini" ## openai model
pdf_url = "https://assets.ctfassets.net/f1df9zr7wr1a/soP1fjvG1Wu66HJhu3FBS/034d6ca48edb119ae77dec5ce01a8612/OpenAI_Sacra_Teardown.pdf"

loader = ZeroxPDFLoader(file_path=pdf_url, model=model)
docs = loader.load()
```

The Zerox library supports wide range of provides/models. See Zerox
documentation for details.

- **Dependencies:** `zerox`
- **Twitter handle:** @martintriska1

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
2024-11-07 03:14:57 +00:00
Dmitriy Prokopchuk
53b0a99f37
community: Memcached LLM Cache Integration (#27323)
## Description
This PR adds support for Memcached as a usable LLM model cache by adding
the ```MemcachedCache``` implementation relying on the
[pymemcache](https://github.com/pinterest/pymemcache) client.

Unit test-wise, the new integration is generally covered under existing
import testing. All new functionality depends on pymemcache if
instantiated and used, so to comply with the other cache implementations
the PR also adds optional integration tests for ```MemcachedCache```.

Since this is a new integration, documentation is added for Memcached as
an integration and as an LLM Cache.

## Issue
This PR closes #27275 which was originally raised as a discussion in
#27035

## Dependencies
There are no new required dependencies for langchain, but
[pymemcache](https://github.com/pinterest/pymemcache) is required to
instantiate the new ```MemcachedCache```.

## Example Usage
```python3
from langchain.globals import set_llm_cache
from langchain_openai import OpenAI

from langchain_community.cache import MemcachedCache
from pymemcache.client.base import Client

llm = OpenAI(model="gpt-3.5-turbo-instruct", n=2, best_of=2)
set_llm_cache(MemcachedCache(Client('localhost')))

# The first time, it is not yet in cache, so it should take longer
llm.invoke("Which city is the most crowded city in the USA?")

# The second time it is, so it goes faster
llm.invoke("Which city is the most crowded city in the USA?")
```

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-07 03:07:59 +00:00
ZhangShenao
c2072d909a
Improvement[Partner] Improve qdrant vector store (#27251)
- Add static method decorator
- Add args for api doc
- Fix word spelling

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-07 02:42:41 +00:00
Baptiste Pasquier
81f7daa458
community: add InfinityRerank (#27043)
**Description:** 

- Add a Reranker for Infinity server.

**Dependencies:** 

This wrapper uses
[infinity_client](https://github.com/michaelfeil/infinity/tree/main/libs/client_infinity/infinity_client)
to connect to an Infinity server.

**Tests and docs**

- integration test: test_infinity_rerank.py
- example notebook: infinity_rerank.ipynb
[here](https://github.com/baptiste-pasquier/langchain/blob/feat/infinity-rerank/docs/docs/integrations/document_transformers/infinity_rerank.ipynb)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-06 17:26:30 -08:00
Martin Triska
90189f5639
community: Allow other than default parsers in SharePointLoader and OneDriveLoader (#27716)
## What this PR does?

### Currently `O365BaseLoader` (and consequently both derived loaders)
are limited to `pdf`, `doc`, `docx` files.
- **Solution: here we introduce _handlers_ attribute that allows for
custom handlers to be passed in. This is done in _dict_ form:**

**Example:**
```python
from langchain_community.document_loaders.parsers.documentloader_adapter import DocumentLoaderAsParser
# PR for DocumentLoaderAsParser here: https://github.com/langchain-ai/langchain/pull/27749
from langchain_community.document_loaders.excel import UnstructuredExcelLoader

xlsx_parser = DocumentLoaderAsParser(UnstructuredExcelLoader, mode="paged")

# create dictionary mapping file types to handlers (parsers)
handlers = {
    "doc": MsWordParser()
    "pdf": PDFMinerParser()
    "txt": TextParser()
    "xlsx": xlsx_parser
}
loader = SharePointLoader(document_library_id="...",
                            handlers=handlers # pass handlers to SharePointLoader
                            )
documents = loader.load()

# works the same in OneDriveLoader
loader = OneDriveLoader(document_library_id="...",
                            handlers=handlers
                            )
```
This dictionary is then passed to `MimeTypeBasedParser` same as in the
[current
implementation](5a2cfb49e0/libs/community/langchain_community/document_loaders/parsers/registry.py (L13)).


### Currently `SharePointLoader` and `OneDriveLoader` are separate
loaders that both inherit from `O365BaseLoader`
However both of these implement the same functionality. The only
differences are:
- `SharePointLoader` requires argument `document_library_id` whereas
`OneDriveLoader` requires `drive_id`. These are just different names for
the same thing.
  - `SharePointLoader` implements significantly more features.
- **Solution: `OneDriveLoader` is replaced with an empty shell just
renaming `drive_id` to `document_library_id` and inheriting from
`SharePointLoader`**

**Dependencies:** None
**Twitter handle:** @martintriska1

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-11-06 17:44:34 -05:00
takahashi
482c168b3e
langchain_core: add file_type option to make file type default as png (#27855)
Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
  - Example: "community: add foobar LLM"

- [ ] **description**
langchain_core.runnables.graph_mermaid.draw_mermaid_png calls this
function, but the Mermaid API returns JPEG by default. To be consistent,
add the option `file_type` with the default `png` type.

- [ ] **Add tests and docs**: If you're adding a new integration, please
include
With this small change, I didn't add tests and docs.

- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more:
One long sentence was divided into two.

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-11-06 22:37:07 +00:00
Roman Solomatin
0f85dea8c8
langchain-huggingface: use separate kwargs for queries and docs (#27857)
Now `encode_kwargs` used for both for documents and queries and this
leads to wrong embeddings. E. g.:
```python
    model_kwargs = {"device": "cuda", "trust_remote_code": True}
    encode_kwargs = {"normalize_embeddings": False, "prompt_name": "s2p_query"}

    model = HuggingFaceEmbeddings(
        model_name="dunzhang/stella_en_400M_v5",
        model_kwargs=model_kwargs,
        encode_kwargs=encode_kwargs,
    )

    query_embedding = np.array(
        model.embed_query("What are some ways to reduce stress?",)
    )
    document_embedding = np.array(
        model.embed_documents(
            [
                "There are many effective ways to reduce stress. Some common techniques include deep breathing, meditation, and physical activity. Engaging in hobbies, spending time in nature, and connecting with loved ones can also help alleviate stress. Additionally, setting boundaries, practicing self-care, and learning to say no can prevent stress from building up.",
                "Green tea has been consumed for centuries and is known for its potential health benefits. It contains antioxidants that may help protect the body against damage caused by free radicals. Regular consumption of green tea has been associated with improved heart health, enhanced cognitive function, and a reduced risk of certain types of cancer. The polyphenols in green tea may also have anti-inflammatory and weight loss properties.",
            ]
        )
    )
    print(model._client.similarity(query_embedding, document_embedding)) # output: tensor([[0.8421, 0.3317]], dtype=torch.float64)
```
But from the [model
card](https://huggingface.co/dunzhang/stella_en_400M_v5#sentence-transformers)
expexted like this:
```python
    model_kwargs = {"device": "cuda", "trust_remote_code": True}
    encode_kwargs = {"normalize_embeddings": False}
    query_encode_kwargs = {"normalize_embeddings": False, "prompt_name": "s2p_query"}

    model = HuggingFaceEmbeddings(
        model_name="dunzhang/stella_en_400M_v5",
        model_kwargs=model_kwargs,
        encode_kwargs=encode_kwargs,
        query_encode_kwargs=query_encode_kwargs,
    )

    query_embedding = np.array(
        model.embed_query("What are some ways to reduce stress?", )
    )
    document_embedding = np.array(
        model.embed_documents(
            [
                "There are many effective ways to reduce stress. Some common techniques include deep breathing, meditation, and physical activity. Engaging in hobbies, spending time in nature, and connecting with loved ones can also help alleviate stress. Additionally, setting boundaries, practicing self-care, and learning to say no can prevent stress from building up.",
                "Green tea has been consumed for centuries and is known for its potential health benefits. It contains antioxidants that may help protect the body against damage caused by free radicals. Regular consumption of green tea has been associated with improved heart health, enhanced cognitive function, and a reduced risk of certain types of cancer. The polyphenols in green tea may also have anti-inflammatory and weight loss properties.",
            ]
        )
    )
    print(model._client.similarity(query_embedding, document_embedding)) # tensor([[0.8398, 0.2990]], dtype=torch.float64)
```
2024-11-06 17:35:39 -05:00
Bagatur
60123bef67
docs: fix trim_messages docstring (#27948) 2024-11-06 22:25:13 +00:00
Eric Pinzur
ea0ad917b0
community: added Document.id support to opensearch vectorstore (#27945)
Description:
* Added support of Document.id on OpenSearch vector store
* Added tests cases to match
2024-11-06 15:04:09 -05:00
Bagatur
67ce05a0a7
core[patch]: make oai tool description optional (#27756) 2024-11-06 18:06:47 +00:00
Bagatur
b2da3115ed
docs: document init_chat_model standard params (#27812) 2024-11-06 09:50:07 -08:00
Dobiichi-Origami
395674d503
community: re-arrange function call message parse logic for Qianfan (#27935)
the [PR](https://github.com/langchain-ai/langchain/pull/26208) two month
ago has a potential bug which causes malfunction of `tool_call` for
`QianfanChatEndpoint` waiting for fix
2024-11-06 09:58:16 -05:00
ccurme
66966a6e72
openai[patch]: release 0.2.6 (#27924)
Some additions in support of [predicted
outputs](https://platform.openai.com/docs/guides/latency-optimization#use-predicted-outputs)
feature:
- Bump openai sdk version
- Add integration test
- Add example to integration docs

The `prediction` kwarg is already plumbed through model invocation.
2024-11-05 23:02:24 +00:00
Erick Friis
a8c473e114
standard-tests: ci pipeline (#27923) 2024-11-05 20:55:38 +00:00
Erick Friis
bff2a8b772
standard-tests: add tools standard tests (#27899) 2024-11-05 11:44:34 -08:00
SHJUN
f6b2f82099
community: chroma error patch(attribute changed on chroma) (#27827)
There was a change of attribute name which was "max_batch_size". It's
now "get_max_batch_size" method.
I want to use "create_batches" which is right down below.

Please check this PR link.
reference: https://github.com/chroma-core/chroma/pull/2305

---------

Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com>
Co-authored-by: Prithvi Kannan <46332835+prithvikannan@users.noreply.github.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Jun Yamog <jkyamog@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ono-hiroki <86904208+ono-hiroki@users.noreply.github.com>
Co-authored-by: Dobiichi-Origami <56953648+Dobiichi-Origami@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Co-authored-by: Duy Huynh <vndee.huynh@gmail.com>
Co-authored-by: Rashmi Pawar <168514198+raspawar@users.noreply.github.com>
Co-authored-by: sifatj <26035630+sifatj@users.noreply.github.com>
Co-authored-by: Eric Pinzur <2641606+epinzur@users.noreply.github.com>
Co-authored-by: Daniel Vu Dao <danielvdao@users.noreply.github.com>
Co-authored-by: Ofer Mendelevitch <ofermend@gmail.com>
Co-authored-by: Stéphane Philippart <wildagsx@gmail.com>
2024-11-05 19:43:11 +00:00
Erick Friis
31f4fb790d
standard-tests: release 0.3.0 (#27900) 2024-11-04 17:29:15 -08:00
Stéphane Philippart
4b8cd7a09a
community: Use new OVHcloud batch embedding (#26209)
- **Description:** change to do the batch embedding server side and not
client side
- **Twitter handle:** @wildagsx

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-11-04 16:40:30 -05:00
Ofer Mendelevitch
d7c39e6dbb
community: update Vectara integration (#27869)
Thank you for contributing to LangChain!

- **Description:** Updated Vectara integration
- **Issue:** refresh on descriptions across all demos and added UDF
reranker
- **Dependencies:** None
- **Twitter handle:** @ofermend

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-04 20:40:39 +00:00
Eric Pinzur
8eb38622a6
community: fixed bug in GraphVectorStoreRetriever (#27846)
Description:

This fixes an issue that mistakenly created in
https://github.com/langchain-ai/langchain/pull/27253. The issue
currently exists only in `langchain-community==0.3.4`.

Test cases were added to prevent this issue in the future.

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-04 20:27:17 +00:00
Bagatur
dfa83531ad
qdrant,nomic[minor]: bump core deps (#27849) 2024-11-04 20:19:50 +00:00
Erick Friis
0c62684ce1
Revert "infra: add neo4j to package list" (#27887)
Reverts langchain-ai/langchain#27833

Wait for release
2024-11-04 18:18:38 +00:00
Erick Friis
bcf499df16
infra: add neo4j to package list (#27833) 2024-11-04 09:24:04 -08:00
Duy Huynh
a487ec47f4
community: set default output_token_limit value for PowerBIToolkit to fix validation error (#26308)
### Description:
This PR sets a default value of `output_token_limit = 4000` for the
`PowerBIToolkit` to fix the unintentionally validation error.

### Problem:
When attempting to run a code snippet from [Langchain's PowerBI toolkit
documentation](https://python.langchain.com/v0.1/docs/integrations/toolkits/powerbi/)
to interact with a `PowerBIDataset`, the following error occurs:

```
pydantic.v1.error_wrappers.ValidationError: 1 validation error for QueryPowerBITool
output_token_limit
  none is not an allowed value (type=type_error.none.not_allowed)
```

### Root Cause:
The issue arises because when creating a `QueryPowerBITool`, the
`output_token_limit` parameter is unintentionally set to `None`, which
is the current default for `PowerBIToolkit`. However, `QueryPowerBITool`
expects a default value of `4000` for `output_token_limit`. This
unintended override causes the error.


17659ca2cd/libs/community/langchain_community/agent_toolkits/powerbi/toolkit.py (L63)

17659ca2cd/libs/community/langchain_community/agent_toolkits/powerbi/toolkit.py (L72-L79)

17659ca2cd/libs/community/langchain_community/tools/powerbi/tool.py (L39)

### Solution:
To resolve this, the default value of `output_token_limit` is now
explicitly set to `4000` in `PowerBIToolkit` to prevent the accidental
assignment of `None`.

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-11-04 14:34:27 +00:00
Dobiichi-Origami
f7ced5b211
community: read function call from tool_calls for Qianfan (#26208)
I added one more 'elif' to read tool call message from `tool_calls`

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-11-04 14:33:32 +00:00
Bagatur
3b0b7cfb74
chroma[minor]: release 0.2.0 (#27840) 2024-11-01 18:12:00 -07:00
Jun Yamog
830cad7bc0
core: fix CommaSeparatedListOutputParser to handle columns that may contain commas in it (#26365)
- **Description:**
Currently CommaSeparatedListOutputParser can't handle strings that may
contain commas within a column. It would parse any commas as the
delimiter.
Ex. 
"foo, foo2", "bar", "baz"

It will create 4 columns: "foo", "foo2", "bar", "baz"

This should be 3 columns:

"foo, foo2", "bar", "baz"

- **Dependencies:**
Added 2 additional imports, but they are built in python packages.

import csv
from io import StringIO

- **Twitter handle:** @jkyamog

- [ ] **Add tests and docs**: 
1. added simple unit test test_multiple_items_with_comma

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-11-01 22:42:24 +00:00
Erick Friis
03a3670a5e
infra: remove some special cases (#27839) 2024-11-01 21:13:43 +00:00
Bagatur
002e1c9055
airbyte: remove from master (#27837) 2024-11-01 13:59:34 -07:00
Bagatur
ee63d21915
many: use core 0.3.15 (#27834) 2024-11-01 20:35:55 +00:00
William FH
b4cb2089a2
langchain[patch]: Add warning in react agent (#26980) 2024-10-31 22:29:34 +00:00
Ant White
e3ea365725
core: use friendlier names for duplicated nodes in mermaid output (#27747)
Thank you for contributing to LangChain!

- [x] **PR title**: "core: use friendlier names for duplicated nodes in
mermaid output"

- **Description:** When generating the Mermaid visualization of a chain,
if the chain had multiple nodes of the same type, the reid function
would replace their names with the UUID node_id. This made the generated
graph difficult to understand. This change deduplicates the nodes in a
chain by appending an index to their names.
- **Issue:** None
- **Discussion:**
https://github.com/langchain-ai/langchain/discussions/27714
- **Dependencies:** None

- [ ] **Add tests and docs**:  
- Currently this functionality is not covered by unit tests, happy to
add tests if you'd like


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

# Example Code:
```python
from langchain_core.runnables import RunnablePassthrough

def fake_llm(prompt: str) -> str: # Fake LLM for the example
    return "completion"

runnable = {
    'llm1':  fake_llm,
    'llm2':  fake_llm,
} | RunnablePassthrough.assign(
    total_chars=lambda inputs: len(inputs['llm1'] + inputs['llm2'])
)

print(runnable.get_graph().draw_mermaid(with_styles=False))
```

# Before
```mermaid
graph TD;
	Parallel_llm1_llm2_Input --> 0b01139db5ed4587ad37964e3a40c0ec;
	0b01139db5ed4587ad37964e3a40c0ec --> Parallel_llm1_llm2_Output;
	Parallel_llm1_llm2_Input --> a98d4b56bd294156a651230b9293347f;
	a98d4b56bd294156a651230b9293347f --> Parallel_llm1_llm2_Output;
	Parallel_total_chars_Input --> Lambda;
	Lambda --> Parallel_total_chars_Output;
	Parallel_total_chars_Input --> Passthrough;
	Passthrough --> Parallel_total_chars_Output;
	Parallel_llm1_llm2_Output --> Parallel_total_chars_Input;
```

# After
```mermaid
graph TD;
	Parallel_llm1_llm2_Input --> fake_llm_1;
	fake_llm_1 --> Parallel_llm1_llm2_Output;
	Parallel_llm1_llm2_Input --> fake_llm_2;
	fake_llm_2 --> Parallel_llm1_llm2_Output;
	Parallel_total_chars_Input --> Lambda;
	Lambda --> Parallel_total_chars_Output;
	Parallel_total_chars_Input --> Passthrough;
	Passthrough --> Parallel_total_chars_Output;
	Parallel_llm1_llm2_Output --> Parallel_total_chars_Input;
```
2024-10-31 16:52:00 -04:00
L
8ef0df3539
feat: add batch request support for text-embedding-v3 model (#26375)
PR title: “langchain: add batch request support for text-embedding-v3
model”

PR message:

• Description: This PR introduces batch request support for the
text-embedding-v3 model within LangChain. The new functionality allows
users to process multiple text inputs in a single request, improving
efficiency and performance for high-volume applications.
	•	Issue: This PR addresses #<issue_number> (if applicable).
• Dependencies: No new external dependencies are required for this
change.
• Twitter handle: If announced on Twitter, please mention me at
@yourhandle.

Add tests and docs:

1. Added unit tests to cover the batch request functionality, ensuring
it operates without requiring network access.
2. Included an example notebook demonstrating the batch request feature,
located in docs/docs/integrations.

Lint and test: All required formatting and linting checks have been
performed using make format and make lint. The changes have been
verified with make test to ensure compatibility.

Additional notes:

	•	The changes are fully backwards compatible.
• No modifications were made to pyproject.toml, ensuring no new
dependencies were added.
• The update only affects the langchain package and does not involve
other packages.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-10-31 18:56:22 +00:00
putao520
2545fbe709
fix "WARNING: Received notification from DBMS server: {severity: WARN… (#27112)
…ING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning}
{category: DEPRECATION} {title: This feature is deprecated and will be
removed in future versions.} {description: CALL subquery without a
variable scope clause is now deprecated." this warning

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

Co-authored-by: putao520 <putao520@putao282.com>
2024-10-31 18:47:25 +00:00
Ankan Mahapatra
905f43377b
Update word_document.py | Fixed metadata["source"] for web paths (#27220)
The metadata["source"] value for the web paths was being set to
temporary path (/tmp).

Fixed it by creating a new variable self.original_file_path, which will
store the original path.

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-10-31 18:37:41 +00:00
Daniel Birn
389771ccc0
community: fix @embeddingKey in azure cosmos db no sql (#27377)
I will keep this PR as small as the changes made.

**Description:** fixes a fatal bug syntax error in
AzureCosmosDBNoSqlVectorSearch
**Issue:** #27269 #25468
2024-10-31 18:36:02 +00:00
Bagatur
06420de2e7
integrations[patch]: bump core to 0.3.15 (#27805) 2024-10-31 11:27:05 -07:00
W. Gustavo Cevallos
f94125a325
community: Update Polygon.io API (#27552)
**Description:** 
Update the wrapper to support the Polygon API if not you get an error. I
keeped `STOCKBUSINESS` for retro-compatbility with older endpoints /
other uses
Old Code:
```
 if status not in ("OK", "STOCKBUSINESS"):
    raise ValueError(f"API Error: {data}")

```
API Respond:
```
API Error: {'results': {'P': 0.22, 'S': 0, 'T': 'ZOM', 'X': 5, 'p': 0.123, 'q': 0, 's': 200, 't': 1729614422813395456, 'x': 1, 'z': 1}, 'status': 'STOCKSBUSINESS', 'request_id': 'XXXXXX'}
```

- **Issue:** N/A Polygon API update
- **Dependencies:** N/A
- **Twitter handle:** @wgcv

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-10-31 18:14:06 +00:00
Wang
621f78babd
community: [fix] add missing tool_calls kwargs of delta message in openai adapter (#27492)
- **Description:** add missing tool_calls kwargs of delta message in
openai adapter, then tool call will work correctly via adapter's stream
chat completion
- **Issue:** Fixes
https://github.com/langchain-ai/langchain/issues/25436
- **Dependencies:** None
2024-10-31 14:07:17 -04:00
Tao Wang
25a1031871
community: Fix a validation error for MoonshotChat (#27801)
- **Description:** Change `MoonshotCommon.client` type from
`_MoonshotClient` to `Any`.
- **Issue:** Fix the issue #27058
- **Dependencies:** No
- **Twitter handle:** TaoWang2218

In PR #17100, the implementation for Moonshot was added, which defined
two classes:

- `MoonshotChat(MoonshotCommon, ChatOpenAI)` in
`langchain_community.chat_models.moonshot`;
- Here, `validate_environment()` assigns **client** as
`openai.OpenAI().chat.completions`
- Note that **client** here is actually a member variable defined in
`ChatOpenAI`;
- `MoonshotCommon` in `langchain_community.llms.moonshot`;
- And here, `validate_environment()` assigns **_client** as
`_MoonshotClient`;
- Note that this is the underscored **_client**, which is defined within
`MoonshotCommon` itself;

At this time, there was no conflict between the two, one being `client`
and the other `_client`.

However, in PR #25878 which fixed #24390, `_client` in `MoonshotCommon`
was changed to `client`. Since then, a conflict in the definition of
`client` has arisen between `MoonshotCommon` and `MoonshotChat`, which
caused `pydantic` validation error.

To fix this issue, the type of `client` in `MoonshotCommon` should be
changed to `Any`.

Signed-off-by: Tao Wang <twang2218@gmail.com>
2024-10-31 14:00:16 -04:00
Bagatur
e4e2aa0b78
core[patch]: update image util err msg (#27803) 2024-10-31 10:56:43 -07:00
Bagatur
181bcd0577
core[patch]: Release 0.3.15 (#27802) 2024-10-31 10:35:02 -07:00
Bagatur
c1e742347f
core[patch]: rm image loading (#27797) 2024-10-31 10:34:51 -07:00
ZhangShenao
ad0387ac97
Improvement [docs] Improve api docs (#27787)
- Add missing param
- Remove unused param

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-10-31 16:56:44 +00:00
ccurme
0172d938b4
community: add AzureOpenAIWhisperParser (#27796)
Commandeered from https://github.com/langchain-ai/langchain/pull/26757.

---------

Co-authored-by: Sheepsta300 <128811766+Sheepsta300@users.noreply.github.com>
2024-10-31 12:37:41 -04:00
ccurme
b631b0a596
community[patch]: cap SQLAlchemy and update deps (#27792)
SQLAlchemy 2.0.36 introduces a regression when creating a table in
DuckDB.

Relevant issues:
- In SQLAlchemy repo (resolution is to update DuckDB):
https://github.com/sqlalchemy/sqlalchemy/discussions/12011
- In DuckDB repo (PR is open):
https://github.com/Mause/duckdb_engine/issues/1128

Plan is to track these issues and remove cap when resolved.
2024-10-31 14:19:09 +00:00
Erick Friis
8ad7adad87
infra: build api docs from package listing (#27774) 2024-10-30 21:31:01 -07:00
JiaranI
3952ee31b8
ollama: add pydocstyle linting for ollama (#27686)
Description: add lint docstrings for ollama module
Issue: the issue https://github.com/langchain-ai/langchain/issues/23188
@baskaryan

test: ruff check passed.
<img width="311" alt="e94c68ffa93dd518297a95a93de5217"
src="https://github.com/user-attachments/assets/e96bf721-e0e3-44de-a50e-206603de398e">

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-31 03:06:55 +00:00
Aayush Kataria
a8a33b2dc6
LangChain-Community - AzureCosmos Mongo vCore: Bug Fix when the data doesn't contain metadata field (#27772)
Thank you for contributing to LangChain!
- **Description:** Adding an empty metadata field when metadata is not
present in the data
- **Issue:** This PR fixes the issue when the data items doesn't contain
the metadata field. This happens when there is already data in the
container, or cx uses CosmosDB Python SDK to insert data.
- **Dependencies:** No dependencies required

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-10-30 20:05:25 -07:00
Rave Harpaz
8d8d85379f
community: OCI Generative AI tool calling bug fix (#26910)
- [x] **PR title**: 
  "community: OCI Generative AI tool calling bug fix 


- [x] **PR message**: 
- **Description:** bug fix for streaming chat responses with tool calls.
Update to PR 24693
    - **Issue:** chat response content is repeated when streaming
    - **Dependencies:** NA
    - **Twitter handle:** NA


- [x] **Add tests and docs**: NA


- [x] **Lint and test**: make format, make lint and make test we run
successfully

---------

Co-authored-by: Arthur Cheng <arthur.cheng@oracle.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-31 02:35:25 +00:00
Erick Friis
128b07208e
community: release 0.3.4 (#27769) 2024-10-30 17:48:03 -07:00
Bagatur
6691202998
anthropic[patch]: allow multiple sys not at start (#27725) 2024-10-30 23:56:47 +00:00
Erick Friis
1ed3cd252e
langchain: release 0.3.6 (#27768) 2024-10-30 23:50:42 +00:00
Sergey Ryabov
8180637345
community[patch]: Fix Playwright Tools bug with Pydantic schemas (#27050)
- Add tests for Playwright tools schema serialization
- Introduce base empty args Input class for BaseBrowserTool

Test Plan: `poetry run pytest
tests/unit_tests/tools/playwright/test_all.py`

Fixes #26758

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-10-30 23:45:36 +00:00
Bagatur
deb4320d29
core[patch]: Release 0.3.14 (#27764) 2024-10-30 21:47:33 +00:00
Bagatur
5d337326b0
core[patch]: make get_all_basemodel_annotations public (#27761) 2024-10-30 14:43:29 -07:00
Bagatur
94ea950c6c
core[patch]: support bedrock converse -> openai tool (#27754) 2024-10-30 12:20:39 -07:00
Lorenzo
3dfdb3e6fb
community: prevent gitlab commit on main branch for Gitlab tool (#27750)
### About

- **Description:** In the Gitlab utilities used for the Gitlab tool
there is no check to prevent pushing to the main branch, as this is
already done for Github (for example here:
5a2cfb49e0/libs/community/langchain_community/utilities/github.py (L587)).
This PR add this check as already done for Github.
- **Issue:** None
- **Dependencies:** None
2024-10-30 18:50:13 +00:00
Sam Julien
0a472e2a2d
community: Add Writer integration (#27646)
**Description:** Add support for Writer chat models   
**Issue:** N/A
**Dependencies:** Add `writer-sdk` to optional dependencies.
**Twitter handle:** Please tag `@samjulien` and `@Get_Writer`

**Tests and docs**
- [x] Unit test
- [x] Example notebook in `docs/docs/integrations` directory.

**Lint and test**
- [x] Run `make format` 
- [x] Run `make lint`
- [x] Run `make test`

---------

Co-authored-by: Johannes <tolstoy.work@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-30 18:06:05 +00:00
ccurme
88bfd60b03
infra: specify python max version of 3.12 for some integration packages (#27740) 2024-10-30 12:24:48 -04:00
fayvor
3b956b3a97
community: Update Replicate LLM and fix tests (#27655)
**Description:** 
- Fix bug in Replicate LLM class, where it was looking for parameter
names in a place where they no longer exist in pydantic 2, resulting in
the "Field required" validation error described in the issue.
- Fix Replicate LLM integration tests to:
  - Use active models on Replicate.
- Use the correct model parameter `max_new_tokens` as shown in the
[Replicate
docs](https://replicate.com/docs/guides/language-models/how-to-use#minimum-and-maximum-new-tokens).
  - Use callbacks instead of deprecated callback_manager.

**Issue:** #26937 

**Dependencies:** n/a

**Twitter handle:** n/a

---------

Signed-off-by: Fayvor Love <fayvor@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-10-30 16:07:08 +00:00
ccurme
bd5ea18a6c
groq[patch]: update standard tests (#27744)
- Add xfail on integration test (fails [> 50% of the
time](https://github.com/langchain-ai/langchain/actions/workflows/scheduled_test.yml));
- Remove xfail on passing unit test.
2024-10-30 15:50:51 +00:00
hmn falahi
98bb3a02bd
docs: Add OpenAIAssistantV2Runnable docstrings (#27402)
- **Description:** add/improve docstrings of OpenAIAssistantV2Runnable
- **Issue:** the issue #21983

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-10-30 15:35:51 +00:00
Luiz F. G. dos Santos
7a29ca6200
community: add new parameters to pass to OpenAIAssistantV2Runnable (#27372)
Thank you for contributing to LangChain!
 
**Description:** Added the model parameters to be passed in the OpenAI
Assistant. Enabled it at the `OpenAIAssistantV2Runnable` class.
 **Issue:** NA
  **Dependencies:** None
  **Twitter handle:** luizf0992
2024-10-30 10:51:03 -04:00
随风枫叶
18cfb4c067
community: Add token_usage and model_name metadata to ChatZhipuAI stream() and astream() response (#27677)
Thank you for contributing to LangChain!


- **Description:** Add token_usage and model_name metadata to
ChatZhipuAI stream() and astream() response
- **Issue:** None
- **Dependencies:** None
- **Twitter handle:** None


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

Co-authored-by: jianfehuang <jianfehuang@tencent.com>
2024-10-30 10:34:33 -04:00
tkubo-heroz
028e0253d8
community: Added anthropic.claude-3-5-sonnet-20241022-v2:0 cost detials (#27728)
Added anthropic.claude-3-5-sonnet-20241022-v2:0 cost detials
2024-10-30 14:01:01 +00:00
Changyong Um
dc171221b3
community[patch]: Fix vLLM integration to apply lora_request (#27731)
**Description:**
- Add the `lora_request` parameter to the VLLM class to support LoRA
model configurations. This enhancement allows users to specify LoRA
requests directly when using VLLM, enabling more flexible and efficient
model customization.

**Issue:**
- No existing issue for `lora_adapter` in VLLM. This PR addresses the
need for configuring LoRA requests within the VLLM framework.
- Reference : [Using LoRA Adapters in
vLLM](https://docs.vllm.ai/en/stable/models/lora.html#using-lora-adapters)


**Example Code :**
Before this change, the `lora_request` parameter was not applied
correctly:

```python
ADAPTER_PATH = "/path/of/lora_adapter"

llm = VLLM(model="Bllossom/llama-3.2-Korean-Bllossom-3B",
           max_new_tokens=512,
           top_k=2,
           top_p=0.90,
           temperature=0.1,
           vllm_kwargs={
               "gpu_memory_utilization":0.5, 
               "enable_lora":True, 
               "max_model_len":1024,
           }
)

print(llm.invoke(
    ["...prompt_content..."], 
    lora_request=LoRARequest("lora_adapter", 1, ADAPTER_PATH)
    ))
```
**Before Change Output:**
```bash
response was not applied lora_request
```
So, I attempted to apply the lora_adapter to
langchain_community.llms.vllm.VLLM.

**current output:**
```bash
response applied lora_request
```

**Dependencies:**
- None

**Lint and test:**
- All tests and lint checks have passed.

---------

Co-authored-by: Um Changyong <changyong.um@sfa.co.kr>
2024-10-30 13:59:34 +00:00
Qier LU
8d8e38b090
community[pathch]: Add missing custom content_key handling in Redis vector store (#27736)
This fix an error caused by missing custom content_key handling in Redis
vector store in function similarity_search_with_score.
2024-10-30 13:57:20 +00:00
William FH
5a2cfb49e0
Support message trimming on single messages (#27729)
Permit trimming message lists of length 1
2024-10-30 04:27:52 +00:00
Bagatur
5111063af2
langchain[patch]: Release 0.3.5 (#27727) 2024-10-29 17:06:23 -07:00
Bagatur
8f4423e042
text-splitters[patch]: Release 0.3.1 (#27726) 2024-10-30 00:04:48 +00:00
Harsimran-19
c1d8c33df6
core: JsonOutputParser UTF characters bug (#27306)
**Description:**
This PR fixes an issue where non-ASCII characters in Pydantic field
descriptions were being escaped to their Unicode representations when
using `JsonOutputParser`. The change allows non-ASCII characters to be
preserved in the output, which is especially important for multilingual
support and when working with non-English languages.

**Issue:** Fixes #27256

**Example Code:**
```python
from pydantic import BaseModel, Field
from langchain_core.output_parsers import JsonOutputParser

class Article(BaseModel):
    title: str = Field(description="科学文章的标题")

output_data_structure = Article
parser = JsonOutputParser(pydantic_object=output_data_structure)
print(parser.get_format_instructions())
```
**Previous Output**:
```... "title": {"description": "\\u79d1\\u5b66\\u6587\\u7ae0\\u7684\\u6807\\u9898", "title": "Title", "type": "string"}} ...```

**Current Output**:
```... "title": {"description": "科学文章的标题", "title": "Title", "type":
"string"}} ...```

**Changes made**:
- Modified `json.dumps()` call in
`langchain_core/output_parsers/json.py` to use `ensure_ascii=False`
- Added a unit test to verify Unicode handling

Co-authored-by: Harsimran-19 <harsimran1869@gmail.com>
2024-10-29 14:48:53 +00:00
Andrew Effendi
49517cc1e7
partners/huggingface[patch]: fix HuggingFacePipeline model_id parameter (#27514)
**Description:** Fixes issue with model parameter not getting
initialized correctly when passing transformers pipeline
**Issue:** https://github.com/langchain-ai/langchain/issues/25915
2024-10-29 14:34:46 +00:00
Jeong-Minju
0a465b8032
docs: Fix typo in _action_agent docs section (#27698)
PR Title: docs: Fix typo in _action_agent function docs section

Description: In line 1185, _action_agent function's docs, changing
**".agent"** to **"self.agent"**.

Issue: N/A

Dependencies: None

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2024-10-29 14:16:42 +00:00
Neil Vachharajani
eec35672a4
core[patch]: Improve type checking for the tool decorator (#27460)
**Description:**

When annotating a function with the @tool decorator, the symbol should
have type BaseTool. The previous type annotations did not convey that to
type checkers. This patch creates 4 overloads for the tool function for
the 4 different use cases.

1. @tool decorator with no arguments
2. @tool decorator with only keyword arguments
3. @tool decorator with a name argument (and possibly keyword arguments)
4. Invoking tool as function with a name and runnable positional
arguments

The main function is updated to match the overloads. The changes are
100% backwards compatible (all existing calls should continue to work,
just with better type annotations).

**Twitter handle:** @nvachhar

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-10-29 13:59:56 +00:00
Erick Friis
583808a7b8
partners/huggingface: release 0.1.1 (#27691) 2024-10-28 13:39:38 -07:00
Erick Friis
6d524e9566
partners/box: release 0.2.2 (#27690) 2024-10-28 12:54:20 -07:00
yahya-mouman
6803cb4f34
openai[patch]: add check for none values when summing token usage (#27585)
**Description:** Fixes None addition issues when an empty value is
passed on

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-10-28 12:49:43 -07:00
Bagatur
ede953d617
openai[patch]: fix schema formatting util (#27685) 2024-10-28 15:46:47 +00:00
Baptiste Pasquier
440c162b8b
community: Fix closed session in Infinity (#26933)
**Description:** 

The `aiohttp.ClientSession` is closed at the end of the with statement,
which causes an error during a second call.

The implemented fix is to define the session directly within the with
block, exactly like in the textembed code:


c6350d636e/libs/community/langchain_community/embeddings/textembed.py (L335-L346)
 
**Issue:** Fix #26932

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-10-27 11:37:21 -04:00
Jorge Piedrahita Ortiz
8895d468cb
community: sambastudio llm refactor (#27215)
**Description:** 
    - Sambastudio LLM refactor 
    - Sambastudio openai compatible API support added
    - docs updated
2024-10-27 11:08:15 -04:00
ccurme
fe87e411f2
groq: fix unit test (#27660) 2024-10-26 14:57:23 -04:00
Erick Friis
fbfc6bdade
core: test runner improvements (#27654)
when running core tests locally this
- prevents langsmith tracing from being enabled by env vars
- prevents network calls
2024-10-25 15:06:59 -07:00
Vincent Min
7bc4e320f1
core[patch]: improve performance of InMemoryVectorStore (#27538)
**Description:** We improve the performance of the InMemoryVectorStore.
**Isue:** Originally, similarity was computed document by document:
```
for doc in self.store.values():
            vector = doc["vector"]
            similarity = float(cosine_similarity([embedding], [vector]).item(0))
```
This is inefficient and does not make use of numpy vectorization.
This PR computes the similarity in one vectorized go:
```
docs = list(self.store.values())
similarity = cosine_similarity([embedding], [doc["vector"] for doc in docs])
```
**Dependencies:** None
**Twitter handle:** @b12_consulting, @Vincent_Min

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-10-25 17:07:04 -04:00
Bagatur
d5306899d3
openai[patch]: Release 0.2.4 (#27652) 2024-10-25 20:26:21 +00:00
Erick Friis
600b7bdd61
all: test 3.13 ci (#27197)
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-10-25 12:56:58 -07:00
Bagatur
06df15c9c0
core[patch]: Release 0.3.13 (#27651) 2024-10-25 19:22:44 +00:00
Steve Moss
24605bcdb6
community[patch]: Fix missing protected_namespaces(). (#27610)
- [x] **PR message**:
- **Description:** Fixes warning messages raised due to missing
`protected_namespaces` parameter in `ConfigDict`.
    - **Issue:** https://github.com/langchain-ai/langchain/issues/27609
    - **Dependencies:** No dependencies
    - **Twitter handle:** @gawbul
2024-10-25 02:16:26 +00:00
Eugene Yurtsev
7667ee126f
core: remove mustache in extended deps (#27629)
Remove mustache from extended deps -- we vendor the mustache
implementation
2024-10-24 22:12:49 -04:00
Erick Friis
265e0a164a
core: add flake8-bandit (S) ruff rules to core (#27368)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-10-24 22:33:41 +00:00
Nithish Raghunandanan
0623c74560
couchbase: Add document id to vector search results (#27622)
**Description:** Returns the document id along with the Vector Search
results

**Issue:** Fixes https://github.com/langchain-ai/langchain/issues/26860
for CouchbaseVectorStore


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified.

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-24 21:47:36 +00:00
ZhangShenao
455ab7d714
Improvement[Community] Improve Document Loaders and Splitters (#27568)
- Fix word spelling error
- Add static method decorator
- Fix language splitter

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-24 21:42:16 +00:00
CLOVA Studio 개발
846a75284f
community: Add Naver chat model & embeddings (#25162)
Reopened as a personal repo outside the organization.

## Description
- Naver HyperCLOVA X community package 
  - Add chat model & embeddings
  - Add unit test & integration test
  - Add chat model & embeddings docs
- I changed partner
package(https://github.com/langchain-ai/langchain/pull/24252) to
community package on this PR
- Could this
embeddings(https://github.com/langchain-ai/langchain/pull/21890) be
deprecated? We are trying to replace it with embedding
model(**ClovaXEmbeddings**) in this PR.

Twitter handle: None. (if needed, contact with
joonha.jeon@navercorp.com)

---
you can check our previous discussion below:

> one question on namespaces - would it make sense to have these in
.clova namespaces instead of .naver?

I would like to keep it as is, unless it is essential to unify the
package name.
(ClovaX is a branding for the model, and I plan to add other models and
components. They need to be managed as separate classes.)

> also, could you clarify the difference between ClovaEmbeddings and
ClovaXEmbeddings?

There are 3 models that are being serviced by embedding, and all are
supported in the current PR. In addition, all the functionality of CLOVA
Studio that serves actual models, such as distinguishing between test
apps and service apps, is supported. The existing PR does not support
this content because it is hard-coded.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Vadym Barda <vadym@langchain.dev>
2024-10-24 20:54:13 +00:00
Hyejun An
6227396e20
partners/HuggingFacePipeline[stream]: Change to use pipeline instead of pipeline.model.generate in stream() (#26531)
## Description

I encountered an error while using the` gemma-2-2b-it model` with the
`HuggingFacePipeline` class and have implemented a fix to resolve this
issue.

### What is Problem

```python
model_id="google/gemma-2-2b-it"


gemma_2_model = AutoModelForCausalLM.from_pretrained(model_id)
gemma_2_tokenizer = AutoTokenizer.from_pretrained(model_id)

gen = pipeline( 
    task='text-generation',
    model=gemma_2_model,
    tokenizer=gemma_2_tokenizer,
    max_new_tokens=1024,
    device=0 if torch.cuda.is_available() else -1,
    temperature=.5,
    top_p=0.7,
    repetition_penalty=1.1,
    do_sample=True,
    )

llm = HuggingFacePipeline(pipeline=gen)

for chunk in llm.stream("Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World."):
    print(chunk, end="", flush=True)
```

This code outputs the following error message:

```
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1258: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
Exception in thread Thread-19 (generate):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1874, in generate
    self._validate_generated_length(generation_config, input_ids_length, has_default_max_length)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1266, in _validate_generated_length
    raise ValueError(
ValueError: Input length of input_ids is 31, but `max_length` is set to 20. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.
```

In addition, the following error occurs when the number of tokens is
reduced.

```python
for chunk in llm.stream("Hello World"):
    print(chunk, end="", flush=True)
```

```
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1258: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1885: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.
  warnings.warn(
Exception in thread Thread-20 (generate):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2024, in generate
    result = self._sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2982, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/gemma2/modeling_gemma2.py", line 994, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/gemma2/modeling_gemma2.py", line 803, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/sparse.py", line 164, in forward
    return F.embedding(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 2267, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
```

On the other hand, in the case of invoke, the output is normal:

```
llm.invoke("Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World.")
```
```
'Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World.\n\nThis is a simple program that prints the phrase "Hello World" to the console. \n\n**Here\'s how it works:**\n\n* **`print("Hello World")`**: This line of code uses the `print()` function, which is a built-in function in most programming languages (like Python). The `print()` function takes whatever you put inside its parentheses and displays it on the screen.\n* **`"Hello World"`**:  The text within the double quotes (`"`) is called a string. It represents the message we want to print.\n\n\nLet me know if you\'d like to explore other programming concepts or see more examples! \n'
```

### Problem Analysis

- Apparently, I put kwargs in while generating pipelines and it applied
to `invoke()`, but it's not applied in the `stream()`.
- When using the stream, `inputs = self.pipeline.tokenizer (prompt,
return_tensors = "pt")` enters cpu.
  - This can crash when the model is in gpu.

### Solution

Just use `self.pipeline` instead of `self.pipeline.model.generate`.

- **Original Code**

```python
stopping_criteria = StoppingCriteriaList([StopOnTokens()])

inputs = self.pipeline.tokenizer(prompt, return_tensors="pt")
streamer = TextIteratorStreamer(
    self.pipeline.tokenizer,
    timeout=60.0,
    skip_prompt=skip_prompt,
    skip_special_tokens=True,
)
generation_kwargs = dict(
    inputs,
    streamer=streamer,
    stopping_criteria=stopping_criteria,
    **pipeline_kwargs,
)
t1 = Thread(target=self.pipeline.model.generate, kwargs=generation_kwargs)
t1.start()
```

- **Updated Code**

```python
stopping_criteria = StoppingCriteriaList([StopOnTokens()])

streamer = TextIteratorStreamer(
    self.pipeline.tokenizer,
    timeout=60.0,
    skip_prompt=skip_prompt,
    skip_special_tokens=True,
)
generation_kwargs = dict(
    text_inputs= prompt,
    streamer=streamer,
    stopping_criteria=stopping_criteria,
    **pipeline_kwargs,
)
t1 = Thread(target=self.pipeline, kwargs=generation_kwargs)
t1.start()
```

By using the `pipeline` directly, the `kwargs` of the pipeline are
applied, and there is no need to consider the `device` of the `tensor`
made with the `tokenizer`.

> According to the change to use `pipeline`, it was modified to put
`text_inputs=prompts` directly into `generation_kwargs`.

## Issue

None

## Dependencies

None

## Twitter handle

None

---------

Co-authored-by: Vadym Barda <vadym@langchain.dev>
2024-10-24 16:49:43 -04:00
Bagatur
655ced84d7
openai[patch]: accept json schema response format directly (#27623)
fix #25460

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-24 18:19:15 +00:00
Tibor Reiss
20b56a0233
core[patch]: fix repr and str for Serializable (#26786)
Fixes #26499

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-10-24 08:36:35 -07:00
Lei Zhang
f203229b51
community: Fix the failure of ChatSparkLLM after upgrading to Pydantic V2 (#27418)
**Description:**

The test_sparkllm.py can reproduce this issue.


https://github.com/langchain-ai/langchain/blob/master/libs/community/tests/integration_tests/chat_models/test_sparkllm.py#L66

```
Testing started at 18:27 ...
Launching pytest with arguments test_sparkllm.py::test_chat_spark_llm --no-header --no-summary -q in /Users/zhanglei/Work/github/langchain/libs/community/tests/integration_tests/chat_models

============================= test session starts ==============================
collecting ... collected 1 item

test_sparkllm.py::test_chat_spark_llm 

============================== 1 failed in 0.45s ===============================
FAILED                             [100%]
tests/integration_tests/chat_models/test_sparkllm.py:65 (test_chat_spark_llm)
def test_chat_spark_llm() -> None:
>       chat = ChatSparkLLM(
            spark_app_id="your spark_app_id",
            spark_api_key="your spark_api_key",
            spark_api_secret="your spark_api_secret",
        )  # type: ignore[call-arg]

test_sparkllm.py:67: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../../core/langchain_core/load/serializable.py:111: in __init__
    super().__init__(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'langchain_community.chat_models.sparkllm.ChatSparkLLM'>
values = {'spark_api_key': 'your spark_api_key', 'spark_api_secret': 'your spark_api_secret', 'spark_api_url': 'wss://spark-api.xf-yun.com/v3.5/chat', 'spark_app_id': 'your spark_app_id', ...}

    @model_validator(mode="before")
    @classmethod
    def validate_environment(cls, values: Dict) -> Any:
        values["spark_app_id"] = get_from_dict_or_env(
            values,
            ["spark_app_id", "app_id"],
            "IFLYTEK_SPARK_APP_ID",
        )
        values["spark_api_key"] = get_from_dict_or_env(
            values,
            ["spark_api_key", "api_key"],
            "IFLYTEK_SPARK_API_KEY",
        )
        values["spark_api_secret"] = get_from_dict_or_env(
            values,
            ["spark_api_secret", "api_secret"],
            "IFLYTEK_SPARK_API_SECRET",
        )
        values["spark_api_url"] = get_from_dict_or_env(
            values,
            "spark_api_url",
            "IFLYTEK_SPARK_API_URL",
            SPARK_API_URL,
        )
        values["spark_llm_domain"] = get_from_dict_or_env(
            values,
            "spark_llm_domain",
            "IFLYTEK_SPARK_LLM_DOMAIN",
            SPARK_LLM_DOMAIN,
        )
    
        # put extra params into model_kwargs
        default_values = {
            name: field.default
            for name, field in get_fields(cls).items()
            if field.default is not None
        }
>       values["model_kwargs"]["temperature"] = default_values.get("temperature")
E       KeyError: 'model_kwargs'

../../../langchain_community/chat_models/sparkllm.py:368: KeyError
``` 

I found that when upgrading to Pydantic v2, @root_validator was changed
to @model_validator. When a class declares multiple
@model_validator(model=before), the execution order in V1 and V2 is
opposite. This is the reason for ChatSparkLLM's failure.

The correct execution order is to execute build_extra first.


https://github.com/langchain-ai/langchain/blob/langchain%3D%3D0.2.16/libs/community/langchain_community/chat_models/sparkllm.py#L302

And then execute validate_environment.


https://github.com/langchain-ai/langchain/blob/langchain%3D%3D0.2.16/libs/community/langchain_community/chat_models/sparkllm.py#L329

The Pydantic community also discusses it, but there hasn't been a
conclusion yet. https://github.com/pydantic/pydantic/discussions/7434

**Issus:** #27416 

**Twitter handle:** coolbeevip

---------

Co-authored-by: vbarda <vadym@langchain.dev>
2024-10-23 21:17:10 -04:00
Andrew Effendi
8f151223ad
Community: Fix DuckDuckGo search tool Output Format (#27479)
**Issue:** : https://github.com/langchain-ai/langchain/issues/22961
   **Description:** 

Previously, the documentation for `DuckDuckGoSearchResults` said that it
returns a JSON string, however the code returns a regular string that
can't be parsed as is.
for example running

```python
from langchain_community.tools import DuckDuckGoSearchResults

# Create a DuckDuckGo search instance
search = DuckDuckGoSearchResults()

# Invoke the search
result = search.invoke("Obama")

# Print the result
print(result)
# Print the type of the result
print("Result Type:", type(result))
```
will return
```
snippet: Harris will hold a campaign event with former President Barack Obama in Georgia next Thursday, the first time the pair has campaigned side by side, a senior campaign official said. A week from ..., title: Obamas to hit the campaign trail in first joint appearances with Harris, link: https://www.nbcnews.com/politics/2024-election/obamas-hit-campaign-trail-first-joint-appearances-harris-rcna176034, snippet: Item 1 of 3 Former U.S. first lady Michelle Obama and her husband, former U.S. President Barack Obama, stand on stage during Day 2 of the Democratic National Convention (DNC) in Chicago, Illinois ..., title: Obamas set to hit campaign trail with Kamala Harris for first time, link: https://www.reuters.com/world/us/obamas-set-hit-campaign-trail-with-kamala-harris-first-time-2024-10-18/, snippet: Barack and Michelle Obama will make their first campaign appearances alongside Kamala Harris at rallies in Georgia and Michigan. By Reid J. Epstein Reporting from Ashwaubenon, Wis. Here come the ..., title: Harris Will Join Michelle Obama and Barack Obama on Campaign Trail, link: https://www.nytimes.com/2024/10/18/us/politics/kamala-harris-michelle-obama-barack-obama.html, snippet: Obama's leaving office was "a turning point," Mirsky said. "That was the last time anybody felt normal." A few feet over, a 64-year-old physics professor named Eric Swanson who had grown ..., title: Obama's reemergence on the campaign trail for Harris comes as he ..., link: https://www.cnn.com/2024/10/13/politics/obama-campaign-trail-harris-biden/index.html
Result Type: <class 'str'>
```

After the change in this PR, `DuckDuckGoSearchResults` takes an
additional `output_format = "list" | "json" | "string"` ("string" =
current behavior, default). For example, invoking
`DuckDuckGoSearchResults(output_format="list")` return a list of
dictionaries in the format
```
[{'snippet': '...', 'title': '...', 'link': '...'}, ...]
```
e.g.

```
[{'snippet': "Obama has in a sense been wrestling with Trump's impact since the real estate magnate broke onto the political stage in 2015. Trump's victory the next year, defeating Obama's secretary of ...", 'title': "Obama's fears about Trump drive his stepped-up campaigning", 'link': 'https://www.washingtonpost.com/politics/2024/10/18/obama-trump-anxiety-harris-campaign/'}, {'snippet': 'Harris will hold a campaign event with former President Barack Obama in Georgia next Thursday, the first time the pair has campaigned side by side, a senior campaign official said. A week from ...', 'title': 'Obamas to hit the campaign trail in first joint appearances with Harris', 'link': 'https://www.nbcnews.com/politics/2024-election/obamas-hit-campaign-trail-first-joint-appearances-harris-rcna176034'}, {'snippet': 'Item 1 of 3 Former U.S. first lady Michelle Obama and her husband, former U.S. President Barack Obama, stand on stage during Day 2 of the Democratic National Convention (DNC) in Chicago, Illinois ...', 'title': 'Obamas set to hit campaign trail with Kamala Harris for first time', 'link': 'https://www.reuters.com/world/us/obamas-set-hit-campaign-trail-with-kamala-harris-first-time-2024-10-18/'}, {'snippet': 'Barack and Michelle Obama will make their first campaign appearances alongside Kamala Harris at rallies in Georgia and Michigan. By Reid J. Epstein Reporting from Ashwaubenon, Wis. Here come the ...', 'title': 'Harris Will Join Michelle Obama and Barack Obama on Campaign Trail', 'link': 'https://www.nytimes.com/2024/10/18/us/politics/kamala-harris-michelle-obama-barack-obama.html'}]
Result Type: <class 'list'>
```

---------

Co-authored-by: vbarda <vadym@langchain.dev>
2024-10-23 20:18:11 -04:00
Bagatur
968dccee04
core[patch]: convert_to_openai_tool Anthropic support (#27591) 2024-10-23 12:27:06 -07:00
Bagatur
217de4e6a6
langchain[patch]: de-beta init_chat_model (#27558) 2024-10-23 08:35:15 -07:00
Kwan Kin Chan
6d2a76ac05
langchain_huggingface: Fix multiple GPU usage bug in from_model_id function (#23628)
- [ ]  **Description:**   
   - pass the device_map into model_kwargs 
- removing the unused device_map variable in the hf_pipeline function
call
- [ ] **Issue:** issue #13128 
When using the from_model_id function to load a Hugging Face model for
text generation across multiple GPUs, the model defaults to loading on
the CPU despite multiple GPUs being available using the expected format
``` python
llm = HuggingFacePipeline.from_model_id(
    model_id="model-id",
    task="text-generation",
    device_map="auto",
)
```
Currently, to enable multiple GPU , we have to pass in variable in this
format instead
``` python
llm = HuggingFacePipeline.from_model_id(
    model_id="model-id",
    task="text-generation",
    device=None,
    model_kwargs={
        "device_map": "auto",
    }
)
```
This issue arises due to improper handling of the device and device_map
parameters.

- [ ] **Explanation:**
1. In from_model_id, the model is created using model_kwargs and passed
as the model variable of the pipeline function. So at this moment, to
load the model with multiple GPUs, "device_map" needs to be set to
"auto" within model_kwargs. Otherwise, the model defaults to loading on
the CPU.
2. The device_map variable in from_model_id is not utilized correctly.
In the pipeline function's source code of tnansformer:
- The device_map variable is stored in the model_kwargs dictionary
(lines 867-878 of transformers/src/transformers/pipelines/\__init__.py).
```python
    if device_map is not None:
        ......
        model_kwargs["device_map"] = device_map
```
- The model is constructed with model_kwargs containing the device_map
value ONLY IF it is a string (lines 893-903 of
transformers/src/transformers/pipelines/\__init__.py).
```python
    if isinstance(model, str) or framework is None:
        model_classes = {"tf": targeted_task["tf"], "pt": targeted_task["pt"]}
        framework, model = infer_framework_load_model( ... , **model_kwargs, )
```
- Consequently, since a model object is already passed to the pipeline
function, the device_map variable from from_model_id is never used.

3. The device_map variable in from_model_id not only appears unused but
also causes errors. Without explicitly setting device=None, attempting
to load the model on multiple GPUs may result in the following error:
 ```
Device has 2 GPUs available. Provide device={deviceId} to
`from_model_id` to use available GPUs for execution. deviceId is -1
(default) for CPU and can be a positive integer associated with CUDA
device id.
  Traceback (most recent call last):
    File "foo.py", line 15, in <module>
      llm = HuggingFacePipeline.from_model_id(
File
"foo\site-packages\langchain_huggingface\llms\huggingface_pipeline.py",
line 217, in from_model_id
      pipeline = hf_pipeline(
File "foo\lib\site-packages\transformers\pipelines\__init__.py", line
1108, in pipeline
return pipeline_class(model=model, framework=framework, task=task,
**kwargs)
File "foo\lib\site-packages\transformers\pipelines\text_generation.py",
line 96, in __init__
      super().__init__(*args, **kwargs)
File "foo\lib\site-packages\transformers\pipelines\base.py", line 835,
in __init__
      raise ValueError(
ValueError: The model has been loaded with `accelerate` and therefore
cannot be moved to a specific device. Please discard the `device`
argument when creating your pipeline object.
```
This error occurs because, in from_model_id, the default values in from_model_id for device and device_map are -1 and None, respectively. It would passes the statement (`device_map is not None and device < 0`) and keep the device as -1 so the pipeline function later raises an error when trying to move a GPU-loaded model back to the CPU. 
19eb82e68b/libs/community/langchain_community/llms/huggingface_pipeline.py (L204-L213)




If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: vbarda <vadym@langchain.dev>
2024-10-22 21:41:47 -04:00
Fernando de Oliveira
ab205e7389
partners/openai + community: Async Azure AD token provider support for Azure OpenAI (#27488)
This PR introduces a new `azure_ad_async_token_provider` attribute to
the `AzureOpenAI` and `AzureChatOpenAI` classes in `partners/openai` and
`community` packages, given it's currently supported on `openai` package
as
[AsyncAzureADTokenProvider](https://github.com/openai/openai-python/blob/main/src/openai/lib/azure.py#L33)
type.

The reason for creating a new attribute is to avoid breaking changes.
Let's say you have an existing code that uses a `AzureOpenAI` or
`AzureChatOpenAI` instance to perform both sync and async operations.
The `azure_ad_token_provider` will work exactly as it is today, while
`azure_ad_async_token_provider` will override it for async requests.


If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-10-22 21:43:06 +00:00
orkhank
9a277cbe00
community: Update file_path type in JSONLoader.__init__() signature (#27535)
- **Description:** Change the type of the `file_path` argument from `str
| pathlib.Path` to `str | os.PathLike`, since the latter is more widely
used: https://stackoverflow.com/a/58541858
  
This is a very minor fix. I was just annoyed to see the red underline
displayed by Pylance in VS Code: `reportArgumentType`.

![image](https://github.com/user-attachments/assets/719a7f8e-acca-4dfa-89df-925e1d938c71)
  
  The changes do not affect the behavior of the code.
2024-10-22 11:18:36 -07:00
Eric Pinzur
f636c83321
community: Cassandra Vector Store: modernize implementation (#27253)
**Description:** 

This PR updates `CassandraGraphVectorStore` to be based off
`CassandraVectorStore`, instead of using a custom CQL implementation.
This allows users using a `CassandraVectorStore` to upgrade to a
`GraphVectorStore` without having to change their database schema or
re-embed documents.

This PR also updates the documentation of the `GraphVectorStore` base
class and contains native async implementations for the standard graph
methods: `traversal_search` and `mmr_traversal_search` in
`CassandraVectorStore`.

**Issue:** No issue number.

**Dependencies:** https://github.com/langchain-ai/langchain/pull/27078
(already-merged)

**Lint and test**: 
- Lint and tests all pass, including existing
`CassandraGraphVectorStore` tests.
- Also added numerous additional tests based of the tests in
`langchain-astradb` which cover many more scenarios than the existing
tests for `Cassandra` and `CassandraGraphVectorStore`

** BREAKING CHANGE**

Note that this is a breaking change for existing users of
`CassandraGraphVectorStore`. They will need to wipe their database table
and restart.

However:
- The interfaces have not changed. Just the underlying storage
mechanism.
- Any one using `langchain_community.vectorstores.Cassandra` can instead
use `langchain_community.graph_vectorstores.CassandraGraphVectorStore`
and they will gain Graph capabilities without having to re-embed their
existing documents. This is the primary goal of this PR.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-22 18:11:11 +00:00
Vadym Barda
0640cbf2f1
huggingface[patch]: hide client field in HuggingFaceEmbeddings (#27522) 2024-10-21 17:37:07 -04:00
Chun Kang Lu
380449a7a9
core: fix Image prompt template hardcoded template format (#27495)
Fixes #27411 

**Description:** Adds `template_format` to the `ImagePromptTemplate`
class and updates passing in the `template_format` parameter from
ChatPromptTemplate instead of the hardcoded "f-string".
Also updated docs and typing related to `template_format` to be more
up-to-date and specific.

**Dependencies:** None

**Add tests and docs**: Added unit tests to validate fix. Needed to
update `test_chat` snapshot due to adding new attribute
`template_format` in `ImagePromptTemplate`.

---------

Co-authored-by: Vadym Barda <vadym@langchain.dev>
2024-10-21 17:31:40 -04:00
bbaltagi-dtsl
403c0ea801
community: fix DallE hidden open_api_key (#26996)
Thank you for contributing to LangChain!

- [ X] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
  - Example: "community: add foobar LLM"


- [ X] 
    - **Issue:** issue #26941


Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-21 19:46:56 +00:00
nodfans
cfcf783cb5
community: fix a typo in planner_prompt.py (#27489)
Description: Fix typo in planner_prompt.py.
2024-10-21 14:59:33 +00:00
Erick Friis
97a819d578
community: fix lint from new mypy (#27474) 2024-10-18 20:08:03 +00:00
Erick Friis
c397baa85f
community: release 0.3.3 (#27472) 2024-10-18 12:52:15 -07:00
Erick Friis
4ceb28009a
mongodb: migrate to repo (#27467) 2024-10-18 12:35:12 -07:00
Erick Friis
a562c54f7d
azure-dynamic-sessions: migrate to repo (#27468) 2024-10-18 12:30:48 -07:00
Erick Friis
30660786b3
langchain: release 0.3.4 (#27458) 2024-10-18 11:59:54 -07:00
Erick Friis
2cf2cefe39
partners/openai: release 0.2.3 (#27457) 2024-10-18 08:16:01 -07:00
Erick Friis
7d65a32ee0
openai: audio modality, remove sockets from unit tests (#27436) 2024-10-18 08:02:09 -07:00
Erick Friis
f9cc9bdcf3
core: release 0.3.12 (#27410) 2024-10-17 06:32:40 -07:00
Erick Friis
0ebddabf7d
docs, core: error messaging [wip] (#27397) 2024-10-17 03:39:36 +00:00
Eugene Yurtsev
202d7f6c4a
core[patch]: 0.3.11 release (#27403)
Core bump to 0.3.11
2024-10-16 15:39:37 -04:00
Bagatur
a4392b070d core[patch]: add convert_to_openai_messages util (#27263)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-16 17:10:10 +00:00
sByteman
31e7664afd
community[minor]: add proxy support to RecursiveUrlLoader (#27364)
**Description**
This PR introduces the proxies parameter to the RecursiveUrlLoader
class, allowing the user to specify proxy servers for requests. This
update enables crawling through proxy servers, providing enhanced
flexibility for network configurations.
The key changes include:
  1.Added an optional proxies parameter to the constructor (__init__).
2.Updated the documentation to explain the proxies parameter usage with
an example.
3.Modified the _get_child_links_recursive method to pass the proxies
parameter to the requests.get function.



**Sample Usage**

```python
from bs4 import BeautifulSoup as Soup
from langchain_community.document_loaders.recursive_url_loader import RecursiveUrlLoader

proxies = {
    "http": "http://localhost:1080",
    "https": "http://localhost:1080",
}
url = "https://python.langchain.com/docs/concepts/#langchain-expression-language-lcel"
loader = RecursiveUrlLoader(
    url=url, max_depth=1, extractor=lambda x: Soup(x, "html.parser").text,proxies=proxies
)
docs = loader.load()
```

---------

Co-authored-by: root <root@thb>
2024-10-16 16:29:59 +00:00
Yuki Watanabe
b8bfebd382
community: Add deprecation notice for Databricks integration in langchain-community (#27355)
We have released the
[langchain-databricks](https://github.com/langchain-ai/langchain-databricks)
package for Databricks integration. This PR deprecates the legacy
classes within `langchain-community`.

---------

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-16 02:20:40 +00:00
xsai9101
15c1ddaf99
community: Add support for clob datatype in oracle database (#27330)
**Description**:
This PR add support of clob/blob data type for oracle document loader,
clob/blob can only be read by oracledb package when connection is open,
so reformat code to process data before connection closes.

**Dependencies**:
oracledb package same as before. pip install oracledb

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-16 02:19:20 +00:00
Enes Bol
3f74dfc3d8
community[patch]: Fix vLLM integration to filter SamplingParams (#27367)
**Description:**
- This pull request addresses a bug in Langchain's VLLM integration,
where the use_beam_search parameter was erroneously passed to
SamplingParams. The SamplingParams class in vLLM does not support the
use_beam_search argument, which caused a TypeError.

- This PR introduces logic to filter out unsupported parameters,
ensuring that only valid parameters are passed to SamplingParams. As a
result, the integration now functions as expected without errors.

- The bug was reproduced by running the code sample from Langchain’s
documentation, which triggered the error due to the invalid parameter.
This fix resolves that error by implementing proper parameter filtering.

**VLLM Sampling Params Class:**
https://github.com/vllm-project/vllm/blob/main/vllm/sampling_params.py

**Issue:**
I could not found an Issue that belongs to this. Fixes "TypeError:
Unexpected keyword argument 'use_beam_search'" error when using VLLM
from Langchain.

**Dependencies:**
None.

**Tests and Documentation**:
Tests:
No new functionality was added, but I tested the changes by running
multiple prompts through the VLLM integration with various parameter
configurations. All tests passed successfully without breaking
compatibility.

Docs
No documentation changes were necessary as this is a bug fix.

**Reproducing the Error:**

https://python.langchain.com/docs/integrations/llms/vllm/

The code sample from the original documentation can be used to reproduce
the error I got.

from langchain_community.llms import VLLM
llm = VLLM(
    model="mosaicml/mpt-7b",
    trust_remote_code=True,  # mandatory for hf models
    max_new_tokens=128,
    top_k=10,
    top_p=0.95,
    temperature=0.8,
)
print(llm.invoke("What is the capital of France ?"))

![image](https://github.com/user-attachments/assets/3782d6ac-1f7b-4acc-bf2c-186216149de5)


This PR resolves the issue by ensuring that only valid parameters are
passed to SamplingParams.
2024-10-15 21:57:50 +00:00
Erick Friis
edf6d0a0fb
partners/couchbase: release 0.2.0 (attempt 2) (#27375) 2024-10-15 14:51:05 -07:00
Jorge Piedrahita Ortiz
12fea5b868
community: sambastudio chat model integration minor fix (#27238)
**Description:** sambastudio chat model integration minor fix
 fix default params
 fix usage metadata when streaming
2024-10-15 13:24:36 -04:00
ZhangShenao
f3925d71b9
community: Fix word spelling in Text2vecEmbeddings (#27183)
Fix word spelling in `Text2vecEmbeddings`
2024-10-15 09:28:48 -07:00
Erick Friis
92ae61bcc8
multiple: rely on asyncio_mode auto in tests (#27200) 2024-10-15 16:26:38 +00:00
William FH
0a3e089827
[Anthropic] Shallow Copy (#27105)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-15 15:50:48 +00:00
Matthew Peveler
c6533616b6
docs: fix community pgvector deprecation warning formatting (#27094)
**Description**: PR fixes some formatting errors in deprecation message
in the `langchain_community.vectorstores.pgvector` module, where it was
missing spaces between a few words, and one word was misspelled.
**Issue**: n/a
**Dependencies**: n/a

Signed-off-by: mpeveler@timescale.com
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-15 15:45:53 +00:00
Erick Friis
3fa5ce3e5f
community: clear mypy syntax warning in openapi (#27370)
not completely clear the regex is functional
2024-10-15 15:43:53 +00:00
Ahmet Yasin Aytar
443b37403d
community: refactor Arxiv search logic (#27084)
PR message:

Description:
This PR refactors the Arxiv API wrapper by extracting the Arxiv search
logic into a helper function (_fetch_results) to reduce code duplication
and improve maintainability. The helper function is used in methods like
get_summaries_as_docs, run, and lazy_load, streamlining the code and
making it easier to maintain in the future.

Issue:
This is a minor refactor, so no specific issue is being fixed.

Dependencies:
No new dependencies are introduced with this change.

Add tests and docs:
No new integrations were added, so no additional tests or docs are
necessary for this PR.
Lint and test:
I have run make format, make lint, and make test to ensure all checks
pass successfully.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-15 08:43:03 -07:00
Qiu Qin
57fbc6bdf1
community: Update OCI data science integration (#27083)
This PR updates the integration with OCI data science model deployment
service.

- Update LLM to support streaming and async calls.
- Added chat model.
- Updated tests and docs.
- Updated `libs/community/scripts/check_pydantic.sh` since the use of
`@pre_init` is removed from existing integration.
- Updated `libs/community/extended_testing_deps.txt` as this integration
requires `langchain_openai`.

---------

Co-authored-by: MING KANG <ming.kang@oracle.com>
Co-authored-by: Dmitrii Cherkasov <dmitrii.cherkasov@oracle.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-15 08:32:54 -07:00
Rafael Miller
fc14f675f1
Community: Updated Firecrawl Document Loader to v1 (#26548)
This PR updates the Firecrawl Document Loader to use the recently
released V1 API of Firecrawl.

**Key Updates:**

**Firecrawl V1 Integration:** Updated the document loader to leverage
the new Firecrawl V1 API for improved performance, reliability, and
developer experience.

**Map Functionality Added:** Introduced the map mode for more flexible
document loading options.

These updates enhance the integration and provide access to the latest
features of Firecrawl.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-10-15 13:13:28 +00:00
Max Tran
8fea07f92e
community: fixed KeyError: 'client' (#27345)
Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
  - Example: "community: add foobar LLM"
Updated

- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
twitter: @MaxHTran

- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
Not needed due to small change

- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Max Tran <maxtra@amazon.com>
2024-10-14 20:51:13 +00:00
Martin Triska
8dc4bec947
[community] [Bugfix] base_o365 document loader metadata needs to be JSON serializable (#26322)
In order for indexer to work, all metadata in the documents need to be
JSON serializable. Timestamps are not.

See here:

https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/indexing/api.py#L83-L89

@eyurtsev could you please review? It's a tiny PR :-)
2024-10-14 12:48:31 -04:00
Trayan Azarov
59bbda9ba3
chroma: Deprecating versions 0.5.7 thru 0.5.12 (#27305)
**Description:** Deprecated version of Chroma >=0.5.5 <0.5.12 due to a
serious correctness issue that caused some embeddings for deployments
with multiple collections to be lost (read more on the issue in Chroma
repo)
**Issue:** chroma-core/chroma#2922 (fixed by chroma-core/chroma##2923
and released in
[0.5.13](https://github.com/chroma-core/chroma/releases/tag/0.5.13))
**Dependencies:** N/A
**Twitter handle:** `@t_azarov`
2024-10-14 11:56:05 -04:00
Marcelo Nunes Alves
5647276998
community: Problem with embeddings in new versions of clickhouse. (#26041)
Starting with Clickhouse version 24.8, a different type of configuration
has been introduced in the vectorized data ingestion, and if this
configuration occurs, an error occurs when generating the table. As can
be seen below:

![Screenshot from 2024-09-04
11-48-00](https://github.com/user-attachments/assets/70840a93-1001-490c-921a-26924c51d9eb)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-11 18:54:50 +00:00
Eugene Yurtsev
5b9b8fe80f
core[patch]: Ignore ASYNC110 to upgrade to newest ruff version (#27229)
Ignoring ASYNC110 with explanation
2024-10-09 11:25:58 -04:00
Vittorio Rigamonti
7da2efd9d3
community[minor]: VectorStore Infinispan. Adding TLS and authentication (#23522)
**Description**:
this PR enable VectorStore TLS and authentication (digest, basic) with
HTTP/2 for Infinispan server.
Based on httpx.

Added docker-compose facilities for testing
Added documentation

**Dependencies:**
requires `pip install httpx[http2]` if HTTP2 is needed

**Twitter handle:**
https://twitter.com/infinispan
2024-10-09 10:51:39 -04:00
Diao Zihao
4553573acb
core[patch],langchain[patch],community[patch]: Bump version dependency of tenacity to >=8.1.0,!=8.4.0,<10 (#27201)
This should fixes the compatibility issue with graprag as in

- https://github.com/langchain-ai/langchain/discussions/25595

Here are the release notes for tenacity 9
(https://github.com/jd/tenacity/releases/tag/9.0.0)

---------

Signed-off-by: Zihao Diao <hi@ericdiao.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-10-09 14:00:45 +00:00
Stefano Lottini
d05fdd97dd
community: Cassandra Vector Store: extend metadata-related methods (#27078)
**Description:** this PR adds a set of methods to deal with metadata
associated to the vector store entries. These, while essential to the
Graph-related extension of the `Cassandra` vector store, are also useful
in themselves. These are (all come in their sync+async versions):

- `[a]delete_by_metadata_filter`
- `[a]replace_metadata`
- `[a]get_by_document_id`
- `[a]metadata_search`

Additionally, a `[a]similarity_search_with_embedding_id_by_vector`
method is introduced to better serve the store's internal working (esp.
related to reranking logic).

**Issue:** no issue number, but now all Document's returned bear their
`.id` consistently (as a consequence of a slight refactoring in how the
raw entries read from DB are made back into `Document` instances).

**Dependencies:** (no new deps: packaging comes through langchain-core
already; `cassio` is now required to be version 0.1.10+)


**Add tests and docs**
Added integration tests for the relevant newly-introduced methods.
(Docs will be updated in a separate PR).

**Lint and test** Lint and (updated) test all pass.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-09 06:41:34 +00:00
Erick Friis
84c05b031d
community: release 0.3.2 (#27214) 2024-10-08 23:33:55 -07:00
Serena Ruan
a7c1ce2b3f
[community] Add timeout control and retry for UC tool execution (#26645)
Add timeout at client side for UCFunctionToolkit and add retry logic.
Users could specify environment variable
`UC_TOOL_CLIENT_EXECUTION_TIMEOUT` to increase the timeout value for
retrying to get the execution response if the status is pending. Default
timeout value is 120s.


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

Tested in Databricks:
<img width="1200" alt="image"
src="https://github.com/user-attachments/assets/54ab5dfc-5e57-4941-b7d9-bfe3f8ad3f62">



- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Signed-off-by: serena-ruan <serena.rxy@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-09 06:31:48 +00:00
Tomaz Bratanic
481bd25d29
community: Fix database connections for neo4j (#27190)
Fixes https://github.com/langchain-ai/langchain/issues/27185

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-08 23:47:55 +00:00
Erick Friis
cedf4d9462
langchain: release 0.3.3 (#27213) 2024-10-08 16:39:42 -07:00
Erick Friis
7264fb254c
core: release 0.3.10 (#27209) 2024-10-08 16:21:42 -07:00
Bagatur
ce33c4fa40
openai[patch]: default temp=1 for o1 (#27206) 2024-10-08 15:45:21 -07:00
RIdham Golakiya
73ad7f2e7a
langchain_chroma[patch]: updated example for get documents with where clause (#26767)
Example updated for vectorstore ChromaDB.

If we want to apply multiple filters then ChromaDB supports filters like
this:
Reference: [ChromaDB
filters](https://cookbook.chromadb.dev/core/filters/)

Thank you.
2024-10-08 20:21:58 +00:00
Bagatur
e3e9ee8398
core[patch]: utils for adding/subtracting usage metadata (#27203) 2024-10-08 13:15:33 -07:00
ccurme
e3920f2320
community[patch]: fix structured_output in llamacpp integration (#27202)
Resolves https://github.com/langchain-ai/langchain/issues/25318.
2024-10-08 15:16:59 -04:00
Erick Friis
b84e00283f
standard-tests: test that only one chunk sets input_tokens (#27177) 2024-10-08 11:35:32 -07:00
Ajayeswar Reddy
9b7bdf1a26
Fixed typo in llibs/community/langchain_community/storage/sql.py (#27029)
- [ ] **PR title**: docs: fix typo in SQLStore import path

- [ ] **PR message**: 
- **Description:** This PR corrects a typo in the docstrings for the
class SQLStore(BaseStore[str, bytes]). The import path in the docstring
currently reads from langchain_rag.storage import SQLStore, which should
be changed to langchain_community.storage import SQLStore. This typo is
also reflected in the official documentation.
    - **Issue:** N/A
    - **Dependencies:** None
    - **Twitter handle:** N/A

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-08 17:51:26 +00:00
Vadym Barda
8d27325dbc
core[patch]: support ValidationError from pydantic v1 in tools (#27194) 2024-10-08 10:19:04 -04:00
Christophe Bornet
16f5fdb38b
core: Add various ruff rules (#26836)
Adds
- ASYNC
- COM
- DJ
- EXE
- FLY
- FURB
- ICN
- INT
- LOG
- NPY
- PD
- Q
- RSE
- SLOT
- T10
- TID
- YTT

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-07 22:30:27 +00:00
Erick Friis
5c826faece
core: update make format to fix all autofixable things (#27174) 2024-10-07 15:20:47 -07:00
Christophe Bornet
d31ec8810a
core: Add ruff rules for error messages (EM) (#26965)
All auto-fixes

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-07 22:12:28 +00:00
Oleksii Pokotylo
37ca468d03
community: AzureSearch: fix reranking for empty lists (#27104)
**Description:** 
  Fix reranking for empty lists 

**Issue:** 
```
ValueError: not enough values to unpack (expected 3, got 0)
    documents, scores, vectors = map(list, zip(*docs))
  File langchain_community/vectorstores/azuresearch.py", line 1680, in _reorder_results_with_maximal_marginal_relevance
```

Co-authored-by: Oleksii Pokotylo <oleksii.pokotylo@pwc.com>
2024-10-07 15:27:09 -04:00
Christophe Bornet
c4ebccfec2
core[minor]: Improve support for id in VectorStore (#26660)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-10-07 15:01:08 -04:00
Bharat Ramanathan
931ce8d026
core[patch]: Update AsyncCallbackManager to honor run_inline attribute and prevent context loss (#26885)
## Description

This PR fixes the context loss issue in `AsyncCallbackManager`,
specifically in `on_llm_start` and `on_chat_model_start` methods. It
properly honors the `run_inline` attribute of callback handlers,
preventing race conditions and ordering issues.

Key changes:
1. Separate handlers into inline and non-inline groups.
2. Execute inline handlers sequentially for each prompt.
3. Execute non-inline handlers concurrently across all prompts.
4. Preserve context for stateful handlers.
5. Maintain performance benefits for non-inline handlers.

**These changes are implemented in `AsyncCallbackManager` rather than
`ahandle_event` because the issue occurs at the prompt and message_list
levels, not within individual events.**

## Testing

- Test case implemented in #26857 now passes, verifying execution order
for inline handlers.

## Related Issues

- Fixes issue discussed in #23909 

## Dependencies

No new dependencies are required.

---

@eyurtsev: This PR implements the discussed changes to respect
`run_inline` in `AsyncCallbackManager`. Please review and advise on any
needed changes.

Twitter handle: @parambharat

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-10-07 14:59:29 -04:00
João Carlos Ferra de Almeida
780ce00dea
core[minor]: add **kwargs to index and aindex functions for custom vector_field support (#26998)
Added `**kwargs` parameters to the `index` and `aindex` functions in
`libs/core/langchain_core/indexing/api.py`. This allows users to pass
additional arguments to the `add_documents` and `aadd_documents`
methods, enabling the specification of a custom `vector_field`. For
example, users can now use `vector_field="embedding"` when indexing
documents in `OpenSearchVectorStore`

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-10-07 14:52:50 -04:00
Jorge Piedrahita Ortiz
14de81b140
community: sambastudio chat model (#27056)
**Description:**: sambastudio chat model integration added, previously
only LLM integration
     included docs and tests

---------

Co-authored-by: luisfucros <luisfucros@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-10-07 14:31:39 -04:00
Aditya Anand
f70650f67d
core[patch]: correct typo doc-string for astream_events method (#27108)
This commit addresses a typographical error in the documentation for the
async astream_events method. The word 'evens' was incorrectly used in
the introductory sentence for the reference table, which could lead to
confusion for users.\n\n### Changes Made:\n- Corrected 'Below is a table
that illustrates some evens that might be emitted by various chains.' to
'Below is a table that illustrates some events that might be emitted by
various chains.'\n\nThis enhancement improves the clarity of the
documentation and ensures accurate terminology is used throughout the
reference material.\n\nIssue Reference: #27107
2024-10-07 14:12:42 -04:00
Bagatur
38099800cc
docs: fix anthropic max_tokens docstring (#27166) 2024-10-07 16:51:42 +00:00
ogawa
07dd8dd3d7
community[patch]: update gpt-4o cost (#27038)
updated OpenAI cost definition according to the following:
https://openai.com/api/pricing/
2024-10-07 09:06:30 -04:00
Bagatur
06ce5d1d5c
anthropic[patch]: Release 0.2.3 (#27126) 2024-10-04 22:38:03 +00:00
Bagatur
0b8416bd2e
anthropic[patch]: fix input_tokens when cached (#27125) 2024-10-04 22:35:51 +00:00
Bagatur
bd5b335cb4
standard-tests[patch]: fix oai usage metadata test (#27122) 2024-10-04 20:00:48 +00:00
Bagatur
827bdf4f51
fireworks[patch]: Release 0.2.1 (#27120) 2024-10-04 18:59:15 +00:00
Bagatur
98942edcc9
openai[patch]: Release 0.2.2 (#27119) 2024-10-04 11:54:01 -07:00
Bagatur
414fe16071
anthropic[patch]: Release 0.2.2 (#27118) 2024-10-04 11:53:53 -07:00
Bagatur
11df1b2b8d
core[patch]: Release 0.3.9 (#27117) 2024-10-04 18:35:33 +00:00
Scott Hurrey
558fb4d66d
box: Add citation support to langchain_box.retrievers.BoxRetriever when used with Box AI (#27012)
Thank you for contributing to LangChain!

**Description:** Box AI can return responses, but it can also be
configured to return citations. This change allows the developer to
decide if they want the answer, the citations, or both. Regardless of
the combination, this is returned as a single List[Document] object.

**Dependencies:** Updated to the latest Box Python SDK, v1.5.1
**Twitter handle:** BoxPlatform


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-04 18:32:34 +00:00