Commit Graph

7089 Commits

Author SHA1 Message Date
ccurme
32fcc97a90 openai[patch]: compat with Bedrock Converse (#31280)
ChatBedrockConverse passes through reasoning content blocks in [Bedrock
Converse
format](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlock.html).

Similar to how we handle Anthropic thinking blocks, here we ensure these
are filtered out of OpenAI request payloads.

Resolves https://github.com/langchain-ai/langchain/issues/31279.
2025-05-19 10:35:26 -04:00
Christophe Bornet
17c5a1621f core: Improve Runnable __or__ method typing annotations (#31273)
* It is possible to chain a `Runnable` with an `AsyncIterator` as seen
in `test_runnable.py`.
* Iterator and AsyncIterator Input/Output of Callables must be put
before `Callable[[Other], Any]` otherwise the pattern matching picks the
latter.
2025-05-19 09:32:31 -04:00
mathislindner
e1af509966 anthropic: emit informative error message if there are only system messages in a prompt (#30822)
**PR message**: Not sure if I put the check at the right spot, but I
thought throwing the error before the loop made sense to me.
**Description:** Checks if there are only system messages using
AnthropicChat model and throws an error if it's the case. Check Issue
for more details
**Issue:** #30764

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-05-16 20:43:59 +00:00
OysterMax
eb25d7472d core: support Union type args in strict mode of OpenAI function calling / structured output (#30971)
**Issue:**[
#309070](https://github.com/langchain-ai/langchain/issues/30970)

**Cause**
Arg type in python code
```
arg: Union[SubSchema1, SubSchema2]
``` 
is translated to `anyOf` in **json schema**
```
"anyOf" : [{sub schema 1 ...}, {sub schema 1 ...}]
```
The value of anyOf is a list sub schemas. 
The bug is caused since the sub schemas inside `anyOf` list is not taken
care of.
The location where the issue happens is `convert_to_openai_function`
function -> `_recursive_set_additional_properties_false` function, that
recursively adds `"additionalProperties": false` to json schema which is
[required by OpenAI's strict function
calling](https://platform.openai.com/docs/guides/structured-outputs?api-mode=responses#additionalproperties-false-must-always-be-set-in-objects)

**Solution:**
This PR fixes this issue by iterating each sub schema inside `anyOf`
list.
A unit test is added.

**Twitter handle:** shengboma 


If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-05-16 16:20:32 -04:00
Christophe Bornet
c982573f1e core: Add ruff rules A (builtins shadowing) (#29312)
See https://docs.astral.sh/ruff/rules/#flake8-builtins-a
* Renamed vars where possible
* Added `noqa` where backward compatibility was needed
* Added `@override` when applicable
2025-05-16 15:19:37 -04:00
Shkarupa Alex
671e4fd114 langchain[patch]: Allow async indexing code to work for vectorstores that only defined sync delete (#30869)
`aindex` function should check not only `adelete` method, but `delete`
method too

**PR title**: "core: fix async indexing issue with adelete/delete
checking"
**PR message**: Currently `langchain.indexes.aindex` checks if vector
store has overrided adelete method. But due to `adelete` default
implementation store can have just `delete` overrided to make `adelete`
working.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-05-16 15:10:25 -04:00
ccurme
a401d7e52a ollama: release 0.3.3 (#31253) 2025-05-15 16:24:04 -04:00
Alexey Bondarenko
9efafe3337 ollama: Add separate kwargs parameter for async client (#31209)
**Description**:

Add a `async_client_kwargs` field to ollama chat/llm/embeddings adapters
that is passed to async httpx client constructor.

**Motivation:**

In my use-case:
- chat/embedding model adapters may be created frequently, sometimes to
be called just once or to never be called at all
- they may be used in bots sunc and async mode (not known at the moment
they are created)

So, I want to keep a static transport instance maintaining connection
pool, so model adapters can be created and destroyed freely. But that
doesn't work when both sync and async functions are in use as I can only
pass one transport instance for both sync and async client, while
transport types must be different for them. So I can't make both sync
and async calls use shared transport with current model adapter
interfaces.

In this PR I add a separate `async_client_kwargs` that gets passed to
async client constructor, so it will be possible to pass a separate
transport instance. For sake of backwards compatibility, it is merged
with `client_kwargs`, so nothing changes when it is not set.

I am unable to run linter right now, but the changes look ok.
2025-05-15 16:10:10 -04:00
ccurme
6bbc12b7f7 chroma: release 0.2.4 (#31252) 2025-05-15 15:58:29 -04:00
Jai Radhakrishnan
aa4890c136 partners: update deps for langchain-chroma (#31251)
Updates dependencies to Chroma to integrate the major release of Chroma
with improved performance, and to fix issues users have been seeing
using the latest chroma docker image with langchain-chroma

https://github.com/langchain-ai/langchain/issues/31047#issuecomment-2850790841
Updates chromadb dependency to >=1.0.9

This also removes the dependency of chroma-hnswlib, meaning it can run
against python 3.13 runners for tests as well.

Tested this by pulling the latest Chroma docker image, running
langchain-chroma using client mode
```
httpClient = chromadb.HttpClient(host="localhost", port=8000)

vector_store = Chroma(
    client=httpClient,
    collection_name="test",
    embedding_function=embeddings,
)
```
2025-05-15 15:55:15 -04:00
Christophe Bornet
a8f2ddee31 core: Add ruff rules RUF (#29353)
See https://docs.astral.sh/ruff/rules/#ruff-specific-rules-ruf
Mostly:
* [RUF022](https://docs.astral.sh/ruff/rules/unsorted-dunder-all/)
(unsorted `__all__`)
* [RUF100](https://docs.astral.sh/ruff/rules/unused-noqa/) (unused noqa)
*
[RUF021](https://docs.astral.sh/ruff/rules/parenthesize-chained-operators/)
(parenthesize-chained-operators)
*
[RUF015](https://docs.astral.sh/ruff/rules/unnecessary-iterable-allocation-for-first-element/)
(unnecessary-iterable-allocation-for-first-element)
*
[RUF005](https://docs.astral.sh/ruff/rules/collection-literal-concatenation/)
(collection-literal-concatenation)
* [RUF046](https://docs.astral.sh/ruff/rules/unnecessary-cast-to-int/)
(unnecessary-cast-to-int)

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-05-15 15:43:57 -04:00
Christophe Bornet
6cd1aadf60 langchain: use mypy strict checking with exemptions (#31018)
* Use strict checking and exclude some rules as TODOs
* Fix imports not exposed in `__all__`

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-05-15 11:37:18 -04:00
Christophe Bornet
eab8484a80 text-splitters[patch]: fix some import-untyped errors (#31030) 2025-05-15 11:34:22 -04:00
ccurme
672339f3c6 core: release 0.3.60 (#31249) 2025-05-15 11:14:04 -04:00
ccurme
8b145d5dc3 openai: release 0.3.17 (#31246) 2025-05-15 09:18:22 -04:00
Christophe Bornet
921573e2b7 core: Add ruff rules SLF (#30666)
Add ruff rules SLF: https://docs.astral.sh/ruff/rules/#flake8-self-slf

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-05-14 18:42:39 +00:00
Sydney Runkle
7263011b24 perf[core]: remove unnecessary model validators (#31238)
* Remove unnecessary cast of id -> str (can do with a field setting)
* Remove unnecessary `set_text` model validator (can be done with a
computed field - though we had to make some changes to the `Generation`
class to make this possible

Before: ~2.4s

Blue circles represent time spent in custom validators :(

<img width="1337" alt="Screenshot 2025-05-14 at 10 10 12 AM"
src="https://github.com/user-attachments/assets/bb4f477f-4ee3-4870-ae93-14ca7f197d55"
/>


After: ~2.2s

<img width="1344" alt="Screenshot 2025-05-14 at 10 11 03 AM"
src="https://github.com/user-attachments/assets/99f97d80-49de-462f-856f-9e7e8662adbc"
/>

We still want to optimize the backwards compatible tool calls model
validator, though I think this might involve breaking changes, so wanted
to separate that into a different PR. This is circled in green.
2025-05-14 10:20:22 -07:00
Sydney Runkle
1523602196 packaging[core]: bump min pydantic version (#31239)
Bumping to a version that's a year old, so seems like a reasonable bump.
2025-05-14 10:01:24 -07:00
Lope Ramos
b8ae2de169 langchain-core[patch]: Incremental record manager deletion should be batched (#31206)
**Description:** Before this commit, if one record is batched in more
than 32k rows for sqlite3 >= 3.32 or more than 999 rows for sqlite3 <
3.31, the `record_manager.delete_keys()` will fail, as we are creating a
query with too many variables.

This commit ensures that we are batching the delete operation leveraging
the `cleanup_batch_size` as it is already done for `full` cleanup.

Added unit tests for incremental mode as well on different deleting
batch size.
2025-05-14 11:38:21 -04:00
Sydney Runkle
263c215112 perf[core]: remove generations summation from hot loop (#31231)
1. Removes summation of `ChatGenerationChunk` from hot loops in `stream`
and `astream`
2. Removes run id gen from loop as well (minor impact)

Again, benchmarking on processing ~200k chunks (a poem about broccoli).

Before: ~4.2s

Blue circle is all the time spent adding up gen chunks

<img width="1345" alt="Screenshot 2025-05-14 at 7 48 33 AM"
src="https://github.com/user-attachments/assets/08a59d78-134d-4cd3-9d54-214de689df51"
/>

After: ~2.3s

Blue circle is remaining time spent on adding chunks, which can be
minimized in a future PR by optimizing the `merge_content`,
`merge_dicts`, and `merge_lists` utilities.

<img width="1353" alt="Screenshot 2025-05-14 at 7 50 08 AM"
src="https://github.com/user-attachments/assets/df6b3506-929e-4b6d-b198-7c4e992c6d34"
/>
2025-05-14 08:13:05 -07:00
Sydney Runkle
17b799860f perf[core]: remove costly async helpers for non-end event handlers (#31230)
1. Remove `shielded` decorator from non-end event handlers
2. Exit early with a `self.handlers` check instead of doing unnecessary
asyncio work

Using a benchmark that processes ~200k chunks (a poem about broccoli).

Before: ~15s

Circled in blue is unnecessary event handling time. This is addressed by
point 2 above

<img width="1347" alt="Screenshot 2025-05-14 at 7 37 53 AM"
src="https://github.com/user-attachments/assets/675e0fed-8f37-46c0-90b3-bef3cb9a1e86"
/>

After: ~4.2s

The total time is largely reduced by the removal of the `shielded`
decorator, which holds little significance for non-end handlers.

<img width="1348" alt="Screenshot 2025-05-14 at 7 37 22 AM"
src="https://github.com/user-attachments/assets/54be8a3e-5827-4136-a87b-54b0d40fe331"
/>
2025-05-14 07:42:56 -07:00
ccurme
0b8837a0cc openai: support runtime kwargs in embeddings (#31195) 2025-05-14 09:14:40 -04:00
ccurme
868cfc4a8f openai: ignore function_calls if tool_calls are present (#31198)
Some providers include (legacy) function calls in `additional_kwargs` in
addition to tool calls. We currently unpack both function calls and tool
calls if present, but OpenAI will raise 400 in this case.

This can come up if providers are mixed in a tool-calling loop. Example:
```python
from langchain.chat_models import init_chat_model
from langchain_core.messages import HumanMessage
from langchain_core.tools import tool


@tool
def get_weather(location: str) -> str:
    """Get weather at a location."""
    return "It's sunny."



gemini = init_chat_model("google_genai:gemini-2.0-flash-001").bind_tools([get_weather])
openai = init_chat_model("openai:gpt-4.1-mini").bind_tools([get_weather])

input_message = HumanMessage("What's the weather in Boston?")
tool_call_message = gemini.invoke([input_message])

assert len(tool_call_message.tool_calls) == 1
tool_call = tool_call_message.tool_calls[0]
tool_message = get_weather.invoke(tool_call)

response = openai.invoke(  # currently raises 400 / BadRequestError
    [input_message, tool_call_message, tool_message]
)
```

Here we ignore function calls if tool calls are present.
2025-05-12 13:50:56 -04:00
Christophe Bornet
83d006190d core: Fix some private member accesses (#30912)
See https://github.com/langchain-ai/langchain/pull/30666

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2025-05-12 17:42:26 +00:00
CtrlMj
1e56c66f86 core: Fix issue 31035 alias fields in base tool langchain core (#31112)
**Description**: The 'inspect' package in python skips over the aliases
set in the schema of a pydantic model. This is a workound to include the
aliases from the original input.
**issue**: #31035 


Cc: @ccurme @eyurtsev

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-05-12 11:04:13 -04:00
meirk-brd
e6147ce5d2 docs: Add Brightdata integration documentation (#31114)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, core, etc. is being
modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI
changes.
  - Example: "core: add foobar LLM"

- **Description:** Integrated the Bright Data package to enable
Langchain users to seamlessly incorporate Bright Data into their agents.
 - **Dependencies:** None
- **LinkedIn handle**:[Bright
Data](https://www.linkedin.com/company/bright-data)

- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-05-11 16:07:21 +00:00
ccurme
ff9183fd3c docs: add Gel integration (#31186)
Continued from https://github.com/langchain-ai/langchain/pull/31050

---------

Co-authored-by: deepbuzin <contactbuzin@gmail.com>
2025-05-11 10:17:18 -04:00
ccurme
77d3f04e0a docs: add Aerospike to package registry (#31185)
Missed as part of https://github.com/langchain-ai/langchain/pull/31156
2025-05-11 09:33:58 -04:00
Sumin Shin
683da2c9e9 text-splitters: Fix regex separator merge bug in CharacterTextSplitter (#31137)
**Description:**
Fix the merge logic in `CharacterTextSplitter.split_text` so that when
using a regex lookahead separator (`is_separator_regex=True`) with
`keep_separator=False`, the raw pattern is not re-inserted between
chunks.

**Issue:**
Fixes #31136 

**Dependencies:**
None

**Twitter handle:**
None

Since this is my first open-source PR, please feel free to point out any
mistakes, and I'll be eager to make corrections.
2025-05-10 15:42:03 -04:00
ccurme
e9e597be8e docs: update sort order in integrations table (#31171) 2025-05-08 20:44:21 +00:00
ccurme
9aac8923a3 docs: add web search to anthropic docs (#31169) 2025-05-08 16:20:11 -04:00
ccurme
2d202f9762 anthropic[patch]: split test into two (#31167) 2025-05-08 09:23:36 -04:00
ccurme
d4555ac924 anthropic: release 0.3.13 (#31162) 2025-05-08 03:13:15 +00:00
ccurme
e34f9fd6f7 anthropic: update streaming usage metadata (#31158)
Anthropic updated how they report token counts during streaming today.
See changes to `MessageDeltaUsage` in [this
commit](2da00f26c5 (diff-1a396eba0cd9cd8952dcdb58049d3b13f6b7768ead1411888d66e28211f7bfc5)).

It's clean and simple to grab these fields from the final
`message_delta` event. However, some of them are typed as Optional, and
language
[here](e42451ab3f/src/anthropic/lib/streaming/_messages.py (L462))
suggests they may not always be present. So here we take the required
field from the `message_delta` event as we were doing previously, and
ignore the rest.
2025-05-07 23:09:56 -04:00
ccurme
682f338c17 anthropic[patch]: support web search (#31157) 2025-05-07 18:04:06 -04:00
ccurme
d7e016c5fc huggingface: release 0.2 (#31153) 2025-05-07 15:33:07 -04:00
ccurme
4b11cbeb47 huggingface[patch]: update lockfile (#31152) 2025-05-07 15:17:33 -04:00
ccurme
b5b90b5929 anthropic[patch]: be robust to null fields when translating usage metadata (#31151) 2025-05-07 18:30:21 +00:00
ccurme
f70b263ff3 core: release 0.3.59 (#31150) 2025-05-07 17:36:59 +00:00
ccurme
bb69d4c42e docs: specify js support for tavily (#31149) 2025-05-07 11:30:04 -04:00
zhurou603
1df3ee91e7 partners: (langchain-openai) total_tokens should not add 'Nonetype' t… (#31146)
partners: (langchain-openai) total_tokens should not add 'Nonetype' t…

# PR Description

## Description
Fixed an issue in `langchain-openai` where `total_tokens` was
incorrectly adding `None` to an integer, causing a TypeError. The fix
ensures proper type checking before adding token counts.

## Issue
Fixes the TypeError traceback shown in the image where `'NoneType'`
cannot be added to an integer.

## Dependencies
None

## Twitter handle
None

![image](https://github.com/user-attachments/assets/9683a795-a003-455a-ada9-fe277245e2b2)

Co-authored-by: qiulijie <qiulijie@yuaiweiwu.com>
2025-05-07 11:09:50 -04:00
Collier King
19041dcc95 docs: update langchain-cloudflare repo/path on packages.yaml (#31138)
**Library Repo Path Update **: "langchain-cloudflare"

We recently changed our `langchain-cloudflare` repo to allow for future
libraries.
Created a `libs` folder to hold `langchain-cloudflare` python package.


https://github.com/cloudflare/langchain-cloudflare/tree/main/libs/langchain-cloudflare
 
On `langchain`, updating `packages.yaml` to point to new
`libs/langchain-cloudflare` library folder.
2025-05-07 11:01:25 -04:00
Jacob Lee
66d1ed6099 fix(core): Permit OpenAI style blocks to be passed into convert_to_openai_messages (#31140)
Should effectively be a noop, just shouldn't throw

CC @madams0013

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-05-07 10:57:37 -04:00
唐小鸭
50fa524a6d partners: (langchain-deepseek) fix deepseek-r1 always returns an empty reasoning_content when reasoning (#31065)
## Description
deepseek-r1 always returns an empty string `reasoning_content` to the
first chunk when thinking, and sets `reasoning_content` to None when
thinking is over, to determine when to switch to normal output.

Therefore, whether the reasoning_content field exists should be judged
as None.

## Demo
deepseek-r1 reasoning output: 

```
{'delta': {'content': None, 'function_call': None, 'refusal': None, 'role': 'assistant', 'tool_calls': None, 'reasoning_content': ''}, 'finish_reason': None, 'index': 0, 'logprobs': None}
{'delta': {'content': None, 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': '好的'}, 'finish_reason': None, 'index': 0, 'logprobs': None}
{'delta': {'content': None, 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': ','}, 'finish_reason': None, 'index': 0, 'logprobs': None}
{'delta': {'content': None, 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': '用户'}, 'finish_reason': None, 'index': 0, 'logprobs': None}
...
```

deepseek-r1 first normal output
```
...
{'delta': {'content': ' main', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}
{'delta': {'content': '\n\nimport', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}
...
```

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-05-05 22:31:58 +00:00
Stefano Lottini
325f729a92 docs: improvements to Astra DB pages, especially modernize Vector DB example notebook (#30961)
This PR brings several improvements and modernizations to the
documentation around the Astra DB partner package.

- language alignment for better matching with the terms used in the
Astra DB docs
- updated several links to pages on said documentation
- for the `AstraDBVectorStore`, added mentions of the new features in
the overall `astra.mdx`
- for the vector store, rewritten/upgraded most of the usage example
notebook for a more straightforward experience able to highlight the
main usage patterns (including new ones such as the newly-introduced
"autodetect feature")

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-05-03 14:26:52 -04:00
Asif Mehmood
00ac49dd3e Replace deprecated .dict() with .model_dump() for Pydantic v2 compatibility (#31107)
**What does this PR do?**
This PR replaces deprecated usages of ```.dict()``` with
```.model_dump()``` to ensure compatibility with Pydantic v2 and prepare
for v3, addressing the deprecation warning
```PydanticDeprecatedSince20``` as required in [Issue#
31103](https://github.com/langchain-ai/langchain/issues/31103).

**Changes made:**
* Replaced ```.dict()``` with ```.model_dump()``` in multiple locations
* Ensured consistency with Pydantic v2 migration guidelines
* Verified compatibility across affected modules

**Notes**
* This is a code maintenance and compatibility update
* Tested locally with Pydantic v2.11
* No functional logic changes; only internal method replacements to
prevent deprecation issues
2025-05-03 13:40:54 -04:00
ccurme
6268ae8db0 langchain: release 0.3.25 (#31101) 2025-05-02 17:42:32 +00:00
ccurme
77ecf47f6d openai: release 0.3.16 (#31100) 2025-05-02 13:14:46 -04:00
ccurme
ff41f47e91 core: release 0.3.58 (#31099) 2025-05-02 12:46:32 -04:00
Eugene Yurtsev
4da525bc63 langchain[patch]: Remove beta decorator from init_embeddings (#31098)
Remove beta decorator from init_embeddings.
2025-05-02 11:52:50 -04:00