As a LangChain user streaming a tool-calling model, I expect each
streamed chunk to expose structured `tool_call_chunk` content blocks so
I can render or process tool calls live, instead of waiting for the
final aggregated message.
This adds `tool_call_streaming` to `ModelProfile` and uses it in the
standard chat-model tool-calling tests. When a model profile opts in,
`test_tool_calling` and `test_tool_calling_async` now validate that at
least one streamed chunk includes a `tool_call_chunk` block via
`content_blocks`, while preserving the existing final-message
validation.
This keeps the contract profile-gated so providers can opt in once their
streaming chunk shape is verified. This PR opts in the providers
verified by smoke testing with straightforward profile coverage: OpenAI,
Anthropic, Fireworks, HuggingFace, OpenRouter, DeepSeek, and xAI. The
generated profile artifacts are refreshed so runtime profiles expose the
new capability flag.
Perplexity Responses also passed the smoke test, but its current profile
data is for the `sonar` family while the Responses smoke path used a
routed model string. That profile strategy is left as follow-up.
MistralAI currently streams `.tool_call_chunks`, but its content-block
translator exposes a complete `tool_call` block instead of
`tool_call_chunk`, so it also stays out of this flag until that
integration is fixed.
Fixes#30870
When an `AsyncBaseTracer` with `_schema_format="original"` (the default)
is used with sync `llm.invoke()`, the `on_chat_model_start` to
`on_llm_start` fallback doesn't fire. The async handler returns a
coroutine instead of raising `NotImplementedError` synchronously, so it
bypasses the existing fallback logic and lands in `_run_coros`, which
only logs the error generically.
This fallback already works for sync handlers in sync context and async
handlers in async context. This PR closes the gap for async handlers in
sync context.
Fixes#35244
Users can write async agent middleware with `@wrap_model_call`, and
LangChain already supports that behavior at runtime by detecting
coroutine functions and wiring them to `awrap_model_call`.
However, the decorator's public typing currently describes only the sync
callable shape. As a result, valid async middleware is rejected by type
checkers such as mypy and ty, even though the same code runs correctly.
This updates the middleware decorator types so async `wrap_model_call`
and `wrap_tool_call` functions type-check consistently with their
runtime behavior. It also simplifies related callable aliases and uses
casts where `iscoroutinefunction` narrows the callable at runtime but
static type checkers cannot follow that narrowing.
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Closes#37736
---
`EnsembleRetriever` normalizes retriever outputs to `Document` objects
in both `rank_fusion` (sync) and `arank_fusion` (async), but the two
methods used different conditions:
- `rank_fusion` wraps only bare strings: `isinstance(doc, str)`
- `arank_fusion` wrapped anything that isn't a `Document`: `not
isinstance(doc, Document)`
If a retriever returns a non-string, non-`Document` value through the
async path, `arank_fusion` would try to construct
`Document(page_content=<non-string>)` and Pydantic raises a
`ValidationError`. The sync path handles the same input without crashing
— the behavior is inconsistent.
The fix is a one-line change in `arank_fusion` to use `isinstance(doc,
str)`, matching the sync path exactly.
Three tests were added to `test_ensemble.py`:
- `test_rank_fusion_bare_strings` — sync path wraps bare strings into
Documents
- `test_arank_fusion_bare_strings` — async path wraps bare strings into
Documents
- `test_arank_fusion_matches_rank_fusion` — sync and async return
identical results for normal Document input
---
This continues the work from #37737 by @AliMuhammadAslam (credited as
co-author), rebased onto `master` with the type-check lint failure
resolved. Supersedes that PR.
Co-authored-by: AliMuhammadAslam <aaalimohdaslam@gmail.com>
Provider-native structured output fallback detection now uses bounded
model-name patterns instead of broad substring checks, reducing false
positives for unrelated model IDs. The model examples and test fixtures
across OpenAI/OpenRouter-facing code were refreshed around current
OpenAI model families while preserving shipped defaults.
## Changes
- Tightened `FALLBACK_MODELS_WITH_STRUCTURED_OUTPUT` from loose string
fragments to regex patterns, with `_supports_provider_strategy` matching
full model-name segments instead of arbitrary substrings.
- Expanded structured-output fallback coverage for newer OpenAI,
Anthropic, and xAI/Grok model families, including `gpt-5.x`, newer
Claude 4/5-style names, and `grok-build`.
- Reused `_attempt_infer_model_provider` in provider tool search routing
so `_provider_from_model_name` follows the same provider inference
behavior as `init_chat_model`.
- Suppressed irrelevant provider-inference deprecation warnings during
provider tool search registry lookup.
- Refreshed OpenAI, Azure OpenAI, OpenRouter, core metadata, and example
model references from older fixtures like `gpt-4`, `gpt-4o`, `o1`, and
`o4-mini` to current test/profile models such as `gpt-5.5`,
`gpt-5-nano`, and `gpt-4.1-mini`.
- Removed outdated OpenAI test assumptions around legacy `o1` behavior
and narrowed legacy structured-output checks to explicitly legacy model
names.
`ChatMistralAI` now supports `stop` sequences.
Previously, a `stop` value passed to the model was silently discarded:
the code carried a stale "not yet supported" note, dropped the parameter
before the request, and logged a warning. Mistral's chat completions API
does accept `stop` (a string or list of strings, up to 4 sequences), so
anyone setting `stop` and expecting generation to halt was getting no
effect.
Now `stop` is a first-class parameter. It can be set on the constructor
(`ChatMistralAI(stop=[...])`) or per call (`model.invoke(prompt,
stop=[...])`) and is forwarded to the API. A per-call value overrides
the instance default, and an empty list is treated as "no stop
sequences" — omitted from the request rather than sent as an empty array
(which the API rejects).
Verified against the live Mistral API: with `stop=["5"]`, "Count from 1
to 10" returns `1 2 3 4 ` instead of the full sequence. The 422
`extra_forbidden` response the API returns for genuinely unknown fields
confirms `stop` is a real schema field, not silently ignored.
This PR also folds in some test hygiene: the base-URL env test uses
`monkeypatch.setenv` so `MISTRAL_BASE_URL=boo` no longer leaks into
later serialization tests, and `test_extra_kwargs` asserts the
intentional unknown-kwarg warning with `pytest.warns`.
## Review notes
- Behavior change worth a careful look: `stop` now reaches the API
instead of being dropped. This changes request payloads for anyone
previously passing `stop`. It is the intended fix, but flagging it
explicitly.
- Coverage: `test_stop_sequence` (integration) exercises the end-to-end
behavior; unit tests cover parameter wiring, per-call-vs-instance
precedence, and the empty-list case.
Partner unit tests now reflect the warning behavior emitted by updated
`langchain-core` serialization and model initialization paths.
Warning-strict runs can stay focused on the behavior under test rather
than expected framework warnings.
Warning-producing test paths now either exercise the intended Anthropic
model branch or explicitly assert expected warnings. That keeps `make
test` output clean while preserving coverage for backwards-compatible
parameters, deprecated `AnthropicLLM`, and standard structured-output
behavior.
Anthropic unit tests now pin the expected API base URL where
serialization and initialization assertions depend on it. That keeps
local gateway settings like `ANTHROPIC_BASE_URL` from changing snapshot
output or default URL assertions during development.
Supersedes #34727Closes#30703
Related:
* langchain-ai/langchain-google#1460
* langchain-ai/langchain-google#1501
Fixing this at the `langchain-core` callback layer instead of
normalizing inside individual provider integrations, so structured
streaming content is preserved consistently.
---
Models are increasingly streaming structured content blocks instead of
plain text tokens. For example, Gemini 3 can stream text as
content-block lists, and Anthropic/tool-use flows can also produce
non-text message content. Today those values already reach
`on_llm_new_token`, but the callback API still advertises `token: str`,
which makes custom callbacks, tracers, and streaming helpers assume
every streamed value is text.
User story: as a LangChain user building a streaming callback for chat
models with tool calls, reasoning/thinking blocks, or provider-specific
structured content, I need `on_llm_new_token` to accept the same content
shape that chat model chunks can actually emit, so my callback can
observe the stream without providers flattening or dropping non-text
data.
Fixing this in `langchain-core` makes the existing runtime behavior
explicit at the shared callback boundary. Normalizing content blocks
inside each provider would duplicate logic, produce inconsistent
behavior across integrations, and in some cases lose required provider
metadata such as Gemini thought signatures.
## Changes
- Update the callback contract so streamed tokens can be either plain
text or structured content blocks
- Carry structured streamed content through tracing and event/log
streaming paths without forcing provider data into text too early
- Keep built-in text-oriented streaming callbacks working by converting
structured tokens only at the display/queue boundary
- Drop the now-incorrect `cast("str", ...)` on streamed content in
`BaseChatModel` so the producer side matches the widened callback
signature instead of asserting a string it doesn't always have (no
runtime change — `cast` is erased)
- Align Anthropic and Mistral content typing with the structured content
shapes already used by chat model messages
- Update callback tests to reflect that not every streamed value is text
## Compatibility
No runtime behavior change: no producer emits anything it wasn't already
emitting, and widening a parameter type is safe for existing callers and
handlers that pass or receive `str`. The one caveat is downstream code
that subclasses a callback handler or tracer and overrides
`on_llm_new_token` with a `token: str` annotation — under strict type
checking that override is now narrower than the base and will be flagged
as incompatible with the supertype. Such code still runs unchanged; the
fix is to widen the annotation to match.
`Runnable.__or__`, `Runnable.__ror__`, and their `RunnableSequence` and
`StructuredPrompt` overrides previously erased composition types: the
right-hand operand was typed `Runnable[Any, Other]`, so piping two
runnables together always produced `RunnableSerializable[Input, Any]`.
Type information was lost at every `|`, which is why chains so often
needed a `chain: Runnable = ...` annotation just to recover usable
inference.
This adds `@overload`s so the `Output` of one step flows into the
`Input` of the next and the composed result carries the real `Output`
type through. `Runnable[int, str] | Runnable[str, float]` now infers
`RunnableSerializable[int, float]` instead of `[int, Any]`.
`coerce_to_runnable` gains overloads so a `Mapping` resolves to
`RunnableParallel` while everything else stays a `Runnable`. As a
knock-on effect, dozens of now-unnecessary `: Runnable` annotations were
dropped from the test suite.
Runtime behavior is unchanged — this is a typing-only change.
## Impact on type-checked code
Most users will simply get better inference. Two changes can require a
small adjustment if you run a type checker (`mypy`, `pyright`):
### Stricter operand matching in `|`
The right-hand side of `|` is now typed `Runnable[Output, Other]` rather
than `Runnable[Any, Other]`, so the right operand's declared **input**
must match the left operand's **output**. This is more accurate, but it
surfaces a common pattern that was previously silent: piping a step that
outputs a plain `dict` into a step whose declared input is a more
specific type (for example a `TypedDict`). It still works at runtime;
the checker now reports an `[operator]` error.
If you hit this, narrow the boundary with a `cast` (or an explicit
annotation):
```python
from typing import Any, cast
from langchain_core.runnables import Runnable
# upstream outputs a dict; downstream declares a narrower input type
chain = cast("Runnable[Any, MyInput]", upstream) | downstream
```
### `list` → `Sequence` on `RunnableEach` / `map()`
`Runnable.map()` and the `invoke` / `ainvoke` methods of `RunnableEach`
now accept `Sequence[Input]` instead of `list[Input]`. Callers are
unaffected — a `list` is a `Sequence`, and tuples or other sequences now
type-check too. The only thing to adjust: if you **subclass**
`RunnableEach` (or `RunnableEachBase`) and override these methods with a
`list[...]` parameter, widen the annotation to `Sequence[...]` so the
override stays compatible with the base signature.
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Simplify test for `create_agent` errors.
* Remove duplicate tests
* Test sync and async with common logic
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
In this order:
* used `@override` when overriding a parent method.
* prefixed param with `_` when the param could be renamed.
* used `*_args, **_kwargs` when it was not possible to rename (eg:
protocol)
* used `_ = some_variable` when the variable name is inspected (in
tools)
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Automated refresh of model profile data for all in-monorepo partner
integrations via `langchain-profiles refresh`.
🤖 Generated by the `refresh_model_profiles` workflow.
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
`handle_tool_error` callables can now return structured message content
as any valid sequence, not just a mutable `list`. Valid structured
sequences are normalized to the `ToolMessage` content shape at the tool
output boundary, while invalid content still falls back to
stringification.
## Changes
- Widened `ToolExceptionHandlerOutput` from `list[str | dict[str, Any]]`
to `Sequence[MessageContentBlock]` so handlers returning `list[dict[str,
Any]]` or tuple content blocks type-check cleanly.
- Added `_normalize_message_content` to validate structured message
content and convert valid non-string sequences to the `list` shape
expected by `ToolMessage`.
- Preserved existing stringification behavior for invalid structured
content blocks instead of treating failed normalization as `None`.
- Removed the now-unused `_is_message_content_type` helper; output
formatting validates content directly through
`_normalize_message_content`.
`handle_tool_error` callables can already return structured message
content at runtime, but the public typing only allowed strings. The tool
error handling API now reflects the existing output formatting path,
including clearer docs for how handled errors become
`ToolMessage(status="error")` results.
`SummarizationMiddleware._trigger_conditions` is now explicitly marked
as a temporary compatibility view for private consumers. The regression
test is tied to the package major version so the 2.0 release path fails
loudly until the legacy attr and test are removed.
`SummarizationMiddleware` now uses `_trigger_clauses` as the canonical
internal representation for AND/OR trigger evaluation while keeping
`_trigger_conditions` as a tuple-shaped compatibility view. This keeps
the new dict-style `TriggerClause` behavior intact without breaking
private consumers that still inspect the old tuple-normalized trigger
state.
## Changes
- Added `_trigger_clauses` as the source of truth for summarization
trigger evaluation, profile requirement checks, and compound AND clause
handling.
- Restored `_trigger_conditions` as a legacy compatibility projection
for tuple-expressible triggers, so tuple and single-key dict triggers
remain visible in the previous private shape.
- Avoided misrepresenting compound `TriggerClause` inputs like
`{"tokens": 1000, "messages": 5}` as independent OR-style tuple
conditions.
Closes#34442
[Docs](https://github.com/langchain-ai/docs/pull/4377)
---
Add parity with LangChain.js trigger semantics for Python
`SummarizationMiddleware`. `trigger` can now express AND conditions
within a single dict-style `TriggerClause` while preserving the existing
tuple and list-of-tuples behavior.
A simple user story: a support agent is helping debug an issue over a
long conversation. One tool call may return a large log snippet, briefly
pushing the token count over a limit, but the conversation is still only
a few messages long and the recent context is valuable. Separately, the
user may send many short follow-up messages that increase message count
without using much context.
With `trigger={"tokens": 4000, "messages": 10}`, both thresholds must be
met at the same time: at least 4,000 tokens and at least 10 messages.
This means 5,000 tokens across only 3 messages does not summarize, and
20 short messages totaling only 1,000 tokens does not summarize either.
Summarization waits until the conversation is large enough by both
measures, making it less likely to discard useful recent context too
early.
## Changes
- Add `TriggerClause` support so `trigger={"tokens": 4000, "messages":
10}` only summarizes when all configured thresholds are met
- Export `TriggerClause` from `langchain.agents.middleware` so users can
import and annotate dict-style trigger clauses from the public
middleware entrypoint
- Normalize tuple and mapping trigger inputs through
`_normalize_trigger`, preserving existing `ContextSize` tuple semantics
as single-condition clauses
- Defensively copy mutable trigger list and dict inputs during
initialization so caller-side mutations do not change the middleware's
stored public configuration after construction
- Keep list inputs as OR semantics across clauses, including mixed lists
like `[{"tokens": 4000, "messages": 10}, ("messages", 50)]`
- Update `_should_summarize` to evaluate AND within each clause and OR
across clauses for `tokens`, `messages`, and `fraction`
- Update the docs and API link map so `TriggerClause` resolves in the
Python middleware docs
- Preserve tuple-trigger compatibility while allowing message-based
`keep` configurations to summarize at least one message when a trigger
fires near the cutoff boundary
AI assistance was used to help draft and refine this contribution.
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Clarifies how `get_buffer_string` treats multimodal message content
across output formats. The docs now make the default prefix format's
text-only behavior explicit and point users to XML when they need
structured multimodal block representations.
This behavior may change in future iterations
## Summary
Follow-up to #37911 (released in `langchain-perplexity` 1.3.2). That PR
fixed the outbound `ToolMessage` / `AIMessage.tool_calls` serialization;
this one implements **`ChatPerplexity.bind_tools`**, which flips
`has_tool_calling` to `True` and lights up the full `langchain-tests`
standard tool-calling suite — the suite that would have caught #37911 in
the first place.
Verified live against the Perplexity Agent API (`openai/gpt-5.5`,
`use_responses_api=True`): a client-side function-tool round-trip
(invoke + stream) works end-to-end.
## Core change (the `bind_tools` work + the Responses-API follow-up)
- **`bind_tools`** mirrors `langchain-openai`: converts tools via
`convert_to_openai_tool`, normalizes `tool_choice`, and passes
Perplexity built-in tools (`web_search`, etc.) through unchanged.
- **`_to_responses_payload`** now translates tool turns into the
Responses (Agent) API's typed input items: `AIMessage.tool_calls` →
`function_call`, `ToolMessage` → `function_call_output`, and flattens
function tool specs. (The Responses API has no `tool` role, so this
translation is required for round-trips.)
## Changes required to make standard-suite tests pass on the Responses
route
- Streaming: `_convert_responses_stream_event_to_chunk` emits a
`tool_call_chunk` on `response.output_item.done` function calls —
required by `test_tool_calling` (which streams and asserts tool calls).
- `_content_to_text` reduces list-shaped assistant content to text in
the tool-call branch — required by `test_agent_loop` and
`test_tool_message_histories_list_content`.
- `response_metadata["model_name"]` on the Responses route, mirroring
Chat Completions — required by `test_usage_metadata` /
`test_usage_metadata_streaming` (used by `langchain_core` usage
callbacks).
## Tests
- `sonar` standard class marked `has_tool_calling=False` (the family
returns 400 "Tool calling is not supported for this model").
- New `TestPerplexityResponsesStandard` runs the full suite on
`openai/gpt-5.5` + `use_responses_api` with `has_tool_choice=False`:
**35 passed, 13 skipped, 2 xfailed**.
- The 2 xfails (`test_unicode_tool_call_integration`,
`test_structured_few_shot_examples`) hard-code `tool_choice="any"`. The
Responses (Agent) API does not support `tool_choice` (verified: every
form returns HTTP 200 without forcing a call), which `ChatPerplexity`
surfaces as `ValueError` — **existing behavior, unchanged here.**
Softening that to a warning can be a separate change.
`make format lint` clean; unit + standard tests green.
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Automated refresh of model profile data for all in-monorepo partner
integrations via `langchain-profiles refresh`.
🤖 Generated by the `refresh_model_profiles` workflow.
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
`dict()` is a problematic method name as it clashes with the builtin
`dict` used as a type annotation.
This PR replaces it with an `asdict` method (inspired by dataclasses).
It also fixes a few places where `dict` must be replaced by
`builtins.dict` until the `dict()` method is removed.
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
The custom VCR serializer pipes the cassette dict through
`yaml.safe_dump`, which raises on stream objects — so any request with
an `io.BytesIO` body (multipart/file-upload endpoints) couldn't be
recorded. A new `_coerce_bytesio` helper walks the cassette and replaces
each `BytesIO` with its raw bytes before dumping.
Automated refresh of model profile data for all in-monorepo partner
integrations via `langchain-profiles refresh`.
🤖 Generated by the `refresh_model_profiles` workflow.
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Automated refresh of model profile data for all in-monorepo partner
integrations via `langchain-profiles refresh`.
🤖 Generated by the `refresh_model_profiles` workflow.
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>