`ChatMistralAI` now supports `stop` sequences.
Previously, a `stop` value passed to the model was silently discarded:
the code carried a stale "not yet supported" note, dropped the parameter
before the request, and logged a warning. Mistral's chat completions API
does accept `stop` (a string or list of strings, up to 4 sequences), so
anyone setting `stop` and expecting generation to halt was getting no
effect.
Now `stop` is a first-class parameter. It can be set on the constructor
(`ChatMistralAI(stop=[...])`) or per call (`model.invoke(prompt,
stop=[...])`) and is forwarded to the API. A per-call value overrides
the instance default, and an empty list is treated as "no stop
sequences" — omitted from the request rather than sent as an empty array
(which the API rejects).
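
The precedence rules above can be sketched as follows. This is a minimal illustration, not the `ChatMistralAI` implementation: `resolve_stop` and `build_payload` are hypothetical helpers, and treating a per-call empty list as overriding a non-empty instance default is an assumption about how the two rules combine.

```python
from __future__ import annotations

from typing import Any


def resolve_stop(
    instance_stop: list[str] | None,
    call_stop: list[str] | None,
) -> list[str] | None:
    """Pick the effective stop sequences; a per-call value wins over the default."""
    chosen = call_stop if call_stop is not None else instance_stop
    # An empty list means "no stop sequences", so report None rather than [].
    return chosen or None


def build_payload(
    messages: list[dict[str, Any]],
    instance_stop: list[str] | None = None,
    call_stop: list[str] | None = None,
) -> dict[str, Any]:
    """Build a request body, omitting `stop` entirely when nothing is set."""
    payload: dict[str, Any] = {"messages": messages}
    stop = resolve_stop(instance_stop, call_stop)
    if stop is not None:
        payload["stop"] = stop
    return payload
```

The key design point is that the `stop` key is left out of the payload rather than sent as `[]`.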
Verified against the live Mistral API: with `stop=["5"]`, "Count from 1
to 10" returns `1 2 3 4 ` instead of the full sequence. The 422
`extra_forbidden` response the API returns for genuinely unknown fields
confirms `stop` is a real schema field, not silently ignored.
This PR also folds in some test hygiene: the base-URL env test uses
`monkeypatch.setenv` so `MISTRAL_BASE_URL=boo` no longer leaks into
later serialization tests, and `test_extra_kwargs` asserts the
intentional unknown-kwarg warning with `pytest.warns`.
## Review notes
- Behavior change worth a careful look: `stop` now reaches the API
instead of being dropped. This changes request payloads for anyone
previously passing `stop`. It is the intended fix, but flagging it
explicitly.
- Coverage: `test_stop_sequence` (integration) exercises the end-to-end
behavior; unit tests cover parameter wiring, per-call-vs-instance
precedence, and the empty-list case.
Partner unit tests now reflect the warning behavior emitted by updated
`langchain-core` serialization and model initialization paths.
Warning-strict runs can stay focused on the behavior under test rather
than expected framework warnings.
Warning-producing test paths now either exercise the intended Anthropic
model branch or explicitly assert expected warnings. That keeps `make
test` output clean while preserving coverage for backwards-compatible
parameters, deprecated `AnthropicLLM`, and standard structured-output
behavior.
Anthropic unit tests now pin the expected API base URL where
serialization and initialization assertions depend on it. That keeps
local gateway settings like `ANTHROPIC_BASE_URL` from changing snapshot
output or default URL assertions during development.
Supersedes #34727
Closes #30703
Related:
* langchain-ai/langchain-google#1460
* langchain-ai/langchain-google#1501
This fixes the issue at the `langchain-core` callback layer rather than
normalizing inside individual provider integrations, so structured
streaming content is preserved consistently.
---
Models are increasingly streaming structured content blocks instead of
plain text tokens. For example, Gemini 3 can stream text as
content-block lists, and Anthropic/tool-use flows can also produce
non-text message content. Today those values already reach
`on_llm_new_token`, but the callback API still advertises `token: str`,
which makes custom callbacks, tracers, and streaming helpers assume
every streamed value is text.
User story: as a LangChain user building a streaming callback for chat
models with tool calls, reasoning/thinking blocks, or provider-specific
structured content, I need `on_llm_new_token` to accept the same content
shape that chat model chunks can actually emit, so my callback can
observe the stream without providers flattening or dropping non-text
data.
Fixing this in `langchain-core` makes the existing runtime behavior
explicit at the shared callback boundary. Normalizing content blocks
inside each provider would duplicate logic, produce inconsistent
behavior across integrations, and in some cases lose required provider
metadata such as Gemini thought signatures.
## Changes
- Update the callback contract so streamed tokens can be either plain
text or structured content blocks
- Carry structured streamed content through tracing and event/log
streaming paths without forcing provider data into text too early
- Keep built-in text-oriented streaming callbacks working by converting
structured tokens only at the display/queue boundary
- Drop the now-incorrect `cast("str", ...)` on streamed content in
`BaseChatModel` so the producer side matches the widened callback
signature instead of asserting a string it doesn't always have (no
runtime change — `cast` is erased)
- Align Anthropic and Mistral content typing with the structured content
shapes already used by chat model messages
- Update callback tests to reflect that not every streamed value is text
## Compatibility
No runtime behavior change: no producer emits anything it wasn't already
emitting, and widening a parameter type is safe for existing callers and
handlers that pass or receive `str`. The one caveat is downstream code
that subclasses a callback handler or tracer and overrides
`on_llm_new_token` with a `token: str` annotation — under strict type
checking that override is now narrower than the base and will be flagged
as incompatible with the supertype. Such code still runs unchanged; the
fix is to widen the annotation to match.
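
For downstream handlers, a widened override might look like the sketch below. It deliberately avoids importing `langchain-core`: `StreamedToken`, `token_to_text`, and `CollectingHandler` are hypothetical names, and the `{"type": "text", "text": ...}` block shape is an assumption used only for illustration.

```python
from __future__ import annotations

from typing import Any, Union

# Hypothetical alias: a streamed value is plain text or a list of content blocks.
StreamedToken = Union[str, list[Union[str, dict[str, Any]]]]


def token_to_text(token: StreamedToken) -> str:
    """Reduce a streamed value to display text, ignoring non-text blocks."""
    if isinstance(token, str):
        return token
    parts: list[str] = []
    for block in token:
        if isinstance(block, str):
            parts.append(block)
        elif block.get("type") == "text":
            parts.append(str(block.get("text", "")))
    return "".join(parts)


class CollectingHandler:
    """Stand-in for a callback handler; not a langchain-core class."""

    def __init__(self) -> None:
        self.seen: list[str] = []

    # The annotation is widened to match the base signature instead of `token: str`.
    def on_llm_new_token(self, token: StreamedToken, **kwargs: Any) -> None:
        self.seen.append(token_to_text(token))
```

Converting only at the point of display keeps the structured value available to any handler that wants it.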
`Runnable.__or__`, `Runnable.__ror__`, and their `RunnableSequence` and
`StructuredPrompt` overrides previously erased composition types: the
right-hand operand was typed `Runnable[Any, Other]`, so piping two
runnables together always produced `RunnableSerializable[Input, Any]`.
Type information was lost at every `|`, which is why chains so often
needed a `chain: Runnable = ...` annotation just to recover usable
inference.
This adds `@overload`s so the `Output` of one step flows into the
`Input` of the next and the composed result carries the real `Output`
type through. `Runnable[int, str] | Runnable[str, float]` now infers
`RunnableSerializable[int, float]` instead of `[int, Any]`.
`coerce_to_runnable` gains overloads so a `Mapping` resolves to
`RunnableParallel` while everything else stays a `Runnable`. As a
knock-on effect, dozens of now-unnecessary `: Runnable` annotations were
dropped from the test suite.
Runtime behavior is unchanged — this is a typing-only change.
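
The core idea can be shown with a toy class. This is a sketch of typed operand matching only, not the `langchain-core` overload set: `Step` is a hypothetical stand-in, and a single generic `__or__` is used where the real change adds several `@overload`s.

```python
from __future__ import annotations

from typing import Callable, Generic, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")
Other = TypeVar("Other")


class Step(Generic[In, Out]):
    """Toy stand-in for a runnable; not the langchain-core class."""

    def __init__(self, fn: Callable[[In], Out]) -> None:
        self.fn = fn

    def invoke(self, value: In) -> Out:
        return self.fn(value)

    # The right operand must accept this step's Out; the result carries its Other.
    def __or__(self, other: Step[Out, Other]) -> Step[In, Other]:
        return Step(lambda value: other.invoke(self.invoke(value)))


to_text: Step[int, str] = Step(lambda n: str(n))
to_float: Step[str, float] = Step(lambda s: float(s) / 2)

# A type checker should infer Step[int, float] here, with no annotation needed.
chain = to_text | to_float
```

Typing the right operand as `Step[Out, Other]` rather than `Step[Any, Other]` is what lets the output type propagate.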
## Impact on type-checked code
Most users will simply get better inference. Two changes can require a
small adjustment if you run a type checker (`mypy`, `pyright`):
### Stricter operand matching in `|`
The right-hand side of `|` is now typed `Runnable[Output, Other]` rather
than `Runnable[Any, Other]`, so the right operand's declared **input**
must match the left operand's **output**. This is more accurate, but it
surfaces a common pattern that was previously silent: piping a step that
outputs a plain `dict` into a step whose declared input is a more
specific type (for example a `TypedDict`). It still works at runtime;
the checker now reports an `[operator]` error.
If you hit this, narrow the boundary with a `cast` (or an explicit
annotation):
```python
from typing import Any, cast
from langchain_core.runnables import Runnable
# upstream outputs a dict; downstream declares a narrower input type
chain = cast("Runnable[Any, MyInput]", upstream) | downstream
```
### `list` → `Sequence` on `RunnableEach` / `map()`
`Runnable.map()` and the `invoke` / `ainvoke` methods of `RunnableEach`
now accept `Sequence[Input]` instead of `list[Input]`. Callers are
unaffected — a `list` is a `Sequence`, and tuples or other sequences now
type-check too. The only thing to adjust: if you **subclass**
`RunnableEach` (or `RunnableEachBase`) and override these methods with a
`list[...]` parameter, widen the annotation to `Sequence[...]` so the
override stays compatible with the base signature.
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Simplify test for `create_agent` errors.
* Remove duplicate tests
* Test sync and async with common logic
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
In this order:
* used `@override` when overriding a parent method.
* prefixed param with `_` when the param could be renamed.
* used `*_args, **_kwargs` when it was not possible to rename (e.g. a
protocol)
* used `_ = some_variable` when the variable name is inspected (in
tools)
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Automated refresh of model profile data for all in-monorepo partner
integrations via `langchain-profiles refresh`.
🤖 Generated by the `refresh_model_profiles` workflow.
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
The release workflow's `mark-release` job downloads built wheels to
`<package>/dist/` but told `ncipollo/release-action` to glob `dist/*`.
Because JS actions don't honor `defaults.run.working-directory`, that
pattern resolved against the repo root, matched nothing, and logged
`Artifact pattern :dist/* did not match any files`. The warning is
non-fatal, so tags and releases were still created — just with no assets
attached. Verified across published releases (`langchain-groq`,
`langchain-core`, `langchain-openai`, `langchain-anthropic`): every one
has an empty asset list.
Closes #37997
Forked repositories with Actions enabled currently run the scheduled
model profile refresh without access to the GitHub App secrets used to
open the automated PR. Guarding the job to the `langchain-ai` owner
prevents noisy daily failures on forks while preserving the scheduled
refresh for the main repository.
## Changes
- Added a repository-owner guard to the `refresh-profiles` job so
`refresh_model_profiles` only runs under `langchain-ai`.
- Kept the existing reusable workflow invocation and bot secret wiring
unchanged for the canonical repository.
PR authors get clearer guidance for writing descriptions that reviewers
can understand quickly. The template and contributor guidance now ask
for a short explanation of who benefits, what problem they had, and how
the change solves it instead of a generic summary.
`handle_tool_error` callables can now return structured message content
as any valid sequence, not just a mutable `list`. Valid structured
sequences are normalized to the `ToolMessage` content shape at the tool
output boundary, while invalid content still falls back to
stringification.
## Changes
- Widened `ToolExceptionHandlerOutput` from `list[str | dict[str, Any]]`
to `Sequence[MessageContentBlock]` so handlers returning `list[dict[str,
Any]]` or tuple content blocks type-check cleanly.
- Added `_normalize_message_content` to validate structured message
content and convert valid non-string sequences to the `list` shape
expected by `ToolMessage`.
- Preserved existing stringification behavior for invalid structured
content blocks instead of treating failed normalization as `None`.
- Removed the now-unused `_is_message_content_type` helper; output
formatting validates content directly through
`_normalize_message_content`.
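
A rough sketch of the normalize-or-stringify flow described above. The function names and the validity check (each block is a `str` or `dict`) are assumptions for illustration; the real `_normalize_message_content` may validate blocks more strictly.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import Any


def normalize_content(content: Any) -> str | list[str | dict[str, Any]] | None:
    """Return content in a message-ready shape, or None if it is not valid."""
    if isinstance(content, str):
        return content
    # str is handled above; bytes-like values are sequences but not content blocks.
    if isinstance(content, Sequence) and not isinstance(content, (bytes, bytearray)):
        blocks = list(content)
        if all(isinstance(block, (str, dict)) for block in blocks):
            return blocks
    return None


def format_error_output(content: Any) -> str | list[str | dict[str, Any]]:
    """Normalize valid content; fall back to stringification otherwise."""
    normalized = normalize_content(content)
    return normalized if normalized is not None else str(content)
```

With this shape, a handler can return a tuple of blocks and still end up with the `list` form, while something like `[1, 2]` is stringified instead of being dropped.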
`handle_tool_error` callables can already return structured message
content at runtime, but the public typing only allowed strings. The tool
error handling API now reflects the existing output formatting path,
including clearer docs for how handled errors become
`ToolMessage(status="error")` results.
`SummarizationMiddleware._trigger_conditions` is now explicitly marked
as a temporary compatibility view for private consumers. The regression
test is tied to the package major version so the 2.0 release path fails
loudly until the legacy attr and test are removed.
`SummarizationMiddleware` now uses `_trigger_clauses` as the canonical
internal representation for AND/OR trigger evaluation while keeping
`_trigger_conditions` as a tuple-shaped compatibility view. This keeps
the new dict-style `TriggerClause` behavior intact without breaking
private consumers that still inspect the old tuple-normalized trigger
state.
## Changes
- Added `_trigger_clauses` as the source of truth for summarization
trigger evaluation, profile requirement checks, and compound AND clause
handling.
- Restored `_trigger_conditions` as a legacy compatibility projection
for tuple-expressible triggers, so tuple and single-key dict triggers
remain visible in the previous private shape.
- Avoided misrepresenting compound `TriggerClause` inputs like
`{"tokens": 1000, "messages": 5}` as independent OR-style tuple
conditions.
Closes #34442
[Docs](https://github.com/langchain-ai/docs/pull/4377)
---
Add parity with LangChain.js trigger semantics for Python
`SummarizationMiddleware`. `trigger` can now express AND conditions
within a single dict-style `TriggerClause` while preserving the existing
tuple and list-of-tuples behavior.
A simple user story: a support agent is helping debug an issue over a
long conversation. One tool call may return a large log snippet, briefly
pushing the token count over a limit, but the conversation is still only
a few messages long and the recent context is valuable. Separately, the
user may send many short follow-up messages that increase message count
without using much context.
With `trigger={"tokens": 4000, "messages": 10}`, both thresholds must be
met at the same time: at least 4,000 tokens and at least 10 messages.
This means 5,000 tokens across only 3 messages does not summarize, and
20 short messages totaling only 1,000 tokens does not summarize either.
Summarization waits until the conversation is large enough by both
measures, making it less likely to discard useful recent context too
early.
## Changes
- Add `TriggerClause` support so `trigger={"tokens": 4000, "messages":
10}` only summarizes when all configured thresholds are met
- Export `TriggerClause` from `langchain.agents.middleware` so users can
import and annotate dict-style trigger clauses from the public
middleware entrypoint
- Normalize tuple and mapping trigger inputs through
`_normalize_trigger`, preserving existing `ContextSize` tuple semantics
as single-condition clauses
- Defensively copy mutable trigger list and dict inputs during
initialization so caller-side mutations do not change the middleware's
stored public configuration after construction
- Keep list inputs as OR semantics across clauses, including mixed lists
like `[{"tokens": 4000, "messages": 10}, ("messages", 50)]`
- Update `_should_summarize` to evaluate AND within each clause and OR
across clauses for `tokens`, `messages`, and `fraction`
- Update the docs and API link map so `TriggerClause` resolves in the
Python middleware docs
- Preserve tuple-trigger compatibility while allowing message-based
`keep` configurations to summarize at least one message when a trigger
fires near the cutoff boundary
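
The AND-within-clause, OR-across-clauses evaluation can be sketched like this. It is an illustration under stated assumptions, not the middleware code: `normalize_trigger` and `should_summarize` are standalone hypothetical functions, only `tokens` and `messages` are handled (`fraction` needs model context-size information and is omitted), and thresholds are compared with `>=` to match "at least".

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Union

Clause = Mapping[str, float]
Trigger = Union[tuple[str, float], Clause, list[Union[tuple[str, float], Clause]]]


def normalize_trigger(trigger: Trigger) -> list[dict[str, float]]:
    """Turn tuples, dicts, or a list of either into a list of dict clauses."""
    items = trigger if isinstance(trigger, list) else [trigger]
    clauses: list[dict[str, float]] = []
    for item in items:
        if isinstance(item, tuple):
            key, threshold = item
            clauses.append({key: threshold})  # a tuple is a single-condition clause
        else:
            clauses.append(dict(item))  # copy so caller-side mutation has no effect
    return clauses


def should_summarize(trigger: Trigger, tokens: int, messages: int) -> bool:
    """True if any clause has every one of its thresholds met."""
    observed = {"tokens": tokens, "messages": messages}
    return any(
        all(observed[key] >= threshold for key, threshold in clause.items())
        for clause in normalize_trigger(trigger)
    )
```

With `{"tokens": 4000, "messages": 10}`, 5,000 tokens over 3 messages does not fire, while the mixed list `[{"tokens": 4000, "messages": 10}, ("messages", 50)]` fires as soon as either clause is fully satisfied.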
AI assistance was used to help draft and refine this contribution.
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Clarifies how `get_buffer_string` treats multimodal message content
across output formats. The docs now make the default prefix format's
text-only behavior explicit and point users to XML when they need
structured multimodal block representations.
This behavior may change in future iterations.
## Summary
Follow-up to #37911 (released in `langchain-perplexity` 1.3.2). That PR
fixed the outbound `ToolMessage` / `AIMessage.tool_calls` serialization;
this one implements **`ChatPerplexity.bind_tools`**, which flips
`has_tool_calling` to `True` and lights up the full `langchain-tests`
standard tool-calling suite — the suite that would have caught #37911 in
the first place.
Verified live against the Perplexity Agent API (`openai/gpt-5.5`,
`use_responses_api=True`): a client-side function-tool round-trip
(invoke + stream) works end-to-end.
## Core change (the `bind_tools` work + the Responses-API follow-up)
- **`bind_tools`** mirrors `langchain-openai`: converts tools via
`convert_to_openai_tool`, normalizes `tool_choice`, and passes
Perplexity built-in tools (`web_search`, etc.) through unchanged.
- **`_to_responses_payload`** now translates tool turns into the
Responses (Agent) API's typed input items: `AIMessage.tool_calls` →
`function_call`, `ToolMessage` → `function_call_output`, and flattens
function tool specs. (The Responses API has no `tool` role, so this
translation is required for round-trips.)
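
As a rough picture of that translation, the sketch below works on plain dicts rather than LangChain message objects. The helper names are hypothetical, and the item field names (`call_id`, `name`, `arguments`, `output`) follow the OpenAI Responses API convention, which is assumed here to match what the Perplexity Agent API expects.

```python
from __future__ import annotations

import json
from typing import Any


def to_input_items(message: dict[str, Any]) -> list[dict[str, Any]]:
    """Translate one plain-dict message into Responses-style input items."""
    role = message["role"]
    if role == "tool":
        # There is no `tool` role, so a tool result becomes a typed output item.
        return [
            {
                "type": "function_call_output",
                "call_id": message["tool_call_id"],
                "output": str(message["content"]),
            }
        ]
    items: list[dict[str, Any]] = []
    if message.get("content"):
        items.append({"role": role, "content": message["content"]})
    for call in message.get("tool_calls", []):
        items.append(
            {
                "type": "function_call",
                "call_id": call["id"],
                "name": call["name"],
                "arguments": json.dumps(call["args"]),
            }
        )
    return items


def flatten_tool_spec(tool: dict[str, Any]) -> dict[str, Any]:
    """Lift a nested `function` spec to the top level; pass other tools through."""
    if tool.get("type") == "function" and "function" in tool:
        return {"type": "function", **tool["function"]}
    return tool
```

Built-in tools such as `{"type": "web_search"}` have no nested `function` key, so `flatten_tool_spec` returns them unchanged.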
## Changes required to make standard-suite tests pass on the Responses route
- Streaming: `_convert_responses_stream_event_to_chunk` emits a
`tool_call_chunk` on `response.output_item.done` function calls —
required by `test_tool_calling` (which streams and asserts tool calls).
- `_content_to_text` reduces list-shaped assistant content to text in
the tool-call branch — required by `test_agent_loop` and
`test_tool_message_histories_list_content`.
- `response_metadata["model_name"]` on the Responses route, mirroring
Chat Completions — required by `test_usage_metadata` /
`test_usage_metadata_streaming` (used by `langchain_core` usage
callbacks).
## Tests
- `sonar` standard class marked `has_tool_calling=False` (the family
returns 400 "Tool calling is not supported for this model").
- New `TestPerplexityResponsesStandard` runs the full suite on
`openai/gpt-5.5` + `use_responses_api` with `has_tool_choice=False`:
**35 passed, 13 skipped, 2 xfailed**.
- The 2 xfails (`test_unicode_tool_call_integration`,
`test_structured_few_shot_examples`) hard-code `tool_choice="any"`. The
Responses (Agent) API does not support `tool_choice` (verified: every
form returns HTTP 200 without forcing a call), which `ChatPerplexity`
surfaces as `ValueError` — **existing behavior, unchanged here.**
Softening that to a warning can be a separate change.
`make format lint` clean; unit + standard tests green.
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Release validation now covers the release paths that were intended but
not actually exercised. Manual core and `langchain_v1` releases use
short dropdown inputs, so the dependent-package test gate needs to match
those values in addition to full `libs/...` paths.
`dict()` is a problematic method name as it clashes with the builtin
`dict` used as a type annotation.
This PR replaces it with an `asdict` method (inspired by dataclasses).
It also fixes a few places where `dict` must be replaced by
`builtins.dict` until the `dict()` method is removed.
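
A small sketch of the transitional pattern, using a toy class rather than any `langchain-core` type. `Settings` and `merged` are hypothetical; the point is only that, while a `dict()` method still exists on the class, annotations in the class body spell out `builtins.dict`.

```python
from __future__ import annotations

import builtins
from typing import Any


class Settings:
    """Toy class for illustration; not a langchain-core type."""

    def __init__(self, name: str, temperature: float) -> None:
        self.name = name
        self.temperature = temperature

    def asdict(self) -> builtins.dict[str, Any]:
        """Preferred spelling, named after dataclasses.asdict."""
        return {"name": self.name, "temperature": self.temperature}

    def dict(self) -> builtins.dict[str, Any]:
        """Legacy spelling; within this class body the name `dict` is this method."""
        return self.asdict()

    # `builtins.dict` keeps the annotation unambiguous while `dict()` exists.
    def merged(self, extra: builtins.dict[str, Any]) -> builtins.dict[str, Any]:
        return {**self.asdict(), **extra}
```

Once the legacy method is removed, the annotations can go back to plain `dict[...]`.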
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
The custom VCR serializer pipes the cassette dict through
`yaml.safe_dump`, which raises on stream objects — so any request with
an `io.BytesIO` body (multipart/file-upload endpoints) couldn't be
recorded. A new `_coerce_bytesio` helper walks the cassette and replaces
each `BytesIO` with its raw bytes before dumping.
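
A helper of this kind could look roughly like the following. This is a sketch, not the actual `_coerce_bytesio`: the function name is de-underscored to mark it as illustrative, and it assumes the cassette is built from plain dicts and lists.

```python
from __future__ import annotations

import io
from typing import Any


def coerce_bytesio(value: Any) -> Any:
    """Return a copy of `value` with every BytesIO replaced by its bytes."""
    if isinstance(value, io.BytesIO):
        # getvalue() returns the full buffer regardless of the read position.
        return value.getvalue()
    if isinstance(value, dict):
        return {key: coerce_bytesio(item) for key, item in value.items()}
    if isinstance(value, list):
        return [coerce_bytesio(item) for item in value]
    return value
```

Building new containers instead of mutating in place leaves the original request objects untouched for the rest of the test run.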
Fixes #37912
`ChatPerplexity._convert_message_to_dict` raises `TypeError` on
`ToolMessage` and drops `AIMessage.tool_calls`, which breaks
tool-message round-trips through `ChatPerplexity` — a client-side
tool-calling loop, or a shared message history across providers via
`RunnableWithFallbacks`.
Repro:
```python
from langchain_perplexity import ChatPerplexity
from langchain_core.messages import ToolMessage
ChatPerplexity(model="sonar")._convert_message_to_dict(
    ToolMessage(content="result", tool_call_id="call_1")
)
# TypeError: Got unknown type content='result' tool_call_id='call_1'
```
An `AIMessage` carrying `tool_calls` also serializes to `{"role":
"assistant", "content": ...}` with the `tool_calls` silently dropped.
This brings the converter to parity with `langchain-openai`: serialize
`tool_calls` / `invalid_tool_calls`, send `content` as `null` when
tool_calls are present, and add a `tool`-role branch for `ToolMessage`.
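
For reference, the target wire shapes look roughly like this. The sketch uses plain dicts and hypothetical helper names instead of the real converter, follows the OpenAI Chat Completions tool-call layout, and omits `invalid_tool_calls` for brevity.

```python
from __future__ import annotations

import json
from typing import Any


def assistant_to_dict(content: str, tool_calls: list[dict[str, Any]]) -> dict[str, Any]:
    """Serialize an assistant turn, keeping any tool calls it carries."""
    message: dict[str, Any] = {"role": "assistant", "content": content}
    if tool_calls:
        message["tool_calls"] = [
            {
                "type": "function",
                "id": call["id"],
                "function": {
                    "name": call["name"],
                    "arguments": json.dumps(call["args"]),
                },
            }
            for call in tool_calls
        ]
        # Content is sent as null (None) when tool calls are present.
        message["content"] = None
    return message


def tool_result_to_dict(content: str, tool_call_id: str) -> dict[str, Any]:
    """Serialize a tool result as a `tool`-role message."""
    return {"role": "tool", "content": content, "tool_call_id": tool_call_id}
```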
How I verified: added unit tests for the `ToolMessage` and
`AIMessage.tool_calls` / `invalid_tool_calls` cases; the perplexity
package unit tests, lint, and format all pass.
Scope: translating these to the Responses (Agent) API's `function_call`
/ `function_call_output` input items is a separate follow-up; this PR is
the Chat Completions serialization parity fix.
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
The `🚀 Publish to PyPI` job no longer starts until `🔄 Test prior
partners against new core` finishes. Previously that dependency was
commented out, so a core release could publish to PyPI in parallel
with—or before—the integration tests that install the new unreleased
core against already-published partner packages, defeating their purpose
as a pre-publish gate.
## Changes
- Add `test-prior-published-packages-against-new-core` to the `publish`
job's `needs`, so publishing blocks on those partner integration tests
completing.
- The existing `if: ${{ !cancelled() && !failure() }}` guard is
unchanged: publish proceeds only if the gate **succeeded or was
skipped**, and fails closed if the partner tests fail. For non-core
releases the partner-test job short-circuits with `exit 0`, so this adds
no friction outside `libs/core` releases.