Release validation now covers more of the compatibility surface before
packages ship. The release dependency check also handles coordinated
monorepo version bumps explicitly, so release PRs can verify
published-package installability without failing on sibling package
versions that will be published by the same PR.
## Changes
- Run partner package unit tests, not just integration tests, when
validating previously published partner packages against the newly built
`langchain-core` wheel.
- Treat dependencies satisfied by package versions introduced in the
same release PR as expected unpublished siblings, stripping only those
pins before resolving runtime dependencies against PyPI.
- Compare changed manifests against the PR base to detect same-PR
package version bumps using static `[project]` metadata and
canonicalized package names.
- Preserve resolver-affecting `[tool.uv]` settings such as `prerelease`,
`constraint-dependencies`, and `override-dependencies` in the filtered
manifest while still dropping workspace sources.
- Add a maintainer bypass label, `release-deps: acknowledged`, for
reviewed coordinated releases that intentionally fall outside the
detected same-PR bump path.
- Surface captured resolver output and distinguish likely transient
PyPI/index failures from unsatisfiable dependency pins.
Chroma, Exa, Nomic, and Qdrant now expose package `__version__` values
through package-local `_version.py` modules, matching the version-file
pattern used by the other partner packages. Each package also gets a
`check_version` target so release version drift between `pyproject.toml`
and runtime exports is caught consistently.
Following on the heels of #35293
TODO:
- Packages outside of this repo (e.g. LiteLLM, Nvidia, Google, AWS)
---
## Summary
Surface partner package versions in `metadata.versions` on LangSmith
traces. Mirrors the JS SDK's `_addVersion()` pattern
([langchainjs#10106](https://github.com/langchain-ai/langchainjs/pull/10106)).
Each model constructor records its package version via `_add_version()`
on `BaseLanguageModel`. The version dict accumulates through the class
hierarchy — `langchain-core` is added in
`BaseLanguageModel.model_post_init`, `langchain-openai` in
`BaseChatOpenAI._set_openai_chat_version`, and each leaf partner in its
uniquely-named `model_validator`. Traces end up with:
```json
{
"metadata": {
"versions": {
"langchain-core": "1.4.5",
"langchain-openai": "1.3.0",
"langchain-xai": "1.2.2"
}
}
}
```
### Changes
- `BaseLanguageModel._add_version(pkg, version)` — appends to
`self.metadata["versions"]`; accepts any `Mapping` type; emits a warning
if a non-mapping value is found and replaced
- `BaseLanguageModel.model_post_init` — adds `langchain-core` version;
calls `super()` for MRO safety
- `_merge_metadata_dicts` — one-level-deep (non-recursive) merge for
nested dict metadata keys
- `CallbackManager.add_metadata` — uses `_merge_metadata_dicts` instead
of flat `dict.update()` so nested metadata dicts (like `versions`)
coexist rather than clobber
- `merge_configs` — uses `_merge_metadata_dicts` for config merging
**Partners:**
- Each now calls `self._add_version("langchain-<pkg>", __version__)`
### Design decisions
- **Constructor-based, not `_get_ls_params`-based** — versions flow
through `self.metadata` (local metadata on traces), not through
`LangSmithParams`. This matches JS and makes child-class version
inheritance automatic (no merge/clobber issues).
- **`versions` is local (non-inheritable) metadata** — `self.metadata`
is passed to `CallbackManager.configure` as `local_metadata`
(`add_metadata(..., inherit=False)`), so `versions` is attached **once
per chat-model run** and is **not** propagated to child runs or
duplicated onto every streaming chunk. This is intentionally the
opposite of the inheritable-per-chunk metadata that #36588 was reducing
for performance — `versions` does not regress that path.
- **`add_metadata` deep-merge is a correctness fix, not just for
versions** — previously `add_metadata`/`merge_configs` did a flat
top-level `dict.update`/spread, so any nested metadata dict baked into a
config (e.g. via `.with_config({"metadata": {...}})`) would be wholly
replaced when a caller also passed `metadata`. `_merge_metadata_dicts`
merges one level deep so user-provided `config.metadata.versions` and
model-set `versions` coexist instead of clobbering. The merge runs once
per `configure` (not per chunk), so it is off the streaming hot path.
- **One level deep only** — `_merge_metadata_dicts` is deliberately
*not* a recursive deep merge; values nested more than one level are
last-writer-wins. This covers the `versions` case without the
ambiguity/cost of arbitrary-depth merging.
- **Warn on non-dict `metadata["versions"]`** — if a user sets
`metadata={"versions": "some-string"}`, `_add_version` emits a warning
and replaces the value with the version dict rather than silently
discarding user data or crashing. This is a soft breaking change for
anyone who previously stored non-dict values at this key.
### Follow-ups (tracked separately, out of scope here)
- JS `mergeConfigs` still flat-spreads nested metadata, so
`metadata.versions` can still clobber on the JS side until an equivalent
deep-merge lands.
---
Made by [Open SWE](https://openswe.vercel.app)
---------
Co-authored-by: open-swe[bot] <open-swe@users.noreply.github.com>
OpenAI Chat Completions streaming has a v1 normalization gap when tool
calls are streamed.
When users opt into `output_version="v1"`, `.content_blocks` is expected
to be the normalized cross-provider view of the message. For OpenAI Chat
Completions streams, though, chunks still carry raw string `content`
plus side-channel `tool_call_chunks` / `tool_calls`.
Practically, an OpenAI stream chunk can look like this internally:
```python
AIMessageChunk(
content="",
tool_call_chunks=[
{
"name": "get_weather",
"args": '{"location": "SF"}',
"id": "call_123",
"index": 0,
"type": "tool_call_chunk",
}
],
response_metadata={"model_provider": "openai", "output_version": "v1"},
)
```
That is not already-normalized v1 content like this:
```python
AIMessageChunk(
content=[
{
"type": "tool_call_chunk",
"name": "get_weather",
"args": '{"location": "SF"}',
"id": "call_123",
"index": 0,
}
],
)
```
Because `.content_blocks` currently short-circuits solely on
`output_version="v1"`, it can return the raw string/empty list directly
instead of running the OpenAI translator that incorporates
`tool_call_chunks` / `tool_calls` into normalized v1 blocks.
In practice, a streamed OpenAI tool call can be parsed successfully into
`tool_calls`, but still be missing from the final aggregated
`.content_blocks`. Downstream code that consumes the v1 block interface
then sees no `tool_call` block and must know to inspect OpenAI-specific
chunk fields instead.
User story:
> As a LangChain user streaming OpenAI Chat Completions with bound tools
and `output_version="v1"`, I need the final aggregated message's
`.content_blocks` to include normalized `tool_call` blocks, so that code
written against the v1 content-block interface handles streamed tool
calls consistently across providers.
Expected final aggregated view:
```python
message.content_blocks == [
{
"type": "tool_call",
"name": "get_weather",
"args": {"location": "SF"},
"id": "call_123",
}
]
```
Root causes:
1. The usage-only Chat Completions chunk uses `content=[]` in v1 mode
while normal streaming chunks use `content=""`, creating inconsistent
content types during chunk aggregation.
2. `AIMessage.content_blocks` and `AIMessageChunk.content_blocks` treat
any `output_version="v1"` message as already-normalized, even when
`content` is still raw string content from Chat Completions.
3. Content-bearing OpenAI stream chunks do not carry
`output_version="v1"`, so the final merged chunk may not reliably take
the v1 normalization path.
Changes:
- Keep usage-only Chat Completions chunks as `content=""` instead of
overriding to `[]`, so streaming chunks merge consistently.
- Propagate `output_version="v1"` to content-bearing chunks.
- Only short-circuit v1 `.content_blocks` when `content` is already a
list of blocks; otherwise fall through to the provider translator.
- Add regression tests covering string-content v1 fallback, usage-only
chunk content consistency, and streamed tool calls appearing as
normalized final v1 blocks.
Originally a narrow bump of mypy to `1.20` in four packages. Expanded to
get the whole monorepo onto a single, current mypy and a consistent
type-check configuration, so contributors no longer hit different mypy
versions and divergent behavior depending on which package they touch.
### What changed
- **Unified the mypy pin to `>=2.1.0,<2.2.0`** in every mypy-using
package (6 libs + 14 partners), replacing the previously scattered pins
(`1.10`/`1.17`/`1.18`/`1.19`/`1.20`, with assorted upper bounds).
- **Unified the `[tool.mypy]` base per tier:**
- libs: `plugins = ["pydantic.mypy"]`, `strict = true`,
`enable_error_code = "deprecated"`, `warn_unreachable = true`
- partners: `disallow_untyped_defs = true`
- Normalized style (`disallow_untyped_defs = "True"` string → bool,
quote/key consistency).
- **Fixed the 20 real errors** mypy 2.1 surfaces: `redundant-cast` from
improved narrowing (`core`, `langchain-classic`), a `var-annotated` for
`_LOGGED`, a return-type widening in `langchain-groq`'s
`_convert_from_v1_to_groq` (it can legitimately return a bare `str`),
and stale `type-arg`/`unused-ignore` in `langchain-model-profiles`
tests.
### Deliberate non-uniformity (documented inline in the relevant
`pyproject.toml`s)
Going fully byte-identical would surface ~196 additional errors that are
*not* real bugs, so two settings are kept package-appropriate:
- **`warn_unreachable`** is enabled on every strict lib **except
`core`**, where it false-flags intentional defensive code — including
the SSRF / IP-policy guards in `_security/` — as unreachable.
- **`pydantic.mypy` plugin** is used only on `anthropic` and
`perplexity` (their code is authored against it and reports ~99/~132
errors without it). It is *not* added to the other partners, where it
only flags the public alias constructor API (e.g. `ChatGroq(model=...)`)
in tests rather than finding bugs.
- **`ollama`** is left on its `ty` type checker; it does not use mypy.
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Release jobs can now be given an expected package version and will stop
early if that value does not match the package metadata. The workflow
also checks PyPI for an existing normalized release so duplicate version
attempts fail before build and publish steps continue.
As a LangChain user streaming a tool-calling model, I expect each
streamed chunk to expose structured `tool_call_chunk` content blocks so
I can render or process tool calls live, instead of waiting for the
final aggregated message.
This adds `tool_call_streaming` to `ModelProfile` and uses it in the
standard chat-model tool-calling tests. When a model profile opts in,
`test_tool_calling` and `test_tool_calling_async` now validate that at
least one streamed chunk includes a `tool_call_chunk` block via
`content_blocks`, while preserving the existing final-message
validation.
This keeps the contract profile-gated so providers can opt in once their
streaming chunk shape is verified. This PR opts in the providers
verified by smoke testing with straightforward profile coverage: OpenAI,
Anthropic, Fireworks, HuggingFace, OpenRouter, DeepSeek, and xAI. The
generated profile artifacts are refreshed so runtime profiles expose the
new capability flag.
Perplexity Responses also passed the smoke test, but its current profile
data is for the `sonar` family while the Responses smoke path used a
routed model string. That profile strategy is left as follow-up.
MistralAI currently streams `.tool_call_chunks`, but its content-block
translator exposes a complete `tool_call` block instead of
`tool_call_chunk`, so it also stays out of this flag until that
integration is fixed.
Fixes#30870
When an `AsyncBaseTracer` with `_schema_format="original"` (the default)
is used with sync `llm.invoke()`, the `on_chat_model_start` to
`on_llm_start` fallback doesn't fire. The async handler returns a
coroutine instead of raising `NotImplementedError` synchronously, so it
bypasses the existing fallback logic and lands in `_run_coros`, which
only logs the error generically.
This fallback already works for sync handlers in sync context and async
handlers in async context. This PR closes the gap for async handlers in
sync context.
Fixes#35244
Users can write async agent middleware with `@wrap_model_call`, and
LangChain already supports that behavior at runtime by detecting
coroutine functions and wiring them to `awrap_model_call`.
However, the decorator's public typing currently describes only the sync
callable shape. As a result, valid async middleware is rejected by type
checkers such as mypy and ty, even though the same code runs correctly.
This updates the middleware decorator types so async `wrap_model_call`
and `wrap_tool_call` functions type-check consistently with their
runtime behavior. It also simplifies related callable aliases and uses
casts where `iscoroutinefunction` narrows the callable at runtime but
static type checkers cannot follow that narrowing.
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Closes#37736
---
`EnsembleRetriever` normalizes retriever outputs to `Document` objects
in both `rank_fusion` (sync) and `arank_fusion` (async), but the two
methods used different conditions:
- `rank_fusion` wraps only bare strings: `isinstance(doc, str)`
- `arank_fusion` wrapped anything that isn't a `Document`: `not
isinstance(doc, Document)`
If a retriever returns a non-string, non-`Document` value through the
async path, `arank_fusion` would try to construct
`Document(page_content=<non-string>)` and Pydantic raises a
`ValidationError`. The sync path handles the same input without crashing
— the behavior is inconsistent.
The fix is a one-line change in `arank_fusion` to use `isinstance(doc,
str)`, matching the sync path exactly.
Three tests were added to `test_ensemble.py`:
- `test_rank_fusion_bare_strings` — sync path wraps bare strings into
Documents
- `test_arank_fusion_bare_strings` — async path wraps bare strings into
Documents
- `test_arank_fusion_matches_rank_fusion` — sync and async return
identical results for normal Document input
---
This continues the work from #37737 by @AliMuhammadAslam (credited as
co-author), rebased onto `master` with the type-check lint failure
resolved. Supersedes that PR.
Co-authored-by: AliMuhammadAslam <aaalimohdaslam@gmail.com>
Provider-native structured output fallback detection now uses bounded
model-name patterns instead of broad substring checks, reducing false
positives for unrelated model IDs. The model examples and test fixtures
across OpenAI/OpenRouter-facing code were refreshed around current
OpenAI model families while preserving shipped defaults.
## Changes
- Tightened `FALLBACK_MODELS_WITH_STRUCTURED_OUTPUT` from loose string
fragments to regex patterns, with `_supports_provider_strategy` matching
full model-name segments instead of arbitrary substrings.
- Expanded structured-output fallback coverage for newer OpenAI,
Anthropic, and xAI/Grok model families, including `gpt-5.x`, newer
Claude 4/5-style names, and `grok-build`.
- Reused `_attempt_infer_model_provider` in provider tool search routing
so `_provider_from_model_name` follows the same provider inference
behavior as `init_chat_model`.
- Suppressed irrelevant provider-inference deprecation warnings during
provider tool search registry lookup.
- Refreshed OpenAI, Azure OpenAI, OpenRouter, core metadata, and example
model references from older fixtures like `gpt-4`, `gpt-4o`, `o1`, and
`o4-mini` to current test/profile models such as `gpt-5.5`,
`gpt-5-nano`, and `gpt-4.1-mini`.
- Removed outdated OpenAI test assumptions around legacy `o1` behavior
and narrowed legacy structured-output checks to explicitly legacy model
names.
Release PRs now get an earlier dependency-resolution check before merge,
catching unpublished intra-monorepo pins before they can produce broken
wheel metadata. The minimum-version helper also fails with a clear error
when no published PyPI version satisfies a declared constraint, instead
of emitting an invalid `pkg==None` requirement.
`ChatMistralAI` now supports `stop` sequences.
Previously, a `stop` value passed to the model was silently discarded:
the code carried a stale "not yet supported" note, dropped the parameter
before the request, and logged a warning. Mistral's chat completions API
does accept `stop` (a string or list of strings, up to 4 sequences), so
anyone setting `stop` and expecting generation to halt was getting no
effect.
Now `stop` is a first-class parameter. It can be set on the constructor
(`ChatMistralAI(stop=[...])`) or per call (`model.invoke(prompt,
stop=[...])`) and is forwarded to the API. A per-call value overrides
the instance default, and an empty list is treated as "no stop
sequences" — omitted from the request rather than sent as an empty array
(which the API rejects).
Verified against the live Mistral API: with `stop=["5"]`, "Count from 1
to 10" returns `1 2 3 4 ` instead of the full sequence. The 422
`extra_forbidden` response the API returns for genuinely unknown fields
confirms `stop` is a real schema field, not silently ignored.
This PR also folds in some test hygiene: the base-URL env test uses
`monkeypatch.setenv` so `MISTRAL_BASE_URL=boo` no longer leaks into
later serialization tests, and `test_extra_kwargs` asserts the
intentional unknown-kwarg warning with `pytest.warns`.
## Review notes
- Behavior change worth a careful look: `stop` now reaches the API
instead of being dropped. This changes request payloads for anyone
previously passing `stop`. It is the intended fix, but flagging it
explicitly.
- Coverage: `test_stop_sequence` (integration) exercises the end-to-end
behavior; unit tests cover parameter wiring, per-call-vs-instance
precedence, and the empty-list case.
Partner unit tests now reflect the warning behavior emitted by updated
`langchain-core` serialization and model initialization paths.
Warning-strict runs can stay focused on the behavior under test rather
than expected framework warnings.
Warning-producing test paths now either exercise the intended Anthropic
model branch or explicitly assert expected warnings. That keeps `make
test` output clean while preserving coverage for backwards-compatible
parameters, deprecated `AnthropicLLM`, and standard structured-output
behavior.
Anthropic unit tests now pin the expected API base URL where
serialization and initialization assertions depend on it. That keeps
local gateway settings like `ANTHROPIC_BASE_URL` from changing snapshot
output or default URL assertions during development.
Supersedes #34727Closes#30703
Related:
* langchain-ai/langchain-google#1460
* langchain-ai/langchain-google#1501
Fixing this at the `langchain-core` callback layer instead of
normalizing inside individual provider integrations, so structured
streaming content is preserved consistently.
---
Models are increasingly streaming structured content blocks instead of
plain text tokens. For example, Gemini 3 can stream text as
content-block lists, and Anthropic/tool-use flows can also produce
non-text message content. Today those values already reach
`on_llm_new_token`, but the callback API still advertises `token: str`,
which makes custom callbacks, tracers, and streaming helpers assume
every streamed value is text.
User story: as a LangChain user building a streaming callback for chat
models with tool calls, reasoning/thinking blocks, or provider-specific
structured content, I need `on_llm_new_token` to accept the same content
shape that chat model chunks can actually emit, so my callback can
observe the stream without providers flattening or dropping non-text
data.
Fixing this in `langchain-core` makes the existing runtime behavior
explicit at the shared callback boundary. Normalizing content blocks
inside each provider would duplicate logic, produce inconsistent
behavior across integrations, and in some cases lose required provider
metadata such as Gemini thought signatures.
## Changes
- Update the callback contract so streamed tokens can be either plain
text or structured content blocks
- Carry structured streamed content through tracing and event/log
streaming paths without forcing provider data into text too early
- Keep built-in text-oriented streaming callbacks working by converting
structured tokens only at the display/queue boundary
- Drop the now-incorrect `cast("str", ...)` on streamed content in
`BaseChatModel` so the producer side matches the widened callback
signature instead of asserting a string it doesn't always have (no
runtime change — `cast` is erased)
- Align Anthropic and Mistral content typing with the structured content
shapes already used by chat model messages
- Update callback tests to reflect that not every streamed value is text
## Compatibility
No runtime behavior change: no producer emits anything it wasn't already
emitting, and widening a parameter type is safe for existing callers and
handlers that pass or receive `str`. The one caveat is downstream code
that subclasses a callback handler or tracer and overrides
`on_llm_new_token` with a `token: str` annotation — under strict type
checking that override is now narrower than the base and will be flagged
as incompatible with the supertype. Such code still runs unchanged; the
fix is to widen the annotation to match.
`Runnable.__or__`, `Runnable.__ror__`, and their `RunnableSequence` and
`StructuredPrompt` overrides previously erased composition types: the
right-hand operand was typed `Runnable[Any, Other]`, so piping two
runnables together always produced `RunnableSerializable[Input, Any]`.
Type information was lost at every `|`, which is why chains so often
needed a `chain: Runnable = ...` annotation just to recover usable
inference.
This adds `@overload`s so the `Output` of one step flows into the
`Input` of the next and the composed result carries the real `Output`
type through. `Runnable[int, str] | Runnable[str, float]` now infers
`RunnableSerializable[int, float]` instead of `[int, Any]`.
`coerce_to_runnable` gains overloads so a `Mapping` resolves to
`RunnableParallel` while everything else stays a `Runnable`. As a
knock-on effect, dozens of now-unnecessary `: Runnable` annotations were
dropped from the test suite.
Runtime behavior is unchanged — this is a typing-only change.
## Impact on type-checked code
Most users will simply get better inference. Two changes can require a
small adjustment if you run a type checker (`mypy`, `pyright`):
### Stricter operand matching in `|`
The right-hand side of `|` is now typed `Runnable[Output, Other]` rather
than `Runnable[Any, Other]`, so the right operand's declared **input**
must match the left operand's **output**. This is more accurate, but it
surfaces a common pattern that was previously silent: piping a step that
outputs a plain `dict` into a step whose declared input is a more
specific type (for example a `TypedDict`). It still works at runtime;
the checker now reports an `[operator]` error.
If you hit this, narrow the boundary with a `cast` (or an explicit
annotation):
```python
from typing import Any, cast
from langchain_core.runnables import Runnable
# upstream outputs a dict; downstream declares a narrower input type
chain = cast("Runnable[Any, MyInput]", upstream) | downstream
```
### `list` → `Sequence` on `RunnableEach` / `map()`
`Runnable.map()` and the `invoke` / `ainvoke` methods of `RunnableEach`
now accept `Sequence[Input]` instead of `list[Input]`. Callers are
unaffected — a `list` is a `Sequence`, and tuples or other sequences now
type-check too. The only thing to adjust: if you **subclass**
`RunnableEach` (or `RunnableEachBase`) and override these methods with a
`list[...]` parameter, widen the annotation to `Sequence[...]` so the
override stays compatible with the base signature.
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Simplify test for `create_agent` errors.
* Remove duplicate tests
* Test sync and async with common logic
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
In this order:
* used `@override` when overriding a parent method.
* prefixed param with `_` when the param could be renamed.
* used `*_args, **_kwargs` when it was not possible to rename (eg:
protocol)
* used `_ = some_variable` when the variable name is inspected (in
tools)
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Automated refresh of model profile data for all in-monorepo partner
integrations via `langchain-profiles refresh`.
🤖 Generated by the `refresh_model_profiles` workflow.
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
The release workflow's `mark-release` job downloads built wheels to
`<package>/dist/` but told `ncipollo/release-action` to glob `dist/*`.
Because JS actions don't honor `defaults.run.working-directory`, that
pattern resolved against the repo root, matched nothing, and logged
`Artifact pattern :dist/* did not match any files`. The warning is
non-fatal, so tags and releases were still created — just with no assets
attached. Verified across published releases (`langchain-groq`,
`langchain-core`, `langchain-openai`, `langchain-anthropic`): every one
has an empty asset list.
Closes#37997
Forked repositories with Actions enabled currently run the scheduled
model profile refresh without access to the GitHub App secrets used to
open the automated PR. Guarding the job to the `langchain-ai` owner
prevents noisy daily failures on forks while preserving the scheduled
refresh for the main repository.
## Changes
- Added a repository-owner guard to the `refresh-profiles` job so
`refresh_model_profiles` only runs under `langchain-ai`.
- Kept the existing reusable workflow invocation and bot secret wiring
unchanged for the canonical repository.
PR authors get clearer guidance for writing descriptions that reviewers
can understand quickly. The template and contributor guidance now ask
for a short explanation of who benefits, what problem they had, and how
the change solves it instead of a generic summary.
`handle_tool_error` callables can now return structured message content
as any valid sequence, not just a mutable `list`. Valid structured
sequences are normalized to the `ToolMessage` content shape at the tool
output boundary, while invalid content still falls back to
stringification.
## Changes
- Widened `ToolExceptionHandlerOutput` from `list[str | dict[str, Any]]`
to `Sequence[MessageContentBlock]` so handlers returning `list[dict[str,
Any]]` or tuple content blocks type-check cleanly.
- Added `_normalize_message_content` to validate structured message
content and convert valid non-string sequences to the `list` shape
expected by `ToolMessage`.
- Preserved existing stringification behavior for invalid structured
content blocks instead of treating failed normalization as `None`.
- Removed the now-unused `_is_message_content_type` helper; output
formatting validates content directly through
`_normalize_message_content`.
`handle_tool_error` callables can already return structured message
content at runtime, but the public typing only allowed strings. The tool
error handling API now reflects the existing output formatting path,
including clearer docs for how handled errors become
`ToolMessage(status="error")` results.
`SummarizationMiddleware._trigger_conditions` is now explicitly marked
as a temporary compatibility view for private consumers. The regression
test is tied to the package major version so the 2.0 release path fails
loudly until the legacy attr and test are removed.
`SummarizationMiddleware` now uses `_trigger_clauses` as the canonical
internal representation for AND/OR trigger evaluation while keeping
`_trigger_conditions` as a tuple-shaped compatibility view. This keeps
the new dict-style `TriggerClause` behavior intact without breaking
private consumers that still inspect the old tuple-normalized trigger
state.
## Changes
- Added `_trigger_clauses` as the source of truth for summarization
trigger evaluation, profile requirement checks, and compound AND clause
handling.
- Restored `_trigger_conditions` as a legacy compatibility projection
for tuple-expressible triggers, so tuple and single-key dict triggers
remain visible in the previous private shape.
- Avoided misrepresenting compound `TriggerClause` inputs like
`{"tokens": 1000, "messages": 5}` as independent OR-style tuple
conditions.
Closes#34442
[Docs](https://github.com/langchain-ai/docs/pull/4377)
---
Add parity with LangChain.js trigger semantics for Python
`SummarizationMiddleware`. `trigger` can now express AND conditions
within a single dict-style `TriggerClause` while preserving the existing
tuple and list-of-tuples behavior.
A simple user story: a support agent is helping debug an issue over a
long conversation. One tool call may return a large log snippet, briefly
pushing the token count over a limit, but the conversation is still only
a few messages long and the recent context is valuable. Separately, the
user may send many short follow-up messages that increase message count
without using much context.
With `trigger={"tokens": 4000, "messages": 10}`, both thresholds must be
met at the same time: at least 4,000 tokens and at least 10 messages.
This means 5,000 tokens across only 3 messages does not summarize, and
20 short messages totaling only 1,000 tokens does not summarize either.
Summarization waits until the conversation is large enough by both
measures, making it less likely to discard useful recent context too
early.
## Changes
- Add `TriggerClause` support so `trigger={"tokens": 4000, "messages":
10}` only summarizes when all configured thresholds are met
- Export `TriggerClause` from `langchain.agents.middleware` so users can
import and annotate dict-style trigger clauses from the public
middleware entrypoint
- Normalize tuple and mapping trigger inputs through
`_normalize_trigger`, preserving existing `ContextSize` tuple semantics
as single-condition clauses
- Defensively copy mutable trigger list and dict inputs during
initialization so caller-side mutations do not change the middleware's
stored public configuration after construction
- Keep list inputs as OR semantics across clauses, including mixed lists
like `[{"tokens": 4000, "messages": 10}, ("messages", 50)]`
- Update `_should_summarize` to evaluate AND within each clause and OR
across clauses for `tokens`, `messages`, and `fraction`
- Update the docs and API link map so `TriggerClause` resolves in the
Python middleware docs
- Preserve tuple-trigger compatibility while allowing message-based
`keep` configurations to summarize at least one message when a trigger
fires near the cutoff boundary
AI assistance was used to help draft and refine this contribution.
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Clarifies how `get_buffer_string` treats multimodal message content
across output formats. The docs now make the default prefix format's
text-only behavior explicit and point users to XML when they need
structured multimodal block representations.
This behavior may change in future iterations
## Summary
Follow-up to #37911 (released in `langchain-perplexity` 1.3.2). That PR
fixed the outbound `ToolMessage` / `AIMessage.tool_calls` serialization;
this one implements **`ChatPerplexity.bind_tools`**, which flips
`has_tool_calling` to `True` and lights up the full `langchain-tests`
standard tool-calling suite — the suite that would have caught #37911 in
the first place.
Verified live against the Perplexity Agent API (`openai/gpt-5.5`,
`use_responses_api=True`): a client-side function-tool round-trip
(invoke + stream) works end-to-end.
## Core change (the `bind_tools` work + the Responses-API follow-up)
- **`bind_tools`** mirrors `langchain-openai`: converts tools via
`convert_to_openai_tool`, normalizes `tool_choice`, and passes
Perplexity built-in tools (`web_search`, etc.) through unchanged.
- **`_to_responses_payload`** now translates tool turns into the
Responses (Agent) API's typed input items: `AIMessage.tool_calls` →
`function_call`, `ToolMessage` → `function_call_output`, and flattens
function tool specs. (The Responses API has no `tool` role, so this
translation is required for round-trips.)
## Changes required to make standard-suite tests pass on the Responses
route
- Streaming: `_convert_responses_stream_event_to_chunk` emits a
`tool_call_chunk` on `response.output_item.done` function calls —
required by `test_tool_calling` (which streams and asserts tool calls).
- `_content_to_text` reduces list-shaped assistant content to text in
the tool-call branch — required by `test_agent_loop` and
`test_tool_message_histories_list_content`.
- `response_metadata["model_name"]` on the Responses route, mirroring
Chat Completions — required by `test_usage_metadata` /
`test_usage_metadata_streaming` (used by `langchain_core` usage
callbacks).
## Tests
- `sonar` standard class marked `has_tool_calling=False` (the family
returns 400 "Tool calling is not supported for this model").
- New `TestPerplexityResponsesStandard` runs the full suite on
`openai/gpt-5.5` + `use_responses_api` with `has_tool_choice=False`:
**35 passed, 13 skipped, 2 xfailed**.
- The 2 xfails (`test_unicode_tool_call_integration`,
`test_structured_few_shot_examples`) hard-code `tool_choice="any"`. The
Responses (Agent) API does not support `tool_choice` (verified: every
form returns HTTP 200 without forcing a call), which `ChatPerplexity`
surfaces as `ValueError` — **existing behavior, unchanged here.**
Softening that to a warning can be a separate change.
`make format lint` clean; unit + standard tests green.
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Release validation now covers the release paths that were intended but
not actually exercised. Manual core and `langchain_v1` releases use
short dropdown inputs, so the dependent-package test gate needs to match
those values in addition to full `libs/...` paths.