Commit Graph

782 Commits

Author SHA1 Message Date
Mason Daugherty
86ce95afc2 test(core,langchain): update tests for explicit deserialization allowlists (#38118)
Core serialization tests now opt into the object allowlists they rely on
instead of assuming default deserialization permits core objects.
Compatibility tests that intentionally exercise deprecated runnable
streaming and history APIs also suppress the expected deprecation
warnings so they can keep covering those legacy paths cleanly.

## Changes
- Updated serialization and prompt round-trip tests to pass
`allowed_objects="core"` or targeted allowlists when loading
`AIMessage`, prompt templates, structured prompts, runnable maps, and
related core objects.
- Adjusted secret-injection regression coverage to keep testing
`secrets_from_env=True` behavior while explicitly allowing core
deserialization paths.
- Tightened prompt deserialization rejection tests so attribute-access
payloads are loaded only through the specific prompt-template allowlist
needed to reach validation.
- Added module-level warning filters around legacy runnable
compatibility coverage for `astream_log`,
`astream_events(version="v1")`, and `RunnableWithMessageHistory`.
- Bumped the `langchain` package's minimum `langgraph` dependency from
`1.2.4` to `1.2.5`.
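
For readers new to allowlist-gated loading, here is a minimal sketch of the idea. It is not `langchain_core.load`'s implementation. `CORE_REGISTRY`, `load_constructor`, and the stand-in classes are hypothetical. The only parts borrowed from LangChain are the manifest shape (`lc`, `type`, `id`, `kwargs`) and the `"core"` shorthand named above.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Any, Literal, Union


@dataclass
class AIMessage:  # hypothetical stand-in for a core message class
    content: str


@dataclass
class PromptTemplate:  # hypothetical stand-in for a core prompt class
    template: str


# Hypothetical registry mapping a manifest "id" path to a class.
CORE_REGISTRY: dict[tuple[str, ...], type] = {
    ("langchain", "schema", "messages", "AIMessage"): AIMessage,
    ("langchain", "prompts", "prompt", "PromptTemplate"): PromptTemplate,
}


def load_constructor(
    payload: dict[str, Any],
    allowed_objects: Union[Literal["core"], Iterable[type]],
) -> Any:
    """Rebuild an object only if its class is explicitly allowed."""
    if payload.get("lc") != 1 or payload.get("type") != "constructor":
        raise ValueError("not a constructor manifest")
    cls = CORE_REGISTRY.get(tuple(payload["id"]))
    if cls is None:
        raise ValueError(f"unknown class path: {payload['id']}")
    # "core" expands to every registered core class; otherwise use the explicit list.
    if allowed_objects == "core":
        allowed = set(CORE_REGISTRY.values())
    else:
        allowed = set(allowed_objects)
    if cls not in allowed:
        raise ValueError(f"{cls.__name__} is not in the allowlist")
    return cls(**payload.get("kwargs", {}))
```

A targeted allowlist such as `[PromptTemplate]` rejects an `AIMessage` manifest even though both classes are registered. This mirrors why the tests pass narrow allowlists when they only need to reach prompt validation.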

## Testing
- Updated unit tests across core serialization, prompt, fake chat model,
runnable history, and runnable event coverage.
2026-06-12 16:49:14 -04:00
Mason Daugherty
4108c0738c release(core): 1.4.7 (#38111)
Bumps `langchain-core` to `1.4.7` for the next patch release and updates
downstream minimum `langchain-core` requirements so package locks
resolve against the new core version.

This also refreshes the runnable snapshots that embed `lc_versions`
metadata so the version consistency check continues to validate
checked-in artifacts.

Validated with `python libs/core/scripts/check_version.py`, `uv lock
--check` across package lockfiles, and the core runnable tests that own
the updated snapshots with local LangSmith tracing env disabled.
2026-06-12 14:54:25 -04:00
Mason Daugherty
8837163917 fix(core,partners): rename package version trace metadata (#38110)
Package-version trace metadata now uses the LangChain-owned
`metadata["lc_versions"]` convention instead of the user-owned
`metadata["versions"]` key. Metadata merging is narrowed so only
`lc_versions` accumulates nested package-version entries, while generic
nested metadata keeps normal last-writer-wins behavior.

## Changes
- Renamed `BaseLanguageModel._add_version()` trace metadata from
`versions` to `lc_versions`, including docstrings and the non-dict
replacement warning.
- Scoped `_merge_metadata_dicts()` nested-map accumulation to only
`lc_versions`; duplicate package entries remain last-writer-wins and
`lc_versions` mappings are copied defensively.
- Preserved user-owned `metadata["versions"]` semantics by keeping it
out of package-version tracking and generic nested metadata merging.
- Updated runnable snapshots and partner package metadata assertions
across Anthropic, DeepSeek, Fireworks, Groq, Hugging Face, MistralAI,
Ollama, OpenAI, OpenRouter, Perplexity, and xAI to expect `lc_versions`.
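
The narrowed merge rule can be sketched as follows. `merge_metadata` is a hypothetical stand-in for `_merge_metadata_dicts`, written only from the rules stated above. Only `lc_versions` accumulates package entries, and a duplicate package takes the later version. `lc_versions` mappings are copied rather than shared. Every other key, including a user-owned nested `versions`, is last-writer-wins.

```python
from collections.abc import Mapping
from typing import Any


def merge_metadata(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict[str, Any]:
    """Merge two metadata dicts; only ``lc_versions`` is merged one level deep."""
    merged = dict(base)
    if isinstance(merged.get("lc_versions"), Mapping):
        # Defensive copy so later mutation of the result cannot touch `base`.
        merged["lc_versions"] = dict(merged["lc_versions"])
    for key, value in override.items():
        if key == "lc_versions" and isinstance(value, Mapping):
            current = merged.get(key)
            previous = current if isinstance(current, Mapping) else {}
            # Accumulate package entries; a duplicate package takes the later version.
            merged[key] = {**previous, **value}
        else:
            # Generic keys (including user-owned "versions") are last-writer-wins.
            merged[key] = value
    return merged
```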

## Testing
- Added/adjusted core tests for `lc_versions` accumulation, duplicate
package overwrite behavior, non-dict `lc_versions` replacement,
defensive copying, and `metadata["versions"]` last-writer-wins behavior.
- Ran focused core and partner metadata tests plus Ruff checks for
changed areas.
2026-06-12 14:26:32 -04:00
Christophe Bornet
0392b6bae4 fix(core): fix Pydantic v1 support in tools/runnable (#33698)
`BaseTool.args_schema` is documented as accepting a Pydantic v1 model,
but several code paths assumed v2 and raised when handed a v1 schema
(e.g. an `AttributeError` from calling
`model_json_schema()`/`model_fields` on a v1 model). This affected
anyone using a v1 `args_schema`, and anyone composing runnables whose
input/output schema is a v1 model.

This PR makes the tool/runnable schema-derivation code version-agnostic.

## Type contract

`TypeBaseModel` (and `PydanticBaseModel`) now include
`pydantic.v1.BaseModel`, so the type honestly reflects what tools and
runnables already accept at runtime. The public schema accessors
(`Runnable.get_input_schema`/`get_output_schema` and the
`input_schema`/`output_schema` properties) return `TypeBaseModel`.

## Version-agnostic helpers

Added to `langchain_core.utils.pydantic`, each dispatching on the
model's Pydantic version so callers don't have to:

- `model_json_schema(model)` — JSON schema for either version.
- `model_validate(model, obj)` — validation for either version.
- `get_fields(model)` — field map for either version (existing helper,
now used consistently).

Internally, direct `.model_json_schema()` / `.model_fields` calls are
replaced with these helpers (or with `get_input_jsonschema()` /
`get_output_jsonschema()`).

## Behavior change worth a close look

When deriving a schema from a v1 model (in `RunnableParallel`,
`RunnableAssign`, and `RunnableSequence` output schemas), a **required**
v1 field is now correctly carried over as required. Previously the v1
path read the field's `default` — which is `None` for a required v1
field — and silently turned required fields into optional/nullable ones;
`default_factory` fields were dropped entirely. The new
`_get_schema_field_definition` helper translates a v1 `ModelField`
faithfully (required → `...`, factory preserved) and dispatches
explicitly on the field type.
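
To make the required-field difference concrete, here is a stdlib-only sketch. `V1FieldStub`, `naive_definition`, and `field_definition` are hypothetical names; this is not the `_get_schema_field_definition` helper itself. In Pydantic, a `(type, ...)` pair passed to `create_model` marks the field as required. Where real code would build `Field(default_factory=...)`, this sketch wraps the factory in a plain dict.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class V1FieldStub:
    """Hypothetical stand-in for the parts of a Pydantic v1 ModelField used here."""

    outer_type_: Any
    required: bool
    default: Any = None
    default_factory: Optional[Callable[[], Any]] = None


def naive_definition(field: V1FieldStub) -> tuple[Any, Any]:
    """Old-style read of .default, which is None for a required v1 field."""
    return (field.outer_type_, field.default)


def field_definition(field: V1FieldStub) -> tuple[Any, Any]:
    """Translate a v1-style field into a (type, default) pair for model creation."""
    if field.required:
        # Ellipsis is the conventional "no default, required" marker.
        return (field.outer_type_, ...)
    if field.default_factory is not None:
        # Keep the factory instead of losing it.
        return (field.outer_type_, {"default_factory": field.default_factory})
    return (field.outer_type_, field.default)
```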

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
2026-06-12 00:18:49 -04:00
Mason Daugherty
05cc55f1bc release(core): 1.4.6 (#38061) 2026-06-11 02:58:40 -04:00
Mason Daugherty
948f6cc58c feat(core,partners): add package version tracking to tracing metadata (#35295)
Following on the heels of #35293

TODO:
- Packages outside of this repo (e.g. LiteLLM, Nvidia, Google, AWS)

---

## Summary

Surface partner package versions in `metadata.versions` on LangSmith
traces. Mirrors the JS SDK's `_addVersion()` pattern
([langchainjs#10106](https://github.com/langchain-ai/langchainjs/pull/10106)).

Each model constructor records its package version via `_add_version()`
on `BaseLanguageModel`. The version dict accumulates through the class
hierarchy — `langchain-core` is added in
`BaseLanguageModel.model_post_init`, `langchain-openai` in
`BaseChatOpenAI._set_openai_chat_version`, and each leaf partner in its
uniquely-named `model_validator`. Traces end up with:

```json
{
  "metadata": {
    "versions": {
      "langchain-core": "1.4.5",
      "langchain-openai": "1.3.0",
      "langchain-xai": "1.2.2"
    }
  }
}
```

### Changes

- `BaseLanguageModel._add_version(pkg, version)` — appends to
`self.metadata["versions"]`; accepts any `Mapping` type; emits a warning
if a non-mapping value is found and replaced
- `BaseLanguageModel.model_post_init` — adds `langchain-core` version;
calls `super()` for MRO safety
- `_merge_metadata_dicts` — one-level-deep (non-recursive) merge for
nested dict metadata keys
- `CallbackManager.add_metadata` — uses `_merge_metadata_dicts` instead
of flat `dict.update()` so nested metadata dicts (like `versions`)
coexist rather than clobber
- `merge_configs` — uses `_merge_metadata_dicts` for config merging

**Partners:**
- Each now calls `self._add_version("langchain-<pkg>", __version__)`
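
The accumulation through the class hierarchy can be pictured with the sketch below. All class names are hypothetical. Plain `__init__` chaining stands in for the `model_post_init` and `model_validator` hooks the real models use. The version strings are the ones from the trace example above.

```python
import warnings
from collections.abc import Mapping
from typing import Any, Optional


class BaseModelSketch:
    """Hypothetical stand-in for BaseLanguageModel's version bookkeeping."""

    def __init__(self, metadata: Optional[dict[str, Any]] = None) -> None:
        self.metadata: dict[str, Any] = dict(metadata or {})
        self._add_version("langchain-core", "1.4.5")

    def _add_version(self, package: str, version: str) -> None:
        current = self.metadata.get("versions")
        if current is None:
            current = {}
        elif not isinstance(current, Mapping):
            warnings.warn(
                f"Replacing non-mapping metadata['versions'] ({current!r}) with a version dict",
                stacklevel=2,
            )
            current = {}
        # Build a new dict so a caller-supplied mapping is never mutated in place.
        self.metadata["versions"] = {**current, package: version}


class OpenAIStyleModel(BaseModelSketch):
    def __init__(self, metadata: Optional[dict[str, Any]] = None) -> None:
        super().__init__(metadata)
        self._add_version("langchain-openai", "1.3.0")


class XAIStyleModel(OpenAIStyleModel):
    def __init__(self, metadata: Optional[dict[str, Any]] = None) -> None:
        super().__init__(metadata)
        self._add_version("langchain-xai", "1.2.2")
```

Each subclass only adds its own package. The parent classes fill in the rest, so a leaf model ends up with all three entries without any merge step.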

### Design decisions

- **Constructor-based, not `_get_ls_params`-based** — versions flow
through `self.metadata` (local metadata on traces), not through
`LangSmithParams`. This matches JS and makes child-class version
inheritance automatic (no merge/clobber issues).
- **`versions` is local (non-inheritable) metadata** — `self.metadata`
is passed to `CallbackManager.configure` as `local_metadata`
(`add_metadata(..., inherit=False)`), so `versions` is attached **once
per chat-model run** and is **not** propagated to child runs or
duplicated onto every streaming chunk. This is intentionally the
opposite of the inheritable-per-chunk metadata that #36588 was reducing
for performance — `versions` does not regress that path.
- **`add_metadata` deep-merge is a correctness fix, not just for
versions** — previously `add_metadata`/`merge_configs` did a flat
top-level `dict.update`/spread, so any nested metadata dict baked into a
config (e.g. via `.with_config({"metadata": {...}})`) would be wholly
replaced when a caller also passed `metadata`. `_merge_metadata_dicts`
merges one level deep so user-provided `config.metadata.versions` and
model-set `versions` coexist instead of clobbering. The merge runs once
per `configure` (not per chunk), so it is off the streaming hot path.
- **One level deep only** — `_merge_metadata_dicts` is deliberately
*not* a recursive deep merge; values nested more than one level are
last-writer-wins. This covers the `versions` case without the
ambiguity/cost of arbitrary-depth merging.
- **Warn on non-dict `metadata["versions"]`** — if a user sets
`metadata={"versions": "some-string"}`, `_add_version` emits a warning
and replaces the value with the version dict rather than silently
discarding user data or crashing. This is a soft breaking change for
anyone who previously stored non-dict values at this key.

### Follow-ups (tracked separately, out of scope here)

- JS `mergeConfigs` still flat-spreads nested metadata, so
`metadata.versions` can still clobber on the JS side until an equivalent
deep-merge lands.

---

Made by [Open SWE](https://openswe.vercel.app)

---------

Co-authored-by: open-swe[bot] <open-swe@users.noreply.github.com>
2026-06-11 02:23:19 -04:00
Mason Daugherty
86428c63ac fix(core,openai): normalize v1 streamed tool calls (#35983)
OpenAI Chat Completions streaming has a v1 normalization gap when tool
calls are streamed.

When users opt into `output_version="v1"`, `.content_blocks` is expected
to be the normalized cross-provider view of the message. For OpenAI Chat
Completions streams, though, chunks still carry raw string `content`
plus side-channel `tool_call_chunks` / `tool_calls`.

Practically, an OpenAI stream chunk can look like this internally:

```python
AIMessageChunk(
    content="",
    tool_call_chunks=[
        {
            "name": "get_weather",
            "args": '{"location": "SF"}',
            "id": "call_123",
            "index": 0,
            "type": "tool_call_chunk",
        }
    ],
    response_metadata={"model_provider": "openai", "output_version": "v1"},
)
```

That is not already-normalized v1 content like this:

```python
AIMessageChunk(
    content=[
        {
            "type": "tool_call_chunk",
            "name": "get_weather",
            "args": '{"location": "SF"}',
            "id": "call_123",
            "index": 0,
        }
    ],
)
```

Because `.content_blocks` currently short-circuits solely on
`output_version="v1"`, it can return the raw string/empty list directly
instead of running the OpenAI translator that incorporates
`tool_call_chunks` / `tool_calls` into normalized v1 blocks.

In practice, a streamed OpenAI tool call can be parsed successfully into
`tool_calls`, but still be missing from the final aggregated
`.content_blocks`. Downstream code that consumes the v1 block interface
then sees no `tool_call` block and must know to inspect OpenAI-specific
chunk fields instead.

User story:

> As a LangChain user streaming OpenAI Chat Completions with bound tools
and `output_version="v1"`, I need the final aggregated message's
`.content_blocks` to include normalized `tool_call` blocks, so that code
written against the v1 content-block interface handles streamed tool
calls consistently across providers.

Expected final aggregated view:

```python
message.content_blocks == [
    {
        "type": "tool_call",
        "name": "get_weather",
        "args": {"location": "SF"},
        "id": "call_123",
    }
]
```

Root causes:

1. The usage-only Chat Completions chunk uses `content=[]` in v1 mode
while normal streaming chunks use `content=""`, creating inconsistent
content types during chunk aggregation.
2. `AIMessage.content_blocks` and `AIMessageChunk.content_blocks` treat
any `output_version="v1"` message as already-normalized, even when
`content` is still raw string content from Chat Completions.
3. Content-bearing OpenAI stream chunks do not carry
`output_version="v1"`, so the final merged chunk may not reliably take
the v1 normalization path.

Changes:

- Keep usage-only Chat Completions chunks as `content=""` instead of
overriding to `[]`, so streaming chunks merge consistently.
- Propagate `output_version="v1"` to content-bearing chunks.
- Only short-circuit v1 `.content_blocks` when `content` is already a
list of blocks; otherwise fall through to the provider translator.
- Add regression tests covering string-content v1 fallback, usage-only
chunk content consistency, and streamed tool calls appearing as
normalized final v1 blocks.
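
The corrected short-circuit can be sketched as follows. `translate_openai_chat` and `content_blocks` are hypothetical, simplified stand-ins for the message property and the OpenAI translator. The translator here only handles text and tool calls.

```python
from typing import Any, Union

Content = Union[str, list[Any]]


def translate_openai_chat(content: Content, tool_calls: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical translator: fold raw text and side-channel tool calls into v1 blocks."""
    blocks: list[dict[str, Any]] = []
    if isinstance(content, str) and content:
        blocks.append({"type": "text", "text": content})
    for call in tool_calls:
        blocks.append(
            {"type": "tool_call", "name": call["name"], "args": call["args"], "id": call["id"]}
        )
    return blocks


def content_blocks(content: Content, output_version: str, tool_calls: list[dict[str, Any]]) -> list[Any]:
    # Short-circuit only when content is already a list of v1 blocks.
    if output_version == "v1" and isinstance(content, list):
        return content
    # Raw string content (e.g. from Chat Completions) still goes through the translator.
    return translate_openai_chat(content, tool_calls)
```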
2026-06-11 00:51:50 -04:00
Mason Daugherty
7cc9d0c84d fix(core): async tracer on_chat_model_start fallback in sync context (#35233)
Fixes #30870

When an `AsyncBaseTracer` with `_schema_format="original"` (the default)
is used with sync `llm.invoke()`, the `on_chat_model_start` to
`on_llm_start` fallback doesn't fire. The async handler returns a
coroutine instead of raising `NotImplementedError` synchronously, so it
bypasses the existing fallback logic and lands in `_run_coros`, which
only logs the error generically.

This fallback already works for sync handlers in sync context and async
handlers in async context. This PR closes the gap for async handlers in
sync context.
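
The gap exists because an async handler only raises `NotImplementedError` once its coroutine runs. The sketch below shows one way to close it, though it is not necessarily how the fix is written. It assumes no event loop is already running in the calling thread, since `asyncio.run` raises `RuntimeError` inside a running loop. `RecordingAsyncTracer` and `dispatch_chat_model_start_sync` are hypothetical. Joining message strings with newlines is a placeholder for the real message-to-prompt conversion.

```python
import asyncio
from typing import Any


class RecordingAsyncTracer:
    """Hypothetical async handler that only implements on_llm_start."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, list[str]]] = []

    async def on_chat_model_start(self, serialized: dict[str, Any], messages: list[list[str]]) -> None:
        raise NotImplementedError

    async def on_llm_start(self, serialized: dict[str, Any], prompts: list[str]) -> None:
        self.calls.append(("on_llm_start", prompts))


def dispatch_chat_model_start_sync(handler: Any, serialized: dict[str, Any], messages: list[list[str]]) -> None:
    """Run an async handler from sync code, falling back to on_llm_start."""
    try:
        # The coroutine only raises NotImplementedError once it actually runs,
        # so the fallback check must wrap its execution, not its creation.
        asyncio.run(handler.on_chat_model_start(serialized, messages))
    except NotImplementedError:
        prompts = ["\n".join(batch) for batch in messages]
        asyncio.run(handler.on_llm_start(serialized, prompts))
```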
2026-06-10 22:15:29 -04:00
Mason Daugherty
6b9e22dbbc fix(langchain): tighten structured output model fallbacks (#38042)
Provider-native structured output fallback detection now uses bounded
model-name patterns instead of broad substring checks, reducing false
positives for unrelated model IDs. The model examples and test fixtures
across OpenAI/OpenRouter-facing code were refreshed around current
OpenAI model families while preserving shipped defaults.

## Changes
- Tightened `FALLBACK_MODELS_WITH_STRUCTURED_OUTPUT` from loose string
fragments to regex patterns, with `_supports_provider_strategy` matching
full model-name segments instead of arbitrary substrings.
- Expanded structured-output fallback coverage for newer OpenAI,
Anthropic, and xAI/Grok model families, including `gpt-5.x`, newer
Claude 4/5-style names, and `grok-build`.
- Reused `_attempt_infer_model_provider` in provider tool search routing
so `_provider_from_model_name` follows the same provider inference
behavior as `init_chat_model`.
- Suppressed irrelevant provider-inference deprecation warnings during
provider tool search registry lookup.
- Refreshed OpenAI, Azure OpenAI, OpenRouter, core metadata, and example
model references from older fixtures like `gpt-4`, `gpt-4o`, `o1`, and
`o4-mini` to current test/profile models such as `gpt-5.5`,
`gpt-5-nano`, and `gpt-4.1-mini`.
- Removed outdated OpenAI test assumptions around legacy `o1` behavior
and narrowed legacy structured-output checks to explicitly legacy model
names.
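
The difference between substring and bounded matching can be seen in the sketch below. `FALLBACK_PATTERNS`, `supports_provider_strategy`, and `naive_supports` are hypothetical. The real `FALLBACK_MODELS_WITH_STRUCTURED_OUTPUT` list and `_supports_provider_strategy` logic may segment names differently. For example, this sketch only strips a `provider/` prefix.

```python
import re

# Hypothetical patterns; the real FALLBACK_MODELS_WITH_STRUCTURED_OUTPUT differs.
FALLBACK_PATTERNS = (r"gpt-5(?:\.\d+)?", r"gpt-4\.1", r"grok-build")


def supports_provider_strategy(model_name: str) -> bool:
    """Match a pattern only as a whole leading segment of the model name."""
    name = model_name.lower().rsplit("/", 1)[-1]  # drop a "provider/" prefix
    # The pattern must start the name and end at a "-" or at the end of the string.
    return any(re.match(rf"(?:{p})(?:-|$)", name) for p in FALLBACK_PATTERNS)


def naive_supports(model_name: str) -> bool:
    """The kind of substring check that bounded patterns replace, for comparison."""
    return any(s in model_name.lower() for s in ("gpt-5", "gpt-4.1", "grok-build"))
```

With the substring check, `gpt-50` counts as a `gpt-5` model. The bounded version rejects it while still accepting `gpt-5.5` and `gpt-5-nano`.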
2026-06-10 21:18:14 -04:00
Mason Daugherty
f89f4c5afe fix(core): support content block tokens in callbacks (#34739)
Supersedes #34727
Closes #30703

Related:
* langchain-ai/langchain-google#1460
* langchain-ai/langchain-google#1501

Fixing this at the `langchain-core` callback layer instead of
normalizing inside individual provider integrations, so structured
streaming content is preserved consistently.

---

Models are increasingly streaming structured content blocks instead of
plain text tokens. For example, Gemini 3 can stream text as
content-block lists, and Anthropic/tool-use flows can also produce
non-text message content. Today those values already reach
`on_llm_new_token`, but the callback API still advertises `token: str`,
which makes custom callbacks, tracers, and streaming helpers assume
every streamed value is text.

User story: as a LangChain user building a streaming callback for chat
models with tool calls, reasoning/thinking blocks, or provider-specific
structured content, I need `on_llm_new_token` to accept the same content
shape that chat model chunks can actually emit, so my callback can
observe the stream without providers flattening or dropping non-text
data.

Fixing this in `langchain-core` makes the existing runtime behavior
explicit at the shared callback boundary. Normalizing content blocks
inside each provider would duplicate logic, produce inconsistent
behavior across integrations, and in some cases lose required provider
metadata such as Gemini thought signatures.

## Changes

- Update the callback contract so streamed tokens can be either plain
text or structured content blocks
- Carry structured streamed content through tracing and event/log
streaming paths without forcing provider data into text too early
- Keep built-in text-oriented streaming callbacks working by converting
structured tokens only at the display/queue boundary
- Drop the now-incorrect `cast("str", ...)` on streamed content in
`BaseChatModel` so the producer side matches the widened callback
signature instead of asserting a string it doesn't always have (no
runtime change — `cast` is erased)
- Align Anthropic and Mistral content typing with the structured content
shapes already used by chat model messages
- Update callback tests to reflect that not every streamed value is text

## Compatibility

No runtime behavior change: no producer emits anything it wasn't already
emitting, and widening a parameter type is safe for existing callers and
handlers that pass or receive `str`. The one caveat is downstream code
that subclasses a callback handler or tracer and overrides
`on_llm_new_token` with a `token: str` annotation — under strict type
checking that override is now narrower than the base and will be flagged
as incompatible with the supertype. Such code still runs unchanged; the
fix is to widen the annotation to match.
2026-06-10 16:59:08 -04:00
Christophe Bornet
720dfd3b09 chore(core): improve typing of Runnable __or__ (#34530)
`Runnable.__or__`, `Runnable.__ror__`, and their `RunnableSequence` and
`StructuredPrompt` overrides previously erased composition types: the
right-hand operand was typed `Runnable[Any, Other]`, so piping two
runnables together always produced `RunnableSerializable[Input, Any]`.
Type information was lost at every `|`, which is why chains so often
needed a `chain: Runnable = ...` annotation just to recover usable
inference.

This adds `@overload`s so the `Output` of one step flows into the
`Input` of the next and the composed result carries the real `Output`
type through. `Runnable[int, str] | Runnable[str, float]` now infers
`RunnableSerializable[int, float]` instead of `[int, Any]`.
`coerce_to_runnable` gains overloads so a `Mapping` resolves to
`RunnableParallel` while everything else stays a `Runnable`. As a
knock-on effect, dozens of now-unnecessary `: Runnable` annotations were
dropped from the test suite.

Runtime behavior is unchanged — this is a typing-only change.
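
The core typing idea reduces to a toy generic class. `Step` is hypothetical and much simpler than `Runnable`, which needs several `@overload`s to cover coercible right-hand operands. Still, it shows how typing the right operand as `Step[Out, Other]` lets a checker infer the composed type.

```python
from __future__ import annotations

from typing import Callable, Generic, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")
Other = TypeVar("Other")


class Step(Generic[In, Out]):
    """Toy runnable: the right operand of | must accept this step's Out."""

    def __init__(self, fn: Callable[[In], Out]) -> None:
        self.fn = fn

    def invoke(self, value: In) -> Out:
        return self.fn(value)

    def __or__(self, other: Step[Out, Other]) -> Step[In, Other]:
        # The output type of self feeds the input type of other.
        return Step(lambda value: other.invoke(self.invoke(value)))


to_str: Step[int, str] = Step(str)
length: Step[str, float] = Step(lambda s: float(len(s)))
chain = to_str | length  # a type checker infers Step[int, float]
```

Reversing the operands (`length | to_str`) still runs here only by accident of the lambdas. A type checker reports it because `to_str` does not accept `float`.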

## Impact on type-checked code

Most users will simply get better inference. Two changes can require a
small adjustment if you run a type checker (`mypy`, `pyright`):

### Stricter operand matching in `|`

The right-hand side of `|` is now typed `Runnable[Output, Other]` rather
than `Runnable[Any, Other]`, so the right operand's declared **input**
must match the left operand's **output**. This is more accurate, but it
surfaces a common pattern that was previously silent: piping a step that
outputs a plain `dict` into a step whose declared input is a more
specific type (for example a `TypedDict`). It still works at runtime;
the checker now reports an `[operator]` error.

If you hit this, narrow the boundary with a `cast` (or an explicit
annotation):

```python
from typing import Any, cast

from langchain_core.runnables import Runnable

# upstream outputs a dict; downstream declares a narrower input type
chain = cast("Runnable[Any, MyInput]", upstream) | downstream
```

### `list` → `Sequence` on `RunnableEach` / `map()`

`Runnable.map()` and the `invoke` / `ainvoke` methods of `RunnableEach`
now accept `Sequence[Input]` instead of `list[Input]`. Callers are
unaffected — a `list` is a `Sequence`, and tuples or other sequences now
type-check too. The only thing to adjust: if you **subclass**
`RunnableEach` (or `RunnableEachBase`) and override these methods with a
`list[...]` parameter, widen the annotation to `Sequence[...]` so the
override stays compatible with the base signature.

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
2026-06-10 16:16:03 -04:00
Christophe Bornet
a063ec26dd chore(core): fix some any generics (#34545)
Co-authored-by: Mason Daugherty <github@mdrxy.com>
2026-06-10 15:32:14 -04:00
Mason Daugherty
8bc96308d0 fix(core): accept sequence tool error content (#38005)
`handle_tool_error` callables can now return structured message content
as any valid sequence, not just a mutable `list`. Valid structured
sequences are normalized to the `ToolMessage` content shape at the tool
output boundary, while invalid content still falls back to
stringification.

## Changes
- Widened `ToolExceptionHandlerOutput` from `list[str | dict[str, Any]]`
to `Sequence[MessageContentBlock]` so handlers returning `list[dict[str,
Any]]` or tuple content blocks type-check cleanly.
- Added `_normalize_message_content` to validate structured message
content and convert valid non-string sequences to the `list` shape
expected by `ToolMessage`.
- Preserved existing stringification behavior for invalid structured
content blocks instead of treating failed normalization as `None`.
- Removed the now-unused `_is_message_content_type` helper; output
formatting validates content directly through
`_normalize_message_content`.
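
The normalize-then-stringify path can be sketched as follows. `normalize_message_content` and `format_tool_error_output` are hypothetical simplifications of the helpers described above. In particular, treating any `str` or `dict` item as a valid block is an assumption. The real validation of `MessageContentBlock` may be stricter.

```python
from collections.abc import Sequence
from typing import Any, Optional


def normalize_message_content(content: Any) -> Optional[list[Any]]:
    """Return content as a list of blocks, or None if it is not valid structured content."""
    # Strings and bytes are sequences too, but they are not block sequences.
    if isinstance(content, (str, bytes)) or not isinstance(content, Sequence):
        return None
    blocks = list(content)  # tuples and other sequences become the list ToolMessage expects
    if all(isinstance(block, (str, dict)) for block in blocks):
        return blocks
    return None


def format_tool_error_output(content: Any) -> Any:
    if isinstance(content, str):
        return content
    normalized = normalize_message_content(content)
    # Invalid structured content still falls back to stringification.
    return normalized if normalized is not None else str(content)
```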
2026-06-09 22:35:33 -04:00
Mason Daugherty
0f1b291f42 fix(core): type structured tool error handler output (#38003)
`handle_tool_error` callables can already return structured message
content at runtime, but the public typing only allowed strings. The tool
error handling API now reflects the existing output formatting path,
including clearer docs for how handled errors become
`ToolMessage(status="error")` results.
2026-06-09 21:18:19 -04:00
Nidhi Rajani
0f45b2c285 feat(openai): support apply_patch built-in tool (#37157)
[Docs](https://github.com/langchain-ai/docs/pull/4370)

Fixes #37031

Adds support for OpenAI Responses API `apply_patch` built-in tool.

This PR:
- Adds `apply_patch` to the OpenAI well-known tools list so
`bind_tools([{"type": "apply_patch"}])` works.
- Preserves `apply_patch_call` and `apply_patch_call_output` items when
converting OpenAI Responses API outputs into LangChain
`AIMessage.content`.
- Preserves the same item types in streaming `AIMessageChunk`
conversion.
- Supports round-trip input conversion for `apply_patch_call` and
`apply_patch_call_output`.
- Adds unit tests for core tool passthrough, non-streaming conversion,
streaming conversion, and round-trip input conversion.

## Testing

- `cd libs/core && uv run --group test pytest
tests/unit_tests/utils/test_function_calling.py -k "apply_patch" -vv`
- `cd libs/partners/openai && uv run --group test pytest
tests/unit_tests/chat_models/test_base.py -k "apply_patch" -vv`
- `cd libs/core && uv run --all-groups ruff check
langchain_core/utils/function_calling.py
tests/unit_tests/utils/test_function_calling.py`
- `cd libs/partners/openai && uv run --all-groups ruff check
langchain_openai/chat_models/base.py
tests/unit_tests/chat_models/test_base.py`

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2026-06-09 16:13:37 -04:00
Christophe Bornet
74c23741b0 feat(core): deprecate problematic dict() method (#31685)
`dict()` is a problematic method name: inside a class that defines it,
the method shadows the builtin `dict` when `dict` is used as a type
annotation.
This PR deprecates it in favor of an `asdict` method (inspired by
dataclasses). It also fixes a few places where `dict` must be replaced
by `builtins.dict` until the `dict()` method is removed.
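
The shadowing problem and the deprecation pattern both fit in a short sketch. `Config` is a hypothetical class, not a LangChain model. The sketch assumes no `from __future__ import annotations`, so annotations are evaluated eagerly in the class namespace, where a bare `dict` written after the method definition names the method rather than the builtin.

```python
import builtins
import warnings
from typing import Any


class Config:
    def __init__(self, name: str, temperature: float) -> None:
        self.name = name
        self.temperature = temperature

    def asdict(self) -> builtins.dict[str, Any]:
        """Preferred spelling, in the style of dataclasses."""
        return {"name": self.name, "temperature": self.temperature}

    def dict(self) -> builtins.dict[str, Any]:
        """Deprecated alias kept for backwards compatibility."""
        warnings.warn("dict() is deprecated; use asdict() instead", DeprecationWarning, stacklevel=2)
        return self.asdict()

    # From here on, a bare `dict` in the class body refers to the method above;
    # `-> dict[str, str]` would raise "'function' object is not subscriptable".
    def to_payload(self) -> builtins.dict[str, str]:
        return {key: str(value) for key, value in self.asdict().items()}
```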

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
2026-06-08 14:11:05 -04:00
Mason Daugherty
053c368ba4 fix(core): remove Bedrock prevalidation from load (#37909)
Removes the built-in Bedrock class init validator from `load` so Bedrock
kwargs such as `base_url` and `endpoint_url` are no longer specially
rejected during deserialization.

This keeps provider-specific SSRF policy out of core; callers should
continue to avoid untrusted manifests or use restrictive
`allowed_objects`.

Verified with `make format`, `make lint`, and the focused serialization
load unit tests.

AI-assisted contribution by Open SWE.

Made by [Open SWE](https://openswe.vercel.app)

---------

Co-authored-by: open-swe[bot] <215916821+open-swe[bot]@users.noreply.github.com>
2026-06-05 10:46:57 -04:00
Nick Hollon
3802938f1c fix(core): accept Serializable constructor-envelope wire shape in _convert_to_message (#37456) 2026-05-15 15:34:04 -07:00
Nick Hollon
f42d80ca1c fix(core): preserve chunk additional_kwargs across v3 stream assembly (#37435)
The v3 streaming path drops `additional_kwargs` from per-chunk
`AIMessageChunk`s during assembly: `chunks_to_events` emits no event
field for them, and `ChatModelStream._assemble_message` constructs the
final `AIMessage` without an `additional_kwargs` argument. Non-streaming
`ainvoke` returns the provider message unchanged, so streaming and
non-streaming diverge for any provider that uses `additional_kwargs` to
carry data outside the typed protocol blocks.

## How this surfaces

The concrete failure mode is Gemini's
`__gemini_function_call_thought_signatures__` — a per-tool-call
signature blob the Google GenAI integration places in
`additional_kwargs`, keyed by `tool_call_id`. Gemini requires that
signature on follow-up turns to replay the prior thought trace; without
it, multi-turn streaming flows lose thought continuity (and may
regenerate thinking, charging additional reasoning tokens, or in some
cases refuse). Other providers that use `additional_kwargs` (e.g. older
`function_call` accumulators, custom routing metadata) hit the same gap;
the fix is intentionally provider-agnostic.

## Fix

Provider-agnostic, two seams:

- `_compat_bridge` accumulates `msg.additional_kwargs` across chunks
with `merge_dicts` (matching `AIMessageChunk`'s own merge semantics for
fields that accumulate, like `function_call`) and emits the merged dict
on the `message-finish` event as an off-spec extension. The bridge
already uses one such extension (`metadata` on `MessageFinishData`);
this PR follows the same pattern for `additional_kwargs`.
- `ChatModelStream._finish` reads the new field; `_assemble_message`
threads it onto the final `AIMessage` only when non-empty, preserving
today's behavior of leaving `additional_kwargs` empty when no provider
data needs to ride on it.
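
The accumulate-then-attach behavior can be sketched with a simplified merge. `merge_kwargs` and `final_additional_kwargs` are hypothetical. `merge_kwargs` only approximates `merge_dicts` (strings concatenate, nested dicts merge, anything else is overwritten by the later value). The real helper has additional rules, for example for lists, that this sketch omits.

```python
from typing import Any, Optional


def merge_kwargs(left: dict[str, Any], right: dict[str, Any]) -> dict[str, Any]:
    """Simplified chunk merge: strings concatenate, dicts merge recursively, else right wins."""
    merged = dict(left)
    for key, value in right.items():
        current = merged.get(key)
        if isinstance(current, str) and isinstance(value, str):
            merged[key] = current + value
        elif isinstance(current, dict) and isinstance(value, dict):
            merged[key] = merge_kwargs(current, value)
        else:
            merged[key] = value
    return merged


def final_additional_kwargs(chunks: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    acc: dict[str, Any] = {}
    for kwargs in chunks:
        acc = merge_kwargs(acc, kwargs)
    # Only attach the field when some provider data actually needs to ride along.
    return acc or None
```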
2026-05-14 11:19:45 -07:00
Nick Hollon
649d82f206 fix(core): preserve reasoning blocks alongside tool_call in v3 stream (#37434)
Closes #37420

---

`stream_events(version="v3")` (and the `astream_events` async twin)
silently dropped reasoning content from the final assembled `AIMessage`
whenever the same message also produced a tool_call. The bug reproduces
against Gemini 2.5 Pro with `include_thoughts=True`: reasoning streams
correctly through `ChatModelStream.reasoning`, but the persisted message
in the final graph state carries only the `tool_call` block.

## Root cause

`_iter_protocol_blocks` in the compat bridge groups per-chunk content
blocks by source-side identifier. When a provider doesn't supply an
`index` field on its content blocks — which the Google GenAI translator
does not for either `reasoning` or `tool_call` blocks — the bridge falls
back to positional `i` as the bucket key. Because Gemini typically emits
one block per chunk, every reasoning chunk and the later tool_call chunk
all key to `0`, and the type mismatch trips `_accumulate`'s
self-contained `else` branch. That branch clears accumulated reasoning
state and replaces it with the incoming tool_call, so reasoning never
reaches `content-block-finish`.

## Fix

When a block has no source-side `index`, key it by `("__lc_no_index__",
block_type, positional_i)` instead of bare `i`. Same-type chunks at the
same position still share a bucket and merge cleanly (streaming text and
reasoning unchanged); different-type chunks at the same position now
occupy distinct wire blocks and both reach `content-block-finish`.
Providers that supply explicit indices (Anthropic, OpenAI Responses) are
unaffected.
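
The new keying rule can be sketched in isolation. `bucket_key` and `group_blocks` are hypothetical helpers, and the real `_iter_protocol_blocks` does more than group. The fallback key matches the one described above, though.

```python
from collections.abc import Hashable
from typing import Any


def bucket_key(block: dict[str, Any], position: int) -> Hashable:
    """Group by source index, or by (type, position) when the provider sends no index."""
    if "index" in block:
        return block["index"]
    return ("__lc_no_index__", block["type"], position)


def group_blocks(chunks: list[list[dict[str, Any]]]) -> dict[Hashable, list[dict[str, Any]]]:
    buckets: dict[Hashable, list[dict[str, Any]]] = {}
    for chunk in chunks:
        for i, block in enumerate(chunk):
            buckets.setdefault(bucket_key(block, i), []).append(block)
    return buckets
```

With Gemini-style chunks that each carry one unindexed block, reasoning and the later tool call both sit at position `0`. They now land in separate buckets instead of colliding on key `0`.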

## Verification

Unit-tested at the compat-bridge layer for both sync
(`chunks_to_events`) and async (`achunks_to_events`) paths.

Verified live against Gemini 2.5 Pro (`gemini-2.5-pro`) with
`thinking_budget=2048`, `include_thoughts=True`, and a single
`get_weather` tool. Pre-fix:
`final_state.messages[tool_calling_ai_message].content == [{type:
tool_call, ...}]`. Post-fix: `[..., {type: reasoning, reasoning: "..."},
{type: tool_call, ...}]`, matching the shape `ainvoke` returns on the
same input.
2026-05-14 11:11:30 -07:00
Nick Hollon
da380bccf8 chore(infra): merge v1.4 into master (#37350) 2026-05-11 11:39:25 -07:00
Mason Daugherty
8b21400627 fix(core): avoid eager pydantic.v1 import in @deprecated (#37308)
`langchain_core._api.deprecation` previously did `from
pydantic.v1.fields import FieldInfo as FieldInfoV1` at module scope,
which triggers Pydantic's `UserWarning("Core Pydantic V1 functionality
isn't compatible with Python 3.14 or greater.")` on every
`langchain_core` import under 3.14+. The v1 symbol is only needed inside
one runtime branch of `@deprecated`, so it's now resolved lazily.

## Changes
- Replace the top-level v1 `FieldInfo` import with
`_is_pydantic_v1_field_info`, which probes
`sys.modules.get("pydantic.v1.fields")` instead of forcing the import.
The reconstruction inside `deprecated`'s `finalize` closure imports
`FieldInfoV1` lazily, gated by the predicate — so the warning only fires
if a caller has already loaded `pydantic.v1` themselves.
- Add a subprocess-based regression test asserting that importing
`langchain_core._api.deprecation` does not pull any `pydantic.v1*`
module into `sys.modules`. Verified to fail when the eager import is
reintroduced.
- Add a v1 `FieldInfo` decoration test — the v1 branch of `@deprecated`
previously had zero direct coverage.
- Update the stale `# Last Any should be FieldInfoV1 but this leads to
circular imports` comment on `T`'s bound, which no longer reflects the
real reason (it's about the 3.14 warning, not circularity).
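
The `sys.modules` probe generalizes to any "check the type only if someone else already imported it" situation. `is_instance_if_loaded` is a hypothetical, stdlib-only version of the idea behind `_is_pydantic_v1_field_info`. For this commit's case it would be called as `is_instance_if_loaded(obj, "pydantic.v1.fields", "FieldInfo")`.

```python
import sys
from typing import Any


def is_instance_if_loaded(obj: Any, module_name: str, class_name: str) -> bool:
    """isinstance check that never imports module_name as a side effect."""
    module = sys.modules.get(module_name)
    if module is None:
        # If nobody imported the module, obj cannot be an instance of its class.
        return False
    cls = getattr(module, class_name, None)
    return isinstance(cls, type) and isinstance(obj, cls)
```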
2026-05-09 20:35:17 -04:00
Nick Hollon
c979c6187b fix(core, langchain): harden load() against untrusted manifests (#37197) 2026-05-05 14:36:58 -04:00
Mason Daugherty
a1f336fdc7 fix(core): preserve structured inputs on tool runs in tracers (#37108)
Tool runs in `_TracerCore._create_tool_run` were discarding the
structured `inputs` dict that `BaseTool.run` passes to `on_tool_start`,
replacing it with `{"input": str(filtered_tool_input)}`. Consequently,
every multi-arg tool (e.g. ones in `deepagents` like `execute`,
`edit_file`, `write_file`, `grep`, ...) appeared in LangSmith with a
stringified, escaped dump of its arguments — multi-line bash commands
rendered with `\n` and were effectively unreadable. Chain runs already
preserved dicts via `_get_chain_inputs`; tool runs are now symmetric.

## Changes
- Preserve `inputs` when it is already a `dict` in the `original` /
`original+chat` branch of `_TracerCore._create_tool_run`, falling back
to `{"input": input_str}` only when no structured payload was provided
- Add regression tests in the sync and async base-tracer suites that
pass a structured `inputs` to `on_tool_start` and assert the dict
survives onto the resulting `Run`
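
The branch change amounts to the rule below. `tool_run_inputs` is a hypothetical extraction, not the tracer's actual code.

```python
from typing import Any, Optional


def tool_run_inputs(inputs: Optional[Any], input_str: str) -> dict[str, Any]:
    """Keep structured tool inputs; fall back to the stringified form otherwise."""
    if isinstance(inputs, dict):
        return inputs
    return {"input": input_str}
```

A multi-line `command` argument therefore reaches the trace as a real string field, not as an escaped fragment inside `{"input": "..."}`.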

## Breaking change
Custom `BaseTracer` subclasses that parsed `Run.inputs["input"]` as a
stringified dict for tool runs will need to read the structured fields
directly. The shape now matches what `on_tool_start(inputs=...)` has
always received (that parameter was introduced alongside `_schema_format`
in the `astream_events` work) and what `streaming_events` consumers
already see.
2026-04-30 14:56:14 -04:00
Mason Daugherty
37be34be82 fix(core): make removal optional in warn_deprecated (#37056)
Drop the `NotImplementedError` branch in `warn_deprecated` so callers
can pass `pending=False` without specifying a `removal` version. The
previous behavior contradicted the docstring (which claimed an empty
default would auto-compute a removal version) — no such computation
existed; the function just raised a placeholder "Need to determine which
default deprecation schedule to use" error.
2026-04-28 11:05:31 -04:00
Sharvil Saxena
78546e9242 fix(core): validate batch_size in _batch and _abatch to prevent infinite loop (#36663) 2026-04-26 15:13:20 -04:00
Nick Hollon
9ce72eba9f feat(core): add content-block-centric streaming (v2) (#36834) 2026-04-24 11:36:17 -04:00
Hunter Lovell
9a671d7919 feat(core): allow _format_output to pass through list of ToolOutputMixin instances (#36963) 2026-04-23 13:49:46 -04:00
Jacob Lee
40026a7282 feat(core): Update inheritance behavior for tracer metadata for special keys (#36900)
JS equivalent: https://github.com/langchain-ai/langchainjs/pull/10733
2026-04-20 14:58:01 -07:00
Eugene Yurtsev
b00646d882 chore(core): keep checkpoint_ns behavior in streaming metadata for backwards compat (#36828)
minor buglet
2026-04-16 15:17:20 -04:00
Jacob Lee
c04e05feb1 feat(core): Add chat model and LLM invocation params to traceable metadata (#36771)
Equivalent to: https://github.com/langchain-ai/langchainjs/pull/10711/

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2026-04-16 18:30:54 +00:00
ccurme
338aa8131a fix(core): restore cloud metadata IPs and link-local range in SSRF policy (#36816) 2026-04-16 09:15:42 -04:00
ccurme
7d601dc2c6 chore(core): harden private SSRF utilities (#36768) 2026-04-15 16:13:20 -04:00
Jacob Lee
a6eb829701 fix(core): Use reference counting for storing inherited run trees to support garbage collection (#36660)
When a langsmith `@traceable` function invokes a LangChain Runnable or
LangGraph subgraph, the callback manager's `_configure` function injects
the `@traceable` RunTree into the `LangChainTracer`'s `run_map` so that
child runs can resolve their parent for trace nesting. However, since
the RunTree was created outside the tracer's callback lifecycle,
`_end_trace` never removes it. The entry persists in `run_map`
indefinitely, retaining the full RunTree and its entire child tree.

In applications with nested subgraph invocations (e.g. an outer
investigation graph delegating to skill agent subgraphs, each compiled
as its own `StateGraph`), this leak causes RunTree objects to accumulate
linearly with every call.

**Fix:** Track which `run_map` entries were injected externally via a
shared `_external_run_ids` refcount dict on `_TracerCore`. When
`_start_trace` adds a child under an external parent, it increments the
count. When `_end_trace` finishes a child, it decrements — and evicts
the external parent from `run_map` once the last child completes.

The refcount (rather than a simple set) is necessary because a single
external parent may have multiple sibling children in the callback chain
(e.g. a `prompt | llm` `RunnableSequence`). Only truly external runs are
tracked — the `_configure` guard `if run_id_str not in handler.run_map`
prevents tracer-managed runs from being misclassified.
2026-04-13 09:50:37 -04:00
Eugene Yurtsev
af4d711a2f chore(core): reduce streaming metadata / perf (#36588)
- Look into reducing streaming metadata and improving performance

---------

Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>
2026-04-10 10:47:54 -04:00
Eugene Yurtsev
af2ed47c6f fix(core): add more sanitization to templates (#36612)
add more sanitization to templates
2026-04-08 14:10:10 -04:00
ccurme
7629c74726 fix(core): handle symlinks in deprecated prompt save path (#36585)
Resolve symlinks before validating file extensions in the deprecated
`save()` method on prompt classes.

Credit to Jeff Ponte (@JDP-Security) for reporting the symlink
resolution issue.
2026-04-07 10:45:42 -04:00
Michael Chin
ebecdddb1b fix(core): add init validator and serialization mappings for Bedrock models (#34510)
Adds serialization mappings for `ChatBedrockConverse` and `BedrockLLM`
to unblock standard tests on `langchain-core>=1.2.5` (context:
[langchain-aws#821](https://github.com/langchain-ai/langchain-aws/pull/821)).
Also introduces a class-specific validator system in
`langchain_core.load` that blocks deserialization of AWS Bedrock models
when `endpoint_url` or `base_url` parameters are present, preventing
SSRF attacks via crafted serialized payloads.

Closes #34645

## Changes
- Add `ChatBedrockConverse` and `BedrockLLM` entries to
`SERIALIZABLE_MAPPING` in `mapping.py`, mapping legacy paths to their
`langchain_aws` import locations
- Add `validators.py` with `_bedrock_validator` — rejects
deserialization kwargs containing `endpoint_url` or `base_url` for all
Bedrock-related classes (`ChatBedrock`, `BedrockChat`,
`ChatBedrockConverse`, `ChatAnthropicBedrock`, `BedrockLLM`, `Bedrock`)
- `CLASS_INIT_VALIDATORS` registry covers both serialized (legacy) keys
and resolved import paths from `ALL_SERIALIZABLE_MAPPINGS`, preventing
bypass via direct-path payloads
- Move kwargs extraction and all validator checks
(`CLASS_INIT_VALIDATORS` + `init_validator`) in `Reviver.__call__` to
run **before** `importlib.import_module()` — fail fast on security
violations before executing third-party code
- Class-specific validators are independent of `init_validator` and
cannot be disabled by passing `init_validator=None`

## Testing
- `test_validator_registry_keys_in_serializable_mapping` — structural
invariant test ensuring every `CLASS_INIT_VALIDATORS` key exists in
`ALL_SERIALIZABLE_MAPPINGS`
- 10 end-to-end `load()` tests covering all Bedrock class paths (legacy
aliases, resolved import paths, `ChatAnthropicBedrock`,
`init_validator=None` bypass attempt)
- Unit tests for `_bedrock_validator` covering `endpoint_url`,
`base_url`, both params, and safe kwargs

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
2026-04-03 19:22:39 -04:00
ccurme
0b5f2c08ee fix(core): harden check for txt files in deprecated prompt loading functions (#36471) 2026-04-02 16:42:48 -04:00
jasiecky
c9f51aef85 fix(core): fixed typos in the documentation (#36459)
Fixes #36458 

Fixed typos in the documentation in the core module.
2026-04-02 11:32:12 -04:00
Weiguang Li
e6c1b29e80 fix(core): add "computer" to _WellKnownOpenAITools (#36261) 2026-03-29 08:54:42 -04:00
Jacob Lee
389f7ad1bc revert: Revert "fix(core): trace invocation params in metadata" (#36322) 2026-03-27 19:14:02 -04:00
ccurme
27add91347 fix(core): validate paths in prompt.save and load_prompt, deprecate methods (#36200) 2026-03-24 14:27:14 -04:00
Mason Daugherty
2f64d80cc6 fix(core,model-profiles): add missing ModelProfile fields, warn on schema drift (#36129)
PR #35788 added 7 new fields to the `langchain-profiles` CLI output
(`name`, `status`, `release_date`, `last_updated`, `open_weights`,
`attachment`, `temperature`) but didn't update `ModelProfile` in
`langchain-core`. Partner packages like `langchain-aws` that set
`extra="forbid"` on their Pydantic models hit `extra_forbidden`
validation errors when Pydantic encountered undeclared TypedDict keys at
construction time. This adds the missing fields, makes `ModelProfile`
forward-compatible, provides a base-class hook so partners can stop
duplicating model-profile validator boilerplate, migrates all in-repo
partners to the new hook, and adds runtime + CI-time warnings for schema
drift.

## Changes

### `langchain-core`
- Add `__pydantic_config__ = ConfigDict(extra="allow")` to
`ModelProfile` so unknown profile keys pass Pydantic validation even on
models with `extra="forbid"` — forward-compatibility for when the CLI
schema evolves ahead of core
- Declare the 7 missing fields on `ModelProfile`: `name`, `status`,
`release_date`, `last_updated`, `open_weights` (metadata) and
`attachment`, `temperature` (capabilities)
- Add `_warn_unknown_profile_keys()` in `model_profile.py` — emits a
`UserWarning` when a profile dict contains keys not in `ModelProfile`,
suggesting a core upgrade. Wrapped in a bare `except` so introspection
failures never crash model construction
- Add `BaseChatModel._resolve_model_profile()` hook that returns `None`
by default. Partners can override this single method instead of
redefining the full `_set_model_profile` validator — the base validator
calls it automatically
- Add `BaseChatModel._check_profile_keys` as a separate
`model_validator` that calls `_warn_unknown_profile_keys`. Uses a
distinct method name so partner overrides of `_set_model_profile` don't
inadvertently suppress the check
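The unknown-key warning can be sketched against a `TypedDict` schema by comparing the profile's keys to the schema's declared annotations. `ExampleProfile` and `warn_unknown_profile_keys` are hypothetical stand-ins. This sketch also catches `Exception` instead of using a bare `except`.

```python
import warnings
from typing import Any, TypedDict


class ExampleProfile(TypedDict, total=False):
    """Hypothetical, trimmed-down stand-in for `ModelProfile`."""

    max_input_tokens: int
    name: str
    temperature: bool


def warn_unknown_profile_keys(
    profile: dict[str, Any], schema: type = ExampleProfile
) -> set[str]:
    """Warn about profile keys the schema does not declare; return them."""
    try:
        unknown = set(profile) - set(schema.__annotations__)
    except Exception:
        # Introspection problems must never break model construction.
        return set()
    if unknown:
        warnings.warn(
            f"Profile keys not declared in {schema.__name__}: {sorted(unknown)}; "
            "the package defining the schema may need an upgrade.",
            UserWarning,
            stacklevel=2,
        )
    return unknown
```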

### `langchain-profiles` CLI
- Add `_warn_undeclared_profile_keys()` to the CLI (`cli.py`), called
after merging augmentations in `refresh()` — warns at profile-generation
time (not just runtime) when emitted keys aren't declared in
`ModelProfile`. Gracefully skips if `langchain-core` isn't installed
- Add guard test
`test_model_data_to_profile_keys_subset_of_model_profile` in
model-profiles — feeds a fully-populated model dict to
`_model_data_to_profile()` and asserts every emitted key exists in
`ModelProfile.__annotations__`. CI fails before any release if someone
adds a CLI field without updating the TypedDict

### Partner packages
- Migrate all 10 in-repo partners to the `_resolve_model_profile()`
hook, replacing duplicated `@model_validator` / `_set_model_profile`
overrides: anthropic, deepseek, fireworks, groq, huggingface, mistralai,
openai (base + azure), openrouter, perplexity, xai
- Anthropic retains custom logic (context-1m beta → `max_input_tokens`
override); all others reduce to a one-liner
- Add `pr_lint.yml` scope for the new `model-profiles` package
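The `_resolve_model_profile()` hook pattern can be sketched in plain Python, with no pydantic validators. The classes and profile data below are hypothetical. The rule that an explicitly supplied profile takes precedence over the hook is an assumption of this sketch.

```python
from typing import Any, Optional

# Hypothetical per-model profile data a partner package might ship.
_PARTNER_PROFILES: dict[str, dict[str, Any]] = {
    "example-large": {"max_input_tokens": 200_000},
}


class BaseChatModelSketch:
    """Plain-Python stand-in for the base class (no pydantic validators)."""

    def __init__(self, model: str, profile: Optional[dict[str, Any]] = None) -> None:
        self.model = model
        self.profile = profile
        self._set_model_profile()  # plays the role of the base validator

    def _resolve_model_profile(self) -> Optional[dict[str, Any]]:
        # Default hook: no profile data.
        return None

    def _set_model_profile(self) -> None:
        # Assumption: an explicitly supplied profile wins over the hook.
        if self.profile is None:
            self.profile = self._resolve_model_profile()


class PartnerChatModel(BaseChatModelSketch):
    # A partner overrides only the hook instead of redefining the validator.
    def _resolve_model_profile(self) -> Optional[dict[str, Any]]:
        return _PARTNER_PROFILES.get(self.model)
```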
2026-03-23 00:44:27 -04:00
Mason Daugherty
5ffece5c03 chore(core): remove stale blockbuster allowlist for deleted context module (#36168)
Closes #29530

---

Remove a stale BlockBuster allowlist entry in `conftest.py` referencing
`aconfig_with_context` — the function and its containing module
(`langchain_core/beta/runnables/context.py`) were deleted in `fded6c6b1`
(Sep 2025, #32850). Spotted by @antonio-mello-ai in #29530.
2026-03-22 20:39:55 -04:00
ccurme
70c88c0e72 fix(core): trace invocation params in metadata (#36080) 2026-03-18 13:20:18 -04:00
Eugene Yurtsev
dd136337d7 feat(core): harden anti-ssrf (#35960)
harden anti-ssrf

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2026-03-18 10:41:43 -04:00
Mohammad Mohtashim
b21c0a8062 fix(core): preserve default_factory when generating tool call schema (#35550) 2026-03-08 15:34:21 -04:00
Mason Daugherty
61fd90a2f3 fix(core): extract usage metadata from serialized tracer message outputs (#35526)
Fixes missing `run.metadata.usage_metadata` population in
`LangChainTracer` for real LLM/chat traces following #34414

- Fix extraction to read usage from serialized tracer message shape:
`outputs.generations[*][*].message.kwargs.usage_metadata`
- Remove non-serialized direct message shape handling
(`message.usage_metadata`) from extractor to match real tracer output
path
- Clarify tracer docstrings around chat callback naming
(`on_chat_model_start` + shared `on_llm_end`) to reduce ambiguity

## Why

#34414 introduced usage duplication into `run.metadata.usage_metadata`,
but the extractor read `message.usage_metadata`.

In real tracer flow, messages are serialized with `dumpd(...)` during
run completion, so usage metadata lives under
`message.kwargs.usage_metadata`. Because of this mismatch, duplication
did not trigger in real traces.
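The difference between the two shapes can be sketched with a small extractor that reads only the serialized path `outputs.generations[*][*].message.kwargs.usage_metadata`. `extract_usage_metadata` is a hypothetical helper, not the tracer's code. Summing integer counters across generations is an assumption of this sketch.

```python
from typing import Any, Optional


def extract_usage_metadata(outputs: dict[str, Any]) -> Optional[dict[str, int]]:
    """Read usage from serialized tracer outputs (hypothetical helper).

    Assumption: integer counters are summed across generations; nested
    detail dicts are ignored for brevity.
    """
    totals: dict[str, int] = {}
    found = False
    for batch in outputs.get("generations") or []:
        for generation in batch:
            message = generation.get("message")
            if not isinstance(message, dict):
                continue
            # Serialized messages keep their constructor arguments under "kwargs".
            usage = (message.get("kwargs") or {}).get("usage_metadata")
            if not isinstance(usage, dict):
                continue
            found = True
            for key, value in usage.items():
                if isinstance(value, int) and not isinstance(value, bool):
                    totals[key] = totals.get(key, 0) + value
    return totals if found else None
```

A message carrying `usage_metadata` at its top level (the non-serialized shape) yields `None` here. That matches the decision to stop handling that shape.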
2026-03-02 17:43:33 -05:00
Guofang.Tang
78678534f9 fix(core): treat empty tool chunk ids as missing in merge (#35414) 2026-02-24 18:12:49 -05:00