langchain

mirror of https://github.com/hwchase17/langchain.git synced 2026-07-01 14:47:02 +00:00

Author	SHA1	Message	Date
Mason Daugherty	86ce95afc2	test(core,langchain): update tests for explicit deserialization allowlists (#38118 ) Core serialization tests now opt into the object allowlists they rely on instead of assuming default deserialization permits core objects. Compatibility tests that intentionally exercise deprecated runnable streaming and history APIs also suppress the expected deprecation warnings so they can keep covering those legacy paths cleanly. ## Changes - Updated serialization and prompt round-trip tests to pass `allowed_objects="core"` or targeted allowlists when loading `AIMessage`, prompt templates, structured prompts, runnable maps, and related core objects. - Adjusted secret-injection regression coverage to keep testing `secrets_from_env=True` behavior while explicitly allowing core deserialization paths. - Tightened prompt deserialization rejection tests so attribute-access payloads are loaded only through the specific prompt-template allowlist needed to reach validation. - Added module-level warning filters around legacy runnable compatibility coverage for `astream_log`, `astream_events(version="v1")`, and `RunnableWithMessageHistory`. - Bumped the `langchain` package's minimum `langgraph` dependency from `1.2.4` to `1.2.5`. ## Testing - Updated unit tests across core serialization, prompt, fake chat model, runnable history, and runnable event coverage.	2026-06-12 16:49:14 -04:00
Mason Daugherty	4108c0738c	release(core): 1.4.7 (#38111) Bumps `langchain-core` to `1.4.7` for the next patch release and updates downstream minimum `langchain-core` requirements so package locks resolve against the new core version. This also refreshes the runnable snapshots that embed `lc_versions` metadata so the version consistency check continues to validate checked-in artifacts. Validated with `python libs/core/scripts/check_version.py`, `uv lock --check` across package lockfiles, and the core runnable tests that own the updated snapshots with local LangSmith tracing env disabled.	2026-06-12 14:54:25 -04:00
Mason Daugherty	8837163917	fix(core,partners): rename package version trace metadata (#38110 ) Package-version trace metadata now uses the LangChain-owned `metadata["lc_versions"]` convention instead of the user-owned `metadata["versions"]` key. Metadata merging is narrowed so only `lc_versions` accumulates nested package-version entries, while generic nested metadata keeps normal last-writer-wins behavior. ## Changes - Renamed `BaseLanguageModel._add_version()` trace metadata from `versions` to `lc_versions`, including docstrings and the non-dict replacement warning. - Scoped `_merge_metadata_dicts()` nested-map accumulation to only `lc_versions`; duplicate package entries remain last-writer-wins and `lc_versions` mappings are copied defensively. - Preserved user-owned `metadata["versions"]` semantics by keeping it out of package-version tracking and generic nested metadata merging. - Updated runnable snapshots and partner package metadata assertions across Anthropic, DeepSeek, Fireworks, Groq, Hugging Face, MistralAI, Ollama, OpenAI, OpenRouter, Perplexity, and xAI to expect `lc_versions`. ## Testing - Added/adjusted core tests for `lc_versions` accumulation, duplicate package overwrite behavior, non-dict `lc_versions` replacement, defensive copying, and `metadata["versions"]` last-writer-wins behavior. - Ran focused core and partner metadata tests plus Ruff checks for changed areas.	2026-06-12 14:26:32 -04:00
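The scoped merge described in the `lc_versions` commit above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's code: `merge_metadata_sketch` and `ACCUMULATING_KEY` are hypothetical names, and the real `_merge_metadata_dicts()` may differ in its details.

```python
from collections.abc import Mapping
from typing import Any

# Hypothetical constant: the only key whose nested entries accumulate.
ACCUMULATING_KEY = "lc_versions"


def merge_metadata_sketch(
    base: Mapping[str, Any], incoming: Mapping[str, Any]
) -> dict[str, Any]:
    """Merge two metadata dicts; only `lc_versions` accumulates nested entries."""
    merged = dict(base)
    # Copy the nested mapping so callers never share it with the result.
    if isinstance(merged.get(ACCUMULATING_KEY), Mapping):
        merged[ACCUMULATING_KEY] = dict(merged[ACCUMULATING_KEY])
    for key, value in incoming.items():
        if key == ACCUMULATING_KEY and isinstance(value, Mapping):
            current = merged.get(ACCUMULATING_KEY)
            prior = current if isinstance(current, Mapping) else {}
            # Duplicate package names are last-writer-wins.
            merged[ACCUMULATING_KEY] = {**prior, **value}
        else:
            # Every other key, including user-owned "versions", is replaced.
            merged[key] = value
    return merged


base = {"lc_versions": {"langchain-core": "1.4.7"}, "versions": {"app": "1"}}
incoming = {"lc_versions": {"langchain-openai": "1.3.0"}, "versions": {"svc": "2"}}
result = merge_metadata_sketch(base, incoming)
```

Here `result["lc_versions"]` holds both package entries, while `result["versions"]` is simply the incoming value.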
Christophe Bornet	0392b6bae4	fix(core): fix Pydantic v1 support in tools/runnable (#33698 ) `BaseTool.args_schema` is documented as accepting a Pydantic v1 model, but several code paths assumed v2 and raised when handed a v1 schema (e.g. an `AttributeError` from calling `model_json_schema()`/`model_fields` on a v1 model). This affected anyone using a v1 `args_schema`, and anyone composing runnables whose input/output schema is a v1 model. This PR makes the tool/runnable schema-derivation code version-agnostic. ## Type contract `TypeBaseModel` (and `PydanticBaseModel`) now include `pydantic.v1.BaseModel`, so the type honestly reflects what tools and runnables already accept at runtime. The public schema accessors (`Runnable.get_input_schema`/`get_output_schema` and the `input_schema`/`output_schema` properties) return `TypeBaseModel`. ## Version-agnostic helpers Added to `langchain_core.utils.pydantic`, each dispatching on the model's Pydantic version so callers don't have to: - `model_json_schema(model)` — JSON schema for either version. - `model_validate(model, obj)` — validation for either version. - `get_fields(model)` — field map for either version (existing helper, now used consistently). Internally, direct `.model_json_schema()` / `.model_fields` calls are replaced with these helpers (or with `get_input_jsonschema()` / `get_output_jsonschema()`). ## Behavior change worth a close look When deriving a schema from a v1 model (in `RunnableParallel`, `RunnableAssign`, and `RunnableSequence` output schemas), a required v1 field is now correctly carried over as required. Previously the v1 path read the field's `default` — which is `None` for a required v1 field — and silently turned required fields into optional/nullable ones; `default_factory` fields were dropped entirely. The new `_get_schema_field_definition` helper translates a v1 `ModelField` faithfully (required → `...`, factory preserved) and dispatches explicitly on the field type. 
--------- Co-authored-by: Mason Daugherty <mason@langchain.dev> Co-authored-by: Mason Daugherty <github@mdrxy.com>	2026-06-12 00:18:49 -04:00
Mason Daugherty	05cc55f1bc	release(core): 1.4.6 (#38061)	2026-06-11 02:58:40 -04:00
Mason Daugherty	948f6cc58c	feat(core,partners): add package version tracking to tracing metadata (#35295 ) Following on the heels of #35293 TODO: - Packages outside of this repo (e.g. LiteLLM, Nvidia, Google, AWS) --- ## Summary Surface partner package versions in `metadata.versions` on LangSmith traces. Mirrors the JS SDK's `_addVersion()` pattern ([langchainjs#10106](https://github.com/langchain-ai/langchainjs/pull/10106)). Each model constructor records its package version via `_add_version()` on `BaseLanguageModel`. The version dict accumulates through the class hierarchy — `langchain-core` is added in `BaseLanguageModel.model_post_init`, `langchain-openai` in `BaseChatOpenAI._set_openai_chat_version`, and each leaf partner in its uniquely-named `model_validator`. Traces end up with: ```json { "metadata": { "versions": { "langchain-core": "1.4.5", "langchain-openai": "1.3.0", "langchain-xai": "1.2.2" } } } ``` ### Changes - `BaseLanguageModel._add_version(pkg, version)` — appends to `self.metadata["versions"]`; accepts any `Mapping` type; emits a warning if a non-mapping value is found and replaced - `BaseLanguageModel.model_post_init` — adds `langchain-core` version; calls `super()` for MRO safety - `_merge_metadata_dicts` — one-level-deep (non-recursive) merge for nested dict metadata keys - `CallbackManager.add_metadata` — uses `_merge_metadata_dicts` instead of flat `dict.update()` so nested metadata dicts (like `versions`) coexist rather than clobber - `merge_configs` — uses `_merge_metadata_dicts` for config merging Partners: - Each now calls `self._add_version("langchain-<pkg>", __version__)` ### Design decisions - Constructor-based, not `_get_ls_params`-based — versions flow through `self.metadata` (local metadata on traces), not through `LangSmithParams`. This matches JS and makes child-class version inheritance automatic (no merge/clobber issues). 
- `versions` is local (non-inheritable) metadata — `self.metadata` is passed to `CallbackManager.configure` as `local_metadata` (`add_metadata(..., inherit=False)`), so `versions` is attached once per chat-model run and is not propagated to child runs or duplicated onto every streaming chunk. This is intentionally the opposite of the inheritable-per-chunk metadata that #36588 was reducing for performance — `versions` does not regress that path. - `add_metadata` deep-merge is a correctness fix, not just for versions — previously `add_metadata`/`merge_configs` did a flat top-level `dict.update`/spread, so any nested metadata dict baked into a config (e.g. via `.with_config({"metadata": {...}})`) would be wholly replaced when a caller also passed `metadata`. `_merge_metadata_dicts` merges one level deep so user-provided `config.metadata.versions` and model-set `versions` coexist instead of clobbering. The merge runs once per `configure` (not per chunk), so it is off the streaming hot path. - One level deep only — `_merge_metadata_dicts` is deliberately not a recursive deep merge; values nested more than one level are last-writer-wins. This covers the `versions` case without the ambiguity/cost of arbitrary-depth merging. - Warn on non-dict `metadata["versions"]` — if a user sets `metadata={"versions": "some-string"}`, `_add_version` emits a warning and replaces the value with the version dict rather than silently discarding user data or crashing. This is a soft breaking change for anyone who previously stored non-dict values at this key. ### Follow-ups (tracked separately, out of scope here) - JS `mergeConfigs` still flat-spreads nested metadata, so `metadata.versions` can still clobber on the JS side until an equivalent deep-merge lands. --- Made by [Open SWE](https://openswe.vercel.app) --------- Co-authored-by: open-swe[bot] <open-swe@users.noreply.github.com>	2026-06-11 02:23:19 -04:00
Mason Daugherty	86428c63ac	fix(core,openai): normalize v1 streamed tool calls (#35983 ) OpenAI Chat Completions streaming has a v1 normalization gap when tool calls are streamed. When users opt into `output_version="v1"`, `.content_blocks` is expected to be the normalized cross-provider view of the message. For OpenAI Chat Completions streams, though, chunks still carry raw string `content` plus side-channel `tool_call_chunks` / `tool_calls`. Practically, an OpenAI stream chunk can look like this internally: ```python AIMessageChunk( content="", tool_call_chunks=[ { "name": "get_weather", "args": '{"location": "SF"}', "id": "call_123", "index": 0, "type": "tool_call_chunk", } ], response_metadata={"model_provider": "openai", "output_version": "v1"}, ) ``` That is not already-normalized v1 content like this: ```python AIMessageChunk( content=[ { "type": "tool_call_chunk", "name": "get_weather", "args": '{"location": "SF"}', "id": "call_123", "index": 0, } ], ) ``` Because `.content_blocks` currently short-circuits solely on `output_version="v1"`, it can return the raw string/empty list directly instead of running the OpenAI translator that incorporates `tool_call_chunks` / `tool_calls` into normalized v1 blocks. In practice, a streamed OpenAI tool call can be parsed successfully into `tool_calls`, but still be missing from the final aggregated `.content_blocks`. Downstream code that consumes the v1 block interface then sees no `tool_call` block and must know to inspect OpenAI-specific chunk fields instead. User story: > As a LangChain user streaming OpenAI Chat Completions with bound tools and `output_version="v1"`, I need the final aggregated message's `.content_blocks` to include normalized `tool_call` blocks, so that code written against the v1 content-block interface handles streamed tool calls consistently across providers. 
Expected final aggregated view: ```python message.content_blocks == [ { "type": "tool_call", "name": "get_weather", "args": {"location": "SF"}, "id": "call_123", } ] ``` Root causes: 1. The usage-only Chat Completions chunk uses `content=[]` in v1 mode while normal streaming chunks use `content=""`, creating inconsistent content types during chunk aggregation. 2. `AIMessage.content_blocks` and `AIMessageChunk.content_blocks` treat any `output_version="v1"` message as already-normalized, even when `content` is still raw string content from Chat Completions. 3. Content-bearing OpenAI stream chunks do not carry `output_version="v1"`, so the final merged chunk may not reliably take the v1 normalization path. Changes: - Keep usage-only Chat Completions chunks as `content=""` instead of overriding to `[]`, so streaming chunks merge consistently. - Propagate `output_version="v1"` to content-bearing chunks. - Only short-circuit v1 `.content_blocks` when `content` is already a list of blocks; otherwise fall through to the provider translator. - Add regression tests covering string-content v1 fallback, usage-only chunk content consistency, and streamed tool calls appearing as normalized final v1 blocks.	2026-06-11 00:51:50 -04:00
Mason Daugherty	7cc9d0c84d	fix(core): async tracer `on_chat_model_start` fallback in sync context (#35233) Fixes #30870 When an `AsyncBaseTracer` with `_schema_format="original"` (the default) is used with sync `llm.invoke()`, the `on_chat_model_start` to `on_llm_start` fallback doesn't fire. The async handler returns a coroutine instead of raising `NotImplementedError` synchronously, so it bypasses the existing fallback logic and lands in `_run_coros`, which only logs the error generically. This fallback already works for sync handlers in sync context and async handlers in async context. This PR closes the gap for async handlers in sync context.	2026-06-10 22:15:29 -04:00
Mason Daugherty	6b9e22dbbc	fix(langchain): tighten structured output model fallbacks (#38042 ) Provider-native structured output fallback detection now uses bounded model-name patterns instead of broad substring checks, reducing false positives for unrelated model IDs. The model examples and test fixtures across OpenAI/OpenRouter-facing code were refreshed around current OpenAI model families while preserving shipped defaults. ## Changes - Tightened `FALLBACK_MODELS_WITH_STRUCTURED_OUTPUT` from loose string fragments to regex patterns, with `_supports_provider_strategy` matching full model-name segments instead of arbitrary substrings. - Expanded structured-output fallback coverage for newer OpenAI, Anthropic, and xAI/Grok model families, including `gpt-5.x`, newer Claude 4/5-style names, and `grok-build`. - Reused `_attempt_infer_model_provider` in provider tool search routing so `_provider_from_model_name` follows the same provider inference behavior as `init_chat_model`. - Suppressed irrelevant provider-inference deprecation warnings during provider tool search registry lookup. - Refreshed OpenAI, Azure OpenAI, OpenRouter, core metadata, and example model references from older fixtures like `gpt-4`, `gpt-4o`, `o1`, and `o4-mini` to current test/profile models such as `gpt-5.5`, `gpt-5-nano`, and `gpt-4.1-mini`. - Removed outdated OpenAI test assumptions around legacy `o1` behavior and narrowed legacy structured-output checks to explicitly legacy model names.	2026-06-10 21:18:14 -04:00
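The difference between substring checks and segment-bounded patterns in the commit above can be illustrated with a short sketch. The pattern and helper names below are hypothetical; the actual entries in `FALLBACK_MODELS_WITH_STRUCTURED_OUTPUT` and the logic of `_supports_provider_strategy` are not shown in the commit message and may differ.

```python
import re

# Hypothetical pattern for illustration only.
EXAMPLE_PATTERNS = [re.compile(r"gpt-5(?:\.\d+)?(?:-[a-z0-9]+)*")]


def matches_by_substring(model_name: str) -> bool:
    """Loose check: any occurrence of the fragment counts."""
    return "gpt-5" in model_name


def matches_by_segment(model_name: str) -> bool:
    """Bounded check: a whole '/'- or ':'-delimited segment must match."""
    segments = re.split(r"[/:]", model_name)
    return any(
        pattern.fullmatch(segment) is not None
        for pattern in EXAMPLE_PATTERNS
        for segment in segments
    )
```

With these definitions, `"openai/gpt-5-nano"` matches under both checks, while an unrelated ID such as `"my-gpt-5-clone"` matches only the loose substring check.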
Mason Daugherty	f89f4c5afe	fix(core): support content block tokens in callbacks (#34739 ) Supersedes #34727 Closes #30703 Related: * langchain-ai/langchain-google#1460 * langchain-ai/langchain-google#1501 Fixing this at the `langchain-core` callback layer instead of normalizing inside individual provider integrations, so structured streaming content is preserved consistently. --- Models are increasingly streaming structured content blocks instead of plain text tokens. For example, Gemini 3 can stream text as content-block lists, and Anthropic/tool-use flows can also produce non-text message content. Today those values already reach `on_llm_new_token`, but the callback API still advertises `token: str`, which makes custom callbacks, tracers, and streaming helpers assume every streamed value is text. User story: as a LangChain user building a streaming callback for chat models with tool calls, reasoning/thinking blocks, or provider-specific structured content, I need `on_llm_new_token` to accept the same content shape that chat model chunks can actually emit, so my callback can observe the stream without providers flattening or dropping non-text data. Fixing this in `langchain-core` makes the existing runtime behavior explicit at the shared callback boundary. Normalizing content blocks inside each provider would duplicate logic, produce inconsistent behavior across integrations, and in some cases lose required provider metadata such as Gemini thought signatures. 
## Changes - Update the callback contract so streamed tokens can be either plain text or structured content blocks - Carry structured streamed content through tracing and event/log streaming paths without forcing provider data into text too early - Keep built-in text-oriented streaming callbacks working by converting structured tokens only at the display/queue boundary - Drop the now-incorrect `cast("str", ...)` on streamed content in `BaseChatModel` so the producer side matches the widened callback signature instead of asserting a string it doesn't always have (no runtime change — `cast` is erased) - Align Anthropic and Mistral content typing with the structured content shapes already used by chat model messages - Update callback tests to reflect that not every streamed value is text ## Compatibility No runtime behavior change: no producer emits anything it wasn't already emitting, and widening a parameter type is safe for existing callers and handlers that pass or receive `str`. The one caveat is downstream code that subclasses a callback handler or tracer and overrides `on_llm_new_token` with a `token: str` annotation — under strict type checking that override is now narrower than the base and will be flagged as incompatible with the supertype. Such code still runs unchanged; the fix is to widen the annotation to match.	2026-06-10 16:59:08 -04:00
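As a rough sketch of "converting structured tokens only at the display/queue boundary", a text-oriented callback could flatten a widened token as shown here. `token_to_text` is a hypothetical helper, and the `{"type": "text", "text": ...}` block shape is an assumption; the conversion used by the built-in callbacks may differ.

```python
from typing import Any, Union

# Assumed shape of a streamed token after the callback contract is widened.
Token = Union[str, list[Union[str, dict[str, Any]]]]


def token_to_text(token: Token) -> str:
    """Flatten a streamed token to display text, skipping non-text blocks."""
    if isinstance(token, str):
        return token
    parts: list[str] = []
    for block in token:
        if isinstance(block, str):
            parts.append(block)
        elif block.get("type") == "text":
            parts.append(str(block.get("text", "")))
        # Other block types (reasoning, tool calls, ...) are left to
        # handlers that understand them; they are not rendered as text.
    return "".join(parts)
```

The structured token itself stays untouched for tracers and event streams; only the display path calls the helper.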
Christophe Bornet	720dfd3b09	chore(core): improve typing of Runnable `__or__` (#34530) `Runnable.__or__`, `Runnable.__ror__`, and their `RunnableSequence` and `StructuredPrompt` overrides previously erased composition types: the right-hand operand was typed `Runnable[Any, Other]`, so piping two runnables together always produced `RunnableSerializable[Input, Any]`. Type information was lost at every `|`, which is why chains so often needed a `chain: Runnable = ...` annotation just to recover usable inference. This adds `@overload`s so the `Output` of one step flows into the `Input` of the next and the composed result carries the real `Output` type through. `Runnable[int, str] | Runnable[str, float]` now infers `RunnableSerializable[int, float]` instead of `[int, Any]`. `coerce_to_runnable` gains overloads so a `Mapping` resolves to `RunnableParallel` while everything else stays a `Runnable`. As a knock-on effect, dozens of now-unnecessary `: Runnable` annotations were dropped from the test suite. Runtime behavior is unchanged; this is a typing-only change. ## Impact on type-checked code Most users will simply get better inference. Two changes can require a small adjustment if you run a type checker (`mypy`, `pyright`): ### Stricter operand matching in `|` The right-hand side of `|` is now typed `Runnable[Output, Other]` rather than `Runnable[Any, Other]`, so the right operand's declared input must match the left operand's output. This is more accurate, but it surfaces a common pattern that was previously silent: piping a step that outputs a plain `dict` into a step whose declared input is a more specific type (for example a `TypedDict`). It still works at runtime; the checker now reports an `[operator]` error. 
If you hit this, narrow the boundary with a `cast` (or an explicit annotation):

```python
from typing import Any, cast

from langchain_core.runnables import Runnable

# upstream outputs a dict; downstream declares a narrower input type
chain = cast("Runnable[Any, MyInput]", upstream) | downstream
```

### `list` → `Sequence` on `RunnableEach` / `map()`
`Runnable.map()` and the `invoke` / `ainvoke` methods of `RunnableEach` now accept `Sequence[Input]` instead of `list[Input]`. Callers are unaffected: a `list` is a `Sequence`, and tuples or other sequences now type-check too. The only thing to adjust: if you subclass `RunnableEach` (or `RunnableEachBase`) and override these methods with a `list[...]` parameter, widen the annotation to `Sequence[...]` so the override stays compatible with the base signature. --------- Co-authored-by: Mason Daugherty <github@mdrxy.com>	2026-06-10 16:16:03 -04:00
Christophe Bornet	a063ec26dd	chore(core): fix some `any` generics (#34545) Co-authored-by: Mason Daugherty <github@mdrxy.com>	2026-06-10 15:32:14 -04:00
Mason Daugherty	8bc96308d0	fix(core): accept sequence tool error content (#38005) `handle_tool_error` callables can now return structured message content as any valid sequence, not just a mutable `list`. Valid structured sequences are normalized to the `ToolMessage` content shape at the tool output boundary, while invalid content still falls back to stringification. ## Changes - Widened `ToolExceptionHandlerOutput` from `list[str | dict[str, Any]]` to `Sequence[MessageContentBlock]` so handlers returning `list[dict[str, Any]]` or tuple content blocks type-check cleanly. - Added `_normalize_message_content` to validate structured message content and convert valid non-string sequences to the `list` shape expected by `ToolMessage`. - Preserved existing stringification behavior for invalid structured content blocks instead of treating failed normalization as `None`. - Removed the now-unused `_is_message_content_type` helper; output formatting validates content directly through `_normalize_message_content`.	2026-06-09 22:35:33 -04:00
Mason Daugherty	0f1b291f42	fix(core): type structured tool error handler output (#38003) `handle_tool_error` callables can already return structured message content at runtime, but the public typing only allowed strings. The tool error handling API now reflects the existing output formatting path, including clearer docs for how handled errors become `ToolMessage(status="error")` results.	2026-06-09 21:18:19 -04:00
Nidhi Rajani	0f45b2c285	feat(openai): support `apply_patch` built-in tool (#37157 ) [Docs](https://github.com/langchain-ai/docs/pull/4370) Fixes #37031 Adds support for OpenAI Responses API `apply_patch` built-in tool. This PR: - Adds `apply_patch` to the OpenAI well-known tools list so `bind_tools([{"type": "apply_patch"}])` works. - Preserves `apply_patch_call` and `apply_patch_call_output` items when converting OpenAI Responses API outputs into LangChain `AIMessage.content`. - Preserves the same item types in streaming `AIMessageChunk` conversion. - Supports round-trip input conversion for `apply_patch_call` and `apply_patch_call_output`. - Adds unit tests for core tool passthrough, non-streaming conversion, streaming conversion, and round-trip input conversion. ## Testing - `cd libs/core && uv run --group test pytest tests/unit_tests/utils/test_function_calling.py -k "apply_patch" -vv` - `cd libs/partners/openai && uv run --group test pytest tests/unit_tests/chat_models/test_base.py -k "apply_patch" -vv` - `cd libs/core && uv run --all-groups ruff check langchain_core/utils/function_calling.py tests/unit_tests/utils/test_function_calling.py` - `cd libs/partners/openai && uv run --all-groups ruff check langchain_openai/chat_models/base.py tests/unit_tests/chat_models/test_base.py` --------- Co-authored-by: Mason Daugherty <github@mdrxy.com> Co-authored-by: Mason Daugherty <mason@langchain.dev>	2026-06-09 16:13:37 -04:00
Christophe Bornet	74c23741b0	feat(core): deprecate problematic `dict()` method (#31685) `dict()` is a problematic method name as it clashes with the builtin `dict` used as a type annotation. This PR replaces it with an `asdict` method (inspired by dataclasses). It also fixes a few places where `dict` must be replaced by `builtins.dict` until the `dict()` method is removed. --------- Co-authored-by: Mason Daugherty <github@mdrxy.com>	2026-06-08 14:11:05 -04:00
Mason Daugherty	053c368ba4	fix(core): remove Bedrock prevalidation from `load` (#37909) Removes the built-in Bedrock class init validator from `load` so Bedrock kwargs such as `base_url` and `endpoint_url` are no longer specially rejected during deserialization. This keeps provider-specific SSRF policy out of core; callers should continue to avoid untrusted manifests or use restrictive `allowed_objects`. Verified with `make format`, `make lint`, and the focused serialization load unit tests. AI-assisted contribution by Open SWE. Made by [Open SWE](https://openswe.vercel.app) --------- Co-authored-by: open-swe[bot] <215916821+open-swe[bot]@users.noreply.github.com>	2026-06-05 10:46:57 -04:00
Nick Hollon	3802938f1c	fix(core): accept `Serializable` constructor-envelope wire shape in `_convert_to_message` (#37456)	2026-05-15 15:34:04 -07:00
Nick Hollon	f42d80ca1c	fix(core): preserve chunk `additional_kwargs` across v3 stream assembly (#37435 ) The v3 streaming path drops `additional_kwargs` from per-chunk `AIMessageChunk`s during assembly: `chunks_to_events` emits no event field for them, and `ChatModelStream._assemble_message` constructs the final `AIMessage` without an `additional_kwargs` argument. Non-streaming `ainvoke` returns the provider message unchanged, so streaming and non-streaming diverge for any provider that uses `additional_kwargs` to carry data outside the typed protocol blocks. ## How this surfaces The concrete failure mode is Gemini's `__gemini_function_call_thought_signatures__` — a per-tool-call signature blob the Google GenAI integration places in `additional_kwargs`, keyed by `tool_call_id`. Gemini requires that signature on follow-up turns to replay the prior thought trace; without it, multi-turn streaming flows lose thought continuity (and may regenerate thinking, charging additional reasoning tokens, or in some cases refuse). Other providers that use `additional_kwargs` (e.g. older `function_call` accumulators, custom routing metadata) hit the same gap; the fix is intentionally provider-agnostic. ## Fix Provider-agnostic, two seams: - `_compat_bridge` accumulates `msg.additional_kwargs` across chunks with `merge_dicts` (matching `AIMessageChunk`'s own merge semantics for fields that accumulate, like `function_call`) and emits the merged dict on the `message-finish` event as an off-spec extension. The bridge already uses one such extension (`metadata` on `MessageFinishData`); this PR follows the same pattern for `additional_kwargs`. - `ChatModelStream._finish` reads the new field; `_assemble_message` threads it onto the final `AIMessage` only when non-empty, preserving today's behavior of leaving `additional_kwargs` empty when no provider data needs to ride on it.	2026-05-14 11:19:45 -07:00
Nick Hollon	649d82f206	fix(core): preserve reasoning blocks alongside tool_call in v3 stream (#37434 ) Closes #37420 --- `stream_events(version="v3")` (and the `astream_events` async twin) silently dropped reasoning content from the final assembled `AIMessage` whenever the same message also produced a tool_call. The bug reproduces against Gemini 2.5 Pro with `include_thoughts=True`: reasoning streams correctly through `ChatModelStream.reasoning`, but the persisted message in the final graph state carries only the `tool_call` block. ## Root cause `_iter_protocol_blocks` in the compat bridge groups per-chunk content blocks by source-side identifier. When a provider doesn't supply an `index` field on its content blocks — which the Google GenAI translator does not for either `reasoning` or `tool_call` blocks — the bridge falls back to positional `i` as the bucket key. Because Gemini typically emits one block per chunk, every reasoning chunk and the later tool_call chunk all key to `0`, and the type mismatch trips `_accumulate`'s self-contained `else` branch. That branch clears accumulated reasoning state and replaces it with the incoming tool_call, so reasoning never reaches `content-block-finish`. ## Fix When a block has no source-side `index`, key it by `("__lc_no_index__", block_type, positional_i)` instead of bare `i`. Same-type chunks at the same position still share a bucket and merge cleanly (streaming text and reasoning unchanged); different-type chunks at the same position now occupy distinct wire blocks and both reach `content-block-finish`. Providers that supply explicit indices (Anthropic, OpenAI Responses) are unaffected. ## Verification Unit-tested at the compat-bridge layer for both sync (`chunks_to_events`) and async (`achunks_to_events`) paths. Verified live against Gemini 2.5 Pro `gemini-2.5-pro` with `thinking_budget=2048`, `include_thoughts=True`, and a single `get_weather` tool. 
Pre-fix: `final_state.messages[tool_calling_ai_message].content == [{type: tool_call, ...}]`. Post-fix: `[..., {type: reasoning, reasoning: "..."}, {type: tool_call, ...}]`, matching the shape `ainvoke` returns on the same input.	2026-05-14 11:11:30 -07:00
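The bucket-keying fix described in this commit can be sketched as follows. The tuple key `("__lc_no_index__", block_type, positional_i)` comes from the commit message; `bucket_key` and `group_blocks` are hypothetical names, and the real `_iter_protocol_blocks` merges block contents rather than collecting them in lists.

```python
from collections.abc import Hashable
from typing import Any


def bucket_key(block: dict[str, Any], positional_i: int) -> Hashable:
    """Key a block by its source index, or by (marker, type, position)."""
    index = block.get("index")
    if index is not None:
        return index
    return ("__lc_no_index__", block.get("type"), positional_i)


def group_blocks(
    chunks: list[list[dict[str, Any]]],
) -> dict[Hashable, list[dict[str, Any]]]:
    """Group per-chunk blocks into buckets (simplified: no content merging)."""
    buckets: dict[Hashable, list[dict[str, Any]]] = {}
    for chunk in chunks:
        for i, block in enumerate(chunk):
            buckets.setdefault(bucket_key(block, i), []).append(block)
    return buckets


# One block per chunk and no `index`, as described for the Gemini case.
chunks = [
    [{"type": "reasoning", "reasoning": "Check the "}],
    [{"type": "reasoning", "reasoning": "forecast."}],
    [{"type": "tool_call", "name": "get_weather", "args": {"location": "SF"}}],
]
buckets = group_blocks(chunks)
```

Both reasoning chunks land in one bucket and the tool call in another, so neither displaces the other. With a bare positional key, all three would have shared key `0`.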
Nick Hollon	da380bccf8	chore(infra): merge v1.4 into master (#37350)	2026-05-11 11:39:25 -07:00
Mason Daugherty	8b21400627	fix(core): avoid eager `pydantic.v1` import in `@deprecated` (#37308 ) `langchain_core._api.deprecation` previously did `from pydantic.v1.fields import FieldInfo as FieldInfoV1` at module scope, which triggers Pydantic's `UserWarning("Core Pydantic V1 functionality isn't compatible with Python 3.14 or greater.")` on every `langchain_core` import under 3.14+. The v1 symbol is only needed inside one runtime branch of `@deprecated`, so it's now resolved lazily. ## Changes - Replace the top-level v1 `FieldInfo` import with `_is_pydantic_v1_field_info`, which probes `sys.modules.get("pydantic.v1.fields")` instead of forcing the import. The reconstruction inside `deprecated`'s `finalize` closure imports `FieldInfoV1` lazily, gated by the predicate — so the warning only fires if a caller has already loaded `pydantic.v1` themselves. - Add a subprocess-based regression test asserting that importing `langchain_core._api.deprecation` does not pull any `pydantic.v1*` module into `sys.modules`. Verified to fail when the eager import is reintroduced. - Add a v1 `FieldInfo` decoration test — the v1 branch of `@deprecated` previously had zero direct coverage. - Update the stale `# Last Any should be FieldInfoV1 but this leads to circular imports` comment on `T`'s bound, which no longer reflects the real reason (it's about the 3.14 warning, not circularity).	2026-05-09 20:35:17 -04:00
Nick Hollon	c979c6187b	fix(core, langchain): harden `load()` against untrusted manifests (#37197)	2026-05-05 14:36:58 -04:00
Mason Daugherty	a1f336fdc7	fix(core): preserve structured `inputs` on tool runs in tracers (#37108 ) Tool runs in `_TracerCore._create_tool_run` were discarding the structured `inputs` dict that `BaseTool.run` passes to `on_tool_start`, replacing it with `{"input": str(filtered_tool_input)}`. Consequently, every multi-arg tool (e.g. ones in `deepagents` like `execute`, `edit_file`, `write_file`, `grep`, ...) appeared in LangSmith with a stringified, escaped dump of its arguments — multi-line bash commands rendered with `\n` and were effectively unreadable. Chain runs already preserved dicts via `_get_chain_inputs`; tool runs are now symmetric. ## Changes - Preserve `inputs` when it is already a `dict` in the `original` / `original+chat` branch of `_TracerCore._create_tool_run`, falling back to `{"input": input_str}` only when no structured payload was provided - Add regression tests in the sync and async base-tracer suites that pass a structured `inputs` to `on_tool_start` and assert the dict survives onto the resulting `Run` ## Breaking change Custom `BaseTracer` subclasses that parsed `Run.inputs["input"]` as a stringified dict for tool runs will need to read the structured fields directly. The shape now matches what `on_tool_start(inputs=...)` has always received — introduced alongside `_schema_format` in the `astream_events` work — and what `streaming_events` consumers already see.	2026-04-30 14:56:14 -04:00
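The input-preservation rule from this commit reduces to a small branch. The sketch below uses a hypothetical `tool_run_inputs` helper; the actual logic lives in `_TracerCore._create_tool_run` and also depends on `_schema_format`, which is omitted here.

```python
from typing import Any, Optional


def tool_run_inputs(
    input_str: str, inputs: Optional[dict[str, Any]] = None
) -> dict[str, Any]:
    """Keep a structured payload when given, else wrap the string form."""
    if isinstance(inputs, dict):
        return inputs
    return {"input": input_str}


structured = tool_run_inputs(
    "{'command': 'ls -la'}", {"command": "ls -la"}
)
fallback = tool_run_inputs("ls -la")
```

A tracer that previously parsed `Run.inputs["input"]` would read `structured["command"]` directly under the new shape.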
Mason Daugherty	37be34be82	fix(core): make `removal` optional in `warn_deprecated` (#37056) Drop the `NotImplementedError` branch in `warn_deprecated` so callers can pass `pending=False` without specifying a `removal` version. The previous behavior contradicted the docstring (which claimed an empty default would auto-compute a removal version). No such computation existed; the function just raised a placeholder "Need to determine which default deprecation schedule to use" error.	2026-04-28 11:05:31 -04:00
Sharvil Saxena	78546e9242	fix(core): validate batch_size in _batch and _abatch to prevent infinite loop (#36663)	2026-04-26 15:13:20 -04:00
Nick Hollon	9ce72eba9f	feat(core): add content-block-centric streaming (v2) (#36834)	2026-04-24 11:36:17 -04:00
Hunter Lovell	9a671d7919	feat(core): allow _format_output to pass through list of ToolOutputMixin instances (#36963)	2026-04-23 13:49:46 -04:00
Jacob Lee	40026a7282	feat(core): Update inheritance behavior for tracer metadata for special keys (#36900) JS equivalent: https://github.com/langchain-ai/langchainjs/pull/10733	2026-04-20 14:58:01 -07:00
Eugene Yurtsev	b00646d882	chore(core): keep checkpoint_ns behavior in streaming metadata for backwards compat (#36828) minor buglet	2026-04-16 15:17:20 -04:00
Jacob Lee	c04e05feb1	feat(core): Add chat model and LLM invocation params to traceable metadata (#36771) Equivalent to: https://github.com/langchain-ai/langchainjs/pull/10711/ --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2026-04-16 18:30:54 +00:00
ccurme	338aa8131a	fix(core): restore cloud metadata IPs and link-local range in SSRF policy (#36816)	2026-04-16 09:15:42 -04:00
ccurme	7d601dc2c6	chore(core): harden private SSRF utilities (#36768)	2026-04-15 16:13:20 -04:00
Jacob Lee	a6eb829701	fix(core): Use reference counting for storing inherited run trees to support garbage collection (#36660 ) When a langsmith `@traceable` function invokes a LangChain Runnable or LangGraph subgraph, the callback manager's `_configure` function injects the `@traceable` RunTree into the `LangChainTracer`'s `run_map` so that child runs can resolve their parent for trace nesting. However, since the RunTree was created outside the tracer's callback lifecycle, `_end_trace` never removes it. The entry persists in `run_map` indefinitely, retaining the full RunTree and its entire child tree. In applications with nested subgraph invocations (e.g. an outer investigation graph delegating to skill agent subgraphs, each compiled as their own `StateGraph`), this causes RunTree objects to accumulate linearly with every call. Fix: Track which `run_map` entries were injected externally via a shared `_external_run_ids` refcount dict on `_TracerCore`. When `_start_trace` adds a child under an external parent, it increments the count. When `_end_trace` finishes a child, it decrements the count and evicts the external parent from `run_map` once the last child completes. The refcount (rather than a simple set) is necessary because a single external parent may have multiple sibling children in the callback chain (e.g. a `prompt | llm` `RunnableSequence`). Only truly external runs are tracked: the `_configure` guard `if run_id_str not in handler.run_map` prevents tracer-managed runs from being misclassified.	2026-04-13 09:50:37 -04:00
Eugene Yurtsev	af4d711a2f	chore(core): reduce streaming metadata / perf (#36588 ) - looking into reducing streaming metadata / perf --------- Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>	2026-04-10 10:47:54 -04:00
Eugene Yurtsev	af2ed47c6f	fix(core): add more sanitization to templates (#36612 ) add more sanitization to templates	2026-04-08 14:10:10 -04:00
ccurme	7629c74726	fix(core): handle symlinks in deprecated prompt save path (#36585 ) Resolve symlinks before validating file extensions in the deprecated `save()` method on prompt classes. Credit to Jeff Ponte (@JDP-Security) for reporting the symlink resolution issue.	2026-04-07 10:45:42 -04:00
Michael Chin	ebecdddb1b	fix(core): add init validator and serialization mappings for Bedrock models (#34510 ) Adds serialization mappings for `ChatBedrockConverse` and `BedrockLLM` to unblock standard tests on `langchain-core>=1.2.5` (context: [langchain-aws#821](https://github.com/langchain-ai/langchain-aws/pull/821)). Also introduces a class-specific validator system in `langchain_core.load` that blocks deserialization of AWS Bedrock models when `endpoint_url` or `base_url` parameters are present, preventing SSRF attacks via crafted serialized payloads. Closes #34645 ## Changes - Add `ChatBedrockConverse` and `BedrockLLM` entries to `SERIALIZABLE_MAPPING` in `mapping.py`, mapping legacy paths to their `langchain_aws` import locations - Add `validators.py` with `_bedrock_validator`, which rejects deserialization kwargs containing `endpoint_url` or `base_url` for all Bedrock-related classes (`ChatBedrock`, `BedrockChat`, `ChatBedrockConverse`, `ChatAnthropicBedrock`, `BedrockLLM`, `Bedrock`) - `CLASS_INIT_VALIDATORS` registry covers both serialized (legacy) keys and resolved import paths from `ALL_SERIALIZABLE_MAPPINGS`, preventing bypass via direct-path payloads - Move kwargs extraction and all validator checks (`CLASS_INIT_VALIDATORS` + `init_validator`) in `Reviver.__call__` to run before `importlib.import_module()`, to fail fast on security violations before executing third-party code - Class-specific validators are independent of `init_validator` and cannot be disabled by passing `init_validator=None` ## Testing - `test_validator_registry_keys_in_serializable_mapping`: structural invariant test ensuring every `CLASS_INIT_VALIDATORS` key exists in `ALL_SERIALIZABLE_MAPPINGS` - 10 end-to-end `load()` tests covering all Bedrock class paths (legacy aliases, resolved import paths, `ChatAnthropicBedrock`, `init_validator=None` bypass attempt) - Unit tests for `_bedrock_validator` covering `endpoint_url`, `base_url`, both params, and safe kwargs --------- Co-authored-by: Mason Daugherty <mason@langchain.dev> Co-authored-by: Mason Daugherty <github@mdrxy.com>	2026-04-03 19:22:39 -04:00
ccurme	0b5f2c08ee	fix(core): harden check for txt files in deprecated prompt loading functions (#36471 )	2026-04-02 16:42:48 -04:00
jasiecky	c9f51aef85	fix(core): fixed typos in the documentation (#36459 ) Fixes #36458 Fixed typos in the documentation in the core module.	2026-04-02 11:32:12 -04:00
Weiguang Li	e6c1b29e80	fix(core): add "computer" to _WellKnownOpenAITools (#36261 )	2026-03-29 08:54:42 -04:00
Jacob Lee	389f7ad1bc	revert: Revert "fix(core): trace invocation params in metadata" (#36322 )	2026-03-27 19:14:02 -04:00
ccurme	27add91347	fix(core): validate paths in `prompt.save` and `load_prompt`, deprecate methods (#36200 )	2026-03-24 14:27:14 -04:00
Mason Daugherty	2f64d80cc6	fix(core,model-profiles): add missing `ModelProfile` fields, warn on schema drift (#36129 ) PR #35788 added 7 new fields to the `langchain-profiles` CLI output (`name`, `status`, `release_date`, `last_updated`, `open_weights`, `attachment`, `temperature`) but didn't update `ModelProfile` in `langchain-core`. Partner packages like `langchain-aws` that set `extra="forbid"` on their Pydantic models hit `extra_forbidden` validation errors when Pydantic encountered undeclared TypedDict keys at construction time. This adds the missing fields, makes `ModelProfile` forward-compatible, provides a base-class hook so partners can stop duplicating model-profile validator boilerplate, migrates all in-repo partners to the new hook, and adds runtime + CI-time warnings for schema drift. ## Changes ### `langchain-core` - Add `__pydantic_config__ = ConfigDict(extra="allow")` to `ModelProfile` so unknown profile keys pass Pydantic validation even on models with `extra="forbid"`, giving forward-compatibility for when the CLI schema evolves ahead of core - Declare the 7 missing fields on `ModelProfile`: `name`, `status`, `release_date`, `last_updated`, `open_weights` (metadata) and `attachment`, `temperature` (capabilities) - Add `_warn_unknown_profile_keys()` in `model_profile.py`, which emits a `UserWarning` when a profile dict contains keys not in `ModelProfile`, suggesting a core upgrade. Wrapped in a bare `except` so introspection failures never crash model construction - Add `BaseChatModel._resolve_model_profile()` hook that returns `None` by default. Partners can override this single method instead of redefining the full `_set_model_profile` validator; the base validator calls it automatically - Add `BaseChatModel._check_profile_keys` as a separate `model_validator` that calls `_warn_unknown_profile_keys`. Uses a distinct method name so partner overrides of `_set_model_profile` don't inadvertently suppress the check ### `langchain-profiles` CLI - Add `_warn_undeclared_profile_keys()` to the CLI (`cli.py`), called after merging augmentations in `refresh()`; it warns at profile-generation time (not just runtime) when emitted keys aren't declared in `ModelProfile`. Gracefully skips if `langchain-core` isn't installed - Add guard test `test_model_data_to_profile_keys_subset_of_model_profile` in model-profiles, which feeds a fully-populated model dict to `_model_data_to_profile()` and asserts every emitted key exists in `ModelProfile.__annotations__`. CI fails before any release if someone adds a CLI field without updating the TypedDict ### Partner packages - Migrate all 10 in-repo partners to the `_resolve_model_profile()` hook, replacing duplicated `@model_validator` / `_set_model_profile` overrides: anthropic, deepseek, fireworks, groq, huggingface, mistralai, openai (base + azure), openrouter, perplexity, xai - Anthropic retains custom logic (context-1m beta → `max_input_tokens` override); all others reduce to a one-liner - Add `pr_lint.yml` scope for the new `model-profiles` package	2026-03-23 00:44:27 -04:00
Mason Daugherty	5ffece5c03	chore(core): remove stale blockbuster allowlist for deleted context module (#36168 ) Closes #29530 --- Remove a stale BlockBuster allowlist entry in `conftest.py` referencing `aconfig_with_context` — the function and its containing module (`langchain_core/beta/runnables/context.py`) were deleted in `fded6c6b1` (Sep 2025, #32850). Spotted by @antonio-mello-ai in #29530.	2026-03-22 20:39:55 -04:00
ccurme	70c88c0e72	fix(core): trace invocation params in metadata (#36080 )	2026-03-18 13:20:18 -04:00
Eugene Yurtsev	dd136337d7	feat(core): harden anti-ssrf (#35960 ) harden anti-ssrf --------- Co-authored-by: Mason Daugherty <mason@langchain.dev>	2026-03-18 10:41:43 -04:00
Mohammad Mohtashim	b21c0a8062	fix(core): preserve default_factory when generating tool call schema (#35550 )	2026-03-08 15:34:21 -04:00
Mason Daugherty	61fd90a2f3	fix(core): extract usage metadata from serialized tracer message outputs (#35526 ) Fixes missing `run.metadata.usage_metadata` population in `LangChainTracer` for real LLM/chat traces following #34414 - Fix extraction to read usage from serialized tracer message shape: `outputs.generations[][].message.kwargs.usage_metadata` - Remove non-serialized direct message shape handling (`message.usage_metadata`) from extractor to match real tracer output path - Clarify tracer docstrings around chat callback naming (`on_chat_model_start` + shared `on_llm_end`) to reduce ambiguity ## Why #34414 introduced usage duplication into `run.metadata.usage_metadata`, but the extractor read `message.usage_metadata`. In real tracer flow, messages are serialized with `dumpd(...)` during run completion, so usage metadata lives under `message.kwargs.usage_metadata`. Because of this mismatch, duplication did not trigger in real traces.	2026-03-02 17:43:33 -05:00
Guofang.Tang	78678534f9	fix(core): treat empty tool chunk ids as missing in merge (#35414 )	2026-02-24 18:12:49 -05:00
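
Note on a6eb829701 (#36660): the reference-counting eviction described in that commit message can be illustrated with a small standalone sketch. This is not the code from the commit. The class `TracerSketch`, its method names, and the run ids are hypothetical stand-ins chosen for illustration, and the real `_TracerCore` bookkeeping may differ in detail. The sketch only assumes what the message states: an externally injected parent is counted per in-flight child and removed from `run_map` when the last child ends.

```python
from __future__ import annotations


class TracerSketch:
    """Hypothetical stand-in for a tracer; not the real _TracerCore."""

    def __init__(self) -> None:
        self.run_map: dict[str, dict] = {}
        # External parent run id -> number of children still in flight.
        self.external_run_ids: dict[str, int] = {}

    def inject_external(self, run_id: str) -> None:
        """Register a parent run created outside the tracer's lifecycle."""
        # Guard: runs the tracer already manages are never reclassified.
        if run_id not in self.run_map:
            self.run_map[run_id] = {"id": run_id, "parent": None}
            self.external_run_ids.setdefault(run_id, 0)

    def start_trace(self, run_id: str, parent_id: str | None = None) -> None:
        self.run_map[run_id] = {"id": run_id, "parent": parent_id}
        if parent_id in self.external_run_ids:
            self.external_run_ids[parent_id] += 1

    def end_trace(self, run_id: str) -> None:
        run = self.run_map.pop(run_id)
        parent_id = run["parent"]
        if parent_id in self.external_run_ids:
            self.external_run_ids[parent_id] -= 1
            if self.external_run_ids[parent_id] <= 0:
                # Last child finished: evict the external parent as well.
                del self.external_run_ids[parent_id]
                self.run_map.pop(parent_id, None)


tracer = TracerSketch()
tracer.inject_external("external-root")
tracer.start_trace("child-a", parent_id="external-root")
tracer.start_trace("child-b", parent_id="external-root")

tracer.end_trace("child-a")
still_present = "external-root" in tracer.run_map  # one sibling is still running

tracer.end_trace("child-b")
evicted = "external-root" not in tracer.run_map  # last sibling ended
```

With a plain set instead of a count, ending `child-a` would already evict the parent while `child-b` still needs it, which is the case the commit message gives as the reason for a refcount.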

782 Commits