Implementations using yield return generators, which are of type
`Iterator`.
This is technically a breaking change for implementers, however, known
existing implementations (in `langchain-community`) use `yield`, so they
already return `Iterator`s. For callers, it is not breaking.
Closes#25718
it looks scary but i promise it is not
improving documentation consistency across core. primarily update
docstrings and comments for better formatting, readability, and
accuracy, as well as add minor clarifications and formatting
improvements to user-facing documentation.
## Summary
Add XML format option for `get_buffer_string()` to provide unambiguous
message serialization. This fixes role prefix ambiguity when message
content contains strings like "Human:" or "AI:".
Fixes#34786
## Changes
- Add `format="xml"` parameter with proper XML escaping using
`quoteattr()` for attributes
- Add explicit validation for format parameter (raises `ValueError` for
invalid values)
- Add comprehensive tests for XML format edge cases
<img width="1952" height="706" alt="image"
src="https://github.com/user-attachments/assets/1cd6f887-9365-43cf-a532-72d7addd8bad"
/>
<img width="2786" height="776" alt="image"
src="https://github.com/user-attachments/assets/a07b0db0-519c-46d7-b34b-b404237d812b"
/>
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Bumps the uv group with 1 update in the /libs/core directory:
[setuptools](https://github.com/pypa/setuptools).
Updates `setuptools` from 67.8.0 to 78.1.1
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/pypa/setuptools/blob/main/NEWS.rst">setuptools's
changelog</a>.</em></p>
<blockquote>
<h1>v78.1.1</h1>
<h2>Bugfixes</h2>
<ul>
<li>More fully sanitized the filename in PackageIndex._download. (<a
href="https://redirect.github.com/pypa/setuptools/issues/4946">#4946</a>)</li>
</ul>
<h1>v78.1.0</h1>
<h2>Features</h2>
<ul>
<li>Restore access to _get_vc_env with a warning. (<a
href="https://redirect.github.com/pypa/setuptools/issues/4874">#4874</a>)</li>
</ul>
<h1>v78.0.2</h1>
<h2>Bugfixes</h2>
<ul>
<li>Postponed removals of deprecated dash-separated and uppercase fields
in <code>setup.cfg</code>.
All packages with deprecated configurations are advised to move before
2026. (<a
href="https://redirect.github.com/pypa/setuptools/issues/4911">#4911</a>)</li>
</ul>
<h1>v78.0.1</h1>
<h2>Misc</h2>
<ul>
<li><a
href="https://redirect.github.com/pypa/setuptools/issues/4909">#4909</a></li>
</ul>
<h1>v78.0.0</h1>
<h2>Bugfixes</h2>
<ul>
<li>Reverted distutils changes that broke the monkey patching of command
classes. (<a
href="https://redirect.github.com/pypa/setuptools/issues/4902">#4902</a>)</li>
</ul>
<h2>Deprecations and Removals</h2>
<ul>
<li>Setuptools no longer accepts options containing uppercase or dash
characters in <code>setup.cfg</code>.</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="8e4868a036"><code>8e4868a</code></a>
Bump version: 78.1.0 → 78.1.1</li>
<li><a
href="100e9a61ad"><code>100e9a6</code></a>
Merge pull request <a
href="https://redirect.github.com/pypa/setuptools/issues/4951">#4951</a></li>
<li><a
href="8faf1d7e0c"><code>8faf1d7</code></a>
Add news fragment.</li>
<li><a
href="2ca4a9fe47"><code>2ca4a9f</code></a>
Rely on re.sub to perform the decision in one expression.</li>
<li><a
href="e409e80029"><code>e409e80</code></a>
Extract _sanitize method for sanitizing the filename.</li>
<li><a
href="250a6d1797"><code>250a6d1</code></a>
Add a check to ensure the name resolves relative to the tmpdir.</li>
<li><a
href="d8390feaa9"><code>d8390fe</code></a>
Extract _resolve_download_filename with test.</li>
<li><a
href="4e1e89392d"><code>4e1e893</code></a>
Merge <a
href="https://github.com/jaraco/skeleton">https://github.com/jaraco/skeleton</a></li>
<li><a
href="3a3144f0d2"><code>3a3144f</code></a>
Fix typo: <code>pyproject.license</code> ->
<code>project.license</code> (<a
href="https://redirect.github.com/pypa/setuptools/issues/4931">#4931</a>)</li>
<li><a
href="d751068fd2"><code>d751068</code></a>
Fix typo: pyproject.license -> project.license</li>
<li>Additional commits viewable in <a
href="https://github.com/pypa/setuptools/compare/v67.8.0...v78.1.1">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore <dependency name> major version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's major version (unless you unignore this specific
dependency's major version or upgrade to it yourself)
- `@dependabot ignore <dependency name> minor version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's minor version (unless you unignore this specific
dependency's minor version or upgrade to it yourself)
- `@dependabot ignore <dependency name>` will close this group update PR
and stop Dependabot creating any more for the specific dependency
(unless you unignore this specific dependency or upgrade to it yourself)
- `@dependabot unignore <dependency name>` will remove all of the ignore
conditions of the specified dependency
- `@dependabot unignore <dependency name> <ignore condition>` will
remove the ignore condition of the specified dependency and ignore
conditions
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/langchain-ai/langchain/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
# Before
```python
if isinstance(block, dict) and "text" in block:
text = block["text"]
break
```
Extracts text from any `dict` with a `'text'` key, including
thinking/reasoning blocks.
# After
```python
if isinstance(block, dict) and "text" in block:
block_type = block.get("type")
if block_type is None or block_type == "text":
text = block["text"]
break
```
Skips blocks with explicit non-text types (e.g., `type: 'thinking'`).
# Justification
Models like Gemini 3 return structured content with multiple block
types:
```python
[
{"type": "thinking", "text": "let me reason..."},
{"type": "text", "text": "The answer is 42"}
]
```
The old logic extracted `'let me reason...'` (the thinking block)
because it matched first. The new logic skips it and correctly extracts
`'The answer is 42'`.
The `ChatGeneration.text` field is used by `on_llm_new_token(token,
chunk=chunk)` callbacks during streaming. Consequently, it would get
tokens incorrectly for reasoning blocks.
Related: #34727
Updates `comma_list` in `libs/core/langchain_core/utils/strings.py` to
accept `Iterable[Any]` instead of `list[Any]`, making the utility more
flexible.
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
## Summary
- Adds explicit `tags: list[str] | None = None` parameter to sync
`LLMManagerMixin` methods
- Aligns sync methods with their async counterparts in
`AsyncCallbackHandler`
## Changes
Added `tags` parameter to:
- `on_llm_new_token`
- `on_llm_end`
- `on_llm_error`
## Why
- Sync handlers receive `tags` via `**kwargs`, but it was undocumented
in the method signature
- Async handlers already have `tags` explicitly documented
- This improves IDE autocompletion and type hints for sync handlers
Closes#34720🤖 Generated with [Claude Code](https://claude.ai/claude-code)
Co-authored-by: skyvanguard <skyvanguard@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
# Add `tool_call_id` to `on_tool_error` event data
## Summary
This PR addresses issue #33597 by adding `tool_call_id` to the
`on_tool_error` callback event data. This enables users to link tool
errors to specific tool calls in stateless agent implementations, which
is essential for building OpenAI-compatible APIs and tracking tool
execution flows.
## Problem
When streaming events using `astream_events` with `version="v2"`, the
`on_tool_error` event only included the error and input data, but lacked
the `tool_call_id`. This made it difficult to:
- Link errors to specific tool calls in stateless agent scenarios
- Implement OpenAI-compatible APIs that require tool call tracking
- Track tool execution flows when using `run_id` is not sufficient
## Solution
The fix adds `tool_call_id` propagation through the callback chain:
1. **Pass `tool_call_id` to callbacks**: Updated `BaseTool.run()` and
`BaseTool.arun()` to pass `tool_call_id` to both `on_tool_start` and
`on_tool_error` callbacks
2. **Store in event stream handler**: Modified
`_AstreamEventsCallbackHandler` to store `tool_call_id` in run info
during `on_tool_start`
3. **Include in error events**: Updated `on_tool_error` handler to
extract and include `tool_call_id` in the event data
## Changes
- **`libs/core/langchain_core/tools/base.py`**:
- Pass `tool_call_id` to `on_tool_start` in both sync and async methods
- Pass `tool_call_id` to `on_tool_error` when errors occur
- **`libs/core/langchain_core/tracers/event_stream.py`**:
- Store `tool_call_id` in run info during `on_tool_start`
- Extract `tool_call_id` from kwargs or run info in `on_tool_error`
- Include `tool_call_id` in the `on_tool_error` event data
## Testing
The fix was verified by:
1. Direct tool invocation: Confirmed `tool_call_id` appears in
`on_tool_error` event data when calling tools directly
2. Agent integration: Tested with `create_agent` to ensure
`tool_call_id` is present in error events during agent execution
```python
# Example verification
async for event in agent.astream_events(
{"messages": "Please demonstrate a tool error"},
version="v2",
):
if event["event"] == "on_tool_error":
assert "tool_call_id" in event["data"] # ✓ Now passes
print(event["data"]["tool_call_id"])
```
## Backward Compatibility
- ✅ Fully backward compatible: `tool_call_id` is optional (can be
`None`)
- ✅ No breaking changes: All changes are additive
- ✅ Existing code continues to work without modification
## Related Issues
Fixes#33597
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Changes Created
I have fixed the issue where a generic and misleading error message was
displayed when a JSON schema was missing the top-level
title
key.
[Fix: Improve error message for missing title in JSON schema
functions](https://github.com/Bhavesh007Sharma/langchain/tree/fix-json-schema-title-error)
File Modified:
libs/core/langchain_core/utils/function_calling.py
I updated the
convert_to_openai_function
validation logic to specifically check for
dict
inputs that look like schemas (
type
or
properties
keys present) but are missing the
title
key.
# Before (Generic Error)
raise ValueError(
f"Unsupported function\n\n{function}\n\nFunctions must be passed in"
" as Dict, pydantic.BaseModel, or Callable. If they're a dict they must"
" either be in OpenAI function format or valid JSON schema with
top-level"
" 'title' and 'description' keys."
)
# After (Specific Error)
if isinstance(function, dict) and ("type" in function or "properties" in
function):
msg = (
"Unsupported function\n\nTo use a JSON schema as a function, "
"it must have a top-level 'title' key to be used as the function name."
)
raise ValueError(msg)
Verification Results
Automated Tests
I created a reproduction script
reproduce_issue.py
to confirm the behavior.
Before Fix: The script would have raised the generic "Unsupported
function" error claiming description was also required.
After Fix: The script now confirms that the new, specific error message
is raised when
title
is missing.
(Note: Verification was performed by inspecting the code logic and
running a lightweight reproduction script locally, as full suite
verification had environment dependency issues.)
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
This PR fixes a signature mismatch between BaseStore and its concrete
implementations by making the `prefix` parameter keyword-only in
`yield_keys` and `ayield_keys`.
This aligns the implementations with the BaseStore interface contract,
prevents Liskov Substitution Principle violations, and ensures
consistent
method signatures across store backends.
Fixes#32637
Breaking changes
None. This change only enforces the existing abstract interface and does
not modify runtime behavior
Testing
- Verified that existing test suites pass after the signature fix.
Parts of this contribution were assisted by generative AI for
code navigation and drafting. All final design decisions and changes
were
reviewed and validated manually.
---------
Co-authored-by: Khagesh-Anayasmi <khagesh.desai@anayasmi.in>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
# feat(core): add more file extensions to ignore in HTML link extraction
## Description
This PR enhances the HTML link extraction utility in
`libs/core/langchain_core/utils/html.py` by expanding the
`SUFFIXES_TO_IGNORE` list to include additional common binary file
extensions:
- `.webp`
- `.pdf`
- `.docx`
- `.xlsx`
- `.pptx`
- `.pptm`
These file types are non-HTML, non-crawlable resources. Ignoring them
prevents `find_all_links` and `extract_sub_links` from mistakenly
treating such binary assets as navigable links. This improves link
filtering, reduces unnecessary crawling, and aligns behavior with
typical web scraping expectations.
## Summary of Changes
- **Updated** `libs/core/langchain_core/utils/html.py`: Added `.webp`,
`.pdf`, `.docx`, `.xlsx`, `.pptx`, `.pptm` to `SUFFIXES_TO_IGNORE`.
## Related Issues
N/A
## Verification
- `ruff check libs/core/langchain_core/utils/html.py`: **Passed**
- `mypy libs/core/langchain_core/utils/html.py`: **Passed**
- `pytest libs/core/tests/unit_tests/utils/test_html.py`: **Passed** (11
tests)
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
# refactor(core): improve docstrings for HTML link extraction utilities
## Description
This PR updates and clarifies the docstrings for `find_all_links` and
`extract_sub_links` in
`libs/core/langchain_core/utils/html.py`.
The previous return-value descriptions were vague (e.g., "all links",
"sub links"). They have now been revised to clearly describe the
behavior and output of each function:
- **find_all_links** → “A list of all links found in the HTML.”
- **extract_sub_links** → “A list of absolute paths to sub links.”
These improvements make the utilities more understandable and
developer-friendly without altering functionality.
## Verification
- `ruff check libs/core/langchain_core/utils/html.py`: **Passed**
- `pytest libs/core/tests/unit_tests/utils/test_html.py`: **Passed**
## Checklists
- PR title follows the required format: `TYPE(SCOPE): DESCRIPTION`
- Changes are limited to the `langchain-core` package
- `make format`, `make lint`, and `make test` pass
Fixes#34517
Supersedes #34557, #34570
Fixes token inflation in `SummarizationMiddleware` that caused context
window overflow during summarization.
**Root cause:** When formatting messages for the summary prompt,
`str(messages)` was implicitly called, which includes all Pydantic
metadata fields (`usage_metadata`, `response_metadata`,
`additional_kwargs`, etc.). This caused the stringified representation
to use ~2.5x more tokens than `count_tokens_approximately` estimates.
**Problem:**
- Summarization triggers at 85% of context window based on
`count_tokens_approximately`
- But `str(messages)` in the prompt uses 2.5x more tokens
- Results in `ContextLengthExceeded`
**Fix:** Use `get_buffer_string()` to format messages, which produces
compact output:
```
Human: What's the weather?
AI: Let me check...[tool_calls]
Tool: 72°F and sunny
```
Instead of verbose Pydantic repr:
```python
[HumanMessage(content='What's the weather?', additional_kwargs={}, response_metadata={}), ...]
```
**Description:**
*Closes
#[33883](https://github.com/langchain-ai/langchain/issues/33883)*
Chat model cache keys are generated by serializing messages via
`dumps(messages)`. The optional `BaseMessage.id` field (a UUID used
solely for tracing/threading) is included in this serialization, causing
functionally identical messages to produce different cache keys. This
results in repeated API calls, cache bloat, and degraded performance in
production workloads (e.g., agents, RAG chains, long conversations).
This change normalizes messages **only for cache key generation** by
stripping the nonsemantic `id` field using Pydantic V2’s
`model_copy(update={"id": None})`. The normalization is applied in both
synchronous and asynchronous cache paths (`_generate_with_cache` /
`_agenerate_with_cache`) immediately before `dumps()`.
```python
normalized_messages = [
msg.model_copy(update={"id": None})
if getattr(msg, "id", None) is not None
else msg
for msg in messages
]
prompt = dumps(normalized_messages)
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>