# feat(core): add more file extensions to ignore in HTML link extraction
## Description
This PR enhances the HTML link extraction utility in
`libs/core/langchain_core/utils/html.py` by expanding the
`SUFFIXES_TO_IGNORE` list to include additional common binary file
extensions:
- `.webp`
- `.pdf`
- `.docx`
- `.xlsx`
- `.pptx`
- `.pptm`
These file types are non-HTML, non-crawlable resources. Ignoring them
prevents `find_all_links` and `extract_sub_links` from mistakenly
treating such binary assets as navigable links. This improves link
filtering, reduces unnecessary crawling, and aligns behavior with
typical web scraping expectations.
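For illustration, a minimal sketch of the effect (the HTML snippet and URL below are made up for this example):
```python
from langchain_core.utils.html import extract_sub_links

# Hypothetical page mixing a crawlable HTML link with a binary asset.
raw_html = (
    '<a href="/docs/intro.html">Intro</a> '
    '<a href="/files/report.pdf">Report</a>'
)

links = extract_sub_links(raw_html, "https://example.com/")
# With .pdf in SUFFIXES_TO_IGNORE, only the HTML page should remain.
print(links)  # expected: ['https://example.com/docs/intro.html']
```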
## Summary of Changes
- **Updated** `libs/core/langchain_core/utils/html.py`: Added `.webp`,
`.pdf`, `.docx`, `.xlsx`, `.pptx`, `.pptm` to `SUFFIXES_TO_IGNORE`.
## Related Issues
N/A
## Verification
- `ruff check libs/core/langchain_core/utils/html.py`: **Passed**
- `mypy libs/core/langchain_core/utils/html.py`: **Passed**
- `pytest libs/core/tests/unit_tests/utils/test_html.py`: **Passed** (11
tests)
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
# refactor(core): improve docstrings for HTML link extraction utilities
## Description
This PR updates and clarifies the docstrings for `find_all_links` and
`extract_sub_links` in
`libs/core/langchain_core/utils/html.py`.
The previous return-value descriptions were vague (e.g., "all links",
"sub links"). They have now been revised to clearly describe the
behavior and output of each function:
- **find_all_links** → “A list of all links found in the HTML.”
- **extract_sub_links** → “A list of absolute paths to sub links.”
These improvements make the utilities more understandable and
developer-friendly without altering functionality.
## Verification
- `ruff check libs/core/langchain_core/utils/html.py`: **Passed**
- `pytest libs/core/tests/unit_tests/utils/test_html.py`: **Passed**
## Checklists
- PR title follows the required format: `TYPE(SCOPE): DESCRIPTION`
- Changes are limited to the `langchain-core` package
- `make format`, `make lint`, and `make test` pass
Fixes #34282
**Before:** When using agents with tools (like file reading, web search,
etc.), the conversation looks like this:
```
[User] "Read these 10 files and summarize them"
[AI] "I'll read all 10 files" + [tool_call: read_file x 10]
[Tool] "Contents of file1.txt..."
[Tool] "Contents of file2.txt..."
[Tool] "Contents of file3.txt..."
... (7 more tool responses)
```
When the conversation gets too long, `SummarizationMiddleware` kicks in
to compress older messages. The problem was:
If you asked to keep the last 6 messages, you'd get:
```
[Summary] "Here's what happened before..."
[Tool] "Contents of file5.txt..."
[Tool] "Contents of file6.txt..."
[Tool] "Contents of file7.txt..."
[Tool] "Contents of file8.txt..."
[Tool] "Contents of file9.txt..."
[Tool] "Contents of file10.txt..."
```
The AI's original request to read the files (`[AI]` message with
`tool_calls`) was summarized away, but the tool responses remained. This
caused the error:
```
Error code: 400 - "No tool call found for function call output with call_id..."
```
Many APIs require that every tool response has a matching tool request.
Without the AI message, the tool responses are "orphaned."
## The fix
Now when the cutoff lands on tool messages, we **move backward** to
include the AI message that requested those tools:
Same scenario, keeping last 6 messages:
```
[Summary] "Here's what happened before..."
[AI] "I'll read all 10 files" + [tool_call: read_file x 10]
[Tool] "Contents of file1.txt..."
[Tool] "Contents of file2.txt..."
... (all 10 tool responses)
```
The AI message is preserved along with its tool responses, keeping them
paired together.
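A minimal sketch of the idea, assuming a standalone helper (the function name is hypothetical and this is not the actual middleware code):
```python
from langchain_core.messages import ToolMessage


def safe_cutoff(messages: list, cutoff: int) -> int:
    """Hypothetical helper: if the cutoff would strand tool responses,
    move it backward to the AI message that issued their tool calls."""
    # Walk backward while the first retained message is a tool response.
    while 0 < cutoff < len(messages) and isinstance(messages[cutoff], ToolMessage):
        cutoff -= 1
    # `cutoff` now points at the AIMessage carrying the matching tool_calls
    # (or an earlier non-tool message), so the pair stays together.
    return cutoff
```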
## Practical examples
### Example 1: Parallel tool calls
**Scenario:** Agent reads 10 files in parallel, summarization triggers
(see above)
### Example 2: Mixed conversation
**Scenario:** User asks question, AI uses tools, user says thanks
```
[User] "What's the weather?"
[AI] "Let me check" + [tool_call: get_weather]
[Tool] "72F and sunny"
[AI] "It's 72F and sunny!"
[User] "Thanks!"
```
Keeping last 2 messages:
| Before (Bug) | After (Fix) |
|--------------|-------------|
| Only `[User] "Thanks!"` kept | `[AI] + [Tool] + [AI] + [User]` all kept |
| Lost the weather info | Tool pair preserved with response |
### Example 3: Multiple tool sequences
```
[User] "Search for X"
[AI] [tool_call: search]
[Tool] "Results for X"
[User] "Now search for Y"
[AI] [tool_call: search]
[Tool] "Results for Y"
[User] "Great!"
```
**Keeping last 3 messages:** If cutoff lands on `[Tool] "Results for
Y"`, we now include `[AI] [tool_call: search]` to keep the pair
together.
Adds unit coverage for chat model provider inference across common model
name prefixes. This improves regression protection without touching
runtime behavior.
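A rough sketch of the kind of test added (the import path, the helper name `_attempt_infer_model_provider`, and the exact prefixes are assumptions, not quotes of the new tests):
```python
import pytest

# Assumed location and name of the provider-inference helper; adjust if the
# actual helper lives elsewhere or is named differently.
from langchain.chat_models.base import _attempt_infer_model_provider


@pytest.mark.parametrize(
    ("model_name", "expected_provider"),
    [
        ("gpt-4o-mini", "openai"),
        ("claude-3-5-sonnet-latest", "anthropic"),
        ("command-r-plus", "cohere"),
    ],
)
def test_provider_inferred_from_model_prefix(model_name, expected_provider):
    assert _attempt_infer_model_provider(model_name) == expected_provider
```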
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Fixes a bug introduced with commit 85f1ba2 (released in `langchain ==
1.2.1`).
Whenever the index embedding of the langgraph-server is configured with
the `azure_openai` provider, the wrong class is initialized (and
initialization fails if the now-unexpected `OPENAI_API_KEY` environment
variable is not provided).
Example configuration file `langgraph.json` that will reproduce the
issue:
(see
https://docs.langchain.com/langsmith/cli#adding-semantic-search-to-the-store)
```json
{
  "dependencies": ["."],
  "graphs": {
    "chat": "src/agents/chat/graph.py:graph"
  },
  "store": {
    "index": {
      "embed": "azure_openai:text-embedding-3-small",
      "dims": 1536
    }
  },
  "python_version": "3.13",
  "image_distro": "wolfi"
}
```
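For illustration, assuming the embed spec is resolved through `init_embeddings` (a sketch of the expected behavior after the fix, not the server code):
```python
from langchain.embeddings import init_embeddings

# With the fix, an "azure_openai:<model>" spec should resolve to the Azure
# embeddings class (using AZURE_OPENAI_* credentials) instead of falling
# back to the plain OpenAI class and demanding OPENAI_API_KEY.
embeddings = init_embeddings("azure_openai:text-embedding-3-small")
print(type(embeddings).__name__)  # expected: AzureOpenAIEmbeddings
```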
The agent should only make a single call to update the todo list at a
time. Parallel calls don't make sense, and they also cannot work because
there's no obvious reducer to use.
On parallel calls of the todo tool, we now return a `ToolMessage`
containing guidance that tells the LLM not to call the tool in parallel.
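A minimal sketch of the guard (illustrative only; the tool name `write_todos` and the helper are assumptions, not the actual middleware code):
```python
from langchain_core.messages import AIMessage, ToolMessage


def reject_parallel_todo_calls(ai_message: AIMessage) -> list[ToolMessage]:
    """If the model issued more than one todo call this turn, answer each
    of them with guidance instead of applying the updates."""
    todo_calls = [tc for tc in ai_message.tool_calls if tc["name"] == "write_todos"]
    if len(todo_calls) <= 1:
        return []
    return [
        ToolMessage(
            content=(
                "Error: the todo tool must be called once per turn; "
                "combine all updates into a single call and try again."
            ),
            tool_call_id=tc["id"],
        )
        for tc in todo_calls
    ]
```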
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Fixes #34517
Supersedes #34557, #34570
Fixes token inflation in `SummarizationMiddleware` that caused context
window overflow during summarization.
**Root cause:** When formatting messages for the summary prompt,
`str(messages)` was implicitly called, which includes all Pydantic
metadata fields (`usage_metadata`, `response_metadata`,
`additional_kwargs`, etc.). This caused the stringified representation
to use ~2.5x more tokens than `count_tokens_approximately` estimates.
**Problem:**
- Summarization triggers at 85% of context window based on
`count_tokens_approximately`
- But `str(messages)` in the prompt uses 2.5x more tokens
- Results in `ContextLengthExceeded`
**Fix:** Use `get_buffer_string()` to format messages, which produces
compact output:
```
Human: What's the weather?
AI: Let me check...[tool_calls]
Tool: 72°F and sunny
```
Instead of verbose Pydantic repr:
```python
[HumanMessage(content="What's the weather?", additional_kwargs={}, response_metadata={}), ...]
```
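A small sketch of the difference (the messages below are invented for illustration):
```python
from langchain_core.messages import (
    AIMessage,
    HumanMessage,
    ToolMessage,
    get_buffer_string,
)

messages = [
    HumanMessage("What's the weather?"),
    AIMessage(
        "Let me check",
        tool_calls=[{"name": "get_weather", "args": {}, "id": "call_1"}],
    ),
    ToolMessage("72°F and sunny", tool_call_id="call_1"),
]

# Compact formatting now used when building the summary prompt.
compact = get_buffer_string(messages)

# Verbose Pydantic repr that previously inflated the token count.
verbose = str(messages)

assert len(compact) < len(verbose)
```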
Fixed typo in ImportError messages where "langchain experiment" should
be "langchain_experimental" for consistency with the actual package
name.
This helps improve clarity for users who encounter these error messages
when trying to use deprecated tools that have moved to the
langchain_experimental package.
Related issues: #13858, #13859
Co-authored-by: Ademola <ademicho@gmail>
**Description:**
*Closes [#33883](https://github.com/langchain-ai/langchain/issues/33883)*
Chat model cache keys are generated by serializing messages via
`dumps(messages)`. The optional `BaseMessage.id` field (a UUID used
solely for tracing/threading) is included in this serialization, causing
functionally identical messages to produce different cache keys. This
results in repeated API calls, cache bloat, and degraded performance in
production workloads (e.g., agents, RAG chains, long conversations).
This change normalizes messages **only for cache key generation** by
stripping the nonsemantic `id` field using Pydantic V2’s
`model_copy(update={"id": None})`. The normalization is applied in both
synchronous and asynchronous cache paths (`_generate_with_cache` /
`_agenerate_with_cache`) immediately before `dumps()`.
```python
normalized_messages = [
    msg.model_copy(update={"id": None})
    if getattr(msg, "id", None) is not None
    else msg
    for msg in messages
]
prompt = dumps(normalized_messages)
```
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
## Summary
Fixes an issue where `HTMLSemanticPreservingSplitter` failed to preserve
elements nested inside non-container tags. With these changes, preserved
elements are now correctly detected and handled at any nesting depth.
## Root Cause
`_process_element()` only recursed into a small set of hard-coded
container tags (`html`, `body`, `div`, `main`). For other tags, the
subtree was flattened into text, preventing nested preserved elements
(inside `<p>`, `<section>`, `<article>`, etc.) from being detected.
## Fix
- Updated traversal logic in `_process_element()` (`html.py`) to recursively
process child elements for any tag that contains nested elements (illustrated
by the sketch below)
- Avoided duplicate text extraction
- Preserved correct placeholder ordering
- Treated leaf nodes as text only
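A rough usage sketch of the fixed behavior (illustrative, not copied from the new tests):
```python
from langchain_text_splitters import HTMLSemanticPreservingSplitter

html = """
<section>
  <p>Quarterly results are in the table below.</p>
  <table><tr><td>Q1</td><td>100</td></tr></table>
</section>
"""

splitter = HTMLSemanticPreservingSplitter(
    headers_to_split_on=[("h1", "Header 1")],
    elements_to_preserve=["table"],
)

docs = splitter.split_text(html)
# With the recursive traversal, the <table> nested inside <section> is
# detected and preserved instead of being flattened into plain text.
```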
## Tests
Adds regression tests covering preserved elements nested inside
non-container tags, including:
- `<table>` inside `<section>`
- nested `<div>`s
- `<code>` inside a paragraph

All existing tests pass (`make lint`, `make format`, `make test`, etc.).
## Breaking Changes
None.
## Fixes
Fixes #31569
## Disclaimer
GitHub Copilot was used to assist with test case design in
`test_text_splitters.py` and documentation comments; all code logic was
manually implemented and reviewed.
---------
Co-authored-by: julih <julih@julihs-MacBook-Pro.local>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>