Fixes#34282
**Before:** When using agents with tools (like file reading, web search,
etc.), the conversation looks like this:
```
[User] "Read these 10 files and summarize them"
[AI] "I'll read all 10 files" + [tool_call: read_file x 10]
[Tool] "Contents of file1.txt..."
[Tool] "Contents of file2.txt..."
[Tool] "Contents of file3.txt..."
... (7 more tool responses)
```
When the conversation gets too long, `SummarizationMiddleware` kicks in
to compress older messages. The problem was:
If you asked to keep the last 6 messages, you'd get:
```
[Summary] "Here's what happened before..."
[Tool] "Contents of file5.txt..."
[Tool] "Contents of file6.txt..."
[Tool] "Contents of file7.txt..."
[Tool] "Contents of file8.txt..."
[Tool] "Contents of file9.txt..."
[Tool] "Contents of file10.txt..."
```
The AI's original request to read the files (`[AI]` message with
`tool_calls`) was summarized away, but the tool responses remained. This
caused the error:
```
Error code: 400 - "No tool call found for function call output with call_id..."
```
Many APIs require that every tool response has a matching tool request.
Without the AI message, the tool responses are "orphaned."
## The fix
Now when the cutoff lands on tool messages, we **move backward** to
include the AI message that requested those tools:
Same scenario, keeping last 6 messages:
```
[Summary] "Here's what happened before..."
[AI] "I'll read all 10 files" + [tool_call: read_file x 10]
[Tool] "Contents of file1.txt..."
[Tool] "Contents of file2.txt..."
... (all 10 tool responses)
```
The AI message is preserved along with its tool responses, keeping them
paired together.
## Practical examples
### Example 1: Parallel tool calls
**Scenario:** Agent reads 10 files in parallel, summarization triggers
(see above)
### Example 2: Mixed conversation
**Scenario:** User asks question, AI uses tools, user says thanks
```
[User] "What's the weather?"
[AI] "Let me check" + [tool_call: get_weather]
[Tool] "72F and sunny"
[AI] "It's 72F and sunny!"
[User] "Thanks!"
```
Keeping last 2 messages:
| Before (Bug) | After (Fix) |
|--------------|-------------|
| Only `[User] "Thanks!"` kept | `[AI] + [Tool] + [AI] + [User]` all
kept |
| Lost the weather info | Tool pair preserved with response |
### Example 3: Multiple tool sequences
```
[User] "Search for X"
[AI] [tool_call: search]
[Tool] "Results for X"
[User] "Now search for Y"
[AI] [tool_call: search]
[Tool] "Results for Y"
[User] "Great!"
```
**Keeping last 3 messages:** If cutoff lands on `[Tool] "Results for
Y"`, we now include `[AI] [tool_call: search]` to keep the pair
together.
Add unit coverage for chat model provider inference across common model
name prefixes. This improves regression protection without touching
runtime
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Fixes a bug introduced with commit 85f1ba2 (released in `langchain ==
1.2.1`).
Whenever the index embedding of the langgraph-server is configured with
`azure_openai` provider, the wrong class is going to be initialized (and
fails to do so if the now unexpected credentials in environment variable
`OPENAI_API_KEY` is not provided).
Example configuration file `langgraph.json` that will reproduce the
issue:
(see
https://docs.langchain.com/langsmith/cli#adding-semantic-search-to-the-store)
```json
{
"dependencies": ["."],
"graphs": {
"chat": "src/agents/chat/graph.py:graph",
},
"store": {
"index": {
"embed": "azure_openai:text-embedding-3-small",
"dims": 1536
}
},
"python_version": "3.13",
"image_distro": "wolfi"
}
```
The agent should only make a single call to update the todo list at a
time. A parallel call doesn't make sense, but also cannot work as
there's no obvious reducer to use.
On parallel calls of the todo tool, we return ToolMessage containing to
guide the LLM to not call the tool in parallel.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Fixes#34517
Supersedes #34557, #34570
Fixes token inflation in `SummarizationMiddleware` that caused context
window overflow during summarization.
**Root cause:** When formatting messages for the summary prompt,
`str(messages)` was implicitly called, which includes all Pydantic
metadata fields (`usage_metadata`, `response_metadata`,
`additional_kwargs`, etc.). This caused the stringified representation
to use ~2.5x more tokens than `count_tokens_approximately` estimates.
**Problem:**
- Summarization triggers at 85% of context window based on
`count_tokens_approximately`
- But `str(messages)` in the prompt uses 2.5x more tokens
- Results in `ContextLengthExceeded`
**Fix:** Use `get_buffer_string()` to format messages, which produces
compact output:
```
Human: What's the weather?
AI: Let me check...[tool_calls]
Tool: 72°F and sunny
```
Instead of verbose Pydantic repr:
```python
[HumanMessage(content='What's the weather?', additional_kwargs={}, response_metadata={}), ...]
```
Addresses a flaky test
When executing `exit 1` as a startup command, the shell process
terminates immediately. The code then tries to write a marker command
(`printf '...'`) to stdin, but the pipe is already broken because the
shell has exited, causing `BrokenPipeError`.
## Summary
Enhances the `init_chat_model` function with comprehensive input
validation, improved model inference patterns, and better error handling
to provide a significantly improved user experience.
## Changes Made
- ✅ **Input Validation**: Added comprehensive type and value checking
for all parameters
- ✅ **Enhanced Model Inference**: Improved pattern matching with
case-insensitive support and new model patterns
- ✅ **Better Error Messages**: Detailed error messages with examples and
documentation links
- ✅ **Comprehensive Tests**: Added extensive test coverage for all new
functionality
- ✅ **Documentation**: Enhanced docstrings and examples
## Backward Compatibility
All changes are fully backward compatible. No breaking changes
introduced.
## Testing
- Added 6 new test functions covering input validation, model inference,
and error handling
- All existing tests continue to pass
- Comprehensive parametrized testing for various model patterns
## User Experience Improvements
- Better error messages help users quickly resolve configuration issues
- Enhanced model inference reduces the need to specify providers
explicitly
- Comprehensive input validation catches issues early with helpful
guidance
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
If the `stdout` "done marker" arrives before the `stderr` output is
enqueued, the method returns early without capturing the `stderr` line.
The two reader threads run independently with no synchronization
guaranteeing `stderr` arrives before the done marker.
In environments with Python 3.10, timing differences can cause the
`stdout` marker to win the race, resulting in `<no output>` instead of
`[stderr]` error.
Observed as a flaky test on `test_stderr_output_labeling` in CI:
```shell
FAILED tests/unit_tests/agents/middleware/implementations/test_shell_tool.py::test_stderr_output_labeling - AssertionError: assert '[stderr] error' in '<no output>'
```
Use of the fixture `_base_vcr_config` is deprecated with alternative
function `base_vcr_config()`
This way:
* we don't need to import `_base_vcr_config` seen as unused (which leads
to ruff violations PLC0414 and F811)
* we don't need to make a copy since a new dict is created at each
function invocation
Co-authored-by: Mason Daugherty <mason@langchain.dev>