## Summary
Add XML format option for `get_buffer_string()` to provide unambiguous
message serialization. This fixes role prefix ambiguity when message
content contains strings like "Human:" or "AI:".
Fixes #34786
## Changes
- Add `format="xml"` parameter with proper XML escaping using
`quoteattr()` for attributes
- Add explicit validation for format parameter (raises `ValueError` for
invalid values)
- Add comprehensive tests for XML format edge cases
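
As a rough illustration of the idea (not the actual `get_buffer_string()` implementation), the sketch below serializes `(role, content)` pairs with the standard library's `escape()` and `quoteattr()`. The function name `render_buffer`, the `<message type=...>` element shape, and the `"prefix"` default are assumptions made for the example.

```python
from xml.sax.saxutils import escape, quoteattr

# Assumed set of accepted values; the real parameter may accept others.
_VALID_FORMATS = ("prefix", "xml")


def render_buffer(messages, format="prefix"):
    """Hypothetical helper: render (role, content) pairs as one string."""
    if format not in _VALID_FORMATS:
        raise ValueError(
            f"Unrecognized format {format!r}; expected one of {_VALID_FORMATS}"
        )
    if format == "xml":
        # quoteattr() escapes the value and adds the surrounding quotes;
        # escape() handles &, < and > inside element text.
        return "\n".join(
            f"<message type={quoteattr(role)}>{escape(content)}</message>"
            for role, content in messages
        )
    # Prefix format: ambiguous when content itself contains "AI:" or "Human:".
    return "\n".join(f"{role}: {content}" for role, content in messages)


xml_output = render_buffer([("human", "AI: hi <b>")], format="xml")
```

With the XML form, a literal `AI:` inside the content stays inside the element body, so it cannot be mistaken for a role prefix.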
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Fixes #34517
Supersedes #34557, #34570
Fixes token inflation in `SummarizationMiddleware` that caused context
window overflow during summarization.
**Root cause:** When formatting messages for the summary prompt,
`str(messages)` was implicitly called, which includes all Pydantic
metadata fields (`usage_metadata`, `response_metadata`,
`additional_kwargs`, etc.). This caused the stringified representation
to use ~2.5x more tokens than `count_tokens_approximately` estimates.
**Problem:**
- Summarization triggers at 85% of context window based on
`count_tokens_approximately`
- But `str(messages)` in the prompt uses 2.5x more tokens
- Results in `ContextLengthExceeded`
**Fix:** Use `get_buffer_string()` to format messages, which produces
compact output:
```
Human: What's the weather?
AI: Let me check...[tool_calls]
Tool: 72°F and sunny
```
Instead of verbose Pydantic repr:
```python
[HumanMessage(content="What's the weather?", additional_kwargs={}, response_metadata={}), ...]
```
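
The size difference can be illustrated without LangChain at all. The sketch below uses a stand-in dataclass (`FakeMessage`, hypothetical) whose repr carries empty metadata fields, and compares it with a compact `Role: content` rendering. Character count is only a crude proxy for tokens, and the exact ratio depends on the messages.

```python
from dataclasses import dataclass, field


@dataclass
class FakeMessage:
    """Stand-in for a message class; not the real Pydantic model."""

    role: str
    content: str
    additional_kwargs: dict = field(default_factory=dict)
    response_metadata: dict = field(default_factory=dict)


def compact(messages):
    # Keep only the role and content, one message per line.
    return "\n".join(f"{m.role}: {m.content}" for m in messages)


messages = [
    FakeMessage("Human", "What's the weather?"),
    FakeMessage("Tool", "72°F and sunny"),
]

verbose_text = str(messages)      # dataclass repr, metadata fields included
compact_text = compact(messages)  # role-prefixed lines only

print(len(verbose_text), len(compact_text))
```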
* Fixed where possible
* Used `cast` where a fix was not possible
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
## Summary
Fixes #33970
`get_buffer_string` was only checking for the deprecated `function_call`
field in `additional_kwargs`, which modern LLM providers no longer
return. This fix updates the function to check for the modern
`tool_calls` field first, falling back to `function_call` for legacy
compatibility.
## Changes
- Check `AIMessage.tool_calls` first (modern standard)
- Fall back to `additional_kwargs["function_call"]` (legacy support)
- Added 3 unit tests covering tool_calls, empty content, and precedence
behavior
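
The precedence described above might look roughly like the following sketch. It operates on plain values rather than `AIMessage`, the helper name `format_ai_line` is hypothetical, and using `json.dumps` for the suffix is an assumption; the real function may stringify the calls differently.

```python
import json


def format_ai_line(content, tool_calls=None, additional_kwargs=None):
    """Hypothetical helper mirroring the tool_calls-first, function_call-fallback order."""
    line = f"AI: {content}"
    if tool_calls:
        # Modern field wins whenever it is non-empty.
        line += json.dumps(tool_calls)
    elif additional_kwargs and "function_call" in additional_kwargs:
        # Legacy fallback, used only when no tool_calls are present.
        line += json.dumps(additional_kwargs["function_call"])
    return line


both = format_ai_line(
    "Hi",
    tool_calls=[{"name": "search"}],
    additional_kwargs={"function_call": {"name": "old"}},
)
legacy_only = format_ai_line("Hi", additional_kwargs={"function_call": {"name": "old"}})
```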
## Testing
```python
# Before fix: tool_calls info was lost
msg = AIMessage(content="Hi", tool_calls=[{"name": "search", ...}])
get_buffer_string([msg]) # "AI: Hi" (no tool info)
# After fix: tool_calls are included
get_buffer_string([msg]) # "AI: Hi[{\"name\": \"search\", ...}]"
```
- All existing `get_buffer_string` tests pass
- Legacy `function_call` behavior preserved
---
> [!NOTE]
> This PR was developed with AI agent assistance (Factory/Droid).
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
### Description:
Previously, the approximate token counter had to be imported and passed explicitly:
```python
from langchain_core.messages import trim_messages
from langchain_core.messages.utils import count_tokens_approximately
trim_messages(..., token_counter=count_tokens_approximately)
```
It can now also be selected with a string shorthand:
```python
from langchain_core.messages import trim_messages
trim_messages(..., token_counter="approximate")
```
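
One plausible way to support such a shorthand is a small lookup that maps names to callables before the counter is used. This is a sketch under assumptions, not the code in `trim_messages`: `resolve_token_counter`, `_NAMED_COUNTERS`, and the 4-characters-per-token heuristic in `approx_count` are all invented for illustration.

```python
def approx_count(texts):
    """Hypothetical counter: roughly one token per 4 characters, rounded up."""
    return sum(-(-len(text) // 4) for text in texts)


# Registry of string shorthands (assumed; only "approximate" appears in this PR).
_NAMED_COUNTERS = {"approximate": approx_count}


def resolve_token_counter(token_counter):
    """Return a callable for either a registered name or a callable."""
    if isinstance(token_counter, str):
        try:
            return _NAMED_COUNTERS[token_counter]
        except KeyError:
            raise ValueError(
                f"Unknown token_counter {token_counter!r}; "
                f"expected one of {sorted(_NAMED_COUNTERS)}"
            ) from None
    if callable(token_counter):
        return token_counter
    raise TypeError("token_counter must be a callable or a known string name")


counter = resolve_token_counter("approximate")
```

Existing callers that pass a callable are unaffected, since callables are returned unchanged.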
- [x] **Added tests**
- [x] **Lint and test**: Since the change is in langchain/core, ran
`uv run --group test pytest tests/unit_tests/messages/test_utils.py -v`
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
With this we get the correct types for `_runnable_support` annotated
functions.
* Return `list[BaseMessage]` when `messages` is not `None`
* Return `Runnable` when `messages` is `None`
* Add typing for function args
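
The two return shapes listed above are the kind of thing `typing.overload` expresses. The sketch below is a simplified stand-in: `Message` is just `str`, a plain `Callable` takes the place of `Runnable`, and `keep_nonempty` is a hypothetical function, not one decorated with `_runnable_support`.

```python
from typing import Callable, Optional, Union, overload

Message = str  # stand-in for BaseMessage
Transform = Callable[[list[Message]], list[Message]]  # stand-in for Runnable


@overload
def keep_nonempty(messages: list[Message]) -> list[Message]: ...
@overload
def keep_nonempty(messages: None = None) -> Transform: ...


def keep_nonempty(
    messages: Optional[list[Message]] = None,
) -> Union[list[Message], Transform]:
    def _apply(msgs: list[Message]) -> list[Message]:
        # Drop empty messages; the actual transformation is irrelevant here.
        return [m for m in msgs if m]

    if messages is None:
        # No input yet: hand back a reusable transformation.
        return _apply
    return _apply(messages)


direct = keep_nonempty(["a", "", "b"])   # type checkers see list[Message]
deferred = keep_nonempty()               # type checkers see Transform
```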
Largely:
- Remove explicit `"Default is x"` since the new reference docs show the
default inferred from the signature
- Inline code (useful for eventual parsing)
- Fix code block rendering (indentations)