mirror of
https://github.com/hwchase17/langchain.git
synced 2026-07-01 22:59:06 +00:00
OpenAI Chat Completions streaming has a v1 normalization gap when tool
calls are streamed.
When users opt into `output_version="v1"`, `.content_blocks` is expected
to be the normalized cross-provider view of the message. For OpenAI Chat
Completions streams, though, chunks still carry raw string `content`
plus side-channel `tool_call_chunks` / `tool_calls`.

Concretely, an OpenAI stream chunk can look like this internally:
```python
from langchain_core.messages import AIMessageChunk

AIMessageChunk(
    content="",
    tool_call_chunks=[
        {
            "name": "get_weather",
            "args": '{"location": "SF"}',
            "id": "call_123",
            "index": 0,
            "type": "tool_call_chunk",
        }
    ],
    response_metadata={"model_provider": "openai", "output_version": "v1"},
)
```
That differs from already-normalized v1 content, which would look like this:
```python
from langchain_core.messages import AIMessageChunk

AIMessageChunk(
    content=[
        {
            "type": "tool_call_chunk",
            "name": "get_weather",
            "args": '{"location": "SF"}',
            "id": "call_123",
            "index": 0,
        }
    ],
)
```
Because `.content_blocks` currently short-circuits on `output_version="v1"`
alone, it can return the raw `content` unchanged (an empty string or an
empty list) instead of running the OpenAI translator that folds
`tool_call_chunks` / `tool_calls` into normalized v1 blocks.
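
A minimal, stdlib-only sketch of what that translation step does for a single chunk may help. The function name `translate_chat_completions_chunk` is hypothetical and is not LangChain's actual OpenAI translator; the emitted block fields are an assumption modeled on the two chunk shapes above: non-empty raw string `content` becomes a `text` block, and each side-channel tool call fragment becomes a `tool_call_chunk` block.

```python
from typing import Any


def translate_chat_completions_chunk(
    content: str, tool_call_chunks: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Hypothetical sketch: fold raw string content plus side-channel
    tool_call_chunks into one list of v1-style content blocks."""
    blocks: list[dict[str, Any]] = []
    if content:
        # Non-empty raw text becomes a text block; "" contributes nothing.
        blocks.append({"type": "text", "text": content})
    for chunk in tool_call_chunks:
        # Copy each tool call fragment into a block, keeping its stream index.
        blocks.append(
            {
                "type": "tool_call_chunk",
                "name": chunk.get("name"),
                "args": chunk.get("args"),
                "id": chunk.get("id"),
                "index": chunk.get("index"),
            }
        )
    return blocks


# The raw OpenAI chunk from the first example, reduced to plain data.
blocks = translate_chat_completions_chunk(
    "",
    [
        {
            "name": "get_weather",
            "args": '{"location": "SF"}',
            "id": "call_123",
            "index": 0,
            "type": "tool_call_chunk",
        }
    ],
)
```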
In practice, a streamed OpenAI tool call can be parsed successfully into
`tool_calls`, but still be missing from the final aggregated
`.content_blocks`. Downstream code that consumes the v1 block interface
then sees no `tool_call` block and must know to inspect OpenAI-specific
chunk fields instead.
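
A small sketch of such downstream code shows why the gap is silent rather than loud. `tool_call_names` is a hypothetical provider-agnostic consumer: when `.content_blocks` hands back the raw `""`, iterating it yields nothing, so the consumer simply reports no tool calls instead of raising an error.

```python
from typing import Any, Iterable


def tool_call_names(content_blocks: Iterable[Any]) -> list[str]:
    """Hypothetical provider-agnostic consumer: collect the names of
    normalized `tool_call` blocks from a `.content_blocks` value."""
    names: list[str] = []
    for block in content_blocks:
        # Only dict blocks typed "tool_call" count; anything else is skipped.
        if isinstance(block, dict) and block.get("type") == "tool_call":
            names.append(block["name"])
    return names


# Raw string returned by the v1 short-circuit: iterating "" yields nothing.
print(tool_call_names(""))  # []

# The normalized view the user story below asks for.
normalized = [
    {"type": "tool_call", "name": "get_weather", "args": {"location": "SF"}, "id": "call_123"}
]
print(tool_call_names(normalized))  # ['get_weather']
```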
User story:
> As a LangChain user streaming OpenAI Chat Completions with bound tools
and `output_version="v1"`, I need the final aggregated message's
`.content_blocks` to include normalized `tool_call` blocks, so that code
written against the v1 content-block interface handles streamed tool
calls consistently across providers.

Expected final aggregated view:
```python
# `message` is the final AIMessageChunk aggregated from the stream.
assert message.content_blocks == [
    {
        "type": "tool_call",
        "name": "get_weather",
        "args": {"location": "SF"},
        "id": "call_123",
    }
]
```
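
Getting from the raw chunks to that view takes two steps: fragments of the same call are concatenated across chunks, and the accumulated JSON `args` string is parsed into a dict. The sketch below illustrates this with a hypothetical `aggregate_tool_call_chunks` function. It assumes fragments are keyed by `index` and that later fragments may send `None` for fields already delivered; it does not reproduce LangChain's own chunk-merging or partial-JSON parsing code.

```python
import json
from typing import Any


def aggregate_tool_call_chunks(chunks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical sketch: merge streamed tool_call_chunk fragments that
    share an `index`, then parse the concatenated JSON args into a dict."""
    merged: dict[int, dict[str, str]] = {}
    for chunk in chunks:
        slot = merged.setdefault(chunk["index"], {"name": "", "args": "", "id": ""})
        # Later fragments may carry None for fields already sent; treat as "".
        for key in ("name", "args", "id"):
            slot[key] += chunk.get(key) or ""
    return [
        {
            "type": "tool_call",
            "name": slot["name"],
            # Assumption: an empty args string means "no arguments".
            "args": json.loads(slot["args"]) if slot["args"] else {},
            "id": slot["id"],
        }
        for _, slot in sorted(merged.items())
    ]


# Two fragments of one tool call, split mid-JSON as a stream might deliver them.
stream = [
    {"name": "get_weather", "args": '{"loc', "id": "call_123", "index": 0},
    {"name": None, "args": 'ation": "SF"}', "id": None, "index": 0},
]
final_blocks = aggregate_tool_call_chunks(stream)
# final_blocks == [{"type": "tool_call", "name": "get_weather",
#                   "args": {"location": "SF"}, "id": "call_123"}]
```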
Root causes:
1. The usage-only Chat Completions chunk uses `content=[]` in v1 mode
while normal streaming chunks use `content=""`, creating inconsistent
content types during chunk aggregation.
2. `AIMessage.content_blocks` and `AIMessageChunk.content_blocks` treat
any `output_version="v1"` message as already-normalized, even when
`content` is still raw string content from Chat Completions.
3. Content-bearing OpenAI stream chunks do not carry
`output_version="v1"`, so the final merged chunk may not reliably take
the v1 normalization path.
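
Root cause 1 can be illustrated with a deliberately simplified merge rule. `toy_merge_content` is hypothetical and is not LangChain's chunk-merging code; the only assumption it encodes is that merging a string with a list yields a list. Under that assumption, a trailing usage-only chunk with `content=[]` turns the aggregated content from `""` into `[]`, and a guard that short-circuits whenever `content` is a list would then still skip the translator. Keeping the usage-only chunk at `content=""` leaves the merged content a string.

```python
from typing import Union

Content = Union[str, list]


def toy_merge_content(contents: list[Content]) -> Content:
    """Hypothetical, simplified merge rule (NOT langchain_core's real merge
    logic): strings concatenate, and once a list appears the result becomes
    a list, keeping any non-empty accumulated string as its first item."""
    merged: Content = contents[0]
    for content in contents[1:]:
        if isinstance(merged, str) and isinstance(content, str):
            merged += content
        elif isinstance(merged, str):
            # str followed by list: switch to the list representation.
            merged = ([merged] if merged else []) + list(content)
        elif isinstance(content, list):
            merged = merged + content
        elif content:
            # Non-empty str appended to an existing list.
            merged = merged + [content]
    return merged


# Two tool-call chunks with content="" followed by the usage-only chunk.
with_list_usage = toy_merge_content(["", "", []])  # old behavior: content=[]
with_str_usage = toy_merge_content(["", "", ""])  # fixed behavior: content=""
print(type(with_list_usage).__name__, type(with_str_usage).__name__)  # list str
```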

Changes:
- Keep usage-only Chat Completions chunks as `content=""` instead of
overriding to `[]`, so streaming chunks merge consistently.
- Propagate `output_version="v1"` to content-bearing chunks.
- Only short-circuit v1 `.content_blocks` when `content` is already a
list of blocks; otherwise fall through to the provider translator.
- Add regression tests covering string-content v1 fallback, usage-only
chunk content consistency, and streamed tool calls appearing as
normalized final v1 blocks.
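
The narrowed short-circuit can be sketched as two small functions contrasting the current and proposed checks. All names here (`old_content_blocks_view`, `new_content_blocks_view`, `fake_openai_translate`) are hypothetical stand-ins for the `content_blocks` properties on `AIMessage` / `AIMessageChunk`; the translator stub ignores its input and just returns the tool call the final aggregated message is expected to expose.

```python
from typing import Any, Callable, Optional

Blocks = list[dict[str, Any]]


def old_content_blocks_view(
    content: Any, output_version: Optional[str], translate: Callable[[Any], Blocks]
) -> Any:
    """Hypothetical model of the current behavior: trust any v1 message."""
    if output_version == "v1":
        return content  # may be a raw string such as ""
    return translate(content)


def new_content_blocks_view(
    content: Any, output_version: Optional[str], translate: Callable[[Any], Blocks]
) -> Blocks:
    """Hypothetical model of the proposed behavior: only short-circuit
    when v1 content is already a list of blocks."""
    if output_version == "v1" and isinstance(content, list):
        return content
    # Raw string content falls through to the provider translator.
    return translate(content)


def fake_openai_translate(content: Any) -> Blocks:
    # Stub translator (assumption): always surfaces the parsed tool call.
    return [{"type": "tool_call", "name": "get_weather", "args": {"location": "SF"}, "id": "call_123"}]


old_view = old_content_blocks_view("", "v1", fake_openai_translate)  # "": no tool_call block
new_view = new_content_blocks_view("", "v1", fake_openai_translate)  # [{"type": "tool_call", ...}]
```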