Mason Daugherty 86428c63ac fix(core,openai): normalize v1 streamed tool calls (#35983)
OpenAI Chat Completions streaming has a v1 normalization gap when tool
calls are streamed.

When users opt into `output_version="v1"`, `.content_blocks` is expected
to be the normalized cross-provider view of the message. For OpenAI Chat
Completions streams, though, chunks still carry raw string `content`
plus side-channel `tool_call_chunks` / `tool_calls`.

Practically, an OpenAI stream chunk can look like this internally:

```python
AIMessageChunk(
    content="",
    tool_call_chunks=[
        {
            "name": "get_weather",
            "args": '{"location": "SF"}',
            "id": "call_123",
            "index": 0,
            "type": "tool_call_chunk",
        }
    ],
    response_metadata={"model_provider": "openai", "output_version": "v1"},
)
```

That is not the same as already-normalized v1 content, which would look like this:

```python
AIMessageChunk(
    content=[
        {
            "type": "tool_call_chunk",
            "name": "get_weather",
            "args": '{"location": "SF"}',
            "id": "call_123",
            "index": 0,
        }
    ],
)
```

Because `.content_blocks` currently short-circuits solely on
`output_version="v1"`, it can return the raw `content` (a string or an
empty list) directly instead of running the OpenAI translator that
incorporates `tool_call_chunks` / `tool_calls` into normalized v1 blocks.

In practice, a streamed OpenAI tool call can be parsed successfully into
`tool_calls`, but still be missing from the final aggregated
`.content_blocks`. Downstream code that consumes the v1 block interface
then sees no `tool_call` block and must know to inspect OpenAI-specific
chunk fields instead.

User story:

> As a LangChain user streaming OpenAI Chat Completions with bound tools
and `output_version="v1"`, I need the final aggregated message's
`.content_blocks` to include normalized `tool_call` blocks, so that code
written against the v1 content-block interface handles streamed tool
calls consistently across providers.

Expected final aggregated view:

```python
message.content_blocks == [
    {
        "type": "tool_call",
        "name": "get_weather",
        "args": {"location": "SF"},
        "id": "call_123",
    }
]
```
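
To show how streamed fragments relate to the expected block above, here is a minimal sketch in plain Python. It is not the langchain-core implementation: `aggregate_tool_call_chunks` is a hypothetical helper, and splitting the `args` string across two fragments is an assumption made for illustration. Fragments sharing an `index` are concatenated, and the accumulated `args` string is parsed as JSON once complete.

```python
import json
from typing import Any


def aggregate_tool_call_chunks(chunks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge streamed tool-call fragments by index into tool_call blocks.

    Hypothetical helper for illustration only; not part of LangChain.
    """
    merged: dict[int, dict[str, Any]] = {}
    for chunk in chunks:
        slot = merged.setdefault(chunk["index"], {"name": "", "args": "", "id": None})
        # Name and args arrive as string fragments; later fragments may carry None.
        slot["name"] += chunk.get("name") or ""
        slot["args"] += chunk.get("args") or ""
        # Keep the first non-empty id seen for this index.
        slot["id"] = slot["id"] or chunk.get("id")
    return [
        {
            "type": "tool_call",
            "name": slot["name"],
            # Parse the accumulated JSON string only after all fragments are joined.
            "args": json.loads(slot["args"]) if slot["args"] else {},
            "id": slot["id"],
        }
        for _, slot in sorted(merged.items())
    ]


fragments = [
    {"type": "tool_call_chunk", "name": "get_weather", "args": '{"locat', "id": "call_123", "index": 0},
    {"type": "tool_call_chunk", "name": None, "args": 'ion": "SF"}', "id": None, "index": 0},
]
blocks = aggregate_tool_call_chunks(fragments)
```

With these fragments, `blocks` holds a single `tool_call` block for `get_weather` whose `args` is a parsed dict rather than a JSON string.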

Root causes:

1. The usage-only Chat Completions chunk uses `content=[]` in v1 mode
while normal streaming chunks use `content=""`, creating inconsistent
content types during chunk aggregation.
2. `AIMessage.content_blocks` and `AIMessageChunk.content_blocks` treat
any `output_version="v1"` message as already-normalized, even when
`content` is still raw string content from Chat Completions.
3. Content-bearing OpenAI stream chunks do not carry
`output_version="v1"`, so the final merged chunk may not reliably take
the v1 normalization path.

Changes:

- Keep usage-only Chat Completions chunks as `content=""` instead of
overriding to `[]`, so streaming chunks merge consistently.
- Propagate `output_version="v1"` to content-bearing chunks.
- Only short-circuit v1 `.content_blocks` when `content` is already a
list of blocks; otherwise fall through to the provider translator.
- Add regression tests covering string-content v1 fallback, usage-only
chunk content consistency, and streamed tool calls appearing as
normalized final v1 blocks.
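
The guarded short-circuit described in the third change can be sketched roughly as follows. This is a simplified stand-in, not the actual `AIMessage.content_blocks` code: the free function `content_blocks`, its `translate` callback, and `fake_openai_translator` are hypothetical names introduced for illustration.

```python
from typing import Any, Callable, Union

Content = Union[str, list[dict[str, Any]]]


def content_blocks(
    content: Content,
    response_metadata: dict[str, Any],
    translate: Callable[[], list[dict[str, Any]]],
) -> list[dict[str, Any]]:
    """Return v1 blocks, short-circuiting only when content is already a list."""
    is_v1 = response_metadata.get("output_version") == "v1"
    if is_v1 and isinstance(content, list):
        # Content is already a list of blocks, so return it unchanged.
        return content
    # Raw string content (e.g. from Chat Completions): defer to the translator.
    return translate()


def fake_openai_translator() -> list[dict[str, Any]]:
    # Stand-in for a provider translator that folds tool calls into blocks.
    return [
        {"type": "tool_call", "name": "get_weather", "args": {"location": "SF"}, "id": "call_123"}
    ]


metadata = {"model_provider": "openai", "output_version": "v1"}
from_string = content_blocks("", metadata, fake_openai_translator)
from_list = content_blocks([{"type": "text", "text": "hi"}], metadata, fake_openai_translator)
```

Here `from_string` comes from the translator because `content` is a string, while `from_list` is returned as-is because it is already a list of blocks.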
2026-06-11 00:51:50 -04:00