Automated refresh of model profile data for all in-monorepo partner
integrations via `langchain-profiles refresh`.
🤖 Generated by the `refresh_model_profiles` workflow.
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Streaming token usage was silently dropped for `ChatOpenRouter`. Both
`_stream` and `_astream` skipped any SSE chunk without a `choices` array
— which is exactly the shape OpenRouter uses for the final
usage-reporting chunk. This meant `usage_metadata` was never populated
on streamed responses, causing downstream consumers (like the Deep
Agents CLI) to show "unknown" model with 0 tokens.
## Changes
- Add `stream_usage: bool = True` field to `ChatOpenRouter`, which
passes `stream_options: {"include_usage": True}` to the OpenRouter API
when streaming — matching the pattern already established in
`langchain-openai`'s `BaseChatOpenAI`
- Handle usage-only chunks (no `choices`, just `usage`) in both
`_stream` and `_astream` by emitting a `ChatGenerationChunk` with
`usage_metadata` via `_create_usage_metadata`, instead of silently
`continue`-ing past them
- Sort model profiles alphabetically by model ID (the top-level
`_PROFILES` dictionary keys, e.g. `claude-3-5-haiku-20241022`,
`gpt-4o-mini`) before writing `_profiles.py`, so that regenerating
profiles only shows actual data changes in diffs — not random reordering
from the models.dev API response order
- Regenerate all 10 partner profile files with the new sorted ordering
- Add `text_inputs` and `text_outputs` fields to `ModelProfile`
- Regenerate `_profiles.py` for all providers
## Why
models.dev data includes `'text'` as both an input and output modality,
but we didn't capture it.
models.dev broadly contains models without text input (Whisper/ASR) and
without text output (image generators, TTS).
Without this, downstream consumers can't filter on model text support
(e.g. preventing users from passing text input to an audio-only model).
---
We'd need to also run for Google, AWS and cut releases for all to
propagate
Just a small fix of some broken hyperlinks in the documentation of the
function `langchain_openai/chat_models/base.py#with_structured_output`
and a rephrase of the reference to supported models.
Co-authored-by: Thomas Reuhl <thomas.reuhl@telekom.de>
Fixed a bug where GPT-5 temperature validation was case-sensitive,
causing issues when users
specified Azure deployment names or model names in uppercase (e.g.,
`"GPT-5-2025-01-01"`, `"GPT-5-NANO"`). The validation now correctly
handles model names regardless of case.
Changes made:
- Updated `validate_temperature()` method in `BaseChatOpenAI` to perform
case-insensitive
model name comparisons
- Updated `_get_encoding_model()` method to use case-insensitive checks
for tiktoken encoder
selection
- Added comprehensive unit tests to verify case-insensitive behavior
with various case
combinations
**Issue:** Fixes#34003
**Dependencies:** None
**Test Coverage:**
- All existing tests pass
- New test `test_gpt_5_temperature_case_insensitive` covers uppercase,
lowercase, and
mixed-case model names
- Tests verify both non-chat GPT-5 models (temperature removed) and chat
models (temperature
preserved)
- Lint and format checks pass (`make lint`, `make format`)
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Now returns (`_iter`, `tokens`, `indices`, token_counts`). The
`token_counts` are calculated directly during tokenization, which is
more accurate and efficient than splitting strings later.
## Description
Fixes#31227 - Resolves the issue where `OpenAIEmbeddings` exceeds
OpenAI's 300,000 token per request limit, causing 400 BadRequest errors.
## Problem
When embedding large document sets, LangChain would send batches
containing more than 300,000 tokens in a single API request, causing
this error:
```
openai.BadRequestError: Error code: 400 - {'error': {'message': 'Requested 673477 tokens, max 300000 tokens per request'}}
```
The issue occurred because:
- The code chunks texts by `embedding_ctx_length` (8191 tokens per
chunk)
- Then batches chunks by `chunk_size` (default 1000 chunks per request)
- **But didn't check**: Total tokens per batch against OpenAI's 300k
limit
- Result: `1000 chunks × 8191 tokens = 8,191,000 tokens` → Exceeds
limit!
## Solution
This PR implements dynamic batching that respects the 300k token limit:
1. **Added constant**: `MAX_TOKENS_PER_REQUEST = 300000`
2. **Track token counts**: Calculate actual tokens for each chunk
3. **Dynamic batching**: Instead of fixed `chunk_size` batches,
accumulate chunks until approaching the 300k limit
4. **Applied to both sync and async**: Fixed both
`_get_len_safe_embeddings` and `_aget_len_safe_embeddings`
## Changes
- Modified `langchain_openai/embeddings/base.py`:
- Added `MAX_TOKENS_PER_REQUEST` constant
- Replaced fixed-size batching with token-aware dynamic batching
- Applied to both sync (line ~478) and async (line ~527) methods
- Added test in `tests/unit_tests/embeddings/test_base.py`:
- `test_embeddings_respects_token_limit()` - Verifies large document
sets are properly batched
## Testing
All existing tests pass (280 passed, 4 xfailed, 1 xpassed).
New test verifies:
- Large document sets (500 texts × 1000 tokens = 500k tokens) are split
into multiple API calls
- Each API call respects the 300k token limit
## Usage
After this fix, users can embed large document sets without errors:
```python
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_text_splitters import CharacterTextSplitter
# This will now work without exceeding token limits
embeddings = OpenAIEmbeddings()
documents = CharacterTextSplitter().split_documents(large_documents)
Chroma.from_documents(documents, embeddings)
```
Resolves#31227
---------
Co-authored-by: Kaparthy Reddy <kaparthyreddy@Kaparthys-MacBook-Air.local>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>