Addresses Issue #34007.
Fixes a bug where aliases like `mistral:` were correctly inferred as a
provider but the prefix was not stripped from the model name, causing
API 400 errors. Added logic to strip the prefix when inference succeeds.
**Description**
This PR resolves a logic error in `init_chat_model` where inferred
provider aliases (specifically `mistral:`) were correctly identified but
not stripped from the model string.
**The Problem**
When passing a string like `mistral:ministral-8b-latest`, the factory
logic correctly inferred the provider as `mistralai` but failed to enter
the string-splitting block because the alias `mistral` was not in the
hardcoded `_SUPPORTED_PROVIDERS` list. This caused the raw string
`mistral:ministral-8b-latest` to be passed to the `ChatMistralAI`
constructor, resulting in a 400 API error.
**The Fix**
I updated `_parse_model` in
`libs/langchain/langchain/chat_models/base.py`. The logic now attempts
to infer the provider from the prefix *before* determining whether to
split the string. This ensures that valid aliases trigger the stripping
logic, passing only the clean `model_name` to the integration class.
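For reference, a minimal sketch of the corrected control flow (names and helper behavior are simplified from this description, not copied from the diff):

```python
def _parse_model(model: str, model_provider: str | None) -> tuple[str, str]:
    # Sketch only: try to infer the provider from the prefix *before*
    # deciding whether to split, so aliases like "mistral" also trigger stripping.
    if not model_provider and ":" in model:
        prefix = model.split(":", 1)[0]
        inferred = _attempt_infer_model_provider(prefix)  # e.g. "mistral" -> "mistralai"
        if inferred:
            # Strip the alias prefix so only the clean model name reaches the integration class
            return model.split(":", 1)[1], inferred
    model_provider = model_provider or _attempt_infer_model_provider(model)
    if not model_provider:
        raise ValueError(f"Unable to infer model provider for {model=}")
    return model, model_provider
```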
**Issue**
Fixes #34007
**Dependencies**
None.
**Verification**
Validated locally with a reproduction script:
- Input: `mistral:ministral-8b-latest`
- Result: Successfully instantiates `ChatMistralAI` with
`model="ministral-8b-latest"`.
- Validated that standard inputs (e.g., `gpt-4o`) remain unaffected.
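The reproduction script boils down to something like this (the `.model` attribute name is assumed; requires `langchain-mistralai` and `MISTRAL_API_KEY` to be set):

```python
from langchain.chat_models import init_chat_model

llm = init_chat_model("mistral:ministral-8b-latest")
print(type(llm).__name__)  # ChatMistralAI
print(llm.model)           # "ministral-8b-latest" (prefix stripped)
```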
Co-authored-by: ioop <ioop@Sidharths-MacBook-Air.local>
Closes https://github.com/langchain-ai/langchain/issues/33983
* Adds `ModelRetryMiddleware`, modeled after `ToolRetryMiddleware` (usage sketch below)
* Uses `on_failure` modes of `error` and `continue` to match the
`exit_behavior` modes of the model + tool call limit middleware
* Aligns the API of `ToolRetryMiddleware`'s `on_failure` with the above
in a backwards-compatible manner
* Centralizes common "retry" utils across these middlewares
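A hypothetical usage sketch; the import path and constructor parameters below are inferred from the existing `ToolRetryMiddleware`, not copied from this PR's diff:

```python
from langchain.agents import create_agent
from langchain.agents.middleware import ModelRetryMiddleware, ToolRetryMiddleware
from langchain_core.tools import tool


@tool
def get_weather(city: str) -> str:
    """Return a canned weather report for a city."""
    return f"It is sunny in {city}."


agent = create_agent(
    model="openai:gpt-4o-mini",
    tools=[get_weather],
    middleware=[
        # on_failure="error" re-raises once retries are exhausted;
        # on_failure="continue" lets the run proceed, mirroring the
        # exit_behavior modes of the model/tool call limit middleware.
        ModelRetryMiddleware(max_retries=3, on_failure="error"),
        ToolRetryMiddleware(max_retries=2, on_failure="continue"),
    ],
)
```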
### Description
This PR adds support for configuring HTTP/HTTPS proxies when rendering
Mermaid diagrams as PNG images using the remote Mermaid.INK API. This
enhancement allows users in restricted network environments to access
the API via a proxy, making the remote rendering feature more robust and
accessible.
The changes include:
- Added optional `proxies` parameter to `draw_mermaid_png` and
`_render_mermaid_using_api` functions
- Updated `Graph.draw_mermaid_png` method to support and pass through
proxy configuration
- Enhanced docstrings with usage examples for the new parameter
- Maintained full backward compatibility with existing code
### Usage Example
```python
from IPython.display import Image, display

proxies = {
    "http": "http://127.0.0.1:7890",
    "https": "http://127.0.0.1:7890",
}
# `chain` is any runnable with a graph (e.g. an LCEL chain or compiled LangGraph)
display(Image(chain.get_graph().draw_mermaid_png(proxies=proxies)))
```
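Under the hood, the new parameter is simply forwarded to the `requests` call against the Mermaid.INK API. A simplified sketch of the idea (not the exact `_render_mermaid_using_api` code):

```python
import base64

import requests


def render_png_via_mermaid_ink(mermaid_syntax: str, proxies: dict | None = None) -> bytes:
    # Mermaid.INK accepts the diagram as URL-safe base64 in the request path
    encoded = base64.urlsafe_b64encode(mermaid_syntax.encode("utf8")).decode("ascii")
    response = requests.get(
        f"https://mermaid.ink/img/{encoded}",
        proxies=proxies,  # forwarded straight to requests; None keeps current behavior
        timeout=10,
    )
    response.raise_for_status()
    return response.content
```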
### Dependencies
No new dependencies required. Uses existing `requests` library for HTTP
requests.
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
- Respect 300k token limit for embeddings API requests #33668
- Fix `create_agent` / `response_format` for Responses API #33939
- Fix `response.incomplete` event not handled when using
`stream_mode=['messages']` #33871
Now returns (`_iter`, `tokens`, `indices`, `token_counts`). The
`token_counts` are calculated directly during tokenization, which is
more accurate and efficient than splitting strings later.
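Illustratively, counting tokens while the chunks are produced avoids re-splitting the encoded strings afterwards (a standalone sketch, not the actual helper):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens, indices, token_counts = [], [], []
for i, text in enumerate(["first document", "second document"]):
    encoded = enc.encode(text)
    # chunk by the embedding context length (8191 tokens)
    for start in range(0, len(encoded), 8191):
        chunk = encoded[start : start + 8191]
        tokens.append(chunk)
        indices.append(i)
        token_counts.append(len(chunk))  # counted here, during tokenization
```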
## Description
Fixes #31227 - Resolves the issue where `OpenAIEmbeddings` exceeds
OpenAI's 300,000 token per request limit, causing 400 BadRequest errors.
## Problem
When embedding large document sets, LangChain would send batches
containing more than 300,000 tokens in a single API request, causing
this error:
```
openai.BadRequestError: Error code: 400 - {'error': {'message': 'Requested 673477 tokens, max 300000 tokens per request'}}
```
The issue occurred because:
- The code chunks texts by `embedding_ctx_length` (8191 tokens per
chunk)
- Then batches chunks by `chunk_size` (default 1000 chunks per request)
- **But didn't check**: Total tokens per batch against OpenAI's 300k
limit
- Result: `1000 chunks × 8191 tokens = 8,191,000 tokens` → Exceeds
limit!
## Solution
This PR implements dynamic batching that respects the 300k token limit:
1. **Added constant**: `MAX_TOKENS_PER_REQUEST = 300000`
2. **Track token counts**: Calculate actual tokens for each chunk
3. **Dynamic batching**: Instead of fixed `chunk_size` batches,
accumulate chunks until approaching the 300k limit (see the sketch after this list)
4. **Applied to both sync and async**: Fixed both
`_get_len_safe_embeddings` and `_aget_len_safe_embeddings`
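A simplified sketch of the token-aware batching loop (variable names are illustrative, not the exact code in `embeddings/base.py`):

```python
MAX_TOKENS_PER_REQUEST = 300_000


def iter_batches(token_chunks: list[list[int]]):
    """Yield groups of token chunks whose combined size stays under the 300k limit."""
    batch: list[list[int]] = []
    batch_tokens = 0
    for chunk in token_chunks:
        if batch and batch_tokens + len(chunk) > MAX_TOKENS_PER_REQUEST:
            yield batch  # flush before the limit would be exceeded
            batch, batch_tokens = [], 0
        batch.append(chunk)
        batch_tokens += len(chunk)
    if batch:
        yield batch
```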
## Changes
- Modified `langchain_openai/embeddings/base.py`:
- Added `MAX_TOKENS_PER_REQUEST` constant
- Replaced fixed-size batching with token-aware dynamic batching
- Applied to both sync (line ~478) and async (line ~527) methods
- Added test in `tests/unit_tests/embeddings/test_base.py`:
- `test_embeddings_respects_token_limit()` - Verifies large document
sets are properly batched
## Testing
All existing tests pass (280 passed, 4 xfailed, 1 xpassed).
New test verifies:
- Large document sets (500 texts × 1000 tokens = 500k tokens) are split
into multiple API calls
- Each API call respects the 300k token limit
## Usage
After this fix, users can embed large document sets without errors:
```python
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_text_splitters import CharacterTextSplitter

# `large_documents` stands in for any large list of Document objects
# This will now work without exceeding token limits
embeddings = OpenAIEmbeddings()
documents = CharacterTextSplitter().split_documents(large_documents)
Chroma.from_documents(documents, embeddings)
```
Resolves #31227
---------
Co-authored-by: Kaparthy Reddy <kaparthyreddy@Kaparthys-MacBook-Air.local>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
* Use `override` instead of directly patching things on `ModelRequest`
* Rely on `ToolNode` for execution of tools related to said middleware,
using `wrap_model_call` to inject the relevant Claude tool specs and
allowing the tool node to forward them along to the corresponding
LangChain tool implementations
* Make the same change for the native shell tool middleware
* Allow the shell tool middleware to specify a name for the shell tool
(hence the negative diff for the Claude bash middleware)

Long term, I think the solution might be to attach metadata to a tool
that maps the provider spec to a LangChain implementation; we could also
take some lessons from the MCP front here.
It wasn't immediately obvious that `get_num_tokens_from_messages` adds
additional prefixes to represent user roles in the conversation, which
increases the overall token count (hence the difference between the two
examples below).
```python
from langchain_google_genai import GoogleGenerativeAI
llm = GoogleGenerativeAI(model="gemini-2.5-flash")
num_tokens = llm.get_num_tokens("Hello, world!")
print(f"Number of tokens: {num_tokens}")
# Number of tokens: 4
```
```python
from langchain.messages import HumanMessage
messages = [HumanMessage(content="Hello, world!")]
num_tokens = llm.get_num_tokens_from_messages(messages)
print(f"Number of tokens: {num_tokens}")
# Number of tokens: 6
```