Commit Graph

146 Commits

Author SHA1 Message Date
Nick Hollon
da380bccf8 chore(infra): merge v1.4 into master (#37350) 2026-05-11 11:39:25 -07:00
Nick Hollon
9ce72eba9f feat(core): add content-block-centric streaming (v2) (#36834) 2026-04-24 11:36:17 -04:00
Asamu David
4000c22376 feat(openai): prevent silent streaming hangs in ChatOpenAI (#36949)
> [!IMPORTANT]
> **Behavior change on upgrade — minor bump (`1.1.16` → `1.2.0`).**
>
> Streaming calls now raise `StreamChunkTimeoutError` (a `TimeoutError`
subclass — existing `except TimeoutError:` / `except
asyncio.TimeoutError:` handlers catch it) after 120s of content silence
instead of hanging forever. Opt out with `stream_chunk_timeout=None` or
`LANGCHAIN_OPENAI_STREAM_CHUNK_TIMEOUT_S=0`.
>
> Kernel-level TCP keepalive / `TCP_USER_TIMEOUT` are applied via a
custom `httpx` transport. `httpx` disables its env-proxy auto-detection
(`HTTP_PROXY` / `HTTPS_PROXY` / `ALL_PROXY` / `NO_PROXY` and
macOS/Windows system proxy) whenever a transport is supplied, so to
avoid silently breaking enterprise proxy users, `ChatOpenAI` now detects
the "proxy-env-shadow" shape at construction and **skips the custom
transport entirely** when **all** of these hold:
>
> - `http_socket_options` left at default (`None`)
> - No `http_client` or `http_async_client` supplied
> - No `openai_proxy` supplied
> - A proxy env var / system proxy is visible to httpx
>
> On that shape the instance falls back to pre-PR behavior and env-proxy
auto-detection still applies. A one-time `INFO` records the bypass.
>
> Users who explicitly set `http_socket_options=[...]` alongside an env
proxy still get the shadowed behavior with a one-time `WARNING` log —
they opted in. Full opt-outs below.

---

Streaming chat completions can hang forever when the underlying TCP
connection silently dies mid-stream (idle NAT/LB timeouts, sandboxed
runtimes killing long-lived connections, peer gone without a FIN or
RST). httpx's read timeout doesn't help here because it's reset by any
bytes arriving on the socket, including OpenAI's SSE keepalive comments,
so a stream that's quiet on content but still producing keepalives looks
alive forever.

This PR adds two knobs to `ChatOpenAI`, both on by default with
opt-outs:

- `stream_chunk_timeout` (default 120s): wraps the async streaming
iterator in `asyncio.wait_for` per chunk. Measures the gap between
*parsed* SSE chunks, so keepalives don't reset it. Fires on genuine
content silence and raises `StreamChunkTimeoutError` — a `TimeoutError`
subclass carrying `timeout_s`, `model_name`, and `chunks_received` as
structured attributes (mirrored in the WARNING log's `extra=`) for
alerting without message-regex. Override with the kwarg or
`LANGCHAIN_OPENAI_STREAM_CHUNK_TIMEOUT_S`.
- `http_socket_options`: applies `SO_KEEPALIVE` + `TCP_KEEPIDLE` /
`TCP_KEEPINTVL` / `TCP_KEEPCNT` + `TCP_USER_TIMEOUT` on Linux (macOS
equivalents where available). On platforms missing some options, they're
dropped silently and the remaining set still does useful work.

Pool limits are set explicitly on the custom transport to mirror the
`openai` SDK — without that, passing `transport=` to `httpx.AsyncClient`
silently shrinks the connection pool.

## Behavior change

The default-shape proxy-env bypass (above) covers the common enterprise
case. Beyond that:

- Connections that would previously have hung forever will now error out
via `StreamChunkTimeoutError`.
- Users who explicitly opt into `http_socket_options` while also relying
on env proxies will see a one-time `WARNING` and lose env-proxy
auto-detection — the custom transport shadows it. This is the original
shipped behavior, retained for anyone who *wants* socket tuning on top
of an env-proxied setup.

Full opt-outs:

- `stream_chunk_timeout=None` or
`LANGCHAIN_OPENAI_STREAM_CHUNK_TIMEOUT_S=0`
- `http_socket_options=()` or `LANGCHAIN_OPENAI_TCP_KEEPALIVE=0`
- Supply your own `http_client` **and** `http_async_client`.
`http_socket_options` is applied per side: passing only one still leaves
the other side's default builder getting socket options. Supply both (or
combine with `http_socket_options=()`) to take full control.

Unparseable or negative values for the `LANGCHAIN_OPENAI_*` env vars
fall back to the default with a `WARNING` log rather than silently being
accepted, so a misconfigured environment still boots but the fallback is
discoverable.

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2026-04-22 20:28:43 -04:00
Mason Daugherty
488c6a73bb fix(openai): tolerate prompt_cache_retention drift in streaming (#36925) 2026-04-21 14:54:32 -04:00
ccurme
19b0805bc1 fix(openai): accommodate dict response items in streaming (#36899) 2026-04-20 15:44:01 -04:00
Thomas
8fec4e7cee fix(openai): infer azure chat profiles from model name (#36858) 2026-04-19 11:06:26 -04:00
William FH
885f2c2c2d fix(openai): handle content blocks without type key in responses api conversion (#36725) 2026-04-14 15:13:40 -04:00
Mason Daugherty
8c15649127 fix(openai,groq,openrouter): use is-not-None checks in usage metadata token extraction (#36500)
Python's `or` operator treats `0` as falsy, so
`token_usage.get("total_tokens") or fallback` silently replaces a
provider-reported `total_tokens=0` with the computed sum of input +
output tokens. Providers can legitimately report zero tokens (e.g.,
cached responses, empty completions).

The same pattern exists in the dual-key lookups for
`input_tokens`/`output_tokens` in Groq and OpenRouter. While current
APIs don't return both key formats simultaneously (making the `or`-chain
functionally correct today), the semantics are still wrong; `0` should
not fall through to a fallback.

## Changes

- Replace `x.get(key) or fallback` with explicit `is not None` checks in
`_create_usage_metadata` across `langchain-openai`, `langchain-groq`,
and `langchain-openrouter` for `input_tokens`, `output_tokens`, and
`total_tokens`
- Fix a concrete bug in the `total_tokens` path: a provider-reported `0`
was silently replaced by the computed sum
- Harden dual-key lookups in Groq and OpenRouter to correctly preserve
zero values from the preferred key, should both key formats ever coexist
- Update OpenAI's single-key extraction for consistency — the old `or 0`
pattern happened to produce correct results (`0 or 0 == 0`) but was
semantically wrong
2026-04-03 11:46:36 -04:00
ccurme
bdfd4462ac feat(core): impute placeholder filenames for OpenAI file inputs (#36433) 2026-04-01 14:41:53 -04:00
Jackjin
7d05cfb131 fix(openai): preserve namespace field in streaming function_call chunks (#36108) 2026-03-20 12:51:13 -04:00
Giulio Leone
9e4a6013be fix(openai): add type: message to Responses API input items (#35693) 2026-03-15 12:43:16 -04:00
Mohammad Mohtashim
3af0bc0141 fix(openai): update responses API model detection for pro and codex models (#35594) 2026-03-09 09:20:20 -04:00
ccurme
fbfe4b812d feat(openai): support tool search (#35582) 2026-03-08 08:53:13 -04:00
Jason Meng
f698b43b9a fix(openai): avoid PydanticSerializationUnexpectedValue for structured output (#35543) 2026-03-04 21:46:46 -05:00
Mattijs Ugen
5c6f8fe0a6 fix(openai): accept valid responses that are falsy at runtime (#35307) 2026-02-18 21:06:43 -05:00
ccurme
32c6ab3033 fix(openai): add model property (#35284) 2026-02-17 10:46:49 -05:00
ccurme
8e35924083 fix(openai): sanitize chat completions text content blocks (#35217) 2026-02-15 15:31:02 -05:00
ccurme
7c41298355 feat(core): add ContextOverflowError, raise in anthropic and openai (#35099) 2026-02-09 15:15:34 -05:00
Guofang.Tang
06a7d079b0 fix(openai): detect codex models for responses api preference (#35058) 2026-02-08 13:15:48 -05:00
OysterMax
92afcaae60 fix(openai): raise proper exception OpenAIRefusalError on structured output refusal (#34619) 2026-01-07 14:34:02 -05:00
ccurme
5ec0fa69de fix(core): serialization patch (#34455)
- `allowed_objects` kwarg in `load`
- escape lc-ser formatted dicts on `dump`
- fix for jinja2

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-12-22 17:33:31 -06:00
Saurav Sapkota
f6297ced67 fix(openai): handle function_call content in token counting (#34379) 2025-12-19 15:17:40 -05:00
ccurme
e9f7cd3e0e release(openai): 1.1.6: update max input tokens for gpt-5 series (#34419) 2025-12-18 12:49:59 -05:00
Towseef Altaf
0e5e33ba03 fix(openai): correct image resize aspect ratio caps (#34192) 2025-12-12 14:34:17 -05:00
Jacob Lee
a528ea1796 feat(openai): Use responses API if model is gpt-5.2-pro (#34306) 2025-12-12 10:11:15 -05:00
j3r0lin
5720dea41b fix(openai): handle missing 'text' key in responses API content blocks (#34198) 2025-12-12 09:39:12 -05:00
Jacob Lee
badc0cf1b6 fix(openai): Allow temperature when reasoning is set to the string 'none' (#34298)
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-12-11 15:57:04 -05:00
Marlene
ff3353f02f fix(openai): Fixing error that comes up using the Responses API with built-in tools and custom tools (#34136) 2025-12-08 09:10:44 -05:00
Abhinav
2ba3ce81a6 fix(openai): make GPT-5 temperature validation case-insensitive (#34012)
Fixed a bug where GPT-5 temperature validation was case-sensitive,
causing issues when users
specified Azure deployment names or model names in uppercase (e.g.,
`"GPT-5-2025-01-01"`, `"GPT-5-NANO"`). The validation now correctly
handles model names regardless of case.

  Changes made:
- Updated `validate_temperature()` method in `BaseChatOpenAI` to perform
case-insensitive
  model name comparisons
- Updated `_get_encoding_model()` method to use case-insensitive checks
for tiktoken encoder
  selection
- Added comprehensive unit tests to verify case-insensitive behavior
with various case
  combinations

  **Issue:** Fixes #34003

  **Dependencies:** None

  **Test Coverage:**
  - All existing tests pass
- New test `test_gpt_5_temperature_case_insensitive` covers uppercase,
lowercase, and
  mixed-case model names
- Tests verify both non-chat GPT-5 models (temperature removed) and chat
models (temperature
  preserved)
  - Lint and format checks pass (`make lint`, `make format`)

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-11-23 20:17:03 -05:00
ccurme
33e5d01f7c feat(model-profiles): distribute data across packages (#34024) 2025-11-21 15:47:05 -05:00
Mason Daugherty
099c042395 refactor(openai): embedding utils and calculations (#33982)
Now returns (`_iter`, `tokens`, `indices`, token_counts`). The
`token_counts` are calculated directly during tokenization, which is
more accurate and efficient than splitting strings later.
2025-11-14 19:18:37 -05:00
Kaparthy Reddy
2d4f00a451 fix(openai): Respect 300k token limit for embeddings API requests (#33668)
## Description

Fixes #31227 - Resolves the issue where `OpenAIEmbeddings` exceeds
OpenAI's 300,000 token per request limit, causing 400 BadRequest errors.

## Problem

When embedding large document sets, LangChain would send batches
containing more than 300,000 tokens in a single API request, causing
this error:
```
openai.BadRequestError: Error code: 400 - {'error': {'message': 'Requested 673477 tokens, max 300000 tokens per request'}}
```

The issue occurred because:
- The code chunks texts by `embedding_ctx_length` (8191 tokens per
chunk)
- Then batches chunks by `chunk_size` (default 1000 chunks per request)
- **But didn't check**: Total tokens per batch against OpenAI's 300k
limit
- Result: `1000 chunks × 8191 tokens = 8,191,000 tokens` → Exceeds
limit!

## Solution

This PR implements dynamic batching that respects the 300k token limit:

1. **Added constant**: `MAX_TOKENS_PER_REQUEST = 300000`
2. **Track token counts**: Calculate actual tokens for each chunk
3. **Dynamic batching**: Instead of fixed `chunk_size` batches,
accumulate chunks until approaching the 300k limit
4. **Applied to both sync and async**: Fixed both
`_get_len_safe_embeddings` and `_aget_len_safe_embeddings`

## Changes

- Modified `langchain_openai/embeddings/base.py`:
  - Added `MAX_TOKENS_PER_REQUEST` constant
  - Replaced fixed-size batching with token-aware dynamic batching
  - Applied to both sync (line ~478) and async (line ~527) methods
- Added test in `tests/unit_tests/embeddings/test_base.py`:
- `test_embeddings_respects_token_limit()` - Verifies large document
sets are properly batched

## Testing

All existing tests pass (280 passed, 4 xfailed, 1 xpassed).

New test verifies:
- Large document sets (500 texts × 1000 tokens = 500k tokens) are split
into multiple API calls
- Each API call respects the 300k token limit

## Usage

After this fix, users can embed large document sets without errors:
```python
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_text_splitters import CharacterTextSplitter

# This will now work without exceeding token limits
embeddings = OpenAIEmbeddings()
documents = CharacterTextSplitter().split_documents(large_documents)
Chroma.from_documents(documents, embeddings)
```

Resolves #31227

---------

Co-authored-by: Kaparthy Reddy <kaparthyreddy@Kaparthys-MacBook-Air.local>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-11-14 18:12:07 -05:00
Shagun Gupta
75fff151e8 fix(openai): replace pytest.warns(None) with warnings.catch_warnings in ChatOpenAI test to resolve TypeError . Resolves issue #33705 (#33741) 2025-10-30 09:22:34 -04:00
Mason Daugherty
f94108b4bc fix: links (#33691)
* X-ref to new docs
* Formatting updates
2025-10-27 19:04:29 -04:00
Ali Ismail
5acd34ae92 feat(openai): add unit test for streaming error in _generate (#33134) 2025-10-21 15:08:37 -04:00
Marlene
78175fcb96 feat(openai): add callable support for openai_api_key parameter (#33532) 2025-10-21 11:16:02 -04:00
Nuno Campos
0788461abd feat(openai): Add openai moderation middleware (#33492) 2025-10-15 13:59:49 -04:00
Mason Daugherty
31eeb50ce0 chore: drop UP045 (#33362)
Python 3.9 EOL
2025-10-08 21:17:53 -04:00
ccurme
de48e102c4 fix(core,openai,anthropic): delegate to core implementation on invoke when streaming=True (#33308) 2025-10-06 15:54:55 -04:00
ccurme
4e50ec4b98 feat(openai): enable stream_usage when using default base URL and client (#33205) 2025-10-06 08:56:38 -04:00
Mason Daugherty
eaa6dcce9e release: v1.0.0 (#32567)
Co-authored-by: Mohammad Mohtashim <45242107+keenborder786@users.noreply.github.com>
Co-authored-by: Caspar Broekhuizen <caspar@langchain.dev>
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com>
Co-authored-by: Vadym Barda <vadim.barda@gmail.com>
2025-10-02 10:49:42 -04:00
Mason Daugherty
986302322f docs: more standardization (#33124) 2025-09-25 20:46:20 -04:00
Mason Daugherty
043a7560a5 test: use .get() for safe ls_params access (#33034) 2025-09-20 23:46:37 -04:00
Mason Daugherty
9f6431924f feat(openai): add max_tokens to AzureChatOpenAI (#32959)
Fixes #32949

This pattern is [present in
`ChatOpenAI`](https://github.com/langchain-ai/langchain/blob/master/libs/partners/openai/langchain_openai/chat_models/base.py#L2821)
but wasn't carried over to Azure.


[CI](https://github.com/langchain-ai/langchain/actions/runs/17741751797/job/50417180998)
2025-09-15 14:09:20 -04:00
Matthew Lapointe
b1f08467cd feat(core): allow overriding ls_model_name from kwargs (#32541) 2025-09-11 16:18:06 -04:00
Mason Daugherty
4c6af2d1b2 fix(openai): structured output (#32551) 2025-09-09 11:37:50 -04:00
Sadiq Khan
228fbac3a6 fix(openai): handle AIMessages without response_id in _get_last_messages (#32824) 2025-09-08 10:12:50 -04:00
JunHyungKang
6ea06ca972 fix(openai): Fix Azure OpenAI Responses API model field issue (#32649) 2025-09-08 10:08:35 -04:00
Jacob Lee
1459d4f4ce fix(openai): Always add raw response object to OpenAI client errors for invoke (#32655) 2025-08-26 09:59:25 -04:00
Alex Naidis
21f7a9a9e5 fix(openai): allow temperature parameter for gpt-5-chat models (#32624) 2025-08-21 16:40:10 -04:00