langchain

mirror of https://github.com/hwchase17/langchain.git synced 2026-06-09 18:50:33 +00:00

Author	SHA1	Message	Date
Kaparthy Reddy	2d4f00a451	fix(openai): Respect 300k token limit for embeddings API requests (#33668 ) ## Description Fixes #31227 - Resolves the issue where `OpenAIEmbeddings` exceeds OpenAI's 300,000 token per request limit, causing 400 BadRequest errors. ## Problem When embedding large document sets, LangChain would send batches containing more than 300,000 tokens in a single API request, causing this error: ``` openai.BadRequestError: Error code: 400 - {'error': {'message': 'Requested 673477 tokens, max 300000 tokens per request'}} ``` The issue occurred because: - The code chunks texts by `embedding_ctx_length` (8191 tokens per chunk) - Then batches chunks by `chunk_size` (default 1000 chunks per request) - But didn't check: Total tokens per batch against OpenAI's 300k limit - Result: `1000 chunks × 8191 tokens = 8,191,000 tokens` → Exceeds limit! ## Solution This PR implements dynamic batching that respects the 300k token limit: 1. Added constant: `MAX_TOKENS_PER_REQUEST = 300000` 2. Track token counts: Calculate actual tokens for each chunk 3. Dynamic batching: Instead of fixed `chunk_size` batches, accumulate chunks until approaching the 300k limit 4. Applied to both sync and async: Fixed both `_get_len_safe_embeddings` and `_aget_len_safe_embeddings` ## Changes - Modified `langchain_openai/embeddings/base.py`: - Added `MAX_TOKENS_PER_REQUEST` constant - Replaced fixed-size batching with token-aware dynamic batching - Applied to both sync (line ~478) and async (line ~527) methods - Added test in `tests/unit_tests/embeddings/test_base.py`: - `test_embeddings_respects_token_limit()` - Verifies large document sets are properly batched ## Testing All existing tests pass (280 passed, 4 xfailed, 1 xpassed). New test verifies: - Large document sets (500 texts × 1000 tokens = 500k tokens) are split into multiple API calls - Each API call respects the 300k token limit ## Usage After this fix, users can embed large document sets without errors: ```python from langchain_openai import OpenAIEmbeddings from langchain_chroma import Chroma from langchain_text_splitters import CharacterTextSplitter # This will now work without exceeding token limits embeddings = OpenAIEmbeddings() documents = CharacterTextSplitter().split_documents(large_documents) Chroma.from_documents(documents, embeddings) ``` Resolves #31227 --------- Co-authored-by: Kaparthy Reddy <kaparthyreddy@Kaparthys-MacBook-Air.local> Co-authored-by: Chester Curme <chester.curme@gmail.com> Co-authored-by: Mason Daugherty <mason@langchain.dev> Co-authored-by: Mason Daugherty <github@mdrxy.com>	2025-11-14 18:12:07 -05:00
Shagun Gupta	75fff151e8	fix(openai): replace pytest.warns(None) with warnings.catch_warnings in ChatOpenAI test to resolve TypeError . Resolves issue #33705 (#33741 )	2025-10-30 09:22:34 -04:00
Mason Daugherty	f94108b4bc	fix: links (#33691 ) * X-ref to new docs * Formatting updates	2025-10-27 19:04:29 -04:00
Ali Ismail	5acd34ae92	feat(openai): add unit test for streaming error in `_generate` (#33134 )	2025-10-21 15:08:37 -04:00
Marlene	78175fcb96	feat(openai): add callable support for openai_api_key parameter (#33532 )	2025-10-21 11:16:02 -04:00
Nuno Campos	0788461abd	feat(openai): Add openai moderation middleware (#33492 )	2025-10-15 13:59:49 -04:00
Mason Daugherty	31eeb50ce0	chore: drop UP045 (#33362 ) Python 3.9 EOL	2025-10-08 21:17:53 -04:00
ccurme	de48e102c4	fix(core,openai,anthropic): delegate to core implementation on invoke when streaming=True (#33308 )	2025-10-06 15:54:55 -04:00
ccurme	4e50ec4b98	feat(openai): enable stream_usage when using default base URL and client (#33205 )	2025-10-06 08:56:38 -04:00
Mason Daugherty	eaa6dcce9e	release: v1.0.0 (#32567 ) Co-authored-by: Mohammad Mohtashim <45242107+keenborder786@users.noreply.github.com> Co-authored-by: Caspar Broekhuizen <caspar@langchain.dev> Co-authored-by: ccurme <chester.curme@gmail.com> Co-authored-by: Christophe Bornet <cbornet@hotmail.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com> Co-authored-by: Vadym Barda <vadim.barda@gmail.com>	2025-10-02 10:49:42 -04:00
Mason Daugherty	986302322f	docs: more standardization (#33124 )	2025-09-25 20:46:20 -04:00
Mason Daugherty	043a7560a5	test: use `.get()` for safe `ls_params` access (#33034 )	2025-09-20 23:46:37 -04:00
Mason Daugherty	9f6431924f	feat(openai): add `max_tokens` to `AzureChatOpenAI` (#32959 ) Fixes #32949 This pattern is [present in `ChatOpenAI`](https://github.com/langchain-ai/langchain/blob/master/libs/partners/openai/langchain_openai/chat_models/base.py#L2821) but wasn't carried over to Azure. [CI](https://github.com/langchain-ai/langchain/actions/runs/17741751797/job/50417180998)	2025-09-15 14:09:20 -04:00
Matthew Lapointe	b1f08467cd	feat(core): allow overriding `ls_model_name` from kwargs (#32541 )	2025-09-11 16:18:06 -04:00
Mason Daugherty	4c6af2d1b2	fix(openai): structured output (#32551 )	2025-09-09 11:37:50 -04:00
Sadiq Khan	228fbac3a6	fix(openai): handle `AIMessage`s without `response_id` in `_get_last_messages` (#32824 )	2025-09-08 10:12:50 -04:00
JunHyungKang	6ea06ca972	fix(openai): Fix Azure OpenAI Responses API model field issue (#32649 )	2025-09-08 10:08:35 -04:00
Jacob Lee	1459d4f4ce	fix(openai): Always add raw response object to OpenAI client errors for invoke (#32655 )	2025-08-26 09:59:25 -04:00
Alex Naidis	21f7a9a9e5	fix(openai): allow temperature parameter for gpt-5-chat models (#32624 )	2025-08-21 16:40:10 -04:00
sa411022	61bc1bf9cc	fix(openai): construct responses api input (#32557 )	2025-08-21 15:56:29 -04:00
Mason Daugherty	0024dffa68	feat(openai): officially support `verbosity` (#32470 )	2025-08-12 16:00:30 +00:00
Mason Daugherty	00244122bd	feat(openai): `minimal` and `verbosity` (#32455 )	2025-08-08 02:24:21 +00:00
ccurme	ec2b34a02d	feat(openai): custom tools (#32449 )	2025-08-07 16:30:01 -04:00
Mason Daugherty	145d38f7dd	test(openai): add tests for `prompt_cache_key` parameter and update docs (#32363 ) Introduce tests to validate the behavior and inclusion of the `prompt_cache_key` parameter in request payloads for the `ChatOpenAI` model.	2025-08-07 15:29:47 -04:00
ccurme	a9e52ca605	chore(openai): bump openai sdk (#32322 )	2025-07-30 10:58:18 -04:00
niceg	0d6f915442	fix: LLM mimicking Unicode responses due to forced Unicode conversion of non-ASCII characters. (#32222 ) fix: Fix LLM mimicking Unicode responses due to forced Unicode conversion of non-ASCII characters. - Description: This PR fixes an issue where the LLM would mimic Unicode responses due to forced Unicode conversion of non-ASCII characters in tool calls. The fix involves disabling the `ensure_ascii` flag in `json.dumps()` when converting tool calls to OpenAI format. - Issue: Fixes ↓↓↓ input： ```json {'role': 'assistant', 'tool_calls': [{'type': 'function', 'id': 'call_nv9trcehdpihr21zj9po19vq', 'function': {'name': 'create_customer', 'arguments': '{"customer_name": "你好啊集团"}'}}]} ``` output: ```json {'role': 'assistant', 'tool_calls': [{'type': 'function', 'id': 'call_nv9trcehdpihr21zj9po19vq', 'function': {'name': 'create_customer', 'arguments': '{"customer_name": "\\u4f60\\u597d\\u554a\\u96c6\\u56e2"}'}}]} ``` then: llm will mimic outputting unicode. Unicode's vast number of symbols can lengthen LLM responses, leading to slower performance. <img width="686" height="277" alt="image" src="https://github.com/user-attachments/assets/28f3b007-3964-4455-bee2-68f86ac1906d" /> --------- Co-authored-by: Mason Daugherty <github@mdrxy.com> Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-24 17:01:31 -04:00
Copilot	54542b9385	docs(openai): add comprehensive documentation and examples for `extra_body` + others (#32149 ) This PR addresses the common issue where users struggle to pass custom parameters to OpenAI-compatible APIs like LM Studio, vLLM, and others. The problem occurs when users try to use `model_kwargs` for custom parameters, which causes API errors. ## Problem Users attempting to pass custom parameters (like LM Studio's `ttl` parameter) were getting errors: ```python # ❌ This approach fails llm = ChatOpenAI( base_url="http://localhost:1234/v1", model="mlx-community/QwQ-32B-4bit", model_kwargs={"ttl": 5} # Causes TypeError: unexpected keyword argument 'ttl' ) ``` ## Solution The `extra_body` parameter is the correct way to pass custom parameters to OpenAI-compatible APIs: ```python # ✅ This approach works correctly llm = ChatOpenAI( base_url="http://localhost:1234/v1", model="mlx-community/QwQ-32B-4bit", extra_body={"ttl": 5} # Custom parameters go in extra_body ) ``` ## Changes Made 1. Enhanced Documentation: Updated the `extra_body` parameter docstring with comprehensive examples for LM Studio, vLLM, and other providers 2. Added Documentation Section: Created a new "OpenAI-compatible APIs" section in the main class docstring with practical examples 3. Unit Tests: Added tests to verify `extra_body` functionality works correctly: - `test_extra_body_parameter()`: Verifies custom parameters are included in request payload - `test_extra_body_with_model_kwargs()`: Ensures `extra_body` and `model_kwargs` work together 4. Clear Guidance: Documented when to use `extra_body` vs `model_kwargs` ## Examples Added LM Studio with TTL (auto-eviction): ```python ChatOpenAI( base_url="http://localhost:1234/v1", api_key="lm-studio", model="mlx-community/QwQ-32B-4bit", extra_body={"ttl": 300} # Auto-evict after 5 minutes ) ``` vLLM with custom sampling: ```python ChatOpenAI( base_url="http://localhost:8000/v1", api_key="EMPTY", model="meta-llama/Llama-2-7b-chat-hf", extra_body={ "use_beam_search": True, "best_of": 4 } ) ``` ## Why This Works - `model_kwargs` parameters are passed directly to the OpenAI client's `create()` method, causing errors for non-standard parameters - `extra_body` parameters are included in the HTTP request body, which is exactly what OpenAI-compatible APIs expect for custom parameters Fixes #32115. <!-- START COPILOT CODING AGENT TIPS --> --- 💬 Share your feedback on Copilot coding agent for the chance to win a $200 gift card! Click [here](https://survey.alchemer.com/s3/8343779/Copilot-Coding-agent) to start the survey. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com> Co-authored-by: Mason Daugherty <github@mdrxy.com> Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-24 16:43:16 -04:00
ccurme	de13f6ae4f	fix(openai): support acknowledged safety checks in computer use (#31984 )	2025-07-14 07:33:37 -03:00
Mason Daugherty	33c9bf1adc	langchain-openai[patch]: Add ruff bandit rules to linter (#31788 )	2025-06-30 14:01:32 -04:00
Andrew Jaeger	0189c50570	openai[fix]: Correctly set usage metadata for OpenAI Responses API (#31756 )	2025-06-27 15:35:14 +00:00
ccurme	066be383e3	openai[patch]: update test following release of openai 1.92 (#31751 ) Added new required fields for `ResponseFunctionWebSearch`	2025-06-26 18:22:58 +00:00
ccurme	88d5f3edcc	openai[patch]: allow specification of output format for Responses API (#31686 )	2025-06-26 13:41:43 -04:00
ccurme	0bf223d6cf	openai[patch]: add attribute to always use previous_response_id (#31734 )	2025-06-25 19:01:43 +00:00
ccurme	b268ab6a28	openai[patch]: fix client caching when request_timeout is specified via httpx.Timeout (#31698 ) Resolves https://github.com/langchain-ai/langchain/issues/31697	2025-06-23 14:37:49 +00:00
ccurme	b9357d456e	openai[patch]: refactor handling of Responses API (#31587 )	2025-06-16 14:01:39 -04:00
ccurme	4cc2f6b807	openai[patch]: guard against None text completions in BaseOpenAI (#31514 ) Some chat completions APIs will return null `text` output (even though this is typed as string).	2025-06-06 09:14:37 -04:00
ccurme	053a1246da	openai[patch]: support built-in code interpreter and remote MCP tools (#31304 )	2025-05-22 11:47:57 -04:00
ccurme	32fcc97a90	openai[patch]: compat with Bedrock Converse (#31280 ) ChatBedrockConverse passes through reasoning content blocks in [Bedrock Converse format](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlock.html). Similar to how we handle Anthropic thinking blocks, here we ensure these are filtered out of OpenAI request payloads. Resolves https://github.com/langchain-ai/langchain/issues/31279.	2025-05-19 10:35:26 -04:00
ccurme	0b8837a0cc	openai: support runtime kwargs in embeddings (#31195 )	2025-05-14 09:14:40 -04:00
ccurme	868cfc4a8f	openai: ignore function_calls if tool_calls are present (#31198 ) Some providers include (legacy) function calls in `additional_kwargs` in addition to tool calls. We currently unpack both function calls and tool calls if present, but OpenAI will raise 400 in this case. This can come up if providers are mixed in a tool-calling loop. Example: ```python from langchain.chat_models import init_chat_model from langchain_core.messages import HumanMessage from langchain_core.tools import tool @tool def get_weather(location: str) -> str: """Get weather at a location.""" return "It's sunny." gemini = init_chat_model("google_genai:gemini-2.0-flash-001").bind_tools([get_weather]) openai = init_chat_model("openai:gpt-4.1-mini").bind_tools([get_weather]) input_message = HumanMessage("What's the weather in Boston?") tool_call_message = gemini.invoke([input_message]) assert len(tool_call_message.tool_calls) == 1 tool_call = tool_call_message.tool_calls[0] tool_message = get_weather.invoke(tool_call) response = openai.invoke( # currently raises 400 / BadRequestError [input_message, tool_call_message, tool_message] ) ``` Here we ignore function calls if tool calls are present.	2025-05-12 13:50:56 -04:00
ccurme	94139ffcd3	openai[patch]: format system content blocks for Responses API (#31096 ) ```python from langchain_core.messages import HumanMessage, SystemMessage from langchain_openai import ChatOpenAI llm = ChatOpenAI(model="gpt-4.1", use_responses_api=True) messages = [ SystemMessage("test"), # Works HumanMessage("test"), # Works SystemMessage([{"type": "text", "text": "test"}]), # Bug in this case HumanMessage([{"type": "text", "text": "test"}]), # Works SystemMessage([{"type": "input_text", "text": "test"}]) # Works ] llm._get_request_payload(messages) ```	2025-05-02 15:22:30 +00:00
ccurme	629b7a5a43	openai[patch]: add explicit attribute for service tier (#31005 )	2025-04-25 18:38:23 +00:00
Aubrey Ford	b344f34635	partners/openai: OpenAIEmbeddings not respecting chunk_size argument (#30757 ) When calling `embed_documents` and providing a `chunk_size` argument, that argument is ignored when `OpenAIEmbeddings` is instantiated with its default configuration (where `check_embedding_ctx_length=True`). `_get_len_safe_embeddings` specifies a `chunk_size` parameter but it's not being passed through in `embed_documents`, which is its only caller. This appears to be an oversight, especially given that the `_get_len_safe_embeddings` docstring states it should respect "the set embedding context length and chunk size." Developers typically expect method parameters to take effect (also, take precedence) when explicitly provided, especially when instantiating using defaults. I was confused as to why my API calls were being rejected regardless of the chunk size I provided. This bug also exists in langchain_community package. I can add that to this PR if requested otherwise I will create a new one once this passes.	2025-04-18 15:27:27 -04:00
ccurme	86d51f6be6	multiple: permit optional fields on multimodal content blocks (#30887 ) Instead of stuffing provider-specific fields in `metadata`, they can go directly on the content block.	2025-04-17 12:48:46 +00:00
ccurme	9cfe6bcacd	multiple: multi-modal content blocks (#30746 ) Introduces standard content block format for images, audio, and files. ## Examples Image from url: ``` { "type": "image", "source_type": "url", "url": "https://path.to.image.png", } ``` Image, in-line data: ``` { "type": "image", "source_type": "base64", "data": "<base64 string>", "mime_type": "image/png", } ``` PDF, in-line data: ``` { "type": "file", "source_type": "base64", "data": "<base64 string>", "mime_type": "application/pdf", } ``` File from ID: ``` { "type": "file", "source_type": "id", "id": "file-abc123", } ``` Plain-text file: ``` { "type": "file", "source_type": "text", "text": "foo bar", } ```	2025-04-15 09:48:06 -04:00
ccurme	f7c4965fb6	openai[patch]: update imports in test (#30828 ) Quick fix to unblock CI, will need to address in core separately.	2025-04-14 19:33:38 +00:00
Sydney Runkle	8c6734325b	partners[lint]: run `pyupgrade` to get code in line with 3.9 standards (#30781 ) Using `pyupgrade` to get all `partners` code up to 3.9 standards (mostly, fixing old `typing` imports).	2025-04-11 07:18:44 -04:00
ccurme	fe0fd9dd70	openai[patch]: upgrade tiktoken and fix test (#30621 ) Related to https://github.com/langchain-ai/langchain/issues/30344 https://github.com/langchain-ai/langchain/pull/30542 introduced an erroneous test for token counts for o-series models. tiktoken==0.8 does not support o-series models in `tiktoken.encoding_for_model(model_name)`, and this is the version of tiktoken we had in the lock file. So we would default to `cl100k_base` for o-series, which is the wrong encoding model. The test tested against this wrong encoding (so it passed with tiktoken 0.8). Here we update tiktoken to 0.9 in the lock file, and fix the expected counts in the test. Verified that we are pulling [o200k_base](https://github.com/openai/tiktoken/blob/main/tiktoken/model.py#L8), as expected.	2025-04-02 10:44:48 -04:00
ccurme	8a69de5c24	openai[patch]: ignore file blocks when counting tokens (#30601 ) OpenAI does not appear to document how it transforms PDF pages to images, which determines how tokens are counted: https://platform.openai.com/docs/guides/pdf-files?api-mode=chat#usage-considerations Currently these block types raise ValueError inside `get_num_tokens_from_messages`. Here we update to generate a warning and continue.	2025-04-01 15:29:33 -04:00
Koshik Debanath	e7883d5b9f	langchain-openai: Support token counting for o-series models in ChatOpenAI (#30542 ) Related to #30344 Add support for token counting for o-series models in `test_token_counts.py`. * Update `_MODELS` and `_CHAT_MODELS` dictionaries - Add "o1", "o3", and "gpt-4o" to `_MODELS` and `_CHAT_MODELS` dictionaries. * Update token counts - Add token counts for "o1", "o3", and "gpt-4o" models. --- For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/langchain-ai/langchain/pull/30542?shareId=ab208bf7-80a3-4b8d-80c4-2287486fedae).	2025-03-28 16:02:09 -04:00

1 2 3

115 Commits