langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-08-31 18:38:48 +00:00

Author	SHA1	Message	Date
Aubrey Ford	b344f34635	partners/openai: OpenAIEmbeddings not respecting chunk_size argument (#30757 ) When calling `embed_documents` and providing a `chunk_size` argument, that argument is ignored when `OpenAIEmbeddings` is instantiated with its default configuration (where `check_embedding_ctx_length=True`). `_get_len_safe_embeddings` specifies a `chunk_size` parameter but it's not being passed through in `embed_documents`, which is its only caller. This appears to be an oversight, especially given that the `_get_len_safe_embeddings` docstring states it should respect "the set embedding context length and chunk size." Developers typically expect method parameters to take effect (also, take precedence) when explicitly provided, especially when instantiating using defaults. I was confused as to why my API calls were being rejected regardless of the chunk size I provided. This bug also exists in langchain_community package. I can add that to this PR if requested otherwise I will create a new one once this passes.	2025-04-18 15:27:27 -04:00
ccurme	86d51f6be6	multiple: permit optional fields on multimodal content blocks (#30887 ) Instead of stuffing provider-specific fields in `metadata`, they can go directly on the content block.	2025-04-17 12:48:46 +00:00
ccurme	9cfe6bcacd	multiple: multi-modal content blocks (#30746 ) Introduces standard content block format for images, audio, and files. ## Examples Image from url: ``` { "type": "image", "source_type": "url", "url": "https://path.to.image.png", } ``` Image, in-line data: ``` { "type": "image", "source_type": "base64", "data": "<base64 string>", "mime_type": "image/png", } ``` PDF, in-line data: ``` { "type": "file", "source_type": "base64", "data": "<base64 string>", "mime_type": "application/pdf", } ``` File from ID: ``` { "type": "file", "source_type": "id", "id": "file-abc123", } ``` Plain-text file: ``` { "type": "file", "source_type": "text", "text": "foo bar", } ```	2025-04-15 09:48:06 -04:00
ccurme	f7c4965fb6	openai[patch]: update imports in test (#30828 ) Quick fix to unblock CI, will need to address in core separately.	2025-04-14 19:33:38 +00:00
Sydney Runkle	8c6734325b	partners[lint]: run `pyupgrade` to get code in line with 3.9 standards (#30781 ) Using `pyupgrade` to get all `partners` code up to 3.9 standards (mostly, fixing old `typing` imports).	2025-04-11 07:18:44 -04:00
ccurme	fe0fd9dd70	openai[patch]: upgrade tiktoken and fix test (#30621 ) Related to https://github.com/langchain-ai/langchain/issues/30344 https://github.com/langchain-ai/langchain/pull/30542 introduced an erroneous test for token counts for o-series models. tiktoken==0.8 does not support o-series models in `tiktoken.encoding_for_model(model_name)`, and this is the version of tiktoken we had in the lock file. So we would default to `cl100k_base` for o-series, which is the wrong encoding model. The test tested against this wrong encoding (so it passed with tiktoken 0.8). Here we update tiktoken to 0.9 in the lock file, and fix the expected counts in the test. Verified that we are pulling [o200k_base](https://github.com/openai/tiktoken/blob/main/tiktoken/model.py#L8), as expected.	2025-04-02 10:44:48 -04:00
ccurme	8a69de5c24	openai[patch]: ignore file blocks when counting tokens (#30601 ) OpenAI does not appear to document how it transforms PDF pages to images, which determines how tokens are counted: https://platform.openai.com/docs/guides/pdf-files?api-mode=chat#usage-considerations Currently these block types raise ValueError inside `get_num_tokens_from_messages`. Here we update to generate a warning and continue.	2025-04-01 15:29:33 -04:00
Koshik Debanath	e7883d5b9f	langchain-openai: Support token counting for o-series models in ChatOpenAI (#30542 ) Related to #30344 Add support for token counting for o-series models in `test_token_counts.py`. * Update `_MODELS` and `_CHAT_MODELS` dictionaries - Add "o1", "o3", and "gpt-4o" to `_MODELS` and `_CHAT_MODELS` dictionaries. * Update token counts - Add token counts for "o1", "o3", and "gpt-4o" models. --- For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/langchain-ai/langchain/pull/30542?shareId=ab208bf7-80a3-4b8d-80c4-2287486fedae).	2025-03-28 16:02:09 -04:00
omahs	6f8735592b	docs,langchain-community: Fix typos in docs and code (#30541 ) Fix typos	2025-03-28 19:21:16 +00:00
ccurme	8486e0ae80	openai[patch]: bump openai sdk (#30461 ) [New required field](https://github.com/openai/openai-python/pull/2223/files#diff-530fd17eb1cc43440c82630df0ddd9b0893cf14b04065a95e6eef6cd2f766a44R26) for `ResponseUsage` released in 1.66.5.	2025-03-24 12:10:00 -04:00
ccurme	c74e7b997d	openai[patch]: support structured output via Responses API (#30265 ) Also runs all standard tests using Responses API.	2025-03-14 15:14:23 -04:00
ccurme	cd1ea8e94d	openai[patch]: support Responses API (#30231 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	2025-03-12 12:25:46 -04:00
ccurme	6c7c8a164f	openai[patch]: add unit test (#30022 ) Test `max_completion_tokens` is propagated to payload for AzureChatOpenAI.	2025-02-27 11:09:17 -05:00
ccurme	b1a7f4e106	core, openai[patch]: support serialization of pydantic models in messages (#29940 ) Resolves https://github.com/langchain-ai/langchain/issues/29003, https://github.com/langchain-ai/langchain/issues/27264 Related: https://github.com/langchain-ai/langchain-redis/issues/52 ```python from langchain.chat_models import init_chat_model from langchain.globals import set_llm_cache from langchain_community.cache import SQLiteCache from pydantic import BaseModel cache = SQLiteCache() set_llm_cache(cache) class Temperature(BaseModel): value: int city: str llm = init_chat_model("openai:gpt-4o-mini") structured_llm = llm.with_structured_output(Temperature) ``` ```python # 681 ms response = structured_llm.invoke("What is the average temperature of Rome in May?") ``` ```python # 6.98 ms response = structured_llm.invoke("What is the average temperature of Rome in May?") ```	2025-02-24 09:34:27 -05:00
ccurme	927ec20b69	openai[patch]: update system role to developer for o-series models (#29785 ) Some o-series models will raise a 400 error for `"role": "system"` (`o1-mini` and `o1-preview` will raise, `o1` and `o3-mini` will not). Here we update `ChatOpenAI` to update the role to `"developer"` for all model names matching `^o\d`. We only make this change on the ChatOpenAI class (not BaseChatOpenAI).	2025-02-24 08:59:46 -05:00
Chaymae El Aattabi	4b08a7e8e8	Fix #29759 : Use local chunk_size_ for looping in embed_documents (#29761 ) This fix ensures that the chunk size is correctly determined when processing text embeddings. Previously, the code did not properly handle cases where chunk_size was None, potentially leading to incorrect chunking behavior. Now, chunk_size_ is explicitly set to either the provided chunk_size or the default self.chunk_size, ensuring consistent chunking. This update improves reliability when processing large text inputs in batches and prevents unintended behavior when chunk_size is not specified. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2025-02-13 01:28:26 +00:00
Bagatur	8d566a8fe7	openai[patch]: detect old models in with_structured_output (#29392 ) Co-authored-by: ccurme <chester.curme@gmail.com>	2025-01-23 20:47:32 +00:00
ccurme	6e63ccba84	openai[minor]: release 0.3 (#29100 ) ## Goal Solve the following problems with `langchain-openai`: - Structured output with `o1` [breaks out of the box](https://langchain.slack.com/archives/C050X0VTN56/p1735232400232099). - `with_structured_output` by default does not use OpenAI’s [structured output feature](https://platform.openai.com/docs/guides/structured-outputs). - We override API defaults for temperature and other parameters. ## Breaking changes: - Default method for structured output is changing to OpenAI’s dedicated [structured output feature](https://platform.openai.com/docs/guides/structured-outputs). For schemas specified via TypedDict or JSON schema, strict schema validation is disabled by default but can be enabled by specifying `strict=True`. - To recover previous default, pass `method="function_calling"` into `with_structured_output`. - Models that don’t support `method="json_schema"` (e.g., `gpt-4` and `gpt-3.5-turbo`, currently the default model for ChatOpenAI) will raise an error unless `method` is explicitly specified. - To recover previous default, pass `method="function_calling"` into `with_structured_output`. - Schemas specified via Pydantic `BaseModel` that have fields with non-null defaults or metadata (like min/max constraints) will raise an error. - To recover previous default, pass `method="function_calling"` into `with_structured_output`. - `strict` now defaults to False for `method="json_schema"` when schemas are specified via TypedDict or JSON schema. - To recover previous behavior, use `with_structured_output(schema, strict=True)` - Schemas specified via Pydantic V1 will raise a warning (and use `method="function_calling"`) unless `method` is explicitly specified. - To remove the warning, pass `method="function_calling"` into `with_structured_output`. - Streaming with default structured output method / Pydantic schema no longer generates intermediate streamed chunks. - To recover previous behavior, pass `method="function_calling"` into `with_structured_output`. - We no longer override default temperature (was 0.7 in LangChain, now will follow OpenAI, currently 1.0). - To recover previous behavior, initialize `ChatOpenAI` or `AzureChatOpenAI` with `temperature=0.7`. - Note: conceptually there is a difference between forcing a tool call and forcing a response format. Tool calls may have more concise arguments vs. generating content adhering to a schema. Prompts may need to be adjusted to recover desired behavior. --------- Co-authored-by: Jacob Lee <jacoblee93@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2025-01-10 10:50:32 -05:00
Erick Friis	187131c55c	Revert "integrations[patch]: remove non-required chat param defaults" (#29048 ) Reverts langchain-ai/langchain#26730 discuss best way to release default changes (esp openai temperature)	2025-01-06 14:45:34 -08:00
Bagatur	3d7ae8b5d2	integrations[patch]: remove non-required chat param defaults (#26730 ) anthropic: - max_retries openai: - n - temperature - max_retries fireworks - temperature groq - n - max_retries - temperature mistral - max_retries - timeout - max_concurrent_requests - temperature - top_p - safe_mode --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2025-01-06 22:26:22 +00:00
Bagatur	1378ddfa5f	openai[patch]: type reasoning_effort (#28825 )	2024-12-19 19:36:49 +00:00
Bagatur	4a531437bb	core[patch], openai[patch]: Handle OpenAI developer msg (#28794 ) - Convert developer openai messages to SystemMessage - store additional_kwargs={"__openai_role__": "developer"} so that the correct role can be reconstructed if needed - update ChatOpenAI to read in openai_role --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-18 21:54:07 +00:00
Erick Friis	0eb7ab65f1	multiple: fix xfailed signatures (#28597 )	2024-12-06 15:39:47 -08:00
ccurme	42b18824c2	openai[patch]: use max_completion_tokens in place of max_tokens (#26917 ) `max_tokens` is deprecated: https://platform.openai.com/docs/api-reference/chat/create#chat-create-max_tokens --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-11-26 16:30:19 +00:00
Erick Friis	29f8a79ebe	groq,openai,mistralai: fix unit tests (#28279 )	2024-11-22 04:54:01 +00:00
Erick Friis	0dbaf05bb7	standard-tests: rename langchain_standard_tests to langchain_tests, release 0.3.2 (#28203 )	2024-11-18 19:10:39 -08:00
Bagatur	ede953d617	openai[patch]: fix schema formatting util (#27685 )	2024-10-28 15:46:47 +00:00
Bagatur	655ced84d7	openai[patch]: accept json schema response format directly (#27623 ) fix #25460 --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-10-24 18:19:15 +00:00
Erick Friis	7d65a32ee0	openai: audio modality, remove sockets from unit tests (#27436 )	2024-10-18 08:02:09 -07:00
Bagatur	ce33c4fa40	openai[patch]: default temp=1 for o1 (#27206 )	2024-10-08 15:45:21 -07:00
Bagatur	4935a14314	core,integrations[minor]: Dont error on fields in model_kwargs (#27110 ) Given the current erroring behavior, every time we've moved a kwarg from model_kwargs and made it its own field that was a breaking change. Updating this behavior to support the old instantiations / serializations. Assuming build_extra_kwargs was not something that itself is being used externally and needs to be kept backwards compatible	2024-10-04 11:30:27 -07:00
Erick Friis	e8e5d67a8d	openai: fix None token detail (#27091 ) happens in Azure	2024-10-04 01:25:38 +00:00
Bagatur	e1e4f88b3e	openai[patch]: enable Azure structured output, parallel_tool_calls=False, tool_choice=required (#26599 ) response_format=json_schema, tool_choice=required, parallel_tool_calls are all supported for gpt-4o on Azure.	2024-09-22 22:25:22 -07:00
Erick Friis	c2a3021bb0	multiple: pydantic 2 compatibility, v0.3 (#26443 ) Signed-off-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Dan O'Donovan <dan.odonovan@gmail.com> Co-authored-by: Tom Daniel Grande <tomdgrande@gmail.com> Co-authored-by: Grande <Tom.Daniel.Grande@statsbygg.no> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: ccurme <chester.curme@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com> Co-authored-by: ZhangShenao <15201440436@163.com> Co-authored-by: Friso H. Kingma <fhkingma@gmail.com> Co-authored-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Nuno Campos <nuno@langchain.dev> Co-authored-by: Morgante Pell <morgantep@google.com>	2024-09-13 14:38:45 -07:00
liuhetian	7fc9e99e21	openai[patch]: get output_type when using with_structured_output (#26307 ) - This allows pydantic to correctly resolve annotations necessary when using openai new param `json_schema` Resolves issue: #26250 --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-09-13 11:42:01 -07:00
Harrison Chase	28ad244e77	community, openai: support nested dicts (#26414 ) needed for thinking tokens --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-09-12 21:47:47 -07:00
Bagatur	dba308447d	fmt	2024-09-04 11:28:04 -07:00
Bagatur	3ec93c2817	standard-tests[patch]: add Ser/Des test	2024-09-04 10:24:06 -07:00
Eugene Yurtsev	bc3b851f08	openai[patch]: Upgrade @root_validators in preparation for pydantic 2 migration (#25491 ) * Upgrade @root_validator in openai pkg * Ran notebooks for all but AzureAI embeddings --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-09-03 14:42:24 -07:00
Bagatur	bc3b02651c	standard-tests[patch]: test init from env vars (#25983 )	2024-09-03 19:05:39 +00:00
ccurme	2e5c379632	openai[patch]: fix get_num_tokens for function calls (#25785 ) Closes https://github.com/langchain-ai/langchain/issues/25784 See additional discussion [here](`0a4ee864e9 (r145147380)`).	2024-08-27 20:18:19 +00:00
Hyman	58e72febeb	openai:compatible with other llm usage meta data (#24500 ) - [ ] PR message: - Description: Compatible with other llm (eg: deepseek-chat, glm-4) usage meta data - Issue: N/A - Dependencies: no new dependencies added - [ ] Add tests and docs: libs/partners/openai/tests/unit_tests/chat_models/test_base.py ```shell cd libs/partners/openai poetry run pytest tests/unit_tests/chat_models/test_base.py::test_openai_astream poetry run pytest tests/unit_tests/chat_models/test_base.py::test_openai_stream poetry run pytest tests/unit_tests/chat_models/test_base.py::test_deepseek_astream poetry run pytest tests/unit_tests/chat_models/test_base.py::test_deepseek_stream poetry run pytest tests/unit_tests/chat_models/test_base.py::test_glm4_astream poetry run pytest tests/unit_tests/chat_models/test_base.py::test_glm4_stream ``` --------- Co-authored-by: hyman <hyman@xiaozancloud.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-08-23 16:59:14 -07:00
ccurme	b83f1eb0d5	core, partners: implement standard tracing params for LLMs (#25410 )	2024-08-16 13:18:09 -04:00
Bagatur	09fbce13c5	openai[patch]: ChatOpenAI.with_structured_output json_schema support (#25123 )	2024-08-07 08:09:07 -07:00
Erick Friis	2c6b9e8771	standard-tests: add override check (#24407 )	2024-07-22 23:38:01 +00:00
Bagatur	7d83189b19	openai[patch]: use model_name in AzureOpenAI.ls_model_name (#24366 )	2024-07-17 15:24:05 -07:00
Erick Friis	1e9cc02ed8	openai: raw response headers (#24150 )	2024-07-16 09:54:54 -07:00
Bagatur	5fd1e67808	core[minor], integrations...[patch]: Support ToolCall as Tool input and ToolMessage as Tool output (#24038 ) Changes: - ToolCall, InvalidToolCall and ToolCallChunk can all accept a "type" parameter now - LLM integration packages add "type" to all the above - Tool supports ToolCall inputs that have "type" specified - Tool outputs ToolMessage when a ToolCall is passed as input - Tools can separately specify ToolMessage.content and ToolMessage.raw_output - Tools emit events for validation errors (using on_tool_error and on_tool_end) Example: ```python @tool("structured_api", response_format="content_and_raw_output") def _mock_structured_tool_with_raw_output( arg1: int, arg2: bool, arg3: Optional[dict] = None ) -> Tuple[str, dict]: """A Structured Tool""" return f"{arg1} {arg2}", {"arg1": arg1, "arg2": arg2, "arg3": arg3} def test_tool_call_input_tool_message_with_raw_output() -> None: tool_call: Dict = { "name": "structured_api", "args": {"arg1": 1, "arg2": True, "arg3": {"img": "base64string..."}}, "id": "123", "type": "tool_call", } expected = ToolMessage("1 True", raw_output=tool_call["args"], tool_call_id="123") tool = _mock_structured_tool_with_raw_output actual = tool.invoke(tool_call) assert actual == expected tool_call.pop("type") with pytest.raises(ValidationError): tool.invoke(tool_call) actual_content = tool.invoke(tool_call["args"]) assert actual_content == expected.content ``` --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-07-11 14:54:02 -07:00
Bagatur	a0c2281540	infra: update mypy 1.10, ruff 0.5 (#23721 ) ```python """python scripts/update_mypy_ruff.py""" import glob import tomllib from pathlib import Path import toml import subprocess import re ROOT_DIR = Path(__file__).parents[1] def main(): for path in glob.glob(str(ROOT_DIR / "libs/*/pyproject.toml"), recursive=True): print(path) with open(path, "rb") as f: pyproject = tomllib.load(f) try: pyproject["tool"]["poetry"]["group"]["typing"]["dependencies"]["mypy"] = ( "^1.10" ) pyproject["tool"]["poetry"]["group"]["lint"]["dependencies"]["ruff"] = ( "^0.5" ) except KeyError: continue with open(path, "w") as f: toml.dump(pyproject, f) cwd = "/".join(path.split("/")[:-1]) completed = subprocess.run( "poetry lock --no-update; poetry install --with typing; poetry run mypy . --no-color", cwd=cwd, shell=True, capture_output=True, text=True, ) logs = completed.stdout.split("\n") to_ignore = {} for l in logs: if re.match("^(.*)\:(\d+)\: error:.*\[(.*)\]", l): path, line_no, error_type = re.match( "^(.*)\:(\d+)\: error:.*\[(.*)\]", l ).groups() if (path, line_no) in to_ignore: to_ignore[(path, line_no)].append(error_type) else: to_ignore[(path, line_no)] = [error_type] print(len(to_ignore)) for (error_path, line_no), error_types in to_ignore.items(): all_errors = ", ".join(error_types) full_path = f"{cwd}/{error_path}" try: with open(full_path, "r") as f: file_lines = f.readlines() except FileNotFoundError: continue file_lines[int(line_no) - 1] = ( file_lines[int(line_no) - 1][:-1] + f" # type: ignore[{all_errors}]\n" ) with open(full_path, "w") as f: f.write("".join(file_lines)) subprocess.run( "poetry run ruff format .; poetry run ruff --select I --fix .", cwd=cwd, shell=True, capture_output=True, text=True, ) if __name__ == "__main__": main() ```	2024-07-03 10:33:27 -07:00
Chip Davis	04bc5f1a95	partners[azure]: fix having openai_api_base set for other packages (#22068 ) This fix is for #21726. When having other packages installed that require the `openai_api_base` environment variable, users are not able to instantiate the AzureChatModels or AzureEmbeddings. This PR adds a new value `ignore_openai_api_base` which is a bool. When set to True, it sets `openai_api_base` to `None` Two new tests were added for the `test_azure` and a new file `test_azure_embeddings` A different approach may be better for this. If you can think of better logic, let me know and I can adjust it. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-07-01 18:35:20 +00:00
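
The role change described in commit 927ec20b69 (#29785) can be illustrated with a minimal sketch. Only the pattern `^o\d` and the role names `"system"` and `"developer"` come from the commit message; the helper `remap_system_role` and the plain-dict message shape are assumptions made for illustration and are not the code in `langchain-openai`.

```python
import re

# Pattern quoted in the commit message: model names starting with "o" plus a digit.
O_SERIES = re.compile(r"^o\d")


def remap_system_role(model_name: str, messages: list[dict]) -> list[dict]:
    """Return copies of the messages, using "developer" instead of "system"
    when the model name looks like an o-series model.

    Hypothetical helper for illustration only.
    """
    if not O_SERIES.match(model_name):
        # Non-matching models keep their roles; copy so the input is not mutated.
        return [dict(m) for m in messages]
    return [
        {**m, "role": "developer"} if m.get("role") == "system" else dict(m)
        for m in messages
    ]


messages = [
    {"role": "system", "content": "Be terse."},
    {"role": "user", "content": "Hi"},
]
print(remap_system_role("o1-mini", messages)[0]["role"])
print(remap_system_role("gpt-4o", messages)[0]["role"])
```

Note that `gpt-4o` does not match the pattern because the `^` anchor requires the name to begin with `o`.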

73 Commits
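
Commits 4b08a7e8e8 (#29761) and b344f34635 (#30757) both concern how an explicit `chunk_size` argument should override the instance default when batching texts for embedding. The sketch below shows that fallback in isolation. The names `chunk_size` and `chunk_size_` follow the commit messages; the class `BatchingSketch` and the method `plan_batches` are hypothetical, and no embedding API is called.

```python
from typing import Optional


class BatchingSketch:
    """Illustrative stand-in for an embeddings class; not OpenAIEmbeddings."""

    def __init__(self, chunk_size: int = 1000) -> None:
        self.chunk_size = chunk_size  # instance-level default batch size

    def plan_batches(
        self, texts: list[str], chunk_size: Optional[int] = None
    ) -> list[list[str]]:
        # Use the explicit argument when given, otherwise the instance default.
        chunk_size_ = chunk_size or self.chunk_size
        # Loop with the resolved local value, so the argument actually takes effect.
        return [texts[i : i + chunk_size_] for i in range(0, len(texts), chunk_size_)]


sketch = BatchingSketch(chunk_size=3)
print(sketch.plan_batches(list("abcdefg")))
print(sketch.plan_batches(list("abcdefg"), chunk_size=2))
```

The first call falls back to the default of 3 items per batch; the second uses the explicit value of 2, which is the behavior the commit messages say callers expect.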