openai[patch]: support built-in code interpreter and remote MCP tools (#31304)
This commit is contained in:
parent 1b5ffe4107
commit 053a1246da
@@ -915,6 +915,175 @@
     "response_2.text()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "34ad0015-688c-4274-be55-93268b44f558",
+   "metadata": {},
+   "source": [
+    "#### Code interpreter\n",
+    "\n",
+    "OpenAI implements a [code interpreter](https://platform.openai.com/docs/guides/tools-code-interpreter) tool to support the sandboxed generation and execution of code.\n",
+    "\n",
+    "Example use:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "34826aae-6d48-4b84-bc00-89594a87d461",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_openai import ChatOpenAI\n",
+    "\n",
+    "llm = ChatOpenAI(model=\"o4-mini\", use_responses_api=True)\n",
+    "\n",
+    "llm_with_tools = llm.bind_tools(\n",
+    "    [\n",
+    "        {\n",
+    "            \"type\": \"code_interpreter\",\n",
+    "            # Create a new container\n",
+    "            \"container\": {\"type\": \"auto\"},\n",
+    "        }\n",
+    "    ]\n",
+    ")\n",
+    "response = llm_with_tools.invoke(\n",
+    "    \"Write and run code to answer the question: what is 3^3?\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1b4d92b9-941f-4d54-93a5-b0c73afd66b2",
+   "metadata": {},
+   "source": [
+    "Note that the above command created a new container. We can also specify an existing container ID:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "d8c82895-5011-4062-a1bb-278ec91321e9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "tool_outputs = response.additional_kwargs[\"tool_outputs\"]\n",
+    "assert len(tool_outputs) == 1\n",
+    "# highlight-next-line\n",
+    "container_id = tool_outputs[0][\"container_id\"]\n",
+    "\n",
+    "llm_with_tools = llm.bind_tools(\n",
+    "    [\n",
+    "        {\n",
+    "            \"type\": \"code_interpreter\",\n",
+    "            # Use an existing container\n",
+    "            # highlight-next-line\n",
+    "            \"container\": container_id,\n",
+    "        }\n",
+    "    ]\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8db30501-522c-4915-963d-d60539b5c16e",
+   "metadata": {},
+   "source": [
+    "#### Remote MCP\n",
+    "\n",
+    "OpenAI implements a [remote MCP](https://platform.openai.com/docs/guides/tools-remote-mcp) tool that allows for model-generated calls to MCP servers.\n",
+    "\n",
+    "Example use:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "7044a87b-8b99-49e8-8ca4-e2a8ae49f65a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_openai import ChatOpenAI\n",
+    "\n",
+    "llm = ChatOpenAI(model=\"o4-mini\", use_responses_api=True)\n",
+    "\n",
+    "llm_with_tools = llm.bind_tools(\n",
+    "    [\n",
+    "        {\n",
+    "            \"type\": \"mcp\",\n",
+    "            \"server_label\": \"deepwiki\",\n",
+    "            \"server_url\": \"https://mcp.deepwiki.com/mcp\",\n",
+    "            \"require_approval\": \"never\",\n",
+    "        }\n",
+    "    ]\n",
+    ")\n",
+    "response = llm_with_tools.invoke(\n",
+    "    \"What transport protocols does the 2025-03-26 version of the MCP \"\n",
+    "    \"spec (modelcontextprotocol/modelcontextprotocol) support?\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0ed7494e-425d-4bdf-ab83-3164757031dd",
+   "metadata": {},
+   "source": [
+    "<details>\n",
+    "<summary>MCP Approvals</summary>\n",
+    "\n",
+    "OpenAI will at times request approval before sharing data with a remote MCP server.\n",
+    "\n",
+    "In the above command, we instructed the model to never require approval. We can also configure the model to always request approval, or to always request approval for specific tools:\n",
+    "\n",
+    "```python\n",
+    "llm_with_tools = llm.bind_tools(\n",
+    "    [\n",
+    "        {\n",
+    "            \"type\": \"mcp\",\n",
+    "            \"server_label\": \"deepwiki\",\n",
+    "            \"server_url\": \"https://mcp.deepwiki.com/mcp\",\n",
+    "            \"require_approval\": {\n",
+    "                \"always\": {\n",
+    "                    \"tool_names\": [\"read_wiki_structure\"]\n",
+    "                }\n",
+    "            }\n",
+    "        }\n",
+    "    ]\n",
+    ")\n",
+    "response = llm_with_tools.invoke(\n",
+    "    \"What transport protocols does the 2025-03-26 version of the MCP \"\n",
+    "    \"spec (modelcontextprotocol/modelcontextprotocol) support?\"\n",
+    ")\n",
+    "```\n",
+    "\n",
+    "Responses may then include blocks with type `\"mcp_approval_request\"`.\n",
+    "\n",
+    "To submit approvals for an approval request, structure it into a content block in an input message:\n",
+    "\n",
+    "```python\n",
+    "approval_message = {\n",
+    "    \"role\": \"user\",\n",
+    "    \"content\": [\n",
+    "        {\n",
+    "            \"type\": \"mcp_approval_response\",\n",
+    "            \"approve\": True,\n",
+    "            \"approval_request_id\": output[\"id\"],\n",
+    "        }\n",
+    "        for output in response.additional_kwargs[\"tool_outputs\"]\n",
+    "        if output[\"type\"] == \"mcp_approval_request\"\n",
+    "    ]\n",
+    "}\n",
+    "\n",
+    "next_response = llm_with_tools.invoke(\n",
+    "    [approval_message],\n",
+    "    # continue existing thread\n",
+    "    previous_response_id=response.response_metadata[\"id\"]\n",
+    ")\n",
+    "```\n",
+    "\n",
+    "</details>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "6fda05f0-4b81-4709-9407-f316d760ad50",
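Note: the notebook cells above use `invoke`; the same tool binding also works with streaming. A minimal sketch under the same assumptions (o4-mini via the Responses API), mirroring the integration test added later in this commit rather than additional notebook content:

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="o4-mini", use_responses_api=True)
llm_with_tools = llm.bind_tools(
    [{"type": "code_interpreter", "container": {"type": "auto"}}]
)

# Aggregate streamed chunks; code_interpreter_call items surface in
# additional_kwargs["tool_outputs"] on the aggregated message.
full = None
for chunk in llm_with_tools.stream(
    "Write and run code to answer the question: what is 3^3?"
):
    full = chunk if full is None else full + chunk

print([output["type"] for output in full.additional_kwargs["tool_outputs"]])
```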
@@ -554,9 +554,19 @@ def convert_to_openai_tool(
         Return OpenAI Responses API-style tools unchanged. This includes
         any dict with "type" in "file_search", "function", "computer_use_preview",
         "web_search_preview".
+
+    .. versionchanged:: 0.3.61
+
+        Added support for OpenAI's built-in code interpreter and remote MCP tools.
     """
     if isinstance(tool, dict):
-        if tool.get("type") in ("function", "file_search", "computer_use_preview"):
+        if tool.get("type") in (
+            "function",
+            "file_search",
+            "computer_use_preview",
+            "code_interpreter",
+            "mcp",
+        ):
             return tool
         # As of 03.12.25 can be "web_search_preview" or "web_search_preview_2025_03_11"
         if (tool.get("type") or "").startswith("web_search_preview"):
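To illustrate the pass-through behavior added here (a minimal sketch, assuming a langchain-core version that includes this change, e.g. 0.3.61+):

```python
from langchain_core.utils.function_calling import convert_to_openai_tool

# Built-in Responses API tool specs are returned unchanged rather than being
# coerced into an OpenAI function schema.
code_interpreter = {"type": "code_interpreter", "container": {"type": "auto"}}
assert convert_to_openai_tool(code_interpreter) == code_interpreter

mcp_tool = {
    "type": "mcp",
    "server_label": "deepwiki",
    "server_url": "https://mcp.deepwiki.com/mcp",
    "require_approval": "never",
}
assert convert_to_openai_tool(mcp_tool) == mcp_tool
```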
@@ -775,16 +775,22 @@ class BaseChatOpenAI(BaseChatModel):
 
         with context_manager as response:
             is_first_chunk = True
+            has_reasoning = False
             for chunk in response:
                 metadata = headers if is_first_chunk else {}
                 if generation_chunk := _convert_responses_chunk_to_generation_chunk(
-                    chunk, schema=original_schema_obj, metadata=metadata
+                    chunk,
+                    schema=original_schema_obj,
+                    metadata=metadata,
+                    has_reasoning=has_reasoning,
                 ):
                     if run_manager:
                         run_manager.on_llm_new_token(
                             generation_chunk.text, chunk=generation_chunk
                         )
                     is_first_chunk = False
+                    if "reasoning" in generation_chunk.message.additional_kwargs:
+                        has_reasoning = True
                     yield generation_chunk
 
     async def _astream_responses(
@@ -811,16 +817,22 @@ class BaseChatOpenAI(BaseChatModel):
 
         async with context_manager as response:
             is_first_chunk = True
+            has_reasoning = False
            async for chunk in response:
                 metadata = headers if is_first_chunk else {}
                 if generation_chunk := _convert_responses_chunk_to_generation_chunk(
-                    chunk, schema=original_schema_obj, metadata=metadata
+                    chunk,
+                    schema=original_schema_obj,
+                    metadata=metadata,
+                    has_reasoning=has_reasoning,
                 ):
                     if run_manager:
                         await run_manager.on_llm_new_token(
                             generation_chunk.text, chunk=generation_chunk
                         )
                     is_first_chunk = False
+                    if "reasoning" in generation_chunk.message.additional_kwargs:
+                        has_reasoning = True
                     yield generation_chunk
 
     def _should_stream_usage(
@@ -1176,12 +1188,22 @@ class BaseChatOpenAI(BaseChatModel):
         self, stop: Optional[list[str]] = None, **kwargs: Any
     ) -> dict[str, Any]:
         """Get the parameters used to invoke the model."""
-        return {
+        params = {
             "model": self.model_name,
             **super()._get_invocation_params(stop=stop),
             **self._default_params,
             **kwargs,
         }
+        # Redact headers from built-in remote MCP tool invocations
+        if (tools := params.get("tools")) and isinstance(tools, list):
+            params["tools"] = [
+                ({**tool, "headers": "**REDACTED**"} if "headers" in tool else tool)
+                if isinstance(tool, dict) and tool.get("type") == "mcp"
+                else tool
+                for tool in tools
+            ]
+
+        return params
 
     def _get_ls_params(
         self, stop: Optional[list[str]] = None, **kwargs: Any
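The redaction above only affects the invocation parameters that get traced; the unit test added later in this commit checks that the real request payload still carries the original headers. A standalone sketch of the same comprehension, using a hypothetical tools list:

```python
tools = [
    {
        "type": "mcp",
        "server_label": "deepwiki",
        "server_url": "https://mcp.deepwiki.com/mcp",
        "headers": {"Authorization": "Bearer PLACEHOLDER"},
    },
    {"type": "code_interpreter", "container": {"type": "auto"}},
]

# Only MCP tool dicts that carry "headers" are rewritten; everything else
# passes through untouched.
redacted = [
    ({**tool, "headers": "**REDACTED**"} if "headers" in tool else tool)
    if isinstance(tool, dict) and tool.get("type") == "mcp"
    else tool
    for tool in tools
]

assert redacted[0]["headers"] == "**REDACTED**"
assert redacted[1] == tools[1]
```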
@@ -1456,6 +1478,8 @@ class BaseChatOpenAI(BaseChatModel):
                 "file_search",
                 "web_search_preview",
                 "computer_use_preview",
+                "code_interpreter",
+                "mcp",
             ):
                 tool_choice = {"type": tool_choice}
             # 'any' is not natively supported by OpenAI API.
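With `"code_interpreter"` and `"mcp"` added to this tuple, those bare strings are accepted as `tool_choice` and expanded to `{"type": ...}` before being sent. A hedged usage sketch (whether the upstream API honors forcing a given built-in tool is up to OpenAI; this only shows the client-side expansion):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="o4-mini", use_responses_api=True)

# tool_choice="code_interpreter" is forwarded as {"type": "code_interpreter"}.
llm_forced = llm.bind_tools(
    [{"type": "code_interpreter", "container": {"type": "auto"}}],
    tool_choice="code_interpreter",
)
response = llm_forced.invoke("Use Python to compute 3^3.")
```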
@@ -3150,12 +3174,22 @@ def _construct_responses_api_input(messages: Sequence[BaseMessage]) -> list:
                 ):
                     function_call["id"] = _id
                 function_calls.append(function_call)
-            # Computer calls
+            # Built-in tool calls
             computer_calls = []
+            code_interpreter_calls = []
+            mcp_calls = []
             tool_outputs = lc_msg.additional_kwargs.get("tool_outputs", [])
             for tool_output in tool_outputs:
                 if tool_output.get("type") == "computer_call":
                     computer_calls.append(tool_output)
+                elif tool_output.get("type") == "code_interpreter_call":
+                    code_interpreter_calls.append(tool_output)
+                elif tool_output.get("type") == "mcp_call":
+                    mcp_calls.append(tool_output)
+                else:
+                    pass
+            input_.extend(code_interpreter_calls)
+            input_.extend(mcp_calls)
             msg["content"] = msg.get("content") or []
             if lc_msg.additional_kwargs.get("refusal"):
                 if isinstance(msg["content"], str):
@@ -3196,6 +3230,7 @@ def _construct_responses_api_input(messages: Sequence[BaseMessage]) -> list:
         elif msg["role"] in ("user", "system", "developer"):
             if isinstance(msg["content"], list):
                 new_blocks = []
+                non_message_item_types = ("mcp_approval_response",)
                 for block in msg["content"]:
                     # chat api: {"type": "text", "text": "..."}
                     # responses api: {"type": "input_text", "text": "..."}
@@ -3216,9 +3251,14 @@ def _construct_responses_api_input(messages: Sequence[BaseMessage]) -> list:
                         new_blocks.append(new_block)
                     elif block["type"] in ("input_text", "input_image", "input_file"):
                         new_blocks.append(block)
+                    elif block["type"] in non_message_item_types:
+                        input_.append(block)
                     else:
                         pass
                 msg["content"] = new_blocks
+                if msg["content"]:
+                    input_.append(msg)
+                else:
                     input_.append(msg)
             else:
                 input_.append(msg)
@@ -3366,7 +3406,10 @@ def _construct_lc_result_from_responses_api(
 
 
 def _convert_responses_chunk_to_generation_chunk(
-    chunk: Any, schema: Optional[type[_BM]] = None, metadata: Optional[dict] = None
+    chunk: Any,
+    schema: Optional[type[_BM]] = None,
+    metadata: Optional[dict] = None,
+    has_reasoning: bool = False,
 ) -> Optional[ChatGenerationChunk]:
     content = []
     tool_call_chunks: list = []
@@ -3429,6 +3472,10 @@
         "web_search_call",
         "file_search_call",
         "computer_call",
+        "code_interpreter_call",
+        "mcp_call",
+        "mcp_list_tools",
+        "mcp_approval_request",
     ):
         additional_kwargs["tool_outputs"] = [
             chunk.item.model_dump(exclude_none=True, mode="json")
@@ -3444,6 +3491,8 @@
     elif chunk.type == "response.refusal.done":
         additional_kwargs["refusal"] = chunk.refusal
     elif chunk.type == "response.output_item.added" and chunk.item.type == "reasoning":
+        if not has_reasoning:
+            # Hack until breaking release: store first reasoning item ID.
             additional_kwargs["reasoning"] = chunk.item.model_dump(
                 exclude_none=True, mode="json"
             )
@@ -11,6 +11,7 @@ from langchain_core.messages import (
     AIMessageChunk,
     BaseMessage,
     BaseMessageChunk,
+    HumanMessage,
 )
 from pydantic import BaseModel
 from typing_extensions import TypedDict
@@ -377,3 +378,73 @@ def test_stream_reasoning_summary() -> None:
     message_2 = {"role": "user", "content": "Thank you."}
     response_2 = llm.invoke([message_1, response_1, message_2])
     assert isinstance(response_2, AIMessage)
+
+
+# TODO: VCR some of these
+def test_code_interpreter() -> None:
+    llm = ChatOpenAI(model="o4-mini", use_responses_api=True)
+    llm_with_tools = llm.bind_tools(
+        [{"type": "code_interpreter", "container": {"type": "auto"}}]
+    )
+    response = llm_with_tools.invoke(
+        "Write and run code to answer the question: what is 3^3?"
+    )
+    _check_response(response)
+    tool_outputs = response.additional_kwargs["tool_outputs"]
+    assert tool_outputs
+    assert any(output["type"] == "code_interpreter_call" for output in tool_outputs)
+
+    # Test streaming
+    # Use same container
+    tool_outputs = response.additional_kwargs["tool_outputs"]
+    assert len(tool_outputs) == 1
+    container_id = tool_outputs[0]["container_id"]
+    llm_with_tools = llm.bind_tools(
+        [{"type": "code_interpreter", "container": container_id}]
+    )
+
+    full: Optional[BaseMessageChunk] = None
+    for chunk in llm_with_tools.stream(
+        "Write and run code to answer the question: what is 3^3?"
+    ):
+        assert isinstance(chunk, AIMessageChunk)
+        full = chunk if full is None else full + chunk
+    assert isinstance(full, AIMessageChunk)
+    tool_outputs = full.additional_kwargs["tool_outputs"]
+    assert tool_outputs
+    assert any(output["type"] == "code_interpreter_call" for output in tool_outputs)
+
+
+def test_mcp_builtin() -> None:
+    pytest.skip()  # TODO: set up VCR
+    llm = ChatOpenAI(model="o4-mini", use_responses_api=True)
+
+    llm_with_tools = llm.bind_tools(
+        [
+            {
+                "type": "mcp",
+                "server_label": "deepwiki",
+                "server_url": "https://mcp.deepwiki.com/mcp",
+                "require_approval": {"always": {"tool_names": ["read_wiki_structure"]}},
+            }
+        ]
+    )
+    response = llm_with_tools.invoke(
+        "What transport protocols does the 2025-03-26 version of the MCP spec "
+        "(modelcontextprotocol/modelcontextprotocol) support?"
+    )
+
+    approval_message = HumanMessage(
+        [
+            {
+                "type": "mcp_approval_response",
+                "approve": True,
+                "approval_request_id": output["id"],
+            }
+            for output in response.additional_kwargs["tool_outputs"]
+            if output["type"] == "mcp_approval_request"
+        ]
+    )
+    _ = llm_with_tools.invoke(
+        [approval_message], previous_response_id=response.response_metadata["id"]
+    )
@@ -21,6 +21,8 @@ from langchain_core.messages import (
 from langchain_core.messages.ai import UsageMetadata
 from langchain_core.outputs import ChatGeneration, ChatResult
 from langchain_core.runnables import RunnableLambda
+from langchain_core.tracers.base import BaseTracer
+from langchain_core.tracers.schemas import Run
 from openai.types.responses import ResponseOutputMessage
 from openai.types.responses.response import IncompleteDetails, Response, ResponseUsage
 from openai.types.responses.response_error import ResponseError
@@ -1849,3 +1851,77 @@ def test_service_tier() -> None:
     llm = ChatOpenAI(model="o4-mini", service_tier="flex")
     payload = llm._get_request_payload([HumanMessage("Hello")])
     assert payload["service_tier"] == "flex"
+
+
+class FakeTracer(BaseTracer):
+    def __init__(self) -> None:
+        super().__init__()
+        self.chat_model_start_inputs: list = []
+
+    def _persist_run(self, run: Run) -> None:
+        """Persist a run."""
+        pass
+
+    def on_chat_model_start(self, *args: Any, **kwargs: Any) -> Run:
+        self.chat_model_start_inputs.append({"args": args, "kwargs": kwargs})
+        return super().on_chat_model_start(*args, **kwargs)
+
+
+def test_mcp_tracing() -> None:
+    # Test we exclude sensitive information from traces
+    llm = ChatOpenAI(model="o4-mini", use_responses_api=True)
+
+    tracer = FakeTracer()
+    mock_client = MagicMock()
+
+    def mock_create(*args: Any, **kwargs: Any) -> Response:
+        return Response(
+            id="resp_123",
+            created_at=1234567890,
+            model="o4-mini",
+            object="response",
+            parallel_tool_calls=True,
+            tools=[],
+            tool_choice="auto",
+            output=[
+                ResponseOutputMessage(
+                    type="message",
+                    id="msg_123",
+                    content=[
+                        ResponseOutputText(
+                            type="output_text", text="Test response", annotations=[]
+                        )
+                    ],
+                    role="assistant",
+                    status="completed",
+                )
+            ],
+        )
+
+    mock_client.responses.create = mock_create
+    input_message = HumanMessage("Test query")
+    tools = [
+        {
+            "type": "mcp",
+            "server_label": "deepwiki",
+            "server_url": "https://mcp.deepwiki.com/mcp",
+            "require_approval": "always",
+            "headers": {"Authorization": "Bearer PLACEHOLDER"},
+        }
+    ]
+    with patch.object(llm, "root_client", mock_client):
+        llm_with_tools = llm.bind_tools(tools)
+        _ = llm_with_tools.invoke([input_message], config={"callbacks": [tracer]})
+
+    # Test headers are not traced
+    assert len(tracer.chat_model_start_inputs) == 1
+    invocation_params = tracer.chat_model_start_inputs[0]["kwargs"]["invocation_params"]
+    for tool in invocation_params["tools"]:
+        if "headers" in tool:
+            assert tool["headers"] == "**REDACTED**"
+    for substring in ["Authorization", "Bearer", "PLACEHOLDER"]:
+        assert substring not in str(tracer.chat_model_start_inputs)
+
+    # Test headers are correctly propagated to request
+    payload = llm_with_tools._get_request_payload([input_message], tools=tools)  # type: ignore[attr-defined]
+    assert payload["tools"][0]["headers"]["Authorization"] == "Bearer PLACEHOLDER"
@@ -462,7 +462,7 @@ wheels = [
 
 [[package]]
 name = "langchain-core"
-version = "0.3.59"
+version = "0.3.60"
 source = { editable = "../../core" }
 dependencies = [
     { name = "jsonpatch" },
@@ -827,7 +827,7 @@ wheels = [
 
 [[package]]
 name = "openai"
-version = "1.68.2"
+version = "1.81.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "anyio" },
@@ -839,9 +839,9 @@ dependencies = [
     { name = "tqdm" },
     { name = "typing-extensions" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/3f/6b/6b002d5d38794645437ae3ddb42083059d556558493408d39a0fcea608bc/openai-1.68.2.tar.gz", hash = "sha256:b720f0a95a1dbe1429c0d9bb62096a0d98057bcda82516f6e8af10284bdd5b19", size = 413429 }
+sdist = { url = "https://files.pythonhosted.org/packages/1c/89/a1e4f3fa7ca4f7fec90dbf47d93b7cd5ff65924926733af15044e302a192/openai-1.81.0.tar.gz", hash = "sha256:349567a8607e0bcffd28e02f96b5c2397d0d25d06732d90ab3ecbf97abf030f9", size = 456861 }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/fd/34/cebce15f64eb4a3d609a83ac3568d43005cc9a1cba9d7fde5590fd415423/openai-1.68.2-py3-none-any.whl", hash = "sha256:24484cb5c9a33b58576fdc5acf0e5f92603024a4e39d0b99793dfa1eb14c2b36", size = 606073 },
+    { url = "https://files.pythonhosted.org/packages/02/66/bcc7f9bf48e8610a33e3b5c96a5a644dad032d92404ea2a5e8b43ba067e8/openai-1.81.0-py3-none-any.whl", hash = "sha256:1c71572e22b43876c5d7d65ade0b7b516bb527c3d44ae94111267a09125f7bae", size = 717529 },
 ]
 
 [[package]]