mirror of https://github.com/hwchase17/langchain.git synced 2026-06-09 10:17:00 +00:00

Files

ccurme ed5e589191 openai[patch]: support multi-turn computer use (#30410 )

Here we accept ToolMessages of the form
```python
ToolMessage(
    content=<representation of screenshot> (see below),
    tool_call_id="abc123",
    additional_kwargs={"type": "computer_call_output"},
)
```
and translate them to `computer_call_output` items for the Responses
API.

We also propagate `reasoning_content` items from AIMessages.

## Example

### Load screenshots
```python
import base64

def load_png_as_base64(file_path):
    with open(file_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read())
        return encoded_string.decode('utf-8')

screenshot_1_base64 = load_png_as_base64("/path/to/screenshot/of/application.png")
screenshot_2_base64 = load_png_as_base64("/path/to/screenshot/of/desktop.png")
```

### Initial message and response
```python
from langchain_core.messages import HumanMessage, ToolMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="computer-use-preview",
    model_kwargs={"truncation": "auto"},
)

tool = {
    "type": "computer_use_preview",
    "display_width": 1024,
    "display_height": 768,
    "environment": "browser"
}
llm_with_tools = llm.bind_tools([tool])

input_message = HumanMessage(
    content=[
        {
            "type": "text",
            "text": (
                "Click the red X to close and reveal my Desktop. "
                "Proceed, no confirmation needed."
            )
        },
        {
            "type": "input_image",
            "image_url": f"data:image/png;base64,{screenshot_1_base64}",
        }
    ]
)

response = llm_with_tools.invoke(
    [input_message],
    reasoning={
        "generate_summary": "concise",
    },
)
response.additional_kwargs["tool_outputs"]
```

### Construct ToolMessage
```python
tool_call_id = response.additional_kwargs["tool_outputs"][0]["call_id"]

tool_message = ToolMessage(
    content=[
        {
            "type": "input_image",
            "image_url": f"data:image/png;base64,{screenshot_2_base64}"
        }
    ],
    #  content=f"data:image/png;base64,{screenshot_2_base64}",  # <-- also acceptable
    tool_call_id=tool_call_id,
    additional_kwargs={"type": "computer_call_output"},
)
```

### Invoke again
```python
messages = [
    input_message,
    response,
    tool_message,
]

response_2 = llm_with_tools.invoke(
    messages,
    reasoning={
        "generate_summary": "concise",
    },
)
```

2025-03-24 15:25:36 +00:00

langchain_openai

openai[patch]: support multi-turn computer use (#30410 )

2025-03-24 15:25:36 +00:00

scripts

multiple: pydantic 2 compatibility, v0.3 (#26443 )

2024-09-13 14:38:45 -07:00

tests

openai[patch]: support multi-turn computer use (#30410 )

2025-03-24 15:25:36 +00:00

.gitignore

openai: audio modality, remove sockets from unit tests (#27436 )

2024-10-18 08:02:09 -07:00

LICENSE

…

Makefile

infra: add UV_FROZEN to makefiles (#29642 )

2025-02-06 14:36:54 -05:00

pyproject.toml

openai[patch]: release 0.3.9 (#30325 )

2025-03-17 16:08:41 +00:00

README.md

docs: Update openai README.md (#29146 )

2025-01-10 17:24:16 -08:00

uv.lock

openai[patch]: release 0.3.9 (#30325 )

2025-03-17 16:08:41 +00:00

README.md

langchain-openai

This package contains the LangChain integrations for OpenAI through their openai SDK.

Installation and Setup

Install the LangChain partner package

pip install langchain-openai

Get an OpenAI api key and set it as an environment variable (OPENAI_API_KEY)

Chat model

See a usage example.

from langchain_openai import ChatOpenAI

If you are using a model hosted on Azure, you should use different wrapper for that:

from langchain_openai import AzureChatOpenAI

For a more detailed walkthrough of the Azure wrapper, see here

Text Embedding Model

See a usage example

from langchain_openai import OpenAIEmbeddings

If you are using a model hosted on Azure, you should use different wrapper for that:

from langchain_openai import AzureOpenAIEmbeddings

For a more detailed walkthrough of the Azure wrapper, see here

LLM (Legacy)

LLM refers to the legacy text-completion models that preceded chat models. See a usage example.

from langchain_openai import OpenAI

If you are using a model hosted on Azure, you should use different wrapper for that:

from langchain_openai import AzureOpenAI

For a more detailed walkthrough of the Azure wrapper, see here