Commit Graph

1144 Commits

Author SHA1 Message Date
Andras L Ferenczi
63673b765b Fix: Enable max_retries Parameter in ChatMistralAI Class (#30448)
**partners: Enable max_retries in ChatMistralAI**

**Description**

- This pull request reactivates the retry logic in the
completion_with_retry method of the ChatMistralAI class, restoring the
intended functionality of the previously ineffective max_retries
parameter. New unit test that mocks failed/successful retry calls and an
integration test to confirm end-to-end functionality.

**Issue**
- Closes #30362

**Dependencies**
- No additional dependencies required

Co-authored-by: andrasfe <andrasf94@gmail.com>
2025-03-27 11:53:44 -04:00
ccurme
a9b1e1b177 openai: release 0.3.11 (#30503) 2025-03-26 19:24:37 +00:00
ccurme
8119a7bc5c openai[patch]: support streaming token counts in AzureChatOpenAI (#30494)
When OpenAI originally released `stream_options` to enable token usage
during streaming, it was not supported in AzureOpenAI. It is now
supported.

Like the [OpenAI
SDK](f66d2e6fdc/src/openai/resources/completions.py (L68)),
ChatOpenAI does not return usage metadata during streaming by default
(which adds an extra chunk to the stream). The OpenAI SDK requires users
to pass `stream_options={"include_usage": True}`. ChatOpenAI implements
a convenience argument `stream_usage: Optional[bool]`, and an attribute
`stream_usage: bool = False`.

Here we extend this to AzureChatOpenAI by moving the `stream_usage`
attribute and `stream_usage` kwarg (on `_(a)stream`) from ChatOpenAI to
BaseChatOpenAI.

---

Additional consideration: we must be sensitive to the number of users
using BaseChatOpenAI to interact with other APIs that do not support the
`stream_options` parameter.

Suppose OpenAI in the future updates the default behavior to stream
token usage. Currently, BaseChatOpenAI only passes `stream_options` if
`stream_usage` is True, so there would be no way to disable this new
default behavior.

To address this, we could update the `stream_usage` attribute to
`Optional[bool] = None`, but this is technically a breaking change (as
currently values of False are not passed to the client). IMO: if / when
this change happens, we could accompany it with this update in a minor
bump.

--- 

Related previous PRs:
- https://github.com/langchain-ai/langchain/pull/22628
- https://github.com/langchain-ai/langchain/pull/22854
- https://github.com/langchain-ai/langchain/pull/23552

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-03-26 15:16:37 -04:00
ccurme
422ba4cde5 infra: handle flaky tests (#30501) 2025-03-26 13:28:56 -04:00
ccurme
299b222c53 mistral[patch]: check types in adding model_name to response_metadata (#30499) 2025-03-26 16:30:09 +00:00
ccurme
22d1a7d7b6 standard-tests[patch]: require model_name in response_metadata if returns_usage_metadata (#30497)
We are implementing a token-counting callback handler in
`langchain-core` that is intended to work with all chat models
supporting usage metadata. The callback will aggregate usage metadata by
model. This requires responses to include the model name in its
metadata.

To support this, if a model `returns_usage_metadata`, we check that it
includes a string model name in its `response_metadata` in the
`"model_name"` key.

More context: https://github.com/langchain-ai/langchain/pull/30487
2025-03-26 12:20:53 -04:00
ccurme
50ec4a1a4f openai[patch]: attempt to make test less flaky (#30463) 2025-03-24 17:36:36 +00:00
ccurme
8486e0ae80 openai[patch]: bump openai sdk (#30461)
[New required
field](https://github.com/openai/openai-python/pull/2223/files#diff-530fd17eb1cc43440c82630df0ddd9b0893cf14b04065a95e6eef6cd2f766a44R26)
for `ResponseUsage` released in 1.66.5.
2025-03-24 12:10:00 -04:00
ccurme
cbbc968903 openai: release 0.3.10 (#30460) 2025-03-24 15:37:53 +00:00
ccurme
ed5e589191 openai[patch]: support multi-turn computer use (#30410)
Here we accept ToolMessages of the form
```python
ToolMessage(
    content=<representation of screenshot> (see below),
    tool_call_id="abc123",
    additional_kwargs={"type": "computer_call_output"},
)
```
and translate them to `computer_call_output` items for the Responses
API.

We also propagate `reasoning_content` items from AIMessages.

## Example

### Load screenshots
```python
import base64

def load_png_as_base64(file_path):
    with open(file_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read())
        return encoded_string.decode('utf-8')

screenshot_1_base64 = load_png_as_base64("/path/to/screenshot/of/application.png")
screenshot_2_base64 = load_png_as_base64("/path/to/screenshot/of/desktop.png")
```

### Initial message and response
```python
from langchain_core.messages import HumanMessage, ToolMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="computer-use-preview",
    model_kwargs={"truncation": "auto"},
)

tool = {
    "type": "computer_use_preview",
    "display_width": 1024,
    "display_height": 768,
    "environment": "browser"
}
llm_with_tools = llm.bind_tools([tool])

input_message = HumanMessage(
    content=[
        {
            "type": "text",
            "text": (
                "Click the red X to close and reveal my Desktop. "
                "Proceed, no confirmation needed."
            )
        },
        {
            "type": "input_image",
            "image_url": f"data:image/png;base64,{screenshot_1_base64}",
        }
    ]
)

response = llm_with_tools.invoke(
    [input_message],
    reasoning={
        "generate_summary": "concise",
    },
)
response.additional_kwargs["tool_outputs"]
```

### Construct ToolMessage
```python
tool_call_id = response.additional_kwargs["tool_outputs"][0]["call_id"]

tool_message = ToolMessage(
    content=[
        {
            "type": "input_image",
            "image_url": f"data:image/png;base64,{screenshot_2_base64}"
        }
    ],
    #  content=f"data:image/png;base64,{screenshot_2_base64}",  # <-- also acceptable
    tool_call_id=tool_call_id,
    additional_kwargs={"type": "computer_call_output"},
)
```

### Invoke again
```python
messages = [
    input_message,
    response,
    tool_message,
]

response_2 = llm_with_tools.invoke(
    messages,
    reasoning={
        "generate_summary": "concise",
    },
)
```
2025-03-24 15:25:36 +00:00
Simon Paredes
df4448dfac langchain-groq: Add response metadata when streaming (#30379)
- **Description:** Add missing `model_name` and `system_fingerprint`
metadata when streaming.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-23 14:34:41 -04:00
ccurme
b78ae7817e openai[patch]: trace strict in structured_output_kwargs (#30425) 2025-03-21 14:37:28 -04:00
ccurme
1de7fa8f3a Revert "deepseek: temporarily bypass tests" (#30424)
Reverts langchain-ai/langchain#30423
2025-03-21 17:14:31 +00:00
ccurme
c74dfff836 deepseek: temporarily bypass tests (#30423)
Deepseek infra is not stable enough to get through integration tests.

Previous two attempts had two tests time out, they both pass locally.
2025-03-21 17:08:35 +00:00
ccurme
7147903724 deepseek: release 0.1.3 (#30422) 2025-03-21 16:39:50 +00:00
Andras L Ferenczi
b5f49df86a partner: ChatDeepSeek on openrouter not returning reasoning (#30240)
Deepseek model does not return reasoning when hosted on openrouter
(Issue [30067](https://github.com/langchain-ai/langchain/issues/30067))

the following code did not return reasoning:

```python
llm = ChatDeepSeek( model = 'deepseek/deepseek-r1:nitro', api_base="https://openrouter.ai/api/v1", api_key=os.getenv("OPENROUTER_API_KEY")) 
messages = [
    {"role": "system", "content": "You are an assistant."},
    {"role": "user", "content": "9.11 and 9.8, which is greater? Explain the reasoning behind this decision."}
]
response = llm.invoke(messages, extra_body={"include_reasoning": True})
print(response.content)
print(f"REASONING: {response.additional_kwargs.get('reasoning_content', '')}")
print(response)
```

The fix is to extract reasoning from
response.choices[0].message["model_extra"] and from
choices[0].delta["reasoning"]. and place in response additional_kwargs.
Change is really just the addition of a couple one-sentence if
statements.

---------

Co-authored-by: andrasfe <andrasf94@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-21 16:35:37 +00:00
ccurme
e8e3b2bfae ollama: release 0.3.0 (#30420) 2025-03-21 15:50:08 +00:00
Bob Merkus
5700646cc5 ollama: add reasoning model support (e.g. deepseek) (#29689)
# Description
This PR adds reasoning model support for `langchain-ollama` by
extracting reasoning token blocks, like those used in deepseek. It was
inspired by
[ollama-deep-researcher](https://github.com/langchain-ai/ollama-deep-researcher),
specifically the parsing of [thinking
blocks](6d1aaf2139/src/assistant/graph.py (L91)):
```python
  # TODO: This is a hack to remove the <think> tags w/ Deepseek models 
  # It appears very challenging to prompt them out of the responses 
  while "<think>" in running_summary and "</think>" in running_summary:
      start = running_summary.find("<think>")
      end = running_summary.find("</think>") + len("</think>")
      running_summary = running_summary[:start] + running_summary[end:]
```

This notes that it is very hard to remove the reasoning block from
prompting, but we actually want the model to reason in order to increase
model performance. This implementation extracts the thinking block, so
the client can still expect a proper message to be returned by
`ChatOllama` (and use the reasoning content separately when desired).

This implementation takes the same approach as
[ChatDeepseek](5d581ba22c/libs/partners/deepseek/langchain_deepseek/chat_models.py (L215)),
which adds the reasoning content to
chunk.additional_kwargs.reasoning_content;
```python
  if hasattr(response.choices[0].message, "reasoning_content"):  # type: ignore
      rtn.generations[0].message.additional_kwargs["reasoning_content"] = (
          response.choices[0].message.reasoning_content  # type: ignore
      )
```

This should probably be handled upstream in ollama + ollama-python, but
this seems like a reasonably effective solution. This is a standalone
example of what is happening;

```python
async def deepseek_message_astream(
    llm: BaseChatModel,
    messages: list[BaseMessage],
    config: RunnableConfig | None = None,
    *,
    model_target: str = "deepseek-r1",
    **kwargs: Any,
) -> AsyncIterator[BaseMessageChunk]:
    """Stream responses from Deepseek models, filtering out <think> tags.

    Args:
        llm: The language model to stream from
        messages: The messages to send to the model

    Yields:
        Filtered chunks from the model response
    """
    # check if the model is deepseek based
    if (llm.name and model_target not in llm.name) or (hasattr(llm, "model") and model_target not in llm.model):
        async for chunk in llm.astream(messages, config=config, **kwargs):
            yield chunk
        return

    # Yield with a buffer, upon completing the <think></think> tags, move them to the reasoning content and start over
    buffer = ""
    async for chunk in llm.astream(messages, config=config, **kwargs):
        # start or append
        if not buffer:
            buffer = chunk.content
        else:
            buffer += chunk.content if hasattr(chunk, "content") else chunk

        # Process buffer to remove <think> tags
        if "<think>" in buffer or "</think>" in buffer:
            if hasattr(chunk, "tool_calls") and chunk.tool_calls:
                raise NotImplementedError("tool calls during reasoning should be removed?")
            if "<think>" in chunk.content or "</think>" in chunk.content:
                continue
            chunk.additional_kwargs["reasoning_content"] = chunk.content
            chunk.content = ""
        # upon block completion, reset the buffer
        if "<think>" in buffer and "</think>" in buffer:
            buffer = ""
        yield chunk

```

# Issue
Integrating reasoning models (e.g. deepseek-r1) into existing LangChain
based workflows is hard due to the thinking blocks that are included in
the message contents. To avoid this, we could match the `ChatOllama`
integration with `ChatDeepseek` to return the reasoning content inside
`message.additional_arguments.reasoning_content` instead.

# Dependenices
None

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-21 15:44:54 +00:00
ccurme
d8145dda95 xai: release 0.2.2 (#30403) 2025-03-20 20:25:16 +00:00
ccurme
e194902994 mistral: release 0.2.9 (#30402) 2025-03-20 20:22:24 +00:00
ccurme
49466ec9ca groq: release 0.3.1 (#30401) 2025-03-20 20:19:49 +00:00
ccurme
db1e340387 fireworks: release 0.2.8 (#30400) 2025-03-20 16:15:51 -04:00
ccurme
de3960d285 multiple: enforce standards on tool_choice (#30372)
- Test if models support forcing tool calls via `tool_choice`. If they
do, they should support
  - `"any"` to specify any tool
  - the tool name as a string to force calling a particular tool
- Add `tool_choice` to signature of `BaseChatModel.bind_tools` in core
- Deprecate `tool_choice_value` in standard tests in favor of a boolean
`has_tool_choice`

Will follow up with PRs in external repos (tested in AWS and Google
already).
2025-03-20 17:48:59 +00:00
ccurme
b86cd8270c multiple: support strict and method in with_structured_output (#30385) 2025-03-20 13:17:07 -04:00
Mohammad Mohtashim
1103bdfaf1 (Ollama) Fix String Value parsing in _parse_arguments_from_tool_call (#30154)
- **Description:** Fix String Value parsing in
_parse_arguments_from_tool_call
- **Issue:** #30145

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-19 21:47:18 -04:00
ccurme
aae8306d6c groq: release 0.3.0 (#30374) 2025-03-19 15:23:30 +00:00
Ashwin
83cfb9691f Fix typo: change 'ben' to 'be' in comment (#30358)
**Description:**  
This PR fixes a minor typo in the comments within
`libs/partners/openai/langchain_openai/chat_models/base.py`. The word
"ben" has been corrected to "be" for clarity and professionalism.

**Issue:**  
N/A

**Dependencies:**  
None
2025-03-19 10:35:35 -04:00
Lance Martin
46d6bf0330 ollama[minor]: update default method for structured output (#30273)
From function calling to Ollama's [dedicated structured output
feature](https://ollama.com/blog/structured-outputs).

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-18 12:44:22 -04:00
ccurme
b91daf06eb groq[minor]: remove default model (#30341)
The default model for `ChatGroq`, `"mixtral-8x7b-32768"`, is being
retired on March 20, 2025. Here we remove the default, such that model
names must be explicitly specified (being explicit is a good practice
here, and avoids the need for breaking changes down the line). This
change will be released in a minor version bump to 0.3.

This follows https://github.com/langchain-ai/langchain/pull/30161
(released in version 0.2.5), where we began generating warnings to this
effect.

![Screenshot 2025-03-18 at 10 33
27 AM](https://github.com/user-attachments/assets/f1e4b302-c62a-43b0-aa86-eaf9271e86cb)
2025-03-18 10:50:34 -04:00
ccurme
5684653775 openai[patch]: release 0.3.9 (#30325) 2025-03-17 16:08:41 +00:00
ccurme
eb9b992aa6 openai[patch]: support additional Responses API features (#30322)
- Include response headers
- Max tokens
- Reasoning effort
- Fix bug with structured output / strict
- Fix bug with simultaneous tool calling + structured output
2025-03-17 12:02:21 -04:00
ccurme
c74e7b997d openai[patch]: support structured output via Responses API (#30265)
Also runs all standard tests using Responses API.
2025-03-14 15:14:23 -04:00
Stavros Kontopoulos
ac22cde130 langchain_ollama: Support keep_alive in embeddings (#30251)
- Description: Adds support for keep_alive in Ollama Embeddings see
https://github.com/ollama/ollama/issues/6401.
Builds on top of of
https://github.com/langchain-ai/langchain/pull/29296. I have this use
case where I want to keep the embeddings model in cpu forever.
- Dependencies: no deps are being introduced.
- Issue: haven't created an issue yet.
2025-03-14 14:56:50 -04:00
ccurme
d5d0134e7b anthropic: release 0.3.10 (#30287) 2025-03-14 16:23:21 +00:00
ccurme
226f29bc96 anthropic: support built-in tools, improve docs (#30274)
- Support features from recent update:
https://www.anthropic.com/news/token-saving-updates (mostly adding
support for built-in tools in `bind_tools`
- Add documentation around prompt caching, token-efficient tool use, and
built-in tools.
2025-03-14 16:18:50 +00:00
ccurme
bbd4b36d76 mistralai[patch]: bump core (#30278) 2025-03-13 23:04:36 +00:00
ccurme
733abcc884 mistral: release 0.2.8 (#30275) 2025-03-13 21:54:34 +00:00
ccurme
cd1ea8e94d openai[patch]: support Responses API (#30231)
Co-authored-by: Bagatur <baskaryan@gmail.com>
2025-03-12 12:25:46 -04:00
ccurme
62c570dd77 standard-tests, openai: bump core (#30202) 2025-03-10 19:22:24 +00:00
ccurme
f896e701eb deepseek: install local langchain-tests in test deps (#30198) 2025-03-10 16:58:17 +00:00
ccurme
b209d46eb3 mistral[patch]: set global ssl context (#30189) 2025-03-09 21:27:41 +00:00
ccurme
17507c9ba6 groq[patch]: release 0.2.5 (#30168) 2025-03-07 20:25:51 +00:00
ccurme
74e7772a5f groq[patch]: warn if model is not specified (#30161)
Groq is retiring `mixtral-8x7b-32768`, which is currently the default
model for ChatGroq, on March 20. Here we emit a warning if the model is
not specified explicitly.

A version 0.3.0 will be released ahead of March 20 that removes the
default altogether.
2025-03-07 15:21:13 -05:00
ccurme
34638ccfae openai[patch]: release 0.3.8 (#30164) 2025-03-07 18:26:40 +00:00
ccurme
806211475a core[patch]: update structured output tracing (#30123)
- Trace JSON schema in `options`
- Rename to `ls_structured_output_format`
2025-03-07 13:05:25 -05:00
ccurme
230876a7c5 anthropic[patch]: add PDF input example to API reference (#30156) 2025-03-07 14:19:08 +00:00
ccurme
52b0570bec core, openai, standard-tests: improve OpenAI compatibility with Anthropic content blocks (#30128)
- Support thinking blocks in core's `convert_to_openai_messages` (pass
through instead of error)
- Ignore thinking blocks in ChatOpenAI (instead of error)
- Support Anthropic-style image blocks in ChatOpenAI

---

Standard integration tests include a `supports_anthropic_inputs`
property which is currently enabled only for tests on `ChatAnthropic`.
This test enforces compatibility with message histories of the form:
```
- system message
- human message
- AI message with tool calls specified only through `tool_use` content blocks
- human message containing `tool_result` and an additional `text` block
```
It additionally checks support for Anthropic-style image inputs if
`supports_image_inputs` is enabled.

Here we change this test, such that if you enable
`supports_anthropic_inputs`:
- You support AI messages with text and `tool_use` content blocks
- You support Anthropic-style image inputs (if `supports_image_inputs`
is enabled)
- You support thinking content blocks.

That is, we add a test case for thinking content blocks, but we also
remove the requirement of handling tool results within HumanMessages
(motivated by existing agent abstractions, which should all return
ToolMessage). We move that requirement to a ChatAnthropic-specific test.
2025-03-06 09:53:14 -05:00
ccurme
ba5ddb218f anthropic[patch]: release 0.3.9 (#30103) 2025-03-04 10:53:55 -05:00
Samuel Dion-Girardeau
ccb64e9f4f docs: Fix typo in code samples for max_tokens_for_prompt (#30088)
- **Description:** Fix typo in code samples for max_tokens_for_prompt.
Code blocks had singular "token" but the method has plural "tokens".
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** N/A
2025-03-04 09:11:21 -05:00
ccurme
3b066dc005 anthropic[patch]: allow structured output when thinking is enabled (#30047)
Structured output will currently always raise a BadRequestError when
Claude 3.7 Sonnet's `thinking` is enabled, because we rely on forced
tool use for structured output and this feature is not supported when
`thinking` is enabled.

Here we:
- Emit a warning if `with_structured_output` is called when `thinking`
is enabled.
- Raise `OutputParserException` if no tool calls are generated.

This is arguably preferable to raising an error in all cases.

```python
from langchain_anthropic import ChatAnthropic
from pydantic import BaseModel


class Person(BaseModel):
    name: str
    age: int


llm = ChatAnthropic(
    model="claude-3-7-sonnet-latest",
    max_tokens=5_000,
    thinking={"type": "enabled", "budget_tokens": 2_000},
)
structured_llm = llm.with_structured_output(Person)  # <-- this generates a warning
```

```python
structured_llm.invoke("Alice is 30.")  # <-- works
```

```python
structured_llm.invoke("Hello!")  # <-- raises OutputParserException
```
2025-02-28 14:44:11 -05:00