Deepseek model does not return reasoning when hosted on openrouter
(Issue [30067](https://github.com/langchain-ai/langchain/issues/30067))
the following code did not return reasoning:
```python
llm = ChatDeepSeek( model = 'deepseek/deepseek-r1:nitro', api_base="https://openrouter.ai/api/v1", api_key=os.getenv("OPENROUTER_API_KEY"))
messages = [
{"role": "system", "content": "You are an assistant."},
{"role": "user", "content": "9.11 and 9.8, which is greater? Explain the reasoning behind this decision."}
]
response = llm.invoke(messages, extra_body={"include_reasoning": True})
print(response.content)
print(f"REASONING: {response.additional_kwargs.get('reasoning_content', '')}")
print(response)
```
The fix is to extract reasoning from
response.choices[0].message["model_extra"] and from
choices[0].delta["reasoning"]. and place in response additional_kwargs.
Change is really just the addition of a couple one-sentence if
statements.
---------
Co-authored-by: andrasfe <andrasf94@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
# Description
This PR adds reasoning model support for `langchain-ollama` by
extracting reasoning token blocks, like those used in deepseek. It was
inspired by
[ollama-deep-researcher](https://github.com/langchain-ai/ollama-deep-researcher),
specifically the parsing of [thinking
blocks](6d1aaf2139/src/assistant/graph.py (L91)):
```python
# TODO: This is a hack to remove the <think> tags w/ Deepseek models
# It appears very challenging to prompt them out of the responses
while "<think>" in running_summary and "</think>" in running_summary:
start = running_summary.find("<think>")
end = running_summary.find("</think>") + len("</think>")
running_summary = running_summary[:start] + running_summary[end:]
```
This notes that it is very hard to remove the reasoning block from
prompting, but we actually want the model to reason in order to increase
model performance. This implementation extracts the thinking block, so
the client can still expect a proper message to be returned by
`ChatOllama` (and use the reasoning content separately when desired).
This implementation takes the same approach as
[ChatDeepseek](5d581ba22c/libs/partners/deepseek/langchain_deepseek/chat_models.py (L215)),
which adds the reasoning content to
chunk.additional_kwargs.reasoning_content;
```python
if hasattr(response.choices[0].message, "reasoning_content"): # type: ignore
rtn.generations[0].message.additional_kwargs["reasoning_content"] = (
response.choices[0].message.reasoning_content # type: ignore
)
```
This should probably be handled upstream in ollama + ollama-python, but
this seems like a reasonably effective solution. This is a standalone
example of what is happening;
```python
async def deepseek_message_astream(
llm: BaseChatModel,
messages: list[BaseMessage],
config: RunnableConfig | None = None,
*,
model_target: str = "deepseek-r1",
**kwargs: Any,
) -> AsyncIterator[BaseMessageChunk]:
"""Stream responses from Deepseek models, filtering out <think> tags.
Args:
llm: The language model to stream from
messages: The messages to send to the model
Yields:
Filtered chunks from the model response
"""
# check if the model is deepseek based
if (llm.name and model_target not in llm.name) or (hasattr(llm, "model") and model_target not in llm.model):
async for chunk in llm.astream(messages, config=config, **kwargs):
yield chunk
return
# Yield with a buffer, upon completing the <think></think> tags, move them to the reasoning content and start over
buffer = ""
async for chunk in llm.astream(messages, config=config, **kwargs):
# start or append
if not buffer:
buffer = chunk.content
else:
buffer += chunk.content if hasattr(chunk, "content") else chunk
# Process buffer to remove <think> tags
if "<think>" in buffer or "</think>" in buffer:
if hasattr(chunk, "tool_calls") and chunk.tool_calls:
raise NotImplementedError("tool calls during reasoning should be removed?")
if "<think>" in chunk.content or "</think>" in chunk.content:
continue
chunk.additional_kwargs["reasoning_content"] = chunk.content
chunk.content = ""
# upon block completion, reset the buffer
if "<think>" in buffer and "</think>" in buffer:
buffer = ""
yield chunk
```
# Issue
Integrating reasoning models (e.g. deepseek-r1) into existing LangChain
based workflows is hard due to the thinking blocks that are included in
the message contents. To avoid this, we could match the `ChatOllama`
integration with `ChatDeepseek` to return the reasoning content inside
`message.additional_arguments.reasoning_content` instead.
# Dependenices
None
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- Test if models support forcing tool calls via `tool_choice`. If they
do, they should support
- `"any"` to specify any tool
- the tool name as a string to force calling a particular tool
- Add `tool_choice` to signature of `BaseChatModel.bind_tools` in core
- Deprecate `tool_choice_value` in standard tests in favor of a boolean
`has_tool_choice`
Will follow up with PRs in external repos (tested in AWS and Google
already).
**Description:**
This PR fixes a minor typo in the comments within
`libs/partners/openai/langchain_openai/chat_models/base.py`. The word
"ben" has been corrected to "be" for clarity and professionalism.
**Issue:**
N/A
**Dependencies:**
None
The default model for `ChatGroq`, `"mixtral-8x7b-32768"`, is being
retired on March 20, 2025. Here we remove the default, such that model
names must be explicitly specified (being explicit is a good practice
here, and avoids the need for breaking changes down the line). This
change will be released in a minor version bump to 0.3.
This follows https://github.com/langchain-ai/langchain/pull/30161
(released in version 0.2.5), where we began generating warnings to this
effect.

- Support features from recent update:
https://www.anthropic.com/news/token-saving-updates (mostly adding
support for built-in tools in `bind_tools`
- Add documentation around prompt caching, token-efficient tool use, and
built-in tools.
Groq is retiring `mixtral-8x7b-32768`, which is currently the default
model for ChatGroq, on March 20. Here we emit a warning if the model is
not specified explicitly.
A version 0.3.0 will be released ahead of March 20 that removes the
default altogether.
- Support thinking blocks in core's `convert_to_openai_messages` (pass
through instead of error)
- Ignore thinking blocks in ChatOpenAI (instead of error)
- Support Anthropic-style image blocks in ChatOpenAI
---
Standard integration tests include a `supports_anthropic_inputs`
property which is currently enabled only for tests on `ChatAnthropic`.
This test enforces compatibility with message histories of the form:
```
- system message
- human message
- AI message with tool calls specified only through `tool_use` content blocks
- human message containing `tool_result` and an additional `text` block
```
It additionally checks support for Anthropic-style image inputs if
`supports_image_inputs` is enabled.
Here we change this test, such that if you enable
`supports_anthropic_inputs`:
- You support AI messages with text and `tool_use` content blocks
- You support Anthropic-style image inputs (if `supports_image_inputs`
is enabled)
- You support thinking content blocks.
That is, we add a test case for thinking content blocks, but we also
remove the requirement of handling tool results within HumanMessages
(motivated by existing agent abstractions, which should all return
ToolMessage). We move that requirement to a ChatAnthropic-specific test.
- **Description:** Fix typo in code samples for max_tokens_for_prompt.
Code blocks had singular "token" but the method has plural "tokens".
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** N/A
Structured output will currently always raise a BadRequestError when
Claude 3.7 Sonnet's `thinking` is enabled, because we rely on forced
tool use for structured output and this feature is not supported when
`thinking` is enabled.
Here we:
- Emit a warning if `with_structured_output` is called when `thinking`
is enabled.
- Raise `OutputParserException` if no tool calls are generated.
This is arguably preferable to raising an error in all cases.
```python
from langchain_anthropic import ChatAnthropic
from pydantic import BaseModel
class Person(BaseModel):
name: str
age: int
llm = ChatAnthropic(
model="claude-3-7-sonnet-latest",
max_tokens=5_000,
thinking={"type": "enabled", "budget_tokens": 2_000},
)
structured_llm = llm.with_structured_output(Person) # <-- this generates a warning
```
```python
structured_llm.invoke("Alice is 30.") # <-- works
```
```python
structured_llm.invoke("Hello!") # <-- raises OutputParserException
```
Took a "census" of models supported by init_chat_model-- of those that
return model names in response metadata, these were the only two that
had it keyed under `"model"` instead of `"model_name"`.
Resolves https://github.com/langchain-ai/langchain/issues/29003,
https://github.com/langchain-ai/langchain/issues/27264
Related: https://github.com/langchain-ai/langchain-redis/issues/52
```python
from langchain.chat_models import init_chat_model
from langchain.globals import set_llm_cache
from langchain_community.cache import SQLiteCache
from pydantic import BaseModel
cache = SQLiteCache()
set_llm_cache(cache)
class Temperature(BaseModel):
value: int
city: str
llm = init_chat_model("openai:gpt-4o-mini")
structured_llm = llm.with_structured_output(Temperature)
```
```python
# 681 ms
response = structured_llm.invoke("What is the average temperature of Rome in May?")
```
```python
# 6.98 ms
response = structured_llm.invoke("What is the average temperature of Rome in May?")
```