Fixes#37480
---
Avoid prepending a leading newline when converting multimodal message
text content in `ChatOllama`. Previously, the first text segment in
list-format message content was always prefixed with `\n`, which could
break formatting-sensitive vision/OCR models.
Also adds a regression test to ensure multimodal message content does
not start with an unintended leading newline.
Verified by running:
- make format
- make lint
- make test
Fixes#34610
---
This PR resolves an issue where `ChatOllama` would raise an `unexpected
keyword argument 'response_format'` error when used with `create_agent`
or when passed an OpenAI-style `response_format`.
When using `create_agent` (especially with models like `gpt-oss`),
LangChain creates a `response_format` argument (e.g., `{"type":
"json_schema", ...}`). `ChatOllama` previously passed this argument
directly to the underlying Ollama client, which does not support
`response_format` and instead expects a `format` parameter.
## The Fix
I updated `_chat_params` in
`libs/partners/ollama/langchain_ollama/chat_models.py` to:
1. Intercept the `response_format` argument.
2. Map it to the native Ollama `format` parameter:
* `{"type": "json_schema", "json_schema": {"schema": ...}}` ->
`format=schema`
* `{"type": "json_object"}` -> `format="json"`
3. Remove `response_format` from the kwargs passed to the client.
## Validation
* **Reproduction Script**: Verified the fix with a script covering
`json_schema`, `json_object`, and explicit `format` priority scenarios.
* **New Tests**: Added 3 new unit tests to
`libs/partners/ollama/tests/unit_tests/test_chat_models.py` covering
these scenarios.
* **Regression**: Ran the full test suite (`make -C libs/partners/ollama
test`), passing 29 tests (previously 26).
* **Lint/Format**: Verified with `make lint_package` and `make format`.
---------
Co-authored-by: Mohan Kumar Sagadevan <mohankumarsagadevan@Mohans-MacBook-Air.local>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Closes#36177.
---
Ollama's deserialization path already captures `"thinking"` content as
`additional_kwargs["reasoning_content"]` on `AIMessage`, but the reverse
direction — serializing back to the Ollama wire format — was missing.
This means multi-turn conversations with reasoning models like
`deepseek-r1` would silently drop the chain-of-thought, breaking agents
that need prior reasoning preserved across turns.
Fixes#34623
Add `dimensions` field to `OllamaEmbeddings` to allow users to specify
output embedding size for models that support variable dimensions . The
field is passed
directly to the Ollama client's `embed()` call for both sync and async
methods.
**How I verified it works:**
- Ran unit tests: `python -m pytest tests/unit_tests/ -v`
- Ran integration tests against a live Ollama instance:
`OLLAMA_HOST=http://ollama:11434 python -m pytest
tests/integration_tests/ -v`
- Confirmed that passing `dimensions=768` no longer raises
`extra_forbidden`
Pydantic validation error and returns embeddings of the expected size.
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Fixes#33986.
Summary:
- Normalize scheme-less `base_url` values (e.g., `ollama:11434`) by
defaulting to `http://` when the input resembles `host:port`.
- Preserve and merge `Authorization` headers when `userinfo` credentials
are present, both for sync and async clients.
- Add unit tests covering scheme-less host:port and scheme-less userinfo
credentials.
Implementation details:
- Update `parse_url_with_auth` to accept scheme-less endpoints,
producing a cleaned URL with explicit scheme and extracted auth headers.
- No changes required in `OllamaLLM`, `ChatOllama`, or
`OllamaEmbeddings`—they already consume the cleaned URL and headers.
Why:
- Previously, scheme-less inputs caused `parse_url_with_auth` to return
`(None, None)`, leading Ollama clients to fall back to defaults and
ignore the provided `base_url`.
Tests:
- Extended `libs/partners/ollama/tests/unit_tests/test_auth.py` to cover
the new cases.
Notes:
- Default scheme chosen is `http` to match common Ollama local
deployments. Users can still explicitly provide `https://` when
appropriate.
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Closes#34207
---
Expose log probabilities from the Ollama Python SDK through
`ChatOllama`. The ollama client already returns a `logprobs` field on
chat responses for supported models, but `ChatOllama` had no way to
request or surface it.
## Changes
- Add `logprobs` and `top_logprobs` fields to `ChatOllama`, forwarded to
the client via `_build_chat_params`. Setting `top_logprobs` without
`logprobs=True` auto-enables it with a warning; setting it with
`logprobs=False` raises a `ValueError`
- Surface per-token logprobs on intermediate streaming chunks (both sync
`_create_chat_stream` and async `_create_async_chat_stream`) via
`response_metadata["logprobs"]`, accumulated into the final response on
`invoke()`
- Bump minimum `ollama` SDK from `>=0.6.0` to `>=0.6.1` — the version
that added logprobs support
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
## Summary
- When `self._client` is `None` in `_create_chat_stream()`, the method
silently produces an empty generator instead of failing.
- The error only surfaces later as a misleading `"No data received from
Ollama stream"` ValueError, making it difficult to diagnose the actual
root cause (uninitialized client).
- Changed to raise `RuntimeError` immediately with a clear message when
the sync client is not initialized.
## Why this matters
Users who hit this path see a confusing error message that points them
in the wrong direction. An explicit error at the point of failure makes
debugging straightforward.
## Test plan
- [x] Added `test_create_chat_stream_raises_when_client_none`
- [x] Existing tests still pass
> This PR was authored with the help of AI tools.
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
## Problem
When using `ChatOllama` with `create_react_agent`, agents would
sometimes terminate prematurely with empty responses when Ollama
returned `done_reason: 'load'` responses with no content. This caused
agents to return empty `AIMessage` objects instead of actual generated
text.
```python
from langchain_ollama import ChatOllama
from langgraph.prebuilt import create_react_agent
from langchain_core.messages import HumanMessage
llm = ChatOllama(model='qwen2.5:7b', temperature=0)
agent = create_react_agent(model=llm, tools=[])
result = agent.invoke(HumanMessage('Hello'), {"configurable": {"thread_id": "1"}})
# Before fix: AIMessage(content='', response_metadata={'done_reason': 'load'})
# Expected: AIMessage with actual generated content
```
## Root Cause
The `_iterate_over_stream` and `_aiterate_over_stream` methods treated
any response with `done: True` as final, regardless of `done_reason`.
When Ollama returns `done_reason: 'load'` with empty content, it
indicates the model was loaded but no actual generation occurred - this
should not be considered a complete response.
## Solution
Modified the streaming logic to skip responses when:
- `done: True`
- `done_reason: 'load'`
- Content is empty or contains only whitespace
This ensures agents only receive actual generated content while
preserving backward compatibility for load responses that do contain
content.
## Changes
- **`_iterate_over_stream`**: Skip empty load responses instead of
yielding them
- **`_aiterate_over_stream`**: Apply same fix to async streaming
- **Tests**: Added comprehensive test cases covering all edge cases
## Testing
All scenarios now work correctly:
- ✅ Empty load responses are skipped (fixes original issue)
- ✅ Load responses with actual content are preserved (backward
compatibility)
- ✅ Normal stop responses work unchanged
- ✅ Streaming behavior preserved
- ✅ `create_react_agent` integration fixed
Fixes#31482.
<!-- START COPILOT CODING AGENT TIPS -->
---
💡 You can make Copilot smarter by setting up custom instructions,
customizing its development environment and configuring Model Context
Protocol (MCP) servers. Learn more [Copilot coding agent
tips](https://gh.io/copilot-coding-agent-tips) in the docs.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
**Description:**
This PR makes argument parsing for Ollama tool calls more robust. Some
LLMs—including Ollama—may return arguments as Python-style dictionaries
with single quotes (e.g., `{'a': 1}`), which are not valid JSON and
previously caused parsing to fail.
The updated `_parse_json_string` method in
`langchain_ollama.chat_models` now attempts standard JSON parsing and,
if that fails, falls back to `ast.literal_eval` for safe evaluation of
Python-style dictionaries. This improves interoperability with LLMs and
fixes a common usability issue for tool-based agents.
**Issue:**
Closes#30910
**Dependencies:**
None
**Tests:**
- Added new unit tests for double-quoted JSON, single-quoted dicts,
mixed quoting, and malformed/failure cases.
- All tests pass locally, including new coverage for single-quoted
inputs.
**Notes:**
- No breaking changes.
- No new dependencies introduced.
- Code is formatted and linted (`ruff format`, `ruff check`).
- If maintainers have suggestions for further improvements, I’m happy to
revise!
Thank you for maintaining LangChain! Looking forward to your feedback.
The `num_gpu` parameter in `OllamaEmbeddings` was not being passed to
the Ollama client in the async embedding method, causing GPU
acceleration settings to be ignored when using async operations.
## Problem
The issue was in the `aembed_documents` method where the `options`
parameter (containing `num_gpu` and other configuration) was missing:
```python
# Sync method (working correctly)
return self._client.embed(
self.model, texts, options=self._default_params, keep_alive=self.keep_alive
)["embeddings"]
# Async method (missing options parameter)
return (
await self._async_client.embed(
self.model, texts, keep_alive=self.keep_alive # ❌ No options!
)
)["embeddings"]
```
This meant that when users specified `num_gpu=4` (or any other GPU
configuration), it would work with sync calls but be ignored with async
calls.
## Solution
Added the missing `options=self._default_params` parameter to the async
embed call to match the sync version:
```python
# Fixed async method
return (
await self._async_client.embed(
self.model,
texts,
options=self._default_params, # ✅ Now includes num_gpu!
keep_alive=self.keep_alive,
)
)["embeddings"]
```
## Validation
- ✅ Added unit test to verify options are correctly passed in both sync
and async methods
- ✅ All existing tests continue to pass
- ✅ Manual testing confirms `num_gpu` parameter now works correctly
- ✅ Code passes linting and formatting checks
The fix ensures that GPU configuration works consistently across both
synchronous and asynchronous embedding operations.
Fixes#32059.
<!-- START COPILOT CODING AGENT TIPS -->
---
💡 You can make Copilot smarter by setting up custom instructions,
customizing its development environment and configuring Model Context
Protocol (MCP) servers. Learn more [Copilot coding agent
tips](https://gh.io/copilot-coding-agent-tips) in the docs.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
* update model validation due to change in [Ollama
client](https://github.com/ollama/ollama) - ensure you are running the
latest version (0.9.6) to use `validate_model_on_init`
* add code example and fix formatting for ChatOllama reasoning
* ensure that setting `reasoning` in invocation kwargs overrides
class-level setting
* tests
* New `reasoning` (bool) param to support toggling [Ollama
thinking](https://ollama.com/blog/thinking) (#31573, #31700). If
`reasoning=True`, Ollama's `thinking` content will be placed in the
model responses' `additional_kwargs.reasoning_content`.
* Supported by:
* ChatOllama (class level, invocation level TODO)
* OllamaLLM (TODO)
* Added tests to ensure streaming tool calls is successful (#29129)
* Refactored tests that relied on `extract_reasoning()`
* Myriad docs additions and consistency/typo fixes
* Improved type safety in some spots
Closes#29129
Addresses #31573 and #31700
Supersedes #31701
* Ensure access to local model during `ChatOllama` instantiation
(#27720). This adds a new param `validate_model_on_init` (default:
`true`)
* Catch a few more errors from the Ollama client to assist users