Fixes#33986.
Summary:
- Normalize scheme-less `base_url` values (e.g., `ollama:11434`) by
defaulting to `http://` when the input resembles `host:port`.
- Preserve and merge `Authorization` headers when `userinfo` credentials
are present, both for sync and async clients.
- Add unit tests covering scheme-less host:port and scheme-less userinfo
credentials.
Implementation details:
- Update `parse_url_with_auth` to accept scheme-less endpoints,
producing a cleaned URL with explicit scheme and extracted auth headers.
- No changes required in `OllamaLLM`, `ChatOllama`, or
`OllamaEmbeddings`—they already consume the cleaned URL and headers.
Why:
- Previously, scheme-less inputs caused `parse_url_with_auth` to return
`(None, None)`, leading Ollama clients to fall back to defaults and
ignore the provided `base_url`.
Tests:
- Extended `libs/partners/ollama/tests/unit_tests/test_auth.py` to cover
the new cases.
Notes:
- Default scheme chosen is `http` to match common Ollama local
deployments. Users can still explicitly provide `https://` when
appropriate.
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Closes#34207
---
Expose log probabilities from the Ollama Python SDK through
`ChatOllama`. The ollama client already returns a `logprobs` field on
chat responses for supported models, but `ChatOllama` had no way to
request or surface it.
## Changes
- Add `logprobs` and `top_logprobs` fields to `ChatOllama`, forwarded to
the client via `_build_chat_params`. Setting `top_logprobs` without
`logprobs=True` auto-enables it with a warning; setting it with
`logprobs=False` raises a `ValueError`
- Surface per-token logprobs on intermediate streaming chunks (both sync
`_create_chat_stream` and async `_create_async_chat_stream`) via
`response_metadata["logprobs"]`, accumulated into the final response on
`invoke()`
- Bump minimum `ollama` SDK from `>=0.6.0` to `>=0.6.1` — the version
that added logprobs support
---------
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
## Summary
- When `self._client` is `None` in `_create_chat_stream()`, the method
silently produces an empty generator instead of failing.
- The error only surfaces later as a misleading `"No data received from
Ollama stream"` ValueError, making it difficult to diagnose the actual
root cause (uninitialized client).
- Changed to raise `RuntimeError` immediately with a clear message when
the sync client is not initialized.
## Why this matters
Users who hit this path see a confusing error message that points them
in the wrong direction. An explicit error at the point of failure makes
debugging straightforward.
## Test plan
- [x] Added `test_create_chat_stream_raises_when_client_none`
- [x] Existing tests still pass
> This PR was authored with the help of AI tools.
---------
Co-authored-by: Mason Daugherty <github@mdrxy.com>
The Ollama chat model adapter does not support all of the possible
message content formats. That leads to Ollama model adapter crashing on
some messages from different models (e.g. Gemini 2.5 Flash).
These changes should fix one known scenario - when `content` is a list
containing a string.
Ensures proper reStructuredText formatting by adding the required blank
line before closing docstring quotes, which resolves the "Block quote
ends without a blank line; unexpected unindent" warning.
## Problem
When using `ChatOllama` with `create_react_agent`, agents would
sometimes terminate prematurely with empty responses when Ollama
returned `done_reason: 'load'` responses with no content. This caused
agents to return empty `AIMessage` objects instead of actual generated
text.
```python
from langchain_ollama import ChatOllama
from langgraph.prebuilt import create_react_agent
from langchain_core.messages import HumanMessage
llm = ChatOllama(model='qwen2.5:7b', temperature=0)
agent = create_react_agent(model=llm, tools=[])
result = agent.invoke(HumanMessage('Hello'), {"configurable": {"thread_id": "1"}})
# Before fix: AIMessage(content='', response_metadata={'done_reason': 'load'})
# Expected: AIMessage with actual generated content
```
## Root Cause
The `_iterate_over_stream` and `_aiterate_over_stream` methods treated
any response with `done: True` as final, regardless of `done_reason`.
When Ollama returns `done_reason: 'load'` with empty content, it
indicates the model was loaded but no actual generation occurred - this
should not be considered a complete response.
## Solution
Modified the streaming logic to skip responses when:
- `done: True`
- `done_reason: 'load'`
- Content is empty or contains only whitespace
This ensures agents only receive actual generated content while
preserving backward compatibility for load responses that do contain
content.
## Changes
- **`_iterate_over_stream`**: Skip empty load responses instead of
yielding them
- **`_aiterate_over_stream`**: Apply same fix to async streaming
- **Tests**: Added comprehensive test cases covering all edge cases
## Testing
All scenarios now work correctly:
- ✅ Empty load responses are skipped (fixes original issue)
- ✅ Load responses with actual content are preserved (backward
compatibility)
- ✅ Normal stop responses work unchanged
- ✅ Streaming behavior preserved
- ✅ `create_react_agent` integration fixed
Fixes#31482.
<!-- START COPILOT CODING AGENT TIPS -->
---
💡 You can make Copilot smarter by setting up custom instructions,
customizing its development environment and configuring Model Context
Protocol (MCP) servers. Learn more [Copilot coding agent
tips](https://gh.io/copilot-coding-agent-tips) in the docs.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
**Description:**
This PR makes argument parsing for Ollama tool calls more robust. Some
LLMs—including Ollama—may return arguments as Python-style dictionaries
with single quotes (e.g., `{'a': 1}`), which are not valid JSON and
previously caused parsing to fail.
The updated `_parse_json_string` method in
`langchain_ollama.chat_models` now attempts standard JSON parsing and,
if that fails, falls back to `ast.literal_eval` for safe evaluation of
Python-style dictionaries. This improves interoperability with LLMs and
fixes a common usability issue for tool-based agents.
**Issue:**
Closes#30910
**Dependencies:**
None
**Tests:**
- Added new unit tests for double-quoted JSON, single-quoted dicts,
mixed quoting, and malformed/failure cases.
- All tests pass locally, including new coverage for single-quoted
inputs.
**Notes:**
- No breaking changes.
- No new dependencies introduced.
- Code is formatted and linted (`ruff format`, `ruff check`).
- If maintainers have suggestions for further improvements, I’m happy to
revise!
Thank you for maintaining LangChain! Looking forward to your feedback.
The `num_gpu` parameter in `OllamaEmbeddings` was not being passed to
the Ollama client in the async embedding method, causing GPU
acceleration settings to be ignored when using async operations.
## Problem
The issue was in the `aembed_documents` method where the `options`
parameter (containing `num_gpu` and other configuration) was missing:
```python
# Sync method (working correctly)
return self._client.embed(
self.model, texts, options=self._default_params, keep_alive=self.keep_alive
)["embeddings"]
# Async method (missing options parameter)
return (
await self._async_client.embed(
self.model, texts, keep_alive=self.keep_alive # ❌ No options!
)
)["embeddings"]
```
This meant that when users specified `num_gpu=4` (or any other GPU
configuration), it would work with sync calls but be ignored with async
calls.
## Solution
Added the missing `options=self._default_params` parameter to the async
embed call to match the sync version:
```python
# Fixed async method
return (
await self._async_client.embed(
self.model,
texts,
options=self._default_params, # ✅ Now includes num_gpu!
keep_alive=self.keep_alive,
)
)["embeddings"]
```
## Validation
- ✅ Added unit test to verify options are correctly passed in both sync
and async methods
- ✅ All existing tests continue to pass
- ✅ Manual testing confirms `num_gpu` parameter now works correctly
- ✅ Code passes linting and formatting checks
The fix ensures that GPU configuration works consistently across both
synchronous and asynchronous embedding operations.
Fixes#32059.
<!-- START COPILOT CODING AGENT TIPS -->
---
💡 You can make Copilot smarter by setting up custom instructions,
customizing its development environment and configuring Model Context
Protocol (MCP) servers. Learn more [Copilot coding agent
tips](https://gh.io/copilot-coding-agent-tips) in the docs.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
* update model validation due to change in [Ollama
client](https://github.com/ollama/ollama) - ensure you are running the
latest version (0.9.6) to use `validate_model_on_init`
* add code example and fix formatting for ChatOllama reasoning
* ensure that setting `reasoning` in invocation kwargs overrides
class-level setting
* tests
* New `reasoning` (bool) param to support toggling [Ollama
thinking](https://ollama.com/blog/thinking) (#31573, #31700). If
`reasoning=True`, Ollama's `thinking` content will be placed in the
model responses' `additional_kwargs.reasoning_content`.
* Supported by:
* ChatOllama (class level, invocation level TODO)
* OllamaLLM (TODO)
* Added tests to ensure streaming tool calls is successful (#29129)
* Refactored tests that relied on `extract_reasoning()`
* Myriad docs additions and consistency/typo fixes
* Improved type safety in some spots
Closes#29129
Addresses #31573 and #31700
Supersedes #31701
* Ensure access to local model during `ChatOllama` instantiation
(#27720). This adds a new param `validate_model_on_init` (default:
`true`)
* Catch a few more errors from the Ollama client to assist users
- There was some ambiguous wording that has been updated to hopefully
clarify the functionality of `reasoning_format` in ChatGroq.
- Added support for `reasoning_effort`
- Added links to see models capable of `reasoning_format` and
`reasoning_effort`
- Other minor nits
**Description**:
Add a `async_client_kwargs` field to ollama chat/llm/embeddings adapters
that is passed to async httpx client constructor.
**Motivation:**
In my use-case:
- chat/embedding model adapters may be created frequently, sometimes to
be called just once or to never be called at all
- they may be used in bots sunc and async mode (not known at the moment
they are created)
So, I want to keep a static transport instance maintaining connection
pool, so model adapters can be created and destroyed freely. But that
doesn't work when both sync and async functions are in use as I can only
pass one transport instance for both sync and async client, while
transport types must be different for them. So I can't make both sync
and async calls use shared transport with current model adapter
interfaces.
In this PR I add a separate `async_client_kwargs` that gets passed to
async client constructor, so it will be possible to pass a separate
transport instance. For sake of backwards compatibility, it is merged
with `client_kwargs`, so nothing changes when it is not set.
I am unable to run linter right now, but the changes look ok.