29 Commits

Author SHA1 Message Date
Mason Daugherty
281488a5cf Merge branch 'master' into wip-v0.4 2025-08-11 15:10:42 -04:00
Mason Daugherty
ee4c2510eb feat: port various nit changes from wip-v0.4 (#32506)
Lots of work that wasn't directly related to core
improvements, messages, or testing functionality
2025-08-11 15:09:08 -04:00
ccurme
45a067509f fix(core): fix tracing for PDFs in v1 messages (#32434) 2025-08-11 12:18:32 -04:00
Mason Daugherty
13d67cf37e fix(ollama): reasoning should come before text content (#32476) 2025-08-08 19:34:36 -04:00
Mason Daugherty
c1b86cc929 feat: minor core work, v1 standard tests & (most of) v1 ollama (#32315)
Resolves #32215

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Nuno Campos <nuno@langchain.dev>
2025-08-06 18:22:02 -04:00
Copilot
d40fd5a3ce feat(ollama): warn on empty load responses (#32161)
## Problem

When using `ChatOllama` with `create_react_agent`, agents would
sometimes terminate prematurely with empty responses when Ollama
returned `done_reason: 'load'` responses with no content. This caused
agents to return empty `AIMessage` objects instead of actual generated
text.

```python
from langchain_ollama import ChatOllama
from langgraph.prebuilt import create_react_agent
from langchain_core.messages import HumanMessage

llm = ChatOllama(model='qwen2.5:7b', temperature=0)
agent = create_react_agent(model=llm, tools=[])

result = agent.invoke({"messages": [HumanMessage("Hello")]}, {"configurable": {"thread_id": "1"}})
# Before fix: AIMessage(content='', response_metadata={'done_reason': 'load'})
# Expected: AIMessage with actual generated content
```

## Root Cause

The `_iterate_over_stream` and `_aiterate_over_stream` methods treated
any response with `done: True` as final, regardless of `done_reason`.
When Ollama returns `done_reason: 'load'` with empty content, it
indicates the model was loaded but no actual generation occurred - this
should not be considered a complete response.

## Solution

Modified the streaming logic to skip responses when:
- `done: True`
- `done_reason: 'load'` 
- Content is empty or contains only whitespace

This ensures agents only receive actual generated content while
preserving backward compatibility for load responses that do contain
content.
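
In sketch form, the skip condition amounts to something like the
following (the helper name is illustrative and the response shape is
modeled on Ollama's chat API, not copied from the actual diff):

```python
def _is_empty_load_response(resp: dict) -> bool:
    """A `done_reason: 'load'` response with empty or whitespace-only
    content signals a model load event, not actual generation."""
    content = resp.get("message", {}).get("content", "")
    return (
        resp.get("done") is True
        and resp.get("done_reason") == "load"
        and not content.strip()
    )


# Responses matching this predicate are skipped instead of yielded:
assert _is_empty_load_response(
    {"done": True, "done_reason": "load", "message": {"content": ""}}
)
assert not _is_empty_load_response(
    {"done": True, "done_reason": "stop", "message": {"content": "Hi!"}}
)
```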

## Changes

- **`_iterate_over_stream`**: Skip empty load responses instead of
yielding them
- **`_aiterate_over_stream`**: Apply same fix to async streaming
- **Tests**: Added comprehensive test cases covering all edge cases

## Testing

All scenarios now work correctly:
- Empty load responses are skipped (fixes original issue)
- Load responses with actual content are preserved (backward
compatibility)
- Normal stop responses work unchanged
- Streaming behavior preserved
- `create_react_agent` integration fixed

Fixes #31482.


---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-07-22 13:21:11 -04:00
diego-coder
8e4396bb32 fix(ollama): robustly parse single-quoted JSON in tool calls (#32109)
**Description:**
This PR makes argument parsing for Ollama tool calls more robust. Some
LLMs, including models served via Ollama, may return tool arguments as
Python-style dictionaries with single quotes (e.g., `{'a': 1}`), which
are not valid JSON and previously caused parsing to fail.
The updated `_parse_json_string` method in
`langchain_ollama.chat_models` now attempts standard JSON parsing and,
if that fails, falls back to `ast.literal_eval` for safe evaluation of
Python-style dictionaries. This improves interoperability with LLMs and
fixes a common usability issue for tool-based agents.
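
A minimal sketch of the described fallback (the real
`_parse_json_string` also handles malformed input more gracefully, e.g.
for the failure cases covered by the new tests):

```python
import ast
import json
from typing import Any


def parse_json_string(raw: str) -> Any:
    """Try strict JSON first, then fall back to ast.literal_eval,
    which safely evaluates Python-style literals like {'a': 1}."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return ast.literal_eval(raw)


assert parse_json_string('{"a": 1}') == {"a": 1}  # valid JSON
assert parse_json_string("{'a': 1}") == {"a": 1}  # single-quoted fallback
```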

**Issue:**
Closes #30910

**Dependencies:**
None

**Tests:**
- Added new unit tests for double-quoted JSON, single-quoted dicts,
mixed quoting, and malformed/failure cases.
- All tests pass locally, including new coverage for single-quoted
inputs.

**Notes:**
- No breaking changes.
- No new dependencies introduced.
- Code is formatted and linted (`ruff format`, `ruff check`).
- If maintainers have suggestions for further improvements, I’m happy to
revise!

Thank you for maintaining LangChain! Looking forward to your feedback.
2025-07-21 12:11:22 -04:00
Copilot
98c3bbbaf0 fix(ollama): num_gpu parameter not working in async OllamaEmbeddings method (#32074)
The `num_gpu` parameter in `OllamaEmbeddings` was not being passed to
the Ollama client in the async embedding method, causing GPU
acceleration settings to be ignored when using async operations.

## Problem

The issue was in the `aembed_documents` method where the `options`
parameter (containing `num_gpu` and other configuration) was missing:

```python
# Sync method (working correctly)
return self._client.embed(
    self.model, texts, options=self._default_params, keep_alive=self.keep_alive
)["embeddings"]

# Async method (missing options parameter)
return (
    await self._async_client.embed(
        self.model, texts, keep_alive=self.keep_alive  # No options!
    )
)["embeddings"]
```

This meant that when users specified `num_gpu=4` (or any other GPU
configuration), it would work with sync calls but be ignored with async
calls.

## Solution

Added the missing `options=self._default_params` parameter to the async
embed call to match the sync version:

```python
# Fixed async method
return (
    await self._async_client.embed(
        self.model,
        texts,
        options=self._default_params,  # Now includes num_gpu!
        keep_alive=self.keep_alive,
    )
)["embeddings"]
```

## Validation

- Added unit test to verify options are correctly passed in both sync
and async methods
- All existing tests continue to pass
- Manual testing confirms `num_gpu` parameter now works correctly
- Code passes linting and formatting checks

The fix ensures that GPU configuration works consistently across both
synchronous and asynchronous embedding operations.

Fixes #32059.


---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mdrxy <61371264+mdrxy@users.noreply.github.com>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-07-16 18:42:52 -04:00
Mason Daugherty
0002b1dafa ollama[patch]: fix model validation, ensure per-call reasoning can be set, tests (#31927)
* update model validation due to change in [Ollama
client](https://github.com/ollama/ollama) - ensure you are running the
latest version (0.9.6) to use `validate_model_on_init`
* add code example and fix formatting for ChatOllama reasoning
* ensure that setting `reasoning` in invocation kwargs overrides the
class-level setting (see the sketch below)
* tests
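
For example (model name illustrative), the invocation-level setting now
takes precedence:

```python
from langchain_ollama import ChatOllama

# Class-level default: no reasoning
llm = ChatOllama(model="deepseek-r1:8b", reasoning=False)

# The invocation kwarg overrides the class-level setting for this call only
msg = llm.invoke("What is 3 to the power of 3?", reasoning=True)
```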
2025-07-08 16:39:41 -04:00
Mason Daugherty
1f829aacf4 ollama[patch]: ruff fixes and rules (#31924)
* bump ruff deps
* add more thorough ruff rules
* fix said rules
2025-07-08 13:42:19 -04:00
Mason Daugherty
e686a70ee0 ollama: thinking, tool streaming, docs, tests (#31772)
* New `reasoning` (bool) param to support toggling [Ollama
thinking](https://ollama.com/blog/thinking) (#31573, #31700). If
`reasoning=True`, Ollama's `thinking` content will be placed in the
model responses' `additional_kwargs.reasoning_content` (see the sketch
below).
  * Supported by:
    * ChatOllama (class level, invocation level TODO)
    * OllamaLLM (TODO)
* Added tests to ensure streaming tool calls is successful (#29129)
* Refactored tests that relied on `extract_reasoning()`
* Myriad docs additions and consistency/typo fixes
* Improved type safety in some spots

Closes #29129
Addresses #31573 and #31700
Supersedes #31701
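
Usage might look like the following sketch (model name illustrative):

```python
from langchain_ollama import ChatOllama

llm = ChatOllama(model="deepseek-r1:8b", reasoning=True)
msg = llm.invoke("Why is the sky blue?")

answer = msg.content  # final answer, without the thinking block
thinking = msg.additional_kwargs.get("reasoning_content")  # extracted thinking
```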
2025-07-07 13:56:41 -04:00
Mason Daugherty
572020c4d8 ollama: add validate_model_on_init, catch more errors (#31784)
* Ensure access to the local model during `ChatOllama` instantiation
(#27720). This adds a new param `validate_model_on_init` (default:
`true`); see the sketch below
* Catch a few more errors from the Ollama client to assist users
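
A minimal sketch (model name illustrative):

```python
from langchain_ollama import ChatOllama

# With validate_model_on_init=True, instantiation checks that the model
# is available locally and raises immediately if it is not, instead of
# failing later on the first invocation.
llm = ChatOllama(model="llama3.1", validate_model_on_init=True)
```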
2025-07-03 11:07:11 -04:00
Mason Daugherty
2fb27b63f5 ollama: update tests, docs (#31736)
- docs: for the Ollama notebooks, improve the specificity of some links,
add `homebrew` install info, update some wording
- tests: halve the number of local models needed to run the suite,
from 4 → 2 (shedding 8 GB of required installs)
- bump deps (non-breaking) in anticipation of upcoming "thinking" PR
2025-06-25 20:13:20 +00:00
rylativity
dbf9986d44 langchain-ollama (partners) / langchain-core: allow passing ChatMessages to Ollama (including arbitrary roles) (#30411)
Replacement for PR #30191 (@ccurme)

**Description**: Currently, ChatOllama [will raise a value error if a
ChatMessage is passed to
it](https://github.com/langchain-ai/langchain/blob/master/libs/partners/ollama/langchain_ollama/chat_models.py#L514),
as described in
https://github.com/langchain-ai/langchain/pull/30147#issuecomment-2708932481.

Furthermore, ollama-python is removing the limitations on valid roles
that can be passed through chat messages to a model in ollama -
https://github.com/ollama/ollama-python/pull/462#event-16917810634.

This PR removes the role limitations imposed by langchain and enables
passing langchain ChatMessages with arbitrary 'role' values through the
langchain ChatOllama class to the underlying ollama-python Client.
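
As a sketch (model and role are illustrative):

```python
from langchain_core.messages import ChatMessage
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.1")

# Previously this raised a ValueError; the custom role is now forwarded
# unchanged to the underlying ollama-python Client.
response = llm.invoke([ChatMessage(role="control", content="thinking")])
```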

As this PR relies on [merged but unreleased functionality in
ollama-python](
https://github.com/ollama/ollama-python/pull/462#event-16917810634), I
have temporarily pointed the ollama package source to the main branch of
the ollama-python github repo.

Format, lint, and tests of the new functionality are passing. The issue
with the recently added ChatOllama tests has since been resolved.

**Issue**: resolves #30122 (related to ollama issue
https://github.com/ollama/ollama/issues/8955)

**Dependencies**: no new dependencies

- [x] PR title
- [x] PR message
- [x] Lint and test: format, lint, and test all running successfully and
passing

---------

Co-authored-by: Ryan Stewart <ryanstewart@Ryans-MacBook-Pro.local>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-04-18 10:07:07 -04:00
ccurme
085baef926 ollama[patch]: support standard image format (#30864)
Following https://github.com/langchain-ai/langchain/pull/30746
2025-04-15 22:14:50 +00:00
Sydney Runkle
8c6734325b partners[lint]: run pyupgrade to get code in line with 3.9 standards (#30781)
Using `pyupgrade` to get all `partners` code up to 3.9 standards
(mostly, fixing old `typing` imports).
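
Typical of the mechanical changes (an illustrative before/after):

```python
# Before: pre-3.9 typing imports
from typing import Dict, List

def count_tags(items: List[str]) -> Dict[str, int]: ...

# After pyupgrade: PEP 585 builtin generics
def count_tags(items: list[str]) -> dict[str, int]: ...
```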
2025-04-11 07:18:44 -04:00
Bob Merkus
5700646cc5 ollama: add reasoning model support (e.g. deepseek) (#29689)
# Description
This PR adds reasoning model support for `langchain-ollama` by
extracting reasoning token blocks, like those used in deepseek. It was
inspired by
[ollama-deep-researcher](https://github.com/langchain-ai/ollama-deep-researcher),
specifically the parsing of [thinking
blocks](6d1aaf2139/src/assistant/graph.py (L91)):
```python
  # TODO: This is a hack to remove the <think> tags w/ Deepseek models 
  # It appears very challenging to prompt them out of the responses 
  while "<think>" in running_summary and "</think>" in running_summary:
      start = running_summary.find("<think>")
      end = running_summary.find("</think>") + len("</think>")
      running_summary = running_summary[:start] + running_summary[end:]
```

That comment notes that it is very hard to prompt the reasoning block
away, and in any case we actually want the model to reason, since
reasoning increases model performance. This implementation therefore
extracts the thinking block, so the client can still expect a proper
message to be returned by `ChatOllama` (and can use the reasoning
content separately when desired).

This implementation takes the same approach as
[ChatDeepseek](5d581ba22c/libs/partners/deepseek/langchain_deepseek/chat_models.py (L215)),
which adds the reasoning content to
`chunk.additional_kwargs["reasoning_content"]`:
```python
  if hasattr(response.choices[0].message, "reasoning_content"):  # type: ignore
      rtn.generations[0].message.additional_kwargs["reasoning_content"] = (
          response.choices[0].message.reasoning_content  # type: ignore
      )
```

This should probably be handled upstream in ollama + ollama-python, but
this seems like a reasonably effective solution. This is a standalone
example of what is happening:

```python
async def deepseek_message_astream(
    llm: BaseChatModel,
    messages: list[BaseMessage],
    config: RunnableConfig | None = None,
    *,
    model_target: str = "deepseek-r1",
    **kwargs: Any,
) -> AsyncIterator[BaseMessageChunk]:
    """Stream responses from Deepseek models, filtering out <think> tags.

    Args:
        llm: The language model to stream from
        messages: The messages to send to the model

    Yields:
        Filtered chunks from the model response
    """
    # check if the model is deepseek based
    if (llm.name and model_target not in llm.name) or (
        hasattr(llm, "model") and model_target not in llm.model
    ):
        async for chunk in llm.astream(messages, config=config, **kwargs):
            yield chunk
        return

    # Yield with a buffer, upon completing the <think></think> tags, move them to the reasoning content and start over
    buffer = ""
    async for chunk in llm.astream(messages, config=config, **kwargs):
        # start or append
        if not buffer:
            buffer = chunk.content
        else:
            buffer += chunk.content if hasattr(chunk, "content") else chunk

        # Process buffer to remove <think> tags
        if "<think>" in buffer or "</think>" in buffer:
            if hasattr(chunk, "tool_calls") and chunk.tool_calls:
                raise NotImplementedError("tool calls during reasoning should be removed?")
            if "<think>" in chunk.content or "</think>" in chunk.content:
                continue
            chunk.additional_kwargs["reasoning_content"] = chunk.content
            chunk.content = ""
        # upon block completion, reset the buffer
        if "<think>" in buffer and "</think>" in buffer:
            buffer = ""
        yield chunk

```

# Issue
Integrating reasoning models (e.g. deepseek-r1) into existing LangChain
based workflows is hard due to the thinking blocks that are included in
the message contents. To avoid this, we could match the `ChatOllama`
integration with `ChatDeepseek` to return the reasoning content inside
`message.additional_kwargs["reasoning_content"]` instead.

# Dependencies
None

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-21 15:44:54 +00:00
Mohammad Mohtashim
1103bdfaf1 (Ollama) Fix String Value parsing in _parse_arguments_from_tool_call (#30154)
- **Description:** Fix string value parsing in
`_parse_arguments_from_tool_call`
- **Issue:** #30145

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-19 21:47:18 -04:00
Lance Martin
46d6bf0330 ollama[minor]: update default method for structured output (#30273)
From function calling to Ollama's [dedicated structured output
feature](https://ollama.com/blog/structured-outputs).

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-03-18 12:44:22 -04:00
Stavros Kontopoulos
ac22cde130 langchain_ollama: Support keep_alive in embeddings (#30251)
- Description: Adds support for `keep_alive` in Ollama embeddings; see
https://github.com/ollama/ollama/issues/6401.
Builds on top of
https://github.com/langchain-ai/langchain/pull/29296. I have a use
case where I want to keep the embeddings model in CPU memory forever
(see the sketch below).
- Dependencies: no deps are being introduced.
- Issue: haven't created an issue yet.
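
A sketch of that use case (model name illustrative; `keep_alive=-1`
follows Ollama's convention for keeping a model loaded indefinitely):

```python
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="nomic-embed-text", keep_alive=-1)
vectors = embeddings.embed_documents(["keep this model resident"])
```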
2025-03-14 14:56:50 -04:00
Erick Friis
8f95da4eb1 multiple: structured output tracing standard metadata (#29421)
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-29 14:00:26 -08:00
Bagatur
e4d3ccf62f json mode standard test (#25497)
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-17 18:47:34 +00:00
ccurme
5c6e2cbcda ollama[patch]: support structured output (#28629)
- Bump minimum version of `ollama` to 0.4.4 (which also addresses
https://github.com/langchain-ai/langchain/issues/28607).
- Support recently-released [structured
output](https://ollama.com/blog/structured-outputs) feature. This can be
accessed by calling `.with_structured_output` with
`method="json_schema"` (choice of name
[mirrors](https://python.langchain.com/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html#langchain_openai.chat_models.base.ChatOpenAI.with_structured_output)
what we have for OpenAI's structured output feature); see the usage
sketch below.

`ChatOllama` previously implemented `.with_structured_output` via the
[base
implementation](ec9b41431e/libs/core/langchain_core/language_models/chat_models.py (L1117)).
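
A short usage sketch (model and schema are illustrative):

```python
from pydantic import BaseModel

from langchain_ollama import ChatOllama


class Joke(BaseModel):
    setup: str
    punchline: str


llm = ChatOllama(model="llama3.1")
structured_llm = llm.with_structured_output(Joke, method="json_schema")
joke = structured_llm.invoke("Tell me a joke about cats")  # -> Joke instance
```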
2024-12-10 10:36:00 -05:00
TheDannyG
607c60a594 partners/ollama: fix tool calling with nested schemas (#28225)
## Description

This PR addresses the following:

**Fixes Issue #25343:**
- Adds additional logic to parse shallowly nested JSON-encoded strings
in tool call arguments, allowing for proper parsing of responses like
those of Llama3.1 and 3.2 with nested schemas (see the sketch below).
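
A hedged sketch of the idea (helper name illustrative, not the exact
code added to `langchain_ollama`):

```python
import json
from typing import Any


def unpack_nested_args(arguments: dict[str, Any]) -> dict[str, Any]:
    """Decode one level of JSON-encoded string values in tool call args."""
    unpacked: dict[str, Any] = {}
    for key, value in arguments.items():
        decoded = None
        if isinstance(value, str):
            try:
                decoded = json.loads(value)
            except json.JSONDecodeError:
                pass
        # Only unwrap genuinely nested objects/arrays so that plain
        # strings (e.g. "123") are left untouched.
        unpacked[key] = decoded if isinstance(decoded, (dict, list)) else value
    return unpacked


# {"user": '{"name": "ada"}'} -> {"user": {"name": "ada"}}
print(unpack_nested_args({"user": '{"name": "ada"}', "limit": "10"}))
```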
 
**Adds Integration Test for Fix:**
- Adds an Ollama-specific integration test to ensure the issue is
resolved and to prevent regressions in the future.

**Fixes Failing Integration Tests:**
- Fixes integration tests that were failing (even prior to these
changes) due to the `llama3-groq-tool-use` model. Previously, the tests
`test_structured_output_async` and
`test_structured_output_optional_param` failed because the model did not
issue a tool call in the response. Resolved by switching to
`llama3.1`.

## Issue
Fixes #25343.

## Dependencies
No dependencies.

____

Done in collaboration with @ishaan-upadhyay @mirajismail @ZackSteine.
2024-11-27 10:32:02 -05:00
Erick Friis
0dbaf05bb7 standard-tests: rename langchain_standard_tests to langchain_tests, release 0.3.2 (#28203) 2024-11-18 19:10:39 -08:00
Isaac Francisco
a2e90a5a43 add embeddings integration tests (#25508) 2024-08-16 13:20:37 -07:00
ccurme
b83f1eb0d5 core, partners: implement standard tracing params for LLMs (#25410) 2024-08-16 13:18:09 -04:00
Isaac Francisco
152427eca1 make image inputs compatible with langchain_ollama (#24619) 2024-07-26 17:39:57 -07:00
Isaac Francisco
838464de25 ollama: init package (#23615)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-07-20 00:43:29 +00:00