feat(ollama): docs updates (#32507)

Mason Daugherty 2025-08-11 15:39:44 -04:00, committed by GitHub
parent ee4c2510eb
commit 5ccdcd7b7b
4 changed files with 54 additions and 67 deletions


@ -46,7 +46,7 @@
"\n", "\n",
"1. [`llama.cpp`](https://github.com/ggerganov/llama.cpp): C++ implementation of llama inference code with [weight optimization / quantization](https://finbarr.ca/how-is-llama-cpp-possible/)\n", "1. [`llama.cpp`](https://github.com/ggerganov/llama.cpp): C++ implementation of llama inference code with [weight optimization / quantization](https://finbarr.ca/how-is-llama-cpp-possible/)\n",
"2. [`gpt4all`](https://docs.gpt4all.io/index.html): Optimized C backend for inference\n", "2. [`gpt4all`](https://docs.gpt4all.io/index.html): Optimized C backend for inference\n",
"3. [`Ollama`](https://ollama.ai/): Bundles model weights and environment into an app that runs on device and serves the LLM\n", "3. [`ollama`](https://github.com/ollama/ollama): Bundles model weights and environment into an app that runs on device and serves the LLM\n",
"4. [`llamafile`](https://github.com/Mozilla-Ocho/llamafile): Bundles model weights and everything needed to run the model in a single file, allowing you to run the LLM locally from this file without any additional installation steps\n", "4. [`llamafile`](https://github.com/Mozilla-Ocho/llamafile): Bundles model weights and everything needed to run the model in a single file, allowing you to run the LLM locally from this file without any additional installation steps\n",
"\n", "\n",
"In general, these frameworks will do a few things:\n", "In general, these frameworks will do a few things:\n",
@ -74,12 +74,12 @@
"\n", "\n",
"## Quickstart\n", "## Quickstart\n",
"\n", "\n",
"[`Ollama`](https://ollama.ai/) is one way to easily run inference on macOS.\n", "[Ollama](https://ollama.com/) is one way to easily run inference on macOS.\n",
" \n", " \n",
"The instructions [here](https://github.com/jmorganca/ollama?tab=readme-ov-file#ollama) provide details, which we summarize:\n", "The instructions [here](https://github.com/ollama/ollama?tab=readme-ov-file#ollama) provide details, which we summarize:\n",
" \n", " \n",
"* [Download and run](https://ollama.ai/download) the app\n", "* [Download and run](https://ollama.ai/download) the app\n",
"* From command line, fetch a model from this [list of options](https://github.com/jmorganca/ollama): e.g., `ollama pull llama3.1:8b`\n", "* From command line, fetch a model from this [list of options](https://ollama.com/search): e.g., `ollama pull gpt-oss:20b`\n",
"* When the app is running, all models are automatically served on `localhost:11434`\n" "* When the app is running, all models are automatically served on `localhost:11434`\n"
] ]
}, },
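To sanity-check the served endpoint, here is a minimal sketch (assumes `langchain-ollama` is installed and the model has been pulled; `base_url` is shown explicitly but defaults to the same address):

```python
# Minimal sketch: query the locally served model.
from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="gpt-oss:20b",
    base_url="http://localhost:11434",  # Ollama's default serving address
)
print(llm.invoke("Say hello in one short sentence.").content)
```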
@ -95,7 +95,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 2, "execution_count": null,
"id": "86178adb", "id": "86178adb",
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
@ -111,11 +111,11 @@
} }
], ],
"source": [ "source": [
"from langchain_ollama import OllamaLLM\n", "from langchain_ollama import ChatOllama\n",
"\n", "\n",
"llm = OllamaLLM(model=\"llama3.1:8b\")\n", "llm = ChatOllama(model=\"gpt-oss:20b\", validate_model_on_init=True)\n",
"\n", "\n",
"llm.invoke(\"The first man on the moon was ...\")" "llm.invoke(\"The first man on the moon was ...\").content"
] ]
}, },
{ {
@ -200,7 +200,7 @@
"\n", "\n",
"### Running Apple silicon GPU\n", "### Running Apple silicon GPU\n",
"\n", "\n",
"`Ollama` and [`llamafile`](https://github.com/Mozilla-Ocho/llamafile?tab=readme-ov-file#gpu-support) will automatically utilize the GPU on Apple devices.\n", "`ollama` and [`llamafile`](https://github.com/Mozilla-Ocho/llamafile?tab=readme-ov-file#gpu-support) will automatically utilize the GPU on Apple devices.\n",
" \n", " \n",
"Other frameworks require the user to set up the environment to utilize the Apple GPU.\n", "Other frameworks require the user to set up the environment to utilize the Apple GPU.\n",
"\n", "\n",
@ -212,15 +212,15 @@
"\n", "\n",
"In particular, ensure that conda is using the correct virtual environment that you created (`miniforge3`).\n", "In particular, ensure that conda is using the correct virtual environment that you created (`miniforge3`).\n",
"\n", "\n",
"E.g., for me:\n", "e.g., for me:\n",
"\n", "\n",
"```\n", "```shell\n",
"conda activate /Users/rlm/miniforge3/envs/llama\n", "conda activate /Users/rlm/miniforge3/envs/llama\n",
"```\n", "```\n",
"\n", "\n",
"With the above confirmed, then:\n", "With the above confirmed, then:\n",
"\n", "\n",
"```\n", "```shell\n",
"CMAKE_ARGS=\"-DLLAMA_METAL=on\" FORCE_CMAKE=1 pip install -U llama-cpp-python --no-cache-dir\n", "CMAKE_ARGS=\"-DLLAMA_METAL=on\" FORCE_CMAKE=1 pip install -U llama-cpp-python --no-cache-dir\n",
"```" "```"
] ]
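To confirm the Metal build actually offloads work to the GPU, a quick sketch (the GGUF path is a placeholder; any `n_gpu_layers` value above 0 enables offload):

```python
# Sketch: verify the Metal-enabled llama-cpp-python build.
from llama_cpp import Llama

llm = Llama(
    model_path="/path/to/model.gguf",  # placeholder: point at a local GGUF file
    n_gpu_layers=1,  # any value > 0 offloads layers to the Apple GPU
)
out = llm("The first man on the moon was", max_tokens=16)
print(out["choices"][0]["text"])
```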
@ -236,20 +236,16 @@
"\n", "\n",
"1. [`HuggingFace`](https://huggingface.co/TheBloke) - Many quantized model are available for download and can be run with framework such as [`llama.cpp`](https://github.com/ggerganov/llama.cpp). You can also download models in [`llamafile` format](https://huggingface.co/models?other=llamafile) from HuggingFace.\n", "1. [`HuggingFace`](https://huggingface.co/TheBloke) - Many quantized model are available for download and can be run with framework such as [`llama.cpp`](https://github.com/ggerganov/llama.cpp). You can also download models in [`llamafile` format](https://huggingface.co/models?other=llamafile) from HuggingFace.\n",
"2. [`gpt4all`](https://gpt4all.io/index.html) - The model explorer offers a leaderboard of metrics and associated quantized models available for download \n", "2. [`gpt4all`](https://gpt4all.io/index.html) - The model explorer offers a leaderboard of metrics and associated quantized models available for download \n",
"3. [`Ollama`](https://github.com/jmorganca/ollama) - Several models can be accessed directly via `pull`\n", "3. [`ollama`](https://github.com/jmorganca/ollama) - Several models can be accessed directly via `pull`\n",
"\n", "\n",
"### Ollama\n", "### Ollama\n",
"\n", "\n",
"With [Ollama](https://github.com/jmorganca/ollama), fetch a model via `ollama pull <model family>:<tag>`:\n", "With [Ollama](https://github.com/ollama/ollama), fetch a model via `ollama pull <model family>:<tag>`."
"\n",
"* E.g., for Llama 2 7b: `ollama pull llama2` will download the most basic version of the model (e.g., smallest # parameters and 4 bit quantization)\n",
"* We can also specify a particular version from the [model list](https://github.com/jmorganca/ollama?tab=readme-ov-file#model-library), e.g., `ollama pull llama2:13b`\n",
"* See the full set of parameters on the [API reference page](https://python.langchain.com/api_reference/community/llms/langchain_community.llms.ollama.Ollama.html)"
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 42, "execution_count": null,
"id": "8ecd2f78", "id": "8ecd2f78",
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
@ -265,7 +261,7 @@
} }
], ],
"source": [ "source": [
"llm = OllamaLLM(model=\"llama2:13b\")\n", "llm = ChatOllama(model=\"gpt-oss:20b\")\n",
"llm.invoke(\"The first man on the moon was ... think step by step\")" "llm.invoke(\"The first man on the moon was ... think step by step\")"
] ]
}, },
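Because `ChatOllama` implements LangChain's Runnable interface, several prompts can also be batched in one call; a brief sketch with the `llm` above:

```python
# Batch sketch: run multiple prompts through the local model.
responses = llm.batch(
    [
        "The first man on the moon was ...",
        "The first woman in space was ...",
    ]
)
for r in responses:
    print(r.content)
```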
@ -694,7 +690,7 @@
], ],
"metadata": { "metadata": {
"kernelspec": { "kernelspec": {
"display_name": "Python 3 (ipykernel)", "display_name": "langchain",
"language": "python", "language": "python",
"name": "python3" "name": "python3"
}, },
@ -708,7 +704,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.10.5" "version": "3.12.11"
} }
}, },
"nbformat": 4, "nbformat": 4,


@ -17,9 +17,9 @@
"source": [ "source": [
"# ChatOllama\n", "# ChatOllama\n",
"\n", "\n",
"[Ollama](https://ollama.ai/) allows you to run open-source large language models, such as Llama 2, locally.\n", "[Ollama](https://ollama.com/) allows you to run open-source large language models, such as `got-oss`, locally.\n",
"\n", "\n",
"Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile.\n", "`ollama` bundles model weights, configuration, and data into a single package, defined by a Modelfile.\n",
"\n", "\n",
"It optimizes setup and configuration details, including GPU usage.\n", "It optimizes setup and configuration details, including GPU usage.\n",
"\n", "\n",
@ -28,14 +28,14 @@
"## Overview\n", "## Overview\n",
"### Integration details\n", "### Integration details\n",
"\n", "\n",
"| Class | Package | Local | Serializable | [JS support](https://js.langchain.com/v0.2/docs/integrations/chat/ollama) | Package downloads | Package latest |\n", "| Class | Package | Local | Serializable | [JS support](https://js.langchain.com/docs/integrations/chat/ollama) | Package downloads | Package latest |\n",
"| :--- | :--- | :---: | :---: | :---: | :---: | :---: |\n", "| :--- | :--- | :---: | :---: | :---: | :---: | :---: |\n",
"| [ChatOllama](https://python.langchain.com/v0.2/api_reference/ollama/chat_models/langchain_ollama.chat_models.ChatOllama.html) | [langchain-ollama](https://python.langchain.com/v0.2/api_reference/ollama/index.html) | ✅ | ❌ | ✅ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-ollama?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-ollama?style=flat-square&label=%20) |\n", "| [ChatOllama](https://python.langchain.com/api_reference/ollama/chat_models/langchain_ollama.chat_models.ChatOllama.html#chatollama) | [langchain-ollama](https://python.langchain.com/api_reference/ollama/index.html) | ✅ | ❌ | ✅ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-ollama?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-ollama?style=flat-square&label=%20) |\n",
"\n", "\n",
"### Model features\n", "### Model features\n",
"| [Tool calling](/docs/how_to/tool_calling/) | [Structured output](/docs/how_to/structured_output/) | JSON mode | [Image input](/docs/how_to/multimodal_inputs/) | Audio input | Video input | [Token-level streaming](/docs/how_to/chat_streaming/) | Native async | [Token usage](/docs/how_to/chat_token_usage_tracking/) | [Logprobs](/docs/how_to/logprobs/) |\n", "| [Tool calling](/docs/how_to/tool_calling/) | [Structured output](/docs/how_to/structured_output/) | JSON mode | [Image input](/docs/how_to/multimodal_inputs/) | Audio input | Video input | [Token-level streaming](/docs/how_to/chat_streaming/) | Native async | [Token usage](/docs/how_to/chat_token_usage_tracking/) | [Logprobs](/docs/how_to/logprobs/) |\n",
"| :---: |:----------------------------------------------------:| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n", "| :---: |:----------------------------------------------------:| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n",
"| ✅ | ✅ | ✅ | | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ |\n", "| ✅ | ✅ | ✅ | | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ |\n",
"\n", "\n",
"## Setup\n", "## Setup\n",
"\n", "\n",
@ -45,17 +45,17 @@
" * macOS users can install via Homebrew with `brew install ollama` and start with `brew services start ollama`\n", " * macOS users can install via Homebrew with `brew install ollama` and start with `brew services start ollama`\n",
"* Fetch available LLM model via `ollama pull <name-of-model>`\n", "* Fetch available LLM model via `ollama pull <name-of-model>`\n",
" * View a list of available models via the [model library](https://ollama.ai/library)\n", " * View a list of available models via the [model library](https://ollama.ai/library)\n",
" * e.g., `ollama pull llama3`\n", " * e.g., `ollama pull gpt-oss:20b`\n",
"* This will download the default tagged version of the model. Typically, the default points to the latest, smallest sized-parameter model.\n", "* This will download the default tagged version of the model. Typically, the default points to the latest, smallest sized-parameter model.\n",
"\n", "\n",
"> On Mac, the models will be download to `~/.ollama/models`\n", "> On Mac, the models will be download to `~/.ollama/models`\n",
">\n", ">\n",
"> On Linux (or WSL), the models will be stored at `/usr/share/ollama/.ollama/models`\n", "> On Linux (or WSL), the models will be stored at `/usr/share/ollama/.ollama/models`\n",
"\n", "\n",
"* Specify the exact version of the model of interest as such `ollama pull vicuna:13b-v1.5-16k-q4_0` (View the [various tags for the `Vicuna`](https://ollama.ai/library/vicuna/tags) model in this instance)\n", "* Specify the exact version of the model of interest as such `ollama pull gpt-oss:20b` (View the [various tags for the `Vicuna`](https://ollama.ai/library/vicuna/tags) model in this instance)\n",
"* To view all pulled models, use `ollama list`\n", "* To view all pulled models, use `ollama list`\n",
"* To chat directly with a model from the command line, use `ollama run <name-of-model>`\n", "* To chat directly with a model from the command line, use `ollama run <name-of-model>`\n",
"* View the [Ollama documentation](https://github.com/ollama/ollama/tree/main/docs) for more commands. You can run `ollama help` in the terminal to see available commands.\n" "* View the [Ollama documentation](https://github.com/ollama/ollama/blob/main/docs/README.md) for more commands. You can run `ollama help` in the terminal to see available commands.\n"
] ]
}, },
{ {
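The `ollama list` check above can also be scripted; a small sketch that shells out to the CLI (assumes the `ollama` binary is on `PATH`):

```python
# Sketch: confirm which models are pulled, from Python.
import subprocess

result = subprocess.run(
    ["ollama", "list"], capture_output=True, text=True, check=True
)
print(result.stdout)  # one line per pulled model, with tag and size
```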
@ -102,7 +102,11 @@
"id": "b18bd692076f7cf7", "id": "b18bd692076f7cf7",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Make sure you're using the latest Ollama version for structured outputs. Update by running:" ":::warning\n",
"Make sure you're using the latest Ollama version!\n",
":::\n",
"\n",
"Update by running:"
] ]
}, },
{ {
@ -257,10 +261,10 @@
"source": [ "source": [
"## Tool calling\n", "## Tool calling\n",
"\n", "\n",
"We can use [tool calling](/docs/concepts/tool_calling/) with an LLM [that has been fine-tuned for tool use](https://ollama.com/search?&c=tools) such as `llama3.1`:\n", "We can use [tool calling](/docs/concepts/tool_calling/) with an LLM [that has been fine-tuned for tool use](https://ollama.com/search?&c=tools) such as `gpt-oss`:\n",
"\n", "\n",
"```\n", "```\n",
"ollama pull llama3.1\n", "ollama pull gpt-oss:20b\n",
"```\n", "```\n",
"\n", "\n",
"Details on creating custom tools are available in [this guide](/docs/how_to/custom_tools/). Below, we demonstrate how to create a tool using the `@tool` decorator on a normal python function." "Details on creating custom tools are available in [this guide](/docs/how_to/custom_tools/). Below, we demonstrate how to create a tool using the `@tool` decorator on a normal python function."
@ -268,7 +272,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 13, "execution_count": null,
"id": "f767015f", "id": "f767015f",
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
@ -300,7 +304,8 @@
"\n", "\n",
"\n", "\n",
"llm = ChatOllama(\n", "llm = ChatOllama(\n",
" model=\"llama3.1\",\n", " model=\"gpt-oss:20b\",\n",
" validate_model_on_init=True,\n",
" temperature=0,\n", " temperature=0,\n",
").bind_tools([validate_user])\n", ").bind_tools([validate_user])\n",
"\n", "\n",
@ -321,9 +326,7 @@
"source": [ "source": [
"## Multi-modal\n", "## Multi-modal\n",
"\n", "\n",
"Ollama has support for multi-modal LLMs, such as [bakllava](https://ollama.com/library/bakllava) and [llava](https://ollama.com/library/llava).\n", "Ollama has limited support for multi-modal LLMs, such as [gemma3](https://ollama.com/library/gemma3)\n",
"\n",
" ollama pull bakllava\n",
"\n", "\n",
"Be sure to update Ollama so that you have the most recent version to support multi-modal." "Be sure to update Ollama so that you have the most recent version to support multi-modal."
] ]
@ -518,7 +521,7 @@
], ],
"metadata": { "metadata": {
"kernelspec": { "kernelspec": {
"display_name": "Python 3 (ipykernel)", "display_name": "langchain",
"language": "python", "language": "python",
"name": "python3" "name": "python3"
}, },
@ -532,7 +535,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.10.4" "version": "3.12.11"
} }
}, },
"nbformat": 4, "nbformat": 4,


@ -1,14 +1,14 @@
# Ollama # Ollama
>[Ollama](https://ollama.com/) allows you to run open-source large language models, >[Ollama](https://ollama.com/) allows you to run open-source large language models,
> such as [Llama3.1](https://ai.meta.com/blog/meta-llama-3-1/), locally. > such as [gpt-oss](https://ollama.com/library/gpt-oss), locally.
> >
>`Ollama` bundles model weights, configuration, and data into a single package, defined by a Modelfile. >`Ollama` bundles model weights, configuration, and data into a single package, defined by a Modelfile.
>It optimizes setup and configuration details, including GPU usage. >It optimizes setup and configuration details, including GPU usage.
>For a complete list of supported models and model variants, see the [Ollama model library](https://ollama.ai/library). >For a complete list of supported models and model variants, see the [Ollama model library](https://ollama.ai/library).
See [this guide](/docs/how_to/local_llms) for more details See [this guide](/docs/how_to/local_llms#ollama) for more details
on how to use `Ollama` with LangChain. on how to use `ollama` with LangChain.
## Installation and Setup ## Installation and Setup
### Ollama installation ### Ollama installation
@ -26,7 +26,7 @@ ollama serve
After starting ollama, run `ollama pull <name-of-model>` to download a model from the [Ollama model library](https://ollama.ai/library): After starting ollama, run `ollama pull <name-of-model>` to download a model from the [Ollama model library](https://ollama.ai/library):
```bash ```bash
ollama pull llama3.1 ollama pull gpt-oss:20b
``` ```
- This will download the default tagged version of the model. Typically, the default points to the latest model with the smallest parameter count. - This will download the default tagged version of the model. Typically, the default points to the latest model with the smallest parameter count.
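Once pulled, the model is usable from the LangChain integration; a minimal sketch (assumes `langchain-ollama` is installed):

```python
# Minimal sketch: use the pulled model through langchain-ollama.
from langchain_ollama import ChatOllama

llm = ChatOllama(model="gpt-oss:20b")
print(llm.invoke("Hello!").content)
```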


@ -229,7 +229,7 @@ class ChatOllama(BaseChatModel):
.. code-block:: bash .. code-block:: bash
ollama pull mistral:v0.3 ollama pull gpt-oss:20b
pip install -U langchain-ollama pip install -U langchain-ollama
Key init args completion params: Key init args completion params:
@ -262,7 +262,8 @@ class ChatOllama(BaseChatModel):
from langchain_ollama import ChatOllama from langchain_ollama import ChatOllama
llm = ChatOllama( llm = ChatOllama(
model = "llama3", model = "gpt-oss:20b",
validate_model_on_init = True,
temperature = 0.8, temperature = 0.8,
num_predict = 256, num_predict = 256,
# other params ... # other params ...
@ -284,10 +285,7 @@ class ChatOllama(BaseChatModel):
Stream: Stream:
.. code-block:: python .. code-block:: python
messages = [ for chunk in llm.stream("Return the words Hello World!"):
("human", "Return the words Hello World!"),
]
for chunk in llm.stream(messages):
print(chunk.text(), end="") print(chunk.text(), end="")
@ -314,10 +312,7 @@ class ChatOllama(BaseChatModel):
Async: Async:
.. code-block:: python .. code-block:: python
messages = [ await llm.ainvoke("Hello how are you!")
("human", "Hello how are you!"),
]
await llm.ainvoke(messages)
.. code-block:: python .. code-block:: python
@ -325,10 +320,7 @@ class ChatOllama(BaseChatModel):
.. code-block:: python .. code-block:: python
messages = [ async for chunk in llm.astream("Say hello world!"):
("human", "Say hello world!"),
]
async for chunk in llm.astream(messages):
print(chunk.content) print(chunk.content)
.. code-block:: python .. code-block:: python
@ -356,10 +348,7 @@ class ChatOllama(BaseChatModel):
json_llm = ChatOllama(format="json") json_llm = ChatOllama(format="json")
messages = [ json_llm.invoke("Return a query for the weather in a random location and time of day with two keys: location and time_of_day. Respond using JSON only.").content
("human", "Return a query for the weather in a random location and time of day with two keys: location and time_of_day. Respond using JSON only."),
]
llm.invoke(messages).content
.. code-block:: python .. code-block:: python
@ -406,17 +395,16 @@ class ChatOllama(BaseChatModel):
llm = ChatOllama( llm = ChatOllama(
model = "deepseek-r1:8b", model = "deepseek-r1:8b",
validate_model_on_init = True,
reasoning= True, reasoning= True,
) )
user_message = HumanMessage(content="how many r in the word strawberry?") llm.invoke("how many r in the word strawberry?")
messages: List[Any] = [user_message]
llm.invoke(messages)
# or, on an invocation basis: # or, on an invocation basis:
llm.invoke(messages, reasoning=True) llm.invoke("how many r in the word strawberry?", reasoning=True)
# or llm.stream(messages, reasoning=True) # or llm.stream("how many r in the word strawberry?", reasoning=True)
# If not provided, the invocation will default to the ChatOllama reasoning # If not provided, the invocation will default to the ChatOllama reasoning
# param provided (None by default). # param provided (None by default).
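For completeness, a sketch of reading the separated reasoning trace back out (per the langchain-ollama docs it lands in `additional_kwargs["reasoning_content"]`; assumes a thinking-capable model is pulled):

```python
# Sketch: retrieve the separated reasoning trace.
from langchain_ollama import ChatOllama

llm = ChatOllama(model="deepseek-r1:8b", reasoning=True)
msg = llm.invoke("how many r in the word strawberry?")
print(msg.additional_kwargs.get("reasoning_content"))  # the model's thinking
print(msg.content)  # the final answer only
```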