ollama: init package (#23615)

Co-authored-by: Erick Friis <erick@langchain.dev>
Isaac Francisco
2024-07-19 17:43:29 -07:00
committed by GitHub
parent f4ee3c8a22
commit 838464de25
28 changed files with 8787 additions and 571 deletions

View File

@@ -2,6 +2,7 @@
"cells": [
{
"cell_type": "raw",
"id": "afaf8039",
"metadata": {},
"source": [
"---\n",
@@ -11,6 +12,7 @@
},
{
"cell_type": "markdown",
"id": "e49f1e0d",
"metadata": {},
"source": [
"# ChatOllama\n",
@@ -23,6 +25,18 @@
"\n",
"For a complete list of supported models and model variants, see the [Ollama model library](https://github.com/jmorganca/ollama#model-library).\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | [JS support](https://js.langchain.com/v0.2/docs/integrations/chat/ollama) | Package downloads | Package latest |\n",
"| :--- | :--- | :---: | :---: | :---: | :---: | :---: |\n",
"| [ChatOllama](https://api.python.langchain.com/en/latest/chat_models/langchain_ollama.chat_models.ChatOllama.html) | [langchain-ollama](https://api.python.langchain.com/en/latest/ollama_api_reference.html) | ✅ | ❌ | ✅ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-ollama?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-ollama?style=flat-square&label=%20) |\n",
"\n",
"### Model features\n",
"| [Tool calling](/docs/how_to/tool_calling/) | [Structured output](/docs/how_to/structured_output/) | JSON mode | [Image input](/docs/how_to/multimodal_inputs/) | Audio input | Video input | [Token-level streaming](/docs/how_to/chat_streaming/) | Native async | [Token usage](/docs/how_to/chat_token_usage_tracking/) | [Logprobs](/docs/how_to/logprobs/) |\n",
"| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n",
"| ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | \n",
"\n",
"## Setup\n",
"\n",
"First, follow [these instructions](https://github.com/jmorganca/ollama) to set up and run a local Ollama instance:\n",
@@ -40,307 +54,202 @@
"* Specify the exact version of the model of interest as such `ollama pull vicuna:13b-v1.5-16k-q4_0` (View the [various tags for the `Vicuna`](https://ollama.ai/library/vicuna/tags) model in this instance)\n",
"* To view all pulled models, use `ollama list`\n",
"* To chat directly with a model from the command line, use `ollama run <name-of-model>`\n",
"* View the [Ollama documentation](https://github.com/jmorganca/ollama) for more commands. Run `ollama help` in the terminal to see available commands too.\n",
"\n",
"## Usage\n",
"\n",
"You can see a full list of supported parameters on the [API reference page](https://api.python.langchain.com/en/latest/llms/langchain.llms.ollama.Ollama.html).\n",
"\n",
"If you are using a LLaMA `chat` model (e.g., `ollama pull llama3`) then you can use the `ChatOllama` interface.\n",
"\n",
"This includes [special tokens](https://huggingface.co/blog/llama2#how-to-prompt-llama-2) for system message and user input.\n",
"\n",
"## Interacting with Models \n",
"\n",
"Here are a few ways to interact with pulled local models\n",
"\n",
"#### In the terminal:\n",
"\n",
"* All of your local models are automatically served on `localhost:11434`\n",
"* Run `ollama run <name-of-model>` to start interacting via the command line directly\n",
"\n",
"#### Via an API\n",
"\n",
"Send an `application/json` request to the API endpoint of Ollama to interact.\n",
"\n",
"```bash\n",
"curl http://localhost:11434/api/generate -d '{\n",
" \"model\": \"llama3\",\n",
" \"prompt\":\"Why is the sky blue?\"\n",
"}'\n",
"```\n",
"\n",
"See the Ollama [API documentation](https://github.com/jmorganca/ollama/blob/main/docs/api.md) for all endpoints.\n",
"\n",
"#### Via LangChain\n",
"\n",
"See a typical basic example of using Ollama via the `ChatOllama` chat model in your LangChain application. \n",
"\n",
"View the [API Reference for ChatOllama](https://api.python.langchain.com/en/latest/chat_models/langchain_community.chat_models.ollama.ChatOllama.html#langchain_community.chat_models.ollama.ChatOllama) for more."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Why did the astronaut break up with his girlfriend?\n",
"\n",
"Because he needed space!\n"
]
}
],
"source": [
"# LangChain supports many other chat models. Here, we're using Ollama\n",
"from langchain_community.chat_models import ChatOllama\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"\n",
"# supports many more optional parameters. Hover on your `ChatOllama(...)`\n",
"# class to view the latest available supported parameters\n",
"llm = ChatOllama(model=\"llama3\")\n",
"prompt = ChatPromptTemplate.from_template(\"Tell me a short joke about {topic}\")\n",
"\n",
"# using LangChain Expressive Language chain syntax\n",
"# learn more about the LCEL on\n",
"# /docs/concepts/#langchain-expression-language-lcel\n",
"chain = prompt | llm | StrOutputParser()\n",
"\n",
"# for brevity, response is printed in terminal\n",
"# You can use LangServe to deploy your application for\n",
"# production\n",
"print(chain.invoke({\"topic\": \"Space travel\"}))"
"* View the [Ollama documentation](https://github.com/jmorganca/ollama) for more commands. Run `ollama help` in the terminal to see available commands too.\n"
]
},
{
"cell_type": "markdown",
"id": "72ee0c4b-9764-423a-9dbf-95129e185210",
"metadata": {},
"source": [
"LCEL chains, out of the box, provide extra functionalities, such as streaming of responses, and async support"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Why\n",
" did\n",
" the\n",
" astronaut\n",
" break\n",
" up\n",
" with\n",
" his\n",
" girlfriend\n",
" before\n",
" going\n",
" to\n",
" Mars\n",
"?\n",
"\n",
"\n",
"Because\n",
" he\n",
" needed\n",
" space\n",
"!\n",
"\n"
]
}
],
"source": [
"topic = {\"topic\": \"Space travel\"}\n",
"\n",
"for chunks in chain.stream(topic):\n",
" print(chunks)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For streaming async support, here's an example - all possible via the single chain created above."
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a15d341e-3e26-4ca3-830b-5aab30ed66de",
"metadata": {},
"outputs": [],
"source": [
"topic = {\"topic\": \"Space travel\"}\n",
"\n",
"async for chunks in chain.astream(topic):\n",
" print(chunks)"
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
"cell_type": "markdown",
"id": "0730d6a1-c893-4840-9817-5e5251676d5d",
"metadata": {},
"source": [
"Take a look at the [LangChain Expressive Language (LCEL) Interface](/docs/concepts#interface) for the other available interfaces for use when a chain is created.\n",
"### Installation\n",
"\n",
"## Building from source\n",
"\n",
"For up to date instructions on building from source, check the Ollama documentation on [Building from Source](https://github.com/ollama/ollama?tab=readme-ov-file#building)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Extraction\n",
" \n",
"Use the latest version of Ollama and supply the [`format`](https://github.com/jmorganca/ollama/blob/main/docs/api.md#json-mode) flag. The `format` flag will force the model to produce the response in JSON.\n",
"\n",
"> **Note:** You can also try out the experimental [OllamaFunctions](/docs/integrations/chat/ollama_functions) wrapper for convenience."
"The LangChain Ollama integration lives in the `langchain-ollama` package:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"id": "652d6238-1f87-422a-b135-f5abbb8652fc",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.chat_models import ChatOllama\n",
"%pip install -qU langchain-ollama"
]
},
{
"cell_type": "markdown",
"id": "a38cde65-254d-4219-a441-068766c0d4b5",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"llm = ChatOllama(model=\"llama3\", format=\"json\", temperature=0)"
"Now we can instantiate our model object and generate chat completions:\n",
"\n",
"- TODO: Update model instantiation with relevant params."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "cb09c344-1836-4e0c-acf8-11d13ac1dbae",
"metadata": {},
"outputs": [],
"source": [
"from langchain_ollama import ChatOllama\n",
"\n",
"llm = ChatOllama(\n",
" model=\"llama3\",\n",
" temperature=0,\n",
" # other params...\n",
")"
]
},
{
"cell_type": "markdown",
"id": "2b4f3e15",
"metadata": {},
"source": [
"## Invocation"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"content='{ \"morning\": \"blue\", \"noon\": \"clear blue\", \"afternoon\": \"hazy yellow\", \"evening\": \"orange-red\" }\\n\\n \\n\\n\\n\\n\\n\\n \\n\\n\\n\\n\\n\\n \\n\\n\\n\\n\\n\\n \\n\\n\\n\\n\\n\\n \\n\\n\\n\\n\\n\\n \\n\\n\\n\\n\\n\\n \\n\\n\\n\\n\\n\\n \\n\\n\\n\\n\\n\\n \\n\\n\\n\\n\\n\\n \\n\\n\\n\\n\\n\\n ' id='run-e893700f-e2d0-4df8-ad86-17525dcee318-0'\n"
]
}
],
"source": [
"from langchain_core.messages import HumanMessage\n",
"\n",
"messages = [\n",
" HumanMessage(\n",
" content=\"What color is the sky at different times of the day? Respond using JSON\"\n",
" )\n",
"]\n",
"\n",
"chat_model_response = llm.invoke(messages)\n",
"print(chat_model_response)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Name: John\n",
"Age: 35\n",
"Likes: Pizza\n"
]
}
],
"source": [
"import json\n",
"\n",
"from langchain_community.chat_models import ChatOllama\n",
"from langchain_core.messages import HumanMessage\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"\n",
"json_schema = {\n",
" \"title\": \"Person\",\n",
" \"description\": \"Identifying information about a person.\",\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"name\": {\"title\": \"Name\", \"description\": \"The person's name\", \"type\": \"string\"},\n",
" \"age\": {\"title\": \"Age\", \"description\": \"The person's age\", \"type\": \"integer\"},\n",
" \"fav_food\": {\n",
" \"title\": \"Fav Food\",\n",
" \"description\": \"The person's favorite food\",\n",
" \"type\": \"string\",\n",
" },\n",
" },\n",
" \"required\": [\"name\", \"age\"],\n",
"}\n",
"\n",
"llm = ChatOllama(model=\"llama2\")\n",
"\n",
"messages = [\n",
" HumanMessage(\n",
" content=\"Please tell me about a person using the following JSON schema:\"\n",
" ),\n",
" HumanMessage(content=\"{dumps}\"),\n",
" HumanMessage(\n",
" content=\"Now, considering the schema, tell me about a person named John who is 35 years old and loves pizza.\"\n",
" ),\n",
"]\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(messages)\n",
"dumps = json.dumps(json_schema, indent=2)\n",
"\n",
"chain = prompt | llm | StrOutputParser()\n",
"\n",
"print(chain.invoke({\"dumps\": dumps}))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Multi-modal\n",
"\n",
"Ollama has support for multi-modal LLMs, such as [bakllava](https://ollama.ai/library/bakllava) and [llava](https://ollama.ai/library/llava).\n",
"\n",
"Browse the full set of versions for models with `tags`, such as [Llava](https://ollama.ai/library/llava/tags).\n",
"\n",
"Download the desired LLM via `ollama pull bakllava`\n",
"\n",
"Be sure to update Ollama so that you have the most recent version to support multi-modal.\n",
"\n",
"Check out the typical example of how to use ChatOllama multi-modal support below:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "62e0dbc3",
"metadata": {
"scrolled": true
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
"AIMessage(content='Je adore le programmation.\\n\\n(Note: \"programmation\" is the feminine form of the noun in French, but if you want to use the masculine form, it would be \"le programme\" instead.)' response_metadata={'model': 'llama3', 'created_at': '2024-07-04T04:20:28.138164Z', 'message': {'role': 'assistant', 'content': ''}, 'done_reason': 'stop', 'done': True, 'total_duration': 1943337750, 'load_duration': 1128875, 'prompt_eval_count': 33, 'prompt_eval_duration': 322813000, 'eval_count': 43, 'eval_duration': 1618213000} id='run-ed8c17ab-7fc2-4c90-a88a-f6273b49bc78-0')\n"
]
}
],
"source": [
"!pip install --upgrade --quiet pillow"
"from langchain_core.messages import AIMessage\n",
"\n",
"messages = [\n",
" (\n",
" \"system\",\n",
" \"You are a helpful assistant that translates English to French. Translate the user sentence.\",\n",
" ),\n",
" (\"human\", \"I love programming.\"),\n",
"]\n",
"ai_msg = llm.invoke(messages)\n",
"ai_msg"
]
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 8,
"id": "d86145b3-bfef-46e8-b227-4dda5c9c2705",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Je adore le programmation.\n",
"\n",
"(Note: \"programmation\" is the feminine form of the noun in French, but if you want to use the masculine form, it would be \"le programme\" instead.)\n"
]
}
],
"source": [
"print(ai_msg.content)"
]
},
{
"cell_type": "markdown",
"id": "18e2bfc0-7e78-4528-a73f-499ac150dca8",
"metadata": {},
"source": [
"## Chaining\n",
"\n",
"We can [chain](/docs/how_to/sequence/) our model with a prompt template like so:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "e197d1d7-a070-4c96-9f8a-a0e86d046e0b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Ich liebe Programmieren!\\n\\n(Note: \"Ich liebe\" means \"I love\", \"Programmieren\" is the verb for \"programming\")', response_metadata={'model': 'llama3', 'created_at': '2024-07-04T04:22:33.864132Z', 'message': {'role': 'assistant', 'content': ''}, 'done_reason': 'stop', 'done': True, 'total_duration': 1310800083, 'load_duration': 1782000, 'prompt_eval_count': 16, 'prompt_eval_duration': 250199000, 'eval_count': 29, 'eval_duration': 1057192000}, id='run-cbadbe59-2de2-4ec0-a18a-b3220226c3d2-0')"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.prompts import ChatPromptTemplate\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"You are a helpful assistant that translates {input_language} to {output_language}.\",\n",
" ),\n",
" (\"human\", \"{input}\"),\n",
" ]\n",
")\n",
"\n",
"chain = prompt | llm\n",
"chain.invoke(\n",
" {\n",
" \"input_language\": \"English\",\n",
" \"output_language\": \"German\",\n",
" \"input\": \"I love programming.\",\n",
" }\n",
")"
]
},
{
"cell_type": "markdown",
"id": "4c5e0197",
"metadata": {},
"source": [
"## Multi-modal\n",
"\n",
"Ollama has support for multi-modal LLMs, such as [bakllava](https://ollama.com/library/bakllava) and [llava](https://ollama.com/library/llava).\n",
"\n",
" ollama pull bakllava\n",
"\n",
"Be sure to update Ollama so that you have the most recent version to support multi-modal."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "36c9b1c2",
"metadata": {},
"outputs": [
{
@@ -399,7 +308,8 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 12,
"id": "32b3ba7b",
"metadata": {},
"outputs": [
{
@@ -411,8 +321,8 @@
}
],
"source": [
"from langchain_community.chat_models import ChatOllama\n",
"from langchain_core.messages import HumanMessage\n",
"from langchain_ollama import ChatOllama\n",
"\n",
"llm = ChatOllama(model=\"bakllava\", temperature=0)\n",
"\n",
@@ -449,20 +359,12 @@
},
{
"cell_type": "markdown",
"id": "3a5bb5ca-c3ae-4a58-be67-2cd18574b9a3",
"metadata": {},
"source": [
"## Concurrency Features\n",
"## API reference\n",
"\n",
"Ollama supports concurrency inference for a single model, and or loading multiple models simulatenously (at least [version 0.1.33](https://github.com/ollama/ollama/releases)).\n",
"\n",
"Start the Ollama server with:\n",
"\n",
"* `OLLAMA_NUM_PARALLEL`: Handle multiple requests simultaneously for a single model\n",
"* `OLLAMA_MAX_LOADED_MODELS`: Load multiple models simultaneously\n",
"\n",
"Example: `OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve`\n",
"\n",
"Learn more about configuring Ollama server in [the official guide](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server)."
"For detailed documentation of all ChatOllama features and configurations head to the API reference: https://api.python.langchain.com/en/latest/chat_models/langchain_ollama.chat_models.ChatOllama.html"
]
}
],
@@ -482,9 +384,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
"nbformat_minor": 5
}

View File

@@ -1,10 +1,21 @@
{
"cells": [
{
"cell_type": "markdown",
"cell_type": "raw",
"id": "67db2992",
"metadata": {},
"source": [
"# Ollama\n",
"---\n",
"sidebar_label: Ollama\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "9597802c",
"metadata": {},
"source": [
"# OllamaLLM\n",
"\n",
":::caution\n",
"You are currently on a page documenting the use of Ollama models as [text completion models](/docs/concepts/#llms). Many popular Ollama models are [chat completion models](/docs/concepts/#chat-models).\n",
@@ -12,21 +23,35 @@
"You may be looking for [this page instead](/docs/integrations/chat/ollama/).\n",
":::\n",
"\n",
"[Ollama](https://ollama.ai/) allows you to run open-source large language models, such as Llama 2, locally.\n",
"\n",
"Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile. \n",
"\n",
"It optimizes setup and configuration details, including GPU usage.\n",
"\n",
"For a complete list of supported models and model variants, see the [Ollama model library](https://github.com/ollama/ollama#model-library).\n",
"This page goes over how to use LangChain to interact with `Ollama` models.\n",
"\n",
"## Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "59c710c4",
"metadata": {},
"outputs": [],
"source": [
"# install package\n",
"%pip install -U langchain-ollama"
]
},
{
"cell_type": "markdown",
"id": "0ee90032",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"First, follow [these instructions](https://github.com/ollama/ollama) to set up and run a local Ollama instance:\n",
"First, follow [these instructions](https://github.com/jmorganca/ollama) to set up and run a local Ollama instance:\n",
"\n",
"* [Download](https://ollama.ai/download) and install Ollama onto the available supported platforms (including Windows Subsystem for Linux)\n",
"* Fetch available LLM model via `ollama pull <name-of-model>`\n",
" * View a list of available models via the [model library](https://ollama.ai/library) and pull to use locally with the command `ollama pull llama3`\n",
" * View a list of available models via the [model library](https://ollama.ai/library)\n",
" * e.g., `ollama pull llama3`\n",
"* This will download the default tagged version of the model. Typically, the default points to the latest, smallest sized-parameter model.\n",
"\n",
 On Mac, the models will be download">
"> On Mac, the models will be downloaded to `~/.ollama/models`\n",
@@ -34,194 +59,67 @@
"> On Linux (or WSL), the models will be stored at `/usr/share/ollama/.ollama/models`\n",
"\n",
"* Specify the exact version of the model of interest as such `ollama pull vicuna:13b-v1.5-16k-q4_0` (View the [various tags for the `Vicuna`](https://ollama.ai/library/vicuna/tags) model in this instance)\n",
"* To view all pulled models on your local instance, use `ollama list`\n",
"* To view all pulled models, use `ollama list`\n",
"* To chat directly with a model from the command line, use `ollama run <name-of-model>`\n",
"* View the [Ollama documentation](https://github.com/ollama/ollama) for more commands. \n",
"* Run `ollama help` in the terminal to see available commands too.\n",
"* View the [Ollama documentation](https://github.com/jmorganca/ollama) for more commands. Run `ollama help` in the terminal to see available commands too.\n",
"\n",
"## Usage\n",
"\n",
"You can see a full list of supported parameters on the [API reference page](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.ollama.Ollama.html).\n",
"\n",
"If you are using a LLaMA `chat` model (e.g., `ollama pull llama3`) then you can use the `ChatOllama` [interface](https://python.langchain.com/v0.2/docs/integrations/chat/ollama/).\n",
"\n",
"This includes [special tokens](https://ollama.com/library/llama3) for system message and user input.\n",
"\n",
"## Interacting with Models \n",
"\n",
"Here are a few ways to interact with pulled local models\n",
"\n",
"#### In the terminal:\n",
"\n",
"* All of your local models are automatically served on `localhost:11434`\n",
"* Run `ollama run <name-of-model>` to start interacting via the command line directly\n",
"\n",
"#### Via the API\n",
"\n",
"Send an `application/json` request to the API endpoint of Ollama to interact.\n",
"\n",
"```bash\n",
"curl http://localhost:11434/api/generate -d '{\n",
" \"model\": \"llama3\",\n",
" \"prompt\":\"Why is the sky blue?\"\n",
"}'\n",
"```\n",
"\n",
"See the Ollama [API documentation](https://github.com/ollama/ollama/blob/main/docs/api.md) for all endpoints.\n",
"\n",
"#### via LangChain\n",
"\n",
"See a typical basic example of using [Ollama chat model](https://python.langchain.com/v0.2/docs/integrations/chat/ollama/) in your LangChain application."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install langchain-community"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"Here's one:\\n\\nWhy don't scientists trust atoms?\\n\\nBecause they make up everything!\\n\\nHope that made you smile! Do you want to hear another one?\""
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_community.llms import Ollama\n",
"\n",
"llm = Ollama(\n",
" model=\"llama3\"\n",
") # assuming you have Ollama installed and have llama3 model pulled with `ollama pull llama3 `\n",
"\n",
"llm.invoke(\"Tell me a joke\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To stream tokens, use the `.stream(...)` method:"
"## Usage"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"id": "035dea0f",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"S\n",
"ure\n",
",\n",
" here\n",
"'\n",
"s\n",
" one\n",
":\n",
"\n",
"\n",
"\n",
"\n",
"Why\n",
" don\n",
"'\n",
"t\n",
" scient\n",
"ists\n",
" trust\n",
" atoms\n",
"?\n",
"\n",
"\n",
"B\n",
"ecause\n",
" they\n",
" make\n",
" up\n",
" everything\n",
"!\n",
"\n",
"\n",
"\n",
"\n",
"I\n",
" hope\n",
" you\n",
" found\n",
" that\n",
" am\n",
"using\n",
"!\n",
" Do\n",
" you\n",
" want\n",
" to\n",
" hear\n",
" another\n",
" one\n",
"?\n",
"\n"
]
"data": {
"text/plain": [
"'A great start!\\n\\nLangChain is a type of AI model that uses language processing techniques to generate human-like text based on input prompts or chains of reasoning. In other words, it can have a conversation with humans, understanding the context and responding accordingly.\\n\\nHere\\'s a possible breakdown:\\n\\n* \"Lang\" likely refers to its focus on natural language processing (NLP) and linguistic analysis.\\n* \"Chain\" suggests that LangChain is designed to generate text in response to a series of connected ideas or prompts, rather than simply generating random text.\\n\\nSo, what do you think LangChain\\'s capabilities might be?'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = \"Tell me a joke\"\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_ollama.llms import OllamaLLM\n",
"\n",
"for chunks in llm.stream(query):\n",
" print(chunks)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To learn more about the LangChain Expressive Language and the available methods on an LLM, see the [LCEL Interface](/docs/concepts#interface)"
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"prompt = ChatPromptTemplate.from_template(template)\n",
"\n",
"model = OllamaLLM(model=\"llama3\")\n",
"\n",
"chain = prompt | model\n",
"\n",
"chain.invoke({\"question\": \"What is LangChain?\"})"
]
},
{
"cell_type": "markdown",
"id": "e2d85456",
"metadata": {},
"source": [
"## Multi-modal\n",
"\n",
"Ollama has support for multi-modal LLMs, such as [bakllava](https://ollama.ai/library/bakllava) and [llava](https://ollama.ai/library/llava).\n",
"Ollama has support for multi-modal LLMs, such as [bakllava](https://ollama.com/library/bakllava) and [llava](https://ollama.com/library/llava).\n",
"\n",
"`ollama pull bakllava`\n",
" ollama pull bakllava\n",
"\n",
"Be sure to update Ollama so that you have the most recent version to support multi-modal."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.llms import Ollama\n",
"\n",
"bakllava = Ollama(model=\"bakllava\")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "4043e202",
"metadata": {},
"outputs": [
{
@@ -279,7 +177,8 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 4,
"id": "79aaf863",
"metadata": {},
"outputs": [
{
@@ -288,38 +187,24 @@
"'90%'"
]
},
"execution_count": 8,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm_with_image_context = bakllava.bind(images=[image_b64])\n",
"from langchain_ollama import OllamaLLM\n",
"\n",
"llm = OllamaLLM(model=\"bakllava\")\n",
"\n",
"llm_with_image_context = llm.bind(images=[image_b64])\n",
"llm_with_image_context.invoke(\"What is the dollar based gross retention rate:\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Concurrency Features\n",
"\n",
"Ollama supports concurrency inference for a single model, and or loading multiple models simulatenously (at least [version 0.1.33](https://github.com/ollama/ollama/releases)).\n",
"\n",
"Start the Ollama server with:\n",
"\n",
"* `OLLAMA_NUM_PARALLEL`: Handle multiple requests simultaneously for a single model\n",
"* `OLLAMA_MAX_LOADED_MODELS`: Load multiple models simultaneously\n",
"\n",
"Example: `OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve`\n",
"\n",
"Learn more about configuring Ollama server in [the official guide](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "Python 3.11.1 64-bit",
"language": "python",
"name": "python3"
},
@@ -333,9 +218,14 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
"version": "3.12.3"
},
"vscode": {
"interpreter": {
"hash": "e971737741ff4ec9aff7dc6155a1060a59a8a6d52c757dbbe66bf8ee389494b1"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
"nbformat_minor": 5
}

File diff suppressed because it is too large