From c64b0a30951391d6f3865e3e8f93d6dd02e0b2b1 Mon Sep 17 00:00:00 2001 From: KhoPhi Date: Thu, 30 May 2024 15:06:45 +0000 Subject: [PATCH] Docs: Ollama (LLM, Chat Model & Text Embedding) (#22321) - [x] Docs Update: Ollama - llm/ollama - Switched to using llama3 as the model, with reference to templating and prompting - Added concurrency notes to llm/ollama docs - chat_models/ollama - Added concurrency notes to chat_models/ollama docs - text_embedding/ollama - Included examples for specific embedding models from Ollama --- docs/docs/integrations/chat/ollama.ipynb | 34 ++++++-- docs/docs/integrations/llms/ollama.ipynb | 57 ++++++++++---- .../integrations/text_embedding/ollama.ipynb | 78 +++++++++---------- 3 files changed, 106 insertions(+), 63 deletions(-) diff --git a/docs/docs/integrations/chat/ollama.ipynb b/docs/docs/integrations/chat/ollama.ipynb index 22a87ebfb76..d8e3b6ca4aa 100644 --- a/docs/docs/integrations/chat/ollama.ipynb +++ b/docs/docs/integrations/chat/ollama.ipynb @@ -54,12 +54,12 @@ "\n", "Here are a few ways to interact with pulled local models\n", "\n", - "#### directly in the terminal:\n", + "#### In the terminal:\n", "\n", "* All of your local models are automatically served on `localhost:11434`\n", "* Run `ollama run ` to start interacting via the command line directly\n", "\n", - "### via an API\n", + "#### Via an API\n", "\n", "Send an `application/json` request to the API endpoint of Ollama to interact.\n", "\n", @@ -72,9 +72,11 @@ "\n", "See the Ollama [API documentation](https://github.com/jmorganca/ollama/blob/main/docs/api.md) for all endpoints.\n", "\n", - "#### via LangChain\n", + "#### Via LangChain\n", "\n", - "See a typical basic example of using Ollama via the `ChatOllama` chat model in your LangChain application." + "See a typical basic example of using Ollama via the `ChatOllama` chat model in your LangChain application. \n", + "\n", + "View the [API Reference for ChatOllama](https://api.python.langchain.com/en/latest/chat_models/langchain_community.chat_models.ollama.ChatOllama.html#langchain_community.chat_models.ollama.ChatOllama) for more."
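A minimal sketch of that `ChatOllama` usage, assuming the Ollama server is already running on the default `localhost:11434` and the `llama3` model has been pulled with `ollama pull llama3`:

```python
# Minimal ChatOllama sketch; assumes a local Ollama server on the default
# localhost:11434 and that `llama3` has been pulled (`ollama pull llama3`).
from langchain_community.chat_models import ChatOllama

chat = ChatOllama(model="llama3")
message = chat.invoke("Summarize what Ollama does in one sentence.")
print(message.content)  # content of the AIMessage returned by the local model
```

The same model object can be dropped into an LCEL chain, as the `prompt | llm | StrOutputParser()` cell below does.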
] }, { @@ -105,7 +107,7 @@ "\n", "# using LangChain Expressive Language chain syntax\n", "# learn more about the LCEL on\n", - "# /docs/expression_language/why\n", + "# /docs/concepts/#langchain-expression-language-lcel\n", "chain = prompt | llm | StrOutputParser()\n", "\n", "# for brevity, response is printed in terminal\n", @@ -189,7 +191,7 @@ "\n", "## Building from source\n", "\n", - "For up to date instructions on building from source, check the Ollama documentation on [Building from Source](https://github.com/jmorganca/ollama?tab=readme-ov-file#building)" + "For up-to-date instructions on building from source, check the Ollama documentation on [Building from Source](https://github.com/ollama/ollama?tab=readme-ov-file#building)" ] }, { @@ -333,7 +335,7 @@ } ], "source": [ - "pip install --upgrade --quiet pillow" + "!pip install --upgrade --quiet pillow" ] }, { @@ -444,6 +446,24 @@ "\n", "print(query_chain)" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Concurrency Features\n", + "\n", + "Ollama supports concurrent inference for a single model, as well as loading multiple models simultaneously (since at least [version 0.1.33](https://github.com/ollama/ollama/releases)).\n", + "\n", + "Start the Ollama server with the following environment variables set:\n", + "\n", + "* `OLLAMA_NUM_PARALLEL`: Handle multiple requests simultaneously for a single model\n", + "* `OLLAMA_MAX_LOADED_MODELS`: Load multiple models simultaneously\n", + "\n", + "Example: `OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve`\n", + "\n", + "Learn more about configuring the Ollama server in [the official guide](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server)." + ] } ], "metadata": { diff --git a/docs/docs/integrations/llms/ollama.ipynb b/docs/docs/integrations/llms/ollama.ipynb index 7c6be1a28c8..e80ce6e4b77 100644 --- a/docs/docs/integrations/llms/ollama.ipynb +++ b/docs/docs/integrations/llms/ollama.ipynb @@ -12,16 +12,15 @@ "\n", "It optimizes setup and configuration details, including GPU usage.\n", "\n", - "For a complete list of supported models and model variants, see the [Ollama model library](https://github.com/jmorganca/ollama#model-library).\n", + "For a complete list of supported models and model variants, see the [Ollama model library](https://github.com/ollama/ollama#model-library).\n", "\n", "## Setup\n", "\n", - "First, follow [these instructions](https://github.com/jmorganca/ollama) to set up and run a local Ollama instance:\n", + "First, follow [these instructions](https://github.com/ollama/ollama) to set up and run a local Ollama instance:\n", "\n", "* [Download](https://ollama.ai/download) and install Ollama onto the available supported platforms (including Windows Subsystem for Linux)\n", "* Fetch available LLM model via `ollama pull `\n", - " * View a list of available models via the [model library](https://ollama.ai/library)\n", - " * e.g., `ollama pull llama3`\n", + " * View a list of available models via the [model library](https://ollama.ai/library) and pull one to use locally with the command `ollama pull llama3`\n", "* This will download the default tagged version of the model. 
Typically, the default points to the latest, smallest sized-parameter model.\n", "\n", "> On Mac, the models will be download to `~/.ollama/models`\n", @@ -29,28 +28,29 @@ "> On Linux (or WSL), the models will be stored at `/usr/share/ollama/.ollama/models`\n", "\n", "* Specify the exact version of the model of interest as such `ollama pull vicuna:13b-v1.5-16k-q4_0` (View the [various tags for the `Vicuna`](https://ollama.ai/library/vicuna/tags) model in this instance)\n", - "* To view all pulled models, use `ollama list`\n", + "* To view all pulled models on your local instance, use `ollama list`\n", "* To chat directly with a model from the command line, use `ollama run `\n", - "* View the [Ollama documentation](https://github.com/jmorganca/ollama) for more commands. Run `ollama help` in the terminal to see available commands too.\n", + "* View the [Ollama documentation](https://github.com/ollama/ollama) for more commands. \n", + "* Run `ollama help` in the terminal to see available commands too.\n", "\n", "## Usage\n", "\n", - "You can see a full list of supported parameters on the [API reference page](https://api.python.langchain.com/en/latest/llms/langchain.llms.ollama.Ollama.html).\n", + "You can see a full list of supported parameters on the [API reference page](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.ollama.Ollama.html).\n", "\n", - "If you are using a LLaMA `chat` model (e.g., `ollama pull llama3`) then you can use the `ChatOllama` interface.\n", + "If you are using a LLaMA `chat` model (e.g., `ollama pull llama3`) then you can use the `ChatOllama` [interface](https://python.langchain.com/v0.2/docs/integrations/chat/ollama/).\n", "\n", - "This includes [special tokens](https://huggingface.co/blog/llama2#how-to-prompt-llama-2) for system message and user input.\n", + "This includes [special tokens](https://ollama.com/library/llama3) for system message and user input.\n", "\n", "## Interacting with Models \n", "\n", "Here are a few ways to interact with pulled local models\n", "\n", - "#### directly in the terminal:\n", + "#### In the terminal:\n", "\n", "* All of your local models are automatically served on `localhost:11434`\n", "* Run `ollama run ` to start interacting via the command line directly\n", "\n", - "### via an API\n", + "#### Via the API\n", "\n", "Send an `application/json` request to the API endpoint of Ollama to interact.\n", "\n", @@ -61,11 +61,20 @@ "}'\n", "```\n", "\n", - "See the Ollama [API documentation](https://github.com/jmorganca/ollama/blob/main/docs/api.md) for all endpoints.\n", + "See the Ollama [API documentation](https://github.com/ollama/ollama/blob/main/docs/api.md) for all endpoints.\n", "\n", "#### via LangChain\n", "\n", - "See a typical basic example of using Ollama chat model in your LangChain application." + "See a typical basic example of using [Ollama chat model](https://python.langchain.com/v0.2/docs/integrations/chat/ollama/) in your LangChain application." 
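The API request shown above can also be sent from Python rather than the command line. A rough sketch, assuming the `requests` package is installed, the server is on the default `localhost:11434`, and `llama3` has been pulled:

```python
# Sketch of calling Ollama's generate endpoint directly; assumes the server is
# running on the default localhost:11434 and `llama3` has been pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Tell me a joke", "stream": False},
)
resp.raise_for_status()
print(resp.json()["response"])  # the full, non-streamed completion text
```

With `"stream": False` the endpoint returns a single JSON object; omitting it yields newline-delimited streaming chunks instead.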
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install langchain-community" ] }, { @@ -87,7 +96,9 @@ "source": [ "from langchain_community.llms import Ollama\n", "\n", - "llm = Ollama(model=\"llama3\")\n", + "llm = Ollama(\n", + " model=\"llama3\"\n", + ") # assuming you have Ollama installed and have pulled the llama3 model with `ollama pull llama3`\n", "\n", "llm.invoke(\"Tell me a joke\")" ] @@ -280,6 +291,24 @@ "llm_with_image_context = bakllava.bind(images=[image_b64])\n", "llm_with_image_context.invoke(\"What is the dollar based gross retention rate:\")" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Concurrency Features\n", + "\n", + "Ollama supports concurrent inference for a single model, as well as loading multiple models simultaneously (since at least [version 0.1.33](https://github.com/ollama/ollama/releases)).\n", + "\n", + "Start the Ollama server with the following environment variables set:\n", + "\n", + "* `OLLAMA_NUM_PARALLEL`: Handle multiple requests simultaneously for a single model\n", + "* `OLLAMA_MAX_LOADED_MODELS`: Load multiple models simultaneously\n", + "\n", + "Example: `OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve`\n", + "\n", + "Learn more about configuring the Ollama server in [the official guide](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server)." + ] } ], "metadata": { diff --git a/docs/docs/integrations/text_embedding/ollama.ipynb b/docs/docs/integrations/text_embedding/ollama.ipynb index 915f9edab02..c7af848287d 100644 --- a/docs/docs/integrations/text_embedding/ollama.ipynb +++ b/docs/docs/integrations/text_embedding/ollama.ipynb @@ -7,7 +7,27 @@ "source": [ "# Ollama\n", "\n", - "Let's load the Ollama Embeddings class." + "\"Ollama supports embedding models, making it possible to build retrieval augmented generation (RAG) applications that combine text prompts with existing documents or other data.\" Learn more in the [Ollama Embeddings](https://ollama.com/blog/embedding-models) introduction blog post.\n", + "\n", + "To use Ollama Embeddings, first install the [LangChain Community](https://pypi.org/project/langchain-community/) package:" ] }, { "cell_type": "code", "execution_count": null, "id": "854d6a2e", "metadata": {}, "outputs": [], "source": [ "!pip install langchain-community" ] }, { "cell_type": "markdown", "id": "54fbb4cd", "metadata": {}, "source": [ "Load the Ollama Embeddings class:" ] }, { @@ -17,26 +37,12 @@ "metadata": {}, "outputs": [], "source": [ - "from langchain_community.embeddings import OllamaEmbeddings" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "2c66e5da", - "metadata": {}, - "outputs": [], - "source": [ - "embeddings = OllamaEmbeddings()" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "01370375", - "metadata": {}, - "outputs": [], - "source": [ + "from langchain_community.embeddings import OllamaEmbeddings\n", + "\n", + "embeddings = (\n", + " OllamaEmbeddings()\n", + ") # by default, uses llama2. Run `ollama pull llama2` to pull down the model\n", + "\n", + "text = \"This is a test document.\"" ] }, @@ -105,7 +111,13 @@ "id": "bb61bbeb", "metadata": {}, "source": [ - "Let's load the Ollama Embeddings class with smaller model (e.g. llama:7b). 
Note: See other supported models [https://ollama.ai/library](https://ollama.ai/library)" + "### Embedding Models\n", + "\n", + "Ollama offers dedicated embedding models that are lightweight, with the smallest around 25 MB in size. See some of the available [embedding models from Ollama](https://ollama.com/blog/embedding-models).\n", + "\n", + "Let's load the Ollama Embeddings class with a dedicated embedding model (e.g. `mxbai-embed-large`). \n", + "\n", + "> Note: See other supported models at [https://ollama.ai/library](https://ollama.ai/library)" ] }, { @@ -115,26 +127,8 @@ "metadata": {}, "outputs": [], "source": [ - "embeddings = OllamaEmbeddings(model=\"llama2:7b\")" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "14aefb64", - "metadata": {}, - "outputs": [], - "source": [ - "text = \"This is a test document.\"" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "3c39ed33", - "metadata": {}, - "outputs": [], - "source": [ + "embeddings = OllamaEmbeddings(model=\"mxbai-embed-large\")\n", + "text = \"This is a test document.\"\n", + "query_result = embeddings.embed_query(text)" ] },
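To round out the `embed_query` example above, here is a small sketch that ranks a few documents against a query by cosine similarity, assuming `numpy` is installed and the `mxbai-embed-large` model has been pulled with `ollama pull mxbai-embed-large`:

```python
# Rank documents against a query with OllamaEmbeddings; assumes numpy is
# installed and mxbai-embed-large has been pulled (`ollama pull mxbai-embed-large`).
import numpy as np
from langchain_community.embeddings import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="mxbai-embed-large")

docs = [
    "Ollama runs large language models on your own machine.",
    "Paris is the capital of France.",
]
query = "How can I run an LLM locally?"

doc_vecs = np.array(embeddings.embed_documents(docs))  # one vector per document
query_vec = np.array(embeddings.embed_query(query))  # single query vector

# cosine similarity between the query and each document
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
for doc, score in sorted(zip(docs, scores), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {doc}")
```

`embed_documents` embeds a batch of texts in one call, while `embed_query` embeds a single string; both return plain Python lists of floats.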