From c64b0a30951391d6f3865e3e8f93d6dd02e0b2b1 Mon Sep 17 00:00:00 2001 From: KhoPhi Date: Thu, 30 May 2024 15:06:45 +0000 Subject: [PATCH] Docs: Ollama (LLM, Chat Model & Text Embedding) (#22321) - [x] Docs Update: Ollama - llm/ollama - Switched to using llama3 as the model, with reference to templating and prompting - Added concurrency notes to llm/ollama docs - chat_models/ollama - Added concurrency notes to chat_models/ollama docs - text_embedding/ollama - Included examples for specific embedding models from Ollama --- docs/docs/integrations/chat/ollama.ipynb | 34 ++++++-- docs/docs/integrations/llms/ollama.ipynb | 57 ++++++++++---- .../integrations/text_embedding/ollama.ipynb | 78 +++++++++---------- 3 files changed, 106 insertions(+), 63 deletions(-) diff --git a/docs/docs/integrations/chat/ollama.ipynb b/docs/docs/integrations/chat/ollama.ipynb index 22a87ebfb76..d8e3b6ca4aa 100644 --- a/docs/docs/integrations/chat/ollama.ipynb +++ b/docs/docs/integrations/chat/ollama.ipynb @@ -54,12 +54,12 @@ "\n", "Here are a few ways to interact with pulled local models\n", "\n", - "#### directly in the terminal:\n", + "#### In the terminal:\n", "\n", "* All of your local models are automatically served on `localhost:11434`\n", "* Run `ollama run ` to start interacting via the command line directly\n", "\n", - "### via an API\n", + "#### Via an API\n", "\n", "Send an `application/json` request to the API endpoint of Ollama to interact.\n", "\n", @@ -72,9 +72,11 @@ "\n", "See the Ollama [API documentation](https://github.com/jmorganca/ollama/blob/main/docs/api.md) for all endpoints.\n", "\n", - "#### via LangChain\n", + "#### Via LangChain\n", "\n", - "See a typical basic example of using Ollama via the `ChatOllama` chat model in your LangChain application." + "See a typical basic example of using Ollama via the `ChatOllama` chat model in your LangChain application. \n", + "\n", + "View the [API Reference for ChatOllama](https://api.python.langchain.com/en/latest/chat_models/langchain_community.chat_models.ollama.ChatOllama.html#langchain_community.chat_models.ollama.ChatOllama) for more."
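A minimal sketch of that `ChatOllama` usage, assuming the Ollama server is already running on the default `localhost:11434` and the `llama3` model has been pulled with `ollama pull llama3`:

```python
# Minimal ChatOllama sketch; assumes a local Ollama server on the default
# localhost:11434 and that `llama3` has been pulled (`ollama pull llama3`).
from langchain_community.chat_models import ChatOllama

chat = ChatOllama(model="llama3")
message = chat.invoke("Summarize what Ollama does in one sentence.")
print(message.content)  # content of the AIMessage returned by the local model
```

The same model object can be dropped into an LCEL chain, as the `prompt | llm | StrOutputParser()` cell below does.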
] }, { @@ -105,7 +107,7 @@ "\n", "# using LangChain Expressive Language chain syntax\n", "# learn more about the LCEL on\n", - "# /docs/expression_language/why\n", + "# /docs/concepts/#langchain-expression-language-lcel\n", "chain = prompt | llm | StrOutputParser()\n", "\n", "# for brevity, response is printed in terminal\n", @@ -189,7 +191,7 @@ "\n", "## Building from source\n", "\n", - "For up to date instructions on building from source, check the Ollama documentation on [Building from Source](https://github.com/jmorganca/ollama?tab=readme-ov-file#building)" + "For up-to-date instructions on building from source, check the Ollama documentation on [Building from Source](https://github.com/ollama/ollama?tab=readme-ov-file#building)" ] }, { @@ -333,7 +335,7 @@ } ], "source": [ - "pip install --upgrade --quiet pillow" + "!pip install --upgrade --quiet pillow" ] }, { @@ -444,6 +446,24 @@ "\n", "print(query_chain)" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Concurrency Features\n", + "\n", + "Ollama supports concurrent inference for a single model, as well as loading multiple models simultaneously (since at least [version 0.1.33](https://github.com/ollama/ollama/releases)).\n", + "\n", + "Start the Ollama server with the following environment variables set:\n", + "\n", + "* `OLLAMA_NUM_PARALLEL`: Handle multiple requests simultaneously for a single model\n", + "* `OLLAMA_MAX_LOADED_MODELS`: Load multiple models simultaneously\n", + "\n", + "Example: `OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve`\n", + "\n", + "Learn more about configuring the Ollama server in [the official guide](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server)." + ] } ], "metadata": { diff --git a/docs/docs/integrations/llms/ollama.ipynb b/docs/docs/integrations/llms/ollama.ipynb index 7c6be1a28c8..e80ce6e4b77 100644 --- a/docs/docs/integrations/llms/ollama.ipynb +++ b/docs/docs/integrations/llms/ollama.ipynb @@ -12,16 +12,15 @@ "\n", "It optimizes setup and configuration details, including GPU usage.\n", "\n", - "For a complete list of supported models and model variants, see the [Ollama model library](https://github.com/jmorganca/ollama#model-library).\n", + "For a complete list of supported models and model variants, see the [Ollama model library](https://github.com/ollama/ollama#model-library).\n", "\n", "## Setup\n", "\n", - "First, follow [these instructions](https://github.com/jmorganca/ollama) to set up and run a local Ollama instance:\n", + "First, follow [these instructions](https://github.com/ollama/ollama) to set up and run a local Ollama instance:\n", "\n", "* [Download](https://ollama.ai/download) and install Ollama onto the available supported platforms (including Windows Subsystem for Linux)\n", "* Fetch available LLM model via `ollama pull `\n", - " * View a list of available models via the [model library](https://ollama.ai/library)\n", - " * e.g., `ollama pull llama3`\n", + " * View a list of available models via the [model library](https://ollama.ai/library) and pull one to use locally with the command `ollama pull llama3`\n", "* This will download the default tagged version of the model. 
Typically, the default points to the latest, smallest sized-parameter model.\n", "\n", "> On Mac, the models will be download to `~/.ollama/models`\n", @@ -29,28 +28,29 @@ "> On Linux (or WSL), the models will be stored at `/usr/share/ollama/.ollama/models`\n", "\n", "* Specify the exact version of the model of interest as such `ollama pull vicuna:13b-v1.5-16k-q4_0` (View the [various tags for the `Vicuna`](https://ollama.ai/library/vicuna/tags) model in this instance)\n", - "* To view all pulled models, use `ollama list`\n", + "* To view all pulled models on your local instance, use `ollama list`\n", "* To chat directly with a model from the command line, use `ollama run `\n", - "* View the [Ollama documentation](https://github.com/jmorganca/ollama) for more commands. Run `ollama help` in the terminal to see available commands too.\n", + "* View the [Ollama documentation](https://github.com/ollama/ollama) for more commands. \n", + "* Run `ollama help` in the terminal to see available commands too.\n", "\n", "## Usage\n", "\n", - "You can see a full list of supported parameters on the [API reference page](https://api.python.langchain.com/en/latest/llms/langchain.llms.ollama.Ollama.html).\n", + "You can see a full list of supported parameters on the [API reference page](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.ollama.Ollama.html).\n", "\n", - "If you are using a LLaMA `chat` model (e.g., `ollama pull llama3`) then you can use the `ChatOllama` interface.\n", + "If you are using a LLaMA `chat` model (e.g., `ollama pull llama3`) then you can use the `ChatOllama` [interface](https://python.langchain.com/v0.2/docs/integrations/chat/ollama/).\n", "\n", - "This includes [special tokens](https://huggingface.co/blog/llama2#how-to-prompt-llama-2) for system message and user input.\n", + "This includes [special tokens](https://ollama.com/library/llama3) for system message and user input.\n", "\n", "## Interacting with Models \n", "\n", "Here are a few ways to interact with pulled local models\n", "\n", - "#### directly in the terminal:\n", + "#### In the terminal:\n", "\n", "* All of your local models are automatically served on `localhost:11434`\n", "* Run `ollama run ` to start interacting via the command line directly\n", "\n", - "### via an API\n", + "#### Via the API\n", "\n", "Send an `application/json` request to the API endpoint of Ollama to interact.\n", "\n", @@ -61,11 +61,20 @@ "}'\n", "```\n", "\n", - "See the Ollama [API documentation](https://github.com/jmorganca/ollama/blob/main/docs/api.md) for all endpoints.\n", + "See the Ollama [API documentation](https://github.com/ollama/ollama/blob/main/docs/api.md) for all endpoints.\n", "\n", "#### via LangChain\n", "\n", - "See a typical basic example of using Ollama chat model in your LangChain application." + "See a typical basic example of using [Ollama chat model](https://python.langchain.com/v0.2/docs/integrations/chat/ollama/) in your LangChain application." 
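The API request shown above can also be sent from Python rather than the command line. A rough sketch, assuming the `requests` package is installed, the server is on the default `localhost:11434`, and `llama3` has been pulled:

```python
# Sketch of calling Ollama's generate endpoint directly; assumes the server is
# running on the default localhost:11434 and `llama3` has been pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Tell me a joke", "stream": False},
)
resp.raise_for_status()
print(resp.json()["response"])  # the full, non-streamed completion text
```

With `"stream": False` the endpoint returns a single JSON object; omitting it yields newline-delimited streaming chunks instead.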
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install langchain-community" ] }, { @@ -87,7 +96,9 @@ "source": [ "from langchain_community.llms import Ollama\n", "\n", - "llm = Ollama(model=\"llama3\")\n", + "llm = Ollama(\n", + " model=\"llama3\"\n", + ") # assuming you have Ollama installed and have pulled the llama3 model with `ollama pull llama3`\n", "\n", "llm.invoke(\"Tell me a joke\")" ] @@ -280,6 +291,24 @@ "llm_with_image_context = bakllava.bind(images=[image_b64])\n", "llm_with_image_context.invoke(\"What is the dollar based gross retention rate:\")" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Concurrency Features\n", + "\n", + "Ollama supports concurrent inference for a single model, as well as loading multiple models simultaneously (since at least [version 0.1.33](https://github.com/ollama/ollama/releases)).\n", + "\n", + "Start the Ollama server with the following environment variables set:\n", + "\n", + "* `OLLAMA_NUM_PARALLEL`: Handle multiple requests simultaneously for a single model\n", + "* `OLLAMA_MAX_LOADED_MODELS`: Load multiple models simultaneously\n", + "\n", + "Example: `OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve`\n", + "\n", + "Learn more about configuring the Ollama server in [the official guide](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server)." + ] } ], "metadata": { diff --git a/docs/docs/integrations/text_embedding/ollama.ipynb b/docs/docs/integrations/text_embedding/ollama.ipynb index 915f9edab02..c7af848287d 100644 --- a/docs/docs/integrations/text_embedding/ollama.ipynb +++ b/docs/docs/integrations/text_embedding/ollama.ipynb @@ -7,7 +7,27 @@ "source": [ "# Ollama\n", "\n", - "Let's load the Ollama Embeddings class." + "\"Ollama supports embedding models, making it possible to build retrieval augmented generation (RAG) applications that combine text prompts with existing documents or other data.\" Learn more in the [Ollama Embeddings](https://ollama.com/blog/embedding-models) introduction blog post.\n", + "\n", + "To use Ollama Embeddings, first install the [LangChain Community](https://pypi.org/project/langchain-community/) package:" ] }, { "cell_type": "code", "execution_count": null, "id": "854d6a2e", "metadata": {}, "outputs": [], "source": [ "!pip install langchain-community" ] }, { "cell_type": "markdown", "id": "54fbb4cd", "metadata": {}, "source": [ "Load the Ollama Embeddings class:" ] }, { @@ -17,26 +37,12 @@ "metadata": {}, "outputs": [], "source": [ - "from langchain_community.embeddings import OllamaEmbeddings" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "2c66e5da", - "metadata": {}, - "outputs": [], - "source": [ - "embeddings = OllamaEmbeddings()" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "01370375", - "metadata": {}, - "outputs": [], - "source": [ + "from langchain_community.embeddings import OllamaEmbeddings\n", + "\n", + "embeddings = (\n", + " OllamaEmbeddings()\n", + ") # by default, uses llama2. Run `ollama pull llama2` to pull down the model\n", + "\n", + "text = \"This is a test document.\"" ] }, @@ -105,7 +111,13 @@ "id": "bb61bbeb", "metadata": {}, "source": [ - "Let's load the Ollama Embeddings class with smaller model (e.g. llama:7b). 
Note: See other supported models [https://ollama.ai/library](https://ollama.ai/library)" + "### Embedding Models\n", + "\n", + "Ollama offers dedicated embedding models that are lightweight, with the smallest around 25 MB in size. See some of the available [embedding models from Ollama](https://ollama.com/blog/embedding-models).\n", + "\n", + "Let's load the Ollama Embeddings class with a dedicated embedding model (e.g. `mxbai-embed-large`). \n", + "\n", + "> Note: See other supported models at [https://ollama.ai/library](https://ollama.ai/library)" ] }, { @@ -115,26 +127,8 @@ "metadata": {}, "outputs": [], "source": [ - "embeddings = OllamaEmbeddings(model=\"llama2:7b\")" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "14aefb64", - "metadata": {}, - "outputs": [], - "source": [ - "text = \"This is a test document.\"" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "3c39ed33", - "metadata": {}, - "outputs": [], - "source": [ + "embeddings = OllamaEmbeddings(model=\"mxbai-embed-large\")\n", + "text = \"This is a test document.\"\n", + "query_result = embeddings.embed_query(text)" ] },
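To round out the `embed_query` example above, here is a small sketch that ranks a few documents against a query by cosine similarity, assuming `numpy` is installed and the `mxbai-embed-large` model has been pulled with `ollama pull mxbai-embed-large`:

```python
# Rank documents against a query with OllamaEmbeddings; assumes numpy is
# installed and mxbai-embed-large has been pulled (`ollama pull mxbai-embed-large`).
import numpy as np
from langchain_community.embeddings import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="mxbai-embed-large")

docs = [
    "Ollama runs large language models on your own machine.",
    "Paris is the capital of France.",
]
query = "How can I run an LLM locally?"

doc_vecs = np.array(embeddings.embed_documents(docs))  # one vector per document
query_vec = np.array(embeddings.embed_query(query))  # single query vector

# cosine similarity between the query and each document
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
for doc, score in sorted(zip(docs, scores), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {doc}")
```

`embed_documents` embeds a batch of texts in one call, while `embed_query` embeds a single string; both return plain Python lists of floats.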