Docs: Ollama (LLM, Chat Model & Text Embedding) (#22321)

- [x] Docs Update: Ollama
  - llm/ollama
    - Switched to using llama3 as the model, with reference to templating and prompting
    - Added concurrency notes to the llm/ollama docs
  - chat_models/ollama
    - Added concurrency notes to the chat_models/ollama docs
  - text_embedding/ollama
    - Included an example for specific embedding models from Ollama
@@ -54,12 +54,12 @@
     "\n",
     "Here are a few ways to interact with pulled local models\n",
     "\n",
-    "#### directly in the terminal:\n",
+    "#### In the terminal:\n",
     "\n",
     "* All of your local models are automatically served on `localhost:11434`\n",
     "* Run `ollama run <name-of-model>` to start interacting via the command line directly\n",
     "\n",
-    "### via an API\n",
+    "#### Via an API\n",
     "\n",
     "Send an `application/json` request to the API endpoint of Ollama to interact.\n",
     "\n",
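For reference, the JSON request described in this hunk can also be issued from Python. A minimal sketch using the `requests` library against the default `localhost:11434` endpoint, assuming the `llama3` model has already been pulled:

```python
import requests

# Ollama serves a JSON API on localhost:11434 by default.
# /api/generate runs a single completion against a pulled model.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",  # any model you have pulled locally
        "prompt": "Why is the sky blue?",
        "stream": False,  # return one JSON object instead of a stream
    },
)
response.raise_for_status()
print(response.json()["response"])
```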
@@ -72,9 +72,11 @@
     "\n",
     "See the Ollama [API documentation](https://github.com/jmorganca/ollama/blob/main/docs/api.md) for all endpoints.\n",
     "\n",
-    "#### via LangChain\n",
+    "#### Via LangChain\n",
     "\n",
-    "See a typical basic example of using Ollama via the `ChatOllama` chat model in your LangChain application."
+    "See a typical basic example of using Ollama via the `ChatOllama` chat model in your LangChain application.\n",
+    "\n",
+    "View the [API Reference for ChatOllama](https://api.python.langchain.com/en/latest/chat_models/langchain_community.chat_models.ollama.ChatOllama.html#langchain_community.chat_models.ollama.ChatOllama) for more."
    ]
   },
   {
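As a companion to the `ChatOllama` reference linked above, a minimal sketch of the chat model in use, assuming `langchain-community` is installed and `llama3` is pulled:

```python
from langchain_community.chat_models import ChatOllama

# ChatOllama talks to the local Ollama server (localhost:11434 by default).
chat = ChatOllama(model="llama3")

# invoke() accepts a plain string or a list of chat messages;
# it returns an AIMessage whose .content holds the reply text.
reply = chat.invoke("Name three uses of local LLMs.")
print(reply.content)
```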
@@ -105,7 +107,7 @@
     "\n",
     "# using LangChain Expressive Language chain syntax\n",
     "# learn more about the LCEL on\n",
-    "# /docs/expression_language/why\n",
+    "# /docs/concepts/#langchain-expression-language-lcel\n",
     "chain = prompt | llm | StrOutputParser()\n",
     "\n",
     "# for brevity, response is printed in terminal\n",
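The LCEL pipe syntax referenced in this hunk composes a prompt, the model, and an output parser into a single runnable. A minimal self-contained sketch under the same assumptions (a local `llama3` model):

```python
from langchain_community.llms import Ollama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("Tell me a short joke about {topic}")
llm = Ollama(model="llama3")

# LCEL: each | pipes the output of one runnable into the next.
chain = prompt | llm | StrOutputParser()

print(chain.invoke({"topic": "Space travel"}))
```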
@@ -189,7 +191,7 @@
     "\n",
     "## Building from source\n",
     "\n",
-    "For up to date instructions on building from source, check the Ollama documentation on [Building from Source](https://github.com/jmorganca/ollama?tab=readme-ov-file#building)"
+    "For up-to-date instructions on building from source, check the Ollama documentation on [Building from Source](https://github.com/ollama/ollama?tab=readme-ov-file#building)"
    ]
   },
   {
@@ -333,7 +335,7 @@
    }
   ],
   "source": [
-    "pip install --upgrade --quiet pillow"
+    "!pip install --upgrade --quiet pillow"
    ]
   },
   {
@@ -444,6 +446,24 @@
     "\n",
     "print(query_chain)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Concurrency Features\n",
+    "\n",
+    "Ollama supports concurrent inference for a single model and/or loading multiple models simultaneously (as of at least [version 0.1.33](https://github.com/ollama/ollama/releases)).\n",
+    "\n",
+    "Start the Ollama server with:\n",
+    "\n",
+    "* `OLLAMA_NUM_PARALLEL`: Handle multiple requests simultaneously for a single model\n",
+    "* `OLLAMA_MAX_LOADED_MODELS`: Load multiple models simultaneously\n",
+    "\n",
+    "Example: `OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve`\n",
+    "\n",
+    "Learn more about configuring the Ollama server in [the official guide](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server)."
+   ]
+  }
  ],
  "metadata": {
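To see `OLLAMA_NUM_PARALLEL` in effect from LangChain, here is a sketch that fires several requests at one model via the runnable `batch()` API, assuming the server was started with the example settings above:

```python
from langchain_community.chat_models import ChatOllama

# With OLLAMA_NUM_PARALLEL=4, the server can process these
# four prompts concurrently instead of queueing them.
chat = ChatOllama(model="llama3")
questions = [
    "What is LangChain?",
    "What is Ollama?",
    "What is an embedding?",
    "What is RAG?",
]
answers = chat.batch(questions)
for question, answer in zip(questions, answers):
    print(question, "->", answer.content[:80])
```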
@@ -12,16 +12,15 @@
     "\n",
     "It optimizes setup and configuration details, including GPU usage.\n",
     "\n",
-    "For a complete list of supported models and model variants, see the [Ollama model library](https://github.com/jmorganca/ollama#model-library).\n",
+    "For a complete list of supported models and model variants, see the [Ollama model library](https://github.com/ollama/ollama#model-library).\n",
     "\n",
     "## Setup\n",
     "\n",
-    "First, follow [these instructions](https://github.com/jmorganca/ollama) to set up and run a local Ollama instance:\n",
+    "First, follow [these instructions](https://github.com/ollama/ollama) to set up and run a local Ollama instance:\n",
     "\n",
     "* [Download](https://ollama.ai/download) and install Ollama onto the available supported platforms (including Windows Subsystem for Linux)\n",
     "* Fetch available LLM model via `ollama pull <name-of-model>`\n",
-    "  * View a list of available models via the [model library](https://ollama.ai/library)\n",
-    "  * e.g., `ollama pull llama3`\n",
+    "  * View a list of available models via the [model library](https://ollama.ai/library) and pull to use locally with the command `ollama pull llama3`\n",
     "* This will download the default tagged version of the model. Typically, the default points to the latest, smallest sized-parameter model.\n",
     "\n",
     "> On Mac, the models will be downloaded to `~/.ollama/models`\n",
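As a quick sanity check after the setup steps above, a small sketch that confirms the local server is up and lists pulled models; the `/api/tags` endpoint is documented in the Ollama API docs referenced in this commit:

```python
import requests

# The root endpoint answers with a plain "Ollama is running" banner.
print(requests.get("http://localhost:11434/").text)

# /api/tags lists every model pulled onto this instance,
# mirroring what `ollama list` prints in the terminal.
tags = requests.get("http://localhost:11434/api/tags").json()
for model in tags["models"]:
    print(model["name"])
```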
@@ -29,28 +28,29 @@
     "> On Linux (or WSL), the models will be stored at `/usr/share/ollama/.ollama/models`\n",
     "\n",
     "* Specify the exact version of the model of interest as such `ollama pull vicuna:13b-v1.5-16k-q4_0` (View the [various tags for the `Vicuna`](https://ollama.ai/library/vicuna/tags) model in this instance)\n",
-    "* To view all pulled models, use `ollama list`\n",
+    "* To view all pulled models on your local instance, use `ollama list`\n",
     "* To chat directly with a model from the command line, use `ollama run <name-of-model>`\n",
-    "* View the [Ollama documentation](https://github.com/jmorganca/ollama) for more commands. Run `ollama help` in the terminal to see available commands too.\n",
+    "* View the [Ollama documentation](https://github.com/ollama/ollama) for more commands.\n",
+    "* Run `ollama help` in the terminal to see available commands too.\n",
     "\n",
     "## Usage\n",
     "\n",
-    "You can see a full list of supported parameters on the [API reference page](https://api.python.langchain.com/en/latest/llms/langchain.llms.ollama.Ollama.html).\n",
+    "You can see a full list of supported parameters on the [API reference page](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.ollama.Ollama.html).\n",
     "\n",
-    "If you are using a LLaMA `chat` model (e.g., `ollama pull llama3`) then you can use the `ChatOllama` interface.\n",
+    "If you are using a LLaMA `chat` model (e.g., `ollama pull llama3`) then you can use the `ChatOllama` [interface](https://python.langchain.com/v0.2/docs/integrations/chat/ollama/).\n",
     "\n",
-    "This includes [special tokens](https://huggingface.co/blog/llama2#how-to-prompt-llama-2) for system message and user input.\n",
+    "This includes [special tokens](https://ollama.com/library/llama3) for system message and user input.\n",
     "\n",
     "## Interacting with Models\n",
     "\n",
     "Here are a few ways to interact with pulled local models\n",
     "\n",
-    "#### directly in the terminal:\n",
+    "#### In the terminal:\n",
     "\n",
     "* All of your local models are automatically served on `localhost:11434`\n",
     "* Run `ollama run <name-of-model>` to start interacting via the command line directly\n",
     "\n",
-    "### via an API\n",
+    "#### Via the API\n",
     "\n",
     "Send an `application/json` request to the API endpoint of Ollama to interact.\n",
     "\n",
@@ -61,11 +61,20 @@
     "}'\n",
     "```\n",
     "\n",
-    "See the Ollama [API documentation](https://github.com/jmorganca/ollama/blob/main/docs/api.md) for all endpoints.\n",
+    "See the Ollama [API documentation](https://github.com/ollama/ollama/blob/main/docs/api.md) for all endpoints.\n",
     "\n",
     "#### via LangChain\n",
     "\n",
-    "See a typical basic example of using Ollama chat model in your LangChain application."
+    "See a typical basic example of using the [Ollama chat model](https://python.langchain.com/v0.2/docs/integrations/chat/ollama/) in your LangChain application."
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install langchain-community"
+   ]
+  },
   {
@@ -87,7 +96,9 @@
    "source": [
     "from langchain_community.llms import Ollama\n",
     "\n",
-    "llm = Ollama(model=\"llama3\")\n",
+    "llm = Ollama(\n",
+    "    model=\"llama3\"\n",
+    ") # assuming you have Ollama installed and have the llama3 model pulled with `ollama pull llama3`\n",
     "\n",
     "llm.invoke(\"Tell me a joke\")"
    ]
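Beyond the single `invoke()` shown in this hunk, the same `Ollama` runnable also streams. A minimal sketch, assuming the same local `llama3` model:

```python
from langchain_community.llms import Ollama

llm = Ollama(model="llama3")

# stream() yields the completion chunk-by-chunk as the model generates it,
# which gives immediate terminal feedback on slower local hardware.
for chunk in llm.stream("Tell me a joke"):
    print(chunk, end="", flush=True)
print()
```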
@@ -280,6 +291,24 @@
     "llm_with_image_context = bakllava.bind(images=[image_b64])\n",
     "llm_with_image_context.invoke(\"What is the dollar based gross retention rate:\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Concurrency Features\n",
+    "\n",
+    "Ollama supports concurrent inference for a single model and/or loading multiple models simultaneously (as of at least [version 0.1.33](https://github.com/ollama/ollama/releases)).\n",
+    "\n",
+    "Start the Ollama server with:\n",
+    "\n",
+    "* `OLLAMA_NUM_PARALLEL`: Handle multiple requests simultaneously for a single model\n",
+    "* `OLLAMA_MAX_LOADED_MODELS`: Load multiple models simultaneously\n",
+    "\n",
+    "Example: `OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve`\n",
+    "\n",
+    "Learn more about configuring the Ollama server in [the official guide](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server)."
+   ]
+  }
  ],
  "metadata": {
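`OLLAMA_MAX_LOADED_MODELS` pairs naturally with async calls across different models. A sketch using the runnables' async API; the second model name here is only an example, and any pulled model works:

```python
import asyncio

from langchain_community.llms import Ollama


async def main() -> None:
    # With OLLAMA_MAX_LOADED_MODELS >= 2, both models can stay
    # resident in memory and answer at the same time.
    llama3 = Ollama(model="llama3")
    phi3 = Ollama(model="phi3")  # hypothetical second model for illustration
    answers = await asyncio.gather(
        llama3.ainvoke("Summarize LangChain in one sentence."),
        phi3.ainvoke("Summarize Ollama in one sentence."),
    )
    for answer in answers:
        print(answer)


asyncio.run(main())
```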
@@ -7,7 +7,27 @@
    "source": [
     "# Ollama\n",
     "\n",
-    "Let's load the Ollama Embeddings class."
+    "\"Ollama supports embedding models, making it possible to build retrieval augmented generation (RAG) applications that combine text prompts with existing documents or other data.\" Learn more in the [Ollama Embeddings](https://ollama.com/blog/embedding-models) blog post.\n",
+    "\n",
+    "To use Ollama Embeddings, first install the [LangChain Community](https://pypi.org/project/langchain-community/) package:"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "854d6a2e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install langchain-community"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "54fbb4cd",
+   "metadata": {},
+   "source": [
+    "Load the Ollama Embeddings class:"
+   ]
+  },
   {
@@ -17,26 +37,12 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from langchain_community.embeddings import OllamaEmbeddings"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "id": "2c66e5da",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "embeddings = OllamaEmbeddings()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "id": "01370375",
-   "metadata": {},
-   "outputs": [],
-   "source": [
+    "from langchain_community.embeddings import OllamaEmbeddings\n",
+    "\n",
+    "embeddings = (\n",
+    "    OllamaEmbeddings()\n",
+    ") # by default, uses llama2. Run `ollama pull llama2` to pull down the model\n",
+    "\n",
+    "text = \"This is a test document.\""
    ]
   },
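Continuing from the combined cell above, a minimal sketch of the two embedding entry points, `embed_query` for a single string and `embed_documents` for a batch:

```python
from langchain_community.embeddings import OllamaEmbeddings

embeddings = OllamaEmbeddings()  # defaults to llama2; run `ollama pull llama2` first

text = "This is a test document."

# embed_query embeds one string; embed_documents embeds a list of them.
query_vector = embeddings.embed_query(text)
doc_vectors = embeddings.embed_documents([text, "Another document."])

print(len(query_vector))  # dimensionality of the embedding
print(len(doc_vectors))   # one vector per input document
```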
@@ -105,7 +111,13 @@
    "id": "bb61bbeb",
    "metadata": {},
    "source": [
-    "Let's load the Ollama Embeddings class with smaller model (e.g. llama:7b). Note: See other supported models [https://ollama.ai/library](https://ollama.ai/library)"
+    "### Embedding Models\n",
+    "\n",
+    "Ollama offers dedicated embedding models that are lightweight, with the smallest around 25 MB. See the available [embedding models from Ollama](https://ollama.com/blog/embedding-models).\n",
+    "\n",
+    "Let's load the Ollama Embeddings class with a smaller model (e.g., `mxbai-embed-large`).\n",
+    "\n",
+    "> Note: See other supported models at [https://ollama.ai/library](https://ollama.ai/library)"
    ]
   },
   {
@@ -115,26 +127,8 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "embeddings = OllamaEmbeddings(model=\"llama2:7b\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "id": "14aefb64",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "text = \"This is a test document.\""
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 15,
-   "id": "3c39ed33",
-   "metadata": {},
-   "outputs": [],
-   "source": [
+    "embeddings = OllamaEmbeddings(model=\"mxbai-embed-large\")\n",
+    "text = \"This is a test document.\"\n",
+    "query_result = embeddings.embed_query(text)"
    ]
   },
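To round off the embedding example, a sketch that ranks documents against a query by cosine similarity, using only the standard library on top of the `mxbai-embed-large` model already pulled above:

```python
import math

from langchain_community.embeddings import OllamaEmbeddings


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


embeddings = OllamaEmbeddings(model="mxbai-embed-large")

docs = [
    "Llamas are members of the camelid family.",
    "Paris is the capital of France.",
]
doc_vectors = embeddings.embed_documents(docs)
query_vector = embeddings.embed_query("Which animal is related to camels?")

# Rank documents by similarity to the query, highest first.
ranked = sorted(
    ((cosine(query_vector, v), d) for v, d in zip(doc_vectors, docs)),
    reverse=True,
)
for score, doc in ranked:
    print(f"{score:.3f}  {doc}")
```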