{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Ollama\n", "\n", "[Ollama](https://ollama.ai/) allows you to run open-source large language models, such as Llama 2, locally.\n", "\n", "Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile. \n", "\n", "It optimizes setup and configuration details, including GPU usage.\n", "\n", "For a complete list of supported models and model variants, see the [Ollama model library](https://github.com/jmorganca/ollama#model-library).\n", "\n", "## Setup\n", "\n", "First, follow [these instructions](https://github.com/jmorganca/ollama) to set up and run a local Ollama instance:\n", "\n", "* [Download](https://ollama.ai/download)\n", "* Fetch a model via `ollama pull `\n", "* e.g., for `Llama-7b`: `ollama pull llama2` (see full list [here](https://github.com/jmorganca/ollama))\n", "* This will download the most basic version of the model typically (e.g., smallest # parameters and `q4_0`)\n", "* On Mac, it will download to \n", "\n", "`~/.ollama/models/manifests/registry.ollama.ai/library//latest`\n", "\n", "* And we specify a particular version, e.g., for `ollama pull vicuna:13b-v1.5-16k-q4_0`\n", "* The file is here with the model version in place of `latest`\n", "\n", "`~/.ollama/models/manifests/registry.ollama.ai/library/vicuna/13b-v1.5-16k-q4_0`\n", "\n", "You can easily access models in a few ways:\n", "\n", "1/ if the app is running:\n", "* All of your local models are automatically served on `localhost:11434`\n", "* Select your model when setting `llm = Ollama(..., model=\":\")`\n", "* If you set `llm = Ollama(..., model=\" None:\n", " print(response.generations[0][0].generation_info)\n", " \n", "callback_manager = CallbackManager([StreamingStdOutCallbackHandler(), GenerationStatisticsCallback()])\n", "\n", "llm = Ollama(base_url=\"http://localhost:11434\",\n", " model=\"llama2\",\n", " verbose=True,\n", " callback_manager=callback_manager)\n", "\n", "qa_chain = RetrievalQA.from_chain_type(\n", " llm,\n", " retriever=vectorstore.as_retriever(),\n", " chain_type_kwargs={\"prompt\": QA_CHAIN_PROMPT},\n", ")\n", "\n", "question = \"What are the approaches to Task Decomposition?\"\n", "result = qa_chain({\"query\": question})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`eval_count` / (`eval_duration`/10e9) gets `tok / s`" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "47.22003469910937" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "62 / (1313002000/1000/1000/1000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the Hub for prompt management\n", " \n", "Open source models often benefit from specific prompts. 
\n", "\n", "For example, [Mistral 7b](https://mistral.ai/news/announcing-mistral-7b/) was fine-tuned for chat using the prompt format shown [here](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1).\n", "\n", "Get the model: `ollama pull mistral:7b-instruct`" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# LLM\n", "from langchain.llms import Ollama\n", "from langchain.callbacks.manager import CallbackManager\n", "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n", "llm = Ollama(model=\"mistral:7b-instruct\",\n", " verbose=True,\n", " callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "from langchain import hub\n", "QA_CHAIN_PROMPT = hub.pull(\"rlm/rag-prompt-mistral\")\n", "\n", "# QA chain\n", "from langchain.chains import RetrievalQA\n", "qa_chain = RetrievalQA.from_chain_type(\n", " llm,\n", " retriever=vectorstore.as_retriever(),\n", " chain_type_kwargs={\"prompt\": QA_CHAIN_PROMPT},\n", ")" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "There are different approaches to Task Decomposition for AI Agents such as Chain of thought (CoT) and Tree of Thoughts (ToT). CoT breaks down big tasks into multiple manageable tasks and generates multiple thoughts per step, while ToT explores multiple reasoning possibilities at each step. Task decomposition can be done by LLM with simple prompting or using task-specific instructions or human inputs." ] } ], "source": [ "question = \"What are the various approaches to Task Decomposition for AI Agents?\"\n", "result = qa_chain({\"query\": question})" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 4 }