langchain/docs/versioned_docs/version-0.2.x/integrations/llms/titan_takeoff.ipynb

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Titan Takeoff\n",
    "\n",
    "`TitanML` helps businesses build and deploy better, smaller, cheaper, and faster NLP models through our training, compression, and inference optimization platform.\n",
    "\n",
    "Our inference server, [Titan Takeoff](https://docs.titanml.co/docs/intro) enables deployment of LLMs locally on your hardware in a single command. Most generative model architectures are supported, such as Falcon, Llama 2, GPT2, T5 and many more. If you experience trouble with a specific model, please let us know at hello@titanml.co."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example usage\n",
    "Here are some helpful examples to get started using Titan Takeoff Server. You need to make sure Takeoff Server has been started in the background before running these commands. For more information see [docs page for launching Takeoff](https://docs.titanml.co/docs/Docs/launching/)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "from langchain.callbacks.manager import CallbackManager\n",
    "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
    "from langchain.prompts import PromptTemplate\n",
    "\n",
    "# Note importing TitanTakeoffPro instead of TitanTakeoff will work as well both use same object under the hood\n",
    "from langchain_community.llms import TitanTakeoff"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example 1\n",
    "\n",
    "Basic use assuming Takeoff is running on your machine using its default ports (ie localhost:3000).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = TitanTakeoff()\n",
    "output = llm.invoke(\"What is the weather in London in August?\")\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example 2\n",
    "\n",
    "Specifying a port and other generation parameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = TitanTakeoff(port=3000)\n",
    "# A comprehensive list of parameters can be found at https://docs.titanml.co/docs/next/apis/Takeoff%20inference_REST_API/generate#request\n",
    "output = llm.invoke(\n",
    "    \"What is the largest rainforest in the world?\",\n",
    "    consumer_group=\"primary\",\n",
    "    min_new_tokens=128,\n",
    "    max_new_tokens=512,\n",
    "    no_repeat_ngram_size=2,\n",
    "    sampling_topk=1,\n",
    "    sampling_topp=1.0,\n",
    "    sampling_temperature=1.0,\n",
    "    repetition_penalty=1.0,\n",
    "    regex_string=\"\",\n",
    "    json_schema=None,\n",
    ")\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example 3\n",
    "\n",
    "Using generate for multiple inputs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = TitanTakeoff()\n",
    "rich_output = llm.generate([\"What is Deep Learning?\", \"What is Machine Learning?\"])\n",
    "print(rich_output.generations)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example 4\n",
    "\n",
    "Streaming output"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = TitanTakeoff(\n",
    "    streaming=True, callback_manager=CallbackManager([StreamingStdOutCallbackHandler()])\n",
    ")\n",
    "prompt = \"What is the capital of France?\"\n",
    "output = llm.invoke(prompt)\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example 5\n",
    "\n",
    "Using LCEL"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = TitanTakeoff()\n",
    "prompt = PromptTemplate.from_template(\"Tell me about {topic}\")\n",
    "chain = prompt | llm\n",
    "output = chain.invoke({\"topic\": \"the universe\"})\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example 6\n",
    "\n",
    "Starting readers using TitanTakeoff Python Wrapper. If you haven't created any readers with first launching Takeoff, or you want to add another you can do so when you initialize the TitanTakeoff object. Just pass a list of model configs you want to start as the `models` parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Model config for the llama model, where you can specify the following parameters:\n",
    "#   model_name (str): The name of the model to use\n",
    "#   device: (str): The device to use for inference, cuda or cpu\n",
    "#   consumer_group (str): The consumer group to place the reader into\n",
    "#   tensor_parallel (Optional[int]): The number of gpus you would like your model to be split across\n",
    "#   max_seq_length (int): The maximum sequence length to use for inference, defaults to 512\n",
    "#   max_batch_size (int_: The max batch size for continuous batching of requests\n",
    "llama_model = {\n",
    "    \"model_name\": \"TheBloke/Llama-2-7b-Chat-AWQ\",\n",
    "    \"device\": \"cuda\",\n",
    "    \"consumer_group\": \"llama\",\n",
    "}\n",
    "llm = TitanTakeoff(models=[llama_model])\n",
    "\n",
    "# The model needs time to spin up, length of time need will depend on the size of model and your network connection speed\n",
    "time.sleep(60)\n",
    "\n",
    "prompt = \"What is the capital of France?\"\n",
    "output = llm.invoke(prompt, consumer_group=\"llama\")\n",
    "print(output)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}