langchain/docs/docs/integrations/chat/vllm.ipynb

{
 "cells": [
  {
   "cell_type": "raw",
   "id": "eb65deaa",
   "metadata": {},
   "source": [
    "---\n",
    "sidebar_label: vLLM Chat\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb7e5679-aa06-47e4-a1a3-b6b70e604017",
   "metadata": {},
   "source": [
    "# vLLM Chat\n",
    "\n",
    "vLLM can be deployed as a server that mimics the OpenAI API protocol. This allows vLLM to be used as a drop-in replacement for applications using OpenAI API. This server can be queried in the same format as OpenAI API.\n",
    "\n",
    "This notebook covers how to get started with vLLM chat models using langchain's `ChatOpenAI` **as it is**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "060a2e3d-d42f-4221-bd09-a9a06544dcd3",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.prompts.chat import (\n",
    "    ChatPromptTemplate,\n",
    "    HumanMessagePromptTemplate,\n",
    "    SystemMessagePromptTemplate,\n",
    ")\n",
    "from langchain.schema import HumanMessage, SystemMessage\n",
    "from langchain_community.chat_models import ChatOpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "bf24d732-68a9-44fd-b05d-4903ce5620c6",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "inference_server_url = \"http://localhost:8000/v1\"\n",
    "\n",
    "chat = ChatOpenAI(\n",
    "    model=\"mosaicml/mpt-7b\",\n",
    "    openai_api_key=\"EMPTY\",\n",
    "    openai_api_base=inference_server_url,\n",
    "    max_tokens=5,\n",
    "    temperature=0,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "aea4e363-5688-4b07-82ed-6aa8153c2377",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AIMessage(content=' Io amo programmare', additional_kwargs={}, example=False)"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "messages = [\n",
    "    SystemMessage(\n",
    "        content=\"You are a helpful assistant that translates English to Italian.\"\n",
    "    ),\n",
    "    HumanMessage(\n",
    "        content=\"Translate the following sentence from English to Italian: I love programming.\"\n",
    "    ),\n",
    "]\n",
    "chat(messages)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55fc7046-a6dc-4720-8c0c-24a6db76a4f4",
   "metadata": {},
   "source": [
    "You can make use of templating by using a `MessagePromptTemplate`. You can build a `ChatPromptTemplate` from one or more `MessagePromptTemplates`. You can use ChatPromptTemplate's format_prompt -- this returns a `PromptValue`, which you can convert to a string or `Message` object, depending on whether you want to use the formatted value as input to an llm or chat model.\n",
    "\n",
    "For convenience, there is a `from_template` method exposed on the template. If you were to use this template, this is what it would look like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "123980e9-0dee-4ce5-bde6-d964dd90129c",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "template = (\n",
    "    \"You are a helpful assistant that translates {input_language} to {output_language}.\"\n",
    ")\n",
    "system_message_prompt = SystemMessagePromptTemplate.from_template(template)\n",
    "human_template = \"{text}\"\n",
    "human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "b2fb8c59-8892-4270-85a2-4f8ab276b75d",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AIMessage(content=' I love programming too.', additional_kwargs={}, example=False)"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chat_prompt = ChatPromptTemplate.from_messages(\n",
    "    [system_message_prompt, human_message_prompt]\n",
    ")\n",
    "\n",
    "# get a chat completion from the formatted messages\n",
    "chat(\n",
    "    chat_prompt.format_prompt(\n",
    "        input_language=\"English\", output_language=\"Italian\", text=\"I love programming.\"\n",
    "    ).to_messages()\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0bbd9861-2b94-4920-8708-b690004f4c4d",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_pytorch_p310",
   "language": "python",
   "name": "conda_pytorch_p310"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}