add vertex prod features (#10910)

- chat vertex async
- vertex stream
- vertex full generation info
- vertex use server-side stopping
- model garden async
- update docs for all the above

in follow up will add
[] chat vertex full generation info
[] chat vertex retries
[] scheduled tests
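
For readers unfamiliar with the call shapes this change enables, here is a minimal sketch of the async and streaming patterns. It is illustrative only: `EchoChatModel` and `ask_all` are hypothetical stand-ins so the snippet runs without GCP credentials, and they are not part of LangChain. With a real `ChatVertexAI` instance, the corresponding calls in the notebook diff are `ainvoke`/`agenerate` and `stream`.

```python
import asyncio
from typing import Iterator, List


class EchoChatModel:
    """Hypothetical stand-in for a chat model (not a LangChain class)."""

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size

    async def ainvoke(self, prompt: str) -> str:
        # Simulate network latency; a real model would await an API call here.
        await asyncio.sleep(0.01)
        return f"echo: {prompt}"

    def stream(self, prompt: str) -> Iterator[str]:
        # Yield the reply in fixed-size pieces, the way a streaming API
        # returns partial output before the full response is ready.
        reply = f"echo: {prompt}"
        for start in range(0, len(reply), self.chunk_size):
            yield reply[start:start + self.chunk_size]


async def ask_all(model: EchoChatModel, prompts: List[str]) -> List[str]:
    # Run all requests concurrently; gather preserves the input order.
    return list(await asyncio.gather(*(model.ainvoke(p) for p in prompts)))


if __name__ == "__main__":
    model = EchoChatModel()
    print(asyncio.run(ask_all(model, ["I love programming", "Hello"])))
    for chunk in model.stream("I love programming"):
        print(chunk, end="", flush=True)
    print()
```

The async path matters mostly when several requests are in flight at once, which is why the sketch uses `asyncio.gather` rather than awaiting each call in turn.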
2026-01-29 21:30:18 +00:00 · 2023-09-22 01:44:09 -07:00
parent dccc20b402
commit cab55e9bc1
10 changed files with 721 additions and 267 deletions
--- a/docs/extras/integrations/chat/google_vertex_ai_palm.ipynb
+++ b/docs/extras/integrations/chat/google_vertex_ai_palm.ipynb
@@ -5,7 +5,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "# Google Cloud Platform Vertex AI PaLM \n",
+    "# GCP Vertex AI \n",
    "\n",
    "Note: This is separate from the Google PaLM integration. Google has chosen to offer an enterprise version of PaLM through GCP, and this supports the models made available there. \n",
    "\n",
@@ -31,7 +31,7 @@
   },
   "outputs": [],
   "source": [
-    "#!pip install google-cloud-aiplatform"
+    "#!pip install langchain google-cloud-aiplatform"
   ]
  },
  {
@@ -41,12 +41,7 @@
   "outputs": [],
   "source": [
    "from langchain.chat_models import ChatVertexAI\n",
-    "from langchain.prompts.chat import (\n",
-    "    ChatPromptTemplate,\n",
-    "    SystemMessagePromptTemplate,\n",
-    "    HumanMessagePromptTemplate,\n",
-    ")\n",
-    "from langchain.schema import HumanMessage, SystemMessage"
+    "from langchain.prompts import ChatPromptTemplate"
   ]
  },
  {
@@ -60,82 +55,78 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 34,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "system = \"You are a helpful assistant that translates English to French\"\n",
+    "human = \"Translate this sentence from English to French. I love programming.\"\n",
+    "prompt = ChatPromptTemplate.from_messages(\n",
+    "    [(\"system\", system), (\"human\",  human)]\n",
+    ")\n",
+    "messages = prompt.format_messages()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
-       "AIMessage(content='Sure, here is the translation of the sentence \"I love programming\" from English to French:\\n\\nJ\\'aime programmer.', additional_kwargs={}, example=False)"
+       "AIMessage(content=\" J'aime la programmation.\", additional_kwargs={}, example=False)"
      ]
     },
-     "execution_count": 4,
+     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
-    "messages = [\n",
-    "    SystemMessage(\n",
-    "        content=\"You are a helpful assistant that translates English to French.\"\n",
-    "    ),\n",
-    "    HumanMessage(\n",
-    "        content=\"Translate this sentence from English to French. I love programming.\"\n",
-    "    ),\n",
-    "]\n",
    "chat(messages)"
   ]
  },
  {
-   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "You can make use of templating by using a `MessagePromptTemplate`. You can build a `ChatPromptTemplate` from one or more `MessagePromptTemplates`. You can use `ChatPromptTemplate`'s `format_prompt` -- this returns a `PromptValue`, which you can convert to a string or Message object, depending on whether you want to use the formatted value as input to an llm or chat model.\n",
-    "\n",
-    "For convenience, there is a `from_template` method exposed on the template. If you were to use this template, this is what it would look like:"
+    "If we want to construct a simple chain that takes user specified parameters:"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
-    "template = (\n",
-    "    \"You are a helpful assistant that translates {input_language} to {output_language}.\"\n",
-    ")\n",
-    "system_message_prompt = SystemMessagePromptTemplate.from_template(template)\n",
-    "human_template = \"{text}\"\n",
-    "human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)"
+    "system = \"You are a helpful assistant that translates {input_language} to {output_language}.\"\n",
+    "human = \"{text}\"\n",
+    "prompt = ChatPromptTemplate.from_messages(\n",
+    "    [(\"system\", system), (\"human\",  human)]\n",
+    ")"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
-       "AIMessage(content='Sure, here is the translation of \"I love programming\" in French:\\n\\nJ\\'aime programmer.', additional_kwargs={}, example=False)"
+       "AIMessage(content=' 私はプログラミングが大好きです。', additional_kwargs={}, example=False)"
      ]
     },
-     "execution_count": 7,
+     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
-    "chat_prompt = ChatPromptTemplate.from_messages(\n",
-    "    [system_message_prompt, human_message_prompt]\n",
-    ")\n",
-    "\n",
-    "# get a chat completion from the formatted messages\n",
-    "chat(\n",
-    "    chat_prompt.format_prompt(\n",
-    "        input_language=\"English\", output_language=\"French\", text=\"I love programming.\"\n",
-    "    ).to_messages()\n",
+    "chain = prompt | chat\n",
+    "chain.invoke(\n",
+    "    {\"input_language\": \"English\", \"output_language\": \"Japanese\", \"text\": \"I love programming\"}\n",
    ")"
   ]
  },
@@ -153,60 +144,129 @@
    "tags": []
   },
   "source": [
+    "## Code generation chat models\n",
    "You can now leverage the Codey API for code chat within Vertex AI. The model name is:\n",
    "- codechat-bison: for code assistance"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 18,
   "metadata": {
-    "execution": {
-     "iopub.execute_input": "2023-06-17T21:30:43.974841Z",
-     "iopub.status.busy": "2023-06-17T21:30:43.974431Z",
-     "iopub.status.idle": "2023-06-17T21:30:44.248119Z",
-     "shell.execute_reply": "2023-06-17T21:30:44.247362Z",
-     "shell.execute_reply.started": "2023-06-17T21:30:43.974820Z"
-    },
    "tags": []
   },
   "outputs": [],
   "source": [
-    "chat = ChatVertexAI(model_name=\"codechat-bison\")"
+    "chat = ChatVertexAI(\n",
+    "    model_name=\"codechat-bison\",\n",
+    "    max_output_tokens=1000,\n",
+    "    temperature=0.5\n",
+    ")"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 20,
   "metadata": {
-    "execution": {
-     "iopub.execute_input": "2023-06-17T21:30:45.146093Z",
-     "iopub.status.busy": "2023-06-17T21:30:45.145752Z",
-     "iopub.status.idle": "2023-06-17T21:30:47.449126Z",
-     "shell.execute_reply": "2023-06-17T21:30:47.448609Z",
-     "shell.execute_reply.started": "2023-06-17T21:30:45.146069Z"
-    },
    "tags": []
   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      " ```python\n",
+      "def is_prime(x): \n",
+      "    if (x <= 1): \n",
+      "        return False\n",
+      "    for i in range(2, x): \n",
+      "        if (x % i == 0): \n",
+      "            return False\n",
+      "    return True\n",
+      "```\n"
+     ]
+    }
+   ],
+   "source": [
+    "# For simple string in string out usage, we can use the `predict` method:\n",
+    "print(chat.predict(\"Write a Python function to identify all prime numbers\"))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Asynchronous calls\n",
+    "\n",
+    "We can make asynchronous calls via the `agenerate` and `ainvoke` methods."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import asyncio\n",
+    "# import nest_asyncio\n",
+    "# nest_asyncio.apply()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 35,
+   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
-       "AIMessage(content='The following Python function can be used to identify all prime numbers up to a given integer:\\n\\n```\\ndef is_prime(n):\\n  \"\"\"\\n  Determines whether the given integer is prime.\\n\\n  Args:\\n    n: The integer to be tested for primality.\\n\\n  Returns:\\n    True if n is prime, False otherwise.\\n  \"\"\"\\n\\n  # Check if n is divisible by 2.\\n  if n % 2 == 0:\\n    return False\\n\\n  # Check if n is divisible by any integer from 3 to the square root', additional_kwargs={}, example=False)"
+       "LLMResult(generations=[[ChatGeneration(text=\" J'aime la programmation.\", generation_info=None, message=AIMessage(content=\" J'aime la programmation.\", additional_kwargs={}, example=False))]], llm_output={}, run=[RunInfo(run_id=UUID('223599ef-38f8-4c79-ac6d-a5013060eb9d'))])"
      ]
     },
-     "execution_count": 4,
+     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
-    "messages = [\n",
-    "    HumanMessage(\n",
-    "        content=\"How do I create a python function to identify all prime numbers?\"\n",
-    "    )\n",
-    "]\n",
-    "chat(messages)"
+    "chat = ChatVertexAI(\n",
+    "    model_name=\"chat-bison\",\n",
+    "    max_output_tokens=1000,\n",
+    "    temperature=0.7,\n",
+    "    top_p=0.95,\n",
+    "    top_k=40,\n",
+    ")\n",
+    "\n",
+    "asyncio.run(chat.agenerate([messages]))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 36,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "AIMessage(content=' अहं प्रोग्रामिंग प्रेमामि', additional_kwargs={}, example=False)"
+      ]
+     },
+     "execution_count": 36,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "asyncio.run(chain.ainvoke({\"input_language\": \"English\", \"output_language\": \"Sanskrit\", \"text\": \"I love programming\"}))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Streaming calls\n",
+    "\n",
+    "We can also stream outputs via the `stream` method:"
   ]
  },
  {
@@ -214,14 +274,51 @@
   "execution_count": null,
   "metadata": {},
   "outputs": [],
-   "source": []
+   "source": [
+    "import sys"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      " 1. China (1,444,216,107)\n",
+      "2. India (1,393,409,038)\n",
+      "3. United States (332,403,650)\n",
+      "4. Indonesia (273,523,615)\n",
+      "5. Pakistan (220,892,340)\n",
+      "6. Brazil (212,559,409)\n",
+      "7. Nigeria (206,139,589)\n",
+      "8. Bangladesh (164,689,383)\n",
+      "9. Russia (145,934,462)\n",
+      "10. Mexico (128,932,488)\n",
+      "11. Japan (126,476,461)\n",
+      "12. Ethiopia (115,063,982)\n",
+      "13. Philippines (109,581,078)\n",
+      "14. Egypt (102,334,404)\n",
+      "15. Vietnam (97,338,589)"
+     ]
+    }
+   ],
+   "source": [
+    "prompt = ChatPromptTemplate.from_messages([(\"human\", \"List out the 15 most populous countries in the world\")])\n",
+    "messages = prompt.format_messages()\n",
+    "for chunk in chat.stream(messages):\n",
+    "    sys.stdout.write(chunk.content)\n",
+    "    sys.stdout.flush()"
+   ]
  }
 ],
 "metadata": {
  "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": "poetry-venv",
   "language": "python",
-   "name": "python3"
+   "name": "poetry-venv"
  },
  "language_info": {
   "codemirror_mode": {
--- a/docs/extras/integrations/chat/index.mdx
+++ b/docs/extras/integrations/chat/index.mdx
@@ -26,7 +26,7 @@ ChatLiteLLM|✅|✅|✅|✅
 ChatMLflowAIGateway|✅|❌|❌|❌
 ChatOllama|✅|❌|✅|❌
 ChatOpenAI|✅|✅|✅|✅
-ChatVertexAI|✅|❌|✅|❌
+ChatVertexAI|✅|✅|✅|❌
 ErnieBotChat|✅|❌|❌|❌
 JinaChat|✅|✅|✅|✅
 MiniMaxChat|✅|✅|❌|❌