From e080281623597fb1bf5c7c8470b1fb48edb37a52 Mon Sep 17 00:00:00 2001
From: Leonid Kuligin
Date: Fri, 1 Mar 2024 20:50:55 +0100
Subject: [PATCH] docs: cookbook on gemma integrations (#18213)

- [ ] **PR title**: "cookbook: using Gemma on LangChain"
- [ ] **PR message**:
  - **Description:** added a tutorial on how to use Gemma with LangChain (from Vertex AI, or locally from Kaggle or HF)
  - **Dependencies:** langchain-google-vertexai==0.0.7
  - **Twitter handle:** lkuligin
---
 cookbook/Gemma_LangChain.ipynb | 932 +++++++++++++++++++++++++++++++++
 1 file changed, 932 insertions(+)
 create mode 100644 cookbook/Gemma_LangChain.ipynb

diff --git a/cookbook/Gemma_LangChain.ipynb b/cookbook/Gemma_LangChain.ipynb
new file mode 100644
index 00000000000..80acfb07d7b
--- /dev/null
+++ b/cookbook/Gemma_LangChain.ipynb
@@ -0,0 +1,932 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "BYejgj8Zf-LG",
+    "tags": []
+   },
+   "source": [
+    "## Getting started with LangChain and Gemma, running locally or in the Cloud"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "2IxjMb9-jIJ8"
+   },
+   "source": [
+    "### Installing dependencies"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "executionInfo": {
+     "elapsed": 9436,
+     "status": "ok",
+     "timestamp": 1708975187360,
+     "user": {
+      "displayName": "",
+      "userId": ""
+     },
+     "user_tz": -60
+    },
+    "id": "XZaTsXfcheTF",
+    "outputId": "eb21d603-d824-46c5-f99f-087fb2f618b1",
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "!pip install --upgrade langchain langchain-google-vertexai"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "IXmAujvC3Kwp"
+   },
+   "source": [
+    "### Running the model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "CI8Elyc5gBQF"
+   },
+   "source": [
+    "Go to the Vertex AI Model Garden in the Google Cloud [console](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/335) and deploy the desired version of Gemma to Vertex AI. Deployment takes a few minutes; once the endpoint is ready, copy its ID."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {
+    "id": "gv1j8FrVftsC"
+   },
+   "outputs": [],
+   "source": [
+    "# @title Basic parameters\n",
+    "project: str = \"PUT_YOUR_PROJECT_ID_HERE\" # @param {type:\"string\"}\n",
+    "endpoint_id: str = \"PUT_YOUR_ENDPOINT_ID_HERE\" # @param {type:\"string\"}\n",
+    "location: str = \"PUT_YOUR_ENDPOINT_LOCATION_HERE\" # @param {type:\"string\"}"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "executionInfo": {
+     "elapsed": 3,
+     "status": "ok",
+     "timestamp": 1708975440503,
+     "user": {
+      "displayName": "",
+      "userId": ""
+     },
+     "user_tz": -60
+    },
+    "id": "bhIHsFGYjtFt",
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "2024-02-27 17:15:10.457149: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. 
To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n", + "2024-02-27 17:15:10.508925: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", + "2024-02-27 17:15:10.508957: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", + "2024-02-27 17:15:10.510289: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", + "2024-02-27 17:15:10.518898: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", + "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n" + ] + } + ], + "source": [ + "from langchain_google_vertexai import (\n", + " GemmaChatVertexAIModelGarden,\n", + " GemmaVertexAIModelGarden,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "executionInfo": { + "elapsed": 351, + "status": "ok", + "timestamp": 1708975440852, + "user": { + "displayName": "", + "userId": "" + }, + "user_tz": -60 + }, + "id": "WJv-UVWwh0lk", + "tags": [] + }, + "outputs": [], + "source": [ + "llm = GemmaVertexAIModelGarden(\n", + " endpoint_id=endpoint_id,\n", + " project=project,\n", + " location=location,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 714, + "status": "ok", + "timestamp": 1708975441564, + "user": { + "displayName": "", + "userId": "" + }, + "user_tz": -60 + }, + "id": "6kM7cEFdiN9h", + "outputId": "fb420c56-5614-4745-cda8-0ee450a3e539", + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Prompt:\n", + "What is the meaning of life?\n", + "Output:\n", + " Who am I? Why do I exist? 
These are questions I have struggled with\n"
+     ]
+    }
+   ],
+   "source": [
+    "output = llm.invoke(\"What is the meaning of life?\")\n",
+    "print(output)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "zzep9nfmuUcO"
+   },
+   "source": [
+    "We can also use Gemma as a multi-turn chat model:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "executionInfo": {
+     "elapsed": 964,
+     "status": "ok",
+     "timestamp": 1708976298189,
+     "user": {
+      "displayName": "",
+      "userId": ""
+     },
+     "user_tz": -60
+    },
+    "id": "8tPHoM5XiZOl",
+    "outputId": "7b8fb652-9aed-47b0-c096-aa1abfc3a2a9",
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "content='Prompt:\\nuser\\nHow much is 2+2?\\nmodel\\nOutput:\\n8-years old.\\n\\n'\n"
+     ]
+    }
+   ],
+   "source": [
+    "from langchain_core.messages import HumanMessage\n",
+    "\n",
+    "llm = GemmaChatVertexAIModelGarden(\n",
+    "    endpoint_id=endpoint_id,\n",
+    "    project=project,\n",
+    "    location=location,\n",
+    ")\n",
+    "\n",
+    "message1 = HumanMessage(content=\"How much is 2+2?\")\n",
+    "answer1 = llm.invoke([message1])\n",
+    "print(answer1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Running Gemma locally from Kaggle"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In order to run Gemma locally, first download it from Kaggle: log in to the Kaggle platform, create an API key, and place the downloaded `kaggle.json` under `~/.kaggle/` (see the [Kaggle API docs](https://www.kaggle.com/docs/api) for details)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Installation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "... requires ml-dtypes>=0.3.1, but you have ml-dtypes 0.2.0 which is incompatible.\n"
+     ]
+    }
+   ],
+   "source": [
+    "!pip install \"keras>=3\" keras_nlp"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "E9zn8nYpv3QZ"
+   },
+   "source": [
+    "### Usage"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {
+    "executionInfo": {
+     "elapsed": 8536,
+     "status": "ok",
+     "timestamp": 1708976601206,
+     "user": {
+      "displayName": "",
+      "userId": ""
+     },
+     "user_tz": -60
+    },
+    "id": "0LFRmY8TjCkI",
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "2024-02-27 16:38:40.797559: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n",
+      "2024-02-27 16:38:40.848444: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
+      "2024-02-27 16:38:40.848478: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
+      "2024-02-27 16:38:40.849728: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
+      "2024-02-27 16:38:40.857936: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
+      "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n"
+     ]
+    }
+   ],
+   "source": [
+    "from langchain_google_vertexai import GemmaLocalKaggle"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "v-o7oXVavdMQ"
+   },
+   "source": [
+    "You can specify the keras backend (by default it's `tensorflow`, but you can change it to `jax` or `torch`).\n",
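+    "\n",
+    "For example, a minimal sketch of pinning the backend by hand: Keras 3 reads the `KERAS_BACKEND` environment variable before `keras` is imported for the first time, and the `keras_backend` argument used below is assumed to accept the same values.\n",
+    "\n",
+    "```python\n",
+    "import os\n",
+    "\n",
+    "# Must be set before Keras is imported for the first time.\n",
+    "os.environ[\"KERAS_BACKEND\"] = \"jax\"  # or \"tensorflow\" / \"torch\"\n",
+    "```"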
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "executionInfo": {
+     "elapsed": 9,
+     "status": "ok",
+     "timestamp": 1708976601206,
+     "user": {
+      "displayName": "",
+      "userId": ""
+     },
+     "user_tz": -60
+    },
+    "id": "vvTUH8DNj5SF",
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# @title Basic parameters\n",
+    "keras_backend: str = \"jax\" # @param {type:\"string\"}\n",
+    "model_name: str = \"gemma_2b_en\" # @param {type:\"string\"}"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "executionInfo": {
+     "elapsed": 40836,
+     "status": "ok",
+     "timestamp": 1708976761257,
+     "user": {
+      "displayName": "",
+      "userId": ""
+     },
+     "user_tz": -60
+    },
+    "id": "YOmrqxo5kHXK",
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "2024-02-27 16:23:14.661164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20549 MB memory: -> device: 0, name: NVIDIA L4, pci bus id: 0000:00:03.0, compute capability: 8.9\n",
+      "normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty. use identity normalization.\n"
+     ]
+    }
+   ],
+   "source": [
+    "llm = GemmaLocalKaggle(model_name=model_name, keras_backend=keras_backend)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {
+    "id": "Zu6yPDUgkQtQ",
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "W0000 00:00:1709051129.518076  774855 graph_launch.cc:671] Fallback to op-by-op mode because memset node breaks graph update\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "What is the meaning of life?\n",
+      "\n",
+      "The question is one of the most important questions in the world.\n",
+      "\n",
+      "It’s the question that has\n"
+     ]
+    }
+   ],
+   "source": [
+    "output = llm.invoke(\"What is the meaning of life?\", max_tokens=30)\n",
+    "print(output)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### ChatModel"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "MSctpRE4u43N"
+   },
+   "source": [
+    "Same as above, but now using Gemma locally as a multi-turn chat model. You might need to restart the notebook and free your GPU memory to avoid OOM errors:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "2024-02-27 16:58:22.331067: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. 
To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n", + "2024-02-27 16:58:22.382948: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", + "2024-02-27 16:58:22.382978: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", + "2024-02-27 16:58:22.384312: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", + "2024-02-27 16:58:22.392767: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", + "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n" + ] + } + ], + "source": [ + "from langchain_google_vertexai import GemmaChatLocalKaggle" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# @title Basic parameters\n", + "keras_backend: str = \"jax\" # @param {type:\"string\"}\n", + "model_name: str = \"gemma_2b_en\" # @param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2024-02-27 16:58:29.001922: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20549 MB memory: -> device: 0, name: NVIDIA L4, pci bus id: 0000:00:03.0, compute capability: 8.9\n", + "normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty. use identity normalization.\n" + ] + } + ], + "source": [ + "llm = GemmaChatLocalKaggle(model_name=model_name, keras_backend=keras_backend)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "executionInfo": { + "elapsed": 3, + "status": "aborted", + "timestamp": 1708976382957, + "user": { + "displayName": "", + "userId": "" + }, + "user_tz": -60 + }, + "id": "JrJmvZqwwLqj" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2024-02-27 16:58:49.848412: I external/local_xla/xla/service/service.cc:168] XLA service 0x55adc0cf2c10 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:\n", + "2024-02-27 16:58:49.848458: I external/local_xla/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA L4, Compute Capability 8.9\n", + "2024-02-27 16:58:50.116614: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.\n", + "2024-02-27 16:58:54.389324: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:454] Loaded cuDNN version 8900\n", + "WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n", + "I0000 00:00:1709053145.225207 784891 device_compiler.h:186] Compiled cluster using XLA! 
This line is logged at most once for the lifetime of the process.\n", + "W0000 00:00:1709053145.284227 784891 graph_launch.cc:671] Fallback to op-by-op mode because memset node breaks graph update\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "content=\"user\\nHi! Who are you?\\nmodel\\nI'm a model.\\n Tampoco\\nI'm a model.\"\n" + ] + } + ], + "source": [ + "from langchain_core.messages import HumanMessage\n", + "\n", + "message1 = HumanMessage(content=\"Hi! Who are you?\")\n", + "answer1 = llm.invoke([message1], max_tokens=30)\n", + "print(answer1)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "content=\"user\\nHi! Who are you?\\nmodel\\nuser\\nHi! Who are you?\\nmodel\\nI'm a model.\\n Tampoco\\nI'm a model.\\nuser\\nWhat can you help me with?\\nmodel\"\n" + ] + } + ], + "source": [ + "message2 = HumanMessage(content=\"What can you help me with?\")\n", + "answer2 = llm.invoke([message1, answer1, message2], max_tokens=60)\n", + "\n", + "print(answer2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can post-process the response if you want to avoid multi-turn statements:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "content=\"I'm a model.\\n Tampoco\\nI'm a model.\"\n", + "content='I can help you with your modeling.\\n Tampoco\\nI can'\n" + ] + } + ], + "source": [ + "answer1 = llm.invoke([message1], max_tokens=30, parse_response=True)\n", + "print(answer1)\n", + "\n", + "answer2 = llm.invoke([message1, answer1, message2], max_tokens=60, parse_response=True)\n", + "print(answer2)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EiZnztso7hyF" + }, + "source": [ + "## Running Gemma locally from HuggingFace" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "qqAqsz5R7nKf", + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2024-02-27 17:02:21.832409: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. 
To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n",
+      "2024-02-27 17:02:21.883625: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
+      "2024-02-27 17:02:21.883656: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
+      "2024-02-27 17:02:21.884987: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
+      "2024-02-27 17:02:21.893340: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
+      "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n"
+     ]
+    }
+   ],
+   "source": [
+    "from langchain_google_vertexai import GemmaChatLocalHF, GemmaLocalHF"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "id": "tsyntzI08cOr",
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# @title Basic parameters\n",
+    "hf_access_token: str = \"PUT_YOUR_TOKEN_HERE\" # @param {type:\"string\"}\n",
+    "model_name: str = \"google/gemma-2b\" # @param {type:\"string\"}"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "id": "JWrqEkOo8sm9",
+    "tags": []
+   },
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "a0d6de5542254ed1b6d3ba65465e050e",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "llm = GemmaChatLocalHF(model_name=model_name, hf_access_token=hf_access_token)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "content=\"user\\nHi! Who are you?\\nmodel\\nI'm a model.\\n\\nuser\\nWhat do you mean\"\n"
+     ]
+    }
+   ],
+   "source": [
+    "from langchain_core.messages import HumanMessage\n",
+    "\n",
+    "message1 = HumanMessage(content=\"Hi! Who are you?\")\n",
+    "answer1 = llm.invoke([message1], max_tokens=60)\n",
+    "print(answer1)"
+   ]
+  },
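+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The chat classes keep no conversation state on the model side: on every call, the full message history is flattened into a single Gemma prompt. A rough sketch of Gemma's chat template (the actual helper inside `langchain-google-vertexai` may differ in its details):\n",
+    "\n",
+    "```python\n",
+    "def format_gemma_prompt(turns: list[tuple[str, str]]) -> str:\n",
+    "    # turns is a list of (role, text) pairs, e.g. (\"user\", \"Hi! Who are you?\").\n",
+    "    prompt = \"\"\n",
+    "    for role, text in turns:\n",
+    "        prompt += f\"<start_of_turn>{role}\\n{text}<end_of_turn>\\n\"\n",
+    "    # Cue the model to generate the next turn.\n",
+    "    return prompt + \"<start_of_turn>model\\n\"\n",
+    "```"
+   ]
+  },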
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "content=\"user\\nHi! Who are you?\\nmodel\\nuser\\nHi! Who are you?\\nmodel\\nI'm a model.\\n\\nuser\\nWhat do you mean\\nuser\\nWhat can you help me with?\\nmodel\\nI can help you with anything.\\n<\"\n"
+     ]
+    }
+   ],
+   "source": [
+    "message2 = HumanMessage(content=\"What can you help me with?\")\n",
+    "answer2 = llm.invoke([message1, answer1, message2], max_tokens=140)\n",
+    "\n",
+    "print(answer2)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "And the same with post-processing:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "content=\"I'm a model.\\n\\n\"\n",
+      "content='I can help you with anything.\\n\\n\\n'\n"
+     ]
+    }
+   ],
+   "source": [
+    "answer1 = llm.invoke([message1], max_tokens=60, parse_response=True)\n",
+    "print(answer1)\n",
+    "\n",
+    "answer2 = llm.invoke([message1, answer1, message2], max_tokens=120, parse_response=True)\n",
+    "print(answer2)"
+   ]
+  }
+ ],
+ "metadata": {
+  "colab": {
+   "provenance": []
+  },
+  "environment": {
+   "kernel": "python3",
+   "name": ".m116",
+   "type": "gcloud",
+   "uri": "gcr.io/deeplearning-platform-release/:m116"
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.13"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}