langchain/docs/use_cases/evaluation/evaluating_traced_examples.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "1a4596ea-a631-416d-a2a4-3577c140493d",
   "metadata": {},
   "source": [
    "# Running Chains on Traced Datasets\n",
    "\n",
    "Developing applications with language models can be uniquely challenging. To manage this complexity and ensure reliable performance, LangChain provides tracing and evaluation functionality through . This notebook demonstrates how to run Chains, which are language model functions, on previously captured datasets or traces. Some common use cases for this approach include:\n",
    "\n",
    "- Running an evaluation chain to grade previous runs.\n",
    "- Comparing different chains, LLMs, and agents on traced datasets.\n",
    "- Executing a stochastic chain multiple times over a dataset to generate metrics before deployment.\n",
    "\n",
    "Please note that this notebook assumes you have LangChain+ tracing running in the background. It is also configured to work only with the V2 endpoints. To set it up, follow the [tracing directions here](..\\/..\\/tracing\\/local_installation.md).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "904db9a5-f387-4a57-914c-c8af8d39e249",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.client import LangChainPlusClient\n",
    "\n",
    "client = LangChainPlusClient(\n",
    "    api_url=\"http://localhost:8000\",\n",
    "    api_key=None,\n",
    "    # tenant_id=\"your_tenant_uuid\",  # This is required when connecting to a hosted LangChain instance\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "db79dea2-fbaa-4c12-9083-f6154b51e2d3",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Seed an example dataset\n",
    "\n",
    "If you have been using LangChainPlus already, you may have datasets available. To view all saved datasets, run:\n",
    "\n",
    "```\n",
    "datasets = client.list_datasets()\n",
    "datasets\n",
    "```\n",
    "Datasets can be created in a number of ways, most often by collecting `Run`'s captured through the LangChain tracing API.\n",
    "\n",
    "However, this notebook assumes you're running locally for the first time, so we'll start by uploading an example evaluation dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "1baa677c-5642-4378-8e01-3aa1647f19d6",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# !pip install datasets > /dev/null\n",
    "# !pip install pandas > /dev/null"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "60d14593-c61f-449f-a38f-772ca43707c2",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Found cached dataset json (/Users/wfh/.cache/huggingface/datasets/LangChainDatasets___json/LangChainDatasets--agent-search-calculator-8a025c0ce5fb99d2/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4)\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "60be643da25c47c6aa729b37046a2e64",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>input</th>\n",
       "      <th>output</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>How many people live in canada as of 2023?</td>\n",
       "      <td>approximately 38,625,801</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>who is dua lipa's boyfriend? what is his age r...</td>\n",
       "      <td>her boyfriend is Romain Gravas. his age raised...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>what is dua lipa's boyfriend age raised to the...</td>\n",
       "      <td>her boyfriend is Romain Gravas. his age raised...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>how far is it from paris to boston in miles</td>\n",
       "      <td>approximately 3,435 mi</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>what was the total number of points scored in ...</td>\n",
       "      <td>approximately 2.682651500990882</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               input  \\\n",
       "0         How many people live in canada as of 2023?   \n",
       "1  who is dua lipa's boyfriend? what is his age r...   \n",
       "2  what is dua lipa's boyfriend age raised to the...   \n",
       "3        how far is it from paris to boston in miles   \n",
       "4  what was the total number of points scored in ...   \n",
       "\n",
       "                                              output  \n",
       "0                           approximately 38,625,801  \n",
       "1  her boyfriend is Romain Gravas. his age raised...  \n",
       "2  her boyfriend is Romain Gravas. his age raised...  \n",
       "3                             approximately 3,435 mi  \n",
       "4                    approximately 2.682651500990882  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "from langchain.evaluation.loading import load_dataset\n",
    "\n",
    "dataset = load_dataset(\"agent-search-calculator\")\n",
    "df = pd.DataFrame(dataset, columns=[\"question\", \"answer\"])\n",
    "df.columns = [\"input\", \"output\"] # The chain we want to evaluate below expects inputs with the \"input\" key \n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "f27e45f1-e299-4de8-a538-ee1272ac5024",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "dataset_name = f\"calculator-example-dataset\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "52a7ea76-79ca-4765-abf7-231e884040d6",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "if dataset_name not in set([dataset.name for dataset in client.list_datasets()]):\n",
    "    dataset = client.upload_dataframe(df, \n",
    "                            name=dataset_name,\n",
    "                            description=\"Acalculator example dataset\",\n",
    "                            input_keys=[\"input\"],\n",
    "                            output_keys=[\"output\"],\n",
    "                   )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07885b10",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Running a Chain on a Traced Dataset\n",
    "\n",
    "Once you have a dataset, you can run a chain over it to see its results. The run traces will automatically be associated with the dataset for easy attribution and analysis.\n",
    "\n",
    "**First, we'll define the chain we wish to run over the dataset.**\n",
    "\n",
    "In this case, we're using an agent, but it can be any simple chain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "c2b59104-b90e-466a-b7ea-c5bd0194263b",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.chat_models import ChatOpenAI\n",
    "from langchain.agents import initialize_agent, load_tools\n",
    "from langchain.agents import AgentType\n",
    "\n",
    "llm = ChatOpenAI(temperature=0)\n",
    "tools = load_tools(['serpapi', 'llm-math'], llm=llm)\n",
    "agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "84094a4a-1d76-461c-bc37-8c537939b466",
   "metadata": {},
   "source": [
    "**Now we're ready to run the chain!**\n",
    "\n",
    "The docstring below hints out ways you can configure the method to run."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "112d7bdf-7e50-4c1a-9285-5bac8473f2ee",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\u001b[0;31mSignature:\u001b[0m\n",
       "\u001b[0mclient\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0marun_on_dataset\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0mdataset_name\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'str'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0mllm_or_chain\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'Union[Chain, BaseLanguageModel]'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0mnum_workers\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'int'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0mnum_repetitions\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'int'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0msession_name\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'Optional[str]'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0mverbose\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'bool'\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m->\u001b[0m \u001b[0;34m'Dict[str, Any]'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
       "\u001b[0;31mDocstring:\u001b[0m\n",
       "Run the chain on a dataset and store traces to the specified session name.\n",
       "\n",
       "Args:\n",
       "    dataset_name: Name of the dataset to run the chain on.\n",
       "    llm_or_chain: Chain or language model to run over the dataset.\n",
       "    num_workers: Number of async workers to run in parallel.\n",
       "    num_repetitions: Number of times to run the model on each example.\n",
       "        This is useful when testing success rates or generating confidence\n",
       "        intervals.\n",
       "    session_name: Name of the session to store the traces in.\n",
       "        Defaults to {dataset_name}-{chain class name}-{datetime}.\n",
       "    verbose: Whether to print progress.\n",
       "\n",
       "Returns:\n",
       "    A dictionary mapping example ids to the model outputs.\n",
       "\u001b[0;31mFile:\u001b[0m      ~/code/lc/lckg/langchain/client/langchain.py\n",
       "\u001b[0;31mType:\u001b[0m      method"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "?client.arun_on_dataset"
   ]
  },
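  {
   "cell_type": "markdown",
   "id": "6c1e2f47-9a3b-4d85-b0e4-2f7a91c3d5e8",
   "metadata": {},
   "source": [
    "The docstring notes that `num_repetitions` is useful when testing success rates or generating confidence intervals. As a minimal sketch of what that could look like, the snippet below computes a Wilson score interval for a success rate. The function name `wilson_interval` and the counts (7 successes in 10 runs) are hypothetical, and the Wilson interval is only one common choice; nothing here is computed by the client itself.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def wilson_interval(successes, trials, z=1.96):\n",
    "    '''Return a Wilson score interval (z=1.96 gives roughly 95% coverage).'''\n",
    "    if trials <= 0:\n",
    "        raise ValueError('trials must be positive')\n",
    "    p = successes / trials\n",
    "    denom = 1 + z**2 / trials\n",
    "    center = (p + z**2 / (2 * trials)) / denom\n",
    "    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom\n",
    "    return center - half_width, center + half_width\n",
    "\n",
    "\n",
    "# Hypothetical tally: 7 of 10 repeated runs produced an acceptable answer.\n",
    "low, high = wilson_interval(successes=7, trials=10)\n",
    "print(f'success rate 0.70, interval ({low:.3f}, {high:.3f})')\n",
    "```\n",
    "\n",
    "With so few repetitions the interval is wide, which is a useful reminder to raise `num_repetitions` before drawing conclusions about a stochastic chain."
   ]
  },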
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "a8088b7d-3ab6-4279-94c8-5116fe7cee33",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/wfh/code/lc/lckg/langchain/callbacks/manager.py:65: UserWarning: The experimental tracing v2 is in development. This is not yet stable and may change in the future.\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processed examples: 1\r"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Chain failed for example 4aca2037-5393-4357-9bf9-ef7451b8f0e3. Error: unknown format from LLM: Assuming we don't have any information about the actual number of points scored in the 2023 super bowl, we cannot provide a mathematical expression to solve this problem.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processed examples: 2\r"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Chain failed for example b0adfea5-1212-4cee-9206-11bd69ccfcc5. Error: 'age'. Please try again with a valid numerical expression\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processed examples: 5\r"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Chain failed for example 7e5873e4-ed38-428b-909f-f8c0fb80bab9. Error: invalid syntax. Perhaps you forgot a comma? (<expr>, line 1). Please try again with a valid numerical expression\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processed examples: 10\r"
     ]
    }
   ],
   "source": [
    "chain_results = await client.arun_on_dataset(\n",
    "    dataset_name=dataset_name,\n",
    "    llm_or_chain=agent,\n",
    "    verbose=True\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d2737458-b20c-4288-8790-1f4a8d237b2a",
   "metadata": {},
   "source": [
    "## Reviewing the Chain Results\n",
    "\n",
    "The method called above returns a dictionary mapping Example IDs to the output of the chain.\n",
    "You can directly inspect the results below."
   ]
  },
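  {
   "cell_type": "markdown",
   "id": "a1f0c3d2-7b4e-4c58-9e21-5d6f8a0b1c2d",
   "metadata": {},
   "source": [
    "As a rough sketch of that kind of inspection, the snippet below tallies successes and failures in a results dictionary. It assumes (this is not stated in the docstring above) that each value is a list with one entry per repetition and that a failed run is recorded as a dictionary containing an `Error` key. `summarize_results` and `sample_results` are hypothetical names with made-up data, so adapt the check to whatever structure `chain_results` actually has.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "\n",
    "def summarize_results(results):\n",
    "    '''Count successful and failed runs in a results dictionary.\n",
    "\n",
    "    Assumption: each value is a list of per-repetition outputs, and a\n",
    "    failed run is stored as a dict with an 'Error' key.\n",
    "    '''\n",
    "    counts = Counter()\n",
    "    for outputs in results.values():\n",
    "        for output in outputs:\n",
    "            failed = isinstance(output, dict) and 'Error' in output\n",
    "            counts['failed' if failed else 'succeeded'] += 1\n",
    "    return dict(counts)\n",
    "\n",
    "\n",
    "# Hypothetical data shaped like the assumed return value.\n",
    "sample_results = {\n",
    "    'example-1': [{'output': '42'}],\n",
    "    'example-2': [{'Error': 'invalid syntax'}],\n",
    "    'example-3': [{'output': '3,435 mi'}],\n",
    "}\n",
    "summary = summarize_results(sample_results)\n",
    "print(summary)\n",
    "```\n",
    "\n",
    "If the assumption holds for your version of the client, calling `summarize_results(chain_results)` would give a quick failure count before you open the UI."
   ]
  },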
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "136db492-d6ca-4215-96f9-439c23538241",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<a href=\"http://localhost\", target=\"_blank\" rel=\"noopener\">LangChain+ Client</a>"
      ],
      "text/plain": [
       "LangChainPlusClient (API URL: http://localhost:8000)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# You can navigate to the UI by clicking on the link below\n",
    "client"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c70cceb5-aa53-4851-bb12-386f092191f9",
   "metadata": {},
   "source": [
    "### Running a Chat Model over a Traced Dataset\n",
    "\n",
    "We've shown how to run a _chain_ over a dataset, but you can also run an LLM or Chat model over a datasets formed from runs.\n",
    "\n",
    "First, we'll show an example using a ChatModel. This is useful for things like:\n",
    "- Comparing results under different decoding parameters\n",
    "- Comparing model providers\n",
    "- Testing for regressions in model behavior\n",
    "- Running multiple times with a temperature to gauge stability"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "64490d7c-9a18-49ed-a3ac-36049c522cb4",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Found cached dataset parquet (/Users/wfh/.cache/huggingface/datasets/LangChainDatasets___parquet/LangChainDatasets--two-player-dnd-2e84407830cdedfc/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "35a85ea78f934758a5f8b16af0d1944a",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>prompts</th>\n",
       "      <th>generations</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>[System: Here is the topic for a Dungeons &amp; Dr...</td>\n",
       "      <td>[[{'generation_info': None, 'message': {'conte...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>[System: Here is the topic for a Dungeons &amp; Dr...</td>\n",
       "      <td>[[{'generation_info': None, 'message': {'conte...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>[System: Here is the topic for a Dungeons &amp; Dr...</td>\n",
       "      <td>[[{'generation_info': None, 'message': {'conte...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>[System: Here is the topic for a Dungeons &amp; Dr...</td>\n",
       "      <td>[[{'generation_info': None, 'message': {'conte...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>[System: Here is the topic for a Dungeons &amp; Dr...</td>\n",
       "      <td>[[{'generation_info': None, 'message': {'conte...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                             prompts  \\\n",
       "0  [System: Here is the topic for a Dungeons & Dr...   \n",
       "1  [System: Here is the topic for a Dungeons & Dr...   \n",
       "2  [System: Here is the topic for a Dungeons & Dr...   \n",
       "3  [System: Here is the topic for a Dungeons & Dr...   \n",
       "4  [System: Here is the topic for a Dungeons & Dr...   \n",
       "\n",
       "                                         generations  \n",
       "0  [[{'generation_info': None, 'message': {'conte...  \n",
       "1  [[{'generation_info': None, 'message': {'conte...  \n",
       "2  [[{'generation_info': None, 'message': {'conte...  \n",
       "3  [[{'generation_info': None, 'message': {'conte...  \n",
       "4  [[{'generation_info': None, 'message': {'conte...  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chat_dataset = load_dataset(\"two-player-dnd\")\n",
    "chat_df = pd.DataFrame(chat_dataset)\n",
    "chat_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "348acd86-a927-4d60-8d52-02e64585e4fc",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "chat_dataset_name = \"two-player-dnd\"\n",
    "\n",
    "if chat_dataset_name not in set([dataset.name for dataset in client.list_datasets()]):\n",
    "    client.upload_dataframe(chat_df, \n",
    "                            name=chat_dataset_name,\n",
    "                            description=\"An example dataset traced from chat models in a multiagent bidding dialogue\",\n",
    "                            input_keys=[\"prompts\"],\n",
    "                            output_keys=[\"generations\"],\n",
    "                   )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "a69dd183-ad5e-473d-b631-db90706e837f",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.chat_models import ChatOpenAI\n",
    "\n",
    "chat_model = ChatOpenAI(temperature=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "063da2a9-3692-4b7b-8edb-e474824fe416",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/wfh/code/lc/lckg/langchain/callbacks/manager.py:65: UserWarning: The experimental tracing v2 is in development. This is not yet stable and may change in the future.\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processed examples: 35\r"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Retrying langchain.chat_models.openai.acompletion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID 2a718b30703cc8ffef8de0bb48935fd6 in your message.).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processed examples: 36\r"
     ]
    }
   ],
   "source": [
    "chat_model_results = await client.arun_on_dataset(\n",
    "    dataset_name=chat_dataset_name,\n",
    "    llm_or_chain=chat_model,\n",
    "    num_workers=5, # Optional, sets the number of examples to run at a time\n",
    "    # session_name=\"Calculator Dataset Runs\", # Optional. Will be seed with a default session otherwise\n",
    "    verbose=True\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de7bfe08-215c-4328-b9b0-631d9a41f0e8",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Reviewing the Chat Model Results\n",
    "\n",
    "You can once again review the latest runs by clicking on the link below and navigating to the \"two-player-dnd\" session."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "5b7a81f2-d19d-438b-a4bb-5678f746b965",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<a href=\"http://localhost\", target=\"_blank\" rel=\"noopener\">LangChain+ Client</a>"
      ],
      "text/plain": [
       "LangChainPlusClient (API URL: http://localhost:8000)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "client"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7896cbeb-345f-430b-ab5e-e108973174f8",
   "metadata": {},
   "source": [
    "## Running an LLM over a Traced Dataset\n",
    "\n",
    "You can run an LLM over a dataset in much the same way as the chain and chat models, provided the dataset you've captured is in the appropriate format. Again, we've cached one for you, but using application-specific traces will be much more useful for your use cases."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "d6805d0b-4612-4671-bffb-e6978992bd40",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/wfh/code/lc/lckg/langchain/llms/anthropic.py:134: UserWarning: This Anthropic LLM is deprecated. Please use `from langchain.chat_models import ChatAnthropic` instead\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "from langchain.llms import Anthropic\n",
    "\n",
    "llm = Anthropic(temperature=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "5d7cb243-40c3-44dd-8158-a7b910441e9f",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Found cached dataset parquet (/Users/wfh/.cache/huggingface/datasets/LangChainDatasets___parquet/LangChainDatasets--state-of-the-union-completions-ae7542e7bbd0ae0a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "c8628ee0ce5b4924be636250b18e9803",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/1 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>prompts</th>\n",
       "      <th>generations</th>\n",
       "      <th>ground_truth</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>[Putin may circle Kyiv with tanks, but he will...</td>\n",
       "      <td>[[{'generation_info': {'finish_reason': 'stop'...</td>\n",
       "      <td>The pandemic has been punishing. \\n\\nAnd so ma...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>[Madam Speaker, Madam Vice President, our Firs...</td>\n",
       "      <td>[[]]</td>\n",
       "      <td>With a duty to one another to the American peo...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>[With a duty to one another to the American pe...</td>\n",
       "      <td>[[{'generation_info': {'finish_reason': 'stop'...</td>\n",
       "      <td>He thought he could roll into Ukraine and the ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>[Madam Speaker, Madam Vice President, our Firs...</td>\n",
       "      <td>[[{'generation_info': {'finish_reason': 'lengt...</td>\n",
       "      <td>With a duty to one another to the American peo...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>[Please rise if you are able and show that, Ye...</td>\n",
       "      <td>[[]]</td>\n",
       "      <td>And the costs and the threats to America and t...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                             prompts  \\\n",
       "0  [Putin may circle Kyiv with tanks, but he will...   \n",
       "1  [Madam Speaker, Madam Vice President, our Firs...   \n",
       "2  [With a duty to one another to the American pe...   \n",
       "3  [Madam Speaker, Madam Vice President, our Firs...   \n",
       "4  [Please rise if you are able and show that, Ye...   \n",
       "\n",
       "                                         generations  \\\n",
       "0  [[{'generation_info': {'finish_reason': 'stop'...   \n",
       "1                                               [[]]   \n",
       "2  [[{'generation_info': {'finish_reason': 'stop'...   \n",
       "3  [[{'generation_info': {'finish_reason': 'lengt...   \n",
       "4                                               [[]]   \n",
       "\n",
       "                                        ground_truth  \n",
       "0  The pandemic has been punishing. \\n\\nAnd so ma...  \n",
       "1  With a duty to one another to the American peo...  \n",
       "2  He thought he could roll into Ukraine and the ...  \n",
       "3  With a duty to one another to the American peo...  \n",
       "4  And the costs and the threats to America and t...  "
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "completions_dataset = load_dataset(\"state-of-the-union-completions\")\n",
    "completions_df = pd.DataFrame(completions_dataset)\n",
    "completions_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "c7dcc1b2-7aef-44c0-ba0f-c812279099a5",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "completions_dataset_name = \"state-of-the-union-completions\"\n",
    "\n",
    "if completions_dataset_name not in set([dataset.name for dataset in client.list_datasets()]):\n",
    "    client.upload_dataframe(completions_df, \n",
    "                            name=completions_dataset_name,\n",
    "                            description=\"An example dataset traced from completion endpoints over the state of the union address\",\n",
    "                            input_keys=[\"prompts\"],\n",
    "                            output_keys=[\"generations\"],\n",
    "                   )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "e946138e-bf7c-43d7-861d-9c5740c933fa",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/wfh/code/lc/lckg/langchain/callbacks/manager.py:65: UserWarning: The experimental tracing v2 is in development. This is not yet stable and may change in the future.\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processed examples: 55\r"
     ]
    }
   ],
   "source": [
    "completions_model_results = await client.arun_on_dataset(\n",
    "    dataset_name=completions_dataset_name,\n",
    "    llm_or_chain=llm,\n",
    "    num_workers=2, # Optional, sets the number of examples to run at a time\n",
    "    num_repetitions=1, # Increasing this will run the model multiple times per example\n",
    "    verbose=True\n",
    ")"
   ]
  },
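  {
   "cell_type": "markdown",
   "id": "e4b7d9a0-3c52-4f1e-8a66-0b9d2c7f51a3",
   "metadata": {},
   "source": [
    "Raising `num_repetitions` above 1 lets you gauge how stable a model's outputs are. As a rough sketch, the snippet below measures the fraction of examples whose repeated outputs all agree. The helper `agreement_rate` and the sample outputs are hypothetical, and the sketch assumes you have already reduced each run's output to a plain string.\n",
    "\n",
    "```python\n",
    "def agreement_rate(outputs_by_example):\n",
    "    '''Return the fraction of examples whose repeated outputs are all identical.'''\n",
    "    if not outputs_by_example:\n",
    "        return 0.0\n",
    "    stable = sum(\n",
    "        1 for outputs in outputs_by_example.values() if len(set(outputs)) == 1\n",
    "    )\n",
    "    return stable / len(outputs_by_example)\n",
    "\n",
    "\n",
    "# Hypothetical outputs from three repetitions per example.\n",
    "sample_outputs = {\n",
    "    'example-1': ['4', '4', '4'],\n",
    "    'example-2': ['Paris', 'Paris', 'Lyon'],\n",
    "}\n",
    "rate = agreement_rate(sample_outputs)\n",
    "print(rate)\n",
    "```\n",
    "\n",
    "Exact string equality is a strict criterion; for free-form completions you may prefer a looser comparison, such as normalizing whitespace and case first."
   ]
  },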
  {
   "cell_type": "markdown",
   "id": "cc86e8e6-cee2-429e-942b-289284d14816",
   "metadata": {},
   "source": [
    "## Reviewing the LLM Results\n",
    "\n",
    "You can once again inspect the latest runs by clicking on the link below and navigating to the \"two-player-dnd\" session."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "2bf96f17-74c1-4f7d-8458-ae5ab5c6bd36",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<a href=\"http://localhost\", target=\"_blank\" rel=\"noopener\">LangChain+ Client</a>"
      ],
      "text/plain": [
       "LangChainPlusClient (API URL: http://localhost:8000)"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bbafa5a5-a278-42c0-a24f-191cbcb41156",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}