community[patch]: Update UpTrain Callback Handler to support the new UpTrain evaluation schema (#21656)
UpTrain has a new dashboard that makes it easier to view projects and evaluations. Using it requires specifying both `project_name` and `evaluation_name` when performing evaluations, so I have updated the code to support this.
parent c0e3c3a350
commit d4359d3de6
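In short, the handler moves from prefix-derived, per-eval dashboard projects to a single project containing named evaluations. A comment-only sketch of the naming change, with the values taken from the diffs below:

```python
# Before this commit: one dashboard project per eval type, derived from a prefix.
#   project_name_prefix="langchain"  ->  projects "langchain_rag",
#                                        "langchain_multi_query",
#                                        "langchain_context_reranking"
#
# After this commit: a single project holding named evaluations.
#   project_name="langchain"  ->  evaluations "rag", "multi_query", and
#                                 "context_reranking", all under the one
#                                 "langchain" project
```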
docs/docs/integrations/callbacks/uptrain.ipynb:

@@ -58,7 +58,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 22,
+"execution_count": 1,
 "metadata": {},
 "outputs": [
 {
@@ -100,7 +100,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 23,
+"execution_count": 2,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -131,7 +131,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 24,
+"execution_count": 3,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -148,7 +148,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 25,
+"execution_count": 4,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -165,7 +165,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 26,
+"execution_count": 5,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -183,7 +183,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 27,
+"execution_count": 6,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -194,55 +194,69 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Set the openai API key\n",
-"This key is required to perform the evaluations. UpTrain uses the GPT models to evaluate the responses generated by the LLM."
-]
-},
-{
-"cell_type": "code",
-"execution_count": 28,
-"metadata": {},
-"outputs": [],
-"source": [
-"OPENAI_API_KEY = getpass()"
+"## Setup\n",
+"\n",
+"UpTrain provides you with:\n",
+"1. Dashboards with advanced drill-down and filtering options\n",
+"1. Insights and common topics among failing cases\n",
+"1. Observability and real-time monitoring of production data\n",
+"1. Regression testing via seamless integration with your CI/CD pipelines\n",
+"\n",
+"You can choose between the following options for evaluating using UpTrain:\n",
+"### 1. **UpTrain's Open-Source Software (OSS)**: \n",
+"You can use the open-source evaluation service to evaluate your model. In this case, you will need to provide an OpenAI API key. UpTrain uses the GPT models to evaluate the responses generated by the LLM. You can get yours [here](https://platform.openai.com/account/api-keys).\n",
+"\n",
+"In order to view your evaluations in the UpTrain dashboard, you will need to set it up by running the following commands in your terminal:\n",
+"\n",
+"```bash\n",
+"git clone https://github.com/uptrain-ai/uptrain\n",
+"cd uptrain\n",
+"bash run_uptrain.sh\n",
+"```\n",
+"\n",
+"This will start the UpTrain dashboard on your local machine. You can access it at `http://localhost:3000/dashboard`.\n",
+"\n",
+"Parameters:\n",
+"- key_type=\"openai\"\n",
+"- api_key=\"OPENAI_API_KEY\"\n",
+"- project_name=\"PROJECT_NAME\"\n",
+"\n",
+"\n",
+"### 2. **UpTrain Managed Service and Dashboards**:\n",
+"Alternatively, you can use UpTrain's managed service to evaluate your model. You can create a free UpTrain account [here](https://uptrain.ai/) and get free trial credits. If you want more trial credits, [book a call with the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min).\n",
+"\n",
+"The benefits of using the managed service are:\n",
+"1. No need to set up the UpTrain dashboard on your local machine.\n",
+"1. Access to many LLMs without needing their API keys.\n",
+"\n",
+"Once you perform the evaluations, you can view them in the UpTrain dashboard at `https://dashboard.uptrain.ai/dashboard`\n",
+"\n",
+"Parameters:\n",
+"- key_type=\"uptrain\"\n",
+"- api_key=\"UPTRAIN_API_KEY\"\n",
+"- project_name=\"PROJECT_NAME\"\n",
+"\n",
+"\n",
+"**Note:** The `project_name` will be the project name under which the evaluations performed will be shown in the UpTrain dashboard."
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Setup\n",
+"## Set the API key\n",
 "\n",
-"For each of the retrievers below, it is better to define the callback handler again to avoid interference. You can choose between the following options for evaluating using UpTrain:\n",
-"\n",
-"### 1. **UpTrain's Open-Source Software (OSS)**: \n",
-"You can use the open-source evaluation service to evaluate your model.\n",
-"In this case, you will need to provie an OpenAI API key. You can get yours [here](https://platform.openai.com/account/api-keys).\n",
-"\n",
-"Parameters:\n",
-"- key_type=\"openai\"\n",
-"- api_key=\"OPENAI_API_KEY\"\n",
-"- project_name_prefix=\"PROJECT_NAME_PREFIX\"\n",
-"\n",
-"\n",
-"### 2. **UpTrain Managed Service and Dashboards**: \n",
-"You can create a free UpTrain account [here](https://uptrain.ai/) and get free trial credits. If you want more trial credits, [book a call with the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min).\n",
-"\n",
-"UpTrain Managed service provides:\n",
-"1. Dashboards with advanced drill-down and filtering options\n",
-"1. Insights and common topics among failing cases\n",
-"1. Observability and real-time monitoring of production data\n",
-"1. Regression testing via seamless integration with your CI/CD pipelines\n",
-"\n",
-"The notebook contains some screenshots of the dashboards and the insights that you can get from the UpTrain managed service.\n",
-"\n",
-"Parameters:\n",
-"- key_type=\"uptrain\"\n",
-"- api_key=\"UPTRAIN_API_KEY\"\n",
-"- project_name_prefix=\"PROJECT_NAME_PREFIX\"\n",
-"\n",
-"\n",
-"**Note:** The `project_name_prefix` will be used as prefix for the project names in the UpTrain dashboard. These will be different for different types of evals. For example, if you set project_name_prefix=\"langchain\" and perform the multi_query evaluation, the project name will be \"langchain_multi_query\"."
+"The notebook will prompt you to enter the API key. You can choose between the OpenAI API key or the UpTrain API key by changing the `key_type` parameter in the cell below."
 ]
 },
 {
+"cell_type": "code",
+"execution_count": 7,
+"metadata": {},
+"outputs": [],
+"source": [
+"KEY_TYPE = \"openai\" # or \"uptrain\"\n",
+"API_KEY = getpass()"
+]
+},
+{
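For reference, a minimal sketch of the two configurations described in the rewritten setup text, using the placeholder values from the cells above (the import path is the one the notebook uses):

```python
from langchain_community.callbacks.uptrain_callback import UpTrainCallbackHandler

# Option 1 (OSS): evaluate locally with an OpenAI key; results appear in the
# self-hosted dashboard at http://localhost:3000/dashboard once it is running.
oss_callback = UpTrainCallbackHandler(
    key_type="openai",
    api_key="OPENAI_API_KEY",  # placeholder; use a real OpenAI key
    project_name="PROJECT_NAME",  # placeholder project name
)

# Option 2 (managed): evaluate on UpTrain's servers; results appear at
# https://dashboard.uptrain.ai/dashboard.
managed_callback = UpTrainCallbackHandler(
    key_type="uptrain",
    api_key="UPTRAIN_API_KEY",  # placeholder; use a real UpTrain key
    project_name="PROJECT_NAME",
)
```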
@@ -264,7 +278,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 29,
+"execution_count": 8,
 "metadata": {},
 "outputs": [
 {
@@ -306,7 +320,7 @@
 ")\n",
 "\n",
 "# Create the uptrain callback handler\n",
-"uptrain_callback = UpTrainCallbackHandler(key_type=\"openai\", api_key=OPENAI_API_KEY)\n",
+"uptrain_callback = UpTrainCallbackHandler(key_type=KEY_TYPE, api_key=API_KEY)\n",
 "config = {\"callbacks\": [uptrain_callback]}\n",
 "\n",
 "# Invoke the chain with a query\n",
@@ -328,7 +342,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 30,
+"execution_count": 9,
 "metadata": {},
 "outputs": [
 {
@@ -380,7 +394,7 @@
 "multi_query_retriever = MultiQueryRetriever.from_llm(retriever=retriever, llm=llm)\n",
 "\n",
 "# Create the uptrain callback\n",
-"uptrain_callback = UpTrainCallbackHandler(key_type=\"openai\", api_key=OPENAI_API_KEY)\n",
+"uptrain_callback = UpTrainCallbackHandler(key_type=KEY_TYPE, api_key=API_KEY)\n",
 "config = {\"callbacks\": [uptrain_callback]}\n",
 "\n",
 "# Create the RAG prompt\n",
@@ -415,7 +429,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 31,
+"execution_count": 10,
 "metadata": {},
 "outputs": [
 {
@@ -470,13 +484,24 @@
 "chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)\n",
 "\n",
 "# Create the uptrain callback\n",
-"uptrain_callback = UpTrainCallbackHandler(key_type=\"openai\", api_key=OPENAI_API_KEY)\n",
+"uptrain_callback = UpTrainCallbackHandler(key_type=KEY_TYPE, api_key=API_KEY)\n",
 "config = {\"callbacks\": [uptrain_callback]}\n",
 "\n",
 "# Invoke the chain with a query\n",
 "query = \"What did the president say about Ketanji Brown Jackson\"\n",
 "result = chain.invoke(query, config=config)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"# UpTrain's Dashboard and Insights\n",
+"\n",
+"Here's a short video showcasing the dashboard and the insights:\n",
+"\n",
+""
+]
+}
 ],
 "metadata": {
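Pulling the changed notebook lines together, each retriever section now follows the same pattern. A sketch, assuming `llm`, `compression_retriever`, `KEY_TYPE`, and `API_KEY` are defined as in the earlier notebook cells:

```python
from langchain.chains import RetrievalQA
from langchain_community.callbacks.uptrain_callback import UpTrainCallbackHandler

# One handler per section, configured once from KEY_TYPE / API_KEY.
uptrain_callback = UpTrainCallbackHandler(key_type=KEY_TYPE, api_key=API_KEY)
config = {"callbacks": [uptrain_callback]}

# The chain is invoked with the callback attached; the handler scores the
# retrieved context and the generated response, then logs them to UpTrain.
chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)
query = "What did the president say about Ketanji Brown Jackson"
result = chain.invoke(query, config=config)
```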
libs/community/langchain_community/callbacks/uptrain_callback.py:

@@ -83,10 +83,10 @@ class UpTrainDataSchema:
     """The UpTrain data schema for tracking evaluation results.
 
     Args:
-        project_name_prefix (str): Prefix for the project name.
+        project_name (str): The project name to be shown in UpTrain dashboard.
 
     Attributes:
-        project_name_prefix (str): Prefix for the project name.
+        project_name (str): The project name to be shown in UpTrain dashboard.
         uptrain_results (DefaultDict[str, Any]): Dictionary to store evaluation results.
         eval_types (Set[str]): Set to store the types of evaluations.
         query (str): Query for the RAG evaluation.
@@ -101,10 +101,10 @@ class UpTrainDataSchema:
 
     """
 
-    def __init__(self, project_name_prefix: str) -> None:
+    def __init__(self, project_name: str) -> None:
         """Initialize the UpTrain data schema."""
         # For tracking project name and results
-        self.project_name_prefix: str = project_name_prefix
+        self.project_name: str = project_name
         self.uptrain_results: DefaultDict[str, Any] = defaultdict(list)
 
         # For tracking event types
@@ -130,7 +130,7 @@ class UpTrainCallbackHandler(BaseCallbackHandler):
     """Callback Handler that logs evaluation results to uptrain and the console.
 
     Args:
-        project_name_prefix (str): Prefix for the project name.
+        project_name (str): The project name to be shown in UpTrain dashboard.
         key_type (str): Type of key to use. Must be 'uptrain' or 'openai'.
         api_key (str): API key for the UpTrain or OpenAI API.
             (This key is required to perform evaluations using GPT.)
@@ -144,7 +144,7 @@ class UpTrainCallbackHandler(BaseCallbackHandler):
     def __init__(
         self,
         *,
-        project_name_prefix: str = "langchain",
+        project_name: str = "langchain",
         key_type: str = "openai",
         api_key: str = "sk-****************",  # The API key to use for evaluation
         model: str = "gpt-3.5-turbo",  # The model to use for evaluation
@@ -158,7 +158,7 @@ class UpTrainCallbackHandler(BaseCallbackHandler):
         self.log_results = log_results
 
         # Set uptrain variables
-        self.schema = UpTrainDataSchema(project_name_prefix=project_name_prefix)
+        self.schema = UpTrainDataSchema(project_name=project_name)
         self.first_score_printed_flag = False
 
         if key_type == "uptrain":
@@ -166,7 +166,7 @@ class UpTrainCallbackHandler(BaseCallbackHandler):
             self.uptrain_client = uptrain.APIClient(settings=settings)
         elif key_type == "openai":
             settings = uptrain.Settings(
-                openai_api_key=api_key, evaluate_locally=False, model=model
+                openai_api_key=api_key, evaluate_locally=True, model=model
             )
             self.uptrain_client = uptrain.EvalLLM(settings=settings)
         else:
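Note the flip from `evaluate_locally=False` to `True` for the `openai` key type: with an OpenAI key, scoring now runs on your machine via `uptrain.EvalLLM` rather than on UpTrain's servers, matching the new OSS/self-hosted-dashboard story in the notebook. A sketch of the two client paths; the `uptrain_access_token` field on the managed path is my assumption, since it is not shown in this hunk:

```python
import uptrain

# Managed service ("uptrain" key type): evaluations run on UpTrain's servers.
settings = uptrain.Settings(uptrain_access_token="UPTRAIN_API_KEY")  # assumed field name
client = uptrain.APIClient(settings=settings)

# OSS ("openai" key type): evaluations now run locally via EvalLLM.
settings = uptrain.Settings(
    openai_api_key="OPENAI_API_KEY",  # placeholder
    evaluate_locally=True,
    model="gpt-3.5-turbo",
)
client = uptrain.EvalLLM(settings=settings)
```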
@@ -174,23 +174,26 @@ class UpTrainCallbackHandler(BaseCallbackHandler):
 
     def uptrain_evaluate(
         self,
-        project_name: str,
+        evaluation_name: str,
         data: List[Dict[str, Any]],
         checks: List[str],
     ) -> None:
         """Run an evaluation on the UpTrain server using UpTrain client."""
         if self.uptrain_client.__class__.__name__ == "APIClient":
             uptrain_result = self.uptrain_client.log_and_evaluate(
-                project_name=project_name,
+                project_name=self.schema.project_name,
+                evaluation_name=evaluation_name,
                 data=data,
                 checks=checks,
             )
         else:
             uptrain_result = self.uptrain_client.evaluate(
+                project_name=self.schema.project_name,
+                evaluation_name=evaluation_name,
                 data=data,
                 checks=checks,
             )
-        self.schema.uptrain_results[project_name].append(uptrain_result)
+        self.schema.uptrain_results[self.schema.project_name].append(uptrain_result)
 
         score_name_map = {
             "score_context_relevance": "Context Relevance Score",
@@ -258,7 +261,7 @@ class UpTrainCallbackHandler(BaseCallbackHandler):
         ]
 
         self.uptrain_evaluate(
-            project_name=f"{self.schema.project_name_prefix}_rag",
+            evaluation_name="rag",
             data=data,
             checks=[
                 uptrain.Evals.CONTEXT_RELEVANCE,
@@ -340,7 +343,7 @@ class UpTrainCallbackHandler(BaseCallbackHandler):
         ]
 
         self.uptrain_evaluate(
-            project_name=f"{self.schema.project_name_prefix}_multi_query",
+            evaluation_name="multi_query",
             data=data,
             checks=[uptrain.Evals.MULTI_QUERY_ACCURACY],
         )
@@ -372,7 +375,7 @@ class UpTrainCallbackHandler(BaseCallbackHandler):
             }
         ]
         self.uptrain_evaluate(
-            project_name=f"{self.schema.project_name_prefix}_context_reranking",
+            evaluation_name="context_reranking",
             data=data,
             checks=[
                 uptrain.Evals.CONTEXT_CONCISENESS,
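After the refactor, call sites no longer derive per-eval project names; they pass an `evaluation_name`, and the shared project name comes from the schema. A hedged sketch of the resulting call shape (the handler normally invokes `uptrain_evaluate` internally from its callback methods; the handler construction and `data` payload here are illustrative, and the check shown is the one visible in the hunks above):

```python
import uptrain
from langchain_community.callbacks.uptrain_callback import UpTrainCallbackHandler

handler = UpTrainCallbackHandler(
    project_name="langchain",  # one dashboard project for all eval types
    key_type="openai",
    api_key="sk-****************",  # placeholder; use a real OpenAI key
)

# Illustrative payload; in normal use the handler assembles this itself
# from the queries, contexts, and responses it observes via callbacks.
data = [
    {
        "question": "What did the president say about Ketanji Brown Jackson",
        "context": "retrieved context goes here",
        "response": "generated answer goes here",
    }
]

# project_name is read from handler.schema; only the evaluation name varies
# ("rag", "multi_query", or "context_reranking").
handler.uptrain_evaluate(
    evaluation_name="rag",
    data=data,
    checks=[uptrain.Evals.CONTEXT_RELEVANCE],
)

# All results accumulate under the single project key:
results = handler.schema.uptrain_results["langchain"]
```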