{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a175c650",
   "metadata": {},
   "source": [
    "# Benchmarking Template\n",
    "\n",
    "This is an example notebook that can be used to create a benchmarking notebook for a task of your choice. Evaluation is really hard, so we greatly welcome any contributions that make it easier for people to experiment."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "984169ca",
   "metadata": {},
   "source": [
    "It is highly recommended that you do any evaluation/benchmarking with tracing enabled. See [here](https://langchain.readthedocs.io/en/latest/tracing.html) for an explanation of what tracing is and how to set it up."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "9fe4d1b4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Comment this out if you are NOT using tracing\n",
    "import os\n",
    "os.environ[\"LANGCHAIN_HANDLER\"] = \"langchain\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f66405e",
   "metadata": {},
   "source": [
    "## Loading the data\n",
    "\n",
    "First, let's load the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "79402a8f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# This notebook should show how to load the dataset from LangChainDatasets on Hugging Face\n",
    "\n",
    "# Please upload your dataset to https://huggingface.co/LangChainDatasets\n",
    "\n",
    "# The value passed into `load_dataset` should NOT have the `LangChainDatasets/` prefix\n",
    "from langchain.evaluation.loading import load_dataset\n",
    "dataset = load_dataset(\"TODO\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a16b75d",
   "metadata": {},
   "source": [
    "## Setting up a chain\n",
    "\n",
    "This next section should have an example of setting up a chain that can be run on this dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a2661ce0",
   "metadata": {},
   "outputs": [],
   "source": []
  },
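  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1c7f3b2a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Minimal illustrative sketch only -- swap in whatever chain fits your task.\n",
    "# It assumes an OpenAI API key is set and that each datapoint has a \"question\" field.\n",
    "from langchain.llms import OpenAI\n",
    "from langchain.prompts import PromptTemplate\n",
    "from langchain.chains import LLMChain\n",
    "\n",
    "prompt = PromptTemplate(\n",
    "    input_variables=[\"question\"],\n",
    "    template=\"Answer the following question:\\n\\n{question}\",\n",
    ")\n",
    "chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)"
   ]
  },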
  {
   "cell_type": "markdown",
   "id": "6c0062e7",
   "metadata": {},
   "source": [
    "## Make a prediction\n",
    "\n",
    "First, we can make predictions one datapoint at a time. Working at this level of granularity allows us to explore the outputs in detail, and it is also a lot cheaper than running over multiple datapoints."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "d28c5e7d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Example of running the chain on a single datapoint (`dataset[0]`) goes here"
   ]
  },
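  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5e9d0c4b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch, assuming the example chain above and datapoints with a \"question\" key.\n",
    "# Inspecting a single output (and its trace) first is a cheap sanity check before a full run.\n",
    "chain.run(dataset[0][\"question\"])"
   ]
  },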
  {
   "cell_type": "markdown",
   "id": "d0c16cd7",
   "metadata": {},
   "source": [
    "## Make many predictions\n",
    "\n",
    "Now we can make predictions over the whole dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "24b4c66e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Example of running the chain on many datapoints goes here\n",
    "\n",
    "# Sometimes it's as simple as `chain.apply(dataset)`\n",
    "\n",
    "# Other times you may want to write a for loop to catch errors"
   ]
  },
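  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a3b8f6d1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch of the for-loop approach: keep the datapoints that succeeded alongside\n",
    "# their predictions, and collect the ones that errored so a single failure doesn't stop the run.\n",
    "predictions = []\n",
    "predicted_dataset = []\n",
    "error_dataset = []\n",
    "for data in dataset:\n",
    "    try:\n",
    "        predictions.append(chain(data))\n",
    "        predicted_dataset.append(data)\n",
    "    except Exception:\n",
    "        error_dataset.append(data)"
   ]
  },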
  {
   "cell_type": "markdown",
   "id": "4783344b",
   "metadata": {},
   "source": [
    "## Evaluate performance\n",
    "\n",
    "Any guide to evaluating performance in a more systematic manner goes here."
   ]
  },
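  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d4e2c9a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch using an LLM-graded evaluation (QAEvalChain), assuming a question-answering\n",
    "# style dataset with \"question\"/\"answer\" keys and predictions with a \"text\" key -- adjust for your task.\n",
    "from langchain.llms import OpenAI\n",
    "from langchain.evaluation.qa import QAEvalChain\n",
    "\n",
    "eval_chain = QAEvalChain.from_llm(OpenAI(temperature=0))\n",
    "graded_outputs = eval_chain.evaluate(\n",
    "    predicted_dataset, predictions, question_key=\"question\", prediction_key=\"text\"\n",
    ")"
   ]
  },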
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7710401a",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}