mirror of
				https://github.com/hwchase17/langchain.git
				synced 2025-10-31 16:08:59 +00:00 
			
		
		
		
	This PR introduces [Label Studio](https://labelstud.io/) integration with LangChain via `LabelStudioCallbackHandler`: - sending data to the Label Studio instance - labeling dataset for supervised LLM finetuning - rating model responses - tracking and displaying chat history - support for custom data labeling workflow ### Example ``` chat_llm = ChatOpenAI(callbacks=[LabelStudioCallbackHandler(mode="chat")]) chat_llm([ SystemMessage(content="Always use emojis in your responses."), HumanMessage(content="Hey AI, how's your day going?"), AIMessage(content="🤖 I don't have feelings, but I'm running smoothly! How can I help you today?"), HumanMessage(content="I'm feeling a bit down. Any advice?"), AIMessage(content="🤗 I'm sorry to hear that. Remember, it's okay to seek help or talk to someone if you need to. 💬"), HumanMessage(content="Can you tell me a joke to lighten the mood?"), AIMessage(content="Of course! 🎭 Why did the scarecrow win an award? Because he was outstanding in his field! 🌾"), HumanMessage(content="Haha, that was a good one! Thanks for cheering me up."), AIMessage(content="Always here to help! 😊 If you need anything else, just let me know."), HumanMessage(content="Will do! By the way, can you recommend a good movie?"), ]) ``` <img width="906" alt="image" src="https://github.com/langchain-ai/langchain/assets/6087484/0a1cf559-0bd3-4250-ad96-6e71dbb1d2f3"> ### Dependencies - [label-studio](https://pypi.org/project/label-studio/) - [label-studio-sdk](https://pypi.org/project/label-studio-sdk/) https://twitter.com/labelstudiohq --------- Co-authored-by: nik <nik@heartex.net>
		
			
				
	
	
		
			383 lines
		
	
	
		
			11 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			383 lines
		
	
	
		
			11 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| {
 | |
|  "cells": [
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {
 | |
|     "collapsed": true,
 | |
|     "pycharm": {
 | |
|      "name": "#%% md\n"
 | |
|     }
 | |
|    },
 | |
|    "source": [
 | |
|     "# Label Studio\n",
 | |
|     "\n",
 | |
|     "<div>\n",
 | |
|     "<img src=\"https://labelstudio-pub.s3.amazonaws.com/lc/open-source-data-labeling-platform.png\" width=\"400\"/>\n",
 | |
|     "</div>\n",
 | |
|     "\n",
 | |
|     "Label Studio is an open-source data labeling platform that provides LangChain with flexibility when it comes to labeling data for fine-tuning large language models (LLMs). It also enables the preparation of custom training data and the collection and evaluation of responses through human feedback.\n",
 | |
|     "\n",
 | |
|     "In this guide, you will learn how to connect a LangChain pipeline to Label Studio to:\n",
 | |
|     "\n",
 | |
|     "- Aggregate all input prompts, conversations, and responses in a single LabelStudio project. This consolidates all the data in one place for easier labeling and analysis.\n",
 | |
|     "- Refine prompts and responses to create a dataset for supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) scenarios. The labeled data can be used to further train the LLM to improve its performance.\n",
 | |
|     "- Evaluate model responses through human feedback. LabelStudio provides an interface for humans to review and provide feedback on model responses, allowing evaluation and iteration."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {
 | |
|     "pycharm": {
 | |
|      "name": "#%% md\n"
 | |
|     }
 | |
|    },
 | |
|    "source": [
 | |
|     "## Installation and setup"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {
 | |
|     "pycharm": {
 | |
|      "name": "#%% md\n"
 | |
|     }
 | |
|    },
 | |
|    "source": [
 | |
|     "First install latest versions of Label Studio and Label Studio API client:"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": null,
 | |
|    "metadata": {
 | |
|     "pycharm": {
 | |
|      "name": "#%%\n"
 | |
|     }
 | |
|    },
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "!pip install -U label-studio label-studio-sdk openai"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {
 | |
|     "pycharm": {
 | |
|      "name": "#%% md\n"
 | |
|     }
 | |
|    },
 | |
|    "source": [
 | |
|     "Next, run `label-studio` on the command line to start the local LabelStudio instance at `http://localhost:8080`. See the [Label Studio installation guide](https://labelstud.io/guide/install) for more options."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {
 | |
|     "pycharm": {
 | |
|      "name": "#%% md\n"
 | |
|     }
 | |
|    },
 | |
|    "source": [
 | |
|     "You'll need a token to make API calls.\n",
 | |
|     "\n",
 | |
|     "Open your LabelStudio instance in your browser, go to `Account & Settings > Access Token` and copy the key.\n",
 | |
|     "\n",
 | |
|     "Set environment variables with your LabelStudio URL, API key and OpenAI API key:"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": null,
 | |
|    "metadata": {
 | |
|     "pycharm": {
 | |
|      "name": "#%%\n"
 | |
|     }
 | |
|    },
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "import os\n",
 | |
|     "\n",
 | |
|     "os.environ['LABEL_STUDIO_URL'] = '<YOUR-LABEL-STUDIO-URL>'  # e.g. http://localhost:8080\n",
 | |
|     "os.environ['LABEL_STUDIO_API_KEY'] = '<YOUR-LABEL-STUDIO-API-KEY>'\n",
 | |
|     "os.environ['OPENAI_API_KEY'] = '<YOUR-OPENAI-API-KEY>'"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {
 | |
|     "pycharm": {
 | |
|      "name": "#%% md\n"
 | |
|     }
 | |
|    },
 | |
|    "source": [
 | |
|     "## Collecting LLMs prompts and responses"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "The data used for labeling is stored in projects within Label Studio. Every project is identified by an XML configuration that details the specifications for input and output data. \n",
 | |
|     "\n",
 | |
|     "Create a project that takes human input in text format and outputs an editable LLM response in a text area:\n",
 | |
|     "\n",
 | |
|     "```xml\n",
 | |
|     "<View>\n",
 | |
|     "<Style>\n",
 | |
|     "    .prompt-box {\n",
 | |
|     "        background-color: white;\n",
 | |
|     "        border-radius: 10px;\n",
 | |
|     "        box-shadow: 0px 4px 6px rgba(0, 0, 0, 0.1);\n",
 | |
|     "        padding: 20px;\n",
 | |
|     "    }\n",
 | |
|     "</Style>\n",
 | |
|     "<View className=\"root\">\n",
 | |
|     "    <View className=\"prompt-box\">\n",
 | |
|     "        <Text name=\"prompt\" value=\"$prompt\"/>\n",
 | |
|     "    </View>\n",
 | |
|     "    <TextArea name=\"response\" toName=\"prompt\"\n",
 | |
|     "              maxSubmissions=\"1\" editable=\"true\"\n",
 | |
|     "              required=\"true\"/>\n",
 | |
|     "</View>\n",
 | |
|     "<Header value=\"Rate the response:\"/>\n",
 | |
|     "<Rating name=\"rating\" toName=\"prompt\"/>\n",
 | |
|     "</View>\n",
 | |
|     "```\n",
 | |
|     "\n",
 | |
|     "1. To create a project in Label Studio, click on the \"Create\" button. \n",
 | |
|     "2. Enter a name for your project in the \"Project Name\" field, such as `My Project`.\n",
 | |
|     "3. Navigate to `Labeling Setup > Custom Template` and paste the XML configuration provided above."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {
 | |
|     "pycharm": {
 | |
|      "name": "#%% md\n"
 | |
|     }
 | |
|    },
 | |
|    "source": [
 | |
|     "You can collect input LLM prompts and output responses in a LabelStudio project, connecting it via `LabelStudioCallbackHandler`:"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": null,
 | |
|    "metadata": {
 | |
|     "pycharm": {
 | |
|      "name": "#%%\n"
 | |
|     }
 | |
|    },
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "from langchain.llms import OpenAI\n",
 | |
|     "from langchain.callbacks import LabelStudioCallbackHandler\n",
 | |
|     "\n",
 | |
|     "llm = OpenAI(\n",
 | |
|     "    temperature=0,\n",
 | |
|     "    callbacks=[\n",
 | |
|     "        LabelStudioCallbackHandler(\n",
 | |
|     "            project_name=\"My Project\"\n",
 | |
|     "    )]\n",
 | |
|     ")\n",
 | |
|     "print(llm(\"Tell me a joke\"))"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {
 | |
|     "pycharm": {
 | |
|      "name": "#%% md\n"
 | |
|     }
 | |
|    },
 | |
|    "source": [
 | |
|     "In the Label Studio, open `My Project`. You will see the prompts, responses, and metadata like the model name. "
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {
 | |
|     "pycharm": {
 | |
|      "name": "#%% md\n"
 | |
|     }
 | |
|    },
 | |
|    "source": [
 | |
|     "## Collecting Chat model Dialogues"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "You can also track and display full chat dialogues in LabelStudio, with the ability to rate and modify the last response:\n",
 | |
|     "\n",
 | |
|     "1. Open Label Studio and click on the \"Create\" button.\n",
 | |
|     "2. Enter a name for your project in the \"Project Name\" field, such as `New Project with Chat`.\n",
 | |
|     "3. Navigate to Labeling Setup > Custom Template and paste the following XML configuration:\n",
 | |
|     "\n",
 | |
|     "```xml\n",
 | |
|     "<View>\n",
 | |
|     "<View className=\"root\">\n",
 | |
|     "     <Paragraphs name=\"dialogue\"\n",
 | |
|     "               value=\"$prompt\"\n",
 | |
|     "               layout=\"dialogue\"\n",
 | |
|     "               textKey=\"content\"\n",
 | |
|     "               nameKey=\"role\"\n",
 | |
|     "               granularity=\"sentence\"/>\n",
 | |
|     "  <Header value=\"Final response:\"/>\n",
 | |
|     "    <TextArea name=\"response\" toName=\"dialogue\"\n",
 | |
|     "              maxSubmissions=\"1\" editable=\"true\"\n",
 | |
|     "              required=\"true\"/>\n",
 | |
|     "</View>\n",
 | |
|     "<Header value=\"Rate the response:\"/>\n",
 | |
|     "<Rating name=\"rating\" toName=\"dialogue\"/>\n",
 | |
|     "</View>\n",
 | |
|     "```"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": null,
 | |
|    "metadata": {
 | |
|     "pycharm": {
 | |
|      "name": "#%%\n"
 | |
|     }
 | |
|    },
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "from langchain.chat_models import ChatOpenAI\n",
 | |
|     "from langchain.schema import HumanMessage, SystemMessage\n",
 | |
|     "from langchain.callbacks import LabelStudioCallbackHandler\n",
 | |
|     "\n",
 | |
|     "chat_llm = ChatOpenAI(callbacks=[\n",
 | |
|     "    LabelStudioCallbackHandler(\n",
 | |
|     "        mode=\"chat\",\n",
 | |
|     "        project_name=\"New Project with Chat\",\n",
 | |
|     "    )\n",
 | |
|     "])\n",
 | |
|     "llm_results = chat_llm([\n",
 | |
|     "    SystemMessage(content=\"Always use a lot of emojis\"),\n",
 | |
|     "    HumanMessage(content=\"Tell me a joke\")\n",
 | |
|     "])"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "In Label Studio, open \"New Project with Chat\". Click on a created task to view dialog history and edit/annotate responses."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {
 | |
|     "pycharm": {
 | |
|      "name": "#%% md\n"
 | |
|     }
 | |
|    },
 | |
|    "source": [
 | |
|     "## Custom Labeling Configuration"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {
 | |
|     "pycharm": {
 | |
|      "name": "#%% md\n"
 | |
|     }
 | |
|    },
 | |
|    "source": [
 | |
|     "You can modify the default labeling configuration in LabelStudio to add more target labels like response sentiment, relevance, and many [other types annotator's feedback](https://labelstud.io/tags/).\n",
 | |
|     "\n",
 | |
|     "New labeling configuration can be added from UI: go to `Settings > Labeling Interface` and set up a custom configuration with additional tags like `Choices` for sentiment or `Rating` for relevance. Keep in mind that [`TextArea` tag](https://labelstud.io/tags/textarea) should be presented in any configuration to display the LLM responses.\n",
 | |
|     "\n",
 | |
|     "Alternatively, you can specify the labeling configuration on the initial call before project creation:"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": null,
 | |
|    "metadata": {
 | |
|     "pycharm": {
 | |
|      "name": "#%%\n"
 | |
|     }
 | |
|    },
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "ls = LabelStudioCallbackHandler(project_config='''\n",
 | |
|     "<View>\n",
 | |
|     "<Text name=\"prompt\" value=\"$prompt\"/>\n",
 | |
|     "<TextArea name=\"response\" toName=\"prompt\"/>\n",
 | |
|     "<TextArea name=\"user_feedback\" toName=\"prompt\"/>\n",
 | |
|     "<Rating name=\"rating\" toName=\"prompt\"/>\n",
 | |
|     "<Choices name=\"sentiment\" toName=\"prompt\">\n",
 | |
|     "    <Choice value=\"Positive\"/>\n",
 | |
|     "    <Choice value=\"Negative\"/>\n",
 | |
|     "</Choices>\n",
 | |
|     "</View>\n",
 | |
|     "''')"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "Note that if the project doesn't exist, it will be created with the specified labeling configuration."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {
 | |
|     "pycharm": {
 | |
|      "name": "#%% md\n"
 | |
|     }
 | |
|    },
 | |
|    "source": [
 | |
|     "## Other parameters"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {
 | |
|     "pycharm": {
 | |
|      "name": "#%% md\n"
 | |
|     }
 | |
|    },
 | |
|    "source": [
 | |
|     "The `LabelStudioCallbackHandler` accepts several optional parameters:\n",
 | |
|     "\n",
 | |
|     "- **api_key** - Label Studio API key. Overrides environmental variable `LABEL_STUDIO_API_KEY`.\n",
 | |
|     "- **url** - Label Studio URL. Overrides `LABEL_STUDIO_URL`, default `http://localhost:8080`.\n",
 | |
|     "- **project_id** - Existing Label Studio project ID. Overrides `LABEL_STUDIO_PROJECT_ID`. Stores data in this project.\n",
 | |
|     "- **project_name** - Project name if project ID not specified. Creates a new project. Default is `\"LangChain-%Y-%m-%d\"` formatted with the current date.\n",
 | |
|     "- **project_config** - [custom labeling configuration](#custom-labeling-configuration)\n",
 | |
|     "- **mode**: use this shortcut to create target configuration from scratch:\n",
 | |
|     "   - `\"prompt\"` - Single prompt, single response. Default.\n",
 | |
|     "   - `\"chat\"` - Multi-turn chat mode.\n",
 | |
|     "\n"
 | |
|    ]
 | |
|   }
 | |
|  ],
 | |
|  "metadata": {
 | |
|   "kernelspec": {
 | |
|    "display_name": "labelops",
 | |
|    "language": "python",
 | |
|    "name": "labelops"
 | |
|   },
 | |
|   "language_info": {
 | |
|    "codemirror_mode": {
 | |
|     "name": "ipython",
 | |
|     "version": 3
 | |
|    },
 | |
|    "file_extension": ".py",
 | |
|    "mimetype": "text/x-python",
 | |
|    "name": "python",
 | |
|    "nbconvert_exporter": "python",
 | |
|    "pygments_lexer": "ipython3",
 | |
|    "version": "3.9.16"
 | |
|   }
 | |
|  },
 | |
|  "nbformat": 4,
 | |
|  "nbformat_minor": 1
 | |
| }
 |