mirror of
https://github.com/hwchase17/langchain.git
synced 2025-04-30 04:45:23 +00:00
## Description: While following the docs, I found a couple of small issues. This fixes some unused imports on the [extraction page](https://python.langchain.com/docs/tutorials/extraction/#the-extractor) and updates the examples on the [classification page](https://python.langchain.com/docs/tutorials/classification/#quickstart) to be independent of the chat model.
382 lines
10 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "raw",
   "id": "cb6f552e-775f-4d84-bc7c-dca94c06a33c",
   "metadata": {},
   "source": [
    "---\n",
    "title: Tagging\n",
    "sidebar_class_name: hidden\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a0507a4b",
   "metadata": {},
   "source": [
    "[Open In Colab](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/use_cases/tagging.ipynb)\n",
    "\n",
    "# Classify Text into Labels\n",
    "\n",
    "Tagging means labeling a document with classes such as:\n",
    "\n",
    "- Sentiment\n",
    "- Language\n",
    "- Style (formal, informal, etc.)\n",
    "- Covered topics\n",
    "- Political tendency\n",
    "\n",
    "## Overview\n",
    "\n",
    "Tagging has a few components:\n",
    "\n",
    "* `function`: Like [extraction](/docs/tutorials/extraction), tagging uses [functions](https://openai.com/blog/function-calling-and-other-api-updates) to specify how the model should tag a document\n",
    "* `schema`: defines how we want to tag the document\n",
    "\n",
    "## Quickstart\n",
    "\n",
    "Let's walk through a simple example that uses OpenAI tool calling for tagging in LangChain. We'll use the [`with_structured_output`](/docs/how_to/structured_output) method, which OpenAI chat models support."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dc5cbb6f",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install --upgrade --quiet langchain-core langchain-openai"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc2b7cdf-babb-46e2-98d0-302f69446842",
   "metadata": {},
   "source": [
    "We'll need to load a [chat model](/docs/integrations/chat/):\n",
    "\n",
    "import ChatModelTabs from \"@theme/ChatModelTabs\";\n",
    "\n",
    "<ChatModelTabs customVarName=\"llm\" />"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "608ee181-3f06-4719-842d-9672fdce6e57",
   "metadata": {},
   "outputs": [],
   "source": [
    "# | output: false\n",
    "# | echo: false\n",
    "\n",
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "llm = ChatOpenAI(model=\"gpt-4o-mini\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b8ca3f93",
   "metadata": {},
   "source": [
    "Let's specify a Pydantic model with a few properties and their expected types in our schema."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "39f3ce3e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "from langchain_openai import ChatOpenAI\n",
    "from pydantic import BaseModel, Field\n",
    "\n",
    "tagging_prompt = ChatPromptTemplate.from_template(\n",
    "    \"\"\"\n",
    "Extract the desired information from the following passage.\n",
    "\n",
    "Only extract the properties mentioned in the 'Classification' function.\n",
    "\n",
    "Passage:\n",
    "{input}\n",
    "\"\"\"\n",
    ")\n",
    "\n",
    "\n",
    "class Classification(BaseModel):\n",
    "    sentiment: str = Field(description=\"The sentiment of the text\")\n",
    "    aggressiveness: int = Field(\n",
    "        description=\"How aggressive the text is on a scale from 1 to 10\"\n",
    "    )\n",
    "    language: str = Field(description=\"The language the text is written in\")\n",
    "\n",
    "\n",
    "# Structured LLM\n",
    "structured_llm = llm.with_structured_output(Classification)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5509b6a6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Classification(sentiment='positive', aggressiveness=1, language='Spanish')"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "inp = \"Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!\"\n",
    "prompt = tagging_prompt.invoke({\"input\": inp})\n",
    "response = structured_llm.invoke(prompt)\n",
    "\n",
    "response"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff3cf30d",
   "metadata": {},
   "source": [
    "If we want a dictionary as output, we can call `.model_dump()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9154474c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'sentiment': 'enojado', 'aggressiveness': 8, 'language': 'es'}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "inp = \"Estoy muy enojado con vos! Te voy a dar tu merecido!\"\n",
    "prompt = tagging_prompt.invoke({\"input\": inp})\n",
    "response = structured_llm.invoke(prompt)\n",
    "\n",
    "response.model_dump()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d921bb53",
   "metadata": {},
   "source": [
    "As the examples show, the model correctly interprets what we want.\n",
    "\n",
    "The results vary, however: for example, the sentiment may come back in different languages ('positive', 'enojado', etc.).\n",
    "\n",
    "We will see how to control these results in the next section."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bebb2f83",
   "metadata": {},
   "source": [
    "## Finer control\n",
    "\n",
    "Careful schema definition gives us more control over the model's output.\n",
    "\n",
    "Specifically, we can define:\n",
    "\n",
    "- Possible values for each property\n",
    "- A description of each property, so that the model understands it\n",
    "- Which properties are required in the output"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69ef0b9a",
   "metadata": {},
   "source": [
    "Let's redeclare our Pydantic model to control for each of the previously mentioned aspects using enums:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "6a5f7961",
   "metadata": {},
   "outputs": [],
   "source": [
    "class Classification(BaseModel):\n",
    "    sentiment: str = Field(..., enum=[\"happy\", \"neutral\", \"sad\"])\n",
    "    aggressiveness: int = Field(\n",
    "        ...,\n",
    "        description=\"describes how aggressive the statement is, the higher the number the more aggressive\",\n",
    "        enum=[1, 2, 3, 4, 5],\n",
    "    )\n",
    "    language: str = Field(\n",
    "        ..., enum=[\"spanish\", \"english\", \"french\", \"german\", \"italian\"]\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "e5a5881f",
   "metadata": {},
   "outputs": [],
   "source": [
    "tagging_prompt = ChatPromptTemplate.from_template(\n",
    "    \"\"\"\n",
    "Extract the desired information from the following passage.\n",
    "\n",
    "Only extract the properties mentioned in the 'Classification' function.\n",
    "\n",
    "Passage:\n",
    "{input}\n",
    "\"\"\"\n",
    ")\n",
    "\n",
    "llm = ChatOpenAI(temperature=0, model=\"gpt-4o-mini\").with_structured_output(\n",
    "    Classification\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ded2332",
   "metadata": {},
   "source": [
    "Now the answers will be restricted in a way we expect!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "d9b9d53d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Classification(sentiment='positive', aggressiveness=1, language='Spanish')"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "inp = \"Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!\"\n",
    "prompt = tagging_prompt.invoke({\"input\": inp})\n",
    "llm.invoke(prompt)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "1c12fa00",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Classification(sentiment='enojado', aggressiveness=8, language='es')"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "inp = \"Estoy muy enojado con vos! Te voy a dar tu merecido!\"\n",
    "prompt = tagging_prompt.invoke({\"input\": inp})\n",
    "llm.invoke(prompt)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "0bdfcb05",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Classification(sentiment='neutral', aggressiveness=1, language='English')"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "inp = \"Weather is ok here, I can go outside without much more than a coat\"\n",
    "prompt = tagging_prompt.invoke({\"input\": inp})\n",
    "llm.invoke(prompt)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf6b7389",
   "metadata": {},
   "source": [
    "The [LangSmith trace](https://smith.langchain.com/public/38294e04-33d8-4c5a-ae92-c2fe68be8332/r) lets us peek under the hood."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29346d09",
   "metadata": {},
   "source": [
    "### Going deeper\n",
    "\n",
    "* You can use the [metadata tagger](/docs/integrations/document_transformers/openai_metadata_tagger) document transformer to extract metadata from a LangChain `Document`.\n",
    "* This covers the same basic functionality as the tagging chain, only applied to a LangChain `Document`."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
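The "Finer control" section of the notebook limits each property of `Classification` to a fixed set of values. As a minimal, dependency-free sketch of what that restriction means, the snippet below checks a tagging result against those sets with plain Python rather than Pydantic. `ALLOWED_VALUES` and `find_violations` are hypothetical names introduced here for illustration; the allowed values and the out-of-range sample are copied from the notebook cells.

```python
# Allowed values, copied from the second `Classification` schema in the notebook.
ALLOWED_VALUES = {
    "sentiment": {"happy", "neutral", "sad"},
    "aggressiveness": {1, 2, 3, 4, 5},
    "language": {"spanish", "english", "french", "german", "italian"},
}


def find_violations(tags: dict) -> list[str]:
    """Return the names of fields that are missing or hold a disallowed value."""
    violations = []
    for field, allowed in ALLOWED_VALUES.items():
        if field not in tags or tags[field] not in allowed:
            violations.append(field)
    return violations


# A result that respects the restricted schema.
in_range = {"sentiment": "happy", "aggressiveness": 1, "language": "spanish"}

# The unrestricted quickstart output shown earlier in the notebook.
out_of_range = {"sentiment": "enojado", "aggressiveness": 8, "language": "es"}

print(find_violations(in_range))      # []
print(find_violations(out_of_range))  # ['sentiment', 'aggressiveness', 'language']
```

A check like this could run on `response.model_dump()` before the tags are stored. Declaring the fields with `typing.Literal` in the Pydantic model is another way to have out-of-range values rejected at validation time.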