mirror of
https://github.com/hwchase17/langchain.git
synced 2025-08-11 13:55:03 +00:00
update extraction use-case docs (#17979)
Update extraction use-case docs to showcase and explain all modes of `create_structured_output_runnable`.
This commit is contained in:
parent
8a81fcd5d3
commit
9bf58ec7dd
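As context for the diff below: all three extraction modes (`openai-functions`, `openai-tools`, `openai-json`) ultimately get the model to emit a JSON blob matching a schema, which is then validated into a structured object. A minimal stdlib-only sketch of that validation step, independent of LangChain (the `Person` fields mirror the notebook's schema; the sample blob is an illustrative assumption, not real model output):

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    person_name: str
    person_height: int
    person_hair_color: str
    dog_breed: Optional[str] = None
    dog_name: Optional[str] = None


def parse_person(raw: str) -> Person:
    """Validate a JSON blob (e.g. emitted by an LLM in JSON mode) against the schema."""
    data = json.loads(raw)
    required = {"person_name", "person_height", "person_hair_color"}
    missing = required - data.keys()
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return Person(**data)


blob = '{"person_name": "Alex", "person_height": 60, "person_hair_color": "blond"}'
print(parse_person(blob))
# Person(person_name='Alex', person_height=60, person_hair_color='blond', dog_breed=None, dog_name=None)
```

Libraries like Pydantic (used in the notebook below) add type coercion and richer validation on top of this basic pattern.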
@@ -19,9 +19,7 @@
|
||||
"\n",
|
||||
"## Use case\n",
|
||||
"\n",
|
||||
"Getting structured output from raw LLM generations is hard.\n",
|
||||
"\n",
|
||||
"For example, suppose you need the model output formatted with a specific schema for:\n",
|
||||
"LLMs can be used to generate text that is structured according to a specific schema. This can be useful in a number of scenarios, including:\n",
|
||||
"\n",
|
||||
"- Extracting a structured row to insert into a database \n",
|
||||
"- Extracting API parameters\n",
|
||||
@@ -43,17 +41,23 @@
|
||||
"source": [
|
||||
"## Overview \n",
|
||||
"\n",
|
||||
"There are two primary approaches for this:\n",
|
||||
"There are two broad approaches for this:\n",
|
||||
"\n",
|
||||
"- `Functions`: Some LLMs can call [functions](https://openai.com/blog/function-calling-and-other-api-updates) to extract arbitrary entities from LLM responses.\n",
|
||||
"- `Tools and JSON mode`: Some LLMs specifically support structured output generation in certain contexts. Examples include OpenAI's [function and tool calling](https://platform.openai.com/docs/guides/function-calling) or [JSON mode](https://platform.openai.com/docs/guides/text-generation/json-mode).\n",
|
||||
"\n",
|
||||
"- `Parsing`: [Output parsers](/docs/modules/model_io/output_parsers/) are classes that structure LLM responses. \n",
|
||||
"\n",
|
||||
"Only some LLMs support functions (e.g., OpenAI), and they are more general than parsers. \n",
|
||||
"- `Parsing`: LLMs can often be instructed to output their response in a desired format. [Output parsers](/docs/modules/model_io/output_parsers/) will parse text generations into a structured form.\n",
|
||||
"\n",
|
||||
"Parsers extract precisely what is enumerated in a provided schema (e.g., specific attributes of a person).\n",
|
||||
"\n",
|
||||
"Functions can infer things beyond a provided schema (e.g., attributes about a person that you did not ask for)."
|
||||
"Functions and tools can infer things beyond a provided schema (e.g., attributes about a person that you did not ask for)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "fbea06b5-66b6-4958-936d-23212061e4c8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Option 1: Leveraging tools and JSON mode"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -61,13 +65,16 @@
|
||||
"id": "25d89f21",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Quickstart\n",
|
||||
"### Quickstart\n",
|
||||
"\n",
|
||||
"OpenAI functions are one way to get started with extraction.\n",
|
||||
"`create_structured_output_runnable` will create Runnables to support structured data extraction via OpenAI tool use and JSON mode.\n",
|
||||
"\n",
|
||||
"Define a schema that specifies the properties we want to extract from the LLM output.\n",
|
||||
"The desired output schema can be expressed either via a Pydantic model or a Python dict representing valid [JsonSchema](https://json-schema.org/).\n",
|
||||
"\n",
|
||||
"Then, we can use `create_extraction_chain` to extract our desired schema using an OpenAI function call."
|
||||
"This function supports three modes for structured data extraction:\n",
|
||||
"- `\"openai-functions\"` will define OpenAI functions and bind them to the given LLM;\n",
|
||||
"- `\"openai-tools\"` will define OpenAI tools and bind them to the given LLM;\n",
|
||||
"- `\"openai-json\"` will bind `response_format={\"type\": \"json_object\"}` to the given LLM.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -86,28 +93,131 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "3e017ba0",
|
||||
"execution_count": 1,
|
||||
"id": "4c2bc413-eacd-44bd-9fcb-bbbe1f97ca6c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import Optional\n",
|
||||
"\n",
|
||||
"from langchain.chains import create_structured_output_runnable\n",
|
||||
"from langchain_core.pydantic_v1 import BaseModel\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Person(BaseModel):\n",
|
||||
" person_name: str\n",
|
||||
" person_height: int\n",
|
||||
" person_hair_color: str\n",
|
||||
" dog_breed: Optional[str]\n",
|
||||
" dog_name: Optional[str]\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(model=\"gpt-4-0125-preview\", temperature=0)\n",
|
||||
"runnable = create_structured_output_runnable(Person, llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "de8c9d7b-bb7b-45bc-9794-a355ed0d1508",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'name': 'Alex', 'height': 5, 'hair_color': 'blonde'},\n",
|
||||
" {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}]"
|
||||
"Person(person_name='Alex', person_height=60, person_hair_color='blond', dog_breed=None, dog_name=None)"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chains import create_extraction_chain\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"inp = \"Alex is 5 feet tall and has blond hair.\"\n",
|
||||
"runnable.invoke(inp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "02fd21ff-27a8-4890-bb18-fc852cafb18a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Specifying schemas"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a5a74f3e-92aa-4ac7-96f2-ea89b8740ba8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"A convenient way to express desired output schemas is via Pydantic. The above example specified the desired output schema via `Person`, a Pydantic model. Such schemas can easily be composed to generate richer output formats:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "c1c8fe71-0ae4-466a-b32f-001c59b62bb3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import Sequence\n",
|
||||
"\n",
|
||||
"# Schema\n",
|
||||
"\n",
|
||||
"class People(BaseModel):\n",
|
||||
" \"\"\"Identifying information about all people in a text.\"\"\"\n",
|
||||
"\n",
|
||||
" people: Sequence[Person]\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"runnable = create_structured_output_runnable(People, llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "c5aa9e43-9202-4b2d-a767-e596296b3a81",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"People(people=[Person(person_name='Alex', person_height=5, person_hair_color='blond', dog_breed=None, dog_name=None), Person(person_name='Claudia', person_height=6, person_hair_color='brunette', dog_breed='beagle', dog_name='Harry')])"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"inp = \"\"\"Alex is 5 feet tall and has blond hair.\n",
|
||||
"Claudia is 1 foot taller than Alex and jumps higher than him.\n",
|
||||
"Claudia is a brunette and has a beagle named Harry.\"\"\"\n",
|
||||
"\n",
|
||||
"runnable.invoke(inp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "53e316ea-b74a-4512-a9ab-c5d01ff583fe",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note that `dog_breed` and `dog_name` are optional attributes, such that here they are extracted for Claudia and not for Alex.\n",
|
||||
"\n",
|
||||
"One can also specify the desired output format with a Python dict representing valid JsonSchema:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "3e017ba0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"schema = {\n",
|
||||
" \"type\": \"object\",\n",
|
||||
" \"properties\": {\n",
|
||||
" \"name\": {\"type\": \"string\"},\n",
|
||||
" \"height\": {\"type\": \"integer\"},\n",
|
||||
@@ -116,167 +226,51 @@
|
||||
" \"required\": [\"name\", \"height\"],\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"# Input\n",
|
||||
"inp = \"\"\"Alex is 5 feet tall. Claudia is 1 foot taller than Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\"\"\"\n",
|
||||
"\n",
|
||||
"# Run chain\n",
|
||||
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo\")\n",
|
||||
"chain = create_extraction_chain(schema, llm)\n",
|
||||
"chain.run(inp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6f7eb826",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Option 1: OpenAI functions\n",
|
||||
"\n",
|
||||
"### Looking under the hood\n",
|
||||
"\n",
|
||||
"Let's dig into what is happening when we call `create_extraction_chain`.\n",
|
||||
"\n",
|
||||
"The [LangSmith trace](https://smith.langchain.com/public/72bc3205-7743-4ca6-929a-966a9d4c2a77/r) shows that we call the function `information_extraction` on the input string, `inp`.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"This `information_extraction` function is defined [here](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/openai_functions/extraction.py) and returns a dict.\n",
|
||||
"\n",
|
||||
"We can see the `dict` in the model output:\n",
|
||||
"```\n",
|
||||
" {\n",
|
||||
" \"info\": [\n",
|
||||
" {\n",
|
||||
" \"name\": \"Alex\",\n",
|
||||
" \"height\": 5,\n",
|
||||
" \"hair_color\": \"blonde\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"Claudia\",\n",
|
||||
" \"height\": 6,\n",
|
||||
" \"hair_color\": \"brunette\"\n",
|
||||
" }\n",
|
||||
" ]\n",
|
||||
" }\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"The `create_extraction_chain` then parses the raw LLM output for us using [`JsonKeyOutputFunctionsParser`](https://github.com/langchain-ai/langchain/blob/f81e613086d211327b67b0fb591fd4d5f9a85860/libs/langchain/langchain/chains/openai_functions/extraction.py#L62).\n",
|
||||
"\n",
|
||||
"This results in the list of JSON objects returned by the chain above:\n",
|
||||
"```\n",
|
||||
"[{'name': 'Alex', 'height': 5, 'hair_color': 'blonde'},\n",
|
||||
" {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}]\n",
|
||||
" ```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "dcb03138",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Multiple entity types\n",
|
||||
"\n",
|
||||
"We can extend this further.\n",
|
||||
"\n",
|
||||
"Let's say we want to differentiate between dogs and people.\n",
|
||||
"\n",
|
||||
"We can add `person_` and `dog_` prefixes for each property"
|
||||
"runnable = create_structured_output_runnable(schema, llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "01eae733",
|
||||
"execution_count": 6,
|
||||
"id": "fb525991-643d-4d47-9111-a3d4364c03d7",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'person_name': 'Alex',\n",
|
||||
" 'person_height': 5,\n",
|
||||
" 'person_hair_color': 'blonde',\n",
|
||||
" 'dog_name': 'Frosty',\n",
|
||||
" 'dog_breed': 'labrador'},\n",
|
||||
" {'person_name': 'Claudia',\n",
|
||||
" 'person_height': 6,\n",
|
||||
" 'person_hair_color': 'brunette'}]"
|
||||
"{'name': 'Alex', 'height': 60}"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"schema = {\n",
|
||||
" \"properties\": {\n",
|
||||
" \"person_name\": {\"type\": \"string\"},\n",
|
||||
" \"person_height\": {\"type\": \"integer\"},\n",
|
||||
" \"person_hair_color\": {\"type\": \"string\"},\n",
|
||||
" \"dog_name\": {\"type\": \"string\"},\n",
|
||||
" \"dog_breed\": {\"type\": \"string\"},\n",
|
||||
" },\n",
|
||||
" \"required\": [\"person_name\", \"person_height\"],\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"chain = create_extraction_chain(schema, llm)\n",
|
||||
"\n",
|
||||
"inp = \"\"\"Alex is 5 feet tall. Claudia is 1 foot taller than Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
|
||||
"Alex's dog Frosty is a labrador and likes to play hide and seek.\"\"\"\n",
|
||||
"\n",
|
||||
"chain.run(inp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f205905c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Unrelated entities\n",
|
||||
"\n",
|
||||
"If we use `required: []`, we allow the model to return **only** person attributes or **only** dog attributes for a single entity (person or dog)."
|
||||
"inp = \"Alex is 5 feet tall. I don't know his hair color.\"\n",
|
||||
"runnable.invoke(inp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "6ff4ac7e",
|
||||
"execution_count": 7,
|
||||
"id": "a3d3f0d2-c9d4-4ab8-9a5a-1ddda62db6ec",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'person_name': 'Alex', 'person_height': 5, 'person_hair_color': 'blonde'},\n",
|
||||
" {'person_name': 'Claudia',\n",
|
||||
" 'person_height': 6,\n",
|
||||
" 'person_hair_color': 'brunette'},\n",
|
||||
" {'dog_name': 'Willow', 'dog_breed': 'German Shepherd'},\n",
|
||||
" {'dog_name': 'Milo', 'dog_breed': 'border collie'}]"
|
||||
"{'name': 'Alex', 'height': 60, 'hair_color': 'blond'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 14,
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"schema = {\n",
|
||||
" \"properties\": {\n",
|
||||
" \"person_name\": {\"type\": \"string\"},\n",
|
||||
" \"person_height\": {\"type\": \"integer\"},\n",
|
||||
" \"person_hair_color\": {\"type\": \"string\"},\n",
|
||||
" \"dog_name\": {\"type\": \"string\"},\n",
|
||||
" \"dog_breed\": {\"type\": \"string\"},\n",
|
||||
" },\n",
|
||||
" \"required\": [],\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"chain = create_extraction_chain(schema, llm)\n",
|
||||
"\n",
|
||||
"inp = \"\"\"Alex is 5 feet tall. Claudia is 1 foot taller than Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
|
||||
"Willow is a German Shepherd that likes to play with other dogs and can always be found playing with Milo, a border collie that lives close by.\"\"\"\n",
|
||||
"\n",
|
||||
"chain.run(inp)"
|
||||
"inp = \"Alex is 5 feet tall. He is blond.\"\n",
|
||||
"runnable.invoke(inp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -284,11 +278,9 @@
|
||||
"id": "34f3b958",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Extra information\n",
|
||||
"#### Extra information\n",
|
||||
"\n",
|
||||
"The power of functions (relative to using parsers alone) lies in the ability to perform semantic extraction.\n",
|
||||
"\n",
|
||||
"In particular, `we can ask for things that are not explicitly enumerated in the schema`.\n",
|
||||
"Runnables constructed via `create_structured_output_runnable` are generally capable of semantic extraction: they can populate information that is not explicitly enumerated in the schema.\n",
|
||||
"\n",
|
||||
"Suppose we want unspecified additional information about dogs. \n",
|
||||
"\n",
|
||||
@@ -297,44 +289,53 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "40c7b26f",
|
||||
"execution_count": 8,
|
||||
"id": "0ed3b5e6-a7f3-453e-be61-d94fc665c16b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"inp = \"\"\"Alex is 5 feet tall and has blond hair.\n",
|
||||
"Claudia is 1 foot taller than Alex and jumps higher than him.\n",
|
||||
"Claudia is a brunette and has a beagle named Harry.\n",
|
||||
"Harry likes to play with other dogs and can always be found\n",
|
||||
"playing with Milo, a border collie that lives close by.\"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "be07928a-8022-4963-a15e-eb3097beef9f",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'person_name': 'Alex', 'person_height': 5, 'person_hair_color': 'blonde'},\n",
|
||||
" {'person_name': 'Claudia',\n",
|
||||
" 'person_height': 6,\n",
|
||||
" 'person_hair_color': 'brunette'},\n",
|
||||
" {'dog_name': 'Willow',\n",
|
||||
" 'dog_breed': 'German Shepherd',\n",
|
||||
" 'dog_extra_info': 'likes to play with other dogs'},\n",
|
||||
" {'dog_name': 'Milo',\n",
|
||||
" 'dog_breed': 'border collie',\n",
|
||||
" 'dog_extra_info': 'lives close by'}]"
|
||||
"People(people=[Person(person_name='Alex', person_height=60, person_hair_color='blond', dog_breed=None, dog_name=None, dog_extra_info=None), Person(person_name='Claudia', person_height=72, person_hair_color='brunette', dog_breed='beagle', dog_name='Harry', dog_extra_info='likes to play with other dogs and can always be found playing with Milo, a border collie that lives close by.')])"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"schema = {\n",
|
||||
" \"properties\": {\n",
|
||||
" \"person_name\": {\"type\": \"string\"},\n",
|
||||
" \"person_height\": {\"type\": \"integer\"},\n",
|
||||
" \"person_hair_color\": {\"type\": \"string\"},\n",
|
||||
" \"dog_name\": {\"type\": \"string\"},\n",
|
||||
" \"dog_breed\": {\"type\": \"string\"},\n",
|
||||
" \"dog_extra_info\": {\"type\": \"string\"},\n",
|
||||
" },\n",
|
||||
"}\n",
|
||||
"class Person(BaseModel):\n",
|
||||
" person_name: str\n",
|
||||
" person_height: int\n",
|
||||
" person_hair_color: str\n",
|
||||
" dog_breed: Optional[str]\n",
|
||||
" dog_name: Optional[str]\n",
|
||||
" dog_extra_info: Optional[str]\n",
|
||||
"\n",
|
||||
"chain = create_extraction_chain(schema, llm)\n",
|
||||
"chain.run(inp)"
|
||||
"\n",
|
||||
"class People(BaseModel):\n",
|
||||
" \"\"\"Identifying information about all people in a text.\"\"\"\n",
|
||||
"\n",
|
||||
" people: Sequence[Person]\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"runnable = create_structured_output_runnable(People, llm)\n",
|
||||
"runnable.invoke(inp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -347,66 +348,289 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bf71ddce",
|
||||
"id": "97ed9f5e-33be-4667-aa82-af49cc874e1d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Pydantic \n",
|
||||
"### Specifying extraction mode\n",
|
||||
"\n",
|
||||
"Pydantic is a data validation and settings management library for Python. \n",
|
||||
"`create_structured_output_runnable` supports several underlying extraction implementations, configured via the `mode` parameter, which can be one of `\"openai-functions\"`, `\"openai-tools\"`, or `\"openai-json\"`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7c8e0b00-d6e6-432d-b9b0-8d0a3c0c6572",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### OpenAI Functions and Tools"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "07ccdbb1-cbe5-45af-87e4-dde42baee5eb",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Some LLMs are fine-tuned to support the invocation of functions or tools. If they are given an input schema for a tool and recognize an occasion to use it, they may emit JSON output conforming to that schema. We can leverage this to drive structured data extraction from natural language.\n",
|
||||
"\n",
|
||||
"It allows you to create data classes with attributes that are automatically validated when you instantiate an object.\n",
|
||||
"\n",
|
||||
"Let's define a class with attributes annotated with types."
|
||||
"OpenAI originally released this via a [`functions` parameter in its chat completions API](https://openai.com/blog/function-calling-and-other-api-updates). This has since been deprecated in favor of a [`tools` parameter](https://platform.openai.com/docs/guides/function-calling), which can include (multiple) functions."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e6b02442-2884-4b45-a5a0-4fdac729fdb3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Using OpenAI Functions:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "d36a743b",
|
||||
"execution_count": 10,
|
||||
"id": "7b1c2266-b04b-4a23-83a9-da3cd2f88137",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Properties(person_name='Alex', person_height=5, person_hair_color='blonde', dog_breed=None, dog_name=None),\n",
|
||||
" Properties(person_name='Claudia', person_height=6, person_hair_color='brunette', dog_breed=None, dog_name=None)]"
|
||||
"Person(person_name='Alex', person_height=60, person_hair_color='blond', dog_breed=None, dog_name=None, dog_extra_info=None)"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from typing import Optional\n",
|
||||
"runnable = create_structured_output_runnable(Person, llm, mode=\"openai-functions\")\n",
|
||||
"\n",
|
||||
"from langchain.chains import create_extraction_chain_pydantic\n",
|
||||
"from langchain_core.pydantic_v1 import BaseModel\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Pydantic data class\n",
|
||||
"class Properties(BaseModel):\n",
|
||||
" person_name: str\n",
|
||||
" person_height: int\n",
|
||||
" person_hair_color: str\n",
|
||||
" dog_breed: Optional[str]\n",
|
||||
" dog_name: Optional[str]\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Extraction\n",
|
||||
"chain = create_extraction_chain_pydantic(pydantic_schema=Properties, llm=llm)\n",
|
||||
"\n",
|
||||
"# Run\n",
|
||||
"inp = \"\"\"Alex is 5 feet tall. Claudia is 1 foot taller than Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\"\"\"\n",
|
||||
"chain.run(inp)"
|
||||
"inp = \"Alex is 5 feet tall and has blond hair.\"\n",
|
||||
"runnable.invoke(inp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "07a0351a",
|
||||
"id": "1c07427b-a582-4489-a486-4c24a6c3165f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As we can see from the [trace](https://smith.langchain.com/public/fed50ae6-26bb-4235-a254-e0b7a229d10f/r), we use the function `information_extraction`, as above, with the Pydantic schema. "
|
||||
"Using OpenAI Tools:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"id": "0b1ca93a-ffd9-4d37-8baa-377757405357",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Person(person_name='Alex', person_height=152, person_hair_color='blond', dog_breed=None, dog_name=None)"
|
||||
]
|
||||
},
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"runnable = create_structured_output_runnable(Person, llm, mode=\"openai-tools\")\n",
|
||||
"\n",
|
||||
"runnable.invoke(inp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4018a8fc-1799-4c9d-b655-a66f618204b3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The corresponding [LangSmith trace](https://smith.langchain.com/public/04cc37a7-7a1c-4bae-b972-1cb1a642568c/r) illustrates the tool call that generated our structured output.\n",
|
||||
"\n",
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "fb2662d5-9492-4acc-935b-eb8fccebbe0f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### JSON Mode"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c0fd98ba-c887-4c30-8c9e-896ae90ac56a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Some LLMs support generating JSON more generally. OpenAI implements this via a [`response_format` parameter](https://platform.openai.com/docs/guides/text-generation/json-mode) in its chat completions API.\n",
|
||||
"\n",
|
||||
"Note that this method may require explicit prompting (e.g., OpenAI requires that input messages contain the word \"json\" in some form when using this parameter)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "6b3e4679-eadc-42c8-b882-92a600083f2f",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Person(person_name='Alex', person_height=5, person_hair_color='blond', dog_breed=None, dog_name=None, dog_extra_info=None)"
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_core.prompts import ChatPromptTemplate\n",
|
||||
"\n",
|
||||
"system_prompt = \"\"\"You extract information in structured JSON formats.\n",
|
||||
"\n",
|
||||
"Extract a valid JSON blob from the user input that matches the following JSON Schema:\n",
|
||||
"\n",
|
||||
"{output_schema}\"\"\"\n",
|
||||
"prompt = ChatPromptTemplate.from_messages(\n",
|
||||
" [\n",
|
||||
" (\"system\", system_prompt),\n",
|
||||
" (\"human\", \"{input}\"),\n",
|
||||
" ]\n",
|
||||
")\n",
|
||||
"runnable = create_structured_output_runnable(\n",
|
||||
" Person,\n",
|
||||
" llm,\n",
|
||||
" mode=\"openai-json\",\n",
|
||||
" prompt=prompt,\n",
|
||||
" enforce_function_usage=False,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"runnable.invoke({\"input\": inp})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b22d8262-a9b8-415c-a142-d0ee4db7ec2b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Few-shot examples"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a01c75f6-99d7-4d7b-a58f-b0ea7e8f338a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Suppose we want to tune the behavior of our extractor. There are a few options available. For example, if we want to redact names but retain other information, we could adjust the system prompt:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "c5d16ad6-824e-434a-906a-d94e78259d4f",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Person(person_name='REDACTED', person_height=5, person_hair_color='blond', dog_breed=None, dog_name=None)"
|
||||
]
|
||||
},
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"system_prompt = \"\"\"You extract information in structured JSON formats.\n",
|
||||
"\n",
|
||||
"Extract a valid JSON blob from the user input that matches the following JSON Schema:\n",
|
||||
"\n",
|
||||
"{output_schema}\n",
|
||||
"\n",
|
||||
"Redact all names.\n",
|
||||
"\"\"\"\n",
|
||||
"prompt = ChatPromptTemplate.from_messages(\n",
|
||||
" [(\"system\", system_prompt), (\"human\", \"{input}\")]\n",
|
||||
")\n",
|
||||
"runnable = create_structured_output_runnable(\n",
|
||||
" Person,\n",
|
||||
" llm,\n",
|
||||
" mode=\"openai-json\",\n",
|
||||
" prompt=prompt,\n",
|
||||
" enforce_function_usage=False,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"runnable.invoke({\"input\": inp})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "be611688-1224-4d5a-9e34-a158b3c04296",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Few-shot examples are another effective way to illustrate intended behavior. For instance, if we want to redact names with a specific character string, a one-shot example will convey this. We can use a `FewShotChatMessagePromptTemplate` to easily accommodate both a fixed set of examples and the dynamic selection of examples based on the input."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "0aeee951-7f73-4e24-9033-c81a08af14dc",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Person(person_name='#####', person_height=5, person_hair_color='blond', dog_breed=None, dog_name=None)"
|
||||
]
|
||||
},
|
||||
"execution_count": 20,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_core.prompts import FewShotChatMessagePromptTemplate\n",
|
||||
"\n",
|
||||
"examples = [\n",
|
||||
" {\n",
|
||||
" \"input\": \"Samus is 6 ft tall and blonde.\",\n",
|
||||
" \"output\": Person(\n",
|
||||
" person_name=\"######\",\n",
|
||||
" person_height=6,\n",
|
||||
" person_hair_color=\"blonde\",\n",
|
||||
" ).dict(),\n",
|
||||
" }\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"example_prompt = ChatPromptTemplate.from_messages(\n",
|
||||
" [(\"human\", \"{input}\"), (\"ai\", \"{output}\")]\n",
|
||||
")\n",
|
||||
"few_shot_prompt = FewShotChatMessagePromptTemplate(\n",
|
||||
" examples=examples,\n",
|
||||
" example_prompt=example_prompt,\n",
|
||||
")\n",
|
||||
"prompt = ChatPromptTemplate.from_messages(\n",
|
||||
" [(\"system\", system_prompt), few_shot_prompt, (\"human\", \"{input}\")]\n",
|
||||
")\n",
|
||||
"runnable = create_structured_output_runnable(\n",
|
||||
" Person,\n",
|
||||
" llm,\n",
|
||||
" mode=\"openai-json\",\n",
|
||||
" prompt=prompt,\n",
|
||||
" enforce_function_usage=False,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"runnable.invoke({\"input\": inp})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "51846211-e86b-4807-9348-eb263999f7f7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Here, the [LangSmith trace](https://smith.langchain.com/public/6fe5e694-9c04-48f7-83ff-e541da764781/r) for the chat model call shows how the one-shot example is formatted into the prompt.\n",
|
||||
"\n",
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -418,41 +642,26 @@
|
||||
"\n",
|
||||
"[Output parsers](/docs/modules/model_io/output_parsers/) are classes that help structure language model responses. \n",
|
||||
"\n",
|
||||
"As shown above, they are used to parse the output of the OpenAI function calls in `create_extraction_chain`.\n",
|
||||
"As shown above, they are used to parse the output of the runnable created by `create_structured_output_runnable`.\n",
|
||||
"\n",
|
||||
"But, they can be used independent of functions.\n",
|
||||
"They can also be used more generally, if an LLM is instructed to emit its output in a certain format. Parsers include convenience methods for generating formatting instructions for use in prompts.\n",
|
||||
"\n",
|
||||
"### Pydantic\n",
|
||||
"\n",
|
||||
"Just as above, let's parse a generation based on a Pydantic data class."
|
||||
"Below we implement an example."
|
||||
]
|
||||
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 16,
"id": "64650362",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"People(people=[Person(person_name='Alex', person_height=5, person_hair_color='blonde', dog_breed=None, dog_name=None), Person(person_name='Claudia', person_height=6, person_hair_color='brunette', dog_breed=None, dog_name=None)])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"from typing import Optional, Sequence\n",
"\n",
"from langchain.output_parsers import PydanticOutputParser\n",
"from langchain.prompts import (\n",
" PromptTemplate,\n",
")\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain_core.pydantic_v1 import BaseModel, Field, validator\n",
"from langchain_openai import OpenAI\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"\n",
"class Person(BaseModel):\n",
@@ -470,7 +679,7 @@
"\n",
"\n",
"# Run\n",
"query = \"\"\"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\"\"\"\n",
"query = \"\"\"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blond.\"\"\"\n",
"\n",
"# Set up a parser + inject instructions into the prompt template.\n",
"parser = PydanticOutputParser(pydantic_object=People)\n",
@@ -484,9 +693,30 @@
"\n",
"# Run\n",
"_input = prompt.format_prompt(query=query)\n",
"model = OpenAI(temperature=0)\n",
"output = model(_input.to_string())\n",
"parser.parse(output)"
"model = ChatOpenAI()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "727f3bf2-31b1-4b07-94f5-9568acf3ffdf",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"People(people=[Person(person_name='Alex', person_height=5, person_hair_color='blond', dog_breed=None, dog_name=None), Person(person_name='Claudia', person_height=6, person_hair_color='brunette', dog_breed=None, dog_name=None)])"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"output = model.invoke(_input.to_string())\n",
"\n",
"parser.parse(output.content)"
]
},
{
@@ -494,46 +724,31 @@
"id": "826899df",
"metadata": {},
"source": [
"We can see from the [LangSmith trace](https://smith.langchain.com/public/8e3aa858-467e-46a5-aa49-5db65f0a2b9a/r) that we get the same output as above.\n",
"We can see from the [LangSmith trace](https://smith.langchain.com/public/aec42dd3-d471-4d34-801b-20dd88444931/r) that we get the same output as above.\n",
"\n",
"\n",
"\n",
"\n",
"We can see that we provide a two-shot prompt in order to instruct the LLM to output in our desired format.\n",
"\n",
"And, we need to do a bit more work:\n",
"\n",
"* Define a class that holds multiple instances of `Person`\n",
"* Explicitly parse the output of the LLM to the Pydantic class\n",
"\n",
"We can see this for other cases, too."
"We can see that we provide a two-shot prompt in order to instruct the LLM to output in our desired format."
]
},
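The cells in this hunk rely on a prompt template whose `format_instructions` variable is pre-filled with the parser's instructions while `query` is supplied at call time. A plain-Python sketch of that pre-filling pattern, with no dependencies; `make_prompt` is an illustrative name, not a LangChain API:

```python
# Sketch of pre-filling static variables (like format_instructions) into a
# prompt template, mirroring the notebook's
# PromptTemplate(..., partial_variables=...) usage with plain Python.
def make_prompt(template: str, **partials):
    """Return a formatter with some template variables fixed up front."""
    def fmt(**kwargs):
        # Merge the pre-filled variables with the call-time ones.
        return template.format(**partials, **kwargs)
    return fmt


prompt = make_prompt(
    "Answer the user query.\n{format_instructions}\n{query}\n",
    format_instructions="Return a JSON object with the keys: setup, punchline.",
)
print(prompt(query="Tell me a joke."))
```

Fixing the instructions once keeps every call site down to just the user query, which is the same ergonomics `partial_variables` gives the notebook code.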
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 21,
"id": "837c350e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')"
"Joke(setup=\"Why couldn't the bicycle find its way home?\", punchline='Because it lost its bearings!')"
]
},
"execution_count": 11,
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.output_parsers import PydanticOutputParser\n",
"from langchain.prompts import (\n",
" PromptTemplate,\n",
")\n",
"from langchain_core.pydantic_v1 import BaseModel, Field, validator\n",
"from langchain_openai import OpenAI\n",
"\n",
"\n",
"# Define your desired data structure.\n",
"class Joke(BaseModel):\n",
" setup: str = Field(description=\"question to set up a joke\")\n",
@@ -562,9 +777,9 @@
"\n",
"# Run\n",
"_input = prompt.format_prompt(query=joke_query)\n",
"model = OpenAI(temperature=0)\n",
"output = model(_input.to_string())\n",
"parser.parse(output)"
"model = ChatOpenAI(temperature=0)\n",
"output = model.invoke(_input.to_string())\n",
"parser.parse(output.content)"
]
},
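Across these hunks the call pattern changes from `model(_input.to_string())` on a completion-style model to `model.invoke(...).content` on a chat model: a chat model returns a message object, and the text a parser consumes lives on its `.content` attribute. A minimal stand-in illustrating that shape; `AIMessageSketch` and `FakeChatModel` are hypothetical names, not LangChain's types:

```python
# Why the diff adds `.content`: a chat model's invoke() returns a message
# object, whereas the old completion-style call returned a plain string.
from dataclasses import dataclass


@dataclass
class AIMessageSketch:
    """Stand-in for the message object a chat model returns."""
    content: str


class FakeChatModel:
    def invoke(self, prompt: str) -> AIMessageSketch:
        # A real model would generate text; here we return a canned reply.
        return AIMessageSketch(content='{"setup": "...", "punchline": "..."}')


model = FakeChatModel()
output = model.invoke("Tell me a joke.")
print(type(output).__name__)  # the message wrapper, not a str
print(output.content)         # the string a parser actually consumes
```

Passing `output` itself to a string-expecting parser would fail, which is exactly why each updated cell unwraps with `output.content` before calling `parser.parse`.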
{
@@ -574,9 +789,7 @@
"source": [
"As we can see, we get an output of the `Joke` class, which respects our originally desired schema: 'setup' and 'punchline'.\n",
"\n",
"We can look at the [LangSmith trace](https://smith.langchain.com/public/69f11d41-41be-4319-93b0-6d0eda66e969/r) to see exactly what is going on under the hood.\n",
"\n",
"\n",
"We can look at the [LangSmith trace](https://smith.langchain.com/public/557ad630-af35-43e9-b043-93800539025f/r) to see exactly what is going on under the hood.\n",
"\n",
"### Going deeper\n",
"\n",
@@ -610,7 +823,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.4"
}
},
"nbformat": 4,
BIN docs/static/img/extraction_trace_few_shot.png vendored Normal file
Binary file not shown. (After: 325 KiB)
BIN docs/static/img/extraction_trace_function_2.png vendored
Binary file not shown. (Before: 63 KiB)
BIN docs/static/img/extraction_trace_joke.png vendored
Binary file not shown. (Before: 132 KiB)
BIN docs/static/img/extraction_trace_parsing.png vendored Normal file
Binary file not shown. (After: 432 KiB)
BIN docs/static/img/extraction_trace_tool.png vendored Normal file
Binary file not shown. (After: 336 KiB)