## Description: While following the docs I found a couple of small issues. This fixes some unused imports on the [extraction page](https://python.langchain.com/docs/tutorials/extraction/#the-extractor) and updates the examples on the [classification page](https://python.langchain.com/docs/tutorials/classification/#quickstart) to be independent of the chat model.
{
"cells": [
{
"cell_type": "raw",
"id": "df29b30a-fd27-4e08-8269-870df5631f9e",
"metadata": {},
"source": [
"---\n",
"sidebar_position: 4\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "d28530a6-ddfd-49c0-85dc-b723551f6614",
"metadata": {},
"source": [
"# Build an Extraction Chain\n",
"\n",
"In this tutorial, we will use [tool-calling](/docs/concepts/tool_calling) features of [chat models](/docs/concepts/chat_models) to extract structured information from unstructured text. We will also demonstrate how to use [few-shot prompting](/docs/concepts/few_shot_prompting/) in this context to improve performance.\n",
"\n",
":::important\n",
"This tutorial requires `langchain-core>=0.3.20` and will only work with models that support **tool calling**.\n",
":::"
]
},
{
"cell_type": "markdown",
"id": "4412def2-38e3-4bd0-bbf0-fb09ff9e5985",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"### Jupyter Notebook\n",
"\n",
"This and other tutorials are perhaps most conveniently run in a [Jupyter notebooks](https://jupyter.org/). Going through guides in an interactive environment is a great way to better understand them. See [here](https://jupyter.org/install) for instructions on how to install.\n",
|
|
"\n",
|
|
"### Installation\n",
|
|
"\n",
|
|
"To install LangChain run:\n",
|
|
"\n",
|
|
"import Tabs from '@theme/Tabs';\n",
|
|
"import TabItem from '@theme/TabItem';\n",
|
|
"import CodeBlock from \"@theme/CodeBlock\";\n",
|
|
"\n",
|
|
"<Tabs>\n",
|
|
" <TabItem value=\"pip\" label=\"Pip\" default>\n",
|
|
" <CodeBlock language=\"bash\">pip install --upgrade langchain-core</CodeBlock>\n",
|
|
" </TabItem>\n",
|
|
" <TabItem value=\"conda\" label=\"Conda\">\n",
|
|
" <CodeBlock language=\"bash\">conda install langchain-core -c conda-forge</CodeBlock>\n",
|
|
" </TabItem>\n",
|
|
"</Tabs>\n",
|
|
"\n",
|
|
"\n",
|
|
"\n",
|
|
"For more details, see our [Installation guide](/docs/how_to/installation).\n",
|
|
"\n",
|
|
"### LangSmith\n",
|
|
"\n",
|
|
"Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls.\n",
|
|
"As these applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent.\n",
|
|
"The best way to do this is with [LangSmith](https://smith.langchain.com).\n",
|
|
"\n",
|
|
"After you sign up at the link above, make sure to set your environment variables to start logging traces:\n",
|
|
"\n",
|
|
"```shell\n",
|
|
"export LANGSMITH_TRACING=\"true\"\n",
|
|
"export LANGSMITH_API_KEY=\"...\"\n",
|
|
"```\n",
|
|
"\n",
|
|
"Or, if in a notebook, you can set them with:\n",
|
|
"\n",
|
|
"```python\n",
|
|
"import getpass\n",
|
|
"import os\n",
|
|
"\n",
|
|
"os.environ[\"LANGSMITH_TRACING\"] = \"true\"\n",
|
|
"os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass()\n",
|
|
"```"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "54d6b970-2ea3-4192-951e-21237212b359",
|
|
"metadata": {},
|
|
"source": [
|
|
"## The Schema\n",
|
|
"\n",
|
|
"First, we need to describe what information we want to extract from the text.\n",
|
|
"\n",
|
|
"We'll use Pydantic to define an example schema to extract personal information."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"id": "c141084c-fb94-4093-8d6a-81175d688e40",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from typing import Optional\n",
|
|
"\n",
|
|
"from pydantic import BaseModel, Field\n",
|
|
"\n",
|
|
"\n",
|
|
"class Person(BaseModel):\n",
|
|
" \"\"\"Information about a person.\"\"\"\n",
|
|
"\n",
|
|
" # ^ Doc-string for the entity Person.\n",
|
|
" # This doc-string is sent to the LLM as the description of the schema Person,\n",
|
|
" # and it can help to improve extraction results.\n",
|
|
"\n",
|
|
" # Note that:\n",
|
|
" # 1. Each field is an `optional` -- this allows the model to decline to extract it!\n",
|
|
" # 2. Each field has a `description` -- this description is used by the LLM.\n",
|
|
" # Having a good description can help improve extraction results.\n",
|
|
" name: Optional[str] = Field(default=None, description=\"The name of the person\")\n",
|
|
" hair_color: Optional[str] = Field(\n",
|
|
" default=None, description=\"The color of the person's hair if known\"\n",
|
|
" )\n",
|
|
" height_in_meters: Optional[str] = Field(\n",
|
|
" default=None, description=\"Height measured in meters\"\n",
|
|
" )"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "f248dd54-e36d-435a-b154-394ab4ed6792",
|
|
"metadata": {},
|
|
"source": [
|
|
"There are two best practices when defining schema:\n",
|
|
"\n",
|
|
"1. Document the **attributes** and the **schema** itself: This information is sent to the LLM and is used to improve the quality of information extraction.\n",
|
|
"2. Do not force the LLM to make up information! Above we used `Optional` for the attributes allowing the LLM to output `None` if it doesn't know the answer.\n",
|
|
"\n",
|
|
":::important\n",
|
|
"For best performance, document the schema well and make sure the model isn't force to return results if there's no information to be extracted in the text.\n",
|
|
":::\n",
|
|
"\n",
"## The Extractor\n",
|
|
"\n",
|
|
"Let's create an information extractor using the schema we defined above."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "a5e490f6-35ad-455e-8ae4-2bae021583ff",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
|
|
"\n",
|
|
"# Define a custom prompt to provide instructions and any additional context.\n",
|
|
"# 1) You can add examples into the prompt template to improve extraction quality\n",
|
|
"# 2) Introduce additional parameters to take context into account (e.g., include metadata\n",
|
|
"# about the document from which the text was extracted.)\n",
|
|
"prompt_template = ChatPromptTemplate.from_messages(\n",
|
|
" [\n",
|
|
" (\n",
|
|
" \"system\",\n",
|
|
" \"You are an expert extraction algorithm. \"\n",
|
|
" \"Only extract relevant information from the text. \"\n",
|
|
" \"If you do not know the value of an attribute asked to extract, \"\n",
|
|
" \"return null for the attribute's value.\",\n",
|
|
" ),\n",
|
|
" # Please see the how-to about improving performance with\n",
|
|
" # reference examples.\n",
|
|
" # MessagesPlaceholder('examples'),\n",
|
|
" (\"human\", \"{text}\"),\n",
|
|
" ]\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "832bf6a1-8e0c-4b6a-aa37-12fe9c42a6d9",
|
|
"metadata": {},
|
|
"source": [
|
|
"We need to use a model that supports function/tool calling.\n",
|
|
"\n",
|
|
"Please review [the documentation](/docs/concepts/tool_calling) for all models that can be used with this API.\n",
|
|
"\n",
|
|
"import ChatModelTabs from \"@theme/ChatModelTabs\";\n",
|
|
"\n",
|
|
"<ChatModelTabs customVarName=\"llm\" />"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"id": "77c1311c-5252-41d6-83e6-fdb40b172e47",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# | output: false\n",
|
|
"# | echo: false\n",
|
|
"\n",
|
|
"from langchain_openai import ChatOpenAI\n",
|
|
"\n",
|
|
"llm = ChatOpenAI(model=\"gpt-4o-mini\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"id": "04d846a6-d5cb-4009-ac19-61e3aac0177e",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"structured_llm = llm.with_structured_output(schema=Person)"
|
|
]
|
|
},
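{
"cell_type": "markdown",
"id": "b4a27c86-a822-4b3b-9a19-7edb1b2b3a1c",
"metadata": {},
"source": [
"`with_structured_output` binds the schema to the model as a tool and parses the model's tool call back into a `Person` object. If you also want the raw model message for debugging, one option is `include_raw=True` -- a minimal sketch; check the `with_structured_output` API reference for your model class:\n",
"\n",
"```python\n",
"# Returns a dict with keys \"raw\", \"parsed\", and \"parsing_error\"\n",
"# instead of just the parsed Person object.\n",
"structured_llm_raw = llm.with_structured_output(schema=Person, include_raw=True)\n",
"```"
]
},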
{
"cell_type": "markdown",
"id": "23582c0b-00ed-403f-a10e-3aeabf921f12",
"metadata": {},
"source": [
"Let's test it out:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "dd42a935-022f-4860-b9e0-84268f55b22a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Person(name='Alan Smith', hair_color='blond', height_in_meters='1.83')"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"text = \"Alan Smith is 6 feet tall and has blond hair.\"\n",
"prompt = prompt_template.invoke({\"text\": text})\n",
"structured_llm.invoke(prompt)"
]
},
{
"cell_type": "markdown",
"id": "bd1c493d-f9dc-4236-8da9-50f6919f5710",
"metadata": {},
"source": [
":::important\n",
"\n",
"Extraction is Generative 🤯\n",
"\n",
"LLMs are generative models, so they can do some pretty cool things like correctly extract the height of the person in meters\n",
"even though it was provided in feet!\n",
":::\n",
"\n",
"We can see the LangSmith trace [here](https://smith.langchain.com/public/44b69a63-3b3b-47b8-8a6d-61b46533f015/r). Note that the [chat model portion of the trace](https://smith.langchain.com/public/44b69a63-3b3b-47b8-8a6d-61b46533f015/r/dd1f6305-f1e9-4919-bd8f-339d03a12d01) reveals the exact sequence of messages sent to the model, tools invoked, and other metadata."
]
},
{
"cell_type": "markdown",
"id": "28c5ef0c-b8d1-4e12-bd0e-e2528de87fcc",
"metadata": {},
"source": [
"## Multiple Entities\n",
"\n",
"In **most cases**, you should be extracting a list of entities rather than a single entity.\n",
"\n",
"This can be easily achieved using pydantic by nesting models inside one another."
|
|
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "591a0c16-7a17-4883-91ee-0d6d2fdb265c",
"metadata": {},
"outputs": [],
"source": [
"from typing import List, Optional\n",
"\n",
"from pydantic import BaseModel, Field\n",
"\n",
"\n",
"class Person(BaseModel):\n",
"    \"\"\"Information about a person.\"\"\"\n",
"\n",
"    # ^ Doc-string for the entity Person.\n",
"    # This doc-string is sent to the LLM as the description of the schema Person,\n",
"    # and it can help to improve extraction results.\n",
"\n",
"    # Note that:\n",
" # 1. Each field is an `optional` -- this allows the model to decline to extract it!\n",
|
|
" # 2. Each field has a `description` -- this description is used by the LLM.\n",
|
|
" # Having a good description can help improve extraction results.\n",
|
|
" name: Optional[str] = Field(default=None, description=\"The name of the person\")\n",
|
|
" hair_color: Optional[str] = Field(\n",
|
|
" default=None, description=\"The color of the person's hair if known\"\n",
|
|
" )\n",
|
|
" height_in_meters: Optional[str] = Field(\n",
|
|
" default=None, description=\"Height measured in meters\"\n",
|
|
" )\n",
|
|
"\n",
|
|
"\n",
|
|
"class Data(BaseModel):\n",
|
|
" \"\"\"Extracted data about people.\"\"\"\n",
|
|
"\n",
|
|
" # Creates a model so that we can extract multiple entities.\n",
|
|
" people: List[Person]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "5f5cda33-fd7b-481e-956a-703f45e40e1d",
|
|
"metadata": {},
|
|
"source": [
|
|
":::important\n",
|
|
"Extraction results might not be perfect here. Read on to see how to use **Reference Examples** to improve the quality of extraction, and check out our extraction [how-to](/docs/how_to/#extraction) guides for more detail.\n",
|
|
":::"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 10,
|
|
"id": "83ecf0db-757b-4ae3-a9d2-eb1c9f6b2631",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"Data(people=[Person(name='Jeff', hair_color='black', height_in_meters='1.83'), Person(name='Anna', hair_color='black', height_in_meters=None)])"
|
|
]
|
|
},
|
|
"execution_count": 10,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"structured_llm = llm.with_structured_output(schema=Data)\n",
|
|
"text = \"My name is Jeff, my hair is black and i am 6 feet tall. Anna has the same color hair as me.\"\n",
|
|
"prompt = prompt_template.invoke({\"text\": text})\n",
|
|
"structured_llm.invoke(prompt)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "fba1d770-bf4d-4de4-9e4f-7384872ef0dc",
|
|
"metadata": {},
|
|
"source": [
|
|
":::tip\n",
|
|
"When the schema accommodates the extraction of **multiple entities**, it also allows the model to extract **no entities** if no relevant information\n",
|
|
"is in the text by providing an empty list. \n",
|
|
"\n",
|
|
"This is usually a **good** thing! It allows specifying **required** attributes on an entity without necessarily forcing the model to detect this entity.\n",
|
|
":::\n",
|
|
"\n",
|
|
"We can see the LangSmith trace [here](https://smith.langchain.com/public/7173764d-5e76-45fe-8496-84460bd9cdef/r)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "c590f366-050a-43d4-8c78-acf84ccfbf9b",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Reference examples\n",
|
|
"\n",
|
|
"The behavior of LLM applications can be steered using [few-shot prompting](/docs/concepts/few_shot_prompting/). For [chat models](/docs/concepts/chat_models/), this can take the form of a sequence of pairs of input and response messages demonstrating desired behaviors.\n",
|
|
"\n",
|
|
"For example, we can convey the meaning of a symbol with alternating `user` and `assistant` [messages](/docs/concepts/messages/#role):"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 11,
|
|
"id": "0bb138d7-116e-4542-aa5f-bebf0c301ec6",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"7\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"messages = [\n",
|
|
" {\"role\": \"user\", \"content\": \"2 🦜 2\"},\n",
|
|
" {\"role\": \"assistant\", \"content\": \"4\"},\n",
|
|
" {\"role\": \"user\", \"content\": \"2 🦜 3\"},\n",
|
|
" {\"role\": \"assistant\", \"content\": \"5\"},\n",
|
|
" {\"role\": \"user\", \"content\": \"3 🦜 4\"},\n",
|
|
"]\n",
|
|
"\n",
|
|
"response = llm.invoke(messages)\n",
|
|
"print(response.content)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "b5691d07-e2b8-4ab3-a943-9b0b503e2549",
|
|
"metadata": {},
|
|
"source": [
|
|
"[Structured output](/docs/concepts/structured_outputs/) often uses [tool calling](/docs/concepts/tool_calling/) under-the-hood. This typically involves the generation of [AI messages](/docs/concepts/messages/#aimessage) containing tool calls, as well as [tool messages](/docs/concepts/messages/#toolmessage) containing the results of tool calls. What should a sequence of messages look like in this case?\n",
|
|
"\n",
|
|
"Different [chat model providers](/docs/integrations/chat/) impose different requirements for valid message sequences. Some will accept a (repeating) message sequence of the form:\n",
|
|
"\n",
|
|
"- User message\n",
|
|
"- AI message with tool call\n",
|
|
"- Tool message with result\n",
|
|
"\n",
|
|
"Others require a final AI message containing some sort of response.\n",
"\n",
|
|
"LangChain includes a utility function [tool_example_to_messages](https://python.langchain.com/api_reference/core/utils/langchain_core.utils.function_calling.tool_example_to_messages.html) that will generate a valid sequence for most model providers. It simplifies the generation of structured few-shot examples by just requiring Pydantic representations of the corresponding tool calls.\n",
|
|
"\n",
|
|
"Let's try this out. We can convert pairs of input strings and desired Pydantic objects to a sequence of messages that can be provided to a chat model. Under the hood, LangChain will format the tool calls to each provider's required format.\n",
|
|
"\n",
|
|
"Note: this version of `tool_example_to_messages` requires `langchain-core>=0.3.20`."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 13,
|
|
"id": "c604e476-a2be-4eda-b128-71399e280732",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain_core.utils.function_calling import tool_example_to_messages\n",
|
|
"\n",
|
|
"examples = [\n",
|
|
" (\n",
|
|
" \"The ocean is vast and blue. It's more than 20,000 feet deep.\",\n",
|
|
" Data(people=[]),\n",
|
|
" ),\n",
|
|
" (\n",
|
|
" \"Fiona traveled far from France to Spain.\",\n",
|
|
" Data(people=[Person(name=\"Fiona\", height_in_meters=None, hair_color=None)]),\n",
|
|
" ),\n",
|
|
"]\n",
|
|
"\n",
|
|
"\n",
|
|
"messages = []\n",
|
|
"\n",
|
|
"for txt, tool_call in examples:\n",
|
|
" if tool_call.people:\n",
|
|
" # This final message is optional for some providers\n",
|
|
" ai_response = \"Detected people.\"\n",
|
|
" else:\n",
|
|
" ai_response = \"Detected no people.\"\n",
|
|
" messages.extend(tool_example_to_messages(txt, [tool_call], ai_response=ai_response))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "beecc7a6-e423-4ca1-82b7-c2a751362fd6",
|
|
"metadata": {},
|
|
"source": [
|
|
"Inspecting the result, we see these two example pairs generated eight messages:"
|
|
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "628f67dd-aee0-4200-ac38-24a9fb16f1d1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"================================\u001b[1m Human Message \u001b[0m=================================\n",
"\n",
"The ocean is vast and blue. It's more than 20,000 feet deep.\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"Tool Calls:\n",
"  Data (d8f2e054-7fb9-417f-b28f-0447a775b2c3)\n",
" Call ID: d8f2e054-7fb9-417f-b28f-0447a775b2c3\n",
"  Args:\n",
"    people: []\n",
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
"\n",
"You have correctly called this tool.\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"\n",
"Detected no people.\n",
"================================\u001b[1m Human Message \u001b[0m=================================\n",
"\n",
"Fiona traveled far from France to Spain.\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"Tool Calls:\n",
"  Data (0178939e-a4b1-4d2a-a93e-b87f665cdfd6)\n",
" Call ID: 0178939e-a4b1-4d2a-a93e-b87f665cdfd6\n",
"  Args:\n",
"    people: [{'name': 'Fiona', 'hair_color': None, 'height_in_meters': None}]\n",
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
"\n",
"You have correctly called this tool.\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"\n",
"Detected people.\n"
]
}
],
"source": [
"for message in messages:\n",
"    message.pretty_print()"
]
},
{
"cell_type": "markdown",
"id": "dc8846f0-8bd1-48e1-bc4d-a62fbfa6a9f4",
"metadata": {},
"source": [
"Let's compare performance with and without these messages. For example, let's pass a message for which we intend no people to be extracted:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "6b73d4e2-d18d-4d47-89ec-99b5eb6b234f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Data(people=[Person(name='Earth', hair_color='None', height_in_meters='0.00')])"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"message_no_extraction = {\n",
"    \"role\": \"user\",\n",
"    \"content\": \"The solar system is large, but earth has only 1 moon.\",\n",
"}\n",
"\n",
"structured_llm = llm.with_structured_output(schema=Data)\n",
"structured_llm.invoke([message_no_extraction])"
]
},
{
"cell_type": "markdown",
"id": "350e1298-14f1-48e4-b11c-534af643e3a6",
"metadata": {},
"source": [
"In this example, the model is liable to erroneously generate records of people.\n",
"\n",
"Because our few-shot examples contain examples of \"negatives\", we encourage the model to behave correctly in this case:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "eb1b3a99-4750-45bc-ad28-5d12751ed9f8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Data(people=[])"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"structured_llm.invoke(messages + [message_no_extraction])"
]
},
{
"cell_type": "markdown",
"id": "4d1ae320-14bc-45ee-aeeb-8a986f3e6808",
"metadata": {},
"source": [
":::tip\n",
"\n",
"The [LangSmith](https://smith.langchain.com/public/b3433f57-7905-4430-923c-fed214525bf1/r) trace for the run reveals the exact sequence of messages sent to the chat model, tool calls generated, latency, token counts, and other metadata.\n",
"\n",
":::\n",
"\n",
"See [this guide](/docs/how_to/extraction_examples/) for more detail on extraction workflows with reference examples, including how to incorporate prompt templates and customize the generation of example messages."
]
},
{
"cell_type": "markdown",
"id": "f07a7455-7de6-4a6f-9772-0477ef65e3dc",
"metadata": {},
"source": [
"## Next steps\n",
"\n",
"Now that you understand the basics of extraction with LangChain, you're ready to proceed to the rest of the how-to guides:\n",
"\n",
"- [Add Examples](/docs/how_to/extraction_examples): More detail on using **reference examples** to improve performance.\n",
"- [Handle Long Text](/docs/how_to/extraction_long_text): What should you do if the text does not fit into the context window of the LLM?\n",
"- [Use a Parsing Approach](/docs/how_to/extraction_parse): Use a prompt based approach to extract with models that do not support **tool/function calling**."
|
|
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}