mirror of
https://github.com/hwchase17/langchain.git
synced 2026-02-22 07:05:36 +00:00
{
 "cells": [
  {
   "cell_type": "raw",
   "id": "df29b30a-fd27-4e08-8269-870df5631f9e",
   "metadata": {},
   "source": [
    "---\n",
    "title: Quickstart\n",
    "sidebar_position: 0\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d28530a6-ddfd-49c0-85dc-b723551f6614",
   "metadata": {},
   "source": [
    "In this quick start, we will use [chat models](/docs/modules/model_io/chat/) that are capable of **function/tool calling** to extract information from text.\n",
    "\n",
    ":::{.callout-important}\n",
    "Extraction using **function/tool calling** only works with [models that support **function/tool calling**](/docs/modules/model_io/chat/function_calling).\n",
    ":::"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4412def2-38e3-4bd0-bbf0-fb09ff9e5985",
   "metadata": {},
   "source": [
    "## Set up\n",
    "\n",
    "We will use the [structured output](/docs/modules/model_io/chat/structured_output) method available on LLMs that are capable of **function/tool calling**.\n",
    "\n",
    "Select a model, install the dependencies for it, and set up API keys!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "380c0425-6062-4837-8630-c220240c83b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install langchain\n",
    "\n",
    "# Install a model capable of tool calling\n",
    "# pip install langchain-openai\n",
    "# pip install langchain-mistralai\n",
    "# pip install langchain-fireworks\n",
    "\n",
    "# Set env vars for the relevant model or load from a .env file:\n",
    "# import dotenv\n",
    "# dotenv.load_dotenv()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "54d6b970-2ea3-4192-951e-21237212b359",
   "metadata": {},
   "source": [
    "## The Schema\n",
    "\n",
    "First, we need to describe what information we want to extract from the text.\n",
    "\n",
    "We'll use Pydantic to define an example schema to extract personal information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "c141084c-fb94-4093-8d6a-81175d688e40",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Optional\n",
    "\n",
    "from langchain_core.pydantic_v1 import BaseModel, Field\n",
    "\n",
    "\n",
    "class Person(BaseModel):\n",
    "    \"\"\"Information about a person.\"\"\"\n",
    "\n",
    "    # ^ Doc-string for the entity Person.\n",
    "    # This doc-string is sent to the LLM as the description of the schema Person,\n",
    "    # and it can help to improve extraction results.\n",
    "\n",
    "    # Note that:\n",
    "    # 1. Each field is `Optional` -- this allows the model to decline to extract it!\n",
    "    # 2. Each field has a `description` -- this description is used by the LLM.\n",
    "    # Having a good description can help improve extraction results.\n",
    "    name: Optional[str] = Field(default=None, description=\"The name of the person\")\n",
    "    hair_color: Optional[str] = Field(\n",
    "        default=None, description=\"The color of the person's hair if known\"\n",
    "    )\n",
    "    height_in_meters: Optional[str] = Field(\n",
    "        default=None, description=\"Height measured in meters\"\n",
    "    )"
   ]
  },
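  {
   "cell_type": "markdown",
   "id": "a1f0d3c2-7b54-4e19-9c88-2f6e0d4b8a31",
   "metadata": {},
   "source": [
    "As a quick local sanity check (no LLM call needed), every field defaults to `None`, so an \"empty\" extraction is a valid instance of the schema -- this is what lets the model decline attributes it cannot find:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c7d2e9b4-3f61-4a8d-b5c0-9e1f2a3b4c5d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# All attributes are optional, so constructing a Person with no\n",
    "# arguments is valid and yields None for every field.\n",
    "Person()"
   ]
  },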
  {
   "cell_type": "markdown",
   "id": "f248dd54-e36d-435a-b154-394ab4ed6792",
   "metadata": {},
   "source": [
    "There are two best practices when defining a schema:\n",
    "\n",
    "1. Document the **attributes** and the **schema** itself: This information is sent to the LLM and is used to improve the quality of information extraction.\n",
    "2. Do not force the LLM to make up information! Above, we used `Optional` for the attributes, allowing the LLM to output `None` if it doesn't know the answer.\n",
    "\n",
    ":::{.callout-important}\n",
    "For best performance, document the schema well and make sure the model isn't forced to return results if there's no information to be extracted from the text.\n",
    ":::\n",
    "\n",
    "## The Extractor\n",
    "\n",
    "Let's create an information extractor using the schema we defined above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "a5e490f6-35ad-455e-8ae4-2bae021583ff",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Optional\n",
    "\n",
    "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
    "from langchain_core.pydantic_v1 import BaseModel, Field\n",
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "# Define a custom prompt to provide instructions and any additional context.\n",
    "# 1) You can add examples into the prompt template to improve extraction quality.\n",
    "# 2) Introduce additional parameters to take context into account (e.g., include metadata\n",
    "#    about the document from which the text was extracted).\n",
    "prompt = ChatPromptTemplate.from_messages(\n",
    "    [\n",
    "        (\n",
    "            \"system\",\n",
    "            \"You are an expert extraction algorithm. \"\n",
    "            \"Only extract relevant information from the text. \"\n",
    "            \"If you do not know the value of an attribute asked to extract, \"\n",
    "            \"return null for the attribute's value.\",\n",
    "        ),\n",
    "        # Please see the how-to about improving performance with\n",
    "        # reference examples.\n",
    "        # MessagesPlaceholder('examples'),\n",
    "        (\"human\", \"{text}\"),\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "832bf6a1-8e0c-4b6a-aa37-12fe9c42a6d9",
   "metadata": {},
   "source": [
    "We need to use a model that supports function/tool calling.\n",
    "\n",
    "Please review [structured output](/docs/modules/model_io/chat/structured_output) for a list of models that can be used with this API."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "04d846a6-d5cb-4009-ac19-61e3aac0177e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_mistralai import ChatMistralAI\n",
    "\n",
    "llm = ChatMistralAI(model=\"mistral-large-latest\", temperature=0)\n",
    "\n",
    "runnable = prompt | llm.with_structured_output(schema=Person)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23582c0b-00ed-403f-a10e-3aeabf921f12",
   "metadata": {},
   "source": [
    "Let's test it out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "13165ac8-a1dc-44ce-a6ed-f52b577473e4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Person(name='Alan Smith', hair_color='blond', height_in_meters='1.8288')"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "text = \"Alan Smith is 6 feet tall and has blond hair.\"\n",
    "runnable.invoke({\"text\": text})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bd1c493d-f9dc-4236-8da9-50f6919f5710",
   "metadata": {},
   "source": [
    ":::{.callout-important}\n",
    "\n",
    "Extraction is Generative 🤯\n",
    "\n",
    "LLMs are generative models, so they can do some pretty cool things like correctly extracting the height of the person in meters\n",
    "even though it was provided in feet!\n",
    ":::"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28c5ef0c-b8d1-4e12-bd0e-e2528de87fcc",
   "metadata": {},
   "source": [
    "## Multiple Entities\n",
    "\n",
    "In **most cases**, you should be extracting a list of entities rather than a single entity.\n",
    "\n",
    "This can be easily achieved with Pydantic by nesting models inside one another."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "591a0c16-7a17-4883-91ee-0d6d2fdb265c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import List, Optional\n",
    "\n",
    "from langchain_core.pydantic_v1 import BaseModel, Field\n",
    "\n",
    "\n",
    "class Person(BaseModel):\n",
    "    \"\"\"Information about a person.\"\"\"\n",
    "\n",
    "    # ^ Doc-string for the entity Person.\n",
    "    # This doc-string is sent to the LLM as the description of the schema Person,\n",
    "    # and it can help to improve extraction results.\n",
    "\n",
    "    # Note that:\n",
    "    # 1. Each field is `Optional` -- this allows the model to decline to extract it!\n",
    "    # 2. Each field has a `description` -- this description is used by the LLM.\n",
    "    # Having a good description can help improve extraction results.\n",
    "    name: Optional[str] = Field(default=None, description=\"The name of the person\")\n",
    "    hair_color: Optional[str] = Field(\n",
    "        default=None, description=\"The color of the person's hair if known\"\n",
    "    )\n",
    "    height_in_meters: Optional[str] = Field(\n",
    "        default=None, description=\"Height measured in meters\"\n",
    "    )\n",
    "\n",
    "\n",
    "class Data(BaseModel):\n",
    "    \"\"\"Extracted data about people.\"\"\"\n",
    "\n",
    "    # Creates a model so that we can extract multiple entities.\n",
    "    people: List[Person]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f5cda33-fd7b-481e-956a-703f45e40e1d",
   "metadata": {},
   "source": [
    ":::{.callout-important}\n",
    "Extraction might not be perfect here. Please continue to see how to use **Reference Examples** to improve the quality of extraction, and see the **guidelines** section!\n",
    ":::"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "cf7062cc-1d1d-4a37-9122-509d1b87f0a6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Data(people=[Person(name='Jeff', hair_color=None, height_in_meters=None), Person(name='Anna', hair_color=None, height_in_meters=None)])"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "runnable = prompt | llm.with_structured_output(schema=Data)\n",
    "text = \"My name is Jeff, my hair is black and i am 6 feet tall. Anna has the same color hair as me.\"\n",
    "runnable.invoke({\"text\": text})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fba1d770-bf4d-4de4-9e4f-7384872ef0dc",
   "metadata": {},
   "source": [
    ":::{.callout-tip}\n",
    "When the schema accommodates the extraction of **multiple entities**, it also allows the model to extract **no entities** if no relevant information\n",
    "is in the text by providing an empty list.\n",
    "\n",
    "This is usually a **good** thing! It allows specifying **required** attributes on an entity without necessarily forcing the model to detect this entity.\n",
    ":::"
   ]
  },
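  {
   "cell_type": "markdown",
   "id": "e8b4c1d0-5a2f-4d7e-9b36-1c0d2e3f4a5b",
   "metadata": {},
   "source": [
    "This can again be checked locally without an LLM call: the schema itself accepts an empty list of people, so an \"extracted nothing\" result is a valid `Data` instance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2a6b8c4-1d3e-4f50-8a9b-6c7d8e9f0a1b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# A text that mentions no people can be represented as an empty list,\n",
    "# so the model is never forced to invent an entity.\n",
    "Data(people=[])"
   ]
  },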
  {
   "cell_type": "markdown",
   "id": "f07a7455-7de6-4a6f-9772-0477ef65e3dc",
   "metadata": {},
   "source": [
    "## Next steps\n",
    "\n",
    "Now that you understand the basics of extraction with LangChain, you're ready to proceed to the rest of the how-to guides:\n",
    "\n",
    "- [Add Examples](/docs/use_cases/extraction/how_to/examples): Learn how to use **reference examples** to improve performance.\n",
    "- [Handle Long Text](/docs/use_cases/extraction/how_to/handle_long_text): What should you do if the text does not fit into the context window of the LLM?\n",
    "- [Handle Files](/docs/use_cases/extraction/how_to/handle_files): Examples of using LangChain document loaders and parsers to extract from files like PDFs.\n",
    "- [Use a Parsing Approach](/docs/use_cases/extraction/how_to/parse): Use a prompt-based approach to extract with models that do not support **tool/function calling**.\n",
    "- [Guidelines](/docs/use_cases/extraction/guidelines): Guidelines for getting good performance on extraction tasks."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}