Bagatur/update agent docs (#13167)

This commit is contained in:
Bagatur 2023-11-09 21:14:30 -08:00 committed by GitHub
parent 0a2b1c7471
commit fbf7047468
4 changed files with 674 additions and 313 deletions


@@ -0,0 +1,670 @@
{
"cells": [
{
"cell_type": "raw",
"id": "97e00fdb-f771-473f-90fc-d6038e19fd9a",
"metadata": {},
"source": [
"---\n",
"sidebar_position: 3\n",
"sidebar_class_name: hidden\n",
"title: Agents\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "f4c03f40-1328-412d-8a48-1db0cd481b77",
"metadata": {},
"source": [
"The core idea of agents is to use a language model to choose a sequence of actions to take.\n",
"In chains, a sequence of actions is hardcoded (in code).\n",
"In agents, a language model is used as a reasoning engine to determine which actions to take and in which order.\n",
"\n",
"## Concepts\n",
"There are several key components here:\n",
"\n",
"### Agent\n",
"\n",
"This is the chain responsible for deciding what step to take next.\n",
"This is powered by a language model and a prompt.\n",
"The inputs to this chain are:\n",
"\n",
"1. Tools: Descriptions of available tools\n",
"2. User input: The high level objective\n",
"3. Intermediate steps: Any (action, tool output) pairs previously executed in order to achieve the user input\n",
"\n",
"The output is the next action(s) to take or the final response to send to the user (`AgentAction`s or `AgentFinish`). An action specifies a tool and the input to that tool. \n",
"\n",
"Different agents have different prompting styles for reasoning, different ways of encoding inputs, and different ways of parsing the output.\n",
"For a full list of built-in agents see [agent types](/docs/modules/agents/agent_types/).\n",
"You can also **easily build custom agents**, which we show how to do in the Get started section below.\n",
"\n",
"### Tools\n",
"\n",
"Tools are functions that an agent can invoke.\n",
"There are two important design considerations around tools:\n",
"\n",
"1. Giving the agent access to the right tools\n",
"2. Describing the tools in a way that is most helpful to the agent\n",
"\n",
"Without thinking through both, you won't be able to build a working agent.\n",
"If you don't give the agent access to a correct set of tools, it will never be able to accomplish the objectives you give it.\n",
"If you don't describe the tools well, the agent won't know how to use them properly.\n",
"\n",
"LangChain provides a wide set of built-in tools, but also makes it easy to define your own (including custom descriptions).\n",
"For a full list of built-in tools, see the [tools integrations section](/docs/integrations/tools/)\n",
"\n",
"### Toolkits\n",
"\n",
"For many common tasks, an agent will need a set of related tools.\n",
"For this LangChain provides the concept of toolkits - groups of around 3-5 tools needed to accomplish specific objectives.\n",
"For example, the GitHub toolkit has a tool for searching through GitHub issues, a tool for reading a file, a tool for commenting, etc.\n",
"\n",
"LangChain provides a wide set of toolkits to get started.\n",
"For a full list of built-in toolkits, see the [toolkits integrations section](/docs/integrations/toolkits/)\n",
"\n",
"### AgentExecutor\n",
"\n",
"The agent executor is the runtime for an agent.\n",
"This is what actually calls the agent, executes the actions it chooses, passes the action outputs back to the agent, and repeats.\n",
"In pseudocode, this looks roughly like:\n",
"\n",
"```python\n",
"next_action = agent.get_action(...)\n",
"while next_action != AgentFinish:\n",
" observation = run(next_action)\n",
" next_action = agent.get_action(..., next_action, observation)\n",
"return next_action\n",
"```\n",
"\n",
"While this may seem simple, there are several complexities this runtime handles for you, including:\n",
"\n",
"1. Handling cases where the agent selects a non-existent tool\n",
"2. Handling cases where the tool errors\n",
"3. Handling cases where the agent produces output that cannot be parsed into a tool invocation\n",
"4. Logging and observability at all levels (agent decisions, tool calls) to stdout and/or to [LangSmith](/docs/langsmith).\n",
"\n",
"### Other types of agent runtimes\n",
"\n",
"The `AgentExecutor` class is the main agent runtime supported by LangChain.\n",
"However, there are other, more experimental runtimes we also support.\n",
"These include:\n",
"\n",
"- [Plan-and-execute Agent](/docs/use_cases/more/agents/autonomous_agents/plan_and_execute)\n",
"- [Baby AGI](/docs/use_cases/more/agents/autonomous_agents/baby_agi)\n",
"- [Auto GPT](/docs/use_cases/more/agents/autonomous_agents/autogpt)\n",
"\n",
"You can also always create your own custom execution logic, which we show how to do below.\n",
"\n",
"## Get started\n",
"\n",
"To best understand the agent framework, lets build an agent from scratch using LangChain Expression Language (LCEL).\n",
"We'll need to build the agent itself, define custom tools, and run the agent and tools in a custom loop. At the end we'll show how to use the standard LangChain `AgentExecutor` to make execution easier.\n",
"\n",
"Some important terminology (and schema) to know:\n",
"\n",
"1. `AgentAction`: This is a dataclass that represents the action an agent should take. It has a `tool` property (which is the name of the tool that should be invoked) and a `tool_input` property (the input to that tool)\n",
"2. `AgentFinish`: This is a dataclass that signifies that the agent has finished and should return to the user. It has a `return_values` parameter, which is a dictionary to return. It often only has one key - `output` - that is a string, and so often it is just this key that is returned.\n",
"3. `intermediate_steps`: These represent previous agent actions and corresponding outputs that are passed around. These are important to pass to future iteration so the agent knows what work it has already done. This is typed as a `List[Tuple[AgentAction, Any]]`. Note that observation is currently left as type `Any` to be maximally flexible. In practice, this is often a string.\n",
"\n",
"### Setup: LangSmith\n",
"\n",
"By definition, agents take a self-determined, input-dependent sequence of steps before returning a user-facing output. This makes debugging these systems particularly tricky, and observability particularly important. [LangSmith](/docs/langsmith) is especially useful for such cases.\n",
"\n",
"When building with LangChain, any built-in agent or custom agent built with LCEL will automatically be traced in LangSmith. And if we use the `AgentExecutor`, we'll get full tracing of not only the agent planning steps but also the tool inputs and outputs.\n",
"\n",
"To set up LangSmith we just need set the following environment variables:\n",
"\n",
"```bash\n",
"export LANGCHAIN_TRACING_V2=\"true\"\n",
"export LANGCHAIN_API_KEY=\"<your-api-key>\"\n",
"```\n",
"\n",
"### Define the agent\n",
"\n",
"We first need to create our agent.\n",
"This is the chain responsible for determining what action to take next.\n",
"\n",
"In this example, we will use OpenAI Function Calling to create this agent.\n",
"**This is generally the most reliable way to create agents.**\n",
"\n",
"For this guide, we will construct a custom agent that has access to a custom tool.\n",
"We are choosing this example because for most real world use cases you will NEED to customize either the agent or the tools. \n",
"We'll create a simple tool that computes the length of a word.\n",
"This is useful because it's actually something LLMs can mess up due to tokenization.\n",
"We will first create it WITHOUT memory, but we will then show how to add memory in.\n",
"Memory is needed to enable conversation.\n",
"\n",
"First, let's load the language model we're going to use to control the agent."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "89cf72b4-6046-4b47-8f27-5522d8cb8036",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)"
]
},
{
"cell_type": "markdown",
"id": "0afe32b4-5b67-49fd-9f05-e94c46fbcc08",
"metadata": {},
"source": [
"We can see that it struggles to count the letters in the string \"educa\"."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "d8eafbad-4084-4f27-b880-308430c44bcf",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='There are 6 letters in the word \"educa\".')"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm.invoke(\"how many letters in the word educa?\")"
]
},
{
"cell_type": "markdown",
"id": "20f353a1-7b03-4692-ba6c-581d82de454b",
"metadata": {},
"source": [
"Next, let's define some tools to use.\n",
"Let's write a really simple Python function to calculate the length of a word that is passed in."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "6bf6c6a6-4aa2-44fc-9d90-5981de827c2f",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import tool\n",
"\n",
"@tool\n",
"def get_word_length(word: str) -> int:\n",
" \"\"\"Returns the length of a word.\"\"\"\n",
" return len(word)\n",
"\n",
"\n",
"tools = [get_word_length]"
]
},
{
"cell_type": "markdown",
"id": "22dc3aeb-012f-4fe6-a980-2bd6d7612e1d",
"metadata": {},
"source": [
"Now let us create the prompt.\n",
"Because OpenAI Function Calling is finetuned for tool usage, we hardly need any instructions on how to reason, or how to output format.\n",
"We will just have two input variables: `input` and `agent_scratchpad`. `input` should be a string containing the user objective. `agent_scratchpad` should be a sequence of messages that contains the previous agent tool invocations and the corresponding tool outputs."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "62c98f77-d203-42cf-adcf-7da9ee93f7c8",
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"You are very powerful assistant, but bad at calculating lengths of words.\",\n",
" ),\n",
" (\"user\", \"{input}\"),\n",
" MessagesPlaceholder(variable_name=\"agent_scratchpad\"),\n",
" ]\n",
")"
]
},
{
"cell_type": "markdown",
"id": "be29b821-b988-4921-8a1f-f04ec87e2863",
"metadata": {},
"source": [
"How does the agent know what tools it can use?\n",
"In this case we're relying on OpenAI function calling LLMs, which take functions as a separate argument and have been specifically trained to know when to invoke those functions.\n",
"\n",
"To pass in our tools to the agent, we just need to format them to the OpenAI function format and pass them to our model. (By `bind`-ing the functions, we're making sure that they're passed in each time the model is invoked.)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "5231ffd7-a044-4ebd-8e31-d1fe334334c6",
"metadata": {},
"outputs": [],
"source": [
"from langchain.tools.render import format_tool_to_openai_function\n",
"\n",
"llm_with_tools = llm.bind(functions=[format_tool_to_openai_function(t) for t in tools])"
]
},
{
"cell_type": "markdown",
"id": "6efbf02b-8686-4559-8b4c-c2be803cb475",
"metadata": {},
"source": [
"Putting those pieces together, we can now create the agent.\n",
"We will import two last utility functions: a component for formatting intermediate steps (agent action, tool output pairs) to input messages that can be sent to the model, and a component for converting the output message into an agent action/agent finish."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "b2f24d11-1133-48f3-ba70-fc3dd1da5f2c",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents.format_scratchpad import format_to_openai_function_messages\n",
"from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser\n",
"\n",
"agent = (\n",
" {\n",
" \"input\": lambda x: x[\"input\"],\n",
" \"agent_scratchpad\": lambda x: format_to_openai_function_messages(\n",
" x[\"intermediate_steps\"]\n",
" ),\n",
" }\n",
" | prompt\n",
" | llm_with_tools\n",
" | OpenAIFunctionsAgentOutputParser()\n",
")"
]
},
{
"cell_type": "markdown",
"id": "7d55d2ad-6608-44ab-9949-b16ae8031f53",
"metadata": {},
"source": [
"Now that we have our agent, let's play around with it!\n",
"Let's pass in a simple question and empty intermediate steps and see what it returns:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "01cb7adc-97b6-4713-890e-5d1ddeba909c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AgentActionMessageLog(tool='get_word_length', tool_input={'word': 'educa'}, log=\"\\nInvoking: `get_word_length` with `{'word': 'educa'}`\\n\\n\\n\", message_log=[AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{\\n \"word\": \"educa\"\\n}', 'name': 'get_word_length'}})])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.invoke({\"input\": \"how many letters in the word educa?\", \"intermediate_steps\": []})"
]
},
{
"cell_type": "markdown",
"id": "689ec562-3ec1-4b28-928b-c78c788aa097",
"metadata": {},
"source": [
"We can see that it responds with an `AgentAction` to take (it's actually an `AgentActionMessageLog` - a subclass of `AgentAction` which also tracks the full message log). \n",
"\n",
"If we've set up LangSmith, we'll see a trace that let's us inspect the input and output to each step in the sequence: https://smith.langchain.com/public/04110122-01a8-413c-8cd0-b4df6eefa4b7/r\n",
"\n",
"### Define the runtime\n",
"\n",
"So this is just the first step - now we need to write a runtime for this.\n",
"The simplest one is just one that continuously loops, calling the agent, then taking the action, and repeating until an `AgentFinish` is returned.\n",
"Let's code that up below:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "29bbf63b-f866-4b8c-aeea-2f9cffe70b78",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"TOOL NAME: get_word_length\n",
"TOOL INPUT: {'word': 'educa'}\n",
"There are 5 letters in the word \"educa\".\n"
]
}
],
"source": [
"from langchain.schema.agent import AgentFinish\n",
"\n",
"user_input = \"how many letters in the word educa?\"\n",
"intermediate_steps = []\n",
"while True:\n",
" output = agent.invoke(\n",
" {\n",
" \"input\": user_input,\n",
" \"intermediate_steps\": intermediate_steps,\n",
" }\n",
" )\n",
" if isinstance(output, AgentFinish):\n",
" final_result = output.return_values[\"output\"]\n",
" break\n",
" else:\n",
" print(f\"TOOL NAME: {output.tool}\")\n",
" print(f\"TOOL INPUT: {output.tool_input}\")\n",
" tool = {\"get_word_length\": get_word_length}[output.tool]\n",
" observation = tool.run(output.tool_input)\n",
" intermediate_steps.append((output, observation))\n",
"print(final_result)"
]
},
{
"cell_type": "markdown",
"id": "2de8e688-fed4-4efc-a2bc-8d3c504dd764",
"metadata": {},
"source": [
"Woo! It's working.\n",
"\n",
"### Using AgentExecutor\n",
"\n",
"To simplify this a bit, we can import and use the `AgentExecutor` class.\n",
"This bundles up all of the above and adds in error handling, early stopping, tracing, and other quality-of-life improvements that reduce safeguards you need to write."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "9c94ee41-f146-403e-bd0a-5756a53d7842",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import AgentExecutor\n",
"\n",
"agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)"
]
},
{
"cell_type": "markdown",
"id": "9cbd94a2-b456-45e6-835c-a33be3475119",
"metadata": {},
"source": [
"Now let's test it out!"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "6e1e64c7-627c-4713-82ca-8f6db3d9c8f5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\n",
"Invoking: `get_word_length` with `{'word': 'educa'}`\n",
"\n",
"\n",
"\u001b[0m\u001b[36;1m\u001b[1;3m5\u001b[0m\u001b[32;1m\u001b[1;3mThere are 5 letters in the word \"educa\".\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'input': 'how many letters in the word educa?',\n",
" 'output': 'There are 5 letters in the word \"educa\".'}"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_executor.invoke({\"input\": \"how many letters in the word educa?\"})"
]
},
{
"cell_type": "markdown",
"id": "1578aede-2ad2-4c15-832e-3e0a1660b342",
"metadata": {},
"source": [
"And looking at the trace, we can see that all of our agent calls and tool invocations are automatically logged: https://smith.langchain.com/public/957b7e26-bef8-4b5b-9ca3-4b4f1c96d501/r"
]
},
{
"cell_type": "markdown",
"id": "a29c0705-b9bc-419f-aae4-974fc092faab",
"metadata": {},
"source": [
"### Adding memory\n",
"\n",
"This is great - we have an agent!\n",
"However, this agent is stateless - it doesn't remember anything about previous interactions.\n",
"This means you can't ask follow up questions easily.\n",
"Let's fix that by adding in memory.\n",
"\n",
"In order to do this, we need to do two things:\n",
"\n",
"1. Add a place for memory variables to go in the prompt\n",
"2. Keep track of the chat history\n",
"\n",
"First, let's add a place for memory in the prompt.\n",
"We do this by adding a placeholder for messages with the key `\"chat_history\"`.\n",
"Notice that we put this ABOVE the new user input (to follow the conversation flow)."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "ceef8c26-becc-4893-b55c-efcf52c4b9d9",
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import MessagesPlaceholder\n",
"\n",
"MEMORY_KEY = \"chat_history\"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"You are very powerful assistant, but bad at calculating lengths of words.\",\n",
" ),\n",
" MessagesPlaceholder(variable_name=MEMORY_KEY),\n",
" (\"user\", \"{input}\"),\n",
" MessagesPlaceholder(variable_name=\"agent_scratchpad\"),\n",
" ]\n",
")"
]
},
{
"cell_type": "markdown",
"id": "fc4f1e1b-695d-4b25-88aa-d46c015e6342",
"metadata": {},
"source": [
"We can then set up a list to track the chat history"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "935abfee-ab5d-4e9a-b33c-6a40a6fa4777",
"metadata": {},
"outputs": [],
"source": [
"from langchain.schema.messages import HumanMessage, AIMessage\n",
"\n",
"chat_history = []"
]
},
{
"cell_type": "markdown",
"id": "c107b5dd-b934-48a0-a8c5-3b5bd76f2b98",
"metadata": {},
"source": [
"We can then put it all together!"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "24b094ff-bbea-45c4-8000-ed2b5de459a9",
"metadata": {},
"outputs": [],
"source": [
"agent = (\n",
" {\n",
" \"input\": lambda x: x[\"input\"],\n",
" \"agent_scratchpad\": lambda x: format_to_openai_function_messages(\n",
" x[\"intermediate_steps\"]\n",
" ),\n",
" \"chat_history\": lambda x: x[\"chat_history\"],\n",
" }\n",
" | prompt\n",
" | llm_with_tools\n",
" | OpenAIFunctionsAgentOutputParser()\n",
")\n",
"agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)"
]
},
{
"cell_type": "markdown",
"id": "e34ee9bd-20be-4ab7-b384-a5f0335e7611",
"metadata": {},
"source": [
"When running, we now need to track the inputs and outputs as chat history\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "f238022b-3348-45cd-bd6a-c6770b7dc600",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\n",
"Invoking: `get_word_length` with `{'word': 'educa'}`\n",
"\n",
"\n",
"\u001b[0m\u001b[36;1m\u001b[1;3m5\u001b[0m\u001b[32;1m\u001b[1;3mThere are 5 letters in the word \"educa\".\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mNo, \"educa\" is not a real word in English.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'input': 'is that a real word?',\n",
" 'chat_history': [HumanMessage(content='how many letters in the word educa?'),\n",
" AIMessage(content='There are 5 letters in the word \"educa\".')],\n",
" 'output': 'No, \"educa\" is not a real word in English.'}"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"input1 = \"how many letters in the word educa?\"\n",
"result = agent_executor.invoke({\"input\": input1, \"chat_history\": chat_history})\n",
"chat_history.extend([\n",
" HumanMessage(content=input1),\n",
" AIMessage(content=result[\"output\"]),\n",
"])\n",
"agent_executor.invoke({\"input\": \"is that a real word?\", \"chat_history\": chat_history})"
]
},
{
"cell_type": "markdown",
"id": "6ba072cd-eb58-409d-83be-55c8110e37f0",
"metadata": {},
"source": [
"Here's the LangSmith trace: https://smith.langchain.com/public/1e1b7e07-3220-4a6c-8a1e-f04182a755b3/r"
]
},
{
"cell_type": "markdown",
"id": "9e8b9127-758b-4dab-b093-2e6357dca3e6",
"metadata": {},
"source": [
"## Next Steps\n",
"\n",
"Awesome! You've now run your first end-to-end agent.\n",
"To dive deeper, you can:\n",
"\n",
"- Check out all the different [agent types](/docs/modules/agents/agent_types/) supported\n",
"- Learn all the controls for [AgentExecutor](/docs/modules/agents/how_to/)\n",
"- Explore the how-to's of [tools](/docs/modules/agents/tools/) and all the [tool integrations](/docs/integrations/tools)\n",
"- See a full list of all the off-the-shelf [toolkits](/docs/integrations/toolkits/) we provide"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -1,313 +0,0 @@
---
sidebar_position: 4
sidebar_class_name: hidden
---
# Agents
The core idea of agents is to use an LLM to choose a sequence of actions to take.
In chains, a sequence of actions is hardcoded (in code).
In agents, a language model is used as a reasoning engine to determine which actions to take and in which order.
Some important terminology (and schema) to know:
1. `AgentAction`: This is a dataclass that represents the action an agent should take. It has a `tool` property (which is the name of the tool that should be invoked) and a `tool_input` property (the input to that tool)
2. `AgentFinish`: This is a dataclass that signifies that the agent has finished and should return to the user. It has a `return_values` parameter, which is a dictionary to return. It often only has one key - `output` - that is a string, and so often it is just this key that is returned.
3. `intermediate_steps`: These represent previous agent actions and corresponding outputs that are passed around. These are important to pass to future iterations so the agent knows what work it has already done. This is typed as a `List[Tuple[AgentAction, Any]]`. Note that the observation is currently left as type `Any` to be maximally flexible. In practice, this is often a string.
There are several key components here:
## Agent
This is the chain responsible for deciding what step to take next.
This is powered by a language model and a prompt.
The inputs to this chain are:
1. List of available tools
2. User input
3. Any previously executed steps (`intermediate_steps`)
This chain then returns either the next action to take or the final response to send to the user (`AgentAction` or `AgentFinish`).
Different agents have different prompting styles for reasoning, different ways of encoding input, and different ways of parsing the output.
For a full list of agent types see [agent types](/docs/modules/agents/agent_types/)
## Tools
Tools are functions that an agent calls.
There are two important considerations here:
1. Giving the agent access to the right tools
2. Describing the tools in a way that is most helpful to the agent
Without both, the agent you are trying to build will not work.
If you don't give the agent access to a correct set of tools, it will never be able to accomplish the objective.
If you don't describe the tools properly, the agent won't know how to properly use them.
LangChain provides a wide set of tools to get started, but also makes it easy to define your own (including custom descriptions).
For a full list of tools, see [here](/docs/modules/agents/tools/)
## Toolkits
Often the set of tools an agent has access to is more important than a single tool.
For this LangChain provides the concept of toolkits - groups of tools needed to accomplish specific objectives.
There are generally around 3-5 tools in a toolkit.
LangChain provides a wide set of toolkits to get started.
For a full list of toolkits, see [here](/docs/modules/agents/toolkits/)
## AgentExecutor
The agent executor is the runtime for an agent.
This is what actually calls the agent and executes the actions it chooses.
Pseudocode for this runtime is below:
```python
next_action = agent.get_action(...)
while next_action != AgentFinish:
observation = run(next_action)
next_action = agent.get_action(..., next_action, observation)
return next_action
```
While this may seem simple, there are several complexities this runtime handles for you, including:
1. Handling cases where the agent selects a non-existent tool
2. Handling cases where the tool errors
3. Handling cases where the agent produces output that cannot be parsed into a tool invocation
4. Logging and observability at all levels (agent decisions, tool calls) either to stdout or [LangSmith](https://smith.langchain.com).
## Other types of agent runtimes
The `AgentExecutor` class is the main agent runtime supported by LangChain.
However, there are other, more experimental runtimes we also support.
These include:
- [Plan-and-execute Agent](/docs/use_cases/more/agents/autonomous_agents/plan_and_execute)
- [Baby AGI](/docs/use_cases/more/agents/autonomous_agents/baby_agi)
- [Auto GPT](/docs/use_cases/more/agents/autonomous_agents/autogpt)
## Get started
This will go over how to get started building an agent.
We will create this agent from scratch, using LangChain Expression Language.
We will then define custom tools, and then run it in a custom loop (we will also show how to use the standard LangChain `AgentExecutor`).
### Set up the agent
We first need to create our agent.
This is the chain responsible for determining what action to take next.
In this example, we will use OpenAI Function Calling to create this agent.
This is generally the most reliable way to create agents.
In this example we will show what it is like to construct this agent from scratch, using LangChain Expression Language.
For this guide, we will construct a custom agent that has access to a custom tool.
We are choosing this example because we think for most use cases you will NEED to customize either the agent or the tools.
The tool we will give the agent is a tool to calculate the length of a word.
This is useful because this is actually something LLMs can mess up due to tokenization.
We will first create it WITHOUT memory, but we will then show how to add memory in.
Memory is needed to enable conversation.
First, let's load the language model we're going to use to control the agent.
```python
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(temperature=0)
```
Next, let's define some tools to use.
Let's write a really simple Python function to calculate the length of a word that is passed in.
```python
from langchain.agents import tool
@tool
def get_word_length(word: str) -> int:
"""Returns the length of a word."""
return len(word)
tools = [get_word_length]
```
Now let us create the prompt.
Because OpenAI Function Calling is finetuned for tool usage, we hardly need any instructions on how to reason or how to format output.
We will just have two input variables: `input` (for the user question) and `agent_scratchpad` (for any previous steps taken)
```python
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
prompt = ChatPromptTemplate.from_messages([
("system", "You are very powerful assistant, but bad at calculating lengths of words."),
("user", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
])
```
How does the agent know what tools it can use?
Those are passed in as a separate argument, so we can bind those as keyword arguments to the LLM.
```python
from langchain.tools.render import format_tool_to_openai_function
llm_with_tools = llm.bind(
    functions=[format_tool_to_openai_function(t) for t in tools]
)
```
Putting those pieces together, we can now create the agent.
We will import two last utility functions: a component for formatting intermediate steps to messages, and a component for converting the output message into an agent action/agent finish.
```python
from langchain.agents.format_scratchpad import format_to_openai_function_messages
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser
agent = {
"input": lambda x: x["input"],
"agent_scratchpad": lambda x: format_to_openai_function_messages(x['intermediate_steps'])
} | prompt | llm_with_tools | OpenAIFunctionsAgentOutputParser()
```
Now that we have our agent, let's play around with it!
Let's pass in a simple question and empty intermediate steps and see what it returns:
```python
agent.invoke({
"input": "how many letters in the word educa?",
"intermediate_steps": []
})
```
We can see that it responds with an `AgentAction` to take (it's actually an `AgentActionMessageLog` - a subclass of `AgentAction` which also tracks the full message log).
So this is just the first step - now we need to write a runtime for this.
The simplest one is just one that continuously loops, calling the agent, then taking the action, and repeating until an `AgentFinish` is returned.
Let's code that up below:
```python
from langchain.schema.agent import AgentFinish
intermediate_steps = []
while True:
    output = agent.invoke({
        "input": "how many letters in the word educa?",
        "intermediate_steps": intermediate_steps
    })
    if isinstance(output, AgentFinish):
        final_result = output.return_values["output"]
        break
    else:
        print(output.tool, output.tool_input)
        tool = {
            "get_word_length": get_word_length
        }[output.tool]
        observation = tool.run(output.tool_input)
        intermediate_steps.append((output, observation))
print(final_result)
```
We can see this prints out the following:
<CodeOutputBlock lang="python">
```
get_word_length {'word': 'educa'}
There are 5 letters in the word "educa".
```
</CodeOutputBlock>
Woo! It's working.
To simplify this a bit, we can import and use the `AgentExecutor` class.
This bundles up all of the above and adds in error handling, early stopping, tracing, and other quality-of-life improvements, so you don't have to write those safeguards yourself.
```python
from langchain.agents import AgentExecutor
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
```
Now let's test it out!
```python
agent_executor.invoke({"input": "how many letters in the word educa?"})
```
<CodeOutputBlock lang="python">
```
> Entering new AgentExecutor chain...
Invoking: `get_word_length` with `{'word': 'educa'}`
5
There are 5 letters in the word "educa".
> Finished chain.
'There are 5 letters in the word "educa".'
```
</CodeOutputBlock>
This is great - we have an agent!
However, this agent is stateless - it doesn't remember anything about previous interactions.
This means you can't ask follow-up questions easily.
Let's fix that by adding in memory.
In order to do this, we need to do two things:
1. Add a place for memory variables to go in the prompt
2. Keep track of the chat history
First, let's add a place for memory in the prompt.
We do this by adding a placeholder for messages with the key `"chat_history"`.
Notice that we put this ABOVE the new user input (to follow the conversation flow).
```python
from langchain.prompts import MessagesPlaceholder
MEMORY_KEY = "chat_history"
prompt = ChatPromptTemplate.from_messages([
("system", "You are very powerful assistant, but bad at calculating lengths of words."),
MessagesPlaceholder(variable_name=MEMORY_KEY),
("user", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
])
```
We can then set up a list to track the chat history
```python
from langchain.schema.messages import HumanMessage, AIMessage
chat_history = []
```
We can then put it all together!
```python
agent = {
"input": lambda x: x["input"],
"agent_scratchpad": lambda x: format_to_openai_function_messages(x['intermediate_steps']),
"chat_history": lambda x: x["chat_history"]
} | prompt | llm_with_tools | OpenAIFunctionsAgentOutputParser()
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
```
When running, we now need to track the inputs and outputs as chat history
```python
input1 = "how many letters in the word educa?"
result = agent_executor.invoke({"input": input1, "chat_history": chat_history})
chat_history.append(HumanMessage(content=input1))
chat_history.append(AIMessage(content=result['output']))
agent_executor.invoke({"input": "is that a real word?", "chat_history": chat_history})
```
## Next Steps
Awesome! You've now run your first end-to-end agent.
To dive deeper, you can:
- Check out all the different [agent types](/docs/modules/agents/agent_types/) supported
- Learn all the controls for [AgentExecutor](/docs/modules/agents/how_to/)
- See a full list of all the off-the-shelf [toolkits](/docs/modules/agents/toolkits/) we provide
- Explore all the individual [tools](/docs/modules/agents/tools/) supported


@@ -1,5 +1,9 @@
{
"redirects": [
{
"source": "/docs/modules/agents/toolkits(/?)",
"destination": "/docs/modules/agents/tools/toolkits"
},
{
"source": "/docs/modules/model_io/models(/?)",
"destination": "/docs/modules/model_io/"