Eugene Yurtsev 2024-10-11 14:14:34 -04:00
parent 8d9ef40118
commit 18c569f3e3
14 changed files with 302 additions and 404 deletions

View File

@ -0,0 +1,40 @@
# Agents
By themselves, language models can't take actions - they just output text.
A big use case for LangChain is creating **agents**.
Agents are systems that use an LLM as a reasoning engine to determine which actions to take and what the inputs to those actions should be.
The results of those actions can then be fed back into the agent, which then determines whether more actions are needed or whether it is okay to finish.
[LangGraph](https://github.com/langchain-ai/langgraph) is an extension of LangChain specifically aimed at creating highly controllable and customizable agents.
Please check out that documentation for a more in-depth overview of agent concepts.
There is a legacy `agent` concept in LangChain that we are moving towards deprecating: `AgentExecutor`.
AgentExecutor was essentially a runtime for agents.
It was a great place to get started; however, it was not flexible enough once you started building more customized agents.
To solve that, we built LangGraph to be this flexible, highly controllable runtime.
If you are still using AgentExecutor, do not fear: we still have a guide on [how to use AgentExecutor](/docs/how_to/agent_executor).
It is recommended, however, that you start to transition to LangGraph.
In order to assist in this, we have put together a [transition guide on how to do so](/docs/how_to/migrate_agent).
## ReAct agents
<span data-heading-keywords="react,react agent"></span>
One popular architecture for building agents is [**ReAct**](https://arxiv.org/abs/2210.03629).
ReAct combines reasoning and acting in an iterative process - in fact the name "ReAct" stands for "Reason" and "Act".
The general flow looks like this:
- The model will "think" about what step to take in response to an input and any previous observations.
- The model will then choose an action from available tools (or choose to respond to the user).
- The model will generate arguments to that tool.
- The agent runtime (executor) will parse out the chosen tool and call it with the generated arguments.
- The executor will return the results of the tool call back to the model as an observation.
- This process repeats until the agent chooses to respond.
There are general prompting-based implementations that do not require any model-specific features, but the most
reliable implementations use features like [tool calling](/docs/how_to/tool_calling/) to format outputs consistently
and reduce variance.
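As a rough illustration, here is a minimal sketch of a tool-calling, ReAct-style agent built with LangGraph's prebuilt helper; the `get_weather` tool and the model choice are assumptions for the example:
```python
# Minimal ReAct-style agent sketch (assumes `langgraph` and `langchain-openai`
# are installed and an OpenAI API key is configured).
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

@tool
def get_weather(city: str) -> str:
    """Return a (toy) weather report for a city."""
    return f"It is sunny in {city}."

model = ChatOpenAI(model="gpt-4o-mini")
agent = create_react_agent(model, [get_weather])

# The loop described above: the model reasons, calls a tool, observes the
# result, and repeats until it chooses to answer the user directly.
result = agent.invoke({"messages": [("user", "What's the weather in Paris?")]})
print(result["messages"][-1].content)
```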
Please see the [LangGraph documentation](https://langchain-ai.github.io/langgraph/) for more information,
or [this how-to guide](/docs/how_to/migrate_agent/) for specific information on migrating to LangGraph.

View File

@ -1 +1 @@
# Chat model
# Chat models

View File

@ -0,0 +1,15 @@
# Embedding models
<span data-heading-keywords="embedding,embeddings"></span>
Embedding models create a vector representation of a piece of text. You can think of a vector as an array of numbers that captures the semantic meaning of the text.
By representing the text in this way, you can perform mathematical operations that allow you to do things like search for other pieces of text that are most similar in meaning.
These natural language search capabilities underpin many types of [context retrieval](/docs/concepts/#retrieval),
where we provide an LLM with the relevant data it needs to effectively respond to a query.
![](/img/embeddings.png)
The `Embeddings` class is designed for interfacing with text embedding models. There are many different embedding model providers (OpenAI, Cohere, Hugging Face, etc.) and local models, and this class provides a standard interface for all of them.
The base Embeddings class in LangChain provides two methods: one for embedding documents and one for embedding a query. The former takes as input multiple texts, while the latter takes a single text. The reason for having these as two separate methods is that some embedding providers have different embedding methods for documents (to be searched over) vs queries (the search query itself).
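For example, a minimal sketch of the two methods, assuming the `langchain-openai` package and a configured API key:
```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Embed several documents (the texts to be searched over) at once...
doc_vectors = embeddings.embed_documents(["hello world", "goodbye world"])
# ...and embed a single search query.
query_vector = embeddings.embed_query("a friendly greeting")

print(len(doc_vectors))   # 2: one vector per document
print(len(query_vector))  # dimensionality of a single vector
```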
For specifics on how to use embedding models, see the [relevant how-to guides here](/docs/how_to/#embedding-models).

View File

@ -65,77 +65,20 @@ A developer platform that lets you debug, test, evaluate, and monitor LLM applic
style={{ width: "100%" }}
/>
## LangChain Expression Language (LCEL)
<span data-heading-keywords="lcel"></span>
`LangChain Expression Language`, or `LCEL`, is a declarative way to chain LangChain components.
LCEL was designed from day 1 to **support putting prototypes in production, with no code changes**, from the simplest “prompt + LLM” chain to the most complex chains (we've seen folks successfully run LCEL chains with hundreds of steps in production). To highlight a few of the reasons you might want to use LCEL:
- **First-class streaming support:**
When you build your chains with LCEL you get the best possible time-to-first-token (time elapsed until the first chunk of output comes out). For some chains this means, e.g., that we stream tokens straight from an LLM to a streaming output parser, and you get back parsed, incremental chunks of output at the same rate as the LLM provider outputs the raw tokens.
- **Async support:**
Any chain built with LCEL can be called both with the synchronous API (e.g. in your Jupyter notebook while prototyping) as well as with the asynchronous API (e.g. in a [LangServe](/docs/langserve/) server). This enables using the same code for prototypes and in production, with great performance, and the ability to handle many concurrent requests in the same server.
- **Optimized parallel execution:**
Whenever your LCEL chains have steps that can be executed in parallel (e.g. if you fetch documents from multiple retrievers) we automatically do it, both in the sync and the async interfaces, for the smallest possible latency.
- **Retries and fallbacks:**
Configure retries and fallbacks for any part of your LCEL chain. This is a great way to make your chains more reliable at scale. We're currently working on adding streaming support for retries/fallbacks, so you can get the added reliability without any latency cost.
- **Access intermediate results:**
For more complex chains it's often very useful to access the results of intermediate steps even before the final output is produced. This can be used to let end-users know something is happening, or even just to debug your chain. You can stream intermediate results, and this is available on every [LangServe](/docs/langserve) server.
- **Input and output schemas:**
Input and output schemas give every LCEL chain Pydantic and JSONSchema schemas inferred from the structure of your chain. This can be used for validation of inputs and outputs, and is an integral part of LangServe.
- [**Seamless LangSmith tracing**](https://docs.smith.langchain.com)
As your chains get more and more complex, it becomes increasingly important to understand what exactly is happening at every step.
With LCEL, **all** steps are automatically logged to [LangSmith](https://docs.smith.langchain.com/) for maximum observability and debuggability.
LCEL aims to provide consistency around behavior and customization over legacy subclassed chains such as `LLMChain` and
`ConversationalRetrievalChain`. Many of these legacy chains hide important details like prompts, and as a wider variety
of viable models emerge, customization has become more and more important.
If you are currently using one of these legacy chains, please see [this guide for guidance on how to migrate](/docs/versions/migrating_chains).
For guides on how to do specific tasks with LCEL, check out [the relevant how-to guides](/docs/how_to/#langchain-expression-language-lcel).
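For a flavor of what LCEL looks like, here is a minimal sketch of a "prompt + LLM + parser" chain; the model name and the `langchain-openai` package are assumptions for the example:
```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Composing runnables with `|` produces a new runnable with the same interface.
chain = (
    ChatPromptTemplate.from_template("Tell me a joke about {topic}")
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)

print(chain.invoke({"topic": "cats"}))    # synchronous call
# await chain.ainvoke({"topic": "cats"})  # async counterpart, inside async code
```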
### Runnable interface
## Runnable interface
<span data-heading-keywords="invoke,runnable"></span>
To make it as easy as possible to create custom chains, we've implemented a ["Runnable"](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable) protocol. Many LangChain components implement the `Runnable` protocol, including chat models, LLMs, output parsers, retrievers, prompt templates, and more. There are also several useful primitives for working with runnables, which you can read about below.
* Conceptual Guide: [About the Runnable interface](/docs/concepts/runnables)
* How-to Guides: [How to use the Runnable interface](/docs/how_to/#langchain-expression-language-lcel)
The Runnable interface is a standard interface for defining and invoking LangChain components, which makes it easy to define custom chains as well as invoke them in a standard way.
The standard interface includes:
- `stream`: stream back chunks of the response
- `invoke`: call the chain on an input
- `batch`: call the chain on a list of inputs
These also have corresponding async methods that should be used with [asyncio](https://docs.python.org/3/library/asyncio.html) `await` syntax for concurrency:
- `astream`: stream back chunks of the response async
- `ainvoke`: call the chain on an input async
- `abatch`: call the chain on a list of inputs async
- `astream_log`: stream back intermediate steps as they happen, in addition to the final response
- `astream_events`: **beta** stream events as they happen in the chain (introduced in `langchain-core` 0.1.14)
The **input type** and **output type** vary by component:
| Component | Input Type | Output Type |
|--------------|-------------------------------------------------------|-----------------------|
| Prompt | Dictionary | PromptValue |
| ChatModel | Single string, list of chat messages or a PromptValue | ChatMessage |
| LLM | Single string, list of chat messages or a PromptValue | String |
| OutputParser | The output of an LLM or ChatModel | Depends on the parser |
| Retriever | Single string | List of Documents |
| Tool | Single string or dictionary, depending on the tool | Depends on the tool |
All runnables expose input and output **schemas** to inspect the inputs and outputs:
- `input_schema`: an input Pydantic model auto-generated from the structure of the Runnable
- `output_schema`: an output Pydantic model auto-generated from the structure of the Runnable
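As a sketch, the same methods and schemas are available on any runnable; here we use a simple prompt template so the example needs no API key:
```python
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template("Tell me a joke about {topic}")

print(prompt.invoke({"topic": "cats"}))                      # single input
print(prompt.batch([{"topic": "cats"}, {"topic": "dogs"}]))  # list of inputs
for chunk in prompt.stream({"topic": "cats"}):               # iterator of chunks
    print(chunk)  # non-streaming components yield a single final chunk

# Auto-generated schemas for inputs and outputs (Pydantic v2 API assumed).
print(prompt.input_schema.model_json_schema())
print(prompt.output_schema.model_json_schema())
```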
## LangChain Expression Language (LCEL)
<span data-heading-keywords="lcel"></span>
* Conceptual Guide: [About LCEL](/docs/concepts/lcel)
* How-to Guides: [How to use the Runnable interface](/docs/how_to/#langchain-expression-language-lcel)
## Components
@ -143,51 +86,16 @@ LangChain provides standard, extendable interfaces and external integrations for
Some components LangChain implements, some components we rely on third-party integrations for, and others are a mix.
### Chat models
<span data-heading-keywords="chat model,chat models"></span>
Language models that use a sequence of messages as inputs and return chat messages as outputs (as opposed to using plain text).
These are traditionally newer models (older models are generally `LLMs`, see below).
Chat models support the assignment of distinct roles to conversation messages, helping to distinguish messages from the AI, users, and instructions such as system messages.
Although the underlying models are messages in, message out, the LangChain wrappers also allow these models to take a string as input. This means you can easily use chat models in place of LLMs.
When a string is passed in as input, it is converted to a `HumanMessage` and then passed to the underlying model.
LangChain does not host any Chat Models; rather, we rely on third-party integrations.
We have some standardized parameters when constructing ChatModels:
- `model`: the name of the model
- `temperature`: the sampling temperature
- `timeout`: request timeout
- `max_tokens`: max tokens to generate
- `stop`: default stop sequences
- `max_retries`: max number of times to retry requests
- `api_key`: API key for the model provider
- `base_url`: endpoint to send requests to
Some important things to note:
- Standard params only apply to model providers that expose parameters with the intended functionality. For example, some providers do not expose a configuration for maximum output tokens, so `max_tokens` can't be supported on these.
- Standard params are currently only enforced on integrations that have their own integration packages (e.g. `langchain-openai`, `langchain-anthropic`, etc.); they're not enforced on models in `langchain-community`.
ChatModels also accept other parameters that are specific to that integration. To find all the parameters supported by a ChatModel head to the API reference for that model.
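As a sketch, with `langchain-openai` as an assumed example integration, the standard parameters look like this (values are illustrative):
```python
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="gpt-4o-mini",  # the name of the model
    temperature=0.2,      # the sampling temperature
    timeout=30,           # request timeout, in seconds
    max_tokens=256,       # max tokens to generate
    stop=["\n\n"],        # default stop sequences
    max_retries=2,        # max number of times to retry requests
)
```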
:::important
Some chat models have been fine-tuned for **tool calling** and provide a dedicated API for it.
Generally, such models are better at tool calling than non-fine-tuned models, and are recommended for use cases that require tool calling.
Please see the [tool calling section](/docs/concepts/#functiontool-calling) for more information.
:::
For specifics on how to use chat models, see the [relevant how-to guides here](/docs/how_to/#chat-models).
* Conceptual Guide: [About Chat Models](/docs/concepts/chat_models)
* Integrations: [LangChain Chat Model Integrations](/docs/integrations/chat/)
* How-to Guides: [How to use Chat Models](/docs/how_to/#chat-models)
#### Multimodality
Some chat models are multimodal, accepting images, audio and even video as inputs. These are still less common, meaning model providers haven't standardized on the "best" way to define the API. Multimodal **outputs** are even less common. As such, we've kept our multimodal abstractions fairly lightweight and plan to further solidify the multimodal APIs and interaction patterns as the field matures.
In LangChain, most chat models that support multimodal inputs also accept those values in OpenAI's content blocks format. So far this is restricted to image inputs. For models like Gemini which support video and other bytes input, the APIs also support the native, model-specific representations.
For specifics on how to use multimodal models, see the [relevant how-to guides here](/docs/how_to/#multimodal).
For a full list of LangChain model providers with multimodal models, [check out this table](/docs/integrations/chat/#advanced-features).
* Conceptual Guide: [About Multimodal Chat Models](/docs/concepts/multimodality)
### LLMs
<span data-heading-keywords="llm,llms"></span>
@ -199,159 +107,33 @@ even for non-chat use cases.
You are probably looking for [the section above instead](/docs/concepts/#chat-models).
:::
Language models that take a string as input and return a string.
These are traditionally older models (newer models generally are [Chat Models](/docs/concepts/#chat-models), see above).
Although the underlying models are string in, string out, the LangChain wrappers also allow these models to take messages as input.
This gives them the same interface as [Chat Models](/docs/concepts/#chat-models).
When messages are passed in as input, they will be formatted into a string under the hood before being passed to the underlying model.
LangChain does not host any LLMs; rather, we rely on third-party integrations.
For specifics on how to use LLMs, see the [how-to guides](/docs/how_to/#llms).
* Conceptual Guide: [About Language Models](/docs/concepts/llms)
* Integrations: [LangChain LLM Integrations](/docs/integrations/llms/)
* How-to Guides: [How to use LLMs](/docs/how_to/#llms)
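A minimal usage sketch, assuming the `langchain-openai` package and a completions-style model:
```python
from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct")
text = llm.invoke("Write a haiku about the sea.")
print(text)  # a plain string, not a message object
```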
<a id="aimessage"></a>
<a id="systemmessage"></a>
<a id="humanmessage"></a>
<a id="toolmessage"></a>
<a id="legacy-functionmessage"></a>
### Messages
Some language models take a list of messages as input and return a message.
There are a few different types of messages.
All messages have a `role`, `content`, and `response_metadata` property.
The `role` describes WHO is saying the message. The standard roles are "user", "assistant", "system", and "tool".
LangChain has different message classes for different roles.
The `content` property describes the content of the message.
This can be a few different things:
- A string (most models deal with this type of content)
- A list of dictionaries (this is used for multimodal input, where each dictionary contains information about the input type and the input location)
Optionally, messages can have a `name` property which allows for differentiating between multiple speakers with the same role.
For example, if there are two users in the chat history it can be useful to differentiate between them. Not all models support this.
#### HumanMessage
This represents a message with role "user".
#### AIMessage
This represents a message with role "assistant". In addition to the `content` property, these messages also have:
**`response_metadata`**
The `response_metadata` property contains additional metadata about the response. The data here is often specific to each model provider.
This is where information like log-probs and token usage may be stored.
**`tool_calls`**
These represent a decision from a language model to call a tool. They are included as part of an `AIMessage` output.
They can be accessed from there with the `.tool_calls` property.
This property returns a list of `ToolCall`s. A `ToolCall` is a dictionary with the following arguments:
- `name`: The name of the tool that should be called.
- `args`: The arguments to that tool.
- `id`: The id of that tool call.
#### SystemMessage
This represents a message with role "system", which tells the model how to behave. Not every model provider supports this.
#### ToolMessage
This represents a message with role "tool", which contains the result of calling a tool. In addition to `role` and `content`, this message has:
- a `tool_call_id` field which conveys the id of the call to the tool that was called to produce this result.
- an `artifact` field which can be used to pass along arbitrary artifacts of the tool execution which are useful to track but which should not be sent to the model.
With most chat models, a `ToolMessage` can only appear in the chat history after an `AIMessage` that has a populated `tool_calls` field.
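Putting these message types together, a short tool-calling exchange might look like the following sketch; the `calculator` tool name and the call id are hypothetical:
```python
from langchain_core.messages import (
    AIMessage,
    HumanMessage,
    SystemMessage,
    ToolMessage,
)

messages = [
    SystemMessage(content="You are a terse assistant."),
    # The optional `name` distinguishes multiple speakers with the same role.
    HumanMessage(content="What is 2 + 2?", name="alice"),
    # An AIMessage whose `tool_calls` records a decision to call a tool.
    AIMessage(
        content="",
        tool_calls=[{"name": "calculator", "args": {"expression": "2 + 2"}, "id": "call_1"}],
    ),
    # The ToolMessage carries the result back, keyed by `tool_call_id`.
    ToolMessage(content="4", tool_call_id="call_1"),
]
```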
#### (Legacy) FunctionMessage
This is a legacy message type, corresponding to OpenAI's legacy function-calling API. `ToolMessage` should be used instead to correspond to the updated tool-calling API.
This represents the result of a function call. In addition to `role` and `content`, this message has a `name` parameter which conveys the name of the function that was called to produce this result.
* Conceptual Guide: [About Messages](/docs/concepts/messages)
* How-to Guides: [How to use Messages](/docs/how_to/#messages)
### Prompt templates
<span data-heading-keywords="prompt,prompttemplate,chatprompttemplate"></span>
Prompt templates help to translate user input and parameters into instructions for a language model.
This can be used to guide a model's response, helping it understand the context and generate relevant and coherent language-based output.
Prompt Templates take as input a dictionary, where each key represents a variable in the prompt template to fill in.
Prompt Templates output a PromptValue. This PromptValue can be passed to an LLM or a ChatModel, and can also be cast to a string or a list of messages.
The reason this PromptValue exists is to make it easy to switch between strings and messages.
There are a few different types of prompt templates:
* Conceptual Guide: [About Prompt Templates](/docs/concepts/prompts)
* How-to Guides: [How to use Prompt Templates](/docs/how_to/#prompt-templates)
#### String PromptTemplates
These prompt templates are used to format a single string, and generally are used for simpler inputs.
For example, a common way to construct and use a PromptTemplate is as follows:
```python
from langchain_core.prompts import PromptTemplate
prompt_template = PromptTemplate.from_template("Tell me a joke about {topic}")
prompt_template.invoke({"topic": "cats"})
```
#### ChatPromptTemplates
These prompt templates are used to format a list of messages. These "templates" consist of a list of templates themselves.
For example, a common way to construct and use a ChatPromptTemplate is as follows:
```python
from langchain_core.prompts import ChatPromptTemplate
prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    ("user", "Tell me a joke about {topic}")
])
prompt_template.invoke({"topic": "cats"})
```
In the above example, this ChatPromptTemplate will construct two messages when called.
The first is a system message that has no variables to format.
The second is a HumanMessage, and will be formatted by the `topic` variable the user passes in.
#### MessagesPlaceholder
<span data-heading-keywords="messagesplaceholder"></span>
This prompt template is responsible for adding a list of messages in a particular place.
In the above ChatPromptTemplate, we saw how we could format two messages, each one a string.
But what if we wanted the user to pass in a list of messages that we would slot into a particular spot?
This is how you use MessagesPlaceholder.
```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage
prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    MessagesPlaceholder("msgs")
])
prompt_template.invoke({"msgs": [HumanMessage(content="hi!")]})
```
This will produce a list of two messages, the first one being a system message, and the second one being the HumanMessage we passed in.
If we had passed in 5 messages, then it would have produced 6 messages in total (the system message plus the 5 passed in).
This is useful for letting a list of messages be slotted into a particular spot.
An alternative way to accomplish the same thing without using the `MessagesPlaceholder` class explicitly is:
```python
prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    ("placeholder", "{msgs}") # <-- This is the changed part
])
```
For specifics on how to use prompt templates, see the [relevant how-to guides here](/docs/how_to/#prompt-templates).
### Example selectors
One common prompting technique for achieving better performance is to include examples as part of the prompt.
@ -367,41 +149,15 @@ For specifics on how to use example selectors, see the [relevant how-to guides h
:::note
The information here refers to parsers that take text output from a model and try to parse it into a more structured representation.
More and more models support function (or tool) calling, which handles this automatically; output parsers predate chat models that are capable of calling tools.
These days, it is recommended to use function/tool calling rather than output parsing, as it's simpler while providing better quality results.
See documentation for that [here](/docs/concepts/#function-tool-calling).
:::
An `Output parser` is responsible for taking the output of a model and transforming it into a format more suitable for downstream tasks.
Output parsers are useful when you are using LLMs to generate structured data, or to normalize output from chat models and LLMs.
* Conceptual Guide: [About Output Parsers](/docs/concepts/output_parsers)
* How-to Guides: [How to use Output Parsers](/docs/how_to/#output-parsers)
LangChain has lots of different types of output parsers. The table below lists the output parsers LangChain supports, along with the following information:
- **Name**: The name of the output parser
- **Supports Streaming**: Whether the output parser supports streaming.
- **Has Format Instructions**: Whether the output parser has format instructions. This is generally available except when (a) the desired schema is not specified in the prompt but rather in other parameters (like OpenAI function calling), or (b) when the OutputParser wraps another OutputParser.
- **Calls LLM**: Whether this output parser itself calls an LLM. This is usually only done by output parsers that attempt to correct misformatted output.
- **Input Type**: Expected input type. Most output parsers work on both strings and messages, but some (like OpenAI Functions) need a message with specific kwargs.
- **Output Type**: The output type of the object returned by the parser.
- **Description**: Our commentary on this output parser and when to use it.
| Name | Supports Streaming | Has Format Instructions | Calls LLM | Input Type | Output Type | Description |
|-----------------|--------------------|-------------------------------|-----------|----------------------------------|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [JSON](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.json.JsonOutputParser.html#langchain_core.output_parsers.json.JsonOutputParser) | ✅ | ✅ | | `str` \| `Message` | JSON object | Returns a JSON object as specified. You can specify a Pydantic model and it will return JSON for that model. Probably the most reliable output parser for getting structured data that does NOT use function calling. |
| [XML](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.xml.XMLOutputParser.html#langchain_core.output_parsers.xml.XMLOutputParser) | ✅ | ✅ | | `str` \| `Message` | `dict` | Returns a dictionary of tags. Use when XML output is needed. Use with models that are good at writing XML (like Anthropic's). |
| [CSV](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.list.CommaSeparatedListOutputParser.html#langchain_core.output_parsers.list.CommaSeparatedListOutputParser) | ✅ | ✅ | | `str` \| `Message` | `List[str]` | Returns a list of comma separated values. |
| [OutputFixing](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.fix.OutputFixingParser.html#langchain.output_parsers.fix.OutputFixingParser) | | | ✅ | `str` \| `Message` | | Wraps another output parser. If that output parser errors, then this will pass the error message and the bad output to an LLM and ask it to fix the output. |
| [RetryWithError](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.retry.RetryWithErrorOutputParser.html#langchain.output_parsers.retry.RetryWithErrorOutputParser) | | | ✅ | `str` \| `Message` | | Wraps another output parser. If that output parser errors, then this will pass the original inputs, the bad output, and the error message to an LLM and ask it to fix it. Compared to OutputFixingParser, this one also sends the original instructions. |
| [Pydantic](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.pydantic.PydanticOutputParser.html#langchain_core.output_parsers.pydantic.PydanticOutputParser) | | ✅ | | `str` \| `Message` | `pydantic.BaseModel` | Takes a user defined Pydantic model and returns data in that format. |
| [YAML](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.yaml.YamlOutputParser.html#langchain.output_parsers.yaml.YamlOutputParser) | | ✅ | | `str` \| `Message` | `pydantic.BaseModel` | Takes a user defined Pydantic model and returns data in that format. Uses YAML to encode it. |
| [PandasDataFrame](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.pandas_dataframe.PandasDataFrameOutputParser.html#langchain.output_parsers.pandas_dataframe.PandasDataFrameOutputParser) | | ✅ | | `str` \| `Message` | `dict` | Useful for doing operations with pandas DataFrames. |
| [Enum](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.enum.EnumOutputParser.html#langchain.output_parsers.enum.EnumOutputParser) | | ✅ | | `str` \| `Message` | `Enum` | Parses response into one of the provided enum values. |
| [Datetime](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.datetime.DatetimeOutputParser.html#langchain.output_parsers.datetime.DatetimeOutputParser) | | ✅ | | `str` \| `Message` | `datetime.datetime` | Parses response into a `datetime.datetime` object. |
| [Structured](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.structured.StructuredOutputParser.html#langchain.output_parsers.structured.StructuredOutputParser) | | ✅ | | `str` \| `Message` | `Dict[str, str]` | An output parser that returns structured information. It is less powerful than other output parsers since it only allows for fields to be strings. This can be useful when you are working with smaller LLMs. |
For specifics on how to use output parsers, see the [relevant how-to guides here](/docs/how_to/#output-parsers).
### Chat history
Most LLM applications have a conversational interface.
@ -428,82 +184,40 @@ These classes load Document objects. LangChain has hundreds of integrations with
Each DocumentLoader has its own specific parameters, but they can all be invoked in the same way with the `.load` method.
An example use case is as follows:
```python
from langchain_community.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(
    ...  # <-- Integration specific parameters here
)
data = loader.load()
```
For specifics on how to use document loaders, see the [relevant how-to guides here](/docs/how_to/#document-loaders).
### Output parsers
<span data-heading-keywords="output parser"></span>
:::note
The information here refers to parsers that take text output from a model and try to parse it into a more structured representation.
More and more models are supporting function (or tool) calling, which handles this automatically.
It is recommended to use function/tool calling rather than output parsing.
See documentation for that [here](/docs/concepts/#function-tool-calling).
:::
* Conceptual Guide: [About Output Parsers](/docs/concepts/output_parsers)
* How-to Guides: [How to use Output Parsers](/docs/how_to/#output-parsers)
### Text splitters
Once you've loaded documents, you'll often want to transform them to better suit your application. The simplest example is that you may want to split a long document into smaller chunks that can fit into your model's context window. LangChain has a number of built-in document transformers that make it easy to split, combine, filter, and otherwise manipulate documents.
When you want to deal with long pieces of text, it is necessary to split up that text into chunks. As simple as this sounds, there is a lot of potential complexity here. Ideally, you want to keep the semantically related pieces of text together. What "semantically related" means could depend on the type of text. There are several ways to do this.
At a high level, text splitters work as follows:
1. Split the text up into small, semantically meaningful chunks (often sentences).
2. Start combining these small chunks into a larger chunk until you reach a certain size (as measured by some function).
3. Once you reach that size, make that chunk its own piece of text and then start creating a new chunk of text with some overlap (to keep context between chunks).
That means there are two different axes along which you can customize your text splitter:
1. How the text is split
2. How the chunk size is measured
For specifics on how to use text splitters, see the [relevant how-to guides here](/docs/how_to/#text-splitters).
* Conceptual Guide: [About Text Splitters](/docs/concepts/text_splitters)
### Embedding models
<span data-heading-keywords="embedding,embeddings"></span>
Embedding models create a vector representation of a piece of text. You can think of a vector as an array of numbers that captures the semantic meaning of the text.
By representing the text in this way, you can perform mathematical operations that allow you to do things like search for other pieces of text that are most similar in meaning.
These natural language search capabilities underpin many types of [context retrieval](/docs/concepts/#retrieval),
where we provide an LLM with the relevant data it needs to effectively respond to a query.
![](/img/embeddings.png)
The `Embeddings` class is designed for interfacing with text embedding models. There are many different embedding model providers (OpenAI, Cohere, Hugging Face, etc.) and local models, and this class provides a standard interface for all of them.
The base Embeddings class in LangChain provides two methods: one for embedding documents and one for embedding a query. The former takes as input multiple texts, while the latter takes a single text. The reason for having these as two separate methods is that some embedding providers have different embedding methods for documents (to be searched over) vs queries (the search query itself).
For specifics on how to use embedding models, see the [relevant how-to guides here](/docs/how_to/#embedding-models).
* Conceptual Guide: [About Embedding Models](/docs/concepts/embedding_models)
* How-to Guides: [How to use Embedding Models](/docs/how_to/#embedding-models)
### Vector stores
<span data-heading-keywords="vector,vectorstore,vectorstores,vector store,vector stores"></span>
One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors,
and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query.
A vector store takes care of storing embedded data and performing vector search for you.
Most vector stores can also store metadata about embedded vectors and support filtering on that metadata before
similarity search, allowing you more control over returned documents.
Vector stores can be converted to the retriever interface by doing:
```python
vectorstore = MyVectorStore()
retriever = vectorstore.as_retriever()
```
For specifics on how to use vector stores, see the [relevant how-to guides here](/docs/how_to/#vector-stores).
### Retrievers
<span data-heading-keywords="retriever,retrievers"></span>
A retriever is an interface that returns documents given an unstructured query.
It is more general than a vector store.
A retriever does not need to be able to store documents, only to return (or retrieve) them.
Retrievers can be created from vector stores, but are also broad enough to include [Wikipedia search](/docs/integrations/retrievers/wikipedia/) and [Amazon Kendra](/docs/integrations/retrievers/amazon_kendra_retriever/).
Retrievers accept a string query as input and return a list of Documents as output.
For specifics on how to use retrievers, see the [relevant how-to guides here](/docs/how_to/#retrievers).
* Conceptual Guide: [About Retrievers](/docs/concepts/retrievers)
* How-to Guides: [How to use Retrievers](/docs/how_to/#retrievers)
### Vector stores
<span data-heading-keywords="vector,vectorstore,vectorstores,vector store,vector stores"></span>
* Conceptual Guide: [About Vector Stores](/docs/concepts/vectorstores)
* How-to Guides: [How to use Vector Stores](/docs/how_to/#vector-stores)
### Key-value stores
@ -645,44 +359,6 @@ tools = toolkit.get_tools()
### Agents
By themselves, language models can't take actions - they just output text.
A big use case for LangChain is creating **agents**.
Agents are systems that use an LLM as a reasoning engine to determine which actions to take and what the inputs to those actions should be.
The results of those actions can then be fed back into the agent, which then determines whether more actions are needed or whether it is okay to finish.
[LangGraph](https://github.com/langchain-ai/langgraph) is an extension of LangChain specifically aimed at creating highly controllable and customizable agents.
Please check out that documentation for a more in-depth overview of agent concepts.
There is a legacy `agent` concept in LangChain that we are moving towards deprecating: `AgentExecutor`.
AgentExecutor was essentially a runtime for agents.
It was a great place to get started; however, it was not flexible enough once you started building more customized agents.
To solve that, we built LangGraph to be this flexible, highly controllable runtime.
If you are still using AgentExecutor, do not fear: we still have a guide on [how to use AgentExecutor](/docs/how_to/agent_executor).
It is recommended, however, that you start to transition to LangGraph.
In order to assist in this, we have put together a [transition guide on how to do so](/docs/how_to/migrate_agent).
#### ReAct agents
<span data-heading-keywords="react,react agent"></span>
One popular architecture for building agents is [**ReAct**](https://arxiv.org/abs/2210.03629).
ReAct combines reasoning and acting in an iterative process - in fact the name "ReAct" stands for "Reason" and "Act".
The general flow looks like this:
- The model will "think" about what step to take in response to an input and any previous observations.
- The model will then choose an action from available tools (or choose to respond to the user).
- The model will generate arguments to that tool.
- The agent runtime (executor) will parse out the chosen tool and call it with the generated arguments.
- The executor will return the results of the tool call back to the model as an observation.
- This process repeats until the agent chooses to respond.
There are general prompting-based implementations that do not require any model-specific features, but the most
reliable implementations use features like [tool calling](/docs/how_to/tool_calling/) to format outputs consistently
and reduce variance.
Please see the [LangGraph documentation](https://langchain-ai.github.io/langgraph/) for more information,
or [this how-to guide](/docs/how_to/migrate_agent/) for specific information on migrating to LangGraph.
### Callbacks
@ -759,39 +435,11 @@ For specifics on how to use callbacks, see the [relevant how-to guides here](/do
### Streaming
<span data-heading-keywords="stream,streaming"></span>
Individual LLM calls often run for much longer than traditional resource requests.
This compounds when you build more complex chains or agents that require multiple reasoning steps.
Fortunately, LLMs generate output iteratively, which means it's possible to show sensible intermediate results
before the final response is ready. Consuming output as soon as it becomes available has therefore become a vital part of the UX
around building apps with LLMs to help alleviate latency issues, and LangChain aims to have first-class support for streaming.
Below, we'll discuss some concepts and considerations around streaming in LangChain.
Conceptual Guide: [Streaming](/docs/concepts/streaming)
#### `.stream()` and `.astream()`
Most modules in LangChain include the `.stream()` method (and the equivalent `.astream()` method for [async](https://docs.python.org/3/library/asyncio.html) environments) as an ergonomic streaming interface.
`.stream()` returns an iterator, which you can consume with a simple `for` loop. Here's an example with a chat model:
```python
from langchain_anthropic import ChatAnthropic
model = ChatAnthropic(model="claude-3-sonnet-20240229")
for chunk in model.stream("what color is the sky?"):
    print(chunk.content, end="|", flush=True)
```
For models (or other components) that don't support streaming natively, this iterator would just yield a single chunk, but
you could still use the same general pattern when calling them. Using `.stream()` will also automatically call the model in streaming mode
without the need to provide additional config.
The type of each outputted chunk depends on the type of component - for example, chat models yield [`AIMessageChunks`](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.ai.AIMessageChunk.html).
Because this method is part of [LangChain Expression Language](/docs/concepts/#langchain-expression-language-lcel),
you can handle formatting differences from different outputs using an [output parser](/docs/concepts/#output-parsers) to transform
each yielded chunk.
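For example, a sketch that pipes the streamed chunks through a string output parser, reusing the `model` from the snippet above:
```python
from langchain_core.output_parsers import StrOutputParser

chain = model | StrOutputParser()
for chunk in chain.stream("what color is the sky?"):
    print(chunk, end="|", flush=True)  # plain string chunks instead of AIMessageChunks
```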
You can check out [this guide](/docs/how_to/streaming/#using-stream) for more detail on how to use `.stream()`.
Conceptual Guide: [Streaming](/docs/concepts/streaming#stream)
#### `.astream_events()`
<span data-heading-keywords="astream_events,stream_events,stream events"></span>

View File

@ -0,0 +1 @@
# LCEL

View File

@ -0,0 +1 @@
# LLMs

View File

@ -0,0 +1,12 @@
# Messages
## HumanMessage
## AIMessage
## SystemMessage
## ToolMessage
## (Legacy) FunctionMessage

View File

@ -0,0 +1,11 @@
# MultiModality
Some chat models are multimodal, accepting images, audio and even video as inputs. These are still less common, meaning model providers haven't standardized on the "best" way to define the API. Multimodal **outputs** are even less common. As such, we've kept our multimodal abstractions fairly lightweight and plan to further solidify the multimodal APIs and interaction patterns as the field matures.
In LangChain, most chat models that support multimodal inputs also accept those values in OpenAI's content blocks format. So far this is restricted to image inputs. For models like Gemini which support video and other bytes input, the APIs also support the native, model-specific representations.
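As a sketch of the OpenAI-style content block format (the image URL is a placeholder, and `model` stands for any multimodal chat model):
```python
from langchain_core.messages import HumanMessage

message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ]
)
# response = model.invoke([message])
```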
For specifics on how to use multimodal models, see the [relevant how-to guides here](/docs/how_to/#multimodal).
For a full list of LangChain model providers with multimodal models, [check out this table](/docs/integrations/chat/#advanced-features).

View File

@ -0,0 +1,41 @@
# Output parsers
<span data-heading-keywords="output parser"></span>
:::note
The information here refers to parsers that take text output from a model and try to parse it into a more structured representation.
More and more models are supporting function (or tool) calling, which handles this automatically.
It is recommended to use function/tool calling rather than output parsing.
See documentation for that [here](/docs/concepts/#function-tool-calling).
:::
An `Output parser` is responsible for taking the output of a model and transforming it into a format more suitable for downstream tasks.
Output parsers are useful when you are using LLMs to generate structured data, or to normalize output from chat models and LLMs.
LangChain has lots of different types of output parsers. The table below lists the output parsers LangChain supports, along with the following information:
- **Name**: The name of the output parser
- **Supports Streaming**: Whether the output parser supports streaming.
- **Has Format Instructions**: Whether the output parser has format instructions. This is generally available except when (a) the desired schema is not specified in the prompt but rather in other parameters (like OpenAI function calling), or (b) when the OutputParser wraps another OutputParser.
- **Calls LLM**: Whether this output parser itself calls an LLM. This is usually only done by output parsers that attempt to correct misformatted output.
- **Input Type**: Expected input type. Most output parsers work on both strings and messages, but some (like OpenAI Functions) need a message with specific kwargs.
- **Output Type**: The output type of the object returned by the parser.
- **Description**: Our commentary on this output parser and when to use it.
| Name | Supports Streaming | Has Format Instructions | Calls LLM | Input Type | Output Type | Description |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------|-------------------------|-----------|--------------------|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [JSON](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.json.JsonOutputParser.html#langchain_core.output_parsers.json.JsonOutputParser) | ✅ | ✅ | | `str` \| `Message` | JSON object | Returns a JSON object as specified. You can specify a Pydantic model and it will return JSON for that model. Probably the most reliable output parser for getting structured data that does NOT use function calling. |
| [XML](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.xml.XMLOutputParser.html#langchain_core.output_parsers.xml.XMLOutputParser) | ✅ | ✅ | | `str` \| `Message` | `dict` | Returns a dictionary of tags. Use when XML output is needed. Use with models that are good at writing XML (like Anthropic's). |
| [CSV](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.list.CommaSeparatedListOutputParser.html#langchain_core.output_parsers.list.CommaSeparatedListOutputParser) | ✅ | ✅ | | `str` \| `Message` | `List[str]` | Returns a list of comma separated values. |
| [OutputFixing](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.fix.OutputFixingParser.html#langchain.output_parsers.fix.OutputFixingParser) | | | ✅ | `str` \| `Message` | | Wraps another output parser. If that output parser errors, then this will pass the error message and the bad output to an LLM and ask it to fix the output. |
| [RetryWithError](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.retry.RetryWithErrorOutputParser.html#langchain.output_parsers.retry.RetryWithErrorOutputParser) | | | ✅ | `str` \| `Message` | | Wraps another output parser. If that output parser errors, then this will pass the original inputs, the bad output, and the error message to an LLM and ask it to fix it. Compared to OutputFixingParser, this one also sends the original instructions. |
| [Pydantic](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.pydantic.PydanticOutputParser.html#langchain_core.output_parsers.pydantic.PydanticOutputParser) | | ✅ | | `str` \| `Message` | `pydantic.BaseModel` | Takes a user defined Pydantic model and returns data in that format. |
| [YAML](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.yaml.YamlOutputParser.html#langchain.output_parsers.yaml.YamlOutputParser) | | ✅ | | `str` \| `Message` | `pydantic.BaseModel` | Takes a user defined Pydantic model and returns data in that format. Uses YAML to encode it. |
| [PandasDataFrame](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.pandas_dataframe.PandasDataFrameOutputParser.html#langchain.output_parsers.pandas_dataframe.PandasDataFrameOutputParser) | | ✅ | | `str` \| `Message` | `dict` | Useful for doing operations with pandas DataFrames. |
| [Enum](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.enum.EnumOutputParser.html#langchain.output_parsers.enum.EnumOutputParser) | | ✅ | | `str` \| `Message` | `Enum` | Parses response into one of the provided enum values. |
| [Datetime](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.datetime.DatetimeOutputParser.html#langchain.output_parsers.datetime.DatetimeOutputParser) | | ✅ | | `str` \| `Message` | `datetime.datetime` | Parses response into a `datetime.datetime` object. |
| [Structured](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.structured.StructuredOutputParser.html#langchain.output_parsers.structured.StructuredOutputParser) | | ✅ | | `str` \| `Message` | `Dict[str, str]` | An output parser that returns structured information. It is less powerful than other output parsers since it only allows for fields to be strings. This can be useful when you are working with smaller LLMs. |
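As a sketch, here is one of the simpler parsers from the table wired into a chain; the model name and the `langchain-openai` package are assumptions for the example:
```python
from langchain_core.output_parsers import CommaSeparatedListOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

parser = CommaSeparatedListOutputParser()
prompt = PromptTemplate.from_template(
    "List three {thing}.\n{format_instructions}"
).partial(format_instructions=parser.get_format_instructions())

chain = prompt | ChatOpenAI(model="gpt-4o-mini") | parser
print(chain.invoke({"thing": "colors"}))  # e.g. ['red', 'green', 'blue']
```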
For specifics on how to use output parsers, see the [relevant how-to guides here](/docs/how_to/#output-parsers).

View File

@ -0,0 +1,79 @@
# Prompt Templates
Prompt templates help to translate user input and parameters into instructions for a language model.
This can be used to guide a model's response, helping it understand the context and generate relevant and coherent language-based output.
Prompt Templates take as input a dictionary, where each key represents a variable in the prompt template to fill in.
Prompt Templates output a PromptValue. This PromptValue can be passed to an LLM or a ChatModel, and can also be cast to a string or a list of messages.
The reason this PromptValue exists is to make it easy to switch between strings and messages.
There are a few different types of prompt templates:
## String PromptTemplates
These prompt templates are used to format a single string, and generally are used for simpler inputs.
For example, a common way to construct and use a PromptTemplate is as follows:
```python
from langchain_core.prompts import PromptTemplate
prompt_template = PromptTemplate.from_template("Tell me a joke about {topic}")
prompt_template.invoke({"topic": "cats"})
```
## ChatPromptTemplates
These prompt templates are used to format a list of messages. These "templates" consist of a list of templates themselves.
For example, a common way to construct and use a ChatPromptTemplate is as follows:
```python
from langchain_core.prompts import ChatPromptTemplate
prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    ("user", "Tell me a joke about {topic}")
])
prompt_template.invoke({"topic": "cats"})
```
In the above example, this ChatPromptTemplate will construct two messages when called.
The first is a system message that has no variables to format.
The second is a HumanMessage, and will be formatted by the `topic` variable the user passes in.
## MessagesPlaceholder
<span data-heading-keywords="messagesplaceholder"></span>
This prompt template is responsible for adding a list of messages in a particular place.
In the above ChatPromptTemplate, we saw how we could format two messages, each one a string.
But what if we wanted the user to pass in a list of messages that we would slot into a particular spot?
This is how you use MessagesPlaceholder.
```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage
prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    MessagesPlaceholder("msgs")
])
prompt_template.invoke({"msgs": [HumanMessage(content="hi!")]})
```
This will produce a list of two messages, the first one being a system message, and the second one being the HumanMessage we passed in.
If we had passed in 5 messages, then it would have produced 6 messages in total (the system message plus the 5 passed in).
This is useful for letting a list of messages be slotted into a particular spot.
An alternative way to accomplish the same thing without using the `MessagesPlaceholder` class explicitly is:
```python
prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    ("placeholder", "{msgs}") # <-- This is the changed part
])
```
For specifics on how to use prompt templates, see the [relevant how-to guides here](/docs/how_to/#prompt-templates).

View File

@ -0,0 +1,12 @@
# Retrievers
<span data-heading-keywords="retriever,retrievers"></span>
A retriever is an interface that returns documents given an unstructured query.
It is more general than a vector store.
A retriever does not need to be able to store documents, only to return (or retrieve) them.
Retrievers can be created from vector stores, but are also broad enough to include [Wikipedia search](/docs/integrations/retrievers/wikipedia/) and [Amazon Kendra](/docs/integrations/retrievers/amazon_kendra_retriever/).
Retrievers accept a string query as input and return a list of Documents as output.
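A minimal usage sketch, assuming `vectorstore` is an already-populated vector store as described in the vector stores section:
```python
retriever = vectorstore.as_retriever()
docs = retriever.invoke("What did the author say about LangChain?")
for doc in docs:
    print(doc.page_content)
```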
For specifics on how to use retrievers, see the [relevant how-to guides here](/docs/how_to/#retrievers).

View File

@ -0,0 +1 @@
# Runnable Interface

View File

@ -0,0 +1,18 @@
# Text Splitters
Once you've loaded documents, you'll often want to transform them to better suit your application. The simplest example is that you may want to split a long document into smaller chunks that can fit into your model's context window. LangChain has a number of built-in document transformers that make it easy to split, combine, filter, and otherwise manipulate documents.
When you want to deal with long pieces of text, it is necessary to split up that text into chunks. As simple as this sounds, there is a lot of potential complexity here. Ideally, you want to keep the semantically related pieces of text together. What "semantically related" means could depend on the type of text. There are several ways to do this.
At a high level, text splitters work as follows:
1. Split the text up into small, semantically meaningful chunks (often sentences).
2. Start combining these small chunks into a larger chunk until you reach a certain size (as measured by some function).
3. Once you reach that size, make that chunk its own piece of text and then start creating a new chunk of text with some overlap (to keep context between chunks).
That means there are two different axes along which you can customize your text splitter:
1. How the text is split
2. How the chunk size is measured
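As a sketch of these two axes, using the recursive character splitter (assumes the `langchain-text-splitters` package is installed):
```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,    # how chunk size is measured: characters, by default
    chunk_overlap=20,  # overlap keeps context between neighboring chunks
)
chunks = splitter.split_text("LangChain makes it easy to split long documents. " * 10)
print(len(chunks), chunks[0])
```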
For specifics on how to use text splitters, see the [relevant how-to guides here](/docs/how_to/#text-splitters).

View File

@ -0,0 +1,19 @@
# Vector stores
<span data-heading-keywords="vector,vectorstore,vectorstores,vector store,vector stores"></span>
One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors,
and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query.
A vector store takes care of storing embedded data and performing vector search for you.
Most vector stores can also store metadata about embedded vectors and support filtering on that metadata before
similarity search, allowing you more control over returned documents.
Vector stores can be converted to the retriever interface by doing:
```python
vectorstore = MyVectorStore()
retriever = vectorstore.as_retriever()
```
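A minimal end-to-end sketch; the in-memory store and the embedding model are assumptions for illustration (assumes `langchain-openai` is installed and an API key is configured):
```python
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings

vectorstore = InMemoryVectorStore.from_texts(
    ["LangChain is a framework for LLM apps.", "Paris is the capital of France."],
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
)
docs = vectorstore.similarity_search("What is LangChain?", k=1)
print(docs[0].page_content)
```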
For specifics on how to use vector stores, see the [relevant how-to guides here](/docs/how_to/#vector-stores).