Compare commits

...

11 Commits

Author SHA1 Message Date
Eugene Yurtsev
8a41a62bb6 x 2024-10-21 17:17:11 -04:00
Eugene Yurtsev
aa0b25cb2a docs: fix some typos (#27519)
* Fix some typos
* Add some missing links
2024-10-21 16:00:31 -04:00
Eugene Yurtsev
13f7d2d58d tools concept (#27482)
Add tools conceptual page
2024-10-21 15:34:21 -04:00
Lance Martin
8484c23c72 Address comments, minor cleanups (#27475) 2024-10-21 08:10:16 -07:00
Lance Martin
8033cae96a First draft of concept pages (#27088)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-10-18 12:52:52 -07:00
Eugene Yurtsev
127ac819fc Docs: conceptual docs batch 1 (#27173)
Re-organizing some of the content involves runnables/lcel/streaming into
conceptual guides.

Conceptual guides added:
- [x] Runnables
- [x] LCEL
- [x] Chat Models
- [x] LLM
- [x] async
- [x] Messages
- [x] Chat History
- [x] Multimodality
- [x] Tokenization

Outstanding:
- [ ] Callbacks/Tracers
- [ ] Streaming
- [ ] Tool Creation
- [ ] Document Loading


Other conceptual guides are placeholders to make sure that no existing
links breaks.

Some high level re-organization:

* Introduce the Runnable interface prior to LCEL (since those are two
distinct concepts)
* Cross-link as much related content as possible (including how-to
guides)
2024-10-18 14:59:53 -04:00
Eugene Yurtsev
046f6a5544 concepts docs: archictecture individual page (#27290)
Update architecture page
2024-10-11 16:06:57 -04:00
Eugene Yurtsev
f8ce6210be concept docs: add scaffold (#27277)
Starting to structure the scaffold for the concepts. Moving concept
content into their own pages.

TBD what we'll end up doing with the actual concepts page in terms of
visual layout.
2024-10-11 15:50:37 -04:00
Eugene Yurtsev
8d9ef40118 docs: move concepts into a separate directory (#27171)
Move concepts into a separate directory
2024-10-07 15:19:38 -04:00
Eugene Yurtsev
13646282bd Merge branch 'master' into concept_docs 2024-10-07 15:15:43 -04:00
Eugene Yurtsev
c661ffe813 x 2024-10-02 13:54:21 -04:00
47 changed files with 3177 additions and 623 deletions

View File

@@ -0,0 +1,40 @@
# Agents
By themselves, language models can't take actions - they just output text.
A big use case for LangChain is creating **agents**.
Agents are systems that use an LLM as a reasoning engine to determine which actions to take and what the inputs to those actions should be.
The results of those actions can then be fed back into the agent, which determines whether more actions are needed or whether it is okay to finish.
[LangGraph](https://github.com/langchain-ai/langgraph) is an extension of LangChain specifically aimed at creating highly controllable and customizable agents.
Please check out that documentation for a more in depth overview of agent concepts.
There is a legacy `agent` concept in LangChain that we are moving towards deprecating: `AgentExecutor`.
AgentExecutor was essentially a runtime for agents.
It was a great place to get started; however, it was not flexible enough once you started to build more customized agents.
In order to solve that we built LangGraph to be this flexible, highly-controllable runtime.
If you are still using AgentExecutor, do not fear: we still have a guide on [how to use AgentExecutor](/docs/how_to/agent_executor).
It is recommended, however, that you start to transition to LangGraph.
In order to assist in this, we have put together a [transition guide on how to do so](/docs/how_to/migrate_agent).
## ReAct agents
<span data-heading-keywords="react,react agent"></span>
One popular architecture for building agents is [**ReAct**](https://arxiv.org/abs/2210.03629).
ReAct combines reasoning and acting in an iterative process - in fact the name "ReAct" stands for "Reason" and "Act".
The general flow looks like this:
- The model will "think" about what step to take in response to an input and any previous observations.
- The model will then choose an action from available tools (or choose to respond to the user).
- The model will generate arguments to that tool.
- The agent runtime (executor) will parse out the chosen tool and call it with the generated arguments.
- The executor will return the results of the tool call back to the model as an observation.
- This process repeats until the agent chooses to respond.
There are general prompting based implementations that do not require any model-specific features, but the most
reliable implementations use features like [tool calling](/docs/how_to/tool_calling/) to reliably format outputs
and reduce variance.
Please see the [LangGraph documentation](https://langchain-ai.github.io/langgraph/) for more information,
or [this how-to guide](/docs/how_to/migrate_agent/) for specific information on migrating to LangGraph.
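As a rough illustration, the sketch below builds a small tool-calling, ReAct-style agent with LangGraph's prebuilt `create_react_agent` helper. The tool, model name, and prompt are placeholders chosen for this example, and the snippet assumes `langgraph` and `langchain-openai` are installed with an API key configured.
```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

@tool
def get_weather(city: str) -> str:
    """Return a (fake) weather report for a city."""
    return f"It is sunny in {city}."

# Model name is illustrative; any tool-calling chat model should work.
model = ChatOpenAI(model="gpt-4o-mini")
agent = create_react_agent(model, [get_weather])

# The agent loops: the model reasons, requests the tool, observes the result, then answers.
result = agent.invoke({"messages": [("user", "What's the weather in Paris?")]})
print(result["messages"][-1].content)
```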

View File

@@ -0,0 +1,59 @@
import ThemedImage from '@theme/ThemedImage';
import useBaseUrl from '@docusaurus/useBaseUrl';
In this section, you'll find explanations of the key concepts, providing a deeper understanding of core principles.
The conceptual guide will not cover step-by-step instructions or specific implementation details — those are found in the [How-To Guides](/docs/how_to/) and [Tutorials](/docs/tutorials) sections. For detailed reference material, please visit the [API Reference](https://python.langchain.com/api_reference/).
## Architecture
LangChain as a framework consists of a number of packages.
### `langchain-core`
This package contains base abstractions of different components and ways to compose them together.
The interfaces for core components like LLMs, vector stores, retrievers and more are defined here.
No third party integrations are defined here.
The dependencies are kept purposefully very lightweight.
### `langchain`
The main `langchain` package contains chains, agents, and retrieval strategies that make up an application's cognitive architecture.
These are NOT third party integrations.
All chains, agents, and retrieval strategies here are NOT specific to any one integration, but rather generic across all integrations.
### `langchain-community`
This package contains third party integrations that are maintained by the LangChain community.
Key partner packages are separated out (see below).
This contains all integrations for various components (LLMs, vector stores, retrievers).
All dependencies in this package are optional to keep the package as lightweight as possible.
### Partner packages
While the long tail of integrations is in `langchain-community`, we split popular integrations into their own packages (e.g. `langchain-openai`, `langchain-anthropic`, etc).
This was done in order to improve support for these important integrations.
### [`langgraph`](https://langchain-ai.github.io/langgraph)
`langgraph` is an extension of `langchain` aimed at
building robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph.
LangGraph exposes high level interfaces for creating common types of agents, as well as a low-level API for composing custom flows.
### [`langserve`](/docs/langserve)
A package to deploy LangChain chains as REST APIs. Makes it easy to get a production ready API up and running.
### [LangSmith](https://docs.smith.langchain.com)
A developer platform that lets you debug, test, evaluate, and monitor LLM applications.
<ThemedImage
alt="Diagram outlining the hierarchical organization of the LangChain framework, displaying the interconnected parts across multiple layers."
sources={{
light: useBaseUrl('/svg/langchain_stack_062024.svg'),
dark: useBaseUrl('/svg/langchain_stack_062024_dark.svg'),
}}
title="LangChain Framework Overview"
style={{ width: "100%" }}
/>

View File

@@ -0,0 +1,83 @@
# Async Programming with LangChain
:::info Prerequisites
* [Runnable Interface](/docs/concepts/runnables)
* [asyncio documentation](https://docs.python.org/3/library/asyncio.html)
:::
## Overview
LLM based applications often involve a lot of I/O-bound operations, such as making API calls to language models, databases, or other services. Asynchronous programming (or async programming) is a paradigm that allows a program to perform multiple tasks concurrently without blocking the execution of other tasks, improving efficiency and responsiveness, particularly in I/O-bound operations.
:::note
You are expected to be familiar with asynchronous programming in Python before reading this guide. If you are not, please find appropriate resources online to learn how to program asynchronously in Python.
This guide specifically focuses on what you need to know to work with LangChain in an asynchronous context, assuming that you are already familiar with asynchronous programming.
:::
## LangChain Asynchronous APIs
Many LangChain APIs are designed to be asynchronous, allowing you to build efficient and responsive applications.
Typically, any method that may perform I/O operations (e.g., making API calls, reading files) will have an asynchronous counterpart.
In LangChain, async implementations are located in the same classes as their synchronous counterparts, with the asynchronous methods having an "a" prefix. For example, the synchronous `invoke` method has an asynchronous counterpart called `ainvoke`.
Many components of LangChain implement the [Runnable Interface](/docs/concepts/runnables), which includes support for asynchronous execution. This means that you can run Runnables asynchronously using the `await` keyword in Python.
```python
await some_runnable.ainvoke(some_input)
```
Other components like [Embedding Models](/docs/concepts/embedding_models) and [VectorStore](/docs/concepts/vectorstores) that do not implement the [Runnable Interface](/docs/concepts/runnables) usually still follow the same rule and include the asynchronous version of the method in the same class with an "a" prefix.
For example,
```python
await some_vectorstore.aadd_documents(documents)
```
Runnables created using the [LangChain Expression Language (LCEL)](/docs/concepts/lcel) can also be run asynchronously as they implement
the full [Runnable Interface](/docs/concepts/runnables).
For more information, please review the [API reference](https://python.langchain.com/api_reference/) for the specific component you are using.
## Delegation to Sync Methods
Most popular LangChain integrations implement asynchronous support of their APIs. For example, the `ainvoke` method of many ChatModel implementations uses the `httpx.AsyncClient` to make asynchronous HTTP requests to the model provider's API.
When an asynchronous implementation is not available, LangChain tries to provide a default implementation, even if it incurs
a **slight** overhead.
By default, LangChain delegates the execution of unimplemented asynchronous methods to their synchronous counterparts. LangChain almost always assumes that the synchronous method should be treated as a blocking operation and runs it in a separate thread.
This is done using [asyncio.loop.run_in_executor](https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor) functionality provided by the `asyncio` library. LangChain uses the default executor provided by the `asyncio` library, which lazily initializes a thread pool executor with a default number of threads that is reused in the given event loop. While this strategy incurs a slight overhead due to context switching between threads, it guarantees that every asynchronous method has a default implementation that works out of the box.
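Conceptually, the default fallback behaves roughly like the following simplified sketch (illustrative only, not LangChain's actual internals):
```python
import asyncio

def _call_provider_sync(prompt: str) -> str:
    """Stand-in for blocking I/O, e.g. an HTTP request to a model provider."""
    return f"response to {prompt!r}"

async def ainvoke(prompt: str) -> str:
    # Run the blocking sync call in the event loop's default thread pool
    # so the loop itself is never blocked.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, _call_provider_sync, prompt)

print(asyncio.run(ainvoke("hello")))
```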
## Performance
Async code in LangChain should generally perform relatively well with minimal overhead out of the box, and is unlikely
to be a bottleneck in most applications.
The two main sources of overhead are:
1. Cost of context switching between threads when [delegating to synchronous methods](#delegation-to-sync-methods). This can be addressed by providing a native asynchronous implementation.
2. In [LCEL](/docs/concepts/lcel) any "cheap functions" that appear as part of the chain will be either scheduled as tasks on the event loop (if they are async) or run in a separate thread (if they are sync), rather than just be run inline.
The latency overhead you should expect from these is between tens of microseconds to a few milliseconds.
A more common source of performance issues arises from users accidentally blocking the event loop by calling synchronous code in an async context (e.g., calling `invoke` rather than `ainvoke`).
## Compatibility
LangChain is only compatible with the `asyncio` library, which is distributed as part of the Python standard library. It will not work with other async libraries like `trio` or `curio`.
In Python 3.9 and 3.10, [asyncio's tasks](https://docs.python.org/3/library/asyncio-task.html#asyncio.create_task) did not
accept a `context` parameter. Due to this limitation, LangChain cannot automatically propagate the `RunnableConfig` down the call chain
in certain scenarios.
If you are experiencing issues with streaming, callbacks or tracing in async code and are using Python 3.9 or 3.10, this is a likely cause.
Please read [Propagation of RunnableConfig](/docs/concepts/runnables#propagation-runnableconfig) for more details to learn how to propagate the `RunnableConfig` down the call chain manually (or upgrade to Python 3.11, where this is no longer an issue).
## How to use in IPython and Jupyter Notebooks
As of IPython 7.0, IPython supports asynchronous REPLs. This means that you can use the `await` keyword in the IPython REPL and Jupyter Notebooks without any additional setup. For more information, see the [IPython blog post](https://blog.jupyter.org/ipython-7-0-async-repl-a35ce050f7f7).

View File

@@ -0,0 +1,21 @@
# Callbacks
:::note Pre-requisites
- [Runnable interface](/docs/concepts/#runnable-interface)
:::
The lowest-level way to stream outputs from LLMs in LangChain is via the [callbacks](/docs/concepts/#callbacks) system. You can pass a
callback handler that handles the [`on_llm_new_token`](https://python.langchain.com/api_reference/langchain/callbacks/langchain.callbacks.streaming_aiter.AsyncIteratorCallbackHandler.html#langchain.callbacks.streaming_aiter.AsyncIteratorCallbackHandler.on_llm_new_token) event into LangChain components. When that component is invoked, any
[LLM](/docs/concepts/#llms) or [chat model](/docs/concepts/#chat-models) contained in the component calls
the callback with the generated token. Within the callback, you could pipe the tokens into some other destination, e.g. an HTTP response.
You can also handle the [`on_llm_end`](https://python.langchain.com/api_reference/langchain/callbacks/langchain.callbacks.streaming_aiter.AsyncIteratorCallbackHandler.html#langchain.callbacks.streaming_aiter.AsyncIteratorCallbackHandler.on_llm_end) event to perform any necessary cleanup.
You can see [this how-to section](/docs/how_to/#callbacks) for more specifics on using callbacks.
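As a minimal sketch (the handler class and the way the model is invoked are illustrative), a streaming callback handler might look like this:
```python
from langchain_core.callbacks import BaseCallbackHandler

class PrintTokenHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Pipe each token to some destination, e.g. print it or write it to an HTTP response.
        # Note: many providers only emit tokens when streaming is enabled on the model.
        print(token, end="", flush=True)

    def on_llm_end(self, response, **kwargs) -> None:
        # Perform any necessary cleanup once generation has finished.
        print()

# Assuming `model` is any chat model integration:
# model.invoke("Tell me a joke", config={"callbacks": [PrintTokenHandler()]})
```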
Callbacks were the first technique for streaming introduced in LangChain. While powerful and generalizable,
they can be unwieldy for developers. For example:
- You need to explicitly initialize and manage some aggregator or other stream to collect results.
- The execution order isn't explicitly guaranteed, and you could theoretically have a callback run after the `.invoke()` method finishes.
- Providers would often make you pass an additional parameter to stream outputs instead of returning them all at once.
- You would often ignore the result of the actual model call in favor of callback results.

View File

@@ -0,0 +1,46 @@
# Chat History
:::info Prerequisites
- [Messages](/docs/concepts/messages)
- [Chat Models](/docs/concepts/chat_models)
- [Tool Calling](/docs/concepts/tool_calling)
:::
## Overview
Chat history is a record of the conversation between the user and the chat model. It is used to maintain context and state throughout the conversation. The chat history is a sequence of [messages](/docs/concepts/messages), each of which is associated with a specific [role](/docs/concepts/messages#role), such as "user", "assistant", "system", or "tool".
## Conversation Patterns
Most conversations start with a **system message** that sets the context for the conversation. This is followed by a **user message** containing the user's input, and then an **assistant message** containing the model's response.
The **assistant** may respond directly to the user or, if configured with tools, request that a [tool](/docs/concepts/tool_calling) be invoked to perform a specific task.
So a full conversation often involves a combination of two patterns of alternating messages:
1. The **user** and the **assistant** representing a back-and-forth conversation.
2. The **assistant** and **tool messages** representing an ["agentic" workflow](/docs/concepts/agents) where the assistant is invoking tools to perform specific tasks.
## Managing Chat History
Since chat models have a maximum limit on input size, it's important to manage chat history and trim it as needed to avoid exceeding the [context window](/docs/concepts/chat_models#context_window).
While processing chat history, it's essential to preserve a correct conversation structure.
Key guidelines for managing chat history:
- The conversation should follow one of these structures:
- The first message is either a "user" message or a "system" message, followed by a "user" and then an "assistant" message.
- The last message should be either a "user" message or a "tool" message containing the result of a tool call.
- When using [tool calling](/docs/concepts/tool_calling), a "tool" message should only follow an "assistant" message that requested the tool invocation.
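As a rough illustration, the `trim_messages` utility in `langchain-core` can help enforce these guidelines; the parameter values below are illustrative:
```python
from langchain_core.messages import (
    AIMessage,
    HumanMessage,
    SystemMessage,
    trim_messages,
)

history = [
    SystemMessage("You are a helpful assistant."),
    HumanMessage("Hi, I'm Bob."),
    AIMessage("Hello Bob! How can I help?"),
    HumanMessage("What's my name?"),
]

# Keep the most recent messages, preserve the system message, and ensure the
# trimmed history still starts on a "user" (human) message.
trimmed = trim_messages(
    history,
    strategy="last",
    token_counter=len,  # count each message as one unit; a chat model can be used instead
    max_tokens=3,
    include_system=True,
    start_on="human",
)
```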
:::tip
Understanding correct conversation structure is essential for being able to properly implement
[memory](https://langchain-ai.github.io/langgraph/concepts/memory/) in chat models.
:::
## Related Resources
- [How to Trim Messages](https://python.langchain.com/docs/how_to/trim_messages/)
- [Memory Guide](https://langchain-ai.github.io/langgraph/concepts/memory/) for information on implementing short-term and long-term memory in chat models using [LangGraph](https://langchain-ai.github.io/langgraph/).

View File

@@ -0,0 +1,162 @@
# Chat Models
## Overview
Large Language Models (LLMs) are advanced machine learning models that excel in a wide range of language-related tasks such as text generation, translation, summarization, question answering, and more, without needing task-specific tuning for every scenario.
Modern LLMs are typically accessed through a chat model interface that takes [messages](/docs/concepts/messages) as input and returns [messages](/docs/concepts/messages) as output.
The newest generation of chat models offer additional capabilities:
* [Tool Calling](/docs/concepts#tool-calling): Many popular chat models offer a native [tool calling](/docs/concepts#tool-calling) API. This API allows developers to build rich applications that enable AI to interact with external services, APIs, and databases. Tool calling can also be used to extract structured information from unstructured data and perform various other tasks.
* [Multimodality](/docs/concepts/multimodality): The ability to work with data other than text; for example, images, audio, and video.
## Features
LangChain provides a consistent interface for working with chat models from different providers while offering additional features for monitoring, debugging, and optimizing the performance of applications that use LLMs.
* Integrations with many chat model providers (e.g., Anthropic, OpenAI, Ollama, Cohere, Hugging Face, Groq, Microsoft Azure, Google Vertex, Amazon Bedrock). Please see [chat model integrations](/docs/integrations/chat/) for an up-to-date list of supported models.
* Use either LangChain's [messages](/docs/concepts/messages) format or OpenAI format.
* Standard [tool calling API](/docs/concepts#tool-calling): standard interface for binding tools to models, accessing tool call requests made by models, and sending tool results back to the model.
* Standard API for [structured outputs](/docs/concepts/structured_outputs) via the `with_structured_output` method.
* Provides support for [async programming](/docs/concepts/async), [efficient batching](/docs/concepts/runnables#batch), and [a rich streaming API](/docs/concepts/streaming).
* Integration with [LangSmith](https://docs.smith.langchain.com) for monitoring and debugging production-grade applications based on LLMs.
* Additional features like standardized [token usage](/docs/concepts/messages#token_usage), [rate limiting](#rate-limiting), [caching](#cache) and more.
## Available Integrations
LangChain has many chat model integrations that allow you to use a wide variety of models from different providers.
These integrations are one of two types:
1. **Official Models**: These are models that are officially supported by LangChain and/or the model provider. You can find these models in the `langchain-<provider>` packages.
2. **Community Models**: These are models that are mostly contributed and supported by the community. You can find these models in the `langchain-community` package.
LangChain chat models are named with a convention that prefixes "Chat" to their class names (e.g., `ChatOllama`, `ChatAnthropic`, `ChatOpenAI`, etc.).
Please review the [chat model integrations](/docs/integrations/chat/) for a list of supported models.
:::note
Models that do **not** include the prefix "Chat" in their name or include "LLM" as a suffix in their name typically refer to older models that do not follow the chat model interface and instead use an interface that takes a string as input and returns a string as output.
:::
## Interface
LangChain chat models implement the [BaseChatModel](https://python.langchain.com/api_reference/core/language_models/langchain_core.language_models.chat_models.BaseChatModel.html) interface. Because `BaseChatModel` also implements the [Runnable Interface](/docs/concepts/runnables), chat models support a [standard streaming interface](/docs/concepts/streaming), [async programming](/docs/concepts/async), optimized [batching](/docs/concepts/runnables#batch), and more. Please see the [Runnable Interface](/docs/concepts/runnables) for more details.
Many of the key methods of chat models operate on [messages](/docs/concepts/messages) as input and return messages as output.
Chat models offer a standard set of parameters that can be used to configure the model. These parameters are typically used to control the behavior of the model, such as the temperature of the output, the maximum number of tokens in the response, and the maximum time to wait for a response. Please see the [standard parameters](#standard-parameters) section for more details.
### Key Methods
The key methods of a chat model are:
1. **invoke**: The primary method for interacting with a chat model. It takes a list of [messages](/docs/concepts/messages) as input and returns a message as output.
2. **stream**: A method that allows you to stream the output of a chat model as it is generated.
3. **batch**: A method that allows you to batch multiple requests to a chat model together for more efficient processing.
4. **bind_tools**: A method that allows you to bind a tool to a chat model for use in the model's execution context.
5. **with_structured_output**: A wrapper around the `invoke` method for models that natively support [structured output](/docs/concepts#structured_output).
Other important methods can be found in the [BaseChatModel API Reference](https://python.langchain.com/api_reference/core/language_models/langchain_core.language_models.chat_models.BaseChatModel.html).
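For example (a hedged sketch; the model class and model name are placeholders), the first three methods can be used as follows:
```python
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")

# invoke: send messages, get a message back
response = model.invoke([("user", "Write a haiku about autumn.")])
print(response.content)

# stream: iterate over chunks of the response as they are generated
for chunk in model.stream("Write a haiku about autumn."):
    print(chunk.content, end="", flush=True)

# batch: process several independent inputs in one call
responses = model.batch(
    ["Translate 'hello' to French.", "Translate 'hello' to Spanish."]
)
```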
### Inputs and Outputs
Modern LLMs are typically accessed through a chat model interface that takes [messages](/docs/concepts/messages) as input and returns [messages](/docs/concepts/messages) as output. Messages are typically associated with a role (e.g., "system", "human", "assistant") and one or more content blocks that contain text or potentially multimodal data (e.g., images, audio, video).
LangChain supports two message formats to interact with chat models:
1. **LangChain Message Format**: LangChain's own message format, which is used by default and is used internally by LangChain.
2. **OpenAI's Message Format**: OpenAI's message format.
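For example, assuming `model` is a chat model instance (as in the sketch above), both formats are accepted:
```python
from langchain_core.messages import HumanMessage

# LangChain message format
model.invoke([HumanMessage("Hello, how are you?")])

# OpenAI message format: a list of dicts with "role" and "content" keys
model.invoke([{"role": "user", "content": "Hello, how are you?"}])
```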
### Standard Parameters
Many chat models have standardized parameters that can be used to configure the model:
| Parameter | Description |
|----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `model` | The name or identifier of the specific AI model you want to use (e.g., `"gpt-3.5-turbo"` or `"gpt-4"`). |
| `temperature` | Controls the randomness of the model's output. A higher value (e.g., 1.0) makes responses more creative, while a lower value (e.g., 0.1) makes them more deterministic and focused. |
| `timeout` | The maximum time (in seconds) to wait for a response from the model before canceling the request. Ensures the request doesn't hang indefinitely. |
| `max_tokens` | Limits the total number of tokens (words and punctuation) in the response. This controls how long the output can be. |
| `stop` | Specifies stop sequences that indicate when the model should stop generating tokens. For example, you might use specific strings to signal the end of a response. |
| `max_retries` | The maximum number of attempts the system will make to resend a request if it fails due to issues like network timeouts or rate limits. |
| `api_key` | The API key required for authenticating with the model provider. This is usually issued when you sign up for access to the model. |
| `base_url` | The URL of the API endpoint where requests are sent. This is typically provided by the model's provider and is necessary for directing your requests. |
| `rate_limiter` | An optional [BaseRateLimiter](https://python.langchain.com/api_reference/core/rate_limiters/langchain_core.rate_limiters.BaseRateLimiter.html#langchain_core.rate_limiters.BaseRateLimiter) to space out requests to avoid exceeding rate limits. See [rate-limiting](#rate-limiting) below for more details. |
Some important things to note:
- Standard parameters only apply to model providers that expose parameters with the intended functionality. For example, some providers do not expose a configuration for maximum output tokens, so max_tokens can't be supported on these.
- Standard params are currently only enforced on integrations that have their own integration packages (e.g. `langchain-openai`, `langchain-anthropic`, etc.), they're not enforced on models in ``langchain-community``.
ChatModels also accept other parameters that are specific to that integration. To find all the parameters supported by a ChatModel head to the [API reference](https://python.langchain.com/api_reference/) for that model.
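As an illustrative sketch (parameter values and the model name are placeholders), standard parameters are typically passed at construction time:
```python
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.2,  # mostly deterministic output
    max_tokens=512,   # cap on the length of the response
    timeout=30,       # seconds to wait before canceling the request
    max_retries=2,    # retry transient failures such as rate limit errors
)
```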
## Tool Calling
Chat models can call [tools](/docs/concepts/tools) to perform tasks such as fetching data from a database, making API requests, or running custom code. Please
see the [tool calling](/docs/concepts#tool-calling) guide for more information.
## Structured Outputs
Chat models can be requested to respond in a particular format (e.g., JSON or matching a particular schema). This feature is extremely
useful for information extraction tasks. Please read more about
the technique in the [structured outputs](/docs/concepts#structured_output) guide.
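As a hedged sketch (assuming `model` is a chat model that supports structured outputs; the schema is illustrative):
```python
from pydantic import BaseModel, Field

class Person(BaseModel):
    name: str = Field(description="The person's name")
    age: int = Field(description="The person's age in years")

structured_model = model.with_structured_output(Person)
person = structured_model.invoke("Ada Lovelace was 36 years old when she died.")
# `person` is a Person instance, e.g. Person(name="Ada Lovelace", age=36)
```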
## Multimodality
Large Language Models (LLMs) are not limited to processing text. They can also be used to process other types of data, such as images, audio, and video. This is known as [multimodality](/docs/concepts/multimodality).
Currently, only some LLMs support multimodal inputs, and almost none support multimodal outputs. Please consult the specific model documentation for details.
## Context Window
A chat model's context window refers to the maximum size of the input sequence the model can process at one time. While the context windows of modern LLMs are quite large, they still present a limitation that developers must keep in mind when working with chat models.
If the input exceeds the context window, the model may not be able to process the entire input and could raise an error. In conversational applications, this is especially important because the context window determines how much information the model can "remember" throughout a conversation. Developers often need to manage the input within the context window to maintain a coherent dialogue without exceeding the limit. For more details on handling memory in conversations, refer to the [memory guide](https://langchain-ai.github.io/langgraph/concepts/memory/).
The size of the input is measured in [tokens](/docs/concepts/tokens), which are the unit of processing the model uses.
## Advanced Topics
### Rate-limiting
Many chat model providers impose a limit on the number of requests that can be made in a given time period.
If you hit a rate limit, you will typically receive a rate limit error response from the provider, and will need to wait before making more requests.
You have a few options to deal with rate limits:
1. Try to avoid hitting rate limits by spacing out requests: Chat models accept a `rate_limiter` parameter that can be provided during initialization. This parameter is used to control the rate at which requests are made to the model provider. Spacing out the requests to a given model is a particularly useful strategy when benchmarking models to evaluate their performance. Please see the [how to handle rate limits](https://python.langchain.com/docs/how_to/chat_model_rate_limiting/) for more information on how to use this feature.
2. Try to recover from rate limit errors: If you receive a rate limit error, you can wait a certain amount of time before retrying the request. The amount of time to wait can be increased with each subsequent rate limit error. Chat models have a `max_retries` parameter that can be used to control the number of retries. See the [standard parameters](#standard-parameters) section for more information.
3. Fall back to another chat model: If you hit a rate limit with one chat model, you can switch to another chat model that is not rate-limited.
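Expanding on option 1 above, here is an illustrative sketch (the rate values and model name are placeholders) of passing a rate limiter to a chat model:
```python
from langchain_core.rate_limiters import InMemoryRateLimiter
from langchain_openai import ChatOpenAI

rate_limiter = InMemoryRateLimiter(
    requests_per_second=0.1,    # at most one request every 10 seconds
    check_every_n_seconds=0.1,  # how often to check whether a request may proceed
    max_bucket_size=10,         # maximum burst size
)

model = ChatOpenAI(model="gpt-4o-mini", rate_limiter=rate_limiter)
```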
### Caching
Chat model APIs can be slow, so a natural question is whether to cache the results of previous conversations. Theoretically, caching can help improve performance by reducing the number of requests made to the model provider. In practice, caching chat model responses is a complex problem and should be approached with caution.
The reason is that getting a cache hit is unlikely after the first or second interaction in a conversation if relying on caching the **exact** inputs into the model. For example, how likely is it that multiple conversations start with exactly the same message? What about exactly the same three messages?
An alternative approach is to use semantic caching, where you cache responses based on the meaning of the input rather than the exact input itself. This can be effective in some situations, but not in others.
A semantic cache introduces a dependency on another model on the critical path of your application (e.g., the semantic cache may rely on an [embedding model](/docs/concepts/embedding_models) to convert text to a vector representation), and it's not guaranteed to capture the meaning of the input accurately.
However, there might be situations where caching chat model responses is beneficial. For example, if you have a chat model that is used to answer frequently asked questions, caching responses can help reduce the load on the model provider and improve response times.
Please see the [how to cache chat model responses](/docs/how_to/#chat-model-caching) guide for more details.
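As a minimal sketch of exact-input caching (assuming `model` is a chat model instance; semantic caching would require a different cache implementation):
```python
from langchain_core.caches import InMemoryCache
from langchain_core.globals import set_llm_cache

set_llm_cache(InMemoryCache())

# The first call hits the provider; an identical second call is served from the cache.
model.invoke("What is the capital of France?")
model.invoke("What is the capital of France?")
```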
## Related Resources
* How-to guides on using chat models: [how-to guides](/docs/how_to/#chat-models).
* List of supported chat models: [chat model integrations](/docs/integrations/chat/).
### Conceptual guides
* [Messages](/docs/concepts/messages)
* [Tool calling](/docs/concepts#tool-calling)
* [Multimodality](/docs/concepts/multimodality)
* [Structured outputs](/docs/concepts#structured_output)
* [Tokens](/docs/concepts/tokens)

View File

@@ -0,0 +1,165 @@
# Embedding models
<span data-heading-keywords="embedding,embeddings"></span>
:::info[Prerequisites]
* [Documents](/docs/concepts/retrievers/#interface)
:::
:::info[Note]
This conceptual overview focuses on text-based embedding models.
Embedding models can also be [multimodal](/docs/concepts/multimodality) though such models are not currently supported by LangChain.
:::
## Overview
Imagine being able to capture the essence of any text - a tweet, document, or book - in a single, compact representation.
This is the power of embedding models, which lie at the heart of many retrieval systems.
Embedding models transform human language into a format that machines can understand and compare with speed and accuracy.
These models take text as input and produce a fixed-length array of numbers, a numerical fingerprint of the text's semantic meaning.
Embeddings allow search systems to find relevant documents not just based on keyword matches, but on semantic understanding.
## Key concepts
![Conceptual Overview](/img/embeddings_concept.png)
(1) **Embed text as a vector**: Embeddings transform text into a numerical vector representation.
(2) **Measure similarity**: Embedding vectors can be compared using simple mathematical operations.
## Embedding data
### Historical context
The landscape of embedding models has evolved significantly over the years.
A pivotal moment came in 2018 when Google introduced [BERT (Bidirectional Encoder Representations from Transformers)](https://www.nvidia.com/en-us/glossary/bert/).
BERT applied transformer models to embed text as a simple vector representation, which led to unprecedented performance across various NLP tasks.
However, BERT wasn't optimized for generating sentence embeddings efficiently.
This limitation spurred the creation of [SBERT (Sentence-BERT)](https://www.sbert.net/examples/training/sts/README.html), which adapted the BERT architecture to generate semantically rich sentence embeddings, easily comparable via similarity metrics like cosine similarity, dramatically reducing the computational overhead for tasks like finding similar sentences.
Today, the embedding model ecosystem is diverse, with numerous providers offering their own implementations.
To navigate this variety, researchers and practitioners often turn to benchmarks like the [Massive Text Embedding Benchmark (MTEB)](https://huggingface.co/blog/mteb) for objective comparisons.
:::info[Further reading]
* See the [seminal BERT paper](https://arxiv.org/abs/1810.04805).
* See Cameron Wolfe's [excellent review](https://cameronrwolfe.substack.com/p/the-basics-of-ai-powered-vector-search?utm_source=profile&utm_medium=reader2) of embedding models.
* See the [Massive Text Embedding Benchmark (MTEB)](https://huggingface.co/blog/mteb) leaderboard for a comprehensive overview of embedding models.
:::
### LangChain Interface
Today, there are [many different embedding models](/docs/integrations/text_embedding/).
LangChain provides a universal interface for working with them, providing standard methods for common operations.
This common interface simplifies interaction with various embedding providers through two central methods:
- `embed_documents`: For embedding multiple texts (documents)
- `embed_query`: For embedding a single text (query)
This distinction is important, as some providers employ different embedding strategies for documents (which are to be searched) versus queries (the search input itself).
To illustrate, here's a practical example using LangChain's `.embed_documents` method to embed a list of strings:
```python
from langchain_openai import OpenAIEmbeddings
embeddings_model = OpenAIEmbeddings()
embeddings = embeddings_model.embed_documents(
[
"Hi there!",
"Oh, hello!",
"What's your name?",
"My friends call me World",
"Hello World!"
]
)
len(embeddings), len(embeddings[0])
(5, 1536)
```
For convenience, you can also use the `embed_query` method to embed a single text:
```python
query_embedding = embeddings_model.embed_query("What is the meaning of life?")
```
:::info[Further reading]
* See the full list of [LangChain embedding model integrations](/docs/integrations/text_embedding/).
* See these [how-to guides](/docs/how_to/embed_text) for working with embedding models.
:::
## Measure similarity
Each embedding is essentially a set of coordinates in a vast, abstract space.
In this space, the position of each point (embedding) reflects the meaning of its corresponding text.
Just as similar words might be close to each other in a thesaurus, similar concepts end up close to each other in this embedding space.
This allows for intuitive comparisons between different pieces of text.
By reducing text to these numerical representations, we can use simple mathematical operations to quickly measure how alike two pieces of text are, regardless of their original length or structure.
Some common similarity metrics include:
- **Cosine Similarity**: Measures the cosine of the angle between two vectors.
- **Euclidean Distance**: Measures the straight-line distance between two points.
- **Dot Product**: Measures the projection of one vector onto another.
As an example, any two embedded texts can be compared with `cosine_similarity`:
```python
import numpy as np
def cosine_similarity(vec1, vec2):
dot_product = np.dot(vec1, vec2)
norm_vec1 = np.linalg.norm(vec1)
norm_vec2 = np.linalg.norm(vec2)
return dot_product / (norm_vec1 * norm_vec2)
# Compare the query embedding from above with one of the document embeddings
similarity = cosine_similarity(query_embedding, embeddings[0])
print("Cosine Similarity:", similarity)
```
:::info[Further reading]
* See Simon Willison's [nice blog post and video](https://simonwillison.net/2023/Oct/23/embeddings/) on embeddings and similarity metrics.
* See [this documentation](https://developers.google.com/machine-learning/clustering/dnn-clustering/supervised-similarity) from Google on similarity metrics to consider with embeddings.
* See Pinecone's [blog post](https://www.pinecone.io/learn/vector-similarity/) on similarity metrics.
* See OpenAI's [FAQ](https://platform.openai.com/docs/guides/embeddings/faq) on what similarity metric to use with OpenAI embeddings.
:::
## Advanced
### Embedding with higher granularity
![](/img/embeddings_colbert.png)
Embedding models compress text into fixed-length (vector) representations, which can put a heavy burden on that single vector to capture the semantic nuance and detail of the document.
In some cases, irrelevant or redundant content can dilute the semantic usefulness of the embedding.
[ColBERT](https://arxiv.org/abs/2004.12832) (Contextualized Late Interaction over BERT) is an innovative approach to address this limitation by using higher granularity embeddings.
Here's how ColBERT works:
- **Token-level embeddings**: Produce contextually influenced embeddings for each token in the document and the query.
- **MaxSim operation**: For each query token, compute its maximum similarity with all document tokens.
- **Aggregation**: The final relevance score is obtained by summing these maximum similarities across all query tokens.
This token-wise scoring can yield strong results, especially for tasks requiring precise matching or handling longer documents.
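As a rough sketch of the scoring idea (assuming token-level embeddings have already been computed and normalized so that dot products approximate cosine similarity):
```python
import numpy as np

def maxsim_score(query_token_embs: np.ndarray, doc_token_embs: np.ndarray) -> float:
    # Similarity matrix: one row per query token, one column per document token.
    sims = query_token_embs @ doc_token_embs.T
    # MaxSim: take the best-matching document token for each query token, then sum.
    return float(sims.max(axis=1).sum())
```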
Key advantages of ColBERT:
- **Improved accuracy**: Token-level interactions can capture more nuanced relationships between query and document.
- **Interpretability**: The token-level matching allows for easier interpretation of why a document was considered relevant.
However, ColBERT does come with some trade-offs:
- **Increased computational cost**: Processing and storing token-level embeddings requires more resources.
- **Complexity**: Implementing and optimizing ColBERT can be more challenging than simpler embedding models.
| Name | When to use | Description |
|----------------------------------------------------------------------------------|------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [ColBERT](/docs/integrations/providers/ragatouille/#using-colbert-as-a-reranker) | When higher granularity embeddings are needed. | ColBERT uses contextually influenced embeddings for each token in the document and query to get a granular query-document similarity score. [Paper](https://arxiv.org/abs/2112.01488). |
:::tip
See our RAG from Scratch video on [ColBERT](https://youtu.be/cN6S0Ehm7_8?feature=shared).
:::

View File

@@ -1,134 +1,32 @@
---
sidebar_position: 0
sidebar_class_name: hidden
---
# Conceptual guide
import ThemedImage from '@theme/ThemedImage';
import useBaseUrl from '@docusaurus/useBaseUrl';
In this section, you'll find explanations of the key concepts, providing a deeper understanding of core principles.
This section contains introductions to key parts of LangChain.
The conceptual guide will not cover step-by-step instructions or specific implementation details — those are found in the [How-To Guides](/docs/how_to/) and [Tutorials](/docs/tutorials) sections. For detailed reference material, please visit the [API Reference](https://python.langchain.com/api_reference/).
## Architecture
LangChain as a framework consists of a number of packages.
* Conceptual Guide: [LangChain Architecture](/docs/concepts/architecture)
### `langchain-core`
This package contains base abstractions of different components and ways to compose them together.
The interfaces for core components like LLMs, vector stores, retrievers and more are defined here.
No third party integrations are defined here.
The dependencies are kept purposefully very lightweight.
### `langchain`
The main `langchain` package contains chains, agents, and retrieval strategies that make up an application's cognitive architecture.
These are NOT third party integrations.
All chains, agents, and retrieval strategies here are NOT specific to any one integration, but rather generic across all integrations.
### `langchain-community`
This package contains third party integrations that are maintained by the LangChain community.
Key partner packages are separated out (see below).
This contains all integrations for various components (LLMs, vector stores, retrievers).
All dependencies in this package are optional to keep the package as lightweight as possible.
### Partner packages
While the long tail of integrations is in `langchain-community`, we split popular integrations into their own packages (e.g. `langchain-openai`, `langchain-anthropic`, etc).
This was done in order to improve support for these important integrations.
### [`langgraph`](https://langchain-ai.github.io/langgraph)
`langgraph` is an extension of `langchain` aimed at
building robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph.
LangGraph exposes high level interfaces for creating common types of agents, as well as a low-level API for composing custom flows.
### [`langserve`](/docs/langserve)
A package to deploy LangChain chains as REST APIs. Makes it easy to get a production ready API up and running.
### [LangSmith](https://docs.smith.langchain.com)
A developer platform that lets you debug, test, evaluate, and monitor LLM applications.
<ThemedImage
alt="Diagram outlining the hierarchical organization of the LangChain framework, displaying the interconnected parts across multiple layers."
sources={{
light: useBaseUrl('/svg/langchain_stack_062024.svg'),
dark: useBaseUrl('/svg/langchain_stack_062024_dark.svg'),
}}
title="LangChain Framework Overview"
style={{ width: "100%" }}
/>
## LangChain Expression Language (LCEL)
<span data-heading-keywords="lcel"></span>
`LangChain Expression Language`, or `LCEL`, is a declarative way to chain LangChain components.
LCEL was designed from day 1 to **support putting prototypes in production, with no code changes**, from the simplest “prompt + LLM” chain to the most complex chains (we've seen folks successfully run LCEL chains with 100s of steps in production). To highlight a few of the reasons you might want to use LCEL:
- **First-class streaming support:**
When you build your chains with LCEL you get the best possible time-to-first-token (time elapsed until the first chunk of output comes out). For some chains this means, e.g., that we stream tokens straight from an LLM to a streaming output parser, and you get back parsed, incremental chunks of output at the same rate as the LLM provider outputs the raw tokens.
- **Async support:**
Any chain built with LCEL can be called both with the synchronous API (e.g., in your Jupyter notebook while prototyping) as well as with the asynchronous API (e.g., in a [LangServe](/docs/langserve/) server). This enables using the same code for prototypes and in production, with great performance, and the ability to handle many concurrent requests in the same server.
- **Optimized parallel execution:**
Whenever your LCEL chains have steps that can be executed in parallel (e.g., if you fetch documents from multiple retrievers) we automatically do it, both in the sync and the async interfaces, for the smallest possible latency.
- **Retries and fallbacks:**
Configure retries and fallbacks for any part of your LCEL chain. This is a great way to make your chains more reliable at scale. We're currently working on adding streaming support for retries/fallbacks, so you can get the added reliability without any latency cost.
- **Access intermediate results:**
For more complex chains it's often very useful to access the results of intermediate steps even before the final output is produced. This can be used to let end-users know something is happening, or even just to debug your chain. You can stream intermediate results, and it's available on every [LangServe](/docs/langserve) server.
- **Input and output schemas**
Input and output schemas give every LCEL chain Pydantic and JSONSchema schemas inferred from the structure of your chain. This can be used for validation of inputs and outputs, and is an integral part of LangServe.
- [**Seamless LangSmith tracing**](https://docs.smith.langchain.com)
As your chains get more and more complex, it becomes increasingly important to understand what exactly is happening at every step.
With LCEL, **all** steps are automatically logged to [LangSmith](https://docs.smith.langchain.com/) for maximum observability and debuggability.
LCEL aims to provide consistency around behavior and customization over legacy subclassed chains such as `LLMChain` and
`ConversationalRetrievalChain`. Many of these legacy chains hide important details like prompts, and as a wider variety
of viable models emerge, customization has become more and more important.
If you are currently using one of these legacy chains, please see [this guide for guidance on how to migrate](/docs/versions/migrating_chains).
For guides on how to do specific tasks with LCEL, check out [the relevant how-to guides](/docs/how_to/#langchain-expression-language-lcel).
### Runnable interface
## Runnable interface
<span data-heading-keywords="invoke,runnable"></span>
To make it as easy as possible to create custom chains, we've implemented a ["Runnable"](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable) protocol. Many LangChain components implement the `Runnable` protocol, including chat models, LLMs, output parsers, retrievers, prompt templates, and more. There are also several useful primitives for working with runnables, which you can read about below.
* Conceptual Guide: [About the Runnable interface](/docs/concepts/runnables)
* How-to Guides: [How to use the Runnable interface](/docs/how_to/#langchain-expression-language-lcel)
This is a standard interface, which makes it easy to define custom chains as well as invoke them in a standard way.
The standard interface includes:
The Runnable interface is a standard interface for defining and invoking LangChain components.
- `stream`: stream back chunks of the response
- `invoke`: call the chain on an input
- `batch`: call the chain on a list of inputs
## LangChain Expression Language (LCEL)
These also have corresponding async methods that should be used with [asyncio](https://docs.python.org/3/library/asyncio.html) `await` syntax for concurrency:
<span data-heading-keywords="lcel"></span>
- `astream`: stream back chunks of the response async
- `ainvoke`: call the chain on an input async
- `abatch`: call the chain on a list of inputs async
- `astream_log`: stream back intermediate steps as they happen, in addition to the final response
- `astream_events`: **beta** stream events as they happen in the chain (introduced in `langchain-core` 0.1.14)
The **input type** and **output type** varies by component:
| Component | Input Type | Output Type |
|--------------|-------------------------------------------------------|-----------------------|
| Prompt | Dictionary | PromptValue |
| ChatModel | Single string, list of chat messages or a PromptValue | ChatMessage |
| LLM | Single string, list of chat messages or a PromptValue | String |
| OutputParser | The output of an LLM or ChatModel | Depends on the parser |
| Retriever | Single string | List of Documents |
| Tool | Single string or dictionary, depending on the tool | Depends on the tool |
All runnables expose input and output **schemas** to inspect the inputs and outputs:
- `input_schema`: an input Pydantic model auto-generated from the structure of the Runnable
- `output_schema`: an output Pydantic model auto-generated from the structure of the Runnable
* Conceptual Guide: [About the Runnable interface](/docs/concepts/lcel)
* How-to Guides: [How to use the Runnable interface](/docs/how_to/#langchain-expression-language-lcel)
## Components
@@ -136,51 +34,16 @@ LangChain provides standard, extendable interfaces and external integrations for
Some components LangChain implements, some components we rely on third-party integrations for, and others are a mix.
### Chat models
<span data-heading-keywords="chat model,chat models"></span>
Language models that use a sequence of messages as inputs and return chat messages as outputs (as opposed to using plain text).
These are traditionally newer models (older models are generally `LLMs`, see below).
Chat models support the assignment of distinct roles to conversation messages, helping to distinguish messages from the AI, users, and instructions such as system messages.
Although the underlying models are messages in, message out, the LangChain wrappers also allow these models to take a string as input. This means you can easily use chat models in place of LLMs.
When a string is passed in as input, it is converted to a `HumanMessage` and then passed to the underlying model.
LangChain does not host any Chat Models, rather we rely on third party integrations.
We have some standardized parameters when constructing ChatModels:
- `model`: the name of the model
- `temperature`: the sampling temperature
- `timeout`: request timeout
- `max_tokens`: max tokens to generate
- `stop`: default stop sequences
- `max_retries`: max number of times to retry requests
- `api_key`: API key for the model provider
- `base_url`: endpoint to send requests to
Some important things to note:
- standard params only apply to model providers that expose parameters with the intended functionality. For example, some providers do not expose a configuration for maximum output tokens, so max_tokens can't be supported on these.
- standard params are currently only enforced on integrations that have their own integration packages (e.g. `langchain-openai`, `langchain-anthropic`, etc.), they're not enforced on models in ``langchain-community``.
ChatModels also accept other parameters that are specific to that integration. To find all the parameters supported by a ChatModel head to the API reference for that model.
:::important
Some chat models have been fine-tuned for **tool calling** and provide a dedicated API for it.
Generally, such models are better at tool calling than non-fine-tuned models, and are recommended for use cases that require tool calling.
Please see the [tool calling section](/docs/concepts/#functiontool-calling) for more information.
:::
For specifics on how to use chat models, see the [relevant how-to guides here](/docs/how_to/#chat-models).
* Conceptual Guide: [About Chat Models](/docs/concepts/chat_models)
* Integrations: [LangChain Chat Model Integrations](/docs/integrations/chat/)
* How-to Guides: [How to use Chat Models](/docs/how_to/#chat-models)
#### Multimodality
Some chat models are multimodal, accepting images, audio and even video as inputs. These are still less common, meaning model providers haven't standardized on the "best" way to define the API. Multimodal **outputs** are even less common. As such, we've kept our multimodal abstractions fairly lightweight and plan to further solidify the multimodal APIs and interaction patterns as the field matures.
In LangChain, most chat models that support multimodal inputs also accept those values in OpenAI's content blocks format. So far this is restricted to image inputs. For models like Gemini which support video and other bytes input, the APIs also support the native, model-specific representations.
For specifics on how to use multimodal models, see the [relevant how-to guides here](/docs/how_to/#multimodal).
For a full list of LangChain model providers with multimodal models, [check out this table](/docs/integrations/chat/#advanced-features).
* Conceptual Guide: [About Multimodal Chat Models](/docs/concepts/multimodality)
### LLMs
<span data-heading-keywords="llm,llms"></span>
@@ -192,159 +55,33 @@ even for non-chat use cases.
You are probably looking for [the section above instead](/docs/concepts/#chat-models).
:::
Language models that take a string as input and return a string.
These are traditionally older models (newer models generally are [Chat Models](/docs/concepts/#chat-models), see above).
Although the underlying models are string in, string out, the LangChain wrappers also allow these models to take messages as input.
This gives them the same interface as [Chat Models](/docs/concepts/#chat-models).
When messages are passed in as input, they will be formatted into a string under the hood before being passed to the underlying model.
LangChain does not host any LLMs, rather we rely on third party integrations.
For specifics on how to use LLMs, see the [how-to guides](/docs/how_to/#llms).
* Conceptual Guide: [About Language Models](/docs/concepts/llms)
* Integrations: [LangChain LLM Integrations](/docs/integrations/llms/)
* How-to Guides: [How to use LLMs](/docs/how_to/#llms)
<a id="aimessage"></a>
<a id="systemmessage"></a>
<a id="humanmessage"></a>
<a id="toolmessage"></a>
<a id="legacy-functionmessage"></a>
### Messages
Some language models take a list of messages as input and return a message.
There are a few different types of messages.
All messages have a `role`, `content`, and `response_metadata` property.
The `role` describes WHO is saying the message. The standard roles are "user", "assistant", "system", and "tool".
LangChain has different message classes for different roles.
The `content` property describes the content of the message.
This can be a few different things:
- A string (most models deal with this type of content)
- A List of dictionaries (this is used for multimodal input, where the dictionary contains information about that input type and that input location)
Optionally, messages can have a `name` property which allows for differentiating between multiple speakers with the same role.
For example, if there are two users in the chat history it can be useful to differentiate between them. Not all models support this.
#### HumanMessage
This represents a message with role "user".
#### AIMessage
This represents a message with role "assistant". In addition to the `content` property, these messages also have:
**`response_metadata`**
The `response_metadata` property contains additional metadata about the response. The data here is often specific to each model provider.
This is where information like log-probs and token usage may be stored.
**`tool_calls`**
These represent a decision from a language model to call a tool. They are included as part of an `AIMessage` output.
They can be accessed from there with the `.tool_calls` property.
This property returns a list of `ToolCall`s. A `ToolCall` is a dictionary with the following arguments:
- `name`: The name of the tool that should be called.
- `args`: The arguments to that tool.
- `id`: The id of that tool call.
#### SystemMessage
This represents a message with role "system", which tells the model how to behave. Not every model provider supports this.
#### ToolMessage
This represents a message with role "tool", which contains the result of calling a tool. In addition to `role` and `content`, this message has:
- a `tool_call_id` field which conveys the id of the call to the tool that was called to produce this result.
- an `artifact` field which can be used to pass along arbitrary artifacts of the tool execution which are useful to track but which should not be sent to the model.
With most chat models, a `ToolMessage` can only appear in the chat history after an `AIMessage` that has a populated `tool_calls` field.
#### (Legacy) FunctionMessage
This is a legacy message type, corresponding to OpenAI's legacy function-calling API. `ToolMessage` should be used instead to correspond to the updated tool-calling API.
This represents the result of a function call. In addition to `role` and `content`, this message has a `name` parameter which conveys the name of the function that was called to produce this result.
* Conceptual Guide: [About Messages](/docs/concepts/messages)
* How-to Guides: [How to use Messages](/docs/how_to/#messages)
### Prompt templates
<span data-heading-keywords="prompt,prompttemplate,chatprompttemplate"></span>
Prompt templates help to translate user input and parameters into instructions for a language model.
This can be used to guide a model's response, helping it understand the context and generate relevant and coherent language-based output.
Prompt Templates take as input a dictionary, where each key represents a variable in the prompt template to fill in.
Prompt Templates output a PromptValue. This PromptValue can be passed to an LLM or a ChatModel, and can also be cast to a string or a list of messages.
The reason this PromptValue exists is to make it easy to switch between strings and messages.
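For example, the same `PromptValue` can be rendered either way (a minimal sketch):

```python
from langchain_core.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_messages([("user", "Tell me a joke about {topic}")])
prompt_value = prompt_template.invoke({"topic": "cats"})

prompt_value.to_string()    # the prompt rendered as a single string
prompt_value.to_messages()  # the prompt rendered as a list of messages
```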
There are a few different types of prompt templates:
Conceptual Guide: [About Prompt Templates](/docs/concepts/prompts)
How-to Guides: [How to use Prompt Templates](/docs/how_to/#prompt-templates)
#### String PromptTemplates
These prompt templates are used to format a single string, and generally are used for simpler inputs.
For example, a common way to construct and use a PromptTemplate is as follows:
```python
from langchain_core.prompts import PromptTemplate
prompt_template = PromptTemplate.from_template("Tell me a joke about {topic}")
prompt_template.invoke({"topic": "cats"})
```
#### ChatPromptTemplates
These prompt templates are used to format a list of messages. These "templates" consist of a list of templates themselves.
For example, a common way to construct and use a ChatPromptTemplate is as follows:
```python
from langchain_core.prompts import ChatPromptTemplate
prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    ("user", "Tell me a joke about {topic}")
])
prompt_template.invoke({"topic": "cats"})
```
In the above example, this ChatPromptTemplate will construct two messages when called.
The first is a system message, that has no variables to format.
The second is a HumanMessage, and will be formatted by the `topic` variable the user passes in.
#### MessagesPlaceholder
<span data-heading-keywords="messagesplaceholder"></span>
This prompt template is responsible for adding a list of messages in a particular place.
In the above ChatPromptTemplate, we saw how we could format two messages, each one a string.
But what if we wanted the user to pass in a list of messages that we would slot into a particular spot?
This is how you use MessagesPlaceholder.
```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage
prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    MessagesPlaceholder("msgs")
])
prompt_template.invoke({"msgs": [HumanMessage(content="hi!")]})
```
This will produce a list of two messages, the first one being a system message, and the second one being the HumanMessage we passed in.
If we had passed in 5 messages, then it would have produced 6 messages in total (the system message plus the 5 passed in).
This is useful for letting a list of messages be slotted into a particular spot.
An alternative way to accomplish the same thing without using the `MessagesPlaceholder` class explicitly is:
```python
prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    ("placeholder", "{msgs}")  # <-- This is the changed part
])
```
For specifics on how to use prompt templates, see the [relevant how-to guides here](/docs/how_to/#prompt-templates).
### Example selectors
One common prompting technique for achieving better performance is to include examples as part of the prompt.
For specifics on how to use example selectors, see the [relevant how-to guides here](/docs/how_to/#example-selectors).
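At the interface level, an example selector only needs to implement two methods: `add_example` and `select_examples`. The sketch below shows a hypothetical `RecentExampleSelector` (not a built-in class) to illustrate the interface:

```python
from langchain_core.example_selectors import BaseExampleSelector


class RecentExampleSelector(BaseExampleSelector):
    """Hypothetical selector that returns the k most recently added examples."""

    def __init__(self, examples: list[dict], k: int = 2):
        self.examples = examples
        self.k = k

    def add_example(self, example: dict) -> None:
        self.examples.append(example)

    def select_examples(self, input_variables: dict) -> list[dict]:
        # A real selector would use `input_variables`, e.g. to pick semantically similar examples
        return self.examples[-self.k:]
```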
### Output parsers
<span data-heading-keywords="output parser"></span>

:::note
The information here refers to parsers that take the text output from a model and try to parse it into a more structured representation.
Output parsers predate chat models that are capable of calling tools. These days, it is recommended to use function/tool calling instead,
as it is simpler and generally produces better-quality results.
See documentation for that [here](/docs/concepts/#function-tool-calling).
:::
An output parser is responsible for taking the output of a model and transforming it into a more suitable format for downstream tasks.
Output parsers are useful when you are using LLMs to generate structured data, or to normalize output from chat models and LLMs.

* Conceptual Guide: [About Output Parsers](/docs/concepts/output_parsers)
* How-to Guides: [How to use Output Parsers](/docs/how_to/#output-parsers)
LangChain has lots of different types of output parsers. This is a list of output parsers LangChain supports. The table below has various pieces of information:
- **Name**: The name of the output parser
- **Supports Streaming**: Whether the output parser supports streaming.
- **Has Format Instructions**: Whether the output parser has format instructions. This is generally available except when (a) the desired schema is not specified in the prompt but rather in other parameters (like OpenAI function calling), or (b) when the OutputParser wraps another OutputParser.
- **Calls LLM**: Whether this output parser itself calls an LLM. This is usually only done by output parsers that attempt to correct misformatted output.
- **Input Type**: Expected input type. Most output parsers work on both strings and messages, but some (like OpenAI Functions) need a message with specific kwargs.
- **Output Type**: The output type of the object returned by the parser.
- **Description**: Our commentary on this output parser and when to use it.
| Name | Supports Streaming | Has Format Instructions | Calls LLM | Input Type | Output Type | Description |
|-----------------|--------------------|-------------------------------|-----------|----------------------------------|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [JSON](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.json.JsonOutputParser.html#langchain_core.output_parsers.json.JsonOutputParser) | ✅ | ✅ | | `str` \| `Message` | JSON object | Returns a JSON object as specified. You can specify a Pydantic model and it will return JSON for that model. Probably the most reliable output parser for getting structured data that does NOT use function calling. |
| [XML](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.xml.XMLOutputParser.html#langchain_core.output_parsers.xml.XMLOutputParser) | ✅ | ✅ | | `str` \| `Message` | `dict` | Returns a dictionary of tags. Use when XML output is needed. Use with models that are good at writing XML (like Anthropic's). |
| [CSV](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.list.CommaSeparatedListOutputParser.html#langchain_core.output_parsers.list.CommaSeparatedListOutputParser) | ✅ | ✅ | | `str` \| `Message` | `List[str]` | Returns a list of comma separated values. |
| [OutputFixing](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.fix.OutputFixingParser.html#langchain.output_parsers.fix.OutputFixingParser) | | | ✅ | `str` \| `Message` | | Wraps another output parser. If that output parser errors, then this will pass the error message and the bad output to an LLM and ask it to fix the output. |
| [RetryWithError](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.retry.RetryWithErrorOutputParser.html#langchain.output_parsers.retry.RetryWithErrorOutputParser) | | | ✅ | `str` \| `Message` | | Wraps another output parser. If that output parser errors, then this will pass the original inputs, the bad output, and the error message to an LLM and ask it to fix it. Compared to OutputFixingParser, this one also sends the original instructions. |
| [Pydantic](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.pydantic.PydanticOutputParser.html#langchain_core.output_parsers.pydantic.PydanticOutputParser) | | ✅ | | `str` \| `Message` | `pydantic.BaseModel` | Takes a user defined Pydantic model and returns data in that format. |
| [YAML](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.yaml.YamlOutputParser.html#langchain.output_parsers.yaml.YamlOutputParser) | | ✅ | | `str` \| `Message` | `pydantic.BaseModel` | Takes a user defined Pydantic model and returns data in that format. Uses YAML to encode it. |
| [PandasDataFrame](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.pandas_dataframe.PandasDataFrameOutputParser.html#langchain.output_parsers.pandas_dataframe.PandasDataFrameOutputParser) | | ✅ | | `str` \| `Message` | `dict` | Useful for doing operations with pandas DataFrames. |
| [Enum](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.enum.EnumOutputParser.html#langchain.output_parsers.enum.EnumOutputParser) | | ✅ | | `str` \| `Message` | `Enum` | Parses response into one of the provided enum values. |
| [Datetime](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.datetime.DatetimeOutputParser.html#langchain.output_parsers.datetime.DatetimeOutputParser) | | ✅ | | `str` \| `Message` | `datetime.datetime` | Parses response into a datetime string. |
| [Structured](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.structured.StructuredOutputParser.html#langchain.output_parsers.structured.StructuredOutputParser) | | ✅ | | `str` \| `Message` | `Dict[str, str]` | An output parser that returns structured information. It is less powerful than other output parsers since it only allows for fields to be strings. This can be useful when you are working with smaller LLMs. |
For specifics on how to use output parsers, see the [relevant how-to guides here](/docs/how_to/#output-parsers).
### Chat history
Most LLM applications have a conversational interface.
### Document loaders

These classes load Document objects. LangChain has hundreds of integrations with various data sources to load data from.
Each DocumentLoader has its own specific parameters, but they can all be invoked in the same way with the `.load` method.
An example use case is as follows:
```python
from langchain_community.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(
    ...  # <-- Integration-specific parameters here
)
data = loader.load()
```
For specifics on how to use document loaders, see the [relevant how-to guides here](/docs/how_to/#document-loaders).
### Text splitters
Once you've loaded documents, you'll often want to transform them to better suit your application. The simplest example is you may want to split a long document into smaller chunks that can fit into your model's context window. LangChain has a number of built-in document transformers that make it easy to split, combine, filter, and otherwise manipulate documents.
When you want to deal with long pieces of text, it is necessary to split up that text into chunks. As simple as this sounds, there is a lot of potential complexity here. Ideally, you want to keep the semantically related pieces of text together. What "semantically related" means can depend on the type of text, and there are several ways to handle this.
At a high level, text splitters work as follows:
1. Split the text up into small, semantically meaningful chunks (often sentences).
2. Start combining these small chunks into a larger chunk until you reach a certain size (as measured by some function).
3. Once you reach that size, make that chunk its own piece of text and then start creating a new chunk of text with some overlap (to keep context between chunks).
That means there are two different axes along which you can customize your text splitter:
1. How the text is split
2. How the chunk size is measured
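For example, a common default is the recursive character splitter, which tries to keep paragraphs and sentences together while respecting a maximum chunk size. A minimal sketch (the `long_text` variable is an illustrative placeholder):

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # maximum chunk size, measured in characters by default
    chunk_overlap=200,  # overlap between consecutive chunks to preserve context
)

long_text = "..."  # some long document text
chunks = text_splitter.split_text(long_text)    # -> list[str]
# or: text_splitter.split_documents(documents)  # -> list[Document]
```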
For specifics on how to use text splitters, see the [relevant how-to guides here](/docs/how_to/#text-splitters).
* Conceptual Guide: [About Text Splitters](/docs/concepts/text_splitters)
### Embedding models
<span data-heading-keywords="embedding,embeddings"></span>
Embedding models create a vector representation of a piece of text. You can think of a vector as an array of numbers that captures the semantic meaning of the text.
By representing the text in this way, you can perform mathematical operations that allow you to do things like search for other pieces of text that are most similar in meaning.
These natural language search capabilities underpin many types of [context retrieval](/docs/concepts/#retrieval),
where we provide an LLM with the relevant data it needs to effectively respond to a query.
![](/img/embeddings.png)
The `Embeddings` class is a class designed for interfacing with text embedding models. There are many different embedding model providers (OpenAI, Cohere, Hugging Face, etc) and local models, and this class is designed to provide a standard interface for all of them.
The base Embeddings class in LangChain provides two methods: one for embedding documents and one for embedding a query. The former takes as input multiple texts, while the latter takes a single text. The reason for having these as two separate methods is that some embedding providers have different embedding methods for documents (to be searched over) vs queries (the search query itself).
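For example, the two methods look like this in practice. This sketch uses the OpenAI integration, but any embedding integration exposes the same interface:

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

# Embed multiple documents (the texts to be searched over)
document_vectors = embeddings.embed_documents(
    ["LangChain is a framework for building LLM applications.", "Embeddings map text to vectors."]
)

# Embed a single query (the search query itself)
query_vector = embeddings.embed_query("What is LangChain?")
```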
For specifics on how to use embedding models, see the [relevant how-to guides here](/docs/how_to/#embedding-models).

* Conceptual Guide: [About Embedding Models](/docs/concepts/embedding_models)
* How-to Guides: [How to use Embedding Models](/docs/how_to/#embedding-models)
### Vector stores
<span data-heading-keywords="vector,vectorstore,vectorstores,vector store,vector stores"></span>
One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors,
and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query.
A vector store takes care of storing embedded data and performing vector search for you.
Most vector stores can also store metadata about embedded vectors and support filtering on that metadata before
similarity search, allowing you more control over returned documents.
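A typical flow looks like the following sketch. It uses the in-memory vector store shipped with `langchain-core` and an OpenAI embedding model, but any vector store and embedding integration follow the same pattern:

```python
from langchain_core.documents import Document
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings

vector_store = InMemoryVectorStore(embedding=OpenAIEmbeddings())

# Embed and store documents, including metadata that can later be used for filtering
vector_store.add_documents([
    Document(page_content="LangChain helps build LLM applications.", metadata={"source": "docs"}),
    Document(page_content="Vector stores support similarity search.", metadata={"source": "docs"}),
])

# Embed the query and return the most similar documents
results = vector_store.similarity_search("How do I search over documents?", k=1)
```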
Vector stores can be converted to the retriever interface by doing:
```python
vectorstore = MyVectorStore()
retriever = vectorstore.as_retriever()
```
For specifics on how to use vector stores, see the [relevant how-to guides here](/docs/how_to/#vector-stores).

* Conceptual Guide: [About Vector Stores](/docs/concepts/vectorstores)
* How-to Guides: [How to use Vector Stores](/docs/how_to/#vector-stores)
### Retrievers
<span data-heading-keywords="retriever,retrievers"></span>

A retriever is an interface that returns documents given an unstructured query.
It is more general than a vector store.
A retriever does not need to be able to store documents, only to return (or retrieve) them.
Retrievers accept a string query as input and return a list of `Document` objects as output.
Retrievers can be created from vector stores, but are also broad enough to include [Wikipedia search](/docs/integrations/retrievers/wikipedia/) and [Amazon Kendra](/docs/integrations/retrievers/amazon_kendra_retriever/).

For specifics on how to use retrievers, see the [relevant how-to guides here](/docs/how_to/#retrievers).

* Conceptual Guide: [About Retrievers](/docs/concepts/retrievers)
* How-to Guides: [How to use Retrievers](/docs/how_to/#retrievers)
### Key-value stores
For key-value store implementations, see [this section](/docs/integrations/stores/).
### Tools
<span data-heading-keywords="tool,tools"></span>
Tools are utilities designed to be called by a model: their inputs are designed to be generated by models, and their outputs are designed to be passed back to models.
Tools are needed whenever you want a model to control parts of your code or call out to external APIs.
A tool consists of:
1. The `name` of the tool.
2. A `description` of what the tool does.
3. A `JSON schema` defining the inputs to the tool.
4. A `function` (and, optionally, an async variant of the function).
When a tool is bound to a model, the name, description and JSON schema are provided as context to the model.
Given a list of tools and a set of instructions, a model can request to call one or more tools with specific inputs.
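For example, a simple tool can be defined with the `@tool` decorator, which derives the name, description, and JSON schema from the function signature and docstring. A minimal sketch (the `multiply` tool is just an illustration):

```python
from langchain_core.tools import tool


@tool
def multiply(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b


tools = [multiply]
```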
Typical usage may look like the following:
```python
tools = [...] # Define a list of tools
llm_with_tools = llm.bind_tools(tools)
ai_msg = llm_with_tools.invoke("do xyz...")
# -> AIMessage(tool_calls=[ToolCall(...), ...], ...)
```
The `AIMessage` returned from the model MAY have `tool_calls` associated with it.
Read [this guide](/docs/concepts/#aimessage) for more information on what the response type may look like.
Once the chosen tools are invoked, the results can be passed back to the model so that it can complete whatever task
it's performing.
There are generally two different ways to invoke the tool and pass back the response:
#### Invoke with just the arguments
When you invoke a tool with just the arguments, you will get back the raw tool output (usually a string).
This generally looks like:
```python
# You will want to previously check that the LLM returned tool calls
tool_call = ai_msg.tool_calls[0]
# ToolCall(args={...}, id=..., ...)
tool_output = tool.invoke(tool_call["args"])
tool_message = ToolMessage(
    content=tool_output,
    tool_call_id=tool_call["id"],
    name=tool_call["name"]
)
```
Note that the `content` field will generally be passed back to the model.
If you do not want the raw tool response to be passed to the model, but you still want to keep it around,
you can transform the tool output but also pass it as an artifact (read more about [`ToolMessage.artifact` here](/docs/concepts/#toolmessage))
```python
...  # Same code as above
response_for_llm = transform(tool_output)
tool_message = ToolMessage(
    content=response_for_llm,
    tool_call_id=tool_call["id"],
    name=tool_call["name"],
    artifact=tool_output
)
```
#### Invoke with `ToolCall`
The other way to invoke a tool is to call it with the full `ToolCall` that was generated by the model.
When you do this, the tool will return a ToolMessage.
The benefits of this are that you don't have to write the logic yourself to transform the tool output into a ToolMessage.
This generally looks like:
```python
tool_call = ai_msg.tool_calls[0]
# -> ToolCall(args={...}, id=..., ...)
tool_message = tool.invoke(tool_call)
# -> ToolMessage(
# content="tool result foobar...",
# tool_call_id=...,
# name="tool_name"
# )
```
If you are invoking the tool this way and want to include an [artifact](/docs/concepts/#toolmessage) for the ToolMessage, you will need to have the tool return two things.
Read more about [defining tools that return artifacts here](/docs/how_to/tool_artifacts/).
#### Best practices
When designing tools to be used by a model, it is important to keep in mind that:
- Chat models that have explicit [tool-calling APIs](/docs/concepts/#functiontool-calling) will be better at tool calling than non-fine-tuned models.
- Models will perform better if the tools have well-chosen names, descriptions, and JSON schemas. This is another form of prompt engineering.
- Simple, narrowly scoped tools are easier for models to use than complex tools.
#### Related
For specifics on how to use tools, see the [tools how-to guides](/docs/how_to/#tools).
To use a pre-built tool, see the [tool integration docs](/docs/integrations/tools/).
### Toolkits
<span data-heading-keywords="toolkit,toolkits"></span>
Toolkits are collections of tools that are designed to be used together for specific tasks. They expose a `get_tools` method that returns a list of tools, as shown in the sketch below.
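A minimal sketch of the pattern (`ExampleToolkit` is a hypothetical placeholder for a concrete toolkit integration, and `llm` is assumed to be a tool-calling chat model):

```python
# Initialize a toolkit (constructor parameters vary by integration)
toolkit = ExampleToolkit(...)  # hypothetical placeholder for a concrete toolkit

# All toolkits expose a `get_tools` method that returns a list of tools
tools = toolkit.get_tools()

# The tools can then be bound to a model like any other tools
llm_with_tools = llm.bind_tools(tools)
```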
### Agents
By themselves, language models can't take actions - they just output text.
A big use case for LangChain is creating **agents**.
Agents are systems that use an LLM as a reasoning engine to determine which actions to take and what the inputs to those actions should be.
The results of those actions can then be fed back into the agent, which determines whether more actions are needed or whether it is okay to finish.
[LangGraph](https://github.com/langchain-ai/langgraph) is an extension of LangChain specifically aimed at creating highly controllable and customizable agents.
Please check out that documentation for a more in depth overview of agent concepts.
There is a legacy `agent` concept in LangChain that we are moving towards deprecating: `AgentExecutor`.
AgentExecutor was essentially a runtime for agents.
It was a great place to get started; however, it was not flexible enough as you started to build more customized agents.
In order to solve that we built LangGraph to be this flexible, highly-controllable runtime.
If you are still using AgentExecutor, do not fear: we still have a guide on [how to use AgentExecutor](/docs/how_to/agent_executor).
It is recommended, however, that you start to transition to LangGraph.
In order to assist in this, we have put together a [transition guide on how to do so](/docs/how_to/migrate_agent).
#### ReAct agents
<span data-heading-keywords="react,react agent"></span>
One popular architecture for building agents is [**ReAct**](https://arxiv.org/abs/2210.03629).
ReAct combines reasoning and acting in an iterative process - in fact the name "ReAct" stands for "Reason" and "Act".
The general flow looks like this:
- The model will "think" about what step to take in response to an input and any previous observations.
- The model will then choose an action from available tools (or choose to respond to the user).
- The model will generate arguments to that tool.
- The agent runtime (executor) will parse out the chosen tool and call it with the generated arguments.
- The executor will return the results of the tool call back to the model as an observation.
- This process repeats until the agent chooses to respond.
There are general prompting-based implementations that do not require any model-specific features, but the most
reliable implementations use features like [tool calling](/docs/how_to/tool_calling/) to reliably format outputs
and reduce variance.
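For example, LangGraph ships a prebuilt helper for this architecture. The sketch below assumes `langgraph` is installed, `model` is a tool-calling chat model, and `tools` is a list of tools as in the Tools section above:

```python
from langgraph.prebuilt import create_react_agent

agent = create_react_agent(model, tools)

result = agent.invoke({"messages": [("user", "What is 3 * 12?")]})
print(result["messages"][-1].content)
```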
Please see the [LangGraph documentation](https://langchain-ai.github.io/langgraph/) for more information,
or [this how-to guide](/docs/how_to/migrate_agent/) for specific information on migrating to LangGraph.
### Callbacks
For specifics on how to use callbacks, see the [relevant how-to guides here](/docs/how_to/#callbacks).
### Streaming
<span data-heading-keywords="stream,streaming"></span>
Individual LLM calls often run for much longer than traditional resource requests.
This compounds when you build more complex chains or agents that require multiple reasoning steps.
Fortunately, LLMs generate output iteratively, which means it's possible to show sensible intermediate results
before the final response is ready. Consuming output as soon as it becomes available has therefore become a vital part of the UX
around building apps with LLMs to help alleviate latency issues, and LangChain aims to have first-class support for streaming.
Below, we'll discuss some concepts and considerations around streaming in LangChain.
Conceptual Guide: [Streaming](/docs/concepts/streaming)
#### `.stream()` and `.astream()`
Most modules in LangChain include the `.stream()` method (and the equivalent `.astream()` method for [async](https://docs.python.org/3/library/asyncio.html) environments) as an ergonomic streaming interface.
`.stream()` returns an iterator, which you can consume with a simple `for` loop. Here's an example with a chat model:
```python
from langchain_anthropic import ChatAnthropic
model = ChatAnthropic(model="claude-3-sonnet-20240229")
for chunk in model.stream("what color is the sky?"):
    print(chunk.content, end="|", flush=True)
```
For models (or other components) that don't support streaming natively, this iterator would just yield a single chunk, but
you could still use the same general pattern when calling them. Using `.stream()` will also automatically call the model in streaming mode
without the need to provide additional config.
The type of each outputted chunk depends on the type of component - for example, chat models yield [`AIMessageChunks`](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.ai.AIMessageChunk.html).
Because this method is part of [LangChain Expression Language](/docs/concepts/#langchain-expression-language-lcel),
you can handle formatting differences from different outputs using an [output parser](/docs/concepts/#output-parsers) to transform
each yielded chunk.
You can check out [this guide](/docs/how_to/streaming/#using-stream) for more detail on how to use `.stream()`.
TODO(concepts): Add URL fragment
#### `.astream_events()`
<span data-heading-keywords="astream_events,stream_events,stream events"></span>
While the `.stream()` method is intuitive, it can only return the final generated value of your chain. This is fine for single LLM calls,
but as you build more complex chains of several LLM calls together, you may want to use the intermediate values of
the chain alongside the final output - for example, returning sources alongside the final generation when building a chat
over documents app.
There are ways to do this [using callbacks](/docs/concepts/#callbacks-1), or by constructing your chain in such a way that it passes intermediate
values to the end with something like chained [`.assign()`](/docs/how_to/passthrough/) calls, but LangChain also includes an
`.astream_events()` method that combines the flexibility of callbacks with the ergonomics of `.stream()`. When called, it returns an iterator
which yields [various types of events](/docs/how_to/streaming/#event-reference) that you can filter and process according
to the needs of your project.
Here's one small example that prints just events containing streamed chat model output:
```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_anthropic import ChatAnthropic
model = ChatAnthropic(model="claude-3-sonnet-20240229")
prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
parser = StrOutputParser()
chain = prompt | model | parser
async for event in chain.astream_events({"topic": "parrot"}, version="v2"):
    kind = event["event"]
    if kind == "on_chat_model_stream":
        print(event, end="|", flush=True)
```
You can roughly think of it as an iterator over callback events (though the format differs) - and you can use it on almost all LangChain components!
See [this guide](/docs/how_to/streaming/#using-stream-events) for more detailed information on how to use `.astream_events()`,
including a table listing available events.
TODO(concepts): Add URL fragment
#### Callbacks
The lowest level way to stream outputs from LLMs in LangChain is via the [callbacks](/docs/concepts/#callbacks) system. You can pass a
callback handler that handles the [`on_llm_new_token`](https://python.langchain.com/api_reference/langchain/callbacks/langchain.callbacks.streaming_aiter.AsyncIteratorCallbackHandler.html#langchain.callbacks.streaming_aiter.AsyncIteratorCallbackHandler.on_llm_new_token) event into LangChain components. When that component is invoked, any
[LLM](/docs/concepts/#llms) or [chat model](/docs/concepts/#chat-models) contained in the component calls
the callback with the generated token. Within the callback, you could pipe the tokens into some other destination, e.g. an HTTP response.
You can also handle the [`on_llm_end`](https://python.langchain.com/api_reference/langchain/callbacks/langchain.callbacks.streaming_aiter.AsyncIteratorCallbackHandler.html#langchain.callbacks.streaming_aiter.AsyncIteratorCallbackHandler.on_llm_end) event to perform any necessary cleanup.
You can see [this how-to section](/docs/how_to/#callbacks) for more specifics on using callbacks.
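A minimal streaming handler might look like the following sketch. It assumes `model` is a chat model instance; where the tokens are piped is up to you, and depending on the provider you may need to enable streaming for per-token callbacks to fire:

```python
from langchain_core.callbacks import BaseCallbackHandler


class PrintTokenHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Called for each newly generated token; pipe it wherever you need
        print(token, end="|", flush=True)

    def on_llm_end(self, response, **kwargs) -> None:
        # Called once generation finishes; perform any cleanup here
        print()


model.invoke("what color is the sky?", config={"callbacks": [PrintTokenHandler()]})
```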
Callbacks were the first technique for streaming introduced in LangChain. While powerful and generalizable,
they can be unwieldy for developers. For example:
- You need to explicitly initialize and manage some aggregator or other stream to collect results.
- The execution order isn't explicitly guaranteed, and you could theoretically have a callback run after the `.invoke()` method finishes.
- Providers would often make you pass an additional parameter to stream outputs instead of returning them all at once.
- You would often ignore the result of the actual model call in favor of callback results.
* Conceptual Guide: [Callbacks](/docs/concepts/callbacks)
* How-to Guides: [How to use Callbacks](/docs/how_to/#callbacks)
#### Tokens
Most model providers measure input and output in units called **tokens**.
Tokens are the basic units that language models read and generate when processing or producing text.
The exact definition of a token can vary depending on the specific way the model was trained -
for instance, in English, a token could be a single word like "apple", or a part of a word like "app".
When you send a model a prompt, the words and characters in the prompt are encoded into tokens using a **tokenizer**.
The model then streams back generated output tokens, which the tokenizer decodes into human-readable text.
The below example shows how OpenAI models tokenize `LangChain is cool!`:
![](/img/tokenization.png)
You can see that it gets split into 5 different tokens, and that the boundaries between tokens are not exactly the same as word boundaries.
The reason language models use tokens rather than something more immediately intuitive like "characters"
has to do with how they process and understand text. At a high-level, language models iteratively predict their next generated output based on
the initial input and their previous generations. Training the model on tokens allows language models to handle linguistic
units (like words or subwords) that carry meaning, rather than individual characters, which makes it easier for the model
to learn and understand the structure of the language, including grammar and context.
Furthermore, using tokens can also improve efficiency, since the model processes fewer units of text compared to character-level processing.
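In practice, you can inspect token counts in a couple of ways. A minimal sketch, assuming `model` is a chat model instance (exact numbers depend on the provider's tokenizer):

```python
# Count tokens in a prompt before sending it
num_tokens = model.get_num_tokens("LangChain is cool!")

# Inspect token usage reported by the provider after a call
ai_message = model.invoke("LangChain is cool!")
print(ai_message.usage_metadata)  # e.g. {'input_tokens': ..., 'output_tokens': ..., 'total_tokens': ...}
```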
* Conceptual Guide: [Tokens](/docs/concepts/tokens)
### Function/tool calling

# LangGraph
## Overview
[LangGraph](https://langchain-ai.github.io/langgraph/) is a library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. Compared to other LLM frameworks, it offers these core benefits: cycles, controllability, and persistence. LangGraph allows you to define flows that involve cycles, essential for most agentic architectures, differentiating it from DAG-based solutions. As a very low-level framework, it provides fine-grained control over both the flow and state of your application, crucial for creating reliable agents. Additionally, LangGraph includes built-in persistence, enabling advanced human-in-the-loop and memory features.
LangGraph is inspired by [Pregel](https://research.google/pubs/pub37252/) and [Apache Beam](https://beam.apache.org/). The public interface draws inspiration from [NetworkX](https://networkx.org/documentation/latest/). LangGraph is built by LangChain Inc, the creators of LangChain, but can be used without LangChain.
To learn more about LangGraph, check out our first LangChain Academy course, *Introduction to LangGraph*, available for free [here](https://academy.langchain.com/courses/intro-to-langgraph).
### Key Features
- **Cycles and Branching**: Implement loops and conditionals in your apps.
- **Persistence**: Automatically save state after each step in the graph. Pause and resume the graph execution at any point to support error recovery, human-in-the-loop workflows, time travel and more.
- **Human-in-the-Loop**: Interrupt graph execution to approve or edit next action planned by the agent.
- **Streaming Support**: Stream outputs as they are produced by each node (including token streaming).
- **Integration with LangChain**: LangGraph integrates seamlessly with [LangChain](https://github.com/langchain-ai/langchain/) and [LangSmith](https://docs.smith.langchain.com/) (but does not require them).
## How does it compare to LCEL?
The [**L**ang**C**hain **E**xpression **L**anguage (LCEL)](/docs/concepts/lcel) is an orchestration layer that allows LangChain to handle the run-time execution of chains in an optimized way.
While we have seen users run chains with hundreds of steps in production, we generally recommend using LCEL for simpler orchestration tasks. When the application requires complex state management, branching, cycles or multiple agents, we recommend that users take advantage of [LangGraph](/docs/concepts/langgraph).
If you are building complex LLM applications that may require multiple agents, branching, cycles, or advanced state management, LangGraph is the right tool for you. Remember that you can still use LCEL within individual nodes in LangGraph.
## Documentation
For additional information on LangGraph, please visit the [LangGraph documentation](https://langchain-ai.github.io/langgraph/) page.

# LangServe
PLACE HOLDER TO BE REPLACED BY ACTUAL DOCUMENTATION
USED TO MAKE SURE THAT WE DO NOT FORGET TO ADD LINKS LATER

# LangChain Expression Language (LCEL)
:::info Prerequisites
* [Runnable Interface](/docs/concepts/runnables)
:::
The **L**ang**C**hain **E**xpression **L**anguage (LCEL) takes a [declarative](https://en.wikipedia.org/wiki/Declarative_programming) approach to building new [Runnables](/docs/concepts/runnables) from existing Runnables.
This means that you describe what you want to happen, rather than how you want it to happen, allowing LangChain to optimize the run-time execution of the chains.
We often refer to a `Runnable` created using LCEL as a "chain". It's important to remember that a "chain" is a `Runnable`, and that it implements the full [Runnable Interface](/docs/concepts/runnables).
:::note
* The [LCEL cheatsheet](https://python.langchain.com/docs/how_to/lcel_cheatsheet/) shows common patterns that involve the Runnable interface and LCEL expressions.
* Please see the following list of [how-to guides](/docs/how_to/#langchain-expression-language-lcel) that cover common tasks with LCEL.
* A list of built-in `Runnables` can be found in the [LangChain Core API Reference](https://python.langchain.com/api_reference/core/runnables.html). Many of these Runnables are useful when composing custom "chains" in LangChain using LCEL.
:::
## Benefits of LCEL
LangChain optimizes the run-time execution of chains built with LCEL in a number of ways:
- **Optimize parallel execution**: Run Runnables in parallel using [RunnableParallel](#RunnableParallel) or run multiple inputs through a given chain in parallel using the [Runnable Batch API](/docs/concepts/runnables#batch). Parallel execution can significantly reduce the latency as processing can be done in parallel instead of sequentially.
- **Guarantee Async support**: Any chain built with LCEL can be run asynchronously using the [Runnable Async API](/docs/concepts/runnables#async-api). This can be useful when running chains in a server environment where you want to handle a large number of requests concurrently.
- **Simplify streaming**: LCEL chains can be streamed, allowing for incremental output as the chain is executed. LangChain can optimize the streaming of the output to minimize the time-to-first-token (time elapsed until the first chunk of output from a [chat model](/docs/concepts/chat_models) or [llm](/docs/concepts/llms) comes out).
Other benefits include:
- [**Seamless LangSmith tracing**](https://docs.smith.langchain.com)
As your chains get more and more complex, it becomes increasingly important to understand what exactly is happening at every step.
With LCEL, **all** steps are automatically logged to [LangSmith](https://docs.smith.langchain.com/) for maximum observability and debuggability.
- **Standard API**: Because all chains are built using the Runnable interface, they can be used in the same way as any other Runnable.
- [**Deployable with LangServe**](/docs/concepts/langserve): Chains built with LCEL can be deployed with LangServe for production use.
## Should I use LCEL?
LCEL is an [orchestration solution](https://en.wikipedia.org/wiki/Orchestration_(computing)) -- it allows LangChain to handle run-time execution of chains in an optimized way.
While we have seen users run chains with hundreds of steps in production, we generally recommend using LCEL for simpler orchestration tasks. When the application requires complex state management, branching, cycles or multiple agents, we recommend that users take advantage of [LangGraph](/docs/concepts/langgraph).
In LangGraph, users define graphs that specify the flow of the application. This allows users to keep using LCEL within individual nodes when LCEL is needed, while making it easy to define complex orchestration logic that is more readable and maintainable.
Here are some guidelines:
* If you are making a single LLM call, you don't need LCEL; instead call the underlying [chat model](/docs/concepts/chat_models) directly.
* If you have a simple chain (e.g., prompt + llm + parser, a simple retrieval setup, etc.) and you're taking advantage of the LCEL benefits, LCEL is a reasonable fit (see the sketch after this list).
* If you're building a complex chain (e.g., with branching, cycles, multiple agents, etc.) use [LangGraph](/docs/concepts/langgraph) instead. Remember that you can always use LCEL within individual nodes in LangGraph.
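For the simple-chain case, an LCEL composition might look like the following sketch. It assumes the `langchain-openai` integration, but any chat model works here:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("Tell me a short fact about {topic}")
model = ChatOpenAI(model="gpt-4o-mini")
parser = StrOutputParser()

chain = prompt | model | parser
chain.invoke({"topic": "otters"})
```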
## Composition Primitives
`LCEL` chains are built by composing existing `Runnables` together. The two main composition primitives are [RunnableSequence](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.RunnableSequence.html#langchain_core.runnables.base.RunnableSequence) and [RunnableParallel](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.RunnableParallel.html#langchain_core.runnables.base.RunnableParallel).
Many other composition primitives (e.g., [RunnableAssign](
https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.passthrough.RunnableAssign.html#langchain_core.runnables.passthrough.RunnableAssign
)) can be thought of as variations of these two primitives.
:::note
You can find a list of all composition primitives in the [LangChain Core API Reference](https://python.langchain.com/api_reference/core/runnables.html).
:::
### RunnableSequence
`RunnableSequence` is a composition primitive that allows you to "chain" multiple runnables sequentially, with the output of one runnable serving as the input to the next.
```python
from langchain_core.runnables import RunnableSequence
chain = RunnableSequence(runnable1, runnable2)
```
Invoking the `chain` with some input:
```python
final_output = chain.invoke(some_input)
```
corresponds to the following:
```python
output1 = runnable1.invoke(some_input)
final_output = runnable2.invoke(output1)
```
:::note
`runnable1` and `runnable2` are placeholders for any `Runnable` that you want to chain together.
:::
### RunnableParallel
`RunnableParallel` is a composition primitive that allows you to run multiple runnables concurrently, with the same input provided to each.
```python
from langchain_core.runnables import RunnableParallel
chain = RunnableParallel({
    "key1": runnable1,
    "key2": runnable2,
})
```
Invoking the `chain` with some input:
```python
final_output = chain.invoke(some_input)
```
will yield a `final_output` dictionary with the same keys as the mapping passed to `RunnableParallel`, but with each value replaced by the output of the corresponding runnable:
```python
{
    "key1": runnable1.invoke(some_input),
    "key2": runnable2.invoke(some_input),
}
```
Recall that the runnables are executed in parallel, so while the result is the same as the dictionary comprehension shown above, the execution time is much faster.
:::note
`RunnableParallel` supports both synchronous and asynchronous execution (as all `Runnables` do).
* For synchronous execution, `RunnableParallel` uses a [ThreadPoolExecutor](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor) to run the runnables concurrently.
* For asynchronous execution, `RunnableParallel` uses [asyncio.gather](https://docs.python.org/3/library/asyncio.html#asyncio.gather) to run the runnables concurrently.
:::
## Composition Syntax
The usage of `RunnableSequence` and `RunnableParallel` is so common that we created a shorthand syntax for using them. This helps
to make the code more readable and concise.
### The `|` operator
We have [overloaded](https://docs.python.org/3/reference/datamodel.html#special-method-names) the `|` operator to create a `RunnableSequence` from two `Runnables`.
```python
chain = runnable1 | runnable2
```
is equivalent to:
```python
chain = RunnableSequence(runnable1, runnable2)
```
### The `.pipe` method
If you have moral qualms with operator overloading, you can use the `.pipe` method instead. This is equivalent to the `|` operator.
```python
chain = runnable1.pipe(runnable2)
```
### Coercion
LCEL applies automatic type coercion to make it easier to compose chains.
If you do not understand the type coercion, you can always use the `RunnableSequence` and `RunnableParallel` classes directly.
This will make the code more verbose, but it will also make it more explicit.
#### Dictionary to RunnableParallel
Inside an LCEL expression, a dictionary is automatically converted to a `RunnableParallel`.
For example, the following code:
```python
mapping = {
    "key1": runnable1,
    "key2": runnable2,
}
chain = mapping | runnable3
```
It gets automatically converted to the following:
```python
chain = RunnableSequence(RunnableParallel(mapping), runnable3)
```
:::caution
You have to be careful because the `mapping` dictionary is not a `RunnableParallel` object, it is just a dictionary. This means that the following code will raise an `AttributeError`:
```python
mapping.invoke(some_input)
```
:::
#### Function to RunnableLambda
Inside an LCEL expression, a function is automatically converted to a `RunnableLambda`.
```python
def some_func(x):
    return x


chain = some_func | runnable1
```
It gets automatically converted to the following:
```python
chain = RunnableSequence(RunnableLambda(some_func), runnable1)
```
:::caution
You have to be careful because the lambda function is not a `RunnableLambda` object, it is just a function. This means that the following code will raise an `AttributeError`:
```python
(lambda x: x + 1).invoke(some_input)
```
:::
## Legacy Chains
LCEL aims to provide consistency around behavior and customization over legacy subclassed chains such as `LLMChain` and
`ConversationalRetrievalChain`. Many of these legacy chains hide important details like prompts, and as a wider variety
of viable models emerge, customization has become more and more important.
If you are currently using one of these legacy chains, please see [this guide for guidance on how to migrate](/docs/versions/migrating_chains).
For guides on how to do specific tasks with LCEL, check out [the relevant how-to guides](/docs/how_to/#langchain-expression-language-lcel).

# Large Language Models (LLMs)
Large Language Models (LLMs) are advanced machine learning models that excel in a wide range of language-related tasks such as
text generation, translation, summarization, question answering, and more, without needing task-specific tuning for every scenario.
## Chat Models
Modern LLMs are typically exposed to users via a [Chat Model interface](/docs/concepts/chat_models). These models process sequences of [messages](/docs/concepts/messages) as input and output messages.
Popular chat models support native [tool calling](/docs/concepts#tool-calling) capabilities, which allow building applications
that can interact with external services, APIs, and databases, extract structured information from unstructured text, and more.
Modern LLMs are not limited to processing natural language text. They can also process other types of data, such as images, audio, and video. This is known as [multimodality](/docs/concepts/multimodality). Please see the [Chat Model Concept Guide](/docs/concepts/chat_models) page for more information.
## Terminology
In documentation, we will often use the terms "LLM" and "Chat Model" interchangeably. This is because most modern LLMs are exposed to users via a chat model interface.
However, users must know that there are two distinct interfaces for LLMs in LangChain:
1. Modern LLMs implement the [BaseChatModel](https://python.langchain.com/api_reference/core/language_models/langchain_core.language_models.chat_models.BaseChatModel.html) interface. These are chat models that process sequences of messages as input and output messages. Such models will typically be named with a convention that prefixes "Chat" to their class names (e.g., `ChatOllama`, `ChatAnthropic`, `ChatOpenAI`, etc.).
2. Older LLMs implement the [BaseLLM](https://python.langchain.com/api_reference/core/language_models/langchain_core.language_models.llms.BaseLLM.html#langchain_core.language_models.llms.BaseLLM) interface. These are LLMs that take as input text strings and output text strings. Such models are typically named just using the provider's name (e.g., `Ollama`, `Anthropic`, `OpenAI`, etc.). Generally, users should not use these models.
## Related Resources
Modern LLMs (aka Chat Models):
* [Conceptual Guide about Chat Models](/docs/concepts/chat_models/)
* [Chat Model Integrations](/docs/integrations/chat/)
* How-to Guides: [LLMs](/docs/how_to/#chat_models)
Text-in, text-out LLMs (older or lower-level models):
:::caution
Unless you have a specific use case that requires using these models, you should use the chat models instead.
:::
* [LLM Integrations](/docs/integrations/llms/)
* How-to Guides: [LLMs](/docs/how_to/#llms)

# Messages
:::info Prerequisites
- [Chat Models](/docs/concepts/chat_models)
:::
## Overview
Messages are the unit of communication in [chat models](/docs/concepts/chat_models). They are used to represent the input and output of a chat model, as well as any additional context or metadata that may be associated with a conversation.
Each message has a **role** (e.g., "user", "assistant"), **content** (e.g., text, multimodal data), and additional metadata that can vary depending on the chat model provider.
LangChain provides a unified message format that can be used across chat models, allowing users to work with different chat models without worrying about the specific details of the message format used by each model provider.
## What is inside a message?
A message typically consists of the following pieces of information:
- **Role**: The role of the message (e.g., "user", "assistant").
- **Content**: The content of the message (e.g., text, multimodal data).
- Additional metadata: id, name, [token usage](/docs/concepts/tokens) and other model-specific metadata.
### Role
Roles are used to distinguish between different types of messages in a conversation and help the chat model understand how to respond to a given sequence of messages.
| **Role** | **Description** |
|-----------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **system** | Used to tell the chat model how to behave and provide additional context. Not supported by all chat model providers. |
| **user** | Represents input from a user interacting with the model, usually in the form of text or other interactive input. |
| **assistant** | Represents a response from the model, which can include text or a request to invoke tools. |
| **tool** | A message used to pass the results of a tool invocation back to the model after external data or processing has been retrieved. Used with chat models that support [tool calling](/docs/concepts/tool_calling). |
| **function (legacy)** | This is a legacy role, corresponding to OpenAI's legacy function-calling API. **tool** role should be used instead. |
### Content
The content of a message is either text or a list of dictionaries representing [multimodal data](/docs/concepts/multimodality) (e.g., images, audio, video). The exact format of the content can vary between different chat model providers.
Currently, most chat models support text as the primary content type, with some models also supporting multimodal data. However, support for multimodal data is still limited across most chat model providers.
For more information see:
* [HumanMessage](#humanmessage) -- for content in the input from the user.
* [AIMessage](#aimessage) -- for content in the response from the model.
* [Multimodality](/docs/concepts/multimodality) -- for more information on multimodal content.
### Other Message Data
Depending on the chat model provider, messages can include other data such as:
- **ID**: An optional unique identifier for the message.
- **Name**: An optional `name` property which allows differentiating between different entities/speakers with the same role. Not all models support this!
- **Metadata**: Additional information about the message, such as timestamps, token usage, etc.
- **Tool Calls**: A request made by the model to call one or more tools. See [tool calling](/docs/concepts/tool_calling) for more information.
## Conversation Structure
The sequence of messages passed into a chat model should follow a specific structure to ensure that the chat model can generate a valid response.
For example, a typical conversation structure might look like this:
1. **User Message**: "Hello, how are you?"
2. **Assistant Message**: "I'm doing well, thank you for asking."
3. **User Message**: "Can you tell me a joke?"
4. **Assistant Message**: "Sure! Why did the scarecrow win an award? Because he was outstanding in his field!"
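Represented with the LangChain message objects introduced below, that conversation would look like the following minimal sketch:

```python
from langchain_core.messages import AIMessage, HumanMessage

conversation = [
    HumanMessage(content="Hello, how are you?"),
    AIMessage(content="I'm doing well, thank you for asking."),
    HumanMessage(content="Can you tell me a joke?"),
    AIMessage(content="Sure! Why did the scarecrow win an award? Because he was outstanding in his field!"),
]
```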
Please read the [chat history](/docs/concepts/chat_history) guide for more information on managing chat history and ensuring that the conversation structure is correct.
## LangChain Messages
LangChain provides a unified message format that can be used across all chat models, allowing users to work with different chat models without worrying about the specific details of the message format used by each model provider.
LangChain messages are Python objects that subclass from a [BaseMessage](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.base.BaseMessage.html).
The five main message types are:
- [SystemMessage](#systemmessage): corresponds to **system** role
- [HumanMessage](#humanmessage): corresponds to **user** role
- [AIMessage](#aimessage): corresponds to **assistant** role
- [AIMessageChunk](#aimessagechunk): corresponds to **assistant** role, used for [streaming](/docs/concepts/streaming) responses
- [ToolMessage](#toolmessage): corresponds to **tool** role
Other important messages include:
- [RemoveMessage](#removemessage) -- does not correspond to any role. This is an abstraction, mostly used in [LangGraph](/docs/concepts/langgraph) to manage chat history.
- **Legacy** [FunctionMessage](#legacy-functionmessage): corresponds to the **function** role in OpenAI's **legacy** function-calling API.
You can find more information about **messages** in the [API Reference](https://python.langchain.com/api_reference/core/messages.html).
### SystemMessage
A `SystemMessage` is used to prime the behavior of the AI model and provide additional context, such as instructing the model to adopt a specific persona or setting the tone of the conversation (e.g., "This is a conversation about cooking").
Different chat providers may support system message in one of the following ways:
* **Through a "system" message role**: In this case, a system message is included as part of the message sequence with the role explicitly set as "system."
* **Through a separate API parameter for system instructions**: Instead of being included as a message, system instructions are passed via a dedicated API parameter.
* **No support for system messages**: Some models do not support system messages at all.
Most major chat model providers support system instructions via either a chat message or a separate API parameter. LangChain will automatically adapt based on the provider's capabilities. If the provider supports a separate API parameter for system instructions, LangChain will extract the content of a system message and pass it through that parameter.
If no system message is supported by the provider, in most cases LangChain will attempt to incorporate the system message's content into a HumanMessage or raise an exception if that is not possible. However, this behavior is not yet consistently enforced across all implementations, and if using a less popular implementation of a chat model (e.g., an implementation from the `langchain-community` package) it is recommended to check the specific documentation for that model.
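For example, a minimal sketch, assuming `model` is any chat model instance:

```python
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You are a pirate. Answer every question in pirate speak."),
    HumanMessage(content="How are you today?"),
]
model.invoke(messages)
```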
### HumanMessage
The `HumanMessage` corresponds to the **"user"** role. A human message represents input from a user interacting with the model.
#### Text Content
Most chat models expect the user input to be in the form of text.
```python
from langchain_core.messages import HumanMessage
model.invoke([HumanMessage(content="Hello, how are you?")])
```
:::tip
When invoking a chat model with a string as input, LangChain will automatically convert the string into a `HumanMessage` object. This is mostly useful for quick testing.
```python
model.invoke("Hello, how are you?")
```
:::
#### Multi-modal Content
Some chat models accept multimodal inputs, such as images, audio, video, or files like PDFs.
Please see the [multimodality](/docs/concepts/multimodality) guide for more information.
### AIMessage
`AIMessage` is used to represent a message with the role **"assistant"**. This is the response from the model, which can include text or a request to invoke tools. It could also include other media types like images, audio, or video -- though this is still uncommon at the moment.
```python
from langchain_core.messages import HumanMessage
ai_message = model.invoke([HumanMessage("Tell me a joke")])
ai_message # <-- AIMessage
```
An `AIMessage` has the following attributes. The attributes which are **standardized** are the ones that LangChain attempts to standardize across different chat model providers. **raw** fields are specific to the model provider and may vary.
| Attribute | Standardized/Raw | Description |
|----------------------|:-----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `content` | Raw | Usually a string, but can be a list of content blocks. See [content](#content) for details. |
| `tool_calls` | Standardized | Tool calls associated with the message. See [tool calling](/docs/concepts/tool_calling) for details. |
| `invalid_tool_calls` | Standardized | Tool calls with parsing errors associated with the message. See [tool calling](/docs/concepts/tool_calling) for details. |
| `usage_metadata` | Standardized | Usage metadata for a message, such as [token counts](/docs/concepts/tokens). See [Usage Metadata API Reference](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.ai.UsageMetadata.html) |
| `id` | Standardized | An optional unique identifier for the message, ideally provided by the provider/model that created the message. |
| `response_metadata` | Raw | Response metadata, e.g., response headers, logprobs, token counts. |
#### content
The **content** property of an `AIMessage` represents the response generated by the chat model.
The content is either:
- **text** -- the norm for virtually all chat models.
- A **list of dictionaries** -- Each dictionary represents a content block and is associated with a `type`.
* Used by Anthropic for surfacing agent thought process when doing [tool calling](/docs/concepts/tool_calling).
* Used by OpenAI for audio outputs. Please see [multi-modal content](/docs/concepts/multimodality) for more information.
:::important
The **content** property is **not** standardized across different chat model providers, mostly because there are
still few examples to generalize from.
:::
### AIMessageChunk
It is common to [stream](/docs/concepts/streaming) responses for the chat model as they are being generated, so the user can see the response in real-time instead of waiting for the entire response to be generated before displaying it.
It is returned from the `stream`, `astream` and `astream_events` methods of the chat model.
For example,
```python
for chunk in model.stream([HumanMessage("what color is the sky?")]):
print(chunk)
```
`AIMessageChunk` follows nearly the same structure as `AIMessage`, but uses a different [ToolCallChunk](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.tool.ToolCallChunk.html#langchain_core.messages.tool.ToolCallChunk)
to be able to stream tool calling in a standardized manner.
#### Aggregating
`AIMessageChunks` support the `+` operator to merge them into a single `AIMessage`. This is useful when you want to display the final response to the user.
```python
ai_message = chunk1 + chunk2 + chunk3 + ...
```
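For instance, a minimal sketch that accumulates chunks while streaming (again assuming an initialized `model`):

```python
from langchain_core.messages import HumanMessage

full = None
for chunk in model.stream([HumanMessage("what color is the sky?")]):
    # Merge each new chunk into the running aggregate
    full = chunk if full is None else full + chunk

full  # <-- the aggregated message
```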
### ToolMessage
This represents a message with role "tool", which contains the result of [calling a tool](/docs/concepts/tool_calling). In addition to `role` and `content`, this message has:
- a `tool_call_id` field which conveys the id of the call to the tool that was called to produce this result.
- an `artifact` field which can be used to pass along arbitrary artifacts of the tool execution which are useful to track but which should not be sent to the model.
Please see [tool calling](/docs/concepts/tool_calling) for more information.
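For illustration, a minimal sketch of constructing a `ToolMessage` by hand (the id and content below are made up):

```python
from langchain_core.messages import ToolMessage

ToolMessage(
    content="42",                # result produced by the tool
    tool_call_id="call_abc123",  # hypothetical id of the tool call that requested it
)
```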
### RemoveMessage
This is a special message type that does not correspond to any role. It is used
for managing chat history in [LangGraph](/docs/concepts/langgraph).
Please see the following for more information on how to use the `RemoveMessage`:
* [Memory conceptual guide](https://langchain-ai.github.io/langgraph/concepts/memory/)
* [How to delete messages](https://langchain-ai.github.io/langgraph/how-tos/memory/delete-messages/)
### (Legacy) FunctionMessage
This is a legacy message type, corresponding to OpenAI's legacy function-calling API. `ToolMessage` should be used instead to correspond to the updated tool-calling API.
## OpenAI Format
### Inputs
Chat models also accept OpenAI's format as **inputs**:
```python
chat_model.invoke([
{
"role": "user",
"content": "Hello, how are you?",
},
{
"role": "assistant",
"content": "I'm doing well, thank you for asking.",
},
{
"role": "user",
"content": "Can you tell me a joke?",
}
])
```
### Outputs
At the moment, model outputs are returned as LangChain messages, so you will need to convert them yourself if you also need the output in OpenAI format.
The [convert_to_openai_messages](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.utils.convert_to_openai_messages.html) utility function can be used to convert from LangChain messages to OpenAI format.
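As a minimal sketch of the conversion:

```python
from langchain_core.messages import AIMessage, HumanMessage, convert_to_openai_messages

convert_to_openai_messages([
    HumanMessage(content="Hello, how are you?"),
    AIMessage(content="I'm doing well, thank you for asking."),
])
# -> [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]
```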

View File

@@ -0,0 +1,88 @@
# Multimodality
## Overview
**Multimodality** refers to the ability to work with data that comes in different forms, such as text, audio, images, and video. Multimodality can appear in various components, allowing models and systems to handle and process a mix of these data types seamlessly.
- **Chat Models**: These could, in theory, accept and generate multimodal inputs and outputs, handling a variety of data types like text, images, audio, and video.
- **Embedding Models**: Embedding Models can represent multimodal content, embedding various forms of data—such as text, images, and audio—into vector spaces.
- **Vector Stores**: Vector stores could search over embeddings that represent multimodal data, enabling retrieval across different types of information.
## Multimodality in Chat models
:::info Pre-requisites
* [Chat models](/docs/concepts/chat_models)
* [Messages](/docs/concepts/messages)
:::
Multimodal support is still relatively new and less common; model providers have not yet standardized on the "best" way to define the API. As such, LangChain's multimodal abstractions are lightweight and flexible, designed to accommodate different model providers' APIs and interaction patterns, but are **not** standardized across models.
### How to use multimodal models
* Use the [chat model integration table](/docs/integrations/chat/) to identify which models support multimodality.
* Reference the [relevant how-to guides](/docs/how_to/#multimodal) for specific examples of how to use multimodal models.
### What kind of multimodality is supported?
#### Inputs
Some models can accept multimodal inputs, such as images, audio, video, or files. The types of multimodal inputs supported depend on the model provider. For instance, [Google's Gemini](https://python.langchain.com/docs/integrations/chat/google_generative_ai/) supports documents like PDFs as inputs.
Most chat models that support **multimodal inputs** also accept those values in OpenAI's content blocks format. So far this is restricted to image inputs. For models like Gemini which support video and other bytes input, the APIs also support the native, model-specific representations.
The gist of passing multimodal inputs to a chat model is to use content blocks that specify a type and corresponding data. For example, to pass an image to a chat model:
```python
from langchain_core.messages import HumanMessage
message = HumanMessage(
content=[
{"type": "text", "text": "describe the weather in this image"},
{"type": "image_url", "image_url": {"url": image_url}},
],
)
response = model.invoke([message])
```
:::caution
The exact format of the content blocks may vary depending on the model provider. Please refer to the chat model's
integration documentation for the correct format. Find the integration in the [chat model integration table](/docs/integrations/chat/).
:::
#### Outputs
Virtually no popular chat models support multimodal outputs at the time of writing (October 2024).
The only exception is OpenAI's chat model ([gpt-4o-audio-preview](https://python.langchain.com/docs/integrations/chat/openai/)), which can generate audio outputs.
Multimodal outputs will appear as part of the [AIMessage](/docs/concepts/messages/#aimessage) response object.
Please see the [ChatOpenAI](/docs/integrations/chat/openai/) documentation for more information on how to use multimodal outputs.
#### Tools
Currently, no chat model is designed to work **directly** with multimodal data in a [tool call request](/docs/concepts/tool_calling) or [ToolMessage](/docs/concepts/tool_calling) result.
However, a chat model can easily interact with multimodal data by invoking tools with references (e.g., a URL) to the multimodal data, rather than the data itself. For example, any model capable of [tool calling](/docs/concepts/tool_calling) can be equipped with tools to download and process images, audio, or video.
## Multimodality in embedding models
:::info Prerequisites
* [Embedding Models](/docs/concepts/embedding_models)
:::
**Embeddings** are vector representations of data used for tasks like similarity search and retrieval.
The current [embedding interface](https://python.langchain.com/api_reference/core/embeddings/langchain_core.embeddings.embeddings.Embeddings.html#langchain_core.embeddings.embeddings.Embeddings) used in LangChain is optimized entirely for text-based data, and will **not** work with multimodal data.
As use cases involving multimodal search and retrieval tasks become more common, we expect to expand the embedding interface to accommodate other data types like images, audio, and video.
## Multimodality in vector stores
:::info Prerequisites
* [Vectorstores](/docs/concepts/vectorstores)
:::
Vector stores are databases for storing and retrieving embeddings, which are typically used in search and retrieval tasks. Similar to embeddings, vector stores are currently optimized for text-based data.
As use cases involving multimodal search and retrieval tasks become more common, we expect to expand the vector store interface to accommodate other data types like images, audio, and video.

View File

@@ -0,0 +1,41 @@
# Output parsers
<span data-heading-keywords="output parser"></span>
:::note
The information here refers to parsers that take a text output from a model and try to parse it into a more structured representation.
More and more models are supporting function (or tool) calling, which handles this automatically.
It is recommended to use function/tool calling rather than output parsing.
See documentation for that [here](/docs/concepts/#function-tool-calling).
:::
An **output parser** is responsible for taking the output of a model and transforming it into a more suitable format for downstream tasks.
Output parsers are useful when you are using LLMs to generate structured data, or to normalize output from chat models and LLMs.
LangChain has lots of different types of output parsers. This is a list of output parsers LangChain supports. The table below has various pieces of information:
- **Name**: The name of the output parser
- **Supports Streaming**: Whether the output parser supports streaming.
- **Has Format Instructions**: Whether the output parser has format instructions. This is generally available except when (a) the desired schema is not specified in the prompt but rather in other parameters (like OpenAI function calling), or (b) when the OutputParser wraps another OutputParser.
- **Calls LLM**: Whether this output parser itself calls an LLM. This is usually only done by output parsers that attempt to correct misformatted output.
- **Input Type**: Expected input type. Most output parsers work on both strings and messages, but some (like OpenAI Functions) need a message with specific kwargs.
- **Output Type**: The output type of the object returned by the parser.
- **Description**: Our commentary on this output parser and when to use it.
| Name | Supports Streaming | Has Format Instructions | Calls LLM | Input Type | Output Type | Description |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------|-------------------------|-----------|--------------------|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [JSON](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.json.JsonOutputParser.html#langchain_core.output_parsers.json.JsonOutputParser) | ✅ | ✅ | | `str` \| `Message` | JSON object | Returns a JSON object as specified. You can specify a Pydantic model and it will return JSON for that model. Probably the most reliable output parser for getting structured data that does NOT use function calling. |
| [XML](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.xml.XMLOutputParser.html#langchain_core.output_parsers.xml.XMLOutputParser) | ✅ | ✅ | | `str` \| `Message` | `dict` | Returns a dictionary of tags. Use when XML output is needed. Use with models that are good at writing XML (like Anthropic's). |
| [CSV](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.list.CommaSeparatedListOutputParser.html#langchain_core.output_parsers.list.CommaSeparatedListOutputParser) | ✅ | ✅ | | `str` \| `Message` | `List[str]` | Returns a list of comma separated values. |
| [OutputFixing](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.fix.OutputFixingParser.html#langchain.output_parsers.fix.OutputFixingParser) | | | ✅ | `str` \| `Message` | | Wraps another output parser. If that output parser errors, then this will pass the error message and the bad output to an LLM and ask it to fix the output. |
| [RetryWithError](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.retry.RetryWithErrorOutputParser.html#langchain.output_parsers.retry.RetryWithErrorOutputParser) | | | ✅ | `str` \| `Message` | | Wraps another output parser. If that output parser errors, then this will pass the original inputs, the bad output, and the error message to an LLM and ask it to fix it. Compared to OutputFixingParser, this one also sends the original instructions. |
| [Pydantic](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.pydantic.PydanticOutputParser.html#langchain_core.output_parsers.pydantic.PydanticOutputParser) | | ✅ | | `str` \| `Message` | `pydantic.BaseModel` | Takes a user defined Pydantic model and returns data in that format. |
| [YAML](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.yaml.YamlOutputParser.html#langchain.output_parsers.yaml.YamlOutputParser) | | ✅ | | `str` \| `Message` | `pydantic.BaseModel` | Takes a user defined Pydantic model and returns data in that format. Uses YAML to encode it. |
| [PandasDataFrame](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.pandas_dataframe.PandasDataFrameOutputParser.html#langchain.output_parsers.pandas_dataframe.PandasDataFrameOutputParser) | | ✅ | | `str` \| `Message` | `dict` | Useful for doing operations with pandas DataFrames. |
| [Enum](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.enum.EnumOutputParser.html#langchain.output_parsers.enum.EnumOutputParser) | | ✅ | | `str` \| `Message` | `Enum` | Parses response into one of the provided enum values. |
| [Datetime](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.datetime.DatetimeOutputParser.html#langchain.output_parsers.datetime.DatetimeOutputParser) | | ✅ | | `str` \| `Message` | `datetime.datetime` | Parses response into a datetime string. |
| [Structured](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.structured.StructuredOutputParser.html#langchain.output_parsers.structured.StructuredOutputParser) | | ✅ | | `str` \| `Message` | `Dict[str, str]` | An output parser that returns structured information. It is less powerful than other output parsers since it only allows for fields to be strings. This can be useful when you are working with smaller LLMs. |
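As an illustrative sketch (assuming an OpenAI chat model is configured; the model name below is just an example), a `JsonOutputParser` can be chained after a prompt and model to return a parsed JSON object:

```python
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
parser = JsonOutputParser()

# Inject the parser's format instructions into the prompt
prompt = PromptTemplate.from_template(
    "Answer the question as a JSON object.\n{format_instructions}\n{question}",
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser
chain.invoke({"question": "List three primary colors."})  # -> parsed JSON (dict or list)
```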
For specifics on how to use output parsers, see the [relevant how-to guides here](/docs/how_to/#output-parsers).

View File

@@ -0,0 +1,79 @@
# Prompt Templates
Prompt templates help to translate user input and parameters into instructions for a language model.
This can be used to guide a model's response, helping it understand the context and generate relevant and coherent language-based output.
Prompt Templates take as input a dictionary, where each key represents a variable in the prompt template to fill in.
Prompt Templates output a PromptValue. This PromptValue can be passed to an LLM or a ChatModel, and can also be cast to a string or a list of messages.
The reason this PromptValue exists is to make it easy to switch between strings and messages.
There are a few different types of prompt templates:
## String PromptTemplates
These prompt templates are used to format a single string, and generally are used for simpler inputs.
For example, a common way to construct and use a PromptTemplate is as follows:
```python
from langchain_core.prompts import PromptTemplate
prompt_template = PromptTemplate.from_template("Tell me a joke about {topic}")
prompt_template.invoke({"topic": "cats"})
```
## ChatPromptTemplates
These prompt templates are used to format a list of messages. These "templates" consist of a list of templates themselves.
For example, a common way to construct and use a ChatPromptTemplate is as follows:
```python
from langchain_core.prompts import ChatPromptTemplate
prompt_template = ChatPromptTemplate.from_messages([
("system", "You are a helpful assistant"),
("user", "Tell me a joke about {topic}")
])
prompt_template.invoke({"topic": "cats"})
```
In the above example, this ChatPromptTemplate will construct two messages when called.
The first is a system message that has no variables to format.
The second is a HumanMessage that will be formatted by the `topic` variable the user passes in.
## MessagesPlaceholder
<span data-heading-keywords="messagesplaceholder"></span>
This prompt template is responsible for adding a list of messages in a particular place.
In the above ChatPromptTemplate, we saw how we could format two messages, each one a string.
But what if we wanted the user to pass in a list of messages that we would slot into a particular spot?
This is how you use MessagesPlaceholder.
```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage
prompt_template = ChatPromptTemplate.from_messages([
("system", "You are a helpful assistant"),
MessagesPlaceholder("msgs")
])
prompt_template.invoke({"msgs": [HumanMessage(content="hi!")]})
```
This will produce a list of two messages, the first one being a system message, and the second one being the HumanMessage we passed in.
If we had passed in 5 messages, then it would have produced 6 messages in total (the system message plus the 5 passed in).
This is useful for letting a list of messages be slotted into a particular spot.
An alternative way to accomplish the same thing without using the `MessagesPlaceholder` class explicitly is:
```python
prompt_template = ChatPromptTemplate.from_messages([
("system", "You are a helpful assistant"),
("placeholder", "{msgs}") # <-- This is the changed part
])
```
For specifics on how to use prompt templates, see the [relevant how-to guides here](/docs/how_to/#prompt-templates).

View File

@@ -0,0 +1,98 @@
# Retrieval Augmented Generation (RAG)
:::info[Prerequisites]
* [Retrieval](/docs/concepts/retrieval/)
:::
## Overview
Retrieval Augmented Generation (RAG) is a powerful technique that enhances [language models](/docs/concepts/chat_models/) by combining them with external knowledge bases.
RAG addresses [a key limitation of models](https://www.glean.com/blog/how-to-build-an-ai-assistant-for-the-enterprise): models rely on fixed training datasets, which can lead to outdated or incomplete information.
When given a query, RAG systems first search a knowledge base for relevant information.
The system then incorporates this retrieved information into the model's prompt.
The model uses the provided context to generate a response to the query.
By bridging the gap between vast language models and dynamic, targeted information retrieval, RAG is a powerful technique for building more capable and reliable AI systems.
## Key Concepts
![Conceptual Overview](/img/rag_concepts.png)
(1) **Retrieval system**: Retrieve relevant information from a knowledge base.
(2) **Adding external knowledge**: Pass retrieved information to a model.
## Retrieval system
Models have internal knowledge that is often fixed, or at least not updated frequently due to the high cost of training.
This limits their ability to answer questions about current events, or to provide specific domain knowledge.
To address this, there are various knowledge injection techniques like [fine-tuning](https://hamel.dev/blog/posts/fine_tuning_valuable.html) or continued pre-training.
Both are [costly](https://www.glean.com/blog/how-to-build-an-ai-assistant-for-the-enterprise) and often [poorly suited](https://www.anyscale.com/blog/fine-tuning-is-for-form-not-facts) for factual retrieval.
Using a retrieval system offers several advantages:
- **Up-to-date information**: RAG can access and utilize the latest data, keeping responses current.
- **Domain-specific expertise**: With domain-specific knowledge bases, RAG can provide answers in specific domains.
- **Reduced hallucination**: Grounding responses in retrieved facts helps minimize false or invented information.
- **Cost-effective knowledge integration**: RAG offers a more efficient alternative to expensive model fine-tuning.
:::info[Further reading]
See our conceptual guide on [retrieval](/docs/concepts/retrieval/).
:::
## Adding external knowledge
With a retrieval system in place, we need to pass knowledge from this system to the model.
A RAG pipeline typically achieves this through the following steps:
- Receive an input query.
- Use the retrieval system to search for relevant information based on the query.
- Incorporate the retrieved information into the prompt sent to the LLM.
- Generate a response that leverages the retrieved context.
As an example, here's a simple RAG workflow that passes information from a [retriever](/docs/concepts/retrievers/) to a [chat model](/docs/concepts/chat_models/):
```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage
# Define a system prompt that tells the model how to use the retrieved context
system_prompt = """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Use three sentences maximum and keep the answer concise.
Context: {context}:"""
# Define a question
question = """What are the main components of an LLM-powered autonomous agent system?"""
# Retrieve relevant documents
docs = retriever.invoke(question)
# Combine the documents into a single string
docs_text = "".join(d.page_content for d in docs)
# Populate the system prompt with the retrieved context
system_prompt_fmt = system_prompt.format(context=docs_text)
# Create a model
model = ChatOpenAI(model="gpt-4o", temperature=0)
# Generate a response
answer = model.invoke([SystemMessage(content=system_prompt_fmt),
                       HumanMessage(content=question)])
```
:::info[Further reading]
RAG is a deep area with many possible optimizations and design choices:
* See [this excellent blog](https://cameronrwolfe.substack.com/p/a-practitioners-guide-to-retrieval?utm_source=profile&utm_medium=reader2) from Cameron Wolfe for a comprehensive overview and history of RAG.
* See our [RAG how-to guides](/docs/how_to/#qa-with-rag).
* See our RAG [tutorials](/docs/tutorials/#working-with-external-knowledge).
* See our RAG from Scratch course, with [code](https://github.com/langchain-ai/rag-from-scratch) and [video playlist](https://www.youtube.com/playlist?list=PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x).
* Also, see our RAG from Scratch course [on Freecodecamp](https://youtu.be/sVcwVQRHIc8?feature=shared).
:::

View File

@@ -0,0 +1,240 @@
# Retrieval
:::info[Prerequisites]
* [Retrievers](/docs/concepts/retrievers/)
* [Vectorstores](/docs/concepts/vectorstores/)
* [Embeddings](/docs/concepts/embedding_models/)
* [Text splitters](/docs/concepts/text_splitters/)
:::
:::danger[Security]
Some of the concepts reviewed here utilize models to generate queries (e.g., for SQL or graph databases).
There are inherent risks in doing this.
Make sure that your database connection permissions are scoped as narrowly as possible for your application's needs.
This will mitigate, though not eliminate, the risks of building a model-driven system capable of querying databases.
For more on general security best practices, see our [security guide](/docs/security/).
:::
## Overview
Retrieval systems are fundamental to many AI applications, efficiently identifying relevant information from large datasets.
These systems accommodate various data formats:
- Unstructured text (e.g., documents) is often stored in vector stores or lexical search indexes.
- Structured data is typically housed in relational or graph databases with defined schemas.
Despite this diversity in data formats, modern AI applications increasingly aim to make all types of data accessible through natural language interfaces.
Models play a crucial role in this process by translating natural language queries into formats compatible with the underlying search index or database.
This translation enables more intuitive and flexible interactions with complex data structures.
## Key concepts
![Retrieval](/img/retrieval_concept.png)
(1) **Query analysis**: A process where models transform or construct search queries to optimize retrieval.
(2) **Information retrieval**: Search queries are used to fetch information from various retrieval systems.
## Query Analysis
While users typically prefer to interact with retrieval systems using natural language, retrieval systems may require specific query syntax or benefit from particular keywords.
Query analysis serves as a bridge between raw user input and optimized search queries. Some common applications of query analysis include:
1. **Query Re-writing**: Queries can be re-written or expanded to improve semantic or lexical searches.
2. **Query Construction**: Search indexes may require structured queries (e.g., SQL for databases).
Query analysis employs models to transform or construct optimized search queries from raw user input.
### Query Re-writing
Retrieval systems should ideally handle a wide spectrum of user inputs, from simple and poorly worded queries to complex, multi-faceted questions.
To achieve this versatility, a popular approach is to use models to transform raw user queries into more effective search queries.
This transformation can range from simple keyword extraction to sophisticated query expansion and reformulation.
Here are some key benefits of using models for query analysis in unstructured data retrieval:
1. **Query Clarification**: Models can rephrase ambiguous or poorly worded queries for clarity.
2. **Semantic Understanding**: They can capture the intent behind a query, going beyond literal keyword matching.
3. **Query Expansion**: Models can generate related terms or concepts to broaden the search scope.
4. **Complex Query Handling**: They can break down multi-part questions into simpler sub-queries.
Various techniques have been developed to leverage models for query re-writing, including:
| Name | When to use | Description |
|-----------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [Multi-query](/docs/how_to/MultiQueryRetriever/) | When you want to ensure high recall in retrieval by providing multiple phrasings of a question. | Rewrite the user question with multiple phrasings, retrieve documents for each rewritten question, return the unique documents for all queries. |
| [Decomposition](https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb) | When a question can be broken down into smaller subproblems. | Decompose a question into a set of subproblems / questions, which can either be solved sequentially (use the answer from first + retrieval to answer the second) or in parallel (consolidate each answer into final answer). |
| [Step-back](https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb) | When a higher-level conceptual understanding is required. | First prompt the LLM to ask a generic step-back question about higher-level concepts or principles, and retrieve relevant facts about them. Use this grounding to help answer the user question. [Paper](https://arxiv.org/pdf/2310.06117). |
| [HyDE](https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb) | If you have challenges retrieving relevant documents using the raw user inputs. | Use an LLM to convert questions into hypothetical documents that answer the question. Use the embedded hypothetical documents to retrieve real documents with the premise that doc-doc similarity search can produce more relevant matches. [Paper](https://arxiv.org/abs/2212.10496). |
As an example, query decomposition can simply be accomplished using prompting and a structured output that enforces a list of sub-questions.
These can then be run sequentially or in parallel on a downstream retrieval system.
```python
from typing import List

from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage
# Define a Pydantic model to enforce the output structure
class Questions(BaseModel):
questions: List[str] = Field(
description="A list of sub-questions related to the input query."
)
# Create an instance of the model and enforce the output structure
model = ChatOpenAI(model="gpt-4o", temperature=0)
structured_model = model.with_structured_output(Questions)
# Define the system prompt
system = """You are a helpful assistant that generates multiple sub-questions related to an input question. \n
The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation. \n"""
# Pass the question to the model
question = """What are the main components of an LLM-powered autonomous agent system?"""
questions = structured_model.invoke([SystemMessage(content=system)]+[HumanMessage(content=question)])
```
:::tip
See our RAG from Scratch videos for a few different specific approaches:
- [Multi-query](https://youtu.be/JChPi0CRnDY?feature=shared)
- [Decomposition](https://youtu.be/h0OPWlEOank?feature=shared)
- [Step-back](https://youtu.be/xn1jEjRyJ2U?feature=shared)
- [HyDE](https://youtu.be/SaDzIVkYqyY?feature=shared)
:::
### Query Construction
Query analysis can also focus on translating natural language queries into specialized query languages or filters.
This translation is crucial for effectively interacting with various types of databases that house structured or semi-structured data.
1. **Structured Data examples**: For relational and graph databases, Domain-Specific Languages (DSLs) are used to query data.
- **Text-to-SQL**: [Converts natural language to SQL](https://paperswithcode.com/task/text-to-sql) for relational databases.
- **Text-to-Cypher**: [Converts natural language to Cypher](https://neo4j.com/labs/neodash/2.4/user-guide/extensions/natural-language-queries/) for graph databases.
2. **Semi-structured Data examples**: For vectorstores, queries can combine semantic search with metadata filtering.
- **Natural Language to Metadata Filters**: Converts user queries into [appropriate metadata filters](https://docs.pinecone.io/guides/data/filter-with-metadata).
These approaches leverage models to bridge the gap between user intent and the specific query requirements of different data storage systems. Here are some popular techniques:
| Name | When to Use | Description |
|------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [Self Query](/docs/how_to/self_query/) | If users are asking questions that are better answered by fetching documents based on metadata rather than similarity with the text. | This uses an LLM to transform user input into two things: (1) a string to look up semantically, (2) a metadata filter to go along with it. This is useful because oftentimes questions are about the METADATA of documents (not the content itself). |
| [Text to SQL](/docs/tutorials/sql_qa/) | If users are asking questions that require information housed in a relational database, accessible via SQL. | This uses an LLM to transform user input into a SQL query. |
| [Text-to-Cypher](/docs/tutorials/graph/) | If users are asking questions that require information housed in a graph database, accessible via Cypher. | This uses an LLM to transform user input into a Cypher query. |
As an example, here is how to use the `SelfQueryRetriever` to convert natural language queries into metadata filters.
```python
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_openai import ChatOpenAI

metadata_field_info = schema_for_metadata  # descriptions of your documents' metadata fields (left as a placeholder here)
document_content_description = "Brief summary of a movie"
llm = ChatOpenAI(temperature=0)
retriever = SelfQueryRetriever.from_llm(
llm,
vectorstore,
document_content_description,
metadata_field_info,
)
```
:::info[Further reading]
* See our tutorials on [text-to-SQL](/docs/tutorials/sql_qa/), [text-to-Cypher](/docs/tutorials/graph/), and [query analysis for metadata filters](/docs/tutorials/query_analysis/).
* See our [blog post overview](https://blog.langchain.dev/query-construction/).
* See our RAG from Scratch video on [query construction](https://youtu.be/kl6NwWYxvbM?feature=shared).
:::
## Information Retrieval
### Common retrieval systems
#### Lexical search indexes
Many search engines are based upon matching words in a query to the words in each document.
This approach is called lexical retrieval, using search [algorithms that are typically based upon word frequencies](https://cameronrwolfe.substack.com/p/the-basics-of-ai-powered-vector-search?utm_source=profile&utm_medium=reader2).
The intuition is simple: if a word appears frequently both in the user's query and in a particular document, then that document might be a good match.
The particular data structure used to implement this is often an [*inverted index*](https://www.geeksforgeeks.org/inverted-index/).
This type of index contains a list of words and a mapping of each word to a list of locations at which it occurs in various documents.
Using this data structure, it is possible to efficiently match the words in search queries to the documents in which they appear.
[BM25](https://en.wikipedia.org/wiki/Okapi_BM25#:~:text=BM25%20is%20a%20bag%2Dof,slightly%20different%20components%20and%20parameters.) and [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) are [two popular lexical search algorithms](https://cameronrwolfe.substack.com/p/the-basics-of-ai-powered-vector-search?utm_source=profile&utm_medium=reader2).
:::info[Further reading]
* See the [BM25](/docs/integrations/retrievers/bm25/) retriever integration.
* See the [Elasticsearch](/docs/integrations/retrievers/elasticsearch_retriever/) retriever integration.
:::
#### Vector indexes
Vector indexes are an alternative way to index and store unstructured data.
See our conceptual guide on [vectorstores](/docs/concepts/vectorstores/) for a detailed overview.
In short, rather than using word frequencies, vectorstores use an [embedding model](/docs/concepts/embedding_models/) to compress documents into high-dimensional vector representations.
This allows for efficient similarity search over embedding vectors using simple mathematical operations like cosine similarity.
:::info[Further reading]
* See our [how-to guide](/docs/how_to/vectorstore_retriever/) for more details on working with vectorstores.
* See our [list of vectorstore integrations](/docs/integrations/vectorstores/).
* See Cameron Wolfe's [blog post](https://cameronrwolfe.substack.com/p/the-basics-of-ai-powered-vector-search?utm_source=profile&utm_medium=reader2) on the basics of vector search.
:::
#### Relational databases
Relational databases are a fundamental type of structured data storage used in many applications.
They organize data into tables with predefined schemas, where each table represents an entity or relationship.
Data is stored in rows (records) and columns (attributes), allowing for efficient querying and manipulation through SQL (Structured Query Language).
Relational databases excel at maintaining data integrity, supporting complex queries, and handling relationships between different data entities.
:::info[Further reading]
* See our [tutorial](/docs/tutorials/sql_qa/) for working with SQL databases.
* See our [SQL database toolkit](/docs/integrations/tools/sql_database/).
:::
#### Graph databases
Graph databases are a specialized type of database designed to store and manage highly interconnected data.
Unlike traditional relational databases, graph databases use a flexible structure consisting of nodes (entities), edges (relationships), and properties.
This structure allows for efficient representation and querying of complex, interconnected data.
Graph databases store data in a graph structure, with nodes, edges, and properties.
They are particularly useful for storing and querying complex relationships between data points, such as social networks, supply-chain management, fraud detection, and recommendation services.
:::info[Further reading]
* See our [tutorial](/docs/tutorials/graph/) for working with graph databases.
* See our [list of graph database integrations](/docs/integrations/graphs/).
* See Neo4j's [starter kit for LangChain](https://neo4j.com/developer-blog/langchain-neo4j-starter-kit/).
:::
### Retriever
LangChain provides a unified interface for interacting with various retrieval systems through the [retriever](/docs/concepts/retrievers/) concept. The interface is straightforward:
1. Input: A query (string)
2. Output: A list of documents (standardized LangChain [Document](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html) objects)
You can create a retriever using any of the retrieval systems mentioned earlier. The query analysis techniques we discussed are particularly useful here, as they enable natural language interfaces for databases that typically require structured query languages.
For example, you can build a retriever for a SQL database using text-to-SQL conversion. This allows a natural language query (string) to be transformed into a SQL query behind the scenes.
Regardless of the underlying retrieval system, all retrievers in LangChain share a common interface. You can use them with the simple `invoke` method:
```python
docs = retriever.invoke(query)
```
:::info[Further reading]
* See our [conceptual guide on retrievers](/docs/concepts/retrievers/).
* See our [how-to guide](/docs/how_to/#retrievers) on working with retrievers.
:::

View File

@@ -0,0 +1,145 @@
# Retrievers
<span data-heading-keywords="retriever,retrievers"></span>
:::info[Prerequisites]
* [Vectorstores](/docs/concepts/vectorstores/)
* [Embeddings](/docs/concepts/embedding_models/)
* [Text splitters](/docs/concepts/text_splitters/)
:::
## Overview
Many different types of retrieval systems exist, including vectorstores, graph databases, and relational databases.
With the rise in popularity of large language models, retrieval systems have become an important component of AI applications (e.g., [RAG](/docs/concepts/rag/)).
Because of their importance and variability, LangChain provides a uniform interface for interacting with different types of retrieval systems.
The LangChain [retriever](/docs/concepts/retrievers/) interface is straightforward:
1. Input: A query (string)
2. Output: A list of documents (standardized LangChain [Document](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html) objects)
## Key concept
![Retriever](/img/retriever_concept.png)
All retrievers implement a simple interface for retrieving documents using natural language queries.
## Interface
The only requirement for a retriever is the ability to accept a query and return documents.
In particular, LangChain's retriever class only requires that the `_get_relevant_documents` method is implemented, which takes a `query: str` and returns a list of [Document](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html) objects that are most relevant to the query.
The underlying logic used to get relevant documents is specified by the retriever and can be whatever is most useful for the application.
A LangChain retriever is a [runnable](/docs/how_to/lcel_cheatsheet/), which is a standard interface for LangChain components.
This means that it has a few common methods, including `invoke`, that are used to interact with it. A retriever can be invoked with a query:
```python
docs = retriever.invoke(query)
```
Retrievers return a list of [Document](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html) objects, which have two attributes:
* `page_content`: The content of this document. Currently this is a string.
* `metadata`: Arbitrary metadata associated with this document (e.g., document id, file name, source, etc).
:::info[Further reading]
* See our [how-to guide](/docs/how_to/custom_retriever/) on building your own custom retriever.
:::
## Common types
Despite the flexibility of the retriever interface, a few common types of retrieval systems are frequently used.
### Search APIs
It's important to note that retrievers don't need to actually *store* documents.
For example, we can build retrievers on top of search APIs that simply return search results!
See our retriever integrations with [Amazon Kendra](https://python.langchain.com/docs/integrations/retrievers/amazon_kendra_retriever/) or [Wikipedia Search](https://python.langchain.com/docs/integrations/retrievers/wikipedia/).
### Relational or Graph Database
Retrievers can be built on top of relational or graph databases.
In these cases, [query analysis](/docs/concepts/retrieval/) techniques that construct a structured query from natural language are critical.
For example, you can build a retriever for a SQL database using text-to-SQL conversion. This allows a natural language query (string) to be transformed into a SQL query behind the scenes.
:::info[Further reading]
* See our [tutorial](/docs/tutorials/sql_qa/) for context on how to build a retriever using a SQL database and text-to-SQL.
* See our [tutorial](/docs/tutorials/graph/) for context on how to build a retriever using a graph database and text-to-Cypher.
:::
### Lexical Search
As discussed in our conceptual review of [retrieval](/docs/concepts/retrieval/), many search engines are based upon matching words in a query to the words in each document.
[BM25](https://en.wikipedia.org/wiki/Okapi_BM25#:~:text=BM25%20is%20a%20bag%2Dof,slightly%20different%20components%20and%20parameters.) and [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) are [two popular lexical search algorithms](https://cameronrwolfe.substack.com/p/the-basics-of-ai-powered-vector-search?utm_source=profile&utm_medium=reader2).
LangChain has retrievers for many popular lexical search algorithms / engines.
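For example, here is a minimal sketch using the BM25 retriever integration (this assumes the `rank_bm25` package is installed):

```python
from langchain_community.retrievers import BM25Retriever

retriever = BM25Retriever.from_texts(
    [
        "LangChain provides a unified retriever interface.",
        "BM25 ranks documents using word frequencies.",
    ]
)
docs = retriever.invoke("How does BM25 rank documents?")
```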
:::info[Further reading]
* See the [BM25](/docs/integrations/retrievers/bm25/) retriever integration.
* See the [TF-IDF](/docs/integrations/retrievers/tf_idf/) retriever integration.
* See the [Elasticsearch](/docs/integrations/retrievers/elasticsearch_retriever/) retriever integration.
:::
### Vectorstore
[Vectorstores](/docs/concepts/vectorstores/) are a powerful and efficient way to index and retrieve unstructured data.
A vectorstore can be used as a retriever by calling the `as_retriever()` method.
```python
vectorstore = MyVectorStore()
retriever = vectorstore.as_retriever()
```
## Advanced retrieval patterns
### Ensemble
Because the retriever interface is so simple, returning a list of `Document` objects given a search query, it is possible to combine multiple retrievers using ensembling.
This is particularly useful when you have multiple retrievers that are good at finding different types of relevant documents.
It is easy to create an [ensemble retriever](/docs/how_to/ensemble_retriever/) that combines multiple retrievers with linear weighted scores:
```python
from langchain.retrievers import EnsembleRetriever

# initialize the ensemble retriever
ensemble_retriever = EnsembleRetriever(
retrievers=[bm25_retriever, vector_store_retriever], weights=[0.5, 0.5]
)
```
When ensembling, how do we combine search results from many retrievers?
This motivates the concept of re-ranking, which takes the output of multiple retrievers and combines them using a more sophisticated algorithm such as [Reciprocal Rank Fusion (RRF)](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf).
### Source Document Retention
Many retrievers utilize some kind of index to make documents easily searchable.
The process of indexing can include a transformation step (e.g., vectorstores often use document splitting).
Whatever transformation is used, it can be very useful to retain a link between the *transformed document* and the original, giving the retriever the ability to return the *original* document.
![Retrieval with full docs](/img/retriever_full_docs.png)
This is particularly useful in AI applications, because it ensures no loss in document context for the model.
For example, you may use small chunk size for indexing documents in a vectorstore.
If you return *only* the chunks as the retrieval result, then the model will have lost the original document context for the chunks.
LangChain has two different retrievers that can be used to address this challenge.
The [Multi-Vector](/docs/how_to/multi_vector/) retriever allows the user to use any document transformation (e.g., use an LLM to write a summary of the document) for indexing while retaining linkage to the source document.
The [ParentDocument](/docs/how_to/parent_document_retriever/) retriever links document chunks from a text-splitter transformation for indexing while retaining linkage to the source document.
| Name | Index Type | Uses an LLM | When to Use | Description |
|-----------------------------------------------------------|-------------------------------|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [ParentDocument](/docs/how_to/parent_document_retriever/) | Vector store + Document Store | No | If your pages have lots of smaller pieces of distinct information that are best indexed by themselves, but best retrieved all together. | This involves indexing multiple chunks for each document. Then you find the chunks that are most similar in embedding space, but you retrieve the whole parent document and return that (rather than individual chunks). |
| [Multi Vector](/docs/how_to/multi_vector/) | Vector store + Document Store | Sometimes during indexing | If you are able to extract information from documents that you think is more relevant to index than the text itself. | This involves creating multiple vectors for each document. Each vector could be created in a myriad of ways - examples include summaries of the text and hypothetical questions. |
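As a hedged sketch of the ParentDocument pattern (assuming an existing `vectorstore` and a list of `Document` objects named `docs`):

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,                                        # indexes the small child chunks
    docstore=InMemoryStore(),                                       # retains the original parent documents
    child_splitter=RecursiveCharacterTextSplitter(chunk_size=200),  # how chunks are produced
)
retriever.add_documents(docs)  # splits, indexes, and links chunks back to their parents
```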
:::info[Further reading]
* See our [how-to guide](/docs/how_to/parent_document_retriever/) on using the ParentDocument retriever.
* See our [how-to guide](/docs/how_to/multi_vector/) on using the MultiVector retriever.
* See our RAG from Scratch video on the [multi vector retriever](https://youtu.be/gTCU9I6QqCE?feature=shared).
:::

View File

@@ -0,0 +1,352 @@
# Runnable Interface
The Runnable interface is foundational for working with LangChain components, and it's implemented across many of them, such as [language models](/docs/concepts/chat_models), [output parsers](/docs/concepts/output_parsers), [retrievers](/docs/concepts/retrievers), [compiled LangGraph graphs](https://langchain-ai.github.io/langgraph/concepts/low_level/#compiling-your-graph) and more.
This guide covers the main concepts and methods of the Runnable interface, which allows developers to interact with various LangChain components in a consistent and predictable manner.
:::info Related Resources
* The ["Runnable" Interface API Reference](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable) provides a detailed overview of the Runnable interface and its methods.
* A list of built-in `Runnables` can be found in the [LangChain Core API Reference](https://python.langchain.com/api_reference/core/runnables.html). Many of these Runnables are useful when composing custom "chains" in LangChain using the [LangChain Expression Language (LCEL)](/docs/concepts/lcel).
:::
## Overview of Runnable Interface
The Runnable interface defines a standard set of methods that allow a Runnable component to be:
* [Invoked](/docs/how_to/lcel_cheatsheet/#invoke-a-runnable): A single input is transformed into an output.
* [Batched](/docs/how_to/lcel_cheatsheet/#batch-a-runnable/): Multiple inputs are efficiently transformed into outputs.
* [Streamed](/docs/how_to/lcel_cheatsheet/#stream-a-runnable): Outputs are streamed as they are produced.
* Inspected: Schematic information about Runnable's input, output, and configuration can be accessed.
* Composed: Multiple Runnables can be composed to work together using [the LangChain Expression Language (LCEL)](/docs/concepts/lcel) to create complex pipelines.
Please review the [LCEL Cheatsheet](/docs/how_to/lcel_cheatsheet) for some common patterns that involve the Runnable interface and LCEL expressions.
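As a minimal sketch of these methods on a toy runnable (a `RunnableLambda` wrapping a plain function):

```python
from langchain_core.runnables import RunnableLambda

runnable = RunnableLambda(lambda x: x * 2)

runnable.invoke(3)            # 6
runnable.batch([1, 2, 3])     # [2, 4, 6]

for chunk in runnable.stream(4):
    print(chunk)              # 8 -- a single chunk for this simple runnable
```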
<a id="batch"></a>
### Optimized Parallel Execution (Batch)
<span data-heading-keywords="batch"></span>
LangChain Runnables offer a built-in `batch` (and `batch_as_completed`) API that allow you to process multiple inputs in parallel.
Using these methods can significantly improve performance when needing to process multiple independent inputs, as the
processing can be done in parallel instead of sequentially.
The two batching options are:
* `batch`: Process multiple inputs in parallel, returning results in the same order as the inputs.
* `batch_as_completed`: Process multiple inputs in parallel, returning results as they complete. Results may arrive out of order, but each includes the input index for matching.
The default implementation of `batch` and `batch_as_completed` use a thread pool executor to run the `invoke` method in parallel. This allows for efficient parallel execution without the need for users to manage threads, and speeds up code that is I/O-bound (e.g., making API requests, reading files, etc.). It will not be as effective for CPU-bound operations, as the GIL (Global Interpreter Lock) in Python will prevent true parallel execution.
Some Runnables may provide their own implementations of `batch` and `batch_as_completed` that are optimized for their specific use case (e.g.,
rely on a `batch` API provided by a model provider).
:::note
The async versions, `abatch` and `abatch_as_completed`, rely on asyncio's [gather](https://docs.python.org/3/library/asyncio-task.html#asyncio.gather) and [as_completed](https://docs.python.org/3/library/asyncio-task.html#asyncio.as_completed) functions to run the `ainvoke` method in parallel.
:::
:::tip
When processing a large number of inputs using `batch` or `batch_as_completed`, users may want to control the maximum number of parallel calls. This can be done by setting the `max_concurrency` attribute in the `RunnableConfig` dictionary. See the [RunnableConfig](/docs/concepts/runnables#runnableconfig) for more information.
Chat Models also have a built-in [rate limiter](/docs/concepts/chat_models#rate-limiting) that can be used to control the rate at which requests are made.
:::
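For illustration, here is a minimal sketch that processes several inputs in parallel, caps concurrency, and consumes results as they complete (the 0.5 second sleep stands in for an I/O-bound call):

```python
import time

from langchain_core.runnables import RunnableLambda

def slow_double(x: int) -> int:
    time.sleep(0.5)  # simulate an I/O-bound call such as an API request
    return x * 2

runnable = RunnableLambda(slow_double)

# Results are yielded as (input index, output) pairs as each call finishes
for idx, result in runnable.batch_as_completed(
    [1, 2, 3, 4],
    config={"max_concurrency": 2},  # at most two calls run at the same time
):
    print(idx, result)
```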
### Asynchronous Support
<span data-heading-keywords="async-api"></span>
Runnables expose an asynchronous API, allowing them to be called using the `await` syntax in Python. Asynchronous methods can be identified by the "a" prefix (e.g., `ainvoke`, `abatch`, `astream`, `abatch_as_completed`).
Please refer to the [Async Programming with LangChain](/docs/concepts/async) guide for more details.
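A minimal sketch of the async API on a toy runnable:

```python
import asyncio

from langchain_core.runnables import RunnableLambda

runnable = RunnableLambda(lambda x: x * 2)

async def main() -> None:
    print(await runnable.ainvoke(5))         # 10
    print(await runnable.abatch([1, 2, 3]))  # [2, 4, 6]

asyncio.run(main())
```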
## Streaming APIs
<span data-heading-keywords="streaming-api"></span>
Streaming is critical in making applications based on LLMs feel responsive to end-users.
Runnables expose the following three streaming APIs:
1. sync [stream](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable.stream) and async [astream](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable.astream): yields the output of a Runnable as it is generated.
2. The async `astream_events`: a more advanced streaming API that allows streaming intermediate steps and final output.
3. The **legacy** async `astream_log`: a legacy streaming API that streams intermediate steps and final output.
Please refer to the [Streaming Conceptual Guide](/docs/concepts/streaming) for more details on how to stream in LangChain.
## Input and Output Types
Every `Runnable` is characterized by an input and output type. These input and output types can be any Python object, and are defined by the Runnable itself.
Runnable methods that result in the execution of the Runnable (e.g., `invoke`, `batch`, `stream`, `astream_events`) work with these input and output types.
* invoke: Accepts an input and returns an output.
* batch: Accepts a list of inputs and returns a list of outputs.
* stream: Accepts an input and returns a generator that yields outputs.
The **input type** and **output type** vary by component:
| Component | Input Type | Output Type |
|--------------|--------------------------------------------------|-----------------------|
| Prompt | dictionary | PromptValue |
| ChatModel | a string, list of chat messages or a PromptValue | ChatMessage |
| LLM | a string, list of chat messages or a PromptValue | String |
| OutputParser | the output of an LLM or ChatModel | Depends on the parser |
| Retriever | a string | List of Documents |
| Tool | a string or dictionary, depending on the tool | Depends on the tool |
Please refer to the individual component documentation for more information on the input and output types and how to use them.
### Inspecting Schemas
:::note
This is an advanced feature that is unnecessary for most users. You should probably
skip this section unless you have a specific need to inspect the schema of a Runnable.
:::
In some advanced uses, you may want to programmatically **inspect** the Runnable and determine what input and output types the Runnable expects and produces.
The Runnable interface provides methods to get the [JSON Schema](https://json-schema.org/) of the input and output types of a Runnable, as well as [Pydantic schemas](https://docs.pydantic.dev/latest/) for the input and output types.
These APIs are mostly used internally for unit-testing and by [LangServe](/docs/concepts/langserve) which uses the APIs for input validation and generation of [OpenAPI documentation](https://www.openapis.org/).
In addition to the input and output types, some Runnables have been set up with additional run time configuration options.
There are corresponding APIs to get the Pydantic Schema and JSON Schema of the configuration options for the Runnable.
Please see the [Configurable Runnables](#configurable-runnables) section for more information.
| Method | Description |
|-------------------------|------------------------------------------------------------------|
| `get_input_schema` | Gives the Pydantic Schema of the input schema for the Runnable. |
| `get_output_schema` | Gives the Pydantic Schema of the output schema for the Runnable. |
| `config_schema` | Gives the Pydantic Schema of the config schema for the Runnable. |
| `get_input_jsonschema` | Gives the JSONSchema of the input schema for the Runnable. |
| `get_output_jsonschema` | Gives the JSONSchema of the output schema for the Runnable. |
| `get_config_jsonschema` | Gives the JSONSchema of the config schema for the Runnable. |
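As a minimal sketch, assuming `chain` is an existing Runnable (e.g., `prompt | chat_model`), the schemas can be inspected like this:
```python
# Assuming `chain` is an existing Runnable, e.g. chain = prompt | chat_model
print(chain.get_input_jsonschema())   # JSON Schema describing the expected input
print(chain.get_output_jsonschema())  # JSON Schema describing the produced output
print(chain.get_input_schema())       # Pydantic model describing the expected input
```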
#### with_types
LangChain will automatically try to infer the input and output types of a Runnable based on available information.
Currently, this inference does not work well for more complex Runnables that are built using [LCEL](/docs/concepts/lcel) composition, and the inferred input and / or output types may be incorrect. In these cases, we recommend that users override the inferred input and output types using the `with_types` method ([API Reference](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable.with_types)).
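As a rough sketch, the types below are illustrative; substitute the types your chain actually accepts and produces:
```python
from typing import List

# Assuming `chain` is an LCEL chain whose types were inferred incorrectly
chain = chain.with_types(input_type=List[int], output_type=str)
```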
## RunnableConfig
Any of the methods that are used to execute the runnable (e.g., `invoke`, `batch`, `stream`, `astream_events`) accept a second argument called
`RunnableConfig` ([API Reference](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.config.RunnableConfig.html#runnableconfig)). This argument is a dictionary that contains configuration for the Runnable that will be used
at run time during the execution of the runnable.
A `RunnableConfig` can have any of the following properties defined:
| Attribute | Description |
|-----------------|--------------------------------------------------------------------------------------------|
| run_name | Name used for the given Runnable (not inherited). |
| run_id          | Unique identifier for this call. Sub-calls will get their own unique run IDs.               |
| tags | Tags for this call and any sub-calls. |
| metadata | Metadata for this call and any sub-calls. |
| callbacks | Callbacks for this call and any sub-calls. |
| max_concurrency | Maximum number of parallel calls to make (e.g., used by batch). |
| recursion_limit | Maximum number of times a call can recurse (e.g., used by Runnables that return Runnables) |
| configurable | Runtime values for configurable attributes of the Runnable. |
Passing `config` to the `invoke` method is done like so:
```python
some_runnable.invoke(
some_input,
config={
'run_name': 'my_run',
'tags': ['tag1', 'tag2'],
'metadata': {'key': 'value'}
}
)
```
### Propagation of RunnableConfig
Many `Runnables` are composed of other Runnables, and it is important that the `RunnableConfig` is propagated to all sub-calls made by the Runnable. This allows providing run time configuration values to the parent Runnable that are inherited by all sub-calls.
If this were not the case, it would be impossible to set and propagate [callbacks](/docs/concepts/callbacks) or other configuration values like `tags` and `metadata` which
are expected to be inherited by all sub-calls.
There are two main patterns by which new `Runnables` are created:
1. Declaratively using [LangChain Expression Language (LCEL)](/docs/concepts/lcel):
```python
chain = prompt | chat_model | output_parser
```
2. Using a [custom Runnable](#custom-runnables) (e.g., `RunnableLambda`) or using the `@tool` decorator:
```python
def foo(input):
# Note that .invoke() is used directly here
return bar_runnable.invoke(input)
foo_runnable = RunnableLambda(foo)
```
LangChain will try to propagate `RunnableConfig` automatically for both of the patterns.
For handling the second pattern, LangChain relies on Python's [contextvars](https://docs.python.org/3/library/contextvars.html).
In Python 3.11 and above, this works out of the box, and you do not need to do anything special to propagate the `RunnableConfig` to the sub-calls.
In Python 3.9 and 3.10, if you are using **async code**, you need to manually pass the `RunnableConfig` through to the `Runnable` when invoking it.
This is due to a limitation in [asyncio's tasks](https://docs.python.org/3/library/asyncio-task.html#asyncio.create_task) in Python 3.9 and 3.10, which did
not accept a `context` argument.
Propagating the `RunnableConfig` manually is done like so:
```python
async def foo(input, config): # <-- Note the config argument
return await bar_runnable.ainvoke(input, config=config)
foo_runnable = RunnableLambda(foo)
```
:::caution
When using Python 3.10 or lower and writing async code, `RunnableConfig` cannot be propagated
automatically, and you will need to do it manually! This is a common pitfall when
attempting to stream data using `astream_events` and `astream_log` as these methods
rely on proper propagation of [callbacks](/docs/concepts/callbacks) defined inside of `RunnableConfig`.
:::
### Setting Custom Run Name, Tags, and Metadata
The `run_name`, `tags`, and `metadata` attributes of the `RunnableConfig` dictionary can be used to set custom values for the run name, tags, and metadata for a given Runnable.
The `run_name` is a string that can be used to set a custom name for the run. This name will be used in logs and other places to identify the run. It is not inherited by sub-calls.
The `tags` and `metadata` attributes are lists and dictionaries, respectively, that can be used to set custom tags and metadata for the run. These values are inherited by sub-calls.
Using these attributes can be useful for tracking and debugging runs, as they will be surfaced in [LangSmith](https://docs.smith.langchain.com/) as trace attributes that you can
filter and search on.
The attributes will also be propagated to [callbacks](/docs/concepts/callbacks), and will appear in streaming APIs like [astream_events](/docs/concepts/streaming) as part of each event in the stream.
:::note Related
* [How-to trace with LangChain](https://docs.smith.langchain.com/how_to_guides/tracing/trace_with_langchain)
:::
### Setting Run ID
:::note
This is an advanced feature that is unnecessary for most users.
:::
You may need to set a custom `run_id` for a given run, in case you want
to reference it later or correlate it with other systems.
The `run_id` MUST be a valid UUID string and **unique** for each run. It is used to identify
the parent run; sub-calls will get their own unique run IDs automatically.
To set a custom `run_id`, you can pass it as a key-value pair in the `config` dictionary when invoking the Runnable:
```python
import uuid
run_id = uuid.uuid4()
some_runnable.invoke(
some_input,
config={
'run_id': run_id
}
)
# do something with the run_id
```
### Setting Recursion Limit
:::note
This is an advanced feature that is unnecessary for most users.
:::
Some Runnables may return other Runnables, which can lead to infinite recursion if not handled properly. To prevent this, you can set a `recursion_limit` in the `RunnableConfig` dictionary. This will limit the number of times a Runnable can recurse.
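A recursion limit is set like any other `RunnableConfig` value:
```python
some_runnable.invoke(
    some_input,
    config={
        'recursion_limit': 10  # allow at most 10 levels of recursion
    }
)
```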
### Setting Max Concurrency
If using the `batch` or `batch_as_completed` methods, you can set the `max_concurrency` attribute in the `RunnableConfig` dictionary to control the maximum number of parallel calls to make. This can be useful when you want to limit the number of parallel calls to prevent overloading a server or API.
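For example, to process a list of inputs with at most five parallel calls (assuming `some_runnable` and `list_of_inputs` already exist):
```python
some_runnable.batch(
    list_of_inputs,
    config={
        'max_concurrency': 5  # at most 5 inputs are processed in parallel
    }
)
```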
:::tip
If you're trying to rate limit the number of requests made by a **Chat Model**, you can use the built-in [rate limiter](/docs/concepts/chat_models#rate-limiting) instead of setting `max_concurrency`, which will be more effective.
See the [How to handle rate limits](https://python.langchain.com/docs/how_to/chat_model_rate_limiting/) guide for more information.
:::
### Setting configurable
The `configurable` field is used to pass runtime values for configurable attributes of the Runnable.
It is used frequently in [LangGraph](/docs/concepts/langgraph) with
[LangGraph Persistence](https://langchain-ai.github.io/langgraph/concepts/persistence/)
and [memory](https://langchain-ai.github.io/langgraph/concepts/memory/).
It is used for a similar purpose in [RunnableWithMessageHistory](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.history.RunnableWithMessageHistory.html#langchain_core.runnables.history.RunnableWithMessageHistory) to specify
a `session_id` or `conversation_id` to keep track of conversation history.
In addition, you can use it to specify any custom configuration options to pass to any [Configurable Runnable](#configurable-runnables) that you create.
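As a rough sketch, here is how a `session_id` might be passed through the `configurable` field; the `with_message_history` variable is assumed to be an existing `RunnableWithMessageHistory`:
```python
with_message_history.invoke(
    {"input": "What is my name?"},
    config={
        'configurable': {'session_id': 'abc123'}
    }
)
```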
### Setting Callbacks
Use this option to configure [callbacks](/docs/concepts/callbacks) for the runnable at
runtime. The callbacks will be passed to all sub-calls made by the runnable.
```python
some_runnable.invoke(
some_input,
{
"callbacks": [
SomeCallbackHandler(),
AnotherCallbackHandler(),
]
}
)
```
Please read the [Callbacks Conceptual Guide](/docs/concepts/callbacks) for more information on how to use callbacks in LangChain.
:::important
If you're using Python 3.9 or 3.10 in an async environment, you must propagate
the `RunnableConfig` manually to sub-calls in some cases. Please see the
[Propagating RunnableConfig](#propagation-of-runnableconfig) section for more information.
:::
## Creating a Runnable from a function
You may need to create a custom Runnable that runs arbitrary logic. This is especially
useful if using [LangChain Expression Language (LCEL)](/docs/concepts/lcel) to compose
multiple Runnables and you need to add custom processing logic in one of the steps.
There are two ways to create a custom Runnable from a function:
* `RunnableLambda`: Use this for simple transformations where streaming is not required.
* `RunnableGenerator`: Use this for more complex transformations when streaming is needed.
See the [How to run custom functions](/docs/how_to/functions) guide for more information on how to use `RunnableLambda` and `RunnableGenerator`.
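As a minimal sketch, wrapping a plain Python function with `RunnableLambda` turns it into a Runnable that can be composed with other Runnables:
```python
from langchain_core.runnables import RunnableLambda

def add_exclamation(text: str) -> str:
    return text + "!"

exclaim = RunnableLambda(add_exclamation)
exclaim.invoke("hello")  # -> "hello!"
```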
:::important
Users should not try to subclass Runnables to create a new custom Runnable. It is
much more complex and error-prone than simply using `RunnableLambda` or `RunnableGenerator`.
:::
## Configurable Runnables
:::note
This is an advanced feature that is unnecessary for most users.
It helps with configuration of large "chains" created using the [LangChain Expression Language (LCEL)](/docs/concepts/lcel)
and is leveraged by [LangServe](/docs/concepts/langserve) for deployed Runnables.
:::
Sometimes you may want to experiment with, or even expose to the end user, multiple different ways of doing things with your Runnable. This could involve adjusting parameters like the temperature in a chat model or even switching between different chat models.
To simplify this process, the Runnable interface provides two methods for creating configurable Runnables at runtime:
* `configurable_fields`: This method allows you to configure specific **attributes** in a Runnable. For example, the `temperature` attribute of a chat model.
* `configurable_alternatives`: This method enables you to specify **alternative** Runnables that can be run during run time. For example, you could specify a list of different chat models that can be used.
See the [How to configure runtime chain internals](/docs/how_to/configure) guide for more information on how to configure runtime chain internals.
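As a rough sketch, here is how `configurable_fields` can be used to expose a chat model's `temperature` at runtime; the `model` variable is assumed to be an existing chat model:
```python
from langchain_core.runnables import ConfigurableField

configurable_model = model.configurable_fields(
    temperature=ConfigurableField(
        id="llm_temperature",
        name="LLM temperature",
        description="The temperature of the chat model",
    )
)

# Override the temperature at runtime via the `configurable` field
configurable_model.invoke(
    "Pick a random number",
    config={"configurable": {"llm_temperature": 0.9}},
)
```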


@@ -0,0 +1,90 @@
# Streaming
:::info Prerequisites
* [Runnable Interface](/docs/concepts/runnables)
* [Chat Models](/docs/concepts/chat_models)
:::
:::info Related Resources
Please see the following how-to guides for specific examples of streaming in LangChain:
* [How to stream chat models](/docs/how_to/chat_streaming/)
* [How to stream tool calls](/docs/how_to/tool_streaming/)
* [How to stream Runnables](/docs/how_to/streaming/)
:::
Streaming is critical in making applications based on [LLMs](/docs/concepts/chat_models) feel responsive to end-users.
## Why Streaming?
[LLMs](/docs/concepts/chat_models) have noticeable latency on the order of seconds. This is much longer than the typical response time for most APIs, which are usually sub-second. The latency issue compounds quickly
as you build more complex applications that involve multiple calls to a model.
Fortunately, LLMs generate output iteratively, which means it's possible to show sensible intermediate results before the final response is ready. Consuming output as soon as it becomes available has therefore become a vital part of the UX around building apps with LLMs to help alleviate latency issues, and LangChain aims to have first-class support for streaming.
## Streaming APIs
Every LangChain component that implements the [Runnable Interface](/docs/concepts/runnables) supports streaming.
There are three main APIs for streaming in LangChain:
1. sync [stream](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable.stream) and async [astream](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable.astream): yields the output of a Runnable as it is generated.
2. The async [astream_events](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable.astream_events): a streaming API that allows streaming intermediate steps from a Runnable. This API returns a stream of events.
3. The **legacy** async [astream_log](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable.astream_log): This is an advanced streaming API that allows streaming intermediate steps from a Runnable. Users should **not** use this API when writing new code.
### Streaming with LangGraph
[LangGraph](/docs/concepts/langgraph) compiled graphs are [Runnables](/docs/concepts/runnables) and support the same streaming APIs.
In LangGraph, the `stream` and `astream` methods operate in terms of updates to the [graph state](https://langchain-ai.github.io/langgraph/concepts/low_level/#state), and as a result are much more helpful for getting intermediate states of the graph as they are generated.
Please review the [LangGraph streaming guide](https://langchain-ai.github.io/langgraph/concepts/streaming/) for more information on how to stream when working with LangGraph.
### `.stream()` and `.astream()`
<span data-heading-keywords="stream"></span>
:::info Related Resources
* [How to use streaming](/docs/how_to/streaming/#using-stream)
:::
The `.stream()` method returns an iterator, which you can consume with a simple `for` loop. Here's an example with a chat model:
```python
from langchain_anthropic import ChatAnthropic
model = ChatAnthropic(model="claude-3-sonnet-20240229")
for chunk in model.stream("what color is the sky?"):
print(chunk.content, end="|", flush=True)
```
For models (or other components) that don't support streaming natively, this iterator would just yield a single chunk, but
you could still use the same general pattern when calling them. Using `.stream()` will also automatically call the model in streaming mode
without the need to provide additional config.
The type of each outputted chunk depends on the type of component - for example, chat models yield [`AIMessageChunks`](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.ai.AIMessageChunk.html).
Because this method is part of [LangChain Expression Language](/docs/concepts/lcel),
you can handle formatting differences from different outputs using an [output parser](/docs/concepts/output_parsers) to transform
each yielded chunk.
## Dispatching Custom Events
You can dispatch custom [callback events](/docs/concepts/callbacks) if you want to add custom data to the event stream of [astream events](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable.astream_events).
You can use custom events to provide additional information about the progress of a long-running task.
For example, if you have a long-running [tool](/docs/concepts/tools) that involves multiple steps (e.g., multiple API calls), you can dispatch custom events between the steps and use them to monitor progress. You could also surface these custom events to an end user of your application to show them how the current task is progressing.
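As a rough sketch, a long-running tool might dispatch progress events like this (assuming Python 3.11+, where the config is propagated automatically; on earlier versions the `config` must be passed through explicitly):
```python
from langchain_core.callbacks.manager import adispatch_custom_event
from langchain_core.tools import tool

@tool
async def slow_tool(query: str) -> str:
    """A long-running tool that reports its progress."""
    await adispatch_custom_event("progress", {"step": 1, "status": "fetching data"})
    # ... first long-running step ...
    await adispatch_custom_event("progress", {"step": 2, "status": "processing data"})
    # ... second long-running step ...
    return "done"
```
These events then appear in the `astream_events` stream (as custom events) and can be surfaced to the end user.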
:::info Related Resources
* [How to dispatch custom callback events](https://python.langchain.com/docs/how_to/callbacks_custom_events/#astream-events-api)
:::
## Chat Models
### "Auto-Streaming" Chat Models
## Using Astream Events API
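A minimal sketch of consuming `astream_events`, filtering for chat model token chunks (other components emit other event names):
```python
from langchain_anthropic import ChatAnthropic

model = ChatAnthropic(model="claude-3-sonnet-20240229")

async for event in model.astream_events("what color is the sky?", version="v2"):
    if event["event"] == "on_chat_model_stream":
        print(event["data"]["chunk"].content, end="|", flush=True)
```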
### Async throughout
Important LangChain primitives like chat models, output parsers, prompts, retrievers, and agents implement the LangChain Runnable Interface.


@@ -0,0 +1,143 @@
# Structured output
## Overview
For many applications, such as chatbots, models need to respond to users directly in natural language.
However, there are scenarios where we need models to output in a *structured format*.
For example, we might want to store the model output in a database and ensure that the output conforms to the database schema.
This need motivates the concept of structured output, where models can be instructed to respond with a particular output structure.
![Structured output](/img/structured_output.png)
## Key Concepts
**(1) Schema definition:** The output structure is represented as a schema, which can be defined in several ways.
**(2) Returning structured output:** The model is given this schema, and is instructed to return output that conforms to it.
## Recommended usage
This pseudo-code illustrates the recommended workflow when using structured output.
LangChain provides a helper function, `with_structured_output()`, that automates the process of binding the schema to the model and parsing the output.
This helper function is available for all model providers that support structured output.
```python
# Define schema
schema = {"foo": "bar"}
# Bind schema to model
model_with_structure = model.with_structured_output(schema)
# Invoke the model to produce structured output that matches the schema
structured_output = model_with_structure.invoke(user_input)
```
## Schema definition
The central concept is that the output structure of model responses needs to be represented in some way.
While the types of objects you can use depend on the model you're working with, there are common types that are typically allowed or recommended for structured output in Python.
The simplest and most common format for structured output is a JSON-like structure, which in Python can be represented as a dictionary (dict) or list (list).
JSON objects (or dicts in Python) are often used directly when the application requires raw, flexible, and minimal-overhead structured data.
```json
{
"answer": "The answer to the user's question",
"followup_question": "A followup question the user could ask"
}
```
As a second example, [Pydantic](https://docs.pydantic.dev/latest/) is particularly useful for defining structured output schemas because it offers type hints and validation.
Here's an example of a Pydantic schema:
```python
from pydantic import BaseModel, Field
class ResponseFormatter(BaseModel):
"""Always use this tool to structure your response to the user."""
answer: str = Field(description="The answer to the user's question")
followup_question: str = Field(description="A followup question the user could ask")
```
TODO: There are many other ways to define schemas (Dataclasses, TypedDicts, Custom Classes). How many to cover? How many supported by popular model APIs?
## Returning structured output
With a schema defined, we need a way to instruct the model to use it.
While one approach is to include this schema in the prompt and *ask nicely* for the model to use it, this is not recommended.
Several more powerful methods that utilize native features in the model provider's API are available.
### Using Tool Calling
Many [model providers support](/docs/integrations/chat/) tool calling, a concept discussed in more detail in our [tool calling guide](/docs/concepts/tool_calling/).
In short, tool calling involves binding a tool to a model and, when appropriate, the model can *decide* to call this tool and ensure its response conforms to the tool's schema.
With this in mind, the central concept is straightforward: *simply bind our schema to a model as a tool!*
Here is an example using the `ResponseFormatter` schema defined above:
```python
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-4o", temperature=0)
# Bind ResponseFormatter schema as a tool to the model
model_with_tools = model.bind_tools([ResponseFormatter])
# Invoke the model
ai_msg = model_with_tools.invoke("What is the powerhouse of the cell?")
```
The arguments of the tool call are already extracted as a dictionary.
This dictionary can be optionally parsed into a Pydantic object, matching our original `ResponseFormatter` schema.
```python
# Get the tool call arguments
ai_msg.tool_calls[0]["args"]
{'answer': "The powerhouse of the cell is the mitochondrion. Mitochondria are organelles that generate most of the cell's supply of adenosine triphosphate (ATP), which is used as a source of chemical energy.",
'followup_question': 'What is the function of ATP in the cell?'}
# Parse the dictionary into a Pydantic object
pydantic_object = ResponseFormatter.model_validate(ai_msg.tool_calls[0]["args"])
```
### JSON mode
In addition to tool calling, some model providers support a feature called `JSON mode`.
This accepts a JSON schema definition as input and constrains the model to produce conforming JSON output.
You can find a table of model providers that support JSON mode [here](/docs/integrations/chat/).
Here is an example of how to use JSON mode with OpenAI:
```python
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-4o", model_kwargs={ "response_format": { "type": "json_object" } })
ai_msg = model.invoke("Return a JSON object with key 'random_ints' and a value of 10 random ints in [0-99]")
ai_msg.content
'\n{\n "random_ints": [23, 47, 89, 15, 34, 76, 58, 3, 62, 91]\n}'
```
One important point to flag: the model *still* returns a string, which needs to be parsed into a JSON object.
This parsing can be done simply with the `json` library, or with a JSON output parser if you need more advanced functionality.
See this [how-to guide on the JSON output parser](/docs/how_to/output_parser_json) for more details.
```python
import json
json_object = json.loads(ai_msg.content)
{'random_ints': [23, 47, 89, 15, 34, 76, 58, 3, 62, 91]}
```
## LangChain helper
There are a few challenges when producing structured output with the above methods: (1) When using tool calling, tool call arguments need to be parsed from a dictionary back to the original schema.
(2) In addition, the model needs to be instructed to *always* use the tool when we want to enforce structured output, which is a provider-specific setting. (3) If using JSON mode, the output needs to be parsed into a JSON object.
With these challenges in mind, LangChain provides a helper function (`with_structured_output()`) to streamline the process.
![Diagram of with structured output](/img/with_structured_output.png)
This both binds the schema to the model as a tool and parses the output to the specified output schema.
```python
# Bind the schema to the model
model_with_structure = model.with_structured_output(ResponseFormatter)
# Invoke the model
structured_output = model_with_structure.invoke("What is the powerhouse of the cell?")
# Get back the Pydantic object
structured_output
ResponseFormatter(answer="The powerhouse of the cell is the mitochondrion. Mitochondria are organelles that generate most of the cell's supply of adenosine triphosphate (ATP), which is used as a source of chemical energy.", followup_question='What is the function of ATP in the cell?')
```
TODO: We need to explain the choice of implementation under the hood. Seems to be set with the `method` argument. How is default chosen? What if provider only has JSON mode? What inputs schemas are supported?
For more details on usage, see our [how-to guide](/docs/how_to/structured_output/#the-with_structured_output-method).


@@ -0,0 +1,3 @@
# Structured Outputs
Placeholder


@@ -0,0 +1,135 @@
# Text Splitters
<span data-heading-keywords="embedding,embeddings"></span>
:::info[Prerequisites]
* [Documents](/docs/concepts/retrievers/#interface)
* [Tokenization](/docs/concepts/tokens)
:::
## Overview
Document splitting is often a crucial preprocessing step for many applications.
It involves breaking down large texts into smaller, manageable chunks.
This process offers several benefits, such as ensuring consistent processing of varying document lengths, overcoming input size limitations of models, and improving the quality of text representations used in retrieval systems.
There are several strategies for splitting documents, each with its own advantages.
## Key concepts
![Conceptual Overview](/img/text_splitters.png)
Text splitters split documents into smaller chunks for use in downstream applications.
## Why split documents?
There are several reasons to split documents:
- **Handling non-uniform document lengths**: Real-world document collections often contain texts of varying sizes. Splitting ensures consistent processing across all documents.
- **Overcoming model limitations**: Many embedding models and language models have maximum input size constraints. Splitting allows us to process documents that would otherwise exceed these limits.
- **Improving representation quality**: For longer documents, the quality of embeddings or other representations may degrade as they try to capture too much information. Splitting can lead to more focused and accurate representations of each section.
- **Enhancing retrieval precision**: In information retrieval systems, splitting can improve the granularity of search results, allowing for more precise matching of queries to relevant document sections.
- **Optimizing computational resources**: Working with smaller chunks of text can be more memory-efficient and allow for better parallelization of processing tasks.
Now, the next question is *how* to split the documents into chunks! There are several strategies, each with its own advantages.
:::info[Further reading]
* See Greg Kamradt's [chunkviz](https://chunkviz.up.railway.app/) to visualize different splitting strategies discussed below.
:::
## Approaches
### Length-based
The most intuitive strategy is to split documents based on their length. This simple yet effective approach ensures that each chunk doesn't exceed a specified size limit.
Key benefits of length-based splitting:
- Straightforward implementation
- Consistent chunk sizes
- Easily adaptable to different model requirements
Types of length-based splitting:
- **Token-based**: Splits text based on the number of tokens, which is useful when working with language models.
- **Character-based**: Splits text based on the number of characters, which can be more consistent across different types of text.
Example implementation using LangChain's `CharacterTextSplitter` with token-based splitting:
```python
from langchain_text_splitters import CharacterTextSplitter
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
encoding_name="cl100k_base", chunk_size=100, chunk_overlap=0
)
texts = text_splitter.split_text(document)
```
:::info[Further reading]
* See the how-to guide for [token-based](/docs/how_to/split_by_token/) splitting.
* See the how-to guide for [character-based](/docs/how_to/character_text_splitter/) splitting.
:::
### Text-structured based
Text is naturally organized into hierarchical units such as paragraphs, sentences, and words.
We can leverage this inherent structure to inform our splitting strategy, creating splits that maintain natural language flow, preserve semantic coherence within each split, and adapt to varying levels of text granularity.
LangChain's [`RecursiveCharacterTextSplitter`](/docs/how_to/recursive_text_splitter/) implements this concept:
- The `RecursiveCharacterTextSplitter` attempts to keep larger units (e.g., paragraphs) intact.
- If a unit exceeds the chunk size, it moves to the next level (e.g., sentences).
- This process continues down to the word level if necessary.
Here is example usage:
```python
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=0)
texts = text_splitter.split_text(document)
```
:::info[Further reading]
* See the how-to guide for [recursive text splitting](/docs/how_to/recursive_text_splitter/).
:::
### Document-structured based
Some documents have an inherent structure, such as HTML, Markdown, or JSON files.
In these cases, it's beneficial to split the document based on its structure, as it often naturally groups semantically related text.
Key benefits of structure-based splitting:
- Preserves the logical organization of the document
- Maintains context within each chunk
- Can be more effective for downstream tasks like retrieval or summarization
Examples of structure-based splitting:
- **Markdown**: Split based on headers (e.g., #, ##, ###)
- **HTML**: Split using tags
- **JSON**: Split by object or array elements
- **Code**: Split by functions, classes, or logical blocks
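As an illustrative sketch, here is Markdown-based splitting with LangChain's `MarkdownHeaderTextSplitter`; the `markdown_document` variable is assumed to hold the raw Markdown text:
```python
from langchain_text_splitters import MarkdownHeaderTextSplitter

headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
]

markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
docs = markdown_splitter.split_text(markdown_document)
```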
:::info[Further reading]
* See the how-to guide for [Markdown splitting](/docs/how_to/markdown_header_metadata_splitter/).
* See the how-to guide for [Recursive JSON splitting](/docs/how_to/recursive_json_splitter/).
* See the how-to guide for [Code splitting](/docs/how_to/code_splitter/).
* See the how-to guide for [HTML splitting](/docs/how_to/HTML_header_metadata_splitter/).
:::
### Semantic meaning based
Unlike the previous methods, semantic-based splitting actually considers the *content* of the text.
While other approaches use document or text structure as proxies for semantic meaning, this method directly analyzes the text's semantics.
There are several ways to implement this, but conceptually the approach is to split text when there are significant changes in text *meaning*.
As an example, we can use a sliding window approach to generate embeddings, and compare the embeddings to find significant differences:
- Start with the first few sentences and generate an embedding.
- Move to the next group of sentences and generate another embedding (e.g., using a sliding window approach).
- Compare the embeddings to find significant differences, which indicate potential "break points" between semantic sections.
This technique helps create chunks that are more semantically coherent, potentially improving the quality of downstream tasks like retrieval or summarization.
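As a sketch, LangChain's experimental `SemanticChunker` implements this idea, using an embedding model to detect break points; the `document` variable is assumed to hold the raw text:
```python
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

text_splitter = SemanticChunker(OpenAIEmbeddings())
docs = text_splitter.create_documents([document])
```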
:::info[Further reading]
* See the how-to guide for [splitting text based on semantic meaning](/docs/how_to/semantic-chunker/).
* See Greg Kamradt's [notebook](https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb) showcasing semantic splitting.
:::


@@ -0,0 +1,58 @@
# Tokens
Modern large language models (LLMs) are typically based on a transformer architecture that processes a sequence of units known as tokens. Tokens are the fundamental elements that models use to break down input and generate output. In this section, we'll discuss what tokens are and how they are used by language models.
## What is a token?
A **token** is the basic unit that a language model reads, processes, and generates. These units can vary based on how the model provider defines them, but in general, they could represent:
* A whole word (e.g., "apple"),
* A part of a word (e.g., "app"),
* Or other linguistic components such as punctuation or spaces.
The way the model tokenizes the input depends on its **tokenizer algorithm**, which converts the input into tokens. Similarly, the model's output comes as a stream of tokens, which is then decoded back into human-readable text.
## How tokens work in language models
The reason language models use tokens is tied to how they understand and predict language. Rather than processing characters or entire sentences directly, language models focus on **tokens**, which represent meaningful linguistic units. Here's how the process works:
1. **Input Tokenization**: When you provide a model with a prompt (e.g., "LangChain is cool!"), the tokenizer algorithm splits the text into tokens. For example, the sentence could be tokenized into parts like `["Lang", "Chain", " is", " cool", "!"]`. Note that token boundaries don't always align with word boundaries.
![](/img/tokenization.png)
2. **Processing**: The transformer architecture behind these models processes tokens sequentially to predict the next token in a sentence. It does this by analyzing the relationships between tokens, capturing context and meaning from the input.
3. **Output Generation**: The model generates new tokens one by one. These output tokens are then decoded back into human-readable text.
Using tokens instead of raw characters allows the model to focus on linguistically meaningful units, which helps it capture grammar, structure, and context more effectively.
## Tokens don't have to be text
Although tokens are most commonly used to represent text, they don't have to be limited to textual data. Tokens can also serve as abstract representations of **multi-modal data**, such as:
- **Images**,
- **Audio**,
- **Video**,
- And other types of data.
At the time of writing, virtually no models support **multi-modal output**, and only a few models can handle **multi-modal inputs** (e.g., text combined with images or audio). However, as advancements in AI continue, we expect **multi-modality** to become much more common. This would allow models to process and generate a broader range of media, significantly expanding the scope of what tokens can represent and how models can interact with diverse types of data.
:::note
In principle, **anything that can be represented as a sequence of tokens** could be modeled in a similar way. For example, **DNA sequences**—which are composed of a series of nucleotides (A, T, C, G)—can be tokenized and modeled to capture patterns, make predictions, or generate sequences. This flexibility allows transformer-based models to handle diverse types of sequential data, further broadening their potential applications across various domains, including bioinformatics, signal processing, and other fields that involve structured or unstructured sequences.
:::
Please see the [multimodality](/docs/concepts/multimodality) section for more information on multi-modal inputs and outputs.
## Why not use characters?
Using tokens instead of individual characters makes models both more efficient and better at understanding context and grammar. Tokens represent meaningful units, like whole words or parts of words, allowing models to capture language structure more effectively than by processing raw characters. Token-level processing also reduces the number of units the model has to handle, leading to faster computation.
In contrast, character-level processing would require handling a much larger sequence of input, making it harder for the model to learn relationships and context. Tokens enable models to focus on linguistic meaning, making them more accurate and efficient in generating responses.
## How tokens correspond to text
Please see this post from [OpenAI](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them) for more details on how tokens are counted and how they correspond to text.
According to the OpenAI post, the approximate token counts for English text are as follows:
* 1 token ~= 4 chars in English
* 1 token ~= ¾ words
* 100 tokens ~= 75 words
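If you want to count tokens for a given string yourself, a tokenizer library such as `tiktoken` (used by OpenAI models) can be used; this is a sketch, and the exact counts depend on the tokenizer:
```python
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
tokens = encoding.encode("LangChain is cool!")
print(len(tokens))              # number of tokens
print(encoding.decode(tokens))  # round-trips back to the original text
```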


@@ -0,0 +1,137 @@
# Tool Calling
:::info[Prerequisites]
* [Tools](/docs/concepts/tools)
* [Chat Models](/docs/concepts/chat_models)
:::
## Overview
Many AI applications interact directly with humans. In these cases, it is appropriate for models to respond in natural language.
But what about cases where we want a model to also interact *directly* with systems, such as databases or an API?
These systems often have a particular input schema; for example, APIs frequently have a required payload structure.
This need motivates the concept of *tool calling*. You can use [tool calling](https://platform.openai.com/docs/guides/function-calling/example-use-cases) to request model responses that match a particular schema.
:::info
You will sometimes hear the term `function calling`. We use this term interchangeably with `tool calling`.
:::
![Conceptual overview of tool calling](/img/tool_calling_concept.png)
## Key Concepts
**(1) Tool Creation:** Use the [@tool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.convert.tool.html) decorator to create a [tool](/docs/concepts/tools). A tool is an association between a function and its schema.
**(2) Tool Binding:** The tool needs to be connected to a model that supports tool calling. This gives the model awareness of the tool and the associated input schema required by the tool.
**(3) Tool Calling:** When appropriate, the model can decide to call a tool and ensure its response conforms to the tool's input schema.
**(4) Tool Execution:** The tool can be executed using the arguments provided by the model.
![Conceptual parts of tool calling](/img/tool_calling_components.png)
## Recommended usage
This pseudo-code illustrates the recommended workflow for using tool calling.
Created tools are passed to the `.bind_tools()` method as a list.
The model can then be called as usual. If a tool call is made, the model's response will contain the tool call arguments.
The tool call arguments can be passed directly to the tool.
```python
# Tool creation
tools = [my_tool]
# Tool binding
model_with_tools = model.bind_tools(tools)
# Tool calling
response = model_with_tools.invoke(user_input)
# Tool execution
tool_output = my_tool.invoke(response.tool_calls[0]["args"])
```
## Tool Creation
The recommended way to create a tool is using the `@tool` decorator.
```python
from langchain_core.tools import tool
@tool
def multiply(a: int, b: int) -> int:
"""Multiply a and b."""
return a * b
```
For more information about tool creation, please see:
* [Conceptual guide on tools](/docs/concepts/tools/)
* [How to create custom tools](https://python.langchain.com/docs/how_to/custom_tools/)
## Tool Binding
[Many model providers](https://platform.openai.com/docs/guides/function-calling) support tool calling.
:::tip
See our [model integration page](/docs/integrations/chat/) for a list of providers that support tool calling.
:::
The central concept to understand is that LangChain provides a standardized interface for connecting tools to models.
The `.bind_tools()` method can be used to specify which tools are available for a model to call.
```python
model_with_tools = model.bind_tools(tools_list)
```
As a specific example, let's take a function `multiply` and bind it as a tool to a model that supports tool calling.
```python
def multiply(a: int, b: int) -> int:
"""Multiply a and b.
Args:
a: first int
b: second int
"""
return a * b
llm_with_tools = tool_calling_model.bind_tools([multiply])
```
## Tool Calling
![Diagram of a tool call by a model](/img/tool_call_example.png)
A key principle of tool calling is that the model decides when to use a tool based on the input's relevance. The model doesn't always need to call a tool.
For example, given an unrelated input, the model would not call the tool:
```python
result = llm_with_tools.invoke("Hello world!")
```
The result would be an `AIMessage` containing the model's response in natural language (e.g., "Hello!").
However, if we pass an input *relevant to the tool*, the model should choose to call it:
```python
result = llm_with_tools.invoke("What is 2 multiplied by 3?")
```
As before, the output `result` will be an `AIMessage`.
But, if the tool was called, `result` will have a `tool_calls` attribute.
This attribute includes everything needed to execute the tool, including the tool name and input arguments:
```
result.tool_calls
[{'name': 'multiply', 'args': {'a': 2, 'b': 3}, 'id': 'xxx', 'type': 'tool_call'}]
```
For more details on usage, see our [how-to guides](/docs/how_to/#tools)!
## Tool execution
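Once the model has produced a tool call, the tool can be executed with the arguments the model supplied. As a sketch, continuing the example above and assuming `multiply` is the `@tool`-decorated tool from the Tool Creation section:
```python
# Extract the tool call produced by the model
tool_call = result.tool_calls[0]

# Execute the tool with the model-generated arguments
tool_output = multiply.invoke(tool_call["args"])  # -> 6

# In recent versions of langchain-core, passing the full tool call instead
# returns a ToolMessage that can be appended to the conversation and sent back to the model.
tool_message = multiply.invoke(tool_call)
```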
## Best practices
When designing tools to be used by a model, it is important to keep in mind that:
* Models that have explicit [tool-calling APIs](/docs/concepts/tool_calling) will be better at tool calling than models that have not been fine-tuned for it.
* Models will perform better if the tools have well-chosen names and descriptions.
* Simple, narrowly scoped tools are easier for models to use than complex tools.
* Asking the model to select from a large list of tools poses challenges for the model.


@@ -0,0 +1,192 @@
# Tools
:::info Prerequisites
- [Chat models](/docs/concepts/chat_models/)
:::
## Overview
The **tool** abstraction in LangChain associates a Python **function** with a **schema** that defines the function's **name**, **description** and **input**.
**Tools** can be passed to [chat models](/docs/concepts/chat_models) that support [tool calling](/docs/concepts/tool_calling) allowing the model to request the execution of a specific function with specific inputs.
## Key Concepts
- Tools are a way to encapsulate a function and its schema in a way that can be passed to a chat model.
- Create tools using the [@tool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.convert.tool.html) decorator, which simplifies the process of tool creation, supporting the following:
- Automatically inferring the tool's **name**, **description** and **inputs**, while also supporting customization.
- Defining tools that return **artifacts** (e.g. images, dataframes, etc.)
- Hiding input arguments from the schema (and hence from the model) using **injected tool arguments**.
## Tool Interface
The tool interface is defined in the [BaseTool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.base.BaseTool.html#langchain_core.tools.base.BaseTool) class which is a subclass of the [Runnable Interface](/docs/concepts/runnables).
The key attributes that correspond to the tool's **schema**:
- **name**: The name of the tool.
- **description**: A description of what the tool does.
- **args**: Property that returns the JSON schema for the tool's arguments.
The key methods to execute the function associated with the **tool**:
- **invoke**: Invokes the tool with the given arguments.
- **ainvoke**: Invokes the tool with the given arguments, asynchronously. Used for [async programming with Langchain](/docs/concepts/async).
## Create tools using the `@tool` decorator
The recommended way to create tools is using the [@tool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.convert.tool.html) decorator. This decorator is designed to simplify the process of tool creation and should be used in most cases. After defining a function, you can decorate it with [@tool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.convert.tool.html) to create a tool that implements the [Tool Interface](#tool-interface).
```python
from langchain_core.tools import tool
@tool
def multiply(a: int, b: int) -> int:
"""Multiply two numbers."""
return a * b
```
For more details on how to create tools, see the [how to create custom tools](/docs/how_to/custom_tools/) guide.
:::note
LangChain has a few other ways to create tools; e.g., by sub-classing the [BaseTool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.base.BaseTool.html#langchain_core.tools.base.BaseTool) class or by using `StructuredTool`. These methods are shown in the [how to create custom tools guide](/docs/how_to/custom_tools/), but
we generally recommend using the `@tool` decorator for most cases.
:::
## Use the tool directly
Once you have defined a tool, you can use it directly by calling the function. For example, to use the `multiply` tool defined above:
```python
multiply.invoke({"a": 2, "b": 3})
```
### Inspect
You can also inspect the tool's schema and other properties:
```python
print(multiply.name) # multiply
print(multiply.description) # Multiply two numbers.
print(multiply.args)
# {
# 'type': 'object',
# 'properties': {'a': {'type': 'integer'}, 'b': {'type': 'integer'}},
# 'required': ['a', 'b']
# }
```
:::note
If you're using pre-built LangChain or LangGraph components like [create_react_agent](https://langchain-ai.github.io/langgraph/reference/prebuilt/#langgraph.prebuilt.chat_agent_executor.create_react_agent), you might not need to interact with tools directly. However, understanding how to use them can be valuable for debugging and testing. Additionally, when building custom LangGraph workflows, you may find it necessary to work with tools directly.
:::
## Configuring the schema
The `@tool` decorator offers additional options to configure the schema of the tool (e.g., modify name, description
or parse the function's doc-string to infer the schema).
Please see the [API reference for @tool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.convert.tool.html) for more details and review the [how to create custom tools](/docs/how_to/custom_tools/) guide for examples.
## Tool artifacts
**Tools** are utilities that can be called by a model, and whose outputs are designed to be fed back to a model. Sometimes, however, there are artifacts of a tool's execution that we want to make accessible to downstream components in our chain or agent, but that we don't want to expose to the model itself. For example if a tool returns a custom object, a dataframe or an image, we may want to pass some metadata about this output to the model without passing the actual output to the model. At the same time, we may want to be able to access this full output elsewhere, for example in downstream tools.
```python
from typing import Any, Tuple

from langchain_core.tools import tool

@tool(response_format="content_and_artifact")
def some_tool(...) -> Tuple[str, Any]:
"""Tool that does something."""
...
return 'Message for chat model', some_artifact
```
See [how to return artifacts from tools](/docs/how_to/tool_artifacts/) for more details.
## Special type annotations
There are a number of special type annotations that can be used in the tool's function signature to configure the run time behavior of the tool.
The following type annotations will end up **removing** the argument from the tool's schema. This can be useful for arguments that should not be exposed to the model and that the model should not be able to control.
- **InjectedToolArg**: Value should be injected manually at runtime using `.invoke` or `.ainvoke`.
- **RunnableConfig**: Pass in the RunnableConfig object to the tool.
- **InjectedState**: Pass in the overall state of the LangGraph graph to the tool.
- **InjectedStore**: Pass in the LangGraph store object to the tool.
You can also use the `Annotated` type with a string literal to provide a **description** for the corresponding argument that **WILL** be exposed in the tool's schema.
- **Annotated[..., "string literal"]** -- Adds a description to the argument that will be exposed in the tool's schema.
### InjectedToolArg
There are cases where certain arguments need to be passed to a tool at runtime but should not be generated by the model itself. For this, we use the `InjectedToolArg` annotation, which allows certain parameters to be hidden from the tool's schema.
For example, if a tool requires a `user_id` to be injected dynamically at runtime, it can be structured in this way:
```python
from langchain_core.tools import tool, InjectedToolArg
@tool
def user_specific_tool(input_data: str, user_id: InjectedToolArg) -> str:
"""Tool that processes input data."""
return f"User {user_id} processed {input_data}"
```
Annotating the `user_id` argument with `InjectedToolArg` tells LangChain that this argument should not be exposed as part of the
tool's schema.
See [how to pass run time values to tools](https://python.langchain.com/docs/how_to/tool_runtime/) for more details on how to use `InjectedToolArg`.
### RunnableConfig
You can use the `RunnableConfig` object to pass custom run time values to tools.
If you need to access the [RunnableConfig](/docs/concepts/runnables/#runnableconfig) object from within a tool, you can do so by adding a `RunnableConfig` annotation to the tool's function signature.
```python
from langchain_core.runnables import RunnableConfig
from langchain_core.tools import tool
@tool
async def some_func(..., config: RunnableConfig) -> ...:
"""Tool that does something."""
# do something with config
...
await some_func.ainvoke(..., config={"configurable": {"value": "some_value"}})
```
The `config` will not be part of the tool's schema and will be injected at runtime with appropriate values.
:::note
You may need to access the `config` object in order to manually propagate it to sub-calls. This happens if you're working with Python 3.9 / 3.10 in an [async](/docs/concepts/async) environment and need to manually propagate the `config` object to sub-calls.
Please read [Propagation of RunnableConfig](/docs/concepts/runnables#propagation-of-runnableconfig) for more details on how to propagate the `RunnableConfig` down the call chain manually (or upgrade to Python 3.11, where this is no longer an issue).
:::
### InjectedState
Please see the [InjectedState](https://langchain-ai.github.io/langgraph/reference/prebuilt/#langgraph.prebuilt.tool_node.InjectedState) documentation for more details.
### InjectedStore
Please see the [InjectedStore](https://langchain-ai.github.io/langgraph/reference/prebuilt/#langgraph.prebuilt.tool_node.InjectedStore) documentation for more details.
## Best practices
When designing tools to be used by models, keep the following in mind:
- Tools that are well-named, correctly-documented and properly type-hinted are easier for models to use.
- Design simple and narrowly scoped tools, as they are easier for models to use correctly.
- Use chat models that support [tool-calling](/docs/concepts/tool_calling) APIs to take advantage of tools.
## Related Resources
See the following resources for more information:
- [API Reference for @tool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.convert.tool.html)
- [How to create custom tools](https://python.langchain.com/docs/how_to/custom_tools/)
- [How to pass run time values to tools](https://python.langchain.com/docs/how_to/tool_runtime/)
- [All LangChain tool how-to guides](https://python.langchain.com/docs/how_to/#tools)
- [Additional how-to guides that show usage with LangGraph](https://langchain-ai.github.io/langgraph/how-tos/tool-calling/)
- For tool integrations, see the [tool integration docs](https://python.langchain.com/docs/integrations/tools/).


@@ -0,0 +1,186 @@
# Vector stores
<span data-heading-keywords="vector,vectorstore,vectorstores,vector store,vector stores"></span>
:::info[Prerequisites]
* [Embeddings](/docs/concepts/embedding_models/)
* [Text splitters](/docs/concepts/text_splitters/)
:::
:::info[Note]
This conceptual overview focuses on text-based indexing and retrieval for simplicity.
However, embedding models can be [multi-modal](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-multimodal-embeddings)
and vectorstores can be used to store and retrieve a variety of data types beyond text.
:::
## Overview
Vectorstores are a powerful and efficient way to index and retrieve unstructured data.
They leverage vector [embeddings](/docs/concepts/embedding_models/), which are numerical representations of unstructured data that capture semantic meaning.
At their core, vectorstores utilize specialized data structures called vector indices.
These indices are designed to perform efficient similarity searches over embedding vectors, allowing for rapid retrieval of relevant information based on semantic similarity rather than exact keyword matches.
## Key concept
![Vectorstores](/img/vectorstores.png)
There are [many different types of vectorstores](/docs/integrations/vectorstores/).
LangChain provides a universal interface for working with them, providing standard methods for common operations.
## Adding documents
Using [Pinecone](https://python.langchain.com/api_reference/pinecone/vectorstores/langchain_pinecone.vectorstore.PineconeVectorStore.html#langchain_pinecone.vectorstores.PineconeVectorStore) as an example, we initialize a vectorstore with the [embedding](/docs/concepts/embedding_models/) model we want to use:
```python
import os

from pinecone import Pinecone
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore
# Initialize Pinecone
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
# Initialize with an embedding model
vector_store = PineconeVectorStore(index=pc.Index(index_name), embedding=OpenAIEmbeddings())
```
Given a vectorstore, we need the ability to add documents to it.
The `add_texts` and `add_documents` methods can be used to add texts (strings) and documents (LangChain [Document](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html) objects) to a vectorstore, respectively.
As an example, we can create a list of [Documents](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html).
`Document` objects all have `page_content` and `metadata` attributes, making them a universal way to store unstructured text and associated metadata.
```python
from langchain_core.documents import Document
document_1 = Document(
page_content="I had chocalate chip pancakes and scrambled eggs for breakfast this morning.",
metadata={"source": "tweet"},
)
document_2 = Document(
page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
metadata={"source": "news"},
)
documents = [document_1, document_2]
```
When we use the `add_documents` method to add the documents to the vectorstore, the vectorstore will use the provided embedding model to create an embedding of each document.
What happens if we add the same document twice?
Many vectorstores support [`upsert`](https://docs.pinecone.io/guides/data/upsert-data) functionality, which combines the functionality of inserting and updating records.
To use this, we simply supply a unique identifier for each document when we add it to the vectorstore using `add_documents` or `add_texts`.
If the record doesn't exist, it inserts a new record.
If the record already exists, it updates the existing record.
```python
from uuid import uuid4

# Given a list of documents and a vector store
uuids = [str(uuid4()) for _ in range(len(documents))]
vector_store.add_documents(documents=documents, ids=uuids)
```
:::info[Further reading]
* See the [full list of LangChain vectorstore integrations](/docs/integrations/vectorstores/).
* See Pinecone's [documentation](https://docs.pinecone.io/guides/data/upsert-data) on the `upsert` method.
:::
## Search
Vectorstores embed and store the documents that we add.
If we pass in a query, the vectorstore will embed the query, perform a similarity search over the embedded documents, and return the most similar ones.
This captures two important concepts: first, there needs to be a way to measure the similarity between the query and *any* [embedded](/docs/concepts/embedding_models/) document.
Second, there needs to be an algorithm to efficiently perform this similarity search across *all* embedded documents.
### Similarity metrics
A critical advantage of embedding vectors is that they can be compared using simple mathematical operations:
- **Cosine Similarity**: Measures the cosine of the angle between two vectors.
- **Euclidean Distance**: Measures the straight-line distance between two points.
- **Dot Product**: Measures the projection of one vector onto another.
The similarity metric can sometimes be selected when initializing the vectorstore.
As an example, Pinecone allows the user to select the [similarity metric on index creation](/docs/integrations/vectorstores/pinecone/#initialization).
```python
pc.create_index(
name=index_name,
dimension=3072,
metric="cosine",
)
```
:::info[Further reading]
* See [this documentation](https://developers.google.com/machine-learning/clustering/dnn-clustering/supervised-similarity) from Google on similarity metrics to consider with embeddings.
* See Pinecone's [blog post](https://www.pinecone.io/learn/vector-similarity/) on similarity metrics.
* See OpenAI's [FAQ](https://platform.openai.com/docs/guides/embeddings/faq) on what similarity metric to use with OpenAI embeddings.
:::
### Similarity search
Given a similarity metric to measure the distance between the embedded query and any embedded document, we need an algorithm to efficiently search over *all* the embedded documents to find the most similar ones.
There are various ways to do this. As an example, many vectorstores implement [HNSW (Hierarchical Navigable Small World)](https://www.pinecone.io/learn/series/faiss/hnsw/), a graph-based index structure that allows for efficient similarity search.
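As an illustration of what such an index looks like outside of LangChain, here is a minimal sketch using the FAISS library directly (assuming `faiss-cpu` and NumPy are installed; the dimensionality, number of vectors, and HNSW parameter are illustrative only):
```python
import faiss
import numpy as np

dim = 128  # embedding dimensionality (illustrative)

# Stand-in for document embeddings; in practice these come from an embedding model
vectors = np.random.rand(1000, dim).astype("float32")

# Build an HNSW index; 32 is the number of neighbors per node in the HNSW graph
index = faiss.IndexHNSWFlat(dim, 32)
index.add(vectors)

# Stand-in for an embedded query; search returns the 4 nearest documents
query = np.random.rand(1, dim).astype("float32")
distances, indices = index.search(query, 4)
```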
Regardless of the search algorithm used under the hood, the LangChain vectorstore interface has a `similarity_search` method for all integrations.
This will take the search query, create an embedding, find similar documents, and return them as a list of [Documents](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html).
```python
query = "my query"

# Embed the query and return the most similar documents
docs = vector_store.similarity_search(query)
```
Many vectorstores support search parameters to be passed with the `similarity_search` method. See the documentation for the specific vectorstore you are using to see what parameters are supported.
In particular, many vectorstores support [`k`](/docs/integrations/vectorstores/pinecone/#query-directly), which controls the number of Documents to return, and `filter`, which allows for filtering documents by metadata.
As an example, [Pinecone](https://python.langchain.com/api_reference/pinecone/vectorstores/langchain_pinecone.vectorstores.PineconeVectorStore.html#langchain_pinecone.vectorstores.PineconeVectorStore.similarity_search) exposes several parameters that reflect these general concepts (a short usage sketch follows the list):
- `query` (str): Text to look up documents similar to.
- `k` (int): Number of Documents to return. Defaults to 4.
- `filter` (dict | None): Dictionary of argument(s) to filter on metadata.
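For instance, a minimal sketch restricting the number of returned documents (using the `vector_store` created above; `filter` is shown in the metadata filtering section below):
```python
# Return only the 2 most similar documents for this query
docs = vector_store.similarity_search("my query", k=2)
```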
:::info[Further reading]
* See the [how-to guide](/docs/how_to/vectorstores/) for more details on how to use the `similarity_search` method.
* See the [integrations page](/docs/integrations/vectorstores/) for more details on arguments that can be passed in to the `similarity_search` method for specific vectorstores.
:::
### Metadata filtering
While vectorstores implement a search algorithm to efficiently search over *all* the embedded documents to find the most similar ones, many also support filtering on metadata.
This allows structured filters to reduce the size of the similarity search space. These two concepts work well together:
1. **Semantic search**: Query the unstructured data directly, often via embedding or keyword similarity.
2. **Metadata search**: Apply a structured query to the metadata, filtering for specific documents.
Support for metadata filtering depends on the underlying vector store implementation.
Here is example usage with [Pinecone](/docs/integrations/vectorstores/pinecone/#query-directly), filtering for all documents whose metadata key `source` has the value `tweet`.
```python
# Restrict the similarity search to documents whose metadata key "source" equals "tweet"
results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=2,
    filter={"source": "tweet"},
)
```
```
:::info[Further reading]
* See Pinecone's [documentation](https://docs.pinecone.io/guides/data/filter-with-metadata) on filtering with metadata.
* See the [list of LangChain vectorstore integrations](/docs/integrations/retrievers/self_query/) that support metadata filtering.
:::
## Advanced search and retrieval techniques
While algorithms like HNSW provide the foundation for efficient similarity search in many cases, additional techniques can be employed to improve search quality and diversity.
For example, [maximal marginal relevance](https://python.langchain.com/v0.1/docs/modules/model_io/prompts/example_selectors/mmr/) is a re-ranking algorithm used to diversify search results, which is applied after the initial similarity search to ensure a more diverse set of results.
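Many LangChain vectorstore integrations expose this as a `max_marginal_relevance_search` method. A minimal sketch, assuming the `vector_store` created above and that the integration implements this method:
```python
# Fetch 20 candidates by similarity, then re-rank with MMR and return 4 diverse documents
docs = vector_store.max_marginal_relevance_search(
    "my query",
    k=4,         # number of documents to return
    fetch_k=20,  # number of candidates fetched before MMR re-ranking
)
```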
As a second example, some [vector stores](/docs/integrations/retrievers/pinecone_hybrid_search/) offer built-in [hybrid-search](https://docs.pinecone.io/guides/data/understanding-hybrid-search) to combine keyword and semantic similarity search, which marries the benefits of both approaches.
At the moment, there is no unified way to perform hybrid search using LangChain vectorstores, but it is generally exposed as a keyword argument that is passed in with `similarity_search`.
See this [how-to guide on hybrid search](/docs/how_to/hybrid/) for more details.
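As an illustrative sketch only, this is roughly what hybrid search can look like with the separate `PineconeHybridSearchRetriever` integration rather than `similarity_search` (assumptions: the `pinecone-text` package is installed, the Pinecone index from above is configured for hybrid/dot-product search, and the embedding model is the one used earlier):
```python
from langchain_community.retrievers import PineconeHybridSearchRetriever
from pinecone_text.sparse import BM25Encoder

# Sparse (keyword-style) encoder; default() loads pre-fitted BM25 parameters
bm25_encoder = BM25Encoder().default()

retriever = PineconeHybridSearchRetriever(
    embeddings=OpenAIEmbeddings(),  # dense (semantic) signal
    sparse_encoder=bm25_encoder,    # sparse (keyword) signal
    index=pc.Index(index_name),
)

docs = retriever.invoke("my query")
```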
| Name | When to use | Description |
|-------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------|
| [Hybrid search](/docs/integrations/retrievers/pinecone_hybrid_search/) | When combining keyword-based and semantic similarity. | Hybrid search combines keyword and semantic similarity, marrying the benefits of both approaches. [Paper](https://arxiv.org/abs/2210.11934). |
| [Maximal Marginal Relevance (MMR)](/docs/integrations/vectorstores/pinecone/#maximal-marginal-relevance-searches) | When needing to diversify search results. | MMR attempts to diversify the results of a search to avoid returning similar and redundant documents. |

View File

View File

@@ -17,7 +17,7 @@
"source": [
"# ChatGroq\n",
"\n",
"This will help you getting started with Groq [chat models](../../concepts.mdx#chat-models). For detailed documentation of all ChatGroq features and configurations head to the [API reference](https://python.langchain.com/api_reference/groq/chat_models/langchain_groq.chat_models.ChatGroq.html). For a list of all Groq models, visit this [link](https://console.groq.com/docs/models).\n",
"This will help you getting started with Groq [chat models](../../concepts/chat_models.mdx). For detailed documentation of all ChatGroq features and configurations head to the [API reference](https://python.langchain.com/api_reference/groq/chat_models/langchain_groq.chat_models.ChatGroq.html). For a list of all Groq models, visit this [link](https://console.groq.com/docs/models).\n",
"\n",
"## Overview\n",
"### Integration details\n",

View File

@@ -18,7 +18,7 @@
"# ChatTogether\n",
"\n",
"\n",
"This page will help you get started with Together AI [chat models](../../concepts.mdx#chat-models). For detailed documentation of all ChatTogether features and configurations head to the [API reference](https://python.langchain.com/api_reference/together/chat_models/langchain_together.chat_models.ChatTogether.html).\n",
"This page will help you get started with Together AI [chat models](../../concepts/chat_models.mdx). For detailed documentation of all ChatTogether features and configurations head to the [API reference](https://python.langchain.com/api_reference/together/chat_models/langchain_together.chat_models.ChatTogether.html).\n",
"\n",
"[Together AI](https://www.together.ai/) offers an API to query [50+ leading open-source models](https://docs.together.ai/docs/chat-models)\n",
"\n",

View File

@@ -47,7 +47,17 @@ module.exports = {
className: 'hidden',
}],
},
"concepts",
{
type: "category",
link: {type: 'doc', id: 'concepts/index'},
label: "Conceptual Guide",
collapsible: false,
items: [{
type: 'autogenerated',
dirName: 'concepts',
className: 'hidden',
}],
},
{
type: "category",
label: "Ecosystem",

New image files added under docs/static/img/ (binary files not shown in the diff):
- embeddings_colbert.png (82 KiB)
- embeddings_concept.png (99 KiB)
- rag_concepts.png (71 KiB)
- retrieval_concept.png (119 KiB)
- retrieval_high_level.png (25 KiB)
- retriever_concept.png (20 KiB)
- retriever_full_docs.png (124 KiB)
- structured_output.png (87 KiB)
- text_splitters.png (27 KiB)
- tool_call_example.png (84 KiB)
- tool_calling_agent.png (65 KiB)
- (filename not shown) (175 KiB)
- tool_calling_concept.png (120 KiB)
- vectorstores.png (116 KiB)
- (filename not shown) (85 KiB)