Docs: Re-organize conceptual docs (#27047)

Reorganization of conceptual documentation --------- Co-authored-by: Lance Martin <122662504+rlancemartin@users.noreply.github.com> Co-authored-by: Lance Martin <lance@langchain.dev> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2026-01-04 23:47:36 +00:00 · 2024-10-22 22:08:20 -04:00
parent 6d2a76ac05
commit f2dbf01d4a
52 changed files with 3621 additions and 1394 deletions
--- a/docs/docs/concepts.mdx
+++ b/docs/docs/concepts.mdx
--- a/docs/docs/concepts/agents.mdx
+++ b/docs/docs/concepts/agents.mdx
@@ -0,0 +1,25 @@
+# Agents
+
+By themselves, language models can't take actions - they just output text. Agents are systems that take a high-level task and use an LLM as a reasoning engine to decide what actions to take and execute those actions.
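The decide-then-act loop described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `fake_model`, `TOOLS`, and `run_agent` are hypothetical names, and `fake_model` is a hard-coded stand-in for an LLM, not a LangChain or LangGraph API.

```python
def fake_model(task, observations):
    """Hypothetical stand-in for an LLM that picks the next action."""
    if not observations:
        # Nothing observed yet: ask for a tool to be run.
        return {"action": "add", "args": (2, 3)}
    # A tool result is available: finish with an answer.
    return {"action": "finish", "answer": f"The result is {observations[-1]}"}


# Hypothetical tool registry: action name -> callable.
TOOLS = {"add": lambda a, b: a + b}


def run_agent(task, max_steps=5):
    """Ask the model what to do, execute that action, and repeat until done."""
    observations = []
    for _ in range(max_steps):
        decision = fake_model(task, observations)
        if decision["action"] == "finish":
            return decision["answer"]
        tool = TOOLS[decision["action"]]
        observations.append(tool(*decision["args"]))
    raise RuntimeError("agent did not finish within max_steps")


answer = run_agent("What is 2 + 3?")
```

The `max_steps` cap is a design choice of this sketch so that a model that never finishes cannot loop forever.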
+
+[LangGraph](/docs/concepts/architecture#langgraph) is an extension of LangChain specifically aimed at creating highly controllable and customizable agents. We recommend that you use LangGraph for building agents.
+
+Please see the following resources for more information:
+
+* LangGraph docs on [common agent architectures](https://langchain-ai.github.io/langgraph/concepts/agentic_concepts/)
+* [Pre-built agents in LangGraph](https://langchain-ai.github.io/langgraph/reference/prebuilt/#langgraph.prebuilt.chat_agent_executor.create_react_agent)
+
+## Legacy agent concept: AgentExecutor
+
+LangChain previously introduced the `AgentExecutor` as a runtime for agents. 
+While it served as an excellent starting point, its limitations became apparent when dealing with more sophisticated and customized agents. 
+As a result, we're gradually phasing out `AgentExecutor` in favor of more flexible solutions in LangGraph.
+
+### Transitioning from AgentExecutor to LangGraph
+
+If you're currently using `AgentExecutor`, don't worry! We've prepared resources to help you:
+
+1. For those who still need to use `AgentExecutor`, we offer a comprehensive guide on [how to use AgentExecutor](/docs/how_to/agent_executor).
+
+2. However, we strongly recommend transitioning to LangGraph for improved flexibility and control. To facilitate this transition, we've created a detailed [migration guide](/docs/how_to/migrate_agent) to help you move from `AgentExecutor` to LangGraph seamlessly.
+
--- a/docs/docs/concepts/architecture.mdx
+++ b/docs/docs/concepts/architecture.mdx
@@ -0,0 +1,78 @@
+import ThemedImage from '@theme/ThemedImage';
+import useBaseUrl from '@docusaurus/useBaseUrl';
+
+# Architecture
+
+LangChain as a framework consists of a number of packages.
+
+<ThemedImage
+    alt="Diagram outlining the hierarchical organization of the LangChain framework, displaying the interconnected parts across multiple layers."
+    sources={{
+        light: useBaseUrl('/svg/langchain_stack_062024.svg'),
+        dark: useBaseUrl('/svg/langchain_stack_062024_dark.svg'),
+    }}
+    title="LangChain Framework Overview"
+    style={{ width: "100%" }}
+/>
+
+
+## langchain-core
+
+This package contains base abstractions of different components and ways to compose them together.
+The interfaces for core components like LLMs, vector stores, retrievers and more are defined here.
+No third party integrations are defined here.
+The dependencies are kept purposefully very lightweight.
+
+## langchain
+
+The main `langchain` package contains chains, agents, and retrieval strategies that make up an application's cognitive architecture.
+These are NOT third party integrations.
+All chains, agents, and retrieval strategies here are NOT specific to any one integration, but rather generic across all integrations.
+
+## langchain-community
+
+This package contains third party integrations that are maintained by the LangChain community.
+Key partner packages are separated out (see below).
+This contains all integrations for various components (LLMs, vector stores, retrievers).
+All dependencies in this package are optional to keep the package as lightweight as possible.
+
+## Partner packages
+
+While the long tail of integrations is in `langchain-community`, we split popular integrations into their own packages (e.g. `langchain-openai`, `langchain-anthropic`, etc). This was done in order to improve support for these important integrations.
+
+For more information see:
+
+* A list of [LangChain integrations](/docs/integrations/providers/)
+* The [LangChain API Reference](https://python.langchain.com/api_reference/), where you can find detailed information about each partner package.
+
+## LangGraph
+
+`langgraph` is an extension of `langchain` aimed at building robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph.
+
+LangGraph exposes high level interfaces for creating common types of agents, as well as a low-level API for composing custom flows.
+
+:::info[Further reading]
+
+* See our LangGraph overview [here](https://langchain-ai.github.io/langgraph/concepts/high_level/#core-principles).
+* See our LangGraph Academy Course [here](https://academy.langchain.com/courses/intro-to-langgraph).
+
+:::
+
+## LangServe
+
+A package to deploy LangChain chains as REST APIs. Makes it easy to get a production ready API up and running.
+
+:::important
+LangServe is designed to primarily deploy simple Runnables and work with well-known primitives in langchain-core.
+
+If you need a deployment option for LangGraph, you should instead look at LangGraph Cloud (beta), which is better suited for deploying LangGraph applications.
+:::
+
+For more information, see the [LangServe documentation](/docs/langserve).
+
+
+## LangSmith
+
+A developer platform that lets you debug, test, evaluate, and monitor LLM applications.
+
+For more information, see the [LangSmith documentation](https://docs.smith.langchain.com).
--- a/docs/docs/concepts/async.mdx
+++ b/docs/docs/concepts/async.mdx
@@ -0,0 +1,81 @@
+# Async programming with LangChain
+
+:::info Prerequisites
+* [Runnable interface](/docs/concepts/runnables)
+* [asyncio](https://docs.python.org/3/library/asyncio.html)
+:::
+
+LLM-based applications often involve a lot of I/O-bound operations, such as making API calls to language models, databases, or other services. Asynchronous programming (or async programming) is a paradigm that allows a program to perform multiple tasks concurrently without blocking the execution of other tasks, improving efficiency and responsiveness, particularly in I/O-bound operations.
+
+:::note
+You are expected to be familiar with asynchronous programming in Python before reading this guide. If you are not, please find appropriate resources online to learn how to program asynchronously in Python.
+This guide specifically focuses on what you need to know to work with LangChain in an asynchronous context, assuming that you are already familiar with asynchronous programming.
+:::
+
+## LangChain asynchronous APIs
+
+Many LangChain APIs are designed to be asynchronous, allowing you to build efficient and responsive applications.
+
+Typically, any method that may perform I/O operations (e.g., making API calls, reading files) will have an asynchronous counterpart.
+
+In LangChain, async implementations are located in the same classes as their synchronous counterparts, with the asynchronous methods having an "a" prefix. For example, the synchronous `invoke` method has an asynchronous counterpart called `ainvoke`.
+
+Many components of LangChain implement the [Runnable Interface](/docs/concepts/runnables), which includes support for asynchronous execution. This means that you can run Runnables asynchronously using the `await` keyword in Python.
+
+```python
+await some_runnable.ainvoke(some_input)
+```
+
+Other components like [Embedding Models](/docs/concepts/embedding_models) and [VectorStore](/docs/concepts/vectorstores) that do not implement the [Runnable Interface](/docs/concepts/runnables) usually still follow the same rule and include the asynchronous version of each method in the same class with an "a" prefix.
+
+For example,
+
+```python
+await some_vectorstore.aadd_documents(documents)
+```
+
+Runnables created using the [LangChain Expression Language (LCEL)](/docs/concepts/lcel) can also be run asynchronously as they implement
+the full [Runnable Interface](/docs/concepts/runnables).
+
+For more information, please review the [API reference](https://python.langchain.com/api_reference/) for the specific component you are using.
+
+## Delegation to sync methods
+
+Most popular LangChain integrations implement asynchronous support of their APIs. For example, the `ainvoke` method of many ChatModel implementations uses the `httpx.AsyncClient` to make asynchronous HTTP requests to the model provider's API.
+
+When an asynchronous implementation is not available, LangChain tries to provide a default implementation, even if it incurs
+a **slight** overhead.
+
+By default, LangChain will delegate the execution of unimplemented asynchronous methods to their synchronous counterparts. LangChain almost always assumes that the synchronous method should be treated as a blocking operation and should be run in a separate thread.
+This is done using [asyncio.loop.run_in_executor](https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor) functionality provided by the `asyncio` library. LangChain uses the default executor provided by the `asyncio` library, which lazily initializes a thread pool executor with a default number of threads that is reused in the given event loop. While this strategy incurs a slight overhead due to context switching between threads, it guarantees that every asynchronous method has a default implementation that works out of the box.
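The delegation described above can be sketched with the standard library alone. This is a minimal illustration, not LangChain's actual implementation: `SketchComponent` and its `ran_in_worker_thread` attribute are hypothetical names, and only `asyncio.get_running_loop` and `loop.run_in_executor` are real `asyncio` APIs.

```python
import asyncio
import threading


class SketchComponent:
    """Hypothetical component that only has a native sync implementation."""

    def __init__(self) -> None:
        self.ran_in_worker_thread = False

    def invoke(self, text: str) -> str:
        # Record where the (pretend) blocking call actually ran.
        self.ran_in_worker_thread = (
            threading.current_thread() is not threading.main_thread()
        )
        return text.upper()

    async def ainvoke(self, text: str) -> str:
        # Default async version: hand the sync method to the event loop's
        # default executor (passing None selects it) and await the result.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, self.invoke, text)


component = SketchComponent()
result = asyncio.run(component.ainvoke("hello"))
```

Because `invoke` runs on an executor thread, the event loop stays free to serve other tasks while the call is in progress.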
+
+## Performance
+
+Async code in LangChain should generally perform relatively well with minimal overhead out of the box, and is unlikely
+to be a bottleneck in most applications.
+
+The two main sources of overhead are:
+
+1. Cost of context switching between threads when [delegating to synchronous methods](#delegation-to-sync-methods). This can be addressed by providing a native asynchronous implementation.
+2. In [LCEL](/docs/concepts/lcel) any "cheap functions" that appear as part of the chain will be either scheduled as tasks on the event loop (if they are async) or run in a separate thread (if they are sync), rather than being run inline.
+
+The latency overhead you should expect from these is between tens of microseconds and a few milliseconds.
+
+A more common source of performance issues arises from users accidentally blocking the event loop by calling synchronous code in an async context (e.g., calling `invoke` rather than `ainvoke`).
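The effect of blocking the event loop can be observed with a small standard-library experiment. In this sketch, `slow_sync_call` is a hypothetical stand-in for a synchronous method such as `invoke`, and a background "heartbeat" task counts how often the event loop gets to run while that call is in progress. `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import time


def slow_sync_call() -> str:
    # Hypothetical stand-in for a blocking call such as a sync `invoke`.
    time.sleep(0.2)
    return "done"


async def heartbeat(ticks: list) -> None:
    # Appends one entry every time the event loop lets this task run.
    while True:
        ticks.append(1)
        await asyncio.sleep(0.01)


async def heartbeat_ticks_during_call(blocking: bool) -> int:
    ticks = []
    task = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat run once before measuring
    before = len(ticks)
    if blocking:
        slow_sync_call()  # runs on the event loop thread and blocks it
    else:
        await asyncio.to_thread(slow_sync_call)  # runs in a worker thread
    during = len(ticks) - before
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return during


ticks_while_blocked = asyncio.run(heartbeat_ticks_during_call(blocking=True))
ticks_while_offloaded = asyncio.run(heartbeat_ticks_during_call(blocking=False))
```

In the blocking case the heartbeat cannot run at all until the synchronous call returns, which is why other tasks sharing the loop appear to stall.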
+
+## Compatibility
+
+LangChain is only compatible with the `asyncio` library, which is distributed as part of the Python standard library. It will not work with other async libraries like `trio` or `curio`.
+
+In Python 3.9 and 3.10, [asyncio's tasks](https://docs.python.org/3/library/asyncio-task.html#asyncio.create_task) did not
+accept a `context` parameter. Due to this limitation, LangChain cannot automatically propagate the `RunnableConfig` down the call chain
+in certain scenarios.
+
+If you are experiencing issues with streaming, callbacks or tracing in async code and are using Python 3.9 or 3.10, this is a likely cause.
+
+Please read [Propagation RunnableConfig](/docs/concepts/runnables#propagation-RunnableConfig) to learn how to propagate the `RunnableConfig` down the call chain manually (or upgrade to Python 3.11, where this is no longer an issue).
+
+## How to use in IPython and Jupyter notebooks
+
+As of IPython 7.0, IPython supports asynchronous REPLs. This means that you can use the `await` keyword in the IPython REPL and Jupyter Notebooks without any additional setup. For more information, see the [IPython blog post](https://blog.jupyter.org/ipython-7-0-async-repl-a35ce050f7f7).
+
--- a/docs/docs/concepts/callbacks.mdx
+++ b/docs/docs/concepts/callbacks.mdx
@@ -0,0 +1,73 @@
+# Callbacks
+
+:::note Prerequisites
+- [Runnable interface](/docs/concepts/#runnable-interface)
+:::
+
+LangChain provides a callbacks system that allows you to hook into the various stages of your LLM application. This is useful for logging, monitoring, streaming, and other tasks.
+
+You can subscribe to these events by using the `callbacks` argument available throughout the API. This argument is a list of handler objects, which are expected to implement one or more of the methods described below in more detail.
+
+## Callback events
+
+| Event            | Event Trigger                               | Associated Method     |
+|------------------|---------------------------------------------|-----------------------|
+| Chat model start | When a chat model starts                    | `on_chat_model_start` |
+| LLM start        | When an LLM starts                          | `on_llm_start`        |
+| LLM new token    | When an LLM OR chat model emits a new token | `on_llm_new_token`    |
+| LLM ends         | When an LLM OR chat model ends              | `on_llm_end`          |
+| LLM errors       | When an LLM OR chat model errors            | `on_llm_error`        |
+| Chain start      | When a chain starts running                 | `on_chain_start`      |
+| Chain end        | When a chain ends                           | `on_chain_end`        |
+| Chain error      | When a chain errors                         | `on_chain_error`      |
+| Tool start       | When a tool starts running                  | `on_tool_start`       |
+| Tool end         | When a tool ends                            | `on_tool_end`         |
+| Tool error       | When a tool errors                          | `on_tool_error`       |
+| Agent action     | When an agent takes an action               | `on_agent_action`     |
+| Agent finish     | When an agent ends                          | `on_agent_finish`     |
+| Retriever start  | When a retriever starts                     | `on_retriever_start`  |
+| Retriever end    | When a retriever ends                       | `on_retriever_end`    |
+| Retriever error  | When a retriever errors                     | `on_retriever_error`  |
+| Text             | When arbitrary text is run                  | `on_text`             |
+| Retry            | When a retry event is run                   | `on_retry`            |
+
+## Callback handlers
+
+Callback handlers can either be `sync` or `async`:
+
+* Sync callback handlers implement the [BaseCallbackHandler](https://python.langchain.com/api_reference/core/callbacks/langchain_core.callbacks.base.BaseCallbackHandler.html) interface.
+* Async callback handlers implement the [AsyncCallbackHandler](https://python.langchain.com/api_reference/core/callbacks/langchain_core.callbacks.base.AsyncCallbackHandler.html) interface.
+
+At run time, LangChain configures an appropriate callback manager (e.g., [CallbackManager](https://python.langchain.com/api_reference/core/callbacks/langchain_core.callbacks.manager.CallbackManager.html) or [AsyncCallbackManager](https://python.langchain.com/api_reference/core/callbacks/langchain_core.callbacks.manager.AsyncCallbackManager.html)), which is responsible for calling the appropriate method on each "registered" callback handler when the event is triggered.
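The relationship between handlers and a manager can be sketched in plain Python. This is not LangChain's implementation: `RecordingHandler` and `SketchCallbackManager` are hypothetical classes, and the handler method signatures are simplified to a single `name` argument for illustration. Only the method names are taken from the event table above.

```python
class RecordingHandler:
    """Hypothetical handler that implements only two of the event methods."""

    def __init__(self) -> None:
        self.events = []

    def on_chain_start(self, name: str) -> None:
        self.events.append(("chain_start", name))

    def on_chain_end(self, name: str) -> None:
        self.events.append(("chain_end", name))


class SketchCallbackManager:
    """Hypothetical manager that forwards each event to its handlers."""

    def __init__(self, handlers) -> None:
        self.handlers = list(handlers)

    def dispatch(self, event: str, *args) -> None:
        for handler in self.handlers:
            # Call the method only if this handler implements the event.
            method = getattr(handler, event, None)
            if callable(method):
                method(*args)


handler = RecordingHandler()
manager = SketchCallbackManager([handler])
manager.dispatch("on_chain_start", "my_chain")
manager.dispatch("on_tool_start", "my_tool")  # not implemented: skipped
manager.dispatch("on_chain_end", "my_chain")
```

A handler therefore only needs to define the methods for the events it cares about.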
+
+## Passing callbacks
+
+The `callbacks` property is available on most objects throughout the API (Models, Tools, Agents, etc.) in two different places:
+
+- **Request time callbacks**: Passed at the time of the request in addition to the input data.
+Available on all standard `Runnable` objects. These callbacks are INHERITED by all children
+of the object they are defined on. For example, `chain.invoke({"number": 25}, {"callbacks": [handler]})`.
+- **Constructor callbacks**: `chain = TheNameOfSomeChain(callbacks=[handler])`. These callbacks
+are passed as arguments to the constructor of the object. The callbacks are scoped
+only to the object they are defined on, and are **not** inherited by any children of the object.
+
+:::warning
+Constructor callbacks are scoped only to the object they are defined on. They are **not** inherited by children
+of the object.
+:::
+
+If you're creating a custom chain or runnable, you need to remember to propagate request time
+callbacks to any child objects.
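The difference in scoping between the two kinds of callbacks, and what propagating to children means, can be sketched as follows. `SketchRunnable` is a hypothetical class written for this illustration, and its callbacks are plain callables that receive the object's name. It does not reproduce the real `Runnable` interface.

```python
class SketchRunnable:
    """Hypothetical runnable with optional children."""

    def __init__(self, name, children=(), callbacks=()):
        self.name = name
        self.children = list(children)
        self.callbacks = list(callbacks)  # constructor callbacks

    def invoke(self, value, config=None):
        config = config or {}
        # Constructor callbacks fire for this object only; request-time
        # callbacks from the config fire here as well.
        for callback in [*self.callbacks, *config.get("callbacks", [])]:
            callback(self.name)
        # Passing the same config down is what lets children inherit the
        # request-time callbacks. Constructor callbacks are not passed on.
        for child in self.children:
            value = child.invoke(value, config)
        return value


constructor_seen, request_seen = [], []
child = SketchRunnable("child")
parent = SketchRunnable(
    "parent", children=[child], callbacks=[constructor_seen.append]
)
parent.invoke({"number": 25}, {"callbacks": [request_seen.append]})
```

If the custom `invoke` forgot to pass `config` to its children, the request-time callback would see only the parent.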
+
+:::important Async in Python&lt;=3.10
+
+Any `RunnableLambda`, `RunnableGenerator`, or `Tool` that invokes other runnables
+and is running `async` in python&lt;=3.10 will have to propagate callbacks to child
+objects manually. This is because LangChain cannot automatically propagate
+callbacks to child objects in this case.
+
+This is a common reason why you may fail to see events being emitted from custom
+runnables or tools.
+:::
+
+For specifics on how to use callbacks, see the [relevant how-to guides here](/docs/how_to/#callbacks).
--- a/docs/docs/concepts/chat_history.mdx
+++ b/docs/docs/concepts/chat_history.mdx
@@ -0,0 +1,46 @@
+# Chat history
+
+:::info Prerequisites
+
+- [Messages](/docs/concepts/messages)
+- [Chat models](/docs/concepts/chat_models)
+- [Tool calling](/docs/concepts/tool_calling)
+:::
+
+Chat history is a record of the conversation between the user and the chat model. It is used to maintain context and state throughout the conversation. The chat history is a sequence of [messages](/docs/concepts/messages), each of which is associated with a specific [role](/docs/concepts/messages#role), such as "user", "assistant", "system", or "tool".
+
+## Conversation patterns
+
+![Conversation patterns](/img/conversation_patterns.png)
+
+Most conversations start with a **system message** that sets the context for the conversation. This is followed by a **user message** containing the user's input, and then an **assistant message** containing the model's response.
+
+The **assistant** may respond directly to the user or, if configured with tools, request that a [tool](/docs/concepts/tool_calling) be invoked to perform a specific task.
+
+So a full conversation often involves a combination of two patterns of alternating messages:
+
+1. The **user** and the **assistant** representing a back-and-forth conversation.
+2. The **assistant** and **tool messages** representing an ["agentic" workflow](/docs/concepts/agents) where the assistant is invoking tools to perform specific tasks.
+
+## Managing chat history
+
+Since chat models have a maximum limit on input size, it's important to manage chat history and trim it as needed to avoid exceeding the [context window](/docs/concepts/chat_models#context_window).
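One way to trim a history to a budget is sketched below. The sketch makes several assumptions: messages are plain dictionaries with `role` and `content` keys, `count_tokens` approximates tokens by whitespace-separated words (a real application would use the model's tokenizer), and `trim_history` is a hypothetical helper, not a LangChain function.

```python
def count_tokens(message: dict) -> int:
    # Assumption: one token per whitespace-separated word.
    return len(message["content"].split())


def trim_history(messages: list, max_tokens: int) -> list:
    """Keep a leading system message plus the most recent messages that fit."""
    system = [m for m in messages[:1] if m["role"] == "system"]
    rest = messages[len(system):]
    budget = max_tokens - sum(count_tokens(m) for m in system)

    kept = []
    for message in reversed(rest):  # walk from newest to oldest
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    kept.reverse()

    # Drop leading messages until the trimmed history starts with a user turn.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return system + kept


history = [
    {"role": "system", "content": "be brief"},
    {"role": "user", "content": "hello there"},
    {"role": "assistant", "content": "hi how can I help"},
    {"role": "user", "content": "what is langchain"},
]
trimmed = trim_history(history, max_tokens=8)
```

With a budget of 8 words, only the system message and the latest user message survive, because including the assistant reply would exceed the budget.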
+
+While processing chat history, it's essential to preserve a correct conversation structure. 
+
+Key guidelines for managing chat history:
+
+- The conversation should follow one of these structures:
+    - The first message is either a "user" message or a "system" message, followed by a "user" and then an "assistant" message.
+    - The last message should be either a "user" message or a "tool" message containing the result of a tool call.
+- When using [tool calling](/docs/concepts/tool_calling), a "tool" message should only follow an "assistant" message that requested the tool invocation.
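The guidelines above can be turned into a small checker. This is a sketch under assumptions: messages are dictionaries with a `role` key, `check_history` is a hypothetical helper, and consecutive "tool" messages are accepted on the assumption that one assistant message may request several tool invocations.

```python
def check_history(messages: list) -> list:
    """Return a list of problems found; an empty list means none were found."""
    roles = [message["role"] for message in messages]
    if not roles:
        return ["history is empty"]

    problems = []
    if roles[0] not in ("system", "user"):
        problems.append("first message must be 'system' or 'user'")
    if roles[0] == "system" and len(roles) > 1 and roles[1] != "user":
        problems.append("a 'system' message must be followed by a 'user' message")
    if roles[-1] not in ("user", "tool"):
        problems.append("last message must be 'user' or 'tool'")
    for index, role in enumerate(roles):
        # Assumption: a run of tool messages after one assistant message is fine.
        follows_request = index > 0 and roles[index - 1] in ("assistant", "tool")
        if role == "tool" and not follows_request:
            problems.append(f"'tool' message at index {index} has no assistant request")
    return problems


good = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "What is the weather?"},
    {"role": "assistant", "content": "(requests a tool call)"},
    {"role": "tool", "content": "Sunny"},
]
bad = [
    {"role": "user", "content": "Hi"},
    {"role": "tool", "content": "Unrequested result"},
    {"role": "assistant", "content": "Hello"},
]
good_problems = check_history(good)
bad_problems = check_history(bad)
```

Running a check like this after trimming helps catch a history that was cut in the middle of a tool-calling exchange.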
+
+:::tip
+Understanding correct conversation structure is essential for being able to properly implement
+[memory](https://langchain-ai.github.io/langgraph/concepts/memory/) in chat models.
+:::
+
+## Related resources
+
+- [How to trim messages](https://python.langchain.com/docs/how_to/trim_messages/)
+- [Memory guide](https://langchain-ai.github.io/langgraph/concepts/memory/) for information on implementing short-term and long-term memory in chat models using [LangGraph](https://langchain-ai.github.io/langgraph/).
--- a/docs/docs/concepts/chat_models.mdx
+++ b/docs/docs/concepts/chat_models.mdx
@@ -0,0 +1,168 @@
+# Chat models
+
+## Overview
+
+Large Language Models (LLMs) are advanced machine learning models that excel in a wide range of language-related tasks such as text generation, translation, summarization, question answering, and more, without needing task-specific tuning for every scenario.
+
+Modern LLMs are typically accessed through a chat model interface that takes a list of [messages](/docs/concepts/messages) as input and returns a [message](/docs/concepts/messages) as output.
+
+The newest generation of chat models offer additional capabilities:
+
+* [Tool calling](/docs/concepts#tool-calling): Many popular chat models offer a native [tool calling](/docs/concepts#tool-calling) API. This API allows developers to build rich applications that enable AI to interact with external services, APIs, and databases. Tool calling can also be used to extract structured information from unstructured data and perform various other tasks.
+* [Structured output](/docs/concepts/structured_outputs): A technique to make a chat model respond in a structured format, such as JSON that matches a given schema.
+* [Multimodality](/docs/concepts/multimodality): The ability to work with data other than text; for example, images, audio, and video.
+
+## Features
+
+LangChain provides a consistent interface for working with chat models from different providers while offering additional features for monitoring, debugging, and optimizing the performance of applications that use LLMs.
+
+* Integrations with many chat model providers (e.g., Anthropic, OpenAI, Ollama, Microsoft Azure, Google Vertex, Amazon Bedrock, Hugging Face, Cohere, Groq). Please see [chat model integrations](/docs/integrations/chat/) for an up-to-date list of supported models.
+* Use either LangChain's [messages](/docs/concepts/messages) format or OpenAI format.
+* Standard [tool calling API](/docs/concepts#tool-calling): standard interface for binding tools to models, accessing tool call requests made by models, and sending tool results back to the model.
+* Standard API for [structuring outputs](/docs/concepts/structured_outputs) via the `with_structured_output` method.
+* Provides support for [async programming](/docs/concepts/async), [efficient batching](/docs/concepts/runnables#batch), and [a rich streaming API](/docs/concepts/streaming).
+* Integration with [LangSmith](https://docs.smith.langchain.com) for monitoring and debugging production-grade applications based on LLMs.
+* Additional features like standardized [token usage](/docs/concepts/messages#token_usage), [rate limiting](#rate-limiting), [caching](#cache) and more.
+
+## Integrations
+
+LangChain has many chat model integrations that allow you to use a wide variety of models from different providers.
+
+These integrations are one of two types:
+
+1. **Official models**: These are models that are officially supported by LangChain and/or the model provider. You can find these models in the `langchain-<provider>` packages.
+2. **Community models**: These are models that are mostly contributed and supported by the community. You can find these models in the `langchain-community` package.
+
+LangChain chat models are named with a convention that prefixes "Chat" to their class names (e.g., `ChatOllama`, `ChatAnthropic`, `ChatOpenAI`, etc.).
+
+Please review the [chat model integrations](/docs/integrations/chat/) for a list of supported models.
+
+:::note
+Models that do **not** include the prefix "Chat" in their name or include "LLM" as a suffix in their name typically refer to older models that do not follow the chat model interface and instead use an interface that takes a string as input and returns a string as output.
+:::
+
+
+## Interface
+
+LangChain chat models implement the [BaseChatModel](https://python.langchain.com/api_reference/core/language_models/langchain_core.language_models.chat_models.BaseChatModel.html) interface. Because `BaseChatModel` also implements the [Runnable Interface](/docs/concepts/runnables), chat models support a [standard streaming interface](/docs/concepts/streaming), [async programming](/docs/concepts/async), optimized [batching](/docs/concepts/runnables#batch), and more. Please see the [Runnable Interface](/docs/concepts/runnables) for more details.
+
+Many of the key methods of chat models operate on [messages](/docs/concepts/messages) as input and return messages as output.
+
+Chat models offer a standard set of parameters that can be used to configure the model. These parameters are typically used to control the behavior of the model, such as the temperature of the output, the maximum number of tokens in the response, and the maximum time to wait for a response. Please see the [standard parameters](#standard-parameters) section for more details.
+
+:::note
+In documentation, we will often use the terms "LLM" and "Chat Model" interchangeably. This is because most modern LLMs are exposed to users via a chat model interface.
+
+However, LangChain also has implementations of older LLMs that do not follow the chat model interface and instead use an interface that takes a string as input and returns a string as output. These models are typically named without the "Chat" prefix (e.g., `Ollama`, `Anthropic`, `OpenAI`, etc.).
+These models implement the [BaseLLM](https://python.langchain.com/api_reference/core/language_models/langchain_core.language_models.llms.BaseLLM.html#langchain_core.language_models.llms.BaseLLM) interface and may be named with the "LLM" suffix (e.g., `OllamaLLM`, `AnthropicLLM`, `OpenAILLM`, etc.). Generally, users should not use these models.
+:::
+
+### Key methods
+
+The key methods of a chat model are:
+
+1. **invoke**: The primary method for interacting with a chat model. It takes a list of [messages](/docs/concepts/messages) as input and returns a message as output.
+2. **stream**: A method that allows you to stream the output of a chat model as it is generated.
+3. **batch**: A method that allows you to batch multiple requests to a chat model together for more efficient processing.
+4. **bind_tools**: A method that allows you to bind a tool to a chat model for use in the model's execution context.
+5. **with_structured_output**: A wrapper around the `invoke` method for models that natively support [structured output](/docs/concepts#structured_output).
+
+Other important methods can be found in the [BaseChatModel API Reference](https://python.langchain.com/api_reference/core/language_models/langchain_core.language_models.chat_models.BaseChatModel.html).
+
+### Inputs and outputs
+
+Modern LLMs are typically accessed through a chat model interface that takes [messages](/docs/concepts/messages) as input and returns [messages](/docs/concepts/messages) as output. Messages are typically associated with a role (e.g., "system", "human", "assistant") and one or more content blocks that contain text or potentially multimodal data (e.g., images, audio, video).
+
+LangChain supports two message formats to interact with chat models:
+
+1. **LangChain Message Format**: LangChain's own message format, which is used by default and is used internally by LangChain.
+2. **OpenAI's Message Format**: OpenAI's message format.
+
+### Standard parameters
+
+Many chat models have standardized parameters that can be used to configure the model:
+
+| Parameter      | Description                                                                                                                                                                                                                                                                                                    |
+|----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `model`        | The name or identifier of the specific AI model you want to use (e.g., `"gpt-3.5-turbo"` or `"gpt-4"`).                                                                                                                                                                                                        |
+| `temperature`  | Controls the randomness of the model's output. A higher value (e.g., 1.0) makes responses more creative, while a lower value (e.g., 0.1) makes them more deterministic and focused.                                                                                                                            |
+| `timeout`      | The maximum time (in seconds) to wait for a response from the model before canceling the request. Ensures the request doesn’t hang indefinitely.                                                                                                                                                               |
+| `max_tokens`   | Limits the total number of tokens (words and punctuation) in the response. This controls how long the output can be.                                                                                                                                                                                           |
+| `stop`         | Specifies stop sequences that indicate when the model should stop generating tokens. For example, you might use specific strings to signal the end of a response.                                                                                                                                              |
+| `max_retries`  | The maximum number of attempts the system will make to resend a request if it fails due to issues like network timeouts or rate limits.                                                                                                                                                                        |
+| `api_key`      | The API key required for authenticating with the model provider. This is usually issued when you sign up for access to the model.                                                                                                                                                                              |
+| `base_url`     | The URL of the API endpoint where requests are sent. This is typically provided by the model's provider and is necessary for directing your requests.                                                                                                                                                          |
+| `rate_limiter` | An optional [BaseRateLimiter](https://python.langchain.com/api_reference/core/rate_limiters/langchain_core.rate_limiters.BaseRateLimiter.html#langchain_core.rate_limiters.BaseRateLimiter) to space out requests to avoid exceeding rate limits.  See [rate-limiting](#rate-limiting) below for more details. |
+
+Some important things to note:
+
+- Standard parameters only apply to model providers that expose parameters with the intended functionality. For example, some providers do not expose a configuration for maximum output tokens, so `max_tokens` can't be supported on these.
+- Standard parameters are currently only enforced on integrations that have their own integration packages (e.g. `langchain-openai`, `langchain-anthropic`, etc.); they're not enforced on models in `langchain-community`.
+
+ChatModels also accept other parameters that are specific to that integration. To find all the parameters supported by a ChatModel, head to the [API reference](https://python.langchain.com/api_reference/) for that model.
+
+## Tool calling
+
+Chat models can call [tools](/docs/concepts/tools) to perform tasks such as fetching data from a database, making API requests, or running custom code. Please
+see the [tool calling](/docs/concepts#tool-calling) guide for more information.
+
+## Structured outputs
+
+Chat models can be requested to respond in a particular format (e.g., JSON or matching a particular schema). This feature is extremely
+useful for information extraction tasks. Please read more about
+the technique in the [structured outputs](/docs/concepts#structured_output) guide.
+
+## Multimodality
+
+Large Language Models (LLMs) are not limited to processing text. They can also be used to process other types of data, such as images, audio, and video. This is known as [multimodality](/docs/concepts/multimodality).
+
+Currently, only some LLMs support multimodal inputs, and almost none support multimodal outputs. Please consult the specific model documentation for details.
+
+## Context window
+
+A chat model's context window refers to the maximum size of the input sequence the model can process at one time. While the context windows of modern LLMs are quite large, they still present a limitation that developers must keep in mind when working with chat models.
+
+If the input exceeds the context window, the model may not be able to process the entire input and could raise an error. In conversational applications, this is especially important because the context window determines how much information the model can "remember" throughout a conversation. Developers often need to manage the input within the context window to maintain a coherent dialogue without exceeding the limit. For more details on handling memory in conversations, refer to the [memory](https://langchain-ai.github.io/langgraph/concepts/memory/) guide.
+
+The size of the input is measured in [tokens](/docs/concepts/tokens) which are the unit of processing that the model uses.
+
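One way to keep a conversation inside the context window is to drop the oldest messages once a token budget is exceeded. The following is a minimal sketch of that idea, not LangChain's implementation: `count_tokens` is a hypothetical counter that treats whitespace-separated words as tokens, and messages are plain dictionaries chosen for illustration.

```python
def count_tokens(text: str) -> int:
    """Hypothetical token counter: approximates tokens as whitespace-separated words."""
    return len(text.split())


def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the most recent messages whose combined size fits within max_tokens."""
    kept: list[dict] = []
    used = 0
    # Walk backwards so the newest messages are considered first.
    for message in reversed(messages):
        cost = count_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    {"role": "user", "content": "Tell me about embedding models in detail please"},
    {"role": "assistant", "content": "They map text to vectors"},
    {"role": "user", "content": "And context windows?"},
]
trimmed = trim_history(history, max_tokens=8)
```

A real application would count tokens with the tokenizer of the model in use, since word counts only approximate token counts.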
+## Advanced topics
+ 
+### Rate-limiting
+
+Many chat model providers impose a limit on the number of requests that can be made in a given time period.
+
+If you hit a rate limit, you will typically receive a rate limit error response from the provider, and will need to wait before making more requests.
+
+You have a few options to deal with rate limits:
+
+1. Try to avoid hitting rate limits by spacing out requests: Chat models accept a `rate_limiter` parameter that can be provided during initialization. This parameter controls the rate at which requests are made to the model provider. Spacing out the requests to a given model is a particularly useful strategy when benchmarking models to evaluate their performance. Please see the [how to handle rate limits](https://python.langchain.com/docs/how_to/chat_model_rate_limiting/) guide for more information on how to use this feature.
+2. Try to recover from rate limit errors: If you receive a rate limit error, you can wait a certain amount of time before retrying the request. The amount of time to wait can be increased with each subsequent rate limit error. Chat models have a `max_retries` parameter that controls the number of retries. See the [standard parameters](#standard-parameters) section for more information.
+3. Fall back to another chat model: If you hit a rate limit with one chat model, you can switch to another chat model that is not rate-limited.
+
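The second and third options above can be sketched in plain Python. This is an illustration under stated assumptions rather than how LangChain implements retries: `RateLimitError`, `call_with_backoff`, and `call_with_fallback` are hypothetical names, and the "models" are ordinary functions.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate limit error."""


def call_with_backoff(call, max_retries=3, base_delay=0.01, sleep=time.sleep):
    """Retry `call` on RateLimitError, doubling the wait after each failure."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: let the caller decide what to do
            sleep(base_delay * (2 ** attempt))


def call_with_fallback(primary, fallback, **kwargs):
    """Use `fallback` if `primary` is still rate-limited after all retries."""
    try:
        return call_with_backoff(primary, **kwargs)
    except RateLimitError:
        return fallback()


# Toy "model" that is rate-limited twice before succeeding.
attempts = {"count": 0}


def flaky_model():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise RateLimitError("too many requests")
    return "response from primary model"


def always_limited():
    raise RateLimitError("too many requests")


delays = []  # record the waits instead of actually sleeping
result = call_with_backoff(flaky_model, sleep=delays.append)
fallback_result = call_with_fallback(
    always_limited,
    lambda: "response from fallback model",
    max_retries=1,
    sleep=delays.append,
)
```

Passing `sleep` as a parameter keeps the sketch easy to test; in practice the default `time.sleep` would be used.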
+### Caching
+
+Chat model APIs can be slow, so a natural question is whether to cache the results of previous conversations. Theoretically, caching can help improve performance by reducing the number of requests made to the model provider. In practice, caching chat model responses is a complex problem and should be approached with caution.
+
+The reason is that getting a cache hit is unlikely after the first or second interaction in a conversation if relying on caching the **exact** inputs into the model. For example, how likely is it that multiple conversations start with the exact same message? What about the exact same three messages?
+
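To see why exact-match caching rarely hits once a conversation grows, consider this small sketch. It is illustrative only: `fake_model` is a hypothetical stand-in for a slow model call, and the cache is a plain dictionary keyed on the serialized message list.

```python
import json

cache: dict[str, str] = {}
calls = {"count": 0}


def fake_model(messages):
    """Hypothetical stand-in for a slow chat model call."""
    calls["count"] += 1
    return f"reply #{calls['count']}"


def cached_invoke(messages):
    # The key is the exact serialized conversation; any difference is a miss.
    key = json.dumps(messages, sort_keys=True)
    if key not in cache:
        cache[key] = fake_model(messages)
    return cache[key]


faq = [{"role": "user", "content": "What are your opening hours?"}]
first = cached_invoke(faq)
second = cached_invoke(faq)  # identical input: served from the cache
follow_up = faq + [{"role": "user", "content": "And on Sundays?"}]
third = cached_invoke(follow_up)  # longer conversation: cache miss
```

Only the repeated single-message request avoids a model call; the moment the history differs by one message, the key changes.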
+An alternative approach is to use semantic caching, where you cache responses based on the meaning of the input rather than the exact input itself. This can be effective in some situations, but not in others.
+
+A semantic cache introduces a dependency on another model on the critical path of your application (e.g., the semantic cache may rely on an [embedding model](/docs/concepts/embedding_models) to convert text to a vector representation), and it's not guaranteed to capture the meaning of the input accurately.
+
+However, there might be situations where caching chat model responses is beneficial. For example, if you have a chat model that is used to answer frequently asked questions, caching responses can help reduce the load on the model provider and improve response times.
+
+Please see the [how to cache chat model responses](/docs/how_to/#chat-model-caching) guide for more details.
+
+## Related resources
+
+* How-to guides on using chat models: [how-to guides](/docs/how_to/#chat-models).
+* List of supported chat models: [chat model integrations](/docs/integrations/chat/).
+
+### Conceptual guides
+
+* [Messages](/docs/concepts/messages)
+* [Tool calling](/docs/concepts#tool-calling)
+* [Multimodality](/docs/concepts/multimodality)
+* [Structured outputs](/docs/concepts#structured_output)
+* [Tokens](/docs/concepts/tokens)
--- a/docs/docs/concepts/document_loaders.mdx
+++ b/docs/docs/concepts/document_loaders.mdx
@@ -0,0 +1,45 @@
+# Document loaders
+<span data-heading-keywords="document loader,document loaders"></span>
+
+:::info[Prerequisites]
+
+* [Document loaders API reference](https://python.langchain.com/docs/how_to/#document-loaders)
+:::
+
+Document loaders are designed to load document objects. LangChain has hundreds of integrations with various data sources to load data from: Slack, Notion, Google Drive, etc.
+
+## Integrations
+
+You can find available integrations on the [Document Loaders Integrations page](https://python.langchain.com/docs/integrations/document_loaders/).
+
+## Interface
+
+Document loaders implement the [BaseLoader interface](https://python.langchain.com/api_reference/core/document_loaders/langchain_core.document_loaders.base.BaseLoader.html).
+
+Each DocumentLoader has its own specific parameters, but they can all be invoked in the same way with the `.load` method or `.lazy_load`.
+
+Here's a simple example:
+
+```python
+from langchain_community.document_loaders.csv_loader import CSVLoader
+
+loader = CSVLoader(
+    ...  # <-- Integration specific parameters here
+)
+data = loader.load()
+```
+
+Or, if working with large datasets, you can use the `.lazy_load` method, which yields documents one at a time:
+
+```python
+for document in loader.lazy_load():
+    print(document)
+```
+
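The relationship between the two methods can be sketched with a toy loader. This is a sketch under assumptions, not a real integration: `Doc` and `LineLoader` are hypothetical classes that do not subclass `BaseLoader`, and each input line simply becomes one document.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Doc:
    """Hypothetical minimal stand-in for a document object."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class LineLoader:
    """Toy loader (not a BaseLoader subclass) that turns each line into a Doc."""

    def __init__(self, lines: list[str]):
        self.lines = lines

    def lazy_load(self) -> Iterator[Doc]:
        # Yield one document at a time so the full dataset never sits in memory.
        for number, line in enumerate(self.lines):
            yield Doc(page_content=line, metadata={"row": number})

    def load(self) -> list[Doc]:
        # In this sketch, eager loading is the lazy iterator collected into a list.
        return list(self.lazy_load())


toy_loader = LineLoader(["alpha", "beta", "gamma"])
all_docs = toy_loader.load()
first_doc = next(toy_loader.lazy_load())
```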
+## Related resources
+
+Please see the following resources for more information:
+
+* [How-to guides for document loaders](https://python.langchain.com/docs/how_to/#document-loaders)
+* [Document API reference](https://python.langchain.com/docs/how_to/#document-loaders)
+* [Document loaders integrations](https://python.langchain.com/docs/integrations/document_loaders/)
--- a/docs/docs/concepts/embedding_models.mdx
+++ b/docs/docs/concepts/embedding_models.mdx
@@ -0,0 +1,130 @@
+# Embedding models
+<span data-heading-keywords="embedding,embeddings"></span>
+
+:::info[Prerequisites]
+
+* [Documents](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html)
+
+:::
+
+:::info[Note]
+This conceptual overview focuses on text-based embedding models.
+
+Embedding models can also be [multimodal](/docs/concepts/multimodality), though such models are not currently supported by LangChain.
+:::
+
+Imagine being able to capture the essence of any text - a tweet, document, or book - in a single, compact representation.
+This is the power of embedding models, which lie at the heart of many retrieval systems.
+Embedding models transform human language into a format that machines can understand and compare with speed and accuracy. 
+These models take text as input and produce a fixed-length array of numbers, a numerical fingerprint of the text's semantic meaning.
+Embeddings allow search systems to find relevant documents not just based on keyword matches, but on semantic understanding.
+
+## Key concepts
+
+![Conceptual Overview](/img/embeddings_concept.png)
+
+(1) **Embed text as a vector**: Embeddings transform text into a numerical vector representation.
+
+(2) **Measure similarity**: Embedding vectors can be compared using simple mathematical operations.
+
+## Embedding 
+
+### Historical context 
+
+The landscape of embedding models has evolved significantly over the years. 
+A pivotal moment came in 2018 when Google introduced [BERT (Bidirectional Encoder Representations from Transformers)](https://www.nvidia.com/en-us/glossary/bert/). 
+BERT applied transformer models to embed text as a simple vector representation, which led to unprecedented performance across various NLP tasks.
+However, BERT wasn't optimized for generating sentence embeddings efficiently.
+This limitation spurred the creation of [SBERT (Sentence-BERT)](https://www.sbert.net/examples/training/sts/README.html), which adapted the BERT architecture to generate semantically rich sentence embeddings that are easily comparable via similarity metrics like cosine similarity. This dramatically reduced the computational overhead for tasks like finding similar sentences.
+Today, the embedding model ecosystem is diverse, with numerous providers offering their own implementations.
+To navigate this variety, researchers and practitioners often turn to benchmarks like the [Massive Text Embedding Benchmark (MTEB)](https://huggingface.co/blog/mteb) for objective comparisons.
+
+:::info[Further reading]
+
+* See the [seminal BERT paper](https://arxiv.org/abs/1810.04805).
+* See Cameron Wolfe's [excellent review](https://cameronrwolfe.substack.com/p/the-basics-of-ai-powered-vector-search?utm_source=profile&utm_medium=reader2) of embedding models.
+* See the [Massive Text Embedding Benchmark (MTEB)](https://huggingface.co/blog/mteb) leaderboard for a comprehensive overview of embedding models.
+
+:::
+
+### Interface
+
+LangChain provides a universal interface for working with embedding models, with standard methods for common operations.
+This common interface simplifies interaction with various embedding providers through two central methods:
+
+- `embed_documents`: For embedding multiple texts (documents)
+- `embed_query`: For embedding a single text (query)
+
+This distinction is important, as some providers employ different embedding strategies for documents (which are to be searched) versus queries (the search input itself).
+To illustrate, here's a practical example using LangChain's `.embed_documents` method to embed a list of strings:
+
+```python
+from langchain_openai import OpenAIEmbeddings
+embeddings_model = OpenAIEmbeddings()
+embeddings = embeddings_model.embed_documents(
+    [
+        "Hi there!",
+        "Oh, hello!",
+        "What's your name?",
+        "My friends call me World",
+        "Hello World!"
+    ]
+)
+len(embeddings), len(embeddings[0])
+# (5, 1536)
+```
+
+For convenience, you can also use the `embed_query` method to embed a single text:
+
+```python
+query_embedding = embeddings_model.embed_query("What is the meaning of life?")
+```
+
+:::info[Further reading]
+
+* See the full list of [LangChain embedding model integrations](/docs/integrations/text_embedding/).
+* See these [how-to guides](/docs/how_to/embed_text) for working with embedding models.
+
+:::
+
+### Integrations
+
+LangChain offers many embedding model integrations which you can find [on the embedding models](/docs/integrations/text_embedding/) integrations page.
+
+## Measure similarity
+
+Each embedding is essentially a set of coordinates, often in a high-dimensional space. 
+In this space, the position of each point (embedding) reflects the meaning of its corresponding text.
+Just as similar words might be close to each other in a thesaurus, similar concepts end up close to each other in this embedding space. 
+This allows for intuitive comparisons between different pieces of text.
+By reducing text to these numerical representations, we can use simple mathematical operations to quickly measure how alike two pieces of text are, regardless of their original length or structure.
+Some common similarity metrics include:
+
+- **Cosine Similarity**: Measures the cosine of the angle between two vectors.
+- **Euclidean Distance**: Measures the straight-line distance between two points.
+- **Dot Product**: Measures the projection of one vector onto another.
+
+The similarity metric should be chosen based on the model.
+As an example, [OpenAI suggests cosine similarity for their embeddings](https://platform.openai.com/docs/guides/embeddings/which-distance-function-should-i-use), which can be easily implemented:
+
+```python
+import numpy as np
+
+def cosine_similarity(vec1, vec2):
+    dot_product = np.dot(vec1, vec2)
+    norm_vec1 = np.linalg.norm(vec1)
+    norm_vec2 = np.linalg.norm(vec2)
+    return dot_product / (norm_vec1 * norm_vec2)
+
+# Compare the query embedding with the first document embedding from the examples above.
+similarity = cosine_similarity(query_embedding, embeddings[0])
+print("Cosine Similarity:", similarity)
+```  
+
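To make the difference between the three metrics concrete, here is a small sketch using only the Python standard library. The vectors are made-up two-dimensional examples, not real embeddings, and the helper names are chosen for illustration.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products; grows with both alignment and magnitude."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Cosine of the angle between the vectors; ignores their magnitudes."""
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the length

print("dot product:", dot_product(a, b))
print("euclidean distance:", euclidean_distance(a, b))
print("cosine similarity:", cosine(a, b))
```

Because `b` points in the same direction as `a`, the cosine similarity is at its maximum even though the Euclidean distance between the two points is not zero. This is one reason the metric should match how the model's embeddings are meant to be compared.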
+:::info[Further reading]
+
+* See Simon Willison’s [nice blog post and video](https://simonwillison.net/2023/Oct/23/embeddings/) on embeddings and similarity metrics.
+* See [this documentation](https://developers.google.com/machine-learning/clustering/dnn-clustering/supervised-similarity) from Google on similarity metrics to consider with embeddings.
+* See Pinecone's [blog post](https://www.pinecone.io/learn/vector-similarity/) on similarity metrics.
+* See OpenAI's [FAQ](https://platform.openai.com/docs/guides/embeddings/faq) on what similarity metric to use with OpenAI embeddings.
+
+::: 
--- a/docs/docs/concepts/evaluation.mdx
+++ b/docs/docs/concepts/evaluation.mdx
@@ -0,0 +1,17 @@
+# Evaluation
+<span data-heading-keywords="evaluation,evaluate"></span>
+
+Evaluation is the process of assessing the performance and effectiveness of your LLM-powered applications.
+It involves testing the model's responses against a set of predefined criteria or benchmarks to ensure it meets the desired quality standards and fulfills the intended purpose.
+This process is vital for building reliable applications.
+
+![](/img/langsmith_evaluate.png)
+
+[LangSmith](https://docs.smith.langchain.com/) helps with this process in a few ways:
+
+- It makes it easier to create and curate datasets via its tracing and annotation features
+- It provides an evaluation framework that helps you define metrics and run your app against your dataset
+- It allows you to track results over time and automatically run your evaluators on a schedule or as part of CI/CD
+
+To learn more, check out [this LangSmith guide](https://docs.smith.langchain.com/concepts/evaluation).
+
--- a/docs/docs/concepts/example_selectors.mdx
+++ b/docs/docs/concepts/example_selectors.mdx
@@ -0,0 +1,20 @@
+# Example selectors
+
+:::note Prerequisites
+
+- [Chat models](/docs/concepts/chat_models/)
+- [Few-shot prompting](/docs/concepts/few_shot_prompting/)
+:::
+
+## Overview
+
+One common prompting technique for achieving better performance is to include examples as part of the prompt. This is known as [few-shot prompting](/docs/concepts/few_shot_prompting).
+
+This gives the [language model](/docs/concepts/chat_models/) concrete examples of how it should behave.
+Sometimes these examples are hardcoded into the prompt, but for more advanced situations it may be nice to dynamically select them.
+
+**Example Selectors** are classes responsible for selecting and then formatting examples into prompts.
+
+## Related resources
+
+* [Example selector how-to guides](/docs/how_to/#example-selectors)
--- a/docs/docs/concepts/few_shot_prompting.mdx
+++ b/docs/docs/concepts/few_shot_prompting.mdx
@@ -0,0 +1,85 @@
+# Few-shot prompting
+
+:::note Prerequisites
+
+- [Chat models](/docs/concepts/chat_models/)
+:::
+
+## Overview
+
+One of the most effective ways to improve model performance is to give a model examples of
+what you want it to do. The technique of adding example inputs and expected outputs
+to a model prompt is known as "few-shot prompting". The technique is based on the
+[Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165) paper.
+There are a few things to think about when doing few-shot prompting:
+
+1. How are examples generated?
+2. How many examples are in each prompt?
+3. How are examples selected at runtime?
+4. How are examples formatted in the prompt?
+
+Here are the considerations for each.
+
+## 1. Generating examples
+
+The first and most important step of few-shot prompting is coming up with a good dataset of examples. Good examples should be relevant at runtime, clear, informative, and provide information that was not already known to the model.
+
+At a high level, the basic ways to generate examples are:
+- Manual: a person/people generates examples they think are useful.
+- Better model: a better (presumably more expensive/slower) model's responses are used as examples for a worse (presumably cheaper/faster) model.
+- User feedback: users (or labelers) leave feedback on interactions with the application and examples are generated based on that feedback (for example, all interactions with positive feedback could be turned into examples).
+- LLM feedback: same as user feedback but the process is automated by having models evaluate themselves.
+
+Which approach is best depends on your task. For tasks where a small number of core principles need to be understood really well, it can be valuable to hand-craft a few really good examples.
+For tasks where the space of correct behaviors is broader and more nuanced, it can be useful to generate many examples in a more automated fashion so that there's a higher likelihood of there being some highly relevant examples for any runtime input.
+
+**Single-turn vs. multi-turn examples**
+
+Another dimension to think about when generating examples is what the example is actually showing.
+
+The simplest types of examples just have a user input and an expected model output. These are single-turn examples.
+
+A more complex type of example is one where the example is an entire conversation, usually in which a model initially responds incorrectly and a user then tells the model how to correct its answer.
+This is called a multi-turn example. Multi-turn examples can be useful for more nuanced tasks where it's useful to show common errors and spell out exactly why they're wrong and what should be done instead.
+
+## 2. Number of examples
+
+Once we have a dataset of examples, we need to think about how many examples should be in each prompt.
+The key tradeoff is that more examples generally improve performance, but larger prompts increase costs and latency.
+And beyond some threshold having too many examples can start to confuse the model.
+Finding the right number of examples is highly dependent on the model, the task, the quality of the examples, and your cost and latency constraints.
+Anecdotally, the better the model is the fewer examples it needs to perform well and the more quickly you hit steeply diminishing returns on adding more examples.
+But, the best/only way to reliably answer this question is to run some experiments with different numbers of examples.
+
+## 3. Selecting examples
+
+Assuming we are not adding our entire example dataset into each prompt, we need to have a way of selecting examples from our dataset based on a given input. We can do this:
+- Randomly
+- By (semantic or keyword-based) similarity of the inputs
+- Based on some other constraints, like token size
+
+LangChain has a number of [`ExampleSelectors`](/docs/concepts/example_selectors) which make it easy to use any of these techniques.
+
+Generally, selecting by semantic similarity leads to the best model performance. How much this matters is again model- and task-specific, and is worth experimenting with.
+
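As an illustration of similarity-based selection, the sketch below ranks examples by keyword overlap with the incoming query. It is a simplified stand-in, not one of LangChain's `ExampleSelectors`: the function names are hypothetical, and Jaccard overlap of word sets is used in place of semantic similarity from an embedding model.

```python
def keyword_overlap(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two strings."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def select_examples(examples, query, k=2):
    """Return the k examples whose inputs overlap most with the query."""
    ranked = sorted(
        examples,
        key=lambda ex: keyword_overlap(ex["input"], query),
        reverse=True,
    )
    return ranked[:k]


examples = [
    {"input": "translate hello to french", "output": "bonjour"},
    {"input": "what is 2 plus 2", "output": "4"},
    {"input": "translate goodbye to french", "output": "au revoir"},
]
selected = select_examples(examples, "translate thanks to french", k=2)
```

Here the two translation examples are selected and the arithmetic example is left out, which keeps the prompt short while still showing the model the relevant pattern.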
+## 4. Formatting examples
+
+Most state-of-the-art models these days are chat models, so we'll focus on formatting examples for those. Our basic options are to insert the examples:
+- In the system prompt as a string
+- As their own messages
+
+If we insert our examples into the system prompt as a string, we'll need to make sure it's clear to the model where each example begins and which parts are the input versus output. Different models respond better to different syntaxes, like [ChatML](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/chat-markup-language), XML, TypeScript, etc.
+
+If we insert our examples as messages, where each example is represented as a sequence of Human, AI messages, we might want to also assign [names](/docs/concepts/#messages) to our messages like `"example_user"` and `"example_assistant"` to make it clear that these messages correspond to different actors than the latest input message.
+
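The two formatting options can be sketched side by side. This is a minimal illustration that assumes messages are plain dictionaries with `role`, `name`, and `content` keys and that XML-style tags delimit string examples; both choices are assumptions for the sketch, and the right format depends on the model.

```python
def examples_to_messages(examples):
    """Turn input/output pairs into alternating, named example messages."""
    messages = []
    for ex in examples:
        messages.append({"role": "user", "name": "example_user", "content": ex["input"]})
        messages.append({"role": "assistant", "name": "example_assistant", "content": ex["output"]})
    return messages


def examples_to_string(examples):
    """Render the same examples as delimited text for use inside a system prompt."""
    return "\n\n".join(
        f"<example>\n<input>{ex['input']}</input>\n<output>{ex['output']}</output>\n</example>"
        for ex in examples
    )


few_shot = [{"input": "happy", "output": "sad"}]

# Option 1: examples as their own messages, placed before the real user input.
prompt_messages = (
    [{"role": "system", "content": "Reply with the antonym of the user's word."}]
    + examples_to_messages(few_shot)
    + [{"role": "user", "content": "tall"}]
)

# Option 2: examples embedded in the system prompt as a string.
system_prompt = "Reply with the antonym of the user's word.\n\n" + examples_to_string(few_shot)
```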
+**Formatting tool call examples**
+
+One area where formatting examples as messages can be tricky is when our example outputs have tool calls. This is because different models have different constraints on what types of message sequences are allowed when any tool calls are generated.
+- Some models require that any AIMessage with tool calls be immediately followed by ToolMessages for every tool call,
+- Some models additionally require that any ToolMessages be immediately followed by an AIMessage before the next HumanMessage,
+- Some models require that tools are passed in to the model if there are any tool calls / ToolMessages in the chat history.
+
+These requirements are model-specific and should be checked for the model you are using. If your model requires ToolMessages after tool calls and/or AIMessages after ToolMessages and your examples only include expected tool calls and not the actual tool outputs, you can try adding dummy ToolMessages / AIMessages to the end of each example with generic contents to satisfy the API constraints.
+In these cases it's especially worth experimenting with inserting your examples as strings versus messages, as having dummy messages can adversely affect certain models.
+
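The padding idea described above can be sketched as follows. The message shape (dictionaries with `role`, `tool_calls`, and `tool_call_id` keys), the function name, and the placeholder contents are all assumptions made for illustration; the exact fields and constraints depend on the model provider.

```python
def pad_tool_call_example(
    example_messages,
    tool_content="(tool output omitted)",
    ai_content="Done.",
):
    """Append placeholder tool results and a closing AI message to an example."""
    padded = list(example_messages)
    tool_calls = padded[-1].get("tool_calls", []) if padded else []
    # One placeholder tool message per tool call, matched by id.
    for call in tool_calls:
        padded.append({"role": "tool", "tool_call_id": call["id"], "content": tool_content})
    # Close the example with an AI message so a user message can follow.
    if tool_calls:
        padded.append({"role": "assistant", "content": ai_content})
    return padded


example = [
    {"role": "user", "content": "What is 3 times 4?"},
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [{"id": "call_1", "name": "multiply", "args": {"a": 3, "b": 4}}],
    },
]
padded_example = pad_tool_call_example(example)
```

Examples that do not end in a tool call are returned unchanged, so the helper can be applied to a mixed dataset.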
+You can see a case study of how Anthropic and OpenAI respond to different few-shot prompting techniques on two different tool calling benchmarks [here](https://blog.langchain.dev/few-shot-prompting-to-improve-tool-calling-performance/).
--- a/docs/docs/concepts/index.mdx
+++ b/docs/docs/concepts/index.mdx
@@ -0,0 +1,89 @@
+# Conceptual guide
+
+This guide provides explanations of the key concepts behind the LangChain framework and AI applications more broadly.
+
+We recommend that you go through at least one of the [Tutorials](/docs/tutorials) before diving into the conceptual guide. This will provide practical context that will make it easier to understand the concepts discussed here.
+
+The conceptual guide does not cover step-by-step instructions or specific implementation examples — those are found in the [How-to guides](/docs/how_to/) and [Tutorials](/docs/tutorials). For detailed reference material, please see the [API reference](https://python.langchain.com/api_reference/).
+
+## High level
+
+- **[Why LangChain?](/docs/concepts/why_langchain)**: Overview of the value that LangChain provides.
+- **[Architecture](/docs/concepts/architecture)**: How packages are organized in the LangChain ecosystem.
+
+## Concepts
+
+- **[Chat models](/docs/concepts/chat_models)**: LLMs exposed via a chat API that process sequences of messages as input and output a message.
+- **[Messages](/docs/concepts/messages)**: The unit of communication in chat models, used to represent model input and output.
+- **[Chat history](/docs/concepts/chat_history)**: A conversation represented as a sequence of messages, alternating between user messages and model responses.
+- **[Tools](/docs/concepts/tools)**: A function with an associated schema defining the function's name, description, and the arguments it accepts.
+- **[Tool calling](/docs/concepts/tool_calling)**: A type of chat model API that accepts tool schemas, along with messages, as input and returns invocations of those tools as part of the output message.
+- **[Structured output](/docs/concepts/structured_outputs)**: A technique to make a chat model respond in a structured format, such as JSON that matches a given schema.
+- **[Memory](https://langchain-ai.github.io/langgraph/concepts/memory/)**: Information about a conversation that is persisted so that it can be used in future conversations.
+- **[Multimodality](/docs/concepts/multimodality)**: The ability to work with data that comes in different forms, such as text, audio, images, and video.
+- **[Runnable interface](/docs/concepts/runnables)**: The base abstraction that many LangChain components and the LangChain Expression Language are built on.
+- **[LangChain Expression Language (LCEL)](/docs/concepts/lcel)**: A syntax for orchestrating LangChain components. Most useful for simpler applications.
+- **[Document loaders](/docs/concepts/document_loaders)**: Load a source as a list of documents.
+- **[Retrieval](/docs/concepts/retrieval)**: Information retrieval systems can retrieve structured or unstructured data from a datasource in response to a query.
+- **[Text splitters](/docs/concepts/text_splitters)**: Split long text into smaller chunks that can be individually indexed to enable granular retrieval.
+- **[Embedding models](/docs/concepts/embedding_models)**: Models that represent data such as text or images in a vector space.
+- **[Vector stores](/docs/concepts/vectorstores)**: Storage of and efficient search over vectors and associated metadata.
+- **[Retriever](/docs/concepts/retrievers)**: A component that returns relevant documents from a knowledge base in response to a query.
+- **[Retrieval Augmented Generation (RAG)](/docs/concepts/rag)**: A technique that enhances language models by combining them with external knowledge bases.
+- **[Agents](/docs/concepts/agents)**: Use a [language model](/docs/concepts/chat_models) to choose a sequence of actions to take. Agents can interact with external resources via [tools](/docs/concepts/tools).
+- **[Prompt templates](/docs/concepts/prompt_templates)**: Component for factoring out the static parts of a model "prompt" (usually a sequence of messages). Useful for serializing, versioning, and reusing these static parts.
+- **[Output parsers](/docs/concepts/output_parsers)**: Responsible for taking the output of a model and transforming it into a more suitable format for downstream tasks. Output parsers were primarily useful prior to the general availability of [tool calling](/docs/concepts/tool_calling) and [structured outputs](/docs/concepts/structured_outputs).
+- **[Few-shot prompting](/docs/concepts/few_shot_prompting)**: A technique for improving model performance by providing a few examples of the task to perform in the prompt.
+- **[Example selectors](/docs/concepts/example_selectors)**: Used to select the most relevant examples from a dataset based on a given input. Example selectors are used in few-shot prompting to select examples for a prompt.
+- **[Async programming](/docs/concepts/async)**: The basics that one should know to use LangChain in an asynchronous context.
+- **[Callbacks](/docs/concepts/callbacks)**: Callbacks enable the execution of custom auxiliary code in built-in components. Callbacks are used to stream outputs from LLMs in LangChain, trace the intermediate steps of an application, and more.
+- **[Tracing](/docs/concepts/tracing)**: The process of recording the steps that an application takes to go from input to output. Tracing is essential for debugging and diagnosing issues in complex applications.
+- **[Evaluation](/docs/concepts/evaluation)**: The process of assessing the performance and effectiveness of AI applications. This involves testing the model's responses against a set of predefined criteria or benchmarks to ensure it meets the desired quality standards and fulfills the intended purpose. This process is vital for building reliable applications.
+
+## Glossary
+
+- **[AIMessageChunk](/docs/concepts/messages#aimessagechunk)**: A partial response from an AI message. Used when streaming responses from a chat model.
+- **[AIMessage](/docs/concepts/messages#aimessage)**: Represents a complete response from an AI model.
+- **[astream_events](/docs/concepts/chat_models#key-methods)**: Stream granular information from [LCEL](/docs/concepts/lcel) chains.
+- **[BaseTool](/docs/concepts/tools#basetool)**: The base class for all tools in LangChain.
+- **[batch](/docs/concepts/runnables)**: Used to execute a Runnable with a batch of inputs.
+- **[bind_tools](/docs/concepts/chat_models#bind-tools)**: Allows models to interact with tools.
+- **[Caching](/docs/concepts/chat_models#caching)**: Storing results to avoid redundant calls to a chat model.
+- **[Chat models](/docs/concepts/multimodality#chat-models)**: Chat models that handle multiple data modalities.
+- **[Configurable runnables](/docs/concepts/runnables#configurable-Runnables)**: Creating configurable Runnables.
+- **[Context window](/docs/concepts/chat_models#context-window)**: The maximum size of input a chat model can process.
+- **[Conversation patterns](/docs/concepts/chat_history#conversation-patterns)**: Common patterns in chat interactions.
+- **[Document](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html)**: LangChain's representation of a document.
+- **[Embedding models](/docs/concepts/multimodality#embedding-models)**: Models that generate vector embeddings for various data types.
+- **[HumanMessage](/docs/concepts/messages#humanmessage)**: Represents a message from a human user.
+- **[InjectedState](/docs/concepts/tools#injectedstate)**: A state injected into a tool function.
+- **[InjectedStore](/docs/concepts/tools#injectedstore)**: A store that can be injected into a tool for data persistence.
+- **[InjectedToolArg](/docs/concepts/tools#injectedtoolarg)**: Mechanism to inject arguments into tool functions.
+- **[input and output types](/docs/concepts/runnables#input-and-output-types)**: Types used for input and output in Runnables.
+- **[Integration packages](/docs/concepts/architecture#partner-packages)**: Third-party packages that integrate with LangChain.
+- **[invoke](/docs/concepts/runnables)**: A standard method to invoke a Runnable.
+- **[JSON mode](/docs/concepts/structured_outputs#json-mode)**: Returning responses in JSON format.
+- **[langchain-community](/docs/concepts/architecture#langchain-community)**: Community-driven components for LangChain.
+- **[langchain-core](/docs/concepts/architecture#langchain-core)**: Core langchain package. Includes base interfaces and in-memory implementations.
+- **[langchain](/docs/concepts/architecture#langchain)**: A package for higher level components (e.g., some pre-built chains).
+- **[langgraph](/docs/concepts/architecture#langgraph)**: Powerful orchestration layer for LangChain. Use to build complex pipelines and workflows.
+- **[langserve](/docs/concepts/architecture#langserve)**: Use to deploy LangChain Runnables as REST endpoints. Uses FastAPI. Works primarily for LangChain Runnables, does not currently integrate with LangGraph.
+- **[Managing chat history](/docs/concepts/chat_history#managing-chat-history)**: Techniques to maintain and manage the chat history.
+- **[OpenAI format](/docs/concepts/messages#openai-format)**: OpenAI's message format for chat models.
+- **[Propagation of RunnableConfig](/docs/concepts/runnables#propagation-RunnableConfig)**: Propagating configuration through Runnables. Read if working with Python 3.9 or 3.10 and async code.
+- **[rate-limiting](/docs/concepts/chat_models#rate-limiting)**: Client side rate limiting for chat models.
+- **[RemoveMessage](/docs/concepts/messages#remove-message)**: An abstraction used to remove a message from chat history, used primarily in LangGraph.
+- **[role](/docs/concepts/messages#role)**: Represents the role (e.g., user, assistant) of a chat message.
+- **[RunnableConfig](/docs/concepts/runnables#RunnableConfig)**: Use to pass run time information to Runnables (e.g., `run_name`, `run_id`, `tags`, `metadata`, `max_concurrency`, `recursion_limit`, `configurable`).
+- **[Standard parameters for chat models](/docs/concepts/chat_models#standard-parameters)**: Parameters such as API key, `temperature`, and `max_tokens`.
+- **[stream](/docs/concepts/streaming)**: Use to stream output from a Runnable or a graph.
+- **[Tokenization](/docs/concepts/tokens)**: The process of converting data into tokens and vice versa.
+- **[Tokens](/docs/concepts/tokens)**: The basic unit that a language model reads, processes, and generates under the hood.
+- **[Tool artifacts](/docs/concepts/tools#tool-artifacts)**: Add artifacts to the output of a tool that will not be sent to the model, but will be available for downstream processing.
+- **[Tool binding](/docs/concepts/tool_calling#tool-binding)**: Binding tools to models.
+- **[@tool](/docs/concepts/tools#@tool)**: Decorator for creating tools in LangChain.
+- **[Toolkits](/docs/concepts/tools#toolkits)**: A collection of tools that can be used together.
+- **[ToolMessage](/docs/concepts/messages#toolmessage)**: Represents a message that contains the results of a tool execution.
+- **[Vector stores](/docs/concepts/vectorstores)**: Datastores specialized for storing and efficiently searching vector embeddings.
+- **[with_structured_output](/docs/concepts/chat_models#with-structured-output)**: A helper method for chat models that natively support [tool calling](/docs/concepts/tool_calling) to get structured output matching a given schema specified via Pydantic, JSON schema or a function.
+- **[with_types](/docs/concepts/runnables#with_types)**: Method to overwrite the input and output types of a runnable. Useful when working with complex LCEL chains and deploying with LangServe.
--- a/docs/docs/concepts/key_value_stores.mdx
+++ b/docs/docs/concepts/key_value_stores.mdx
@@ -0,0 +1,38 @@
+# Key-value stores
+
+## Overview
+
+LangChain provides a key-value store interface for storing and retrieving data.
+
+LangChain includes a [`BaseStore`](https://python.langchain.com/api_reference/core/stores/langchain_core.stores.BaseStore.html) interface,
+which allows for storage of arbitrary data. However, LangChain components that require KV-storage accept a
+more specific `BaseStore[str, bytes]` instance that stores binary data (referred to as a `ByteStore`), and internally take care of
+encoding and decoding data for their specific needs.
+
+This means that as a user, you only need to think about one type of store rather than different ones for different types of data.
+
+## Usage
+
+The key-value store interface in LangChain is used primarily for:
+
+1. Caching [embeddings](/docs/concepts/embedding_models) via [CacheBackedEmbeddings](https://python.langchain.com/api_reference/langchain/embeddings/langchain.embeddings.cache.CacheBackedEmbeddings.html#langchain.embeddings.cache.CacheBackedEmbeddings) to avoid recomputing embeddings for repeated queries or when re-indexing content.
+
+2. As a simple [Document](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html#langchain_core.documents.base.Document) persistence layer in some retrievers.
+
+Please see these how-to guides for more information:
+
+* [How to cache embeddings guide](https://python.langchain.com/docs/how_to/caching_embeddings/).
+* [How to retrieve using multiple vectors per document](https://python.langchain.com/docs/how_to/custom_retriever/).
+
+## Interface
+
+All [`BaseStores`](https://python.langchain.com/api_reference/core/stores/langchain_core.stores.BaseStore.html) support the following interface. Note that the interface allows for modifying **multiple** key-value pairs at once:
+
+- `mget(key: Sequence[str]) -> List[Optional[bytes]]`: get the contents of multiple keys, returning `None` for any key that does not exist
+- `mset(key_value_pairs: Sequence[Tuple[str, bytes]]) -> None`: set the contents of multiple keys
+- `mdelete(key: Sequence[str]) -> None`: delete multiple keys
+- `yield_keys(prefix: Optional[str] = None) -> Iterator[str]`: yield all keys in the store, optionally filtering by a prefix
+
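To make the shape of this interface concrete, here is a minimal sketch of an in-memory store written in plain Python. `ToyByteStore` is a hypothetical name used only for illustration; it does not subclass `BaseStore` and is not one of LangChain's own implementations. It only mirrors the four methods listed above.

```python
from typing import Dict, Iterator, List, Optional, Sequence, Tuple


class ToyByteStore:
    """Hypothetical in-memory store mirroring the four methods listed above."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def mget(self, keys: Sequence[str]) -> List[Optional[bytes]]:
        # Missing keys come back as None, in the same position as the request.
        return [self._data.get(key) for key in keys]

    def mset(self, key_value_pairs: Sequence[Tuple[str, bytes]]) -> None:
        for key, value in key_value_pairs:
            self._data[key] = value

    def mdelete(self, keys: Sequence[str]) -> None:
        # Deleting a key that is not present is silently ignored in this sketch.
        for key in keys:
            self._data.pop(key, None)

    def yield_keys(self, prefix: Optional[str] = None) -> Iterator[str]:
        for key in self._data:
            if prefix is None or key.startswith(prefix):
                yield key


store = ToyByteStore()
store.mset([("doc:1", b"hello"), ("doc:2", b"world"), ("tmp:1", b"scratch")])
values = store.mget(["doc:1", "missing"])
doc_keys = sorted(store.yield_keys(prefix="doc:"))
store.mdelete(["tmp:1"])
remaining = sorted(store.yield_keys())
```

Because every method works on a batch of keys, a caller can fetch or write many entries in one call, which matters for stores backed by a network service.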
+## Integrations
+
+Please reference the [stores integration page](/docs/integrations/stores/) for a list of available key-value store integrations.
--- a/docs/docs/concepts/lcel.mdx
+++ b/docs/docs/concepts/lcel.mdx
@@ -0,0 +1,221 @@
+# LangChain Expression Language (LCEL)
+
+:::info Prerequisites
+* [Runnable Interface](/docs/concepts/runnables)
+:::
+
+The **L**ang**C**hain **E**xpression **L**anguage (LCEL) takes a [declarative](https://en.wikipedia.org/wiki/Declarative_programming) approach to building new [Runnables](/docs/concepts/runnables) from existing Runnables.
+
+This means that you describe what you want to happen, rather than how you want it to happen, allowing LangChain to optimize the run-time execution of the chains.
+
+We often refer to a `Runnable` created using LCEL as a "chain". It's important to remember that a "chain" is a `Runnable` and it implements the full [Runnable Interface](/docs/concepts/runnables).
+
+:::note
+* The [LCEL cheatsheet](https://python.langchain.com/docs/how_to/lcel_cheatsheet/) shows common patterns that involve the Runnable interface and LCEL expressions.
+* Please see the following list of [how-to guides](/docs/how_to/#langchain-expression-language-lcel) that cover common tasks with LCEL.
+* A list of built-in `Runnables` can be found in the [LangChain Core API Reference](https://python.langchain.com/api_reference/core/runnables.html). Many of these Runnables are useful when composing custom "chains" in LangChain using LCEL.
+:::
+
+## Benefits of LCEL
+
+LangChain optimizes the run-time execution of chains built with LCEL in a number of ways:
+
+- **Optimize parallel execution**: Run Runnables in parallel using [RunnableParallel](#RunnableParallel) or run multiple inputs through a given chain in parallel using the [Runnable Batch API](/docs/concepts/runnables#batch). Parallel execution can significantly reduce the latency as processing can be done in parallel instead of sequentially.
+- **Guarantee Async support**: Any chain built with LCEL can be run asynchronously using the [Runnable Async API](/docs/concepts/runnables#async-api). This can be useful when running chains in a server environment where you want to handle a large number of requests concurrently.
+- **Simplify streaming**: LCEL chains can be streamed, allowing for incremental output as the chain is executed. LangChain can optimize the streaming of the output to minimize the time-to-first-token (time elapsed until the first chunk of output from a [chat model](/docs/concepts/chat_models) or [llm](/docs/concepts/llms) comes out).
+
+Other benefits include:
+
+- [**Seamless LangSmith tracing**](https://docs.smith.langchain.com): As your chains get more and more complex, it becomes increasingly important to understand what exactly is happening at every step. With LCEL, **all** steps are automatically logged to [LangSmith](https://docs.smith.langchain.com/) for maximum observability and debuggability.
+- **Standard API**: Because all chains are built using the Runnable interface, they can be used in the same way as any other Runnable.
+- [**Deployable with LangServe**](/docs/concepts/architecture#langserve): Chains built with LCEL can be deployed using LangServe for production use.
+
+## Should I use LCEL?
+
+LCEL is an [orchestration solution](https://en.wikipedia.org/wiki/Orchestration_(computing)) -- it allows LangChain to handle run-time execution of chains in an optimized way.
+
+While we have seen users run chains with hundreds of steps in production, we generally recommend using LCEL for simpler orchestration tasks. When the application requires complex state management, branching, cycles or multiple agents, we recommend that users take advantage of [LangGraph](/docs/concepts/architecture#langgraph).
+
+In LangGraph, users define graphs that specify the flow of the application. This allows users to keep using LCEL within individual nodes when LCEL is needed, while making it easy to define complex orchestration logic that is more readable and maintainable.
+
+Here are some guidelines:
+
+* If you are making a single LLM call, you don't need LCEL; instead call the underlying [chat model](/docs/concepts/chat_models) directly.
+* If you have a simple chain (e.g., prompt + llm + parser, simple retrieval setup, etc.), LCEL is a reasonable fit if you're taking advantage of the LCEL benefits.
+* If you're building a complex chain (e.g., with branching, cycles, multiple agents, etc.) use [LangGraph](/docs/concepts/architecture#langgraph) instead. Remember that you can always use LCEL within individual nodes in LangGraph.
+
+## Composition Primitives
+
+`LCEL` chains are built by composing existing `Runnables` together. The two main composition primitives are [RunnableSequence](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.RunnableSequence.html#langchain_core.runnables.base.RunnableSequence) and [RunnableParallel](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.RunnableParallel.html#langchain_core.runnables.base.RunnableParallel).
+
+Many other composition primitives (e.g., [RunnableAssign](
+https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.passthrough.RunnableAssign.html#langchain_core.runnables.passthrough.RunnableAssign
+)) can be thought of as variations of these two primitives.
+
+:::note
+You can find a list of all composition primitives in the [LangChain Core API Reference](https://python.langchain.com/api_reference/core/runnables.html).
+:::
+
+### RunnableSequence
+
+`RunnableSequence` is a composition primitive that allows you to "chain" multiple runnables sequentially, with the output of one runnable serving as the input to the next.
+
+```python
+from langchain_core.runnables import RunnableSequence
+chain = RunnableSequence(runnable1, runnable2)
+```
+
+Invoking the `chain` with some input:
+
+```python
+final_output = chain.invoke(some_input)
+```
+
+corresponds to the following:
+
+```python
+output1 = runnable1.invoke(some_input)
+final_output = runnable2.invoke(output1)
+```
+
+:::note
+`runnable1` and `runnable2` are placeholders for any `Runnable` that you want to chain together.
+:::
+
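The sequential behaviour described above can be sketched in a few lines of plain Python. `ToyRunnable` and `ToySequence` are hypothetical classes invented for this illustration; they only imitate the `invoke` contract and are not how `RunnableSequence` is implemented.

```python
from typing import Any, Callable


class ToyRunnable:
    """Hypothetical stand-in for a Runnable: wraps a function behind `invoke`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)


class ToySequence(ToyRunnable):
    """Runs each step in order, feeding every output into the next step."""

    def __init__(self, *steps: ToyRunnable) -> None:
        self.steps = steps

    def invoke(self, value: Any) -> Any:
        for step in self.steps:
            value = step.invoke(value)
        return value


add_one = ToyRunnable(lambda x: x + 1)
double = ToyRunnable(lambda x: x * 2)

chain = ToySequence(add_one, double)
final_output = chain.invoke(3)  # same as double.invoke(add_one.invoke(3))
```

Note that `ToySequence` is itself a `ToyRunnable`, which is the property that lets a chain be used anywhere a single step can be used.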
+### RunnableParallel
+
+`RunnableParallel` is a composition primitive that allows you to run multiple runnables concurrently, with the same input provided to each.
+
+```python
+from langchain_core.runnables import RunnableParallel
+chain = RunnableParallel({
+    "key1": runnable1,
+    "key2": runnable2,
+})
+```
+
+Invoking the `chain` with some input:
+
+```python
+final_output = chain.invoke(some_input)
+```
+
+yields a `final_output` dictionary with the same keys as the dictionary passed to `RunnableParallel`, with each value replaced by the output of the corresponding runnable:
+
+```python
+{
+    "key1": runnable1.invoke(some_input),
+    "key2": runnable2.invoke(some_input),
+}
+```
+
+Recall that the runnables are executed in parallel, so while the result is the same as the
+dictionary shown above, the total execution time can be much shorter.
+
+:::note
+`RunnableParallel` supports both synchronous and asynchronous execution (as all `Runnables` do).
+
+* For synchronous execution, `RunnableParallel` uses a [ThreadPoolExecutor](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor) to run the runnables concurrently.
+* For asynchronous execution, `RunnableParallel` uses [asyncio.gather](https://docs.python.org/3/library/asyncio.html#asyncio.gather) to run the runnables concurrently.
+:::
+
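The two concurrency strategies mentioned in the note can be sketched with the standard library alone. The helper names `run_parallel_sync` and `run_parallel_async` are hypothetical and exist only for this illustration; the sketch shows the general pattern (same input to every branch, results collected by key), not LangChain's actual code.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Awaitable, Callable, Dict


def run_parallel_sync(steps: Dict[str, Callable[[Any], Any]], value: Any) -> Dict[str, Any]:
    """Call every step with the same input on a thread pool."""
    with ThreadPoolExecutor() as executor:
        futures = {key: executor.submit(func, value) for key, func in steps.items()}
        return {key: future.result() for key, future in futures.items()}


async def run_parallel_async(
    steps: Dict[str, Callable[[Any], Awaitable[Any]]], value: Any
) -> Dict[str, Any]:
    """Await every coroutine step with the same input via asyncio.gather."""
    results = await asyncio.gather(*(func(value) for func in steps.values()))
    return dict(zip(steps.keys(), results))


async def async_upper(text: str) -> str:
    return text.upper()


async def async_length(text: str) -> int:
    return len(text)


sync_output = run_parallel_sync({"upper": str.upper, "length": len}, "hello")
async_output = asyncio.run(
    run_parallel_async({"upper": async_upper, "length": async_length}, "hello")
)
```

Threads help most when the branches spend their time waiting on I/O, such as network calls to a model provider; CPU-bound branches gain little from a thread pool in CPython.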
+## Composition Syntax
+
+The usage of `RunnableSequence` and `RunnableParallel` is so common that we created a shorthand syntax for using them. This helps
+to make the code more readable and concise.
+
+### The `|` operator
+
+We have [overloaded](https://docs.python.org/3/reference/datamodel.html#special-method-names) the `|` operator to create a `RunnableSequence` from two `Runnables`.
+
+```python
+chain = runnable1 | runnable2
+```
+
+is equivalent to:
+
+```python
+chain = RunnableSequence(runnable1, runnable2)
+```
+
+### The `.pipe` method
+
+If you have moral qualms with operator overloading, you can use the `.pipe` method instead. This is equivalent to the `|` operator.
+
+```python
+chain = runnable1.pipe(runnable2)
+```
+
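For readers unfamiliar with operator overloading, the following sketch shows how a class can support both spellings. `PipeableStep` is a hypothetical class written for this illustration and is not part of LangChain; it relies only on Python's `__or__` special method.

```python
from typing import Any, Callable


class PipeableStep:
    """Hypothetical step type that supports `|` and `.pipe` for chaining."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def pipe(self, other: "PipeableStep") -> "PipeableStep":
        # The combined step runs `self` first, then feeds the result to `other`.
        return PipeableStep(lambda value: other.invoke(self.invoke(value)))

    def __or__(self, other: "PipeableStep") -> "PipeableStep":
        # Python calls __or__ when evaluating `self | other`.
        return self.pipe(other)


strip = PipeableStep(str.strip)
shout = PipeableStep(str.upper)

with_operator = (strip | shout).invoke("  hello ")
with_method = strip.pipe(shout).invoke("  hello ")
```

Both spellings build the same composed step, so the choice between them is purely stylistic.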
+### Coercion
+
+LCEL applies automatic type coercion to make it easier to compose chains.
+
+If you do not understand the type coercion, you can always use the `RunnableSequence` and `RunnableParallel` classes directly.
+
+This will make the code more verbose, but it will also make it more explicit.
+
+#### Dictionary to RunnableParallel
+
+Inside an LCEL expression, a dictionary is automatically converted to a `RunnableParallel`.
+
+For example, the following code:
+
+```python
+mapping = {
+    "key1": runnable1,
+    "key2": runnable2,
+}
+
+chain = mapping | runnable3
+```
+
+is automatically converted to the following:
+
+```python
+chain = RunnableSequence(RunnableParallel(mapping), runnable3)
+```
+
+:::caution
+You have to be careful because the `mapping` dictionary is not a `RunnableParallel` object, it is just a dictionary. This means that the following code will raise an `AttributeError`:
+
+```python
+mapping.invoke(some_input)
+```
+:::
+
+#### Function to RunnableLambda
+
+Inside an LCEL expression, a function is automatically converted to a `RunnableLambda`.
+
+```python
+def some_func(x):
+    return x
+
+chain = some_func | runnable1
+```
+
+The expression above is automatically converted to the following:
+
+```python
+chain = RunnableSequence(RunnableLambda(some_func), runnable1)
+```
+
+:::caution
+You have to be careful because the lambda function is not a `RunnableLambda` object, it is just a function. This means that the following code will raise an `AttributeError`:
+
+```python
+(lambda x: x + 1).invoke(some_input)
+```
+:::
+
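One way to picture how coercion can work is through Python's reflected operator `__ror__`, which is tried when the left operand does not know how to handle `|`. The sketch below uses a hypothetical `Step` class and `coerce` helper; it is an assumption about the general mechanism, not a copy of LangChain's implementation.

```python
from typing import Any, Callable, Dict, Union


class Step:
    """Hypothetical runnable-like wrapper around a plain function."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: Any) -> "Step":
        # Handles `step | something`.
        right = coerce(other)
        return Step(lambda value: right.invoke(self.invoke(value)))

    def __ror__(self, other: Any) -> "Step":
        # Handles `something | step` when `something` is a dict or a function,
        # because those types do not know how to combine with a Step.
        left = coerce(other)
        return Step(lambda value: self.invoke(left.invoke(value)))


def coerce(thing: Union[Step, Dict[str, Any], Callable[[Any], Any]]) -> Step:
    """Turn dicts and functions into Step objects; leave Step objects alone."""
    if isinstance(thing, Step):
        return thing
    if isinstance(thing, dict):
        branches = {key: coerce(branch) for key, branch in thing.items()}
        return Step(lambda value: {key: b.invoke(value) for key, b in branches.items()})
    if callable(thing):
        return Step(thing)
    raise TypeError(f"Cannot coerce {type(thing).__name__} to a Step")


mapping = {"doubled": lambda x: x * 2, "squared": lambda x: x * x}
total = Step(lambda d: d["doubled"] + d["squared"])
times_ten = Step(lambda x: x * 10)

dict_chain = mapping | total               # dict coerced via Step.__ror__
func_chain = (lambda x: x + 1) | times_ten  # function coerced via Step.__ror__

dict_result = dict_chain.invoke(3)
func_result = func_chain.invoke(4)
mapping_has_invoke = hasattr(mapping, "invoke")
```

The last line illustrates the caution above: the dictionary itself is never modified, so it gains an `invoke` method only once it has been wrapped inside a chain.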
+## Legacy Chains
+
+LCEL aims to provide consistency around behavior and customization over legacy subclassed chains such as `LLMChain` and
+`ConversationalRetrievalChain`. Many of these legacy chains hide important details like prompts, and as a wider variety
+of viable models emerge, customization has become more and more important.
+
+If you are currently using one of these legacy chains, please see [this guide on how to migrate](/docs/versions/migrating_chains).
+
+For guides on how to do specific tasks with LCEL, check out [the relevant how-to guides](/docs/how_to/#langchain-expression-language-lcel).
--- a/docs/docs/concepts/llms.mdx
+++ b/docs/docs/concepts/llms.mdx
@@ -0,0 +1,3 @@
+# Large language models (llms)
+
+Please see the [Chat Model Concept Guide](/docs/concepts/chat_models) page for more information.
--- a/docs/docs/concepts/messages.mdx
+++ b/docs/docs/concepts/messages.mdx
@@ -0,0 +1,244 @@
+# Messages
+
+:::info Prerequisites
+- [Chat Models](/docs/concepts/chat_models)
+:::
+
+## Overview
+
+Messages are the unit of communication in [chat models](/docs/concepts/chat_models). They are used to represent the input and output of a chat model, as well as any additional context or metadata that may be associated with a conversation.
+
+Each message has a **role** (e.g., "user", "assistant"), **content** (e.g., text, multimodal data), and additional metadata that can vary depending on the chat model provider.
+
+LangChain provides a unified message format that can be used across chat models, allowing users to work with different chat models without worrying about the specific details of the message format used by each model provider.
+
+## What is inside a message?
+
+A message typically consists of the following pieces of information:
+
+- **Role**: The role of the message (e.g., "user", "assistant").
+- **Content**: The content of the message (e.g., text, multimodal data).
+- Additional metadata: id, name, [token usage](/docs/concepts/tokens) and other model-specific metadata.
+
+### Role
+
+Roles are used to distinguish between different types of messages in a conversation and help the chat model understand how to respond to a given sequence of messages.
+
+| **Role**              | **Description**                                                                                                                                                                                                 |
+|-----------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| **system**            | Used to tell the chat model how to behave and provide additional context. Not supported by all chat model providers.                                                                                            |
+| **user**              | Represents input from a user interacting with the model, usually in the form of text or other interactive input.                                                                                                |
+| **assistant**         | Represents a response from the model, which can include text or a request to invoke tools.                                                                                                                      |
+| **tool**              | A message used to pass the results of a tool invocation back to the model after external data or processing has been retrieved. Used with chat models that support [tool calling](/docs/concepts/tool_calling). |
+| **function (legacy)** | This is a legacy role, corresponding to OpenAI's legacy function-calling API. **tool** role should be used instead.                                                                                             |
+
+### Content
+
+The content of a message is either text or a list of dictionaries representing [multimodal data](/docs/concepts/multimodality) (e.g., images, audio, video). The exact format of the content can vary between different chat model providers.
+
+Currently, most chat models support text as the primary content type, with some models also supporting multimodal data. However, support for multimodal data is still limited across most chat model providers.
+
+For more information see:
+* [HumanMessage](#humanmessage) -- for content in the input from the user.
+* [AIMessage](#aimessage) -- for content in the response from the model.
+* [Multimodality](/docs/concepts/multimodality) -- for more information on multimodal content.
+
+### Other Message Data
+
+Depending on the chat model provider, messages can include other data such as:
+
+- **ID**: An optional unique identifier for the message.
+- **Name**: An optional `name` property which allows differentiating between different entities/speakers with the same role. Not all models support this!
+- **Metadata**: Additional information about the message, such as timestamps, token usage, etc.
+- **Tool Calls**: A request made by the model to call one or more tools. See [tool calling](/docs/concepts/tool_calling) for more information.
+
+## Conversation Structure
+
+The sequence of messages passed to a chat model should follow a specific structure to ensure that the chat model can generate a valid response.
+
+For example, a typical conversation structure might look like this:
+
+1. **User Message**: "Hello, how are you?"
+2. **Assistant Message**: "I'm doing well, thank you for asking."
+3. **User Message**: "Can you tell me a joke?"
+4. **Assistant Message**: "Sure! Why did the scarecrow win an award? Because he was outstanding in his field!"
+
+Please read the [chat history](/docs/concepts/chat_history) guide for more information on managing chat history and ensuring that the conversation structure is correct.
+
+## LangChain Messages
+
+LangChain provides a unified message format that can be used across all chat models, allowing users to work with different chat models without worrying about the specific details of the message format used by each model provider.
+
+LangChain messages are Python objects that subclass from a [BaseMessage](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.base.BaseMessage.html).
+
+The five main message types are:
+
+- [SystemMessage](#systemmessage): corresponds to **system** role
+- [HumanMessage](#humanmessage): corresponds to **user** role
+- [AIMessage](#aimessage): corresponds to **assistant** role
+- [AIMessageChunk](#aimessagechunk): corresponds to **assistant** role, used for [streaming](/docs/concepts/streaming) responses
+- [ToolMessage](#toolmessage): corresponds to **tool** role
+
+Other important messages include:
+
+- [RemoveMessage](#removemessage) -- does not correspond to any role. This is an abstraction, mostly used in [LangGraph](/docs/concepts/architecture#langgraph) to manage chat history.
+- **Legacy** [FunctionMessage](#legacy-functionmessage): corresponds to the **function** role in OpenAI's **legacy** function-calling API.
+
+You can find more information about **messages** in the [API Reference](https://python.langchain.com/api_reference/core/messages.html).
+
+### SystemMessage
+
+A `SystemMessage` is used to prime the behavior of the AI model and provide additional context, such as instructing the model to adopt a specific persona or setting the tone of the conversation (e.g., "This is a conversation about cooking").
+
+Different chat providers may support system messages in one of the following ways:
+
+* **Through a "system" message role**: In this case, a system message is included as part of the message sequence with the role explicitly set as "system."
+* **Through a separate API parameter for system instructions**: Instead of being included as a message, system instructions are passed via a dedicated API parameter.
+* **No support for system messages**: Some models do not support system messages at all.
+
+Most major chat model providers support system instructions via either a chat message or a separate API parameter. LangChain will automatically adapt based on the provider’s capabilities. If the provider supports a separate API parameter for system instructions, LangChain will extract the content of a system message and pass it through that parameter.
+
+If no system message is supported by the provider, in most cases LangChain will attempt to incorporate the system message's content into a HumanMessage or raise an exception if that is not possible. However, this behavior is not yet consistently enforced across all implementations, and if using a less popular implementation of a chat model (e.g., an implementation from the `langchain-community` package) it is recommended to check the specific documentation for that model.
+
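The two adaptation strategies described above (passing system text through a dedicated parameter, or folding it into a human message) can be sketched with plain dictionaries. The function names `split_system_prompt` and `fold_system_into_user` are hypothetical, and the exact merging rule is an assumption made for illustration; real integrations may behave differently.

```python
from typing import Dict, List, Optional, Tuple

Message = Dict[str, str]


def split_system_prompt(messages: List[Message]) -> Tuple[Optional[str], List[Message]]:
    """For providers with a dedicated system parameter: pull system text out."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    system_text = "\n".join(system_parts) if system_parts else None
    return system_text, others


def fold_system_into_user(messages: List[Message]) -> List[Message]:
    """For providers without system support: prepend system text to the first user message."""
    system_text, others = split_system_prompt(messages)
    if system_text is None:
        return others
    for index, message in enumerate(others):
        if message["role"] == "user":
            merged = {"role": "user", "content": f"{system_text}\n\n{message['content']}"}
            return others[:index] + [merged] + others[index + 1:]
    # Mirrors the "raise an exception if that is not possible" case.
    raise ValueError("No user message available to carry the system instructions")


conversation = [
    {"role": "system", "content": "You are a cooking assistant."},
    {"role": "user", "content": "How do I poach an egg?"},
]

system_param, remaining = split_system_prompt(conversation)
folded = fold_system_into_user(conversation)
```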
+### HumanMessage
+
+The `HumanMessage` corresponds to the **"user"** role. A human message represents input from a user interacting with the model.
+
+#### Text Content
+
+Most chat models expect the user input to be in the form of text.
+
+```python
+from langchain_core.messages import HumanMessage
+
+model.invoke([HumanMessage(content="Hello, how are you?")])
+```
+
+:::tip
+When invoking a chat model with a string as input, LangChain will automatically convert the string into a `HumanMessage` object. This is mostly useful for quick testing.
+
+```python
+model.invoke("Hello, how are you?")
+```
+:::
+
+#### Multi-modal Content
+
+Some chat models accept multimodal inputs, such as images, audio, video, or files like PDFs.
+
+Please see the [multimodality](/docs/concepts/multimodality) guide for more information.
+
+### AIMessage
+
+`AIMessage` is used to represent a message with the role **"assistant"**. This is the response from the model, which can include text or a request to invoke tools. It could also include other media types like images, audio, or video -- though this is still uncommon at the moment.
+
+```python
+from langchain_core.messages import HumanMessage
+ai_message = model.invoke([HumanMessage("Tell me a joke")])
+ai_message # <-- AIMessage
+```
+
+An `AIMessage` has the following attributes. The attributes which are **standardized** are the ones that LangChain attempts to standardize across different chat model providers. **raw** fields are specific to the model provider and may vary.
+
+| Attribute            | Standardized/Raw | Description                                                                                                                                                                                                             |
+|----------------------|:-----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `content`            | Raw              | Usually a string, but can be a list of content blocks. See [content](#content) for details.                                                                                                                             |
+| `tool_calls`         | Standardized     | Tool calls associated with the message. See [tool calling](/docs/concepts/tool_calling) for details.                                                                                                                    |
+| `invalid_tool_calls` | Standardized     | Tool calls with parsing errors associated with the message. See [tool calling](/docs/concepts/tool_calling) for details.                                                                                                |
+| `usage_metadata`     | Standardized     | Usage metadata for a message, such as [token counts](/docs/concepts/tokens). See [Usage Metadata API Reference](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.ai.UsageMetadata.html) |
+| `id`                 | Standardized     | An optional unique identifier for the message, ideally provided by the provider/model that created the message.                                                                                                         |
+| `response_metadata`  | Raw              | Response metadata, e.g., response headers, logprobs, token counts.                                                                                                                                                      |
+
+#### content
+
+The **content** property of an `AIMessage` represents the response generated by the chat model.
+
+The content is either:
+
+- **text** -- the norm for virtually all chat models.
+- A **list of dictionaries** -- Each dictionary represents a content block and is associated with a `type`.
+    * Used by Anthropic for surfacing agent thought process when doing [tool calling](/docs/concepts/tool_calling).
+    * Used by OpenAI for audio outputs. Please see [multi-modal content](/docs/concepts/multimodality) for more information.
+
+:::important
+The **content** property is **not** standardized across different chat model providers, mostly because there are
+still few examples to generalize from.
+:::
+
+### AIMessageChunk
+
+It is common to [stream](/docs/concepts/streaming) responses from the chat model as they are being generated, so the user can see the response in real-time instead of waiting for the entire response to be generated before displaying it.
+
+`AIMessageChunk` objects are returned from the `stream`, `astream` and `astream_events` methods of the chat model.
+
+For example,
+
+```python
+for chunk in model.stream([HumanMessage("what color is the sky?")]):
+    print(chunk)
+```
+
+`AIMessageChunk` follows nearly the same structure as `AIMessage`, but uses a different [ToolCallChunk](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.tool.ToolCallChunk.html#langchain_core.messages.tool.ToolCallChunk)
+to be able to stream tool calling in a standardized manner.
+
+
+#### Aggregating
+
+`AIMessageChunks` support the `+` operator to merge them into a single `AIMessage`. This is useful when you want to display the final response to the user.
+
+```python
+ai_message = chunk1 + chunk2 + chunk3  # ...and so on for any remaining chunks
+```
+
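As a rough illustration of how `+` can merge streamed pieces, the sketch below defines a hypothetical `ToyChunk` dataclass that concatenates text. It is not the real `AIMessageChunk`, which also has to merge tool call chunks and metadata; only the text-joining idea is shown.

```python
from dataclasses import dataclass


@dataclass
class ToyChunk:
    """Hypothetical streaming chunk that carries a piece of text."""

    content: str

    def __add__(self, other: "ToyChunk") -> "ToyChunk":
        # Merging two chunks concatenates their text in arrival order.
        return ToyChunk(content=self.content + other.content)


chunks = [ToyChunk("The sky "), ToyChunk("is "), ToyChunk("blue.")]

# Fold the stream into one object, one chunk at a time.
merged = chunks[0]
for chunk in chunks[1:]:
    merged = merged + chunk
```

The same folding loop works while a stream is still in progress, so a UI can show the partial `merged` value after every chunk.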
+### ToolMessage
+
+This represents a message with role "tool", which contains the result of [calling a tool](/docs/concepts/tool_calling). In addition to `role` and `content`, this message has:
+
+- a `tool_call_id` field which conveys the id of the tool call that produced this result.
+- an `artifact` field which can be used to pass along arbitrary artifacts of the tool execution which are useful to track but which should not be sent to the model.
+
+Please see [tool calling](/docs/concepts/tool_calling) for more information.
+
+### RemoveMessage
+
+This is a special message type that does not correspond to any role. It is used
+for managing chat history in [LangGraph](/docs/concepts/architecture#langgraph).
+
+Please see the following for more information on how to use the `RemoveMessage`:
+
+* [Memory conceptual guide](https://langchain-ai.github.io/langgraph/concepts/memory/)
+* [How to delete messages](https://langchain-ai.github.io/langgraph/how-tos/memory/delete-messages/)
+
+### (Legacy) FunctionMessage
+
+This is a legacy message type, corresponding to OpenAI's legacy function-calling API. `ToolMessage` should be used instead to correspond to the updated tool-calling API.
+
+## OpenAI Format
+
+### Inputs
+
+Chat models also accept messages in OpenAI's format as **inputs**:
+
+```python
+chat_model.invoke([
+    {
+        "role": "user",
+        "content": "Hello, how are you?",
+    },
+    {
+        "role": "assistant",
+        "content": "I'm doing well, thank you for asking.",
+    },
+    {
+        "role": "user",
+        "content": "Can you tell me a joke?",
+    }
+])
+```
+
+### Outputs
+
+At the moment, the output of the model is always returned as LangChain messages, so you will need to convert it if you need the output in OpenAI format as well.
+
+The [convert_to_openai_messages](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.utils.convert_to_openai_messages.html) utility function can be used to convert from LangChain messages to OpenAI format.
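
To show what such a conversion involves, here is a simplified sketch that maps message types to OpenAI roles using the correspondence given earlier on this page (human to user, AI to assistant, and so on). `ToyMessage` and `to_openai_dicts` are hypothetical names used for illustration; the real utility handles many more cases, such as tool calls and multimodal content.

```python
from dataclasses import dataclass
from typing import Dict, List

# Role mapping taken from the message types described on this page.
ROLE_BY_TYPE = {"system": "system", "human": "user", "ai": "assistant", "tool": "tool"}


@dataclass
class ToyMessage:
    """Hypothetical stand-in for a LangChain message object."""

    type: str
    content: str


def to_openai_dicts(messages: List[ToyMessage]) -> List[Dict[str, str]]:
    """Convert toy messages into OpenAI-style role/content dictionaries."""
    return [{"role": ROLE_BY_TYPE[m.type], "content": m.content} for m in messages]


history = [
    ToyMessage("human", "Can you tell me a joke?"),
    ToyMessage("ai", "Why did the scarecrow win an award? Because he was outstanding in his field!"),
]
openai_history = to_openai_dicts(history)
```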
--- a/docs/docs/concepts/multimodality.mdx
+++ b/docs/docs/concepts/multimodality.mdx
@@ -0,0 +1,88 @@
+# Multimodality
+
+## Overview
+
+**Multimodality** refers to the ability to work with data that comes in different forms, such as text, audio, images, and video. Multimodality can appear in various components, allowing models and systems to handle and process a mix of these data types seamlessly.
+
+- **Chat Models**: These could, in theory, accept and generate multimodal inputs and outputs, handling a variety of data types like text, images, audio, and video.
+- **Embedding Models**: Embedding Models can represent multimodal content, embedding various forms of data—such as text, images, and audio—into vector spaces.
+- **Vector Stores**: Vector stores could search over embeddings that represent multimodal data, enabling retrieval across different types of information.
+
+## Multimodality in chat models
+
+:::info Pre-requisites
+* [Chat models](/docs/concepts/chat_models)
+* [Messages](/docs/concepts/messages)
+:::
+ 
+Because multimodal support is still relatively new and less common, model providers have not yet standardized on the "best" way to define the API. As such, LangChain's multimodal abstractions are lightweight and flexible, designed to accommodate different model providers' APIs and interaction patterns, but are **not** standardized across models.
+
+### How to use multimodal models
+
+* Use the [chat model integration table](/docs/integrations/chat/) to identify which models support multimodality.
+* Reference the [relevant how-to guides](/docs/how_to/#multimodal) for specific examples of how to use multimodal models.
+
+### What kind of multimodality is supported?
+
+#### Inputs
+
+Some models can accept multimodal inputs, such as images, audio, video, or files. The types of multimodal inputs supported depend on the model provider. For instance, [Google's Gemini](https://python.langchain.com/docs/integrations/chat/google_generative_ai/) supports documents like PDFs as inputs.
+
+Most chat models that support **multimodal inputs** also accept those values in OpenAI's content blocks format. So far this is restricted to image inputs. For models like Gemini which support video and other bytes input, the APIs also support the native, model-specific representations.
+
+The gist of passing multimodal inputs to a chat model is to use content blocks that specify a type and corresponding data. For example, to pass an image to a chat model:
+
+```python
+from langchain_core.messages import HumanMessage
+
+message = HumanMessage(
+    content=[
+        {"type": "text", "text": "describe the weather in this image"},
+        {"type": "image_url", "image_url": {"url": image_url}},
+    ],
+)
+response = model.invoke([message])
+```
+
+:::caution
+The exact format of the content blocks may vary depending on the model provider. Please refer to the chat model's
+integration documentation for the correct format. Find the integration in the [chat model integration table](/docs/integrations/chat/).
+:::
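
When an image is not reachable by URL, one common approach is to inline it as a base64 data URL inside the same content block shape. The sketch below uses only the Python standard library; `image_block_from_bytes` and the placeholder bytes are illustrative names, not part of LangChain, and whether a given provider accepts data URLs is an assumption to check against its integration documentation.

```python
import base64


def image_block_from_bytes(data: bytes, mime_type: str = "image/png") -> dict:
    """Build an OpenAI-style image content block from raw bytes (hypothetical helper)."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# Placeholder bytes standing in for the contents of a real image file.
fake_png = b"\x89PNG\r\n\x1a\n"

content = [
    {"type": "text", "text": "describe the weather in this image"},
    image_block_from_bytes(fake_png),
]
```

The resulting `content` list could then be used as the `content` of a `HumanMessage`, as in the example above.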
+
+#### Outputs
+
+Virtually no popular chat models support multimodal outputs at the time of writing (October 2024). 
+
+The only exception is OpenAI's chat model ([gpt-4o-audio-preview](https://python.langchain.com/docs/integrations/chat/openai/)), which can generate audio outputs.
+
+Multimodal outputs will appear as part of the [AIMessage](/docs/concepts/messages/#aimessage) response object.
+
+Please see the [ChatOpenAI](/docs/integrations/chat/openai/) integration documentation for more information on how to use multimodal outputs.
+
+#### Tools
+
+Currently, no chat model is designed to work **directly** with multimodal data in a [tool call request](/docs/concepts/tool_calling) or [ToolMessage](/docs/concepts/tool_calling) result.
+
+However, a chat model can easily interact with multimodal data by invoking tools with references (e.g., a URL) to the multimodal data, rather than the data itself. For example, any model capable of [tool calling](/docs/concepts/tool_calling) can be equipped with tools to download and process images, audio, or video.
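
The idea of passing a reference rather than the data itself can be sketched in plain Python. Everything here is hypothetical: `get_media_size` stands in for a real tool, `FAKE_STORAGE` replaces an actual download, and the `tool_call` dictionary only imitates the general shape of a tool call request.

```python
from typing import Callable, Dict

# Stand-in for remote storage; a real tool would download the bytes from the URL.
FAKE_STORAGE: Dict[str, bytes] = {
    "https://example.com/cat.png": b"\x89PNG" + b"\x00" * 12,
}


def get_media_size(url: str) -> int:
    """Hypothetical tool: return the size in bytes of the media behind a URL."""
    return len(FAKE_STORAGE[url])


# Registry mapping tool names to the functions that implement them.
TOOLS: Dict[str, Callable[..., int]] = {"get_media_size": get_media_size}

# The model only ever sees and emits the URL, never the image bytes.
tool_call = {"name": "get_media_size", "args": {"url": "https://example.com/cat.png"}}
result = TOOLS[tool_call["name"]](**tool_call["args"])
```

The tool result (here a small integer) is plain data that can be returned to the model as text, which is why this pattern works with any tool-calling model.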
+
+## Multimodality in embedding models
+
+:::info Prerequisites
+* [Embedding Models](/docs/concepts/embedding_models)
+:::
+
+**Embeddings** are vector representations of data used for tasks like similarity search and retrieval.
+
+The current [embedding interface](https://python.langchain.com/api_reference/core/embeddings/langchain_core.embeddings.embeddings.Embeddings.html#langchain_core.embeddings.embeddings.Embeddings) used in LangChain is optimized entirely for text-based data, and will **not** work with multimodal data.
+
+As use cases involving multimodal search and retrieval tasks become more common, we expect to expand the embedding interface to accommodate other data types like images, audio, and video.
+
+## Multimodality in vector stores
+
+:::info Prerequisites
+* [Vectorstores](/docs/concepts/vectorstores)
+:::
+
+Vector stores are databases for storing and retrieving embeddings, which are typically used in search and retrieval tasks. Similar to embeddings, vector stores are currently optimized for text-based data.
+
+As use cases involving multimodal search and retrieval tasks become more common, we expect to expand the vector store interface to accommodate other data types like images, audio, and video.
--- a/docs/docs/concepts/output_parsers.mdx
+++ b/docs/docs/concepts/output_parsers.mdx
@@ -0,0 +1,41 @@
+# Output parsers
+
+<span data-heading-keywords="output parser"></span>
+
+:::note
+
+The information here refers to parsers that take a text output from a model and try to parse it into a more structured representation.
+More and more models are supporting function (or tool) calling, which handles this automatically.
+It is recommended to use function/tool calling rather than output parsing.
+See documentation for that [here](/docs/concepts/#function-tool-calling).
+
+:::
+
+An output parser is responsible for taking the output of a model and transforming it into a more suitable format for downstream tasks.
+Output parsers are useful when you are using LLMs to generate structured data, or to normalize output from chat models and LLMs.
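
To make the idea concrete, here is a minimal sketch of what an output parser does, written with the standard library only. `parse_json_output` is a hypothetical function and not one of LangChain's parsers; it assumes the model's text contains exactly one JSON object, possibly surrounded by extra prose.

```python
import json


def parse_json_output(text: str) -> dict:
    """Extract the JSON object embedded in free-form model text (illustrative only)."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    # Parse only the span between the first "{" and the last "}".
    return json.loads(text[start : end + 1])


raw = 'Sure! Here is the data: {"setup": "Why did the cat sit?", "punchline": "It was tired."} Hope that helps.'
parsed = parse_json_output(raw)
```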
+
+LangChain supports many different types of output parsers, listed in the table below. The table has the following columns:
+
+- **Name**: The name of the output parser
+- **Supports Streaming**: Whether the output parser supports streaming.
+- **Has Format Instructions**: Whether the output parser has format instructions. This is generally available except when (a) the desired schema is not specified in the prompt but rather in other parameters (like OpenAI function calling), or (b) when the OutputParser wraps another OutputParser.
+- **Calls LLM**: Whether this output parser itself calls an LLM. This is usually only done by output parsers that attempt to correct misformatted output.
+- **Input Type**: Expected input type. Most output parsers work on both strings and messages, but some (like OpenAI Functions) need a message with specific kwargs.
+- **Output Type**: The output type of the object returned by the parser.
+- **Description**: Our commentary on this output parser and when to use it.
+
+| Name                                                                                                                                                                                                                                    | Supports Streaming | Has Format Instructions | Calls LLM | Input Type         | Output Type          | Description                                                                                                                                                                                                                                              |
+|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------|-------------------------|-----------|--------------------|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| [JSON](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.json.JSONOutputParser.html#langchain_core.output_parsers.json.JSONOutputParser)                                                     | ✅                  | ✅                       |           | `str` \| `Message` | JSON object          | Returns a JSON object as specified. You can specify a Pydantic model and it will return JSON for that model. Probably the most reliable output parser for getting structured data that does NOT use function calling.                                    |
+| [XML](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.xml.XMLOutputParser.html#langchain_core.output_parsers.xml.XMLOutputParser)                                                          | ✅                  | ✅                       |           | `str` \| `Message` | `dict`               | Returns a dictionary of tags. Use when XML output is needed. Use with models that are good at writing XML (like Anthropic's).                                                                                                                            |
+| [CSV](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.list.CommaSeparatedListOutputParser.html#langchain_core.output_parsers.list.CommaSeparatedListOutputParser)                          | ✅                  | ✅                       |           | `str` \| `Message` | `List[str]`          | Returns a list of comma separated values.                                                                                                                                                                                                                |
+| [OutputFixing](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.fix.OutputFixingParser.html#langchain.output_parsers.fix.OutputFixingParser)                                                |                    |                         | ✅         | `str` \| `Message` |                      | Wraps another output parser. If that output parser errors, then this will pass the error message and the bad output to an LLM and ask it to fix the output.                                                                                              |
+| [RetryWithError](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.retry.RetryWithErrorOutputParser.html#langchain.output_parsers.retry.RetryWithErrorOutputParser)                          |                    |                         | ✅         | `str` \| `Message` |                      | Wraps another output parser. If that output parser errors, then this will pass the original inputs, the bad output, and the error message to an LLM and ask it to fix it. Compared to OutputFixingParser, this one also sends the original instructions. |
+| [Pydantic](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.pydantic.PydanticOutputParser.html#langchain_core.output_parsers.pydantic.PydanticOutputParser)                                 |                    | ✅                       |           | `str` \| `Message` | `pydantic.BaseModel` | Takes a user defined Pydantic model and returns data in that format.                                                                                                                                                                                     |
+| [YAML](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.yaml.YamlOutputParser.html#langchain.output_parsers.yaml.YamlOutputParser)                                                          |                    | ✅                       |           | `str` \| `Message` | `pydantic.BaseModel` | Takes a user defined Pydantic model and returns data in that format. Uses YAML to encode it.                                                                                                                                                             |
+| [PandasDataFrame](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.pandas_dataframe.PandasDataFrameOutputParser.html#langchain.output_parsers.pandas_dataframe.PandasDataFrameOutputParser) |                    | ✅                       |           | `str` \| `Message` | `dict`               | Useful for doing operations with pandas DataFrames.                                                                                                                                                                                                      |
+| [Enum](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.enum.EnumOutputParser.html#langchain.output_parsers.enum.EnumOutputParser)                                                          |                    | ✅                       |           | `str` \| `Message` | `Enum`               | Parses response into one of the provided enum values.                                                                                                                                                                                                    |
+| [Datetime](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.datetime.DatetimeOutputParser.html#langchain.output_parsers.datetime.DatetimeOutputParser)                                      |                    | ✅                       |           | `str` \| `Message` | `datetime.datetime`  | Parses response into a datetime object.                                                                                                                                                                                                                  |
+| [Structured](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.structured.StructuredOutputParser.html#langchain.output_parsers.structured.StructuredOutputParser)                            |                    | ✅                       |           | `str` \| `Message` | `Dict[str, str]`     | An output parser that returns structured information. It is less powerful than other output parsers since it only allows for fields to be strings. This can be useful when you are working with smaller LLMs.                                            |
+
+For specifics on how to use output parsers, see the [relevant how-to guides here](/docs/how_to/#output-parsers).
--- a/docs/docs/concepts/prompt_templates.mdx
+++ b/docs/docs/concepts/prompt_templates.mdx
@@ -0,0 +1,79 @@
+# Prompt Templates
+
+Prompt templates help to translate user input and parameters into instructions for a language model.
+This can be used to guide a model's response, helping it understand the context and generate relevant and coherent language-based output.
+
+Prompt Templates take as input a dictionary, where each key represents a variable in the prompt template to fill in.
+
+Prompt Templates output a PromptValue. This PromptValue can be passed to an LLM or a ChatModel, and can also be cast to a string or a list of messages.
+The reason this PromptValue exists is to make it easy to switch between strings and messages.
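
The role of such a value object can be illustrated without LangChain. In this sketch, `SimplePromptValue` and `format_prompt` are hypothetical stand-ins, and representing a message as a `(role, content)` tuple is a simplification chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SimplePromptValue:
    """Hypothetical stand-in for a PromptValue; not the LangChain class."""

    text: str

    def to_string(self) -> str:
        # Form suited to a string-in, string-out LLM.
        return self.text

    def to_messages(self) -> List[Tuple[str, str]]:
        # Form suited to a chat model: the string becomes one human message.
        return [("human", self.text)]


def format_prompt(template: str, variables: dict) -> SimplePromptValue:
    """Fill the template's variables from the input dictionary."""
    return SimplePromptValue(template.format(**variables))


value = format_prompt("Tell me a joke about {topic}", {"topic": "cats"})
```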
+
+There are a few different types of prompt templates:
+
+## String PromptTemplates
+
+These prompt templates are used to format a single string, and generally are used for simpler inputs.
+For example, a common way to construct and use a PromptTemplate is as follows:
+
+```python
+from langchain_core.prompts import PromptTemplate
+
+prompt_template = PromptTemplate.from_template("Tell me a joke about {topic}")
+
+prompt_template.invoke({"topic": "cats"})
+```
+
+## ChatPromptTemplates
+
+These prompt templates are used to format a list of messages. These "templates" consist of a list of templates themselves.
+For example, a common way to construct and use a ChatPromptTemplate is as follows:
+
+```python
+from langchain_core.prompts import ChatPromptTemplate
+
+prompt_template = ChatPromptTemplate([
+    ("system", "You are a helpful assistant"),
+    ("user", "Tell me a joke about {topic}")
+])
+
+prompt_template.invoke({"topic": "cats"})
+```
+
+In the above example, this ChatPromptTemplate will construct two messages when called.
+The first is a system message that has no variables to format.
+The second is a HumanMessage, and will be formatted by the `topic` variable the user passes in.
+
+## MessagesPlaceholder
+<span data-heading-keywords="messagesplaceholder"></span>
+
+This prompt template is responsible for adding a list of messages in a particular place.
+In the above ChatPromptTemplate, we saw how we could format two messages, each one a string.
+But what if we wanted the user to pass in a list of messages that we would slot into a particular spot?
+This is how you use MessagesPlaceholder.
+
+```python
+from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
+from langchain_core.messages import HumanMessage
+
+prompt_template = ChatPromptTemplate([
+    ("system", "You are a helpful assistant"),
+    MessagesPlaceholder("msgs")
+])
+
+prompt_template.invoke({"msgs": [HumanMessage(content="hi!")]})
+```
+
+This will produce a list of two messages, the first one being a system message, and the second one being the HumanMessage we passed in.
+If we had passed in 5 messages, then it would have produced 6 messages in total (the system message plus the 5 passed in).
+This is useful for letting a list of messages be slotted into a particular spot.
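
The counting described above (one system message plus however many messages are passed in) can be sketched in plain Python. The `Placeholder` class and `render` function are hypothetical and only mimic the behavior described here; string variables inside message contents are deliberately not formatted in this sketch.

```python
from typing import Dict, List, Tuple, Union

Message = Tuple[str, str]  # (role, content)


class Placeholder:
    """Marks the spot where a list of messages should be spliced in."""

    def __init__(self, variable_name: str) -> None:
        self.variable_name = variable_name


def render(
    template: List[Union[Message, Placeholder]], variables: Dict[str, list]
) -> List[Message]:
    messages: List[Message] = []
    for item in template:
        if isinstance(item, Placeholder):
            # Splice in every message supplied under this variable name.
            messages.extend(variables[item.variable_name])
        else:
            messages.append(item)
    return messages


template = [("system", "You are a helpful assistant"), Placeholder("msgs")]
history = [("human", f"message {i}") for i in range(5)]
rendered = render(template, {"msgs": history})
```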
+
+An alternative way to accomplish the same thing without using the `MessagesPlaceholder` class explicitly is:
+
+```python
+prompt_template = ChatPromptTemplate([
+    ("system", "You are a helpful assistant"),
+    ("placeholder", "{msgs}") # <-- This is the changed part
+])
+```
+
+For specifics on how to use prompt templates, see the [relevant how-to guides here](/docs/how_to/#prompt-templates).
--- a/docs/docs/concepts/rag.mdx
+++ b/docs/docs/concepts/rag.mdx
@@ -0,0 +1,98 @@
+# Retrieval augmented generation (RAG)
+
+:::info[Prerequisites]
+
+* [Retrieval](/docs/concepts/retrieval/)
+
+:::
+
+## Overview
+
+Retrieval Augmented Generation (RAG) is a powerful technique that enhances [language models](/docs/concepts/chat_models/) by combining them with external knowledge bases. 
+RAG addresses [a key limitation of models](https://www.glean.com/blog/how-to-build-an-ai-assistant-for-the-enterprise): models rely on fixed training datasets, which can lead to outdated or incomplete information.
+When given a query, RAG systems first search a knowledge base for relevant information.
+The system then incorporates this retrieved information into the model's prompt.
+The model uses the provided context to generate a response to the query.
+By bridging the gap between vast language models and dynamic, targeted information retrieval, RAG is a powerful technique for building more capable and reliable AI systems.
+
+## Key concepts
+
+![Conceptual Overview](/img/rag_concepts.png)
+
+(1) **Retrieval system**: Retrieve relevant information from a knowledge base.
+
+(2) **Adding external knowledge**: Pass retrieved information to a model.
+
+## Retrieval system
+
+Models have internal knowledge that is often fixed, or at least not updated frequently due to the high cost of training.
+This limits their ability to answer questions about current events, or to provide specific domain knowledge.
+To address this, there are various knowledge injection techniques like [fine-tuning](https://hamel.dev/blog/posts/fine_tuning_valuable.html) or continued pre-training.
+Both are [costly](https://www.glean.com/blog/how-to-build-an-ai-assistant-for-the-enterprise) and often [poorly suited](https://www.anyscale.com/blog/fine-tuning-is-for-form-not-facts) for factual retrieval.
+Using a retrieval system offers several advantages:
+
+- **Up-to-date information**: RAG can access and utilize the latest data, keeping responses current.
+- **Domain-specific expertise**: With domain-specific knowledge bases, RAG can provide answers in specific domains.
+- **Reduced hallucination**: Grounding responses in retrieved facts helps minimize false or invented information.
+- **Cost-effective knowledge integration**: RAG offers a more efficient alternative to expensive model fine-tuning.
+
+:::info[Further reading]
+
+See our conceptual guide on [retrieval](/docs/concepts/retrieval/).
+
+:::
+
+## Adding external knowledge
+
+With a retrieval system in place, we need to pass knowledge from this system to the model. 
+A RAG pipeline typically achieves this by following these steps:
+
+- Receive an input query.
+- Use the retrieval system to search for relevant information based on the query.
+- Incorporate the retrieved information into the prompt sent to the LLM.
+- Generate a response that leverages the retrieved context.
+
+As an example, here's a simple RAG workflow that passes information from a [retriever](/docs/concepts/retrievers/) to a [chat model](/docs/concepts/chat_models/):
+
+```python
+from langchain_openai import ChatOpenAI
+from langchain_core.messages import SystemMessage, HumanMessage
+
+# Define a system prompt that tells the model how to use the retrieved context
+system_prompt = """You are an assistant for question-answering tasks. 
+Use the following pieces of retrieved context to answer the question. 
+If you don't know the answer, just say that you don't know. 
+Use three sentences maximum and keep the answer concise.
+Context: {context}"""
+    
+# Define a question
+question = """What are the main components of an LLM-powered autonomous agent system?"""
+
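+# Retrieve relevant documents (`retriever` is assumed to be an existing retriever, e.g. created from a vector store)
+docs = retriever.invoke(question)
+
+# Combine the documents into a single string, separated by blank lines
+docs_text = "\n\n".join(d.page_content for d in docs)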
+
+# Populate the system prompt with the retrieved context
+system_prompt_fmt = system_prompt.format(context=docs_text)
+
+# Create a model
+model = ChatOpenAI(model="gpt-4o", temperature=0) 
+
+# Generate a response
+response = model.invoke([SystemMessage(content=system_prompt_fmt),
+                         HumanMessage(content=question)])
+```
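
The retrieval and prompt-assembly steps can also be tried without any external service. The following is a toy sketch under stated assumptions: the three documents, the word-overlap scoring, and the helper names are all invented for illustration, and a real system would use a proper retriever instead.

```python
import re
from typing import List, Set

DOCUMENTS = [
    "Planning lets an agent break a large task into smaller subgoals.",
    "Memory lets an agent retain and recall information over time.",
    "Bananas are rich in potassium.",
]


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, k: int = 2) -> List[str]:
    """Rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        DOCUMENTS, key=lambda doc: len(query_words & tokenize(doc)), reverse=True
    )
    return ranked[:k]


def build_system_prompt(query: str) -> str:
    """Place the retrieved documents into the prompt as context."""
    context = "\n\n".join(retrieve(query))
    return f"Use the following context to answer the question.\nContext: {context}"


prompt = build_system_prompt("How does an agent use memory and planning?")
```

Only the two agent-related documents end up in `prompt`; the unrelated one is ranked last and cut off by `k`.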
+
+:::info[Further reading]
+
+RAG is a deep area with many possible optimizations and design choices:
+
+* See [this excellent blog](https://cameronrwolfe.substack.com/p/a-practitioners-guide-to-retrieval?utm_source=profile&utm_medium=reader2) from Cameron Wolfe for a comprehensive overview and history of RAG.
+* See our [RAG how-to guides](/docs/how_to/#qa-with-rag).
+* See our RAG [tutorials](/docs/tutorials/#working-with-external-knowledge).
+* See our RAG from Scratch course, with [code](https://github.com/langchain-ai/rag-from-scratch) and [video playlist](https://www.youtube.com/playlist?list=PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x).
+* Also, see our RAG from Scratch course [on Freecodecamp](https://youtu.be/sVcwVQRHIc8?feature=shared).
+
+:::
--- a/docs/docs/concepts/retrieval.mdx
+++ b/docs/docs/concepts/retrieval.mdx
@@ -0,0 +1,240 @@
+# Retrieval
+
+:::info[Prerequisites]
+
+* [Retrievers](/docs/concepts/retrievers/)
+* [Vectorstores](/docs/concepts/vectorstores/)
+* [Embeddings](/docs/concepts/embedding_models/)
+* [Text splitters](/docs/concepts/text_splitters/)
+
+:::
+
+:::danger[Security]
+ 
+Some of the concepts reviewed here utilize models to generate queries (e.g., for SQL or graph databases).
+There are inherent risks in doing this. 
+Make sure that your database connection permissions are scoped as narrowly as possible for your application's needs. 
+This will mitigate, though not eliminate, the risks of building a model-driven system capable of querying databases. 
+For more on general security best practices, see our [security guide](/docs/security/).
+
+:::
+
+## Overview 
+
+Retrieval systems are fundamental to many AI applications, efficiently identifying relevant information from large datasets. 
+These systems accommodate various data formats:
+
+- Unstructured text (e.g., documents) is often stored in vector stores or lexical search indexes.
+- Structured data is typically housed in relational or graph databases with defined schemas.
+
+Despite this diversity in data formats, modern AI applications increasingly aim to make all types of data accessible through natural language interfaces. 
+Models play a crucial role in this process by translating natural language queries into formats compatible with the underlying search index or database. 
+This translation enables more intuitive and flexible interactions with complex data structures.
+
+## Key concepts 
+
+![Retrieval](/img/retrieval_concept.png)
+
+(1) **Query analysis**: A process where models transform or construct search queries to optimize retrieval.
+
+(2) **Information retrieval**: Search queries are used to fetch information from various retrieval systems.
+
+## Query analysis 
+
+While users typically prefer to interact with retrieval systems using natural language, retrieval systems may require specific query syntax or benefit from particular keywords. 
+Query analysis serves as a bridge between raw user input and optimized search queries. Some common applications of query analysis include:
+
+1. **Query Re-writing**: Queries can be re-written or expanded to improve semantic or lexical searches.
+2. **Query Construction**: Search indexes may require structured queries (e.g., SQL for databases).
+
+Query analysis employs models to transform or construct optimized search queries from raw user input. 
+
+### Query re-writing
+
+Retrieval systems should ideally handle a wide spectrum of user inputs, from simple and poorly worded queries to complex, multi-faceted questions. 
+To achieve this versatility, a popular approach is to use models to transform raw user queries into more effective search queries. 
+This transformation can range from simple keyword extraction to sophisticated query expansion and reformulation.
+Here are some key benefits of using models for query analysis in unstructured data retrieval:
+
+1. **Query Clarification**: Models can rephrase ambiguous or poorly worded queries for clarity.
+2. **Semantic Understanding**: They can capture the intent behind a query, going beyond literal keyword matching.
+3. **Query Expansion**: Models can generate related terms or concepts to broaden the search scope.
+4. **Complex Query Handling**: They can break down multi-part questions into simpler sub-queries.
+
+Various techniques have been developed to leverage models for query re-writing, including:
+
+| Name                                                                                                      | When to use                                                                                     | Description                                                                                                                                                                                                                                                                            |
+|-----------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| [Multi-query](/docs/how_to/MultiQueryRetriever/)                                                          | When you want to ensure high recall in retrieval by providing multiple phrasings of a question. | Rewrite the user question with multiple phrasings, retrieve documents for each rewritten question, return the unique documents for all queries.                                                                                                                                        |
+| [Decomposition](https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb) | When a question can be broken down into smaller subproblems.                                    | Decompose a question into a set of subproblems / questions, which can either be solved sequentially (use the answer from first + retrieval to answer the second) or in parallel (consolidate each answer into final answer).                                                           |
+| [Step-back](https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb)     | When a higher-level conceptual understanding is required.                                       | First prompt the LLM to ask a generic step-back question about higher-level concepts or principles, and retrieve relevant facts about them. Use this grounding to help answer the user question. [Paper](https://arxiv.org/pdf/2310.06117).                                            |
+| [HyDE](https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb)          | If you have challenges retrieving relevant documents using the raw user inputs.                 | Use an LLM to convert questions into hypothetical documents that answer the question. Use the embedded hypothetical documents to retrieve real documents with the premise that doc-doc similarity search can produce more relevant matches. [Paper](https://arxiv.org/abs/2212.10496). |
+
+As an example, query decomposition can simply be accomplished using prompting and a structured output that enforces a list of sub-questions.
+These can then be run sequentially or in parallel on a downstream retrieval system.
+
+```python
+from typing import List
+
+from pydantic import BaseModel, Field
+from langchain_openai import ChatOpenAI
+from langchain_core.messages import SystemMessage, HumanMessage
+
+# Define a pydantic model to enforce the output structure
+class Questions(BaseModel):
+    questions: List[str] = Field(
+        description="A list of sub-questions related to the input query."
+    )
+
+# Create an instance of the model and enforce the output structure
+model = ChatOpenAI(model="gpt-4o", temperature=0) 
+structured_model = model.with_structured_output(Questions)
+
+# Define the system prompt
+system = """You are a helpful assistant that generates multiple sub-questions related to an input question. \n
+The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation. \n"""
+
+# Pass the question to the model
+question = """What are the main components of an LLM-powered autonomous agent system?"""
+questions = structured_model.invoke([SystemMessage(content=system)]+[HumanMessage(content=question)])
+```
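
Once sub-questions exist, running them in parallel and keeping the unique results might look like the sketch below. It is an illustration only: `FAKE_INDEX` and `retrieve` are hypothetical stand-ins for a real retrieval system, and the sub-questions are hard-coded rather than produced by a model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical index mapping a keyword to the ids of matching documents.
FAKE_INDEX: Dict[str, List[str]] = {
    "planning": ["doc-1", "doc-2"],
    "memory": ["doc-2", "doc-3"],
    "tools": ["doc-4"],
}


def retrieve(sub_question: str) -> List[str]:
    """Look up every word of the sub-question in the fake index."""
    hits: List[str] = []
    for word in sub_question.lower().replace("?", "").split():
        hits.extend(FAKE_INDEX.get(word, []))
    return hits


def retrieve_for_all(sub_questions: List[str]) -> List[str]:
    """Run retrieval for each sub-question in parallel and deduplicate."""
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as the inputs.
        results = list(pool.map(retrieve, sub_questions))
    unique: List[str] = []
    for hits in results:
        for doc_id in hits:
            if doc_id not in unique:
                unique.append(doc_id)
    return unique


sub_questions = ["How does planning work?", "What is memory?", "Which tools are used?"]
doc_ids = retrieve_for_all(sub_questions)
```

The same deduplication step applies to multi-query retrieval, where the inputs are rephrasings of one question rather than sub-questions.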
+
+:::tip
+
+See our RAG from Scratch videos for a few different specific approaches:
+- [Multi-query](https://youtu.be/JChPi0CRnDY?feature=shared)
+- [Decomposition](https://youtu.be/h0OPWlEOank?feature=shared)
+- [Step-back](https://youtu.be/xn1jEjRyJ2U?feature=shared)
+- [HyDE](https://youtu.be/SaDzIVkYqyY?feature=shared)
+
+:::
+
+### Query construction
+
+Query analysis can also focus on translating natural language queries into specialized query languages or filters. 
+This translation is crucial for effectively interacting with various types of databases that house structured or semi-structured data.
+
+1. **Structured Data examples**: For relational and graph databases, Domain-Specific Languages (DSLs) are used to query data.
+   - **Text-to-SQL**: [Converts natural language to SQL](https://paperswithcode.com/task/text-to-sql) for relational databases.
+   - **Text-to-Cypher**: [Converts natural language to Cypher](https://neo4j.com/labs/neodash/2.4/user-guide/extensions/natural-language-queries/) for graph databases.
+
+2. **Semi-structured Data examples**: For vectorstores, queries can combine semantic search with metadata filtering.
+   - **Natural Language to Metadata Filters**: Converts user queries into [appropriate metadata filters](https://docs.pinecone.io/guides/data/filter-with-metadata).
+
+These approaches leverage models to bridge the gap between user intent and the specific query requirements of different data storage systems. Here are some popular techniques:
+
+| Name                                     | When to Use                                                                                                                          | Description                                                                                                                                                                                                                                          |
+|------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| [Self Query](/docs/how_to/self_query/)   | If users are asking questions that are better answered by fetching documents based on metadata rather than similarity with the text. | This uses an LLM to transform user input into two things: (1) a string to look up semantically, (2) a metadata filter to go along with it. This is useful because oftentimes questions are about the METADATA of documents (not the content itself). |
+| [Text to SQL](/docs/tutorials/sql_qa/)   | If users are asking questions that require information housed in a relational database, accessible via SQL.                          | This uses an LLM to transform user input into a SQL query.                                                                                                                                                                                           |
+| [Text-to-Cypher](/docs/tutorials/graph/) | If users are asking questions that require information housed in a graph database, accessible via Cypher.                            | This uses an LLM to transform user input into a Cypher query.                                                                                                                                                                                        |
+
+As an example, here is how to use the `SelfQueryRetriever` to convert natural language queries into metadata filters.  
+
+```python
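+from langchain.retrievers.self_query.base import SelfQueryRetriever
+from langchain_openai import ChatOpenAI
+
+# `schema_for_metadata` (a list describing each metadata field) and
+# `vectorstore` are assumed to be defined elsewhere.
+metadata_field_info = schema_for_metadata
+document_content_description = "Brief summary of a movie"
+llm = ChatOpenAI(temperature=0)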
+retriever = SelfQueryRetriever.from_llm(
+    llm,
+    vectorstore,
+    document_content_description,
+    metadata_field_info,
+)
+```
+
+:::info[Further reading]
+
+* See our tutorials on [text-to-SQL](/docs/tutorials/sql_qa/), [text-to-Cypher](/docs/tutorials/graph/), and [query analysis for metadata filters](/docs/tutorials/query_analysis/).
+* See our [blog post overview](https://blog.langchain.dev/query-construction/).
+* See our RAG from Scratch video on [query construction](https://youtu.be/kl6NwWYxvbM?feature=shared).
+
+::: 
+
+## Information retrieval 
+
+### Common retrieval systems
+
+#### Lexical search indexes
+
+Many search engines are based upon matching words in a query to the words in each document. 
+This approach is called lexical retrieval, using search [algorithms that are typically based upon word frequencies](https://cameronrwolfe.substack.com/p/the-basics-of-ai-powered-vector-search?utm_source=profile&utm_medium=reader2).
+The intuition is simple: if a word appears frequently in both the user's query and a particular document, then this document might be a good match.
+
+The particular data structure used to implement this is often an [*inverted index*](https://www.geeksforgeeks.org/inverted-index/).
+This type of index contains a list of words and a mapping of each word to a list of locations at which it occurs in various documents.
+Using this data structure, it is possible to efficiently match the words in search queries to the documents in which they appear.
+[BM25](https://en.wikipedia.org/wiki/Okapi_BM25#:~:text=BM25%20is%20a%20bag%2Dof,slightly%20different%20components%20and%20parameters.) and [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) are [two popular lexical search algorithms](https://cameronrwolfe.substack.com/p/the-basics-of-ai-powered-vector-search?utm_source=profile&utm_medium=reader2).
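+
+The inverted index described above can be sketched in a few lines of plain Python. This is a simplified illustration, not how any particular search engine implements it: it stores document ids rather than word positions, and it scores documents by counting matched query words instead of using BM25 or TF-IDF. The names `build_inverted_index` and `search` are hypothetical.
+
+```python
+from collections import defaultdict
+
+
+def build_inverted_index(docs):
+    """Map each lowercase word to the set of document ids that contain it."""
+    index = defaultdict(set)
+    for doc_id, text in docs.items():
+        for word in text.lower().split():
+            index[word].add(doc_id)
+    return index
+
+
+def search(index, query):
+    """Rank documents by how many distinct query words they contain."""
+    scores = defaultdict(int)
+    for word in set(query.lower().split()):
+        for doc_id in index.get(word, ()):
+            scores[doc_id] += 1
+    # Highest score first; ties are broken alphabetically by document id.
+    return sorted(scores, key=lambda d: (-scores[d], d))
+
+
+docs = {
+    "a": "the cat sat on the mat",
+    "b": "the dog chased the cat",
+    "c": "birds sing at dawn",
+}
+index = build_inverted_index(docs)
+results = search(index, "cat mat")
+```
+
+Only the documents listed under each query word are touched during a lookup, which is why this structure scales better than scanning every document.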
+
+:::info[Further reading]
+
+* See the [BM25](/docs/integrations/retrievers/bm25/) retriever integration.
+* See the [Elasticsearch](/docs/integrations/retrievers/elasticsearch_retriever/) retriever integration.
+
+::: 
+
+#### Vector indexes
+
+Vector indexes are an alternative way to index and store unstructured data.
+See our conceptual guide on [vectorstores](/docs/concepts/vectorstores/) for a detailed overview.  
+In short, rather than using word frequencies, vectorstores use an [embedding model](/docs/concepts/embedding_models/) to compress documents into a high-dimensional vector representation.
+This allows for efficient similarity search over embedding vectors using simple mathematical operations like cosine similarity.
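+
+As a rough illustration of similarity search, the sketch below ranks a few documents by cosine similarity to a query vector. The three-dimensional vectors are made up for the example; real embedding models produce vectors with hundreds or thousands of dimensions, and real vectorstores typically use optimized index structures rather than a linear scan.
+
+```python
+import math
+
+
+def cosine_similarity(a, b):
+    """Cosine of the angle between two equal-length vectors."""
+    dot = sum(x * y for x, y in zip(a, b))
+    norm_a = math.sqrt(sum(x * x for x in a))
+    norm_b = math.sqrt(sum(y * y for y in b))
+    return dot / (norm_a * norm_b)
+
+
+# Hypothetical embeddings; a real system would get these from an embedding model.
+embeddings = {
+    "doc_cats": [0.9, 0.1, 0.0],
+    "doc_dogs": [0.7, 0.3, 0.1],
+    "doc_finance": [0.0, 0.2, 0.9],
+}
+query_vector = [1.0, 0.0, 0.0]
+
+# Rank documents from most to least similar to the query.
+ranked = sorted(
+    embeddings,
+    key=lambda name: cosine_similarity(query_vector, embeddings[name]),
+    reverse=True,
+)
+```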
+
+:::info[Further reading]
+
+* See our [how-to guide](/docs/how_to/vectorstore_retriever/) for more details on working with vectorstores.
+* See our [list of vectorstore integrations](/docs/integrations/vectorstores/).
+* See Cameron Wolfe's [blog post](https://cameronrwolfe.substack.com/p/the-basics-of-ai-powered-vector-search?utm_source=profile&utm_medium=reader2) on the basics of vector search.
+
+:::
+
+#### Relational databases
+
+Relational databases are a fundamental type of structured data storage used in many applications. 
+They organize data into tables with predefined schemas, where each table represents an entity or relationship. 
+Data is stored in rows (records) and columns (attributes), allowing for efficient querying and manipulation through SQL (Structured Query Language). 
+Relational databases excel at maintaining data integrity, supporting complex queries, and handling relationships between different data entities.
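+
+To make the table, row, and column vocabulary concrete, here is a small sketch using Python's built-in `sqlite3` module. The `artists` and `albums` tables are invented for illustration; the query joins them to count albums per artist.
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+
+# Two tables with predefined schemas; albums reference artists by id.
+conn.execute("CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
+conn.execute(
+    "CREATE TABLE albums ("
+    "id INTEGER PRIMARY KEY, "
+    "title TEXT NOT NULL, "
+    "artist_id INTEGER NOT NULL REFERENCES artists(id))"
+)
+conn.executemany(
+    "INSERT INTO artists (id, name) VALUES (?, ?)",
+    [(1, "Miles Davis"), (2, "Nina Simone")],
+)
+conn.executemany(
+    "INSERT INTO albums (title, artist_id) VALUES (?, ?)",
+    [("Kind of Blue", 1), ("Sketches of Spain", 1), ("Pastel Blues", 2)],
+)
+
+# A join across the relationship, aggregated per artist.
+rows = conn.execute(
+    "SELECT artists.name, COUNT(albums.id) "
+    "FROM artists JOIN albums ON albums.artist_id = artists.id "
+    "GROUP BY artists.name ORDER BY artists.name"
+).fetchall()
+conn.close()
+```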
+
+:::info[Further reading]
+
+* See our [tutorial](/docs/tutorials/sql_qa/) for working with SQL databases.
+* See our [SQL database toolkit](/docs/integrations/tools/sql_database/).
+
+:::
+
+#### Graph databases
+
+Graph databases are a specialized type of database designed to store and manage highly interconnected data. 
+Unlike traditional relational databases, graph databases use a flexible structure consisting of nodes (entities), edges (relationships), and properties. 
+This structure allows for efficient representation and querying of complex, interconnected data.
+They are particularly useful for storing and querying complex relationships between data points, in applications such as social networks, supply-chain management, fraud detection, and recommendation services.
+
+:::info[Further reading]
+
+* See our [tutorial](/docs/tutorials/graph/) for working with graph databases.
+* See our [list of graph database integrations](/docs/integrations/graphs/). 
+* See Neo4j's [starter kit for LangChain](https://neo4j.com/developer-blog/langchain-neo4j-starter-kit/).
+
+:::
+
+### Retriever  
+
+LangChain provides a unified interface for interacting with various retrieval systems through the [retriever](/docs/concepts/retrievers/) concept. The interface is straightforward:
+
+1. Input: A query (string)
+2. Output: A list of documents (standardized LangChain [Document](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html) objects)
+
+You can create a retriever using any of the retrieval systems mentioned earlier. The query analysis techniques we discussed are particularly useful here, as they enable natural language interfaces for databases that typically require structured query languages.
+For example, you can build a retriever for a SQL database using text-to-SQL conversion. This allows a natural language query (string) to be transformed into a SQL query behind the scenes.
+Regardless of the underlying retrieval system, all retrievers in LangChain share a common interface. You can use them with the simple `invoke` method:
+
+
+```python
+docs = retriever.invoke(query)
+```
+
+:::info[Further reading]
+
+* See our [conceptual guide on retrievers](/docs/concepts/retrievers/).
+* See our [how-to guide](/docs/how_to/#retrievers) on working with retrievers.
+
+:::
--- a/docs/docs/concepts/retrievers.mdx
+++ b/docs/docs/concepts/retrievers.mdx
@@ -0,0 +1,145 @@
+# Retrievers
+
+<span data-heading-keywords="retriever,retrievers"></span>
+
+:::info[Prerequisites]
+
+* [Vectorstores](/docs/concepts/vectorstores/)
+* [Embeddings](/docs/concepts/embedding_models/)
+* [Text splitters](/docs/concepts/text_splitters/)
+
+:::
+
+## Overview
+
+Many different types of retrieval systems exist, including vectorstores, graph databases, and relational databases.
+With the rise in popularity of large language models, retrieval systems have become an important component in AI applications (e.g., [RAG](/docs/concepts/rag/)).
+Because of their importance and variability, LangChain provides a uniform interface for interacting with different types of retrieval systems.
+The LangChain [retriever](/docs/concepts/retrievers/) interface is straightforward:
+
+1. Input: A query (string)
+2. Output: A list of documents (standardized LangChain [Document](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html) objects)
+
+## Key concept
+
+![Retriever](/img/retriever_concept.png)
+ 
+All retrievers implement a simple interface for retrieving documents using natural language queries.
+
+## Interface 
+
+The only requirement for a retriever is the ability to accept a query and return documents.
+In particular, [LangChain's retriever class](https://api.python.langchain.com/en/latest/retrievers/langchain_core.retrievers.BaseRetriever.html) only requires that the `_get_relevant_documents` method is implemented, which takes a `query: str` and returns a list of [Document](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html) objects that are most relevant to the query.
+The underlying logic used to get relevant documents is specified by the retriever and can be whatever is most useful for the application.
+
+A LangChain retriever is a [runnable](/docs/how_to/lcel_cheatsheet/), which is a standard interface for LangChain components.
+This means that it has a few common methods, including `invoke`, that are used to interact with it. A retriever can be invoked with a query:
+
+```python
+docs = retriever.invoke(query)
+```
+
+Retrievers return a list of [Document](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html) objects, which have two attributes:
+
+* `page_content`: The content of this document. Currently, this is a string.
+* `metadata`: Arbitrary metadata associated with this document (e.g., document id, file name, source, etc). 
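+
+The shape of this interface can be sketched without LangChain installed. The `Document` dataclass and `ToyKeywordRetriever` below are hypothetical stand-ins written in plain Python; they only mirror the query-in, documents-out contract and do not subclass LangChain's `BaseRetriever`.
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class Document:
+    """Stand-in for LangChain's Document: text plus arbitrary metadata."""
+    page_content: str
+    metadata: dict = field(default_factory=dict)
+
+
+class ToyKeywordRetriever:
+    """Hypothetical retriever returning documents that share a word with the query."""
+
+    def __init__(self, documents, k=2):
+        self.documents = documents
+        self.k = k
+
+    def _get_relevant_documents(self, query):
+        # The retrieval logic lives here and can be anything the application needs.
+        words = set(query.lower().split())
+        matches = [
+            doc for doc in self.documents
+            if words & set(doc.page_content.lower().split())
+        ]
+        return matches[: self.k]
+
+    def invoke(self, query):
+        # Public entry point: a string goes in, a list of documents comes out.
+        return self._get_relevant_documents(query)
+
+
+retriever = ToyKeywordRetriever([
+    Document("LangChain retrievers return documents", {"source": "notes.md"}),
+    Document("Bananas are rich in potassium", {"source": "food.md"}),
+])
+docs = retriever.invoke("what do retrievers return")
+```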
+
+:::info[Further reading]
+
+* See our [how-to guide](/docs/how_to/custom_retriever/) on building your own custom retriever.
+
+:::
+ 
+## Common types
+
+Despite the flexibility of the retriever interface, a few common types of retrieval systems are frequently used.
+
+### Search APIs
+
+It's important to note that retrievers don't need to actually *store* documents. 
+For example, we can build retrievers on top of search APIs that simply return search results!
+See our retriever integrations with [Amazon Kendra](https://python.langchain.com/docs/integrations/retrievers/amazon_kendra_retriever/) or [Wikipedia Search](https://python.langchain.com/docs/integrations/retrievers/wikipedia/). 
+
+### Relational or graph database
+
+Retrievers can be built on top of relational or graph databases.
+In these cases, [query analysis](/docs/concepts/retrieval/) techniques that construct a structured query from natural language are critical.
+For example, you can build a retriever for a SQL database using text-to-SQL conversion. This allows a natural language query (string) to be transformed into a SQL query behind the scenes.
+
+:::info[Further reading]
+
+* See our [tutorial](/docs/tutorials/sql_qa/) for context on how to build a retriever using a SQL database and text-to-SQL.
+* See our [tutorial](/docs/tutorials/graph/) for context on how to build a retriever using a graph database and text-to-Cypher.
+
+:::
+
+### Lexical search
+
+As discussed in our conceptual review of [retrieval](/docs/concepts/retrieval/), many search engines are based upon matching words in a query to the words in each document. 
+[BM25](https://en.wikipedia.org/wiki/Okapi_BM25#:~:text=BM25%20is%20a%20bag%2Dof,slightly%20different%20components%20and%20parameters.) and [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) are [two popular lexical search algorithms](https://cameronrwolfe.substack.com/p/the-basics-of-ai-powered-vector-search?utm_source=profile&utm_medium=reader2).
+LangChain has retrievers for many popular lexical search algorithms / engines.
+
+:::info[Further reading]
+
+* See the [BM25](/docs/integrations/retrievers/bm25/) retriever integration.
+* See the [TF-IDF](/docs/integrations/retrievers/tf_idf/) retriever integration.
+* See the [Elasticsearch](/docs/integrations/retrievers/elasticsearch_retriever/) retriever integration.
+
+::: 
+
+### Vectorstore 
+
+[Vectorstores](/docs/concepts/vectorstores/) are a powerful and efficient way to index and retrieve unstructured data. 
+A vectorstore can be used as a retriever by calling the `as_retriever()` method.
+
+```python
+vectorstore = MyVectorStore()
+retriever = vectorstore.as_retriever()
+```
+
+## Advanced retrieval patterns
+
+### Ensemble 
+
+Because the retriever interface is so simple, returning a list of `Document` objects given a search query, it is possible to combine multiple retrievers using ensembling.
+This is particularly useful when you have multiple retrievers that are good at finding different types of relevant documents.
+It is easy to create an [ensemble retriever](/docs/how_to/ensemble_retriever/) that combines multiple retrievers with linear weighted scores:
+
+```python
+# Initialize the ensemble retriever
+ensemble_retriever = EnsembleRetriever(
+    retrievers=[bm25_retriever, vector_store_retriever], weights=[0.5, 0.5]
+)
+```
+
+When ensembling, how do we combine search results from many retrievers? 
+This motivates the concept of re-ranking, which takes the output of multiple retrievers and combines them using a more sophisticated algorithm such as [Reciprocal Rank Fusion (RRF)](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf).
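+
+As a minimal sketch of Reciprocal Rank Fusion, the function below scores each document by summing `1 / (k + rank)` over every ranked list in which it appears. The constant `k=60` follows the value used in the linked paper; the function name, document ids, and unweighted sum are assumptions for illustration and are not taken from LangChain's implementation.
+
+```python
+from collections import defaultdict
+
+
+def reciprocal_rank_fusion(ranked_lists, k=60):
+    """Fuse several ranked lists of document ids into a single ranking."""
+    scores = defaultdict(float)
+    for ranking in ranked_lists:
+        for rank, doc_id in enumerate(ranking, start=1):
+            scores[doc_id] += 1.0 / (k + rank)
+    # Highest fused score first; ties are broken alphabetically by id.
+    return sorted(scores, key=lambda d: (-scores[d], d))
+
+
+# Hypothetical outputs from a lexical retriever and a vector retriever.
+bm25_ranking = ["doc_a", "doc_b", "doc_c"]
+vector_ranking = ["doc_b", "doc_d", "doc_a"]
+
+fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
+```
+
+Because only ranks are used, the retrievers' raw scores do not need to be on a comparable scale.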
+
+### Source document retention 
+
+Many retrievers utilize some kind of index to make documents easily searchable.
+The process of indexing can include a transformation step (e.g., vectorstores often use document splitting). 
+Whatever transformation is used, it can be very useful to retain a link between the *transformed document* and the original, giving the retriever the ability to return the *original* document.
+
+![Retrieval with full docs](/img/retriever_full_docs.png)
+
+This is particularly useful in AI applications, because it ensures no loss in document context for the model.
+For example, you may use small chunk size for indexing documents in a vectorstore. 
+If you return *only* the chunks as the retrieval result, then the model will have lost the original document context for the chunks. 
+
+LangChain has two different retrievers that can be used to address this challenge. 
+The [Multi-Vector](/docs/how_to/multi_vector/) retriever allows the user to use any document transformation (e.g., use an LLM to write a summary of the document) for indexing while retaining linkage to the source document. 
+The [ParentDocument](/docs/how_to/parent_document_retriever/) retriever links document chunks from a text-splitter transformation for indexing while retaining linkage to the source document. 
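+
+The idea of retaining a link from each indexed chunk back to its source can be sketched in plain Python. This is an illustration of the pattern only, not of either retriever's actual code: word overlap stands in for embedding similarity, and the names `split_into_chunks`, `chunk_index`, and `retrieve_parent` are hypothetical.
+
+```python
+def split_into_chunks(text, chunk_size):
+    """Split text into chunks of at most `chunk_size` words."""
+    words = text.split()
+    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]
+
+
+parents = {
+    "doc-1": "Retrievers accept a string query and return a list of documents",
+    "doc-2": "Vector indexes compare embeddings using cosine similarity",
+}
+
+# Index small chunks, but remember which parent each chunk came from.
+chunk_index = []
+for parent_id, text in parents.items():
+    for chunk in split_into_chunks(text, chunk_size=4):
+        chunk_index.append((chunk, parent_id))
+
+
+def retrieve_parent(query):
+    """Find the best-matching chunk, then return it with its full parent document."""
+    query_words = set(query.lower().split())
+
+    def overlap(entry):
+        return len(query_words & set(entry[0].lower().split()))
+
+    best_chunk, parent_id = max(chunk_index, key=overlap)
+    return best_chunk, parents[parent_id]
+
+
+chunk, full_document = retrieve_parent("cosine similarity")
+```
+
+The match happens on the small chunk, while the model receives the full parent text.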
+
+| Name                                                      | Index Type                    | Uses an LLM               | When to Use                                                                                                                             | Description                                                                                                                                                                                                              |
+|-----------------------------------------------------------|-------------------------------|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| [ParentDocument](/docs/how_to/parent_document_retriever/) | Vector store + Document Store | No                        | If your pages have lots of smaller pieces of distinct information that are best indexed by themselves, but best retrieved all together. | This involves indexing multiple chunks for each document. Then you find the chunks that are most similar in embedding space, but you retrieve the whole parent document and return that (rather than individual chunks). |
+| [Multi Vector](/docs/how_to/multi_vector/)                | Vector store + Document Store | Sometimes during indexing | If you are able to extract information from documents that you think is more relevant to index than the text itself.                    | This involves creating multiple vectors for each document. Each vector could be created in a myriad of ways - examples include summaries of the text and hypothetical questions.                                         |
+
+:::info[Further reading]
+
+* See our [how-to guide](/docs/how_to/parent_document_retriever/) on using the ParentDocument retriever.
+* See our [how-to guide](/docs/how_to/multi_vector/) on using the MultiVector retriever.
+* See our RAG from Scratch video on the [multi vector retriever](https://youtu.be/gTCU9I6QqCE?feature=shared).
+
+:::
--- a/docs/docs/concepts/runnables.mdx
+++ b/docs/docs/concepts/runnables.mdx
@@ -0,0 +1,352 @@
+# Runnable interface
+
+The Runnable interface is foundational for working with LangChain components, and it's implemented across many of them, such as [language models](/docs/concepts/chat_models), [output parsers](/docs/concepts/output_parsers), [retrievers](/docs/concepts/retrievers), [compiled LangGraph graphs](https://langchain-ai.github.io/langgraph/concepts/low_level/#compiling-your-graph) and more.
+
+This guide covers the main concepts and methods of the Runnable interface, which allows developers to interact with various LangChain components in a consistent and predictable manner.
+
+:::info Related Resources
+* The ["Runnable" Interface API Reference](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable) provides a detailed overview of the Runnable interface and its methods.
+* A list of built-in `Runnables` can be found in the [LangChain Core API Reference](https://python.langchain.com/api_reference/core/runnables.html). Many of these Runnables are useful when composing custom "chains" in LangChain using the [LangChain Expression Language (LCEL)](/docs/concepts/lcel).
+:::
+
+## Overview of runnable interface
+
+The Runnable interface defines a standard way for a Runnable component to be:
+
+* [Invoked](/docs/how_to/lcel_cheatsheet/#invoke-a-runnable): A single input is transformed into an output.
+* [Batched](/docs/how_to/lcel_cheatsheet/#batch-a-runnable): Multiple inputs are efficiently transformed into outputs.
+* [Streamed](/docs/how_to/lcel_cheatsheet/#stream-a-runnable): Outputs are streamed as they are produced.
+* Inspected: Schematic information about Runnable's input, output, and configuration can be accessed.
+* Composed: Multiple Runnables can be composed to work together using [the LangChain Expression Language (LCEL)](/docs/concepts/lcel) to create complex pipelines.
+
+Please review the [LCEL Cheatsheet](/docs/how_to/lcel_cheatsheet) for some common patterns that involve the Runnable interface and LCEL expressions.
+
+<a id="batch"></a>
+### Optimized parallel execution (batch)
+<span data-heading-keywords="batch"></span>
+
+LangChain Runnables offer built-in `batch` and `batch_as_completed` APIs that allow you to process multiple inputs in parallel.
+
+Using these methods can significantly improve performance when needing to process multiple independent inputs, as the
+processing can be done in parallel instead of sequentially.
+
+The two batching options are:
+
+* `batch`: Process multiple inputs in parallel, returning results in the same order as the inputs.
+* `batch_as_completed`: Process multiple inputs in parallel, returning results as they complete. Results may arrive out of order, but each includes the input index for matching.
+
+The default implementation of `batch` and `batch_as_completed` uses a thread pool executor to run the `invoke` method in parallel. This allows for efficient parallel execution without the need for users to manage threads, and speeds up code that is I/O-bound (e.g., making API requests, reading files, etc.). It will not be as effective for CPU-bound operations, as the GIL (Global Interpreter Lock) in Python will prevent true parallel execution.
+
+Some Runnables may provide their own implementations of `batch` and `batch_as_completed` that are optimized for their specific use case (e.g.,
+rely on a `batch` API provided by a model provider).
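+
+The thread-pool default described above can be approximated with the standard library. This is a sketch of the general technique, not LangChain's actual code: the module-level `invoke`, `batch`, and `batch_as_completed` functions are hypothetical stand-ins, and `time.sleep` simulates an I/O-bound call.
+
+```python
+import time
+from concurrent.futures import ThreadPoolExecutor, as_completed
+
+
+def invoke(x):
+    """Stand-in for an I/O-bound call; larger inputs finish sooner."""
+    time.sleep(0.01 * (3 - x))
+    return x * 10
+
+
+def batch(inputs, max_concurrency=None):
+    """Run `invoke` in parallel and return results in input order."""
+    with ThreadPoolExecutor(max_workers=max_concurrency) as executor:
+        return list(executor.map(invoke, inputs))
+
+
+def batch_as_completed(inputs, max_concurrency=None):
+    """Yield (input_index, result) pairs as each call finishes."""
+    with ThreadPoolExecutor(max_workers=max_concurrency) as executor:
+        futures = {executor.submit(invoke, x): i for i, x in enumerate(inputs)}
+        for future in as_completed(futures):
+            yield futures[future], future.result()
+
+
+ordered = batch([0, 1, 2], max_concurrency=3)
+# Arrival order may vary, so collect the pairs into a dict keyed by input index.
+by_index = dict(batch_as_completed([0, 1, 2], max_concurrency=3))
+```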
+
+:::note
+The async versions, `abatch` and `abatch_as_completed`, rely on asyncio's [gather](https://docs.python.org/3/library/asyncio-task.html#asyncio.gather) and [as_completed](https://docs.python.org/3/library/asyncio-task.html#asyncio.as_completed) functions to run the `ainvoke` method in parallel.
+:::
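+
+The difference between the two asyncio functions can be sketched as follows. The coroutines here are hypothetical stand-ins rather than LangChain's methods, and for brevity the as-completed variant collects its results into a list instead of yielding them.
+
+```python
+import asyncio
+
+
+async def ainvoke(x):
+    """Stand-in for an async I/O-bound call; larger inputs finish sooner."""
+    await asyncio.sleep(0.01 * (3 - x))
+    return x * 10
+
+
+async def abatch(inputs):
+    """gather returns results in the same order as the inputs."""
+    return await asyncio.gather(*(ainvoke(x) for x in inputs))
+
+
+async def abatch_as_completed(inputs):
+    """as_completed hands back results in the order the calls finish."""
+
+    async def run(index, x):
+        # Pair each result with its input index so callers can match them up.
+        return index, await ainvoke(x)
+
+    results = []
+    for next_done in asyncio.as_completed([run(i, x) for i, x in enumerate(inputs)]):
+        results.append(await next_done)
+    return results
+
+
+ordered = asyncio.run(abatch([0, 1, 2]))
+completed = asyncio.run(abatch_as_completed([0, 1, 2]))
+```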
+
+:::tip
+When processing a large number of inputs using `batch` or `batch_as_completed`, users may want to control the maximum number of parallel calls. This can be done by setting the `max_concurrency` attribute in the `RunnableConfig` dictionary. See the [RunnableConfig](/docs/concepts/runnables#RunnableConfig) for more information.
+
+Chat Models also have a built-in [rate limiter](/docs/concepts/chat_models#rate-limiting) that can be used to control the rate at which requests are made.
+:::
+
+### Asynchronous support
+<span data-heading-keywords="async-api"></span>
+
+Runnables expose an asynchronous API, allowing them to be called using the `await` syntax in Python. Asynchronous methods can be identified by the "a" prefix (e.g., `ainvoke`, `abatch`, `astream`, `abatch_as_completed`).
+
+Please refer to the [Async Programming with LangChain](/docs/concepts/async) guide for more details.
+
+## Streaming APIs
+<span data-heading-keywords="streaming-api"></span>
+
+Streaming is critical in making applications based on LLMs feel responsive to end-users.
+
+Runnables expose the following three streaming APIs:
+
+1. The sync [stream](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable.stream) and async [astream](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable.astream): yield the output of a Runnable as it is generated.
+2. The async `astream_events`: a more advanced streaming API that allows streaming intermediate steps and final output.
+3. The **legacy** async `astream_log`: a legacy streaming API that streams intermediate steps and final output.
+
+Please refer to the [Streaming Conceptual Guide](/docs/concepts/streaming) for more details on how to stream in LangChain.
+
+## Input and output types
+
+Every `Runnable` is characterized by an input and output type. These input and output types can be any Python object, and are defined by the Runnable itself.
+
+Runnable methods that result in the execution of the Runnable (e.g., `invoke`, `batch`, `stream`, `astream_events`) work with these input and output types.
+
+* invoke: Accepts an input and returns an output.
+* batch: Accepts a list of inputs and returns a list of outputs.
+* stream: Accepts an input and returns a generator that yields outputs.
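+
+The relationship between these three methods can be illustrated with a toy class. `ToyRunnable` is a hypothetical stand-in written in plain Python, not LangChain's `Runnable`; it only shows how the input and output shapes of `invoke`, `batch`, and `stream` relate to one another.
+
+```python
+class ToyRunnable:
+    """Hypothetical stand-in that upper-cases text."""
+
+    def invoke(self, text):
+        # One input in, one output out.
+        return text.upper()
+
+    def batch(self, texts):
+        # A list of inputs in, a list of outputs out (same order).
+        return [self.invoke(text) for text in texts]
+
+    def stream(self, text):
+        # One input in, a generator of output pieces out.
+        for word in text.split():
+            yield self.invoke(word)
+
+
+runnable = ToyRunnable()
+single = runnable.invoke("hello world")
+many = runnable.batch(["a", "b"])
+chunks = list(runnable.stream("hello world"))
+```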
+
+The **input type** and **output type** vary by component:
+
+| Component    | Input Type                                       | Output Type           |
+|--------------|--------------------------------------------------|-----------------------|
+| Prompt       | dictionary                                       | PromptValue           |
+| ChatModel    | a string, list of chat messages or a PromptValue | ChatMessage           |
+| LLM          | a string, list of chat messages or a PromptValue | String                |
+| OutputParser | the output of an LLM or ChatModel                | Depends on the parser |
+| Retriever    | a string                                         | List of Documents     |
+| Tool         | a string or dictionary, depending on the tool    | Depends on the tool   |
+
+Please refer to the individual component documentation for more information on the input and output types and how to use them.
+
+### Inspecting schemas
+
+:::note
+This is an advanced feature that is unnecessary for most users. You should probably
+skip this section unless you have a specific need to inspect the schema of a Runnable.
+:::
+
+In some advanced uses, you may want to programmatically **inspect** the Runnable and determine what input and output types the Runnable expects and produces.
+
+The Runnable interface provides methods to get the [JSON Schema](https://json-schema.org/) of the input and output types of a Runnable, as well as [Pydantic schemas](https://docs.pydantic.dev/latest/) for the input and output types.
+
+These APIs are mostly used internally for unit-testing and by [LangServe](/docs/concepts/architecture#langserve) which uses the APIs for input validation and generation of [OpenAPI documentation](https://www.openapis.org/).
+
+In addition to the input and output types, some Runnables have been set up with additional run time configuration options.
+There are corresponding APIs to get the Pydantic Schema and JSON Schema of the configuration options for the Runnable.
+Please see the [Configurable Runnables](#configurable-runnables) section for more information.
+
+| Method                  | Description                                                      |
+|-------------------------|------------------------------------------------------------------|
+| `get_input_schema`      | Gives the Pydantic Schema of the input schema for the Runnable.  |
+| `get_output_schema`     | Gives the Pydantic Schema of the output schema for the Runnable. |
+| `config_schema`         | Gives the Pydantic Schema of the config schema for the Runnable. |
+| `get_input_jsonschema`  | Gives the JSONSchema of the input schema for the Runnable.       |
+| `get_output_jsonschema` | Gives the JSONSchema of the output schema for the Runnable.      |
+| `get_config_jsonschema` | Gives the JSONSchema of the config schema for the Runnable.      |
+
+
+#### With_types
+
+LangChain will automatically try to infer the input and output types of a Runnable based on available information.
+
+Currently, this inference does not work well for more complex Runnables that are built using [LCEL](/docs/concepts/lcel) composition, and the inferred input and/or output types may be incorrect. In these cases, we recommend that users override the inferred input and output types using the `with_types` method ([API Reference](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable.with_types)).
+
+## RunnableConfig
+
+Any of the methods that are used to execute the runnable (e.g., `invoke`, `batch`, `stream`, `astream_events`) accept a second argument called
+`RunnableConfig` ([API Reference](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.config.RunnableConfig.html#RunnableConfig)). This argument is a dictionary that contains configuration for the Runnable that will be used
+at run time during the execution of the runnable.
+
+A `RunnableConfig` can have any of the following properties defined:
+
+| Attribute       | Description                                                                                |
+|-----------------|--------------------------------------------------------------------------------------------|
+| run_name        | Name used for the given Runnable (not inherited).                                          |
+| run_id          | Unique identifier for this call. Sub-calls will get their own unique run ids.              |
+| tags            | Tags for this call and any sub-calls.                                                      |
+| metadata        | Metadata for this call and any sub-calls.                                                  |
+| callbacks       | Callbacks for this call and any sub-calls.                                                 |
+| max_concurrency | Maximum number of parallel calls to make (e.g., used by batch).                            |
+| recursion_limit | Maximum number of times a call can recurse (e.g., used by Runnables that return Runnables) |
+| configurable    | Runtime values for configurable attributes of the Runnable.                                |
+
+Passing `config` to the `invoke` method is done like so:
+
+```python
+some_runnable.invoke(
+   some_input, 
+   config={
+      'run_name': 'my_run', 
+      'tags': ['tag1', 'tag2'], 
+      'metadata': {'key': 'value'}
+      
+   }
+)
+```
+
+### Propagation of RunnableConfig
+
+Many `Runnables` are composed of other Runnables, and it is important that the `RunnableConfig` is propagated to all sub-calls made by the Runnable. This allows providing run time configuration values to the parent Runnable that are inherited by all sub-calls.
+
+If this were not the case, it would be impossible to set and propagate [callbacks](/docs/concepts/callbacks) or other configuration values like `tags` and `metadata` which
+are expected to be inherited by all sub-calls.
+
+There are two main patterns by which new `Runnables` are created:
+
+1. Declaratively using [LangChain Expression Language (LCEL)](/docs/concepts/lcel):
+
+    ```python
+    chain = prompt | chat_model | output_parser
+    ```
+
+2. Using a [custom Runnable](#custom-runnables)  (e.g., `RunnableLambda`) or using the `@tool` decorator:
+
+    ```python
+    def foo(input):
+        # Note that .invoke() is used directly here
+        return bar_runnable.invoke(input)
+    foo_runnable = RunnableLambda(foo)
+    ```
+
+LangChain will try to propagate `RunnableConfig` automatically for both of the patterns. 
+
+For handling the second pattern, LangChain relies on Python's [contextvars](https://docs.python.org/3/library/contextvars.html).
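+
+The mechanism can be illustrated with a small standard-library sketch. It is not LangChain's implementation: `current_config`, `parent_step`, and `child_step` are hypothetical names that only show how a context variable lets a nested call read configuration that was never passed to it explicitly.
+
+```python
+import contextvars
+
+# Hypothetical context variable holding the config of the active run.
+current_config = contextvars.ContextVar("current_config", default={})
+
+
+def child_step(x):
+    # Reads the config implicitly instead of receiving it as an argument.
+    tags = current_config.get().get("tags", [])
+    return f"{x} tagged {tags}"
+
+
+def parent_step(x, config):
+    # Set the config for the duration of this call, then restore the old value.
+    token = current_config.set(config)
+    try:
+        return child_step(x)
+    finally:
+        current_config.reset(token)
+
+
+result = parent_step("input", {"tags": ["tag1"]})
+config_after_call = current_config.get()
+```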
+
+In Python 3.11 and above, this works out of the box, and you do not need to do anything special to propagate the `RunnableConfig` to the sub-calls.
+
+In Python 3.9 and 3.10, if you are using **async code**, you need to manually pass the `RunnableConfig` through to the `Runnable` when invoking it. 
+
+This is due to a limitation in [asyncio's tasks](https://docs.python.org/3/library/asyncio-task.html#asyncio.create_task) in Python 3.9 and 3.10, which did
+not accept a `context` argument.
+
+Propagating the `RunnableConfig` manually is done like so:
+
+```python
+async def foo(input, config): # <-- Note the config argument
+    return await bar_runnable.ainvoke(input, config=config)
+    
+foo_runnable = RunnableLambda(foo)
+```
+
+:::caution
+When using Python 3.10 or lower and writing async code, `RunnableConfig` cannot be propagated
+automatically, and you will need to do it manually! This is a common pitfall when
+attempting to stream data using `astream_events` and `astream_log` as these methods
+rely on proper propagation of [callbacks](/docs/concepts/callbacks) defined inside of `RunnableConfig`.
+:::
+
+### Setting custom run name, tags, and metadata
+
+The `run_name`, `tags`, and `metadata` attributes of the `RunnableConfig` dictionary can be used to set custom values for the run name, tags, and metadata for a given Runnable.
+
+The `run_name` is a string that can be used to set a custom name for the run. This name will be used in logs and other places to identify the run. It is not inherited by sub-calls.
+
+The `tags` and `metadata` attributes are lists and dictionaries, respectively, that can be used to set custom tags and metadata for the run. These values are inherited by sub-calls.
+
+Using these attributes can be useful for tracking and debugging runs, as they will be surfaced in [LangSmith](https://docs.smith.langchain.com/) as trace attributes that you can
+filter and search on.
+
+The attributes will also be propagated to [callbacks](/docs/concepts/callbacks), and will appear in streaming APIs like [astream_events](/docs/concepts/streaming) as part of each event in the stream.
+
+:::note Related
+* [How-to trace with LangChain](https://docs.smith.langchain.com/how_to_guides/tracing/trace_with_langchain)
+:::
+
+### Setting run id
+
+:::note
+This is an advanced feature that is unnecessary for most users.
+:::
+
+You may need to set a custom `run_id` for a given run, in case you want 
+to reference it later or correlate it with other systems.
+
+The `run_id` MUST be a valid UUID and **unique** for each run. It is used to identify
+the parent run; sub-calls will get their own unique run IDs automatically.
+
+To set a custom `run_id`, you can pass it as a key-value pair in the `config` dictionary when invoking the Runnable:
+
+```python
+import uuid
+
+run_id = uuid.uuid4()
+
+some_runnable.invoke(
+   some_input, 
+   config={
+      'run_id': run_id
+   }
+)
+
+# Do something with the run_id
+```
+
+### Setting recursion limit
+
+:::note
+This is an advanced feature that is unnecessary for most users.
+:::
+
+Some Runnables may return other Runnables, which can lead to infinite recursion if not handled properly. To prevent this, you can set a `recursion_limit` in the `RunnableConfig` dictionary. This will limit the number of times a Runnable can recurse.
+
+### Setting max concurrency
+
+If using the `batch` or `batch_as_completed` methods, you can set the `max_concurrency` attribute in the `RunnableConfig` dictionary to control the maximum number of parallel calls to make. This can be useful when you want to limit the number of parallel calls to prevent overloading a server or API.
+
+
+:::tip
+If you're trying to rate limit the number of requests made by a **Chat Model**, you can use the built-in [rate limiter](/docs/concepts/chat_models#rate-limiting) instead of setting `max_concurrency`, which will be more effective.
+
+See the [How to handle rate limits](https://python.langchain.com/docs/how_to/chat_model_rate_limiting/) guide for more information.
+:::
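+
+As a rough sketch of `max_concurrency` with `batch`, the example below caps the number of parallel calls at two. The `slow_double` function is a hypothetical stand-in for a call to a rate-sensitive server or API.
+
+```python
+import time
+
+from langchain_core.runnables import RunnableLambda
+
+
+def slow_double(x: int) -> int:
+    """Hypothetical stand-in for a slow call to an external service."""
+    time.sleep(0.1)
+    return x * 2
+
+
+runnable = RunnableLambda(slow_double)
+
+# At most two inputs are processed in parallel at any time.
+results = runnable.batch([1, 2, 3, 4], config={"max_concurrency": 2})
+print(results)
+```
+
+The results come back in the same order as the inputs, regardless of which call finishes first.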
+
+### Setting configurable
+
+The `configurable` field is used to pass runtime values for configurable attributes of the Runnable.
+
+It is used frequently in [LangGraph](/docs/concepts/architecture#langgraph) with
+[LangGraph Persistence](https://langchain-ai.github.io/langgraph/concepts/persistence/)
+and [memory](https://langchain-ai.github.io/langgraph/concepts/memory/).
+
+It is used for a similar purpose in [RunnableWithMessageHistory](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.history.RunnableWithMessageHistory.html#langchain_core.runnables.history.RunnableWithMessageHistory) to specify
+a `session_id` or `conversation_id` that keeps track of conversation history.
+
+In addition, you can use it to specify any custom configuration options to pass to any [Configurable Runnable](#configurable-runnables) that you create.
+
+### Setting callbacks
+
+Use this option to configure [callbacks](/docs/concepts/callbacks) for the runnable at 
+runtime. The callbacks will be passed to all sub-calls made by the runnable.
+
+```python
+some_runnable.invoke(
+   some_input,
+   {
+      "callbacks": [
+         SomeCallbackHandler(),
+         AnotherCallbackHandler(),
+      ]
+   }
+)
+```
+
+Please read the [Callbacks Conceptual Guide](/docs/concepts/callbacks) for more information on how to use callbacks in LangChain.
+
+:::important
+If you're using Python 3.9 or 3.10 in an async environment, you must propagate
+the `RunnableConfig` manually to sub-calls in some cases. Please see the
+[Propagating RunnableConfig](#propagation-of-RunnableConfig) section for more information.
+:::
+
+## Creating a runnable from a function
+
+You may need to create a custom Runnable that runs arbitrary logic. This is especially
+useful if using [LangChain Expression Language (LCEL)](/docs/concepts/lcel) to compose
+multiple Runnables and you need to add custom processing logic in one of the steps.
+
+There are two ways to create a custom Runnable from a function:
+
+* `RunnableLambda`: Use this for simple transformations where streaming is not required.
+* `RunnableGenerator`: Use this for more complex transformations when streaming is needed.
+
+See the [How to run custom functions](/docs/how_to/functions) guide for more information on how to use `RunnableLambda` and `RunnableGenerator`.
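+
+To make the distinction concrete, here is a minimal sketch of both wrappers. The `shout` and `emit_words` functions are hypothetical: the first returns its whole result at once, while the second is a generator that yields output piece by piece.
+
+```python
+from typing import Iterator
+
+from langchain_core.runnables import RunnableGenerator, RunnableLambda
+
+
+def shout(text: str) -> str:
+    """Simple transformation: the whole output is returned at once."""
+    return text.upper()
+
+
+def emit_words(chunks: Iterator[str]) -> Iterator[str]:
+    """Streaming transformation: yield one word at a time."""
+    for chunk in chunks:
+        for word in chunk.split():
+            yield word
+
+
+shout_runnable = RunnableLambda(shout)
+words_runnable = RunnableGenerator(emit_words)
+
+shouted = shout_runnable.invoke("hello world")
+streamed = list(words_runnable.stream("hello world"))
+print(shouted, streamed)
+```
+
+Note that the function passed to `RunnableGenerator` receives an iterator of input chunks rather than a single value, which is what lets it start producing output before all of its input has arrived.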
+
+:::important
+Users should not try to subclass Runnables to create a new custom Runnable. It is
+much more complex and error-prone than simply using `RunnableLambda` or `RunnableGenerator`.
+:::
+
+## Configurable runnables
+
+:::note
+This is an advanced feature that is unnecessary for most users.
+
+It helps with configuration of large "chains" created using the [LangChain Expression Language (LCEL)](/docs/concepts/lcel)
+and is leveraged by [LangServe](/docs/concepts/architecture#langserve) for deployed Runnables.
+:::
+
+Sometimes you may want to experiment with, or even expose to the end user, multiple different ways of doing things with your Runnable. This could involve adjusting parameters like the temperature in a chat model or even switching between different chat models.
+
+To simplify this process, the Runnable interface provides two methods for creating configurable Runnables at runtime:
+
+* `configurable_fields`: This method allows you to configure specific **attributes** in a Runnable. For example, the `temperature` attribute of a chat model.
+* `configurable_alternatives`: This method enables you to specify **alternative** Runnables that can be selected at runtime. For example, you could specify a list of different chat models that can be used.
+
+See the [How to configure runtime chain internals](/docs/how_to/configure) guide for more information on how to configure runtime chain internals.
--- a/docs/docs/concepts/streaming.mdx
+++ b/docs/docs/concepts/streaming.mdx
@@ -0,0 +1,191 @@
+# Streaming
+
+:::info Prerequisites
+* [Runnable Interface](/docs/concepts/runnables)
+* [Chat Models](/docs/concepts/chat_models)
+:::
+
+**Streaming** is crucial for enhancing the responsiveness of applications built on [LLMs](/docs/concepts/chat_models). By displaying output progressively, even before a complete response is ready, streaming significantly improves user experience (UX), particularly when dealing with the latency of LLMs.
+
+## Overview
+
+Generating full responses from [LLMs](/docs/concepts/chat_models) often incurs a delay of several seconds, which becomes more noticeable in complex applications with multiple model calls. Fortunately, LLMs generate responses iteratively, allowing for intermediate results to be displayed as they are produced. By streaming these intermediate outputs, LangChain enables smoother UX in LLM-powered apps and offers built-in support for streaming at the core of its design.
+
+In this guide, we'll discuss streaming in LLM applications and explore how LangChain's streaming APIs facilitate real-time output from various components in your application.
+
+## What to stream in LLM applications
+
+In applications involving LLMs, several types of data can be streamed to improve user experience by reducing perceived latency and increasing transparency. These include:
+
+### 1. Streaming LLM outputs
+
+The most common and critical data to stream is the output generated by the LLM itself. LLMs often take time to generate full responses, and by streaming the output in real-time, users can see partial results as they are produced. This provides immediate feedback and helps reduce the wait time for users.
+
+### 2. Streaming pipeline or workflow progress
+
+Beyond just streaming LLM output, it’s useful to stream progress through more complex workflows or pipelines, giving users a sense of how the application is progressing overall. This could include:
+
+- **In LangGraph Workflows:**
+With [LangGraph](/docs/concepts/architecture#langgraph), workflows are composed of nodes and edges that represent various steps. Streaming here involves tracking changes to the **graph state** as individual **nodes** request updates. This allows for more granular monitoring of which node in the workflow is currently active, giving real-time updates about the status of the workflow as it progresses through different stages.
+
+- **In LCEL Pipelines:**
+Streaming updates from an [LCEL](/docs/concepts/lcel) pipeline involves capturing progress from individual **sub-runnables**. For example, as different steps or components of the pipeline execute, you can stream which sub-runnable is currently running, providing real-time insight into the overall pipeline's progress.
+
+Streaming pipeline or workflow progress is essential in providing users with a clear picture of where the application is in the execution process.
+
+### 3. Streaming custom data
+
+In some cases, you may need to stream **custom data** that goes beyond the information provided by the pipeline or workflow structure. This custom information is injected within a specific step in the workflow, whether that step is a tool or a LangGraph node. For example, you could stream updates about what a tool is doing in real-time or the progress through a LangGraph node. This granular data, which is emitted directly from within the step, provides more detailed insights into the execution of the workflow and is especially useful in complex processes where more visibility is needed.
+
+## Streaming APIs
+
+LangChain has two main APIs for streaming output in real-time. These APIs are supported by any component that implements the [Runnable Interface](/docs/concepts/runnables), including [LLMs](/docs/concepts/chat_models), [compiled LangGraph graphs](https://langchain-ai.github.io/langgraph/concepts/low_level/), and any Runnable generated with [LCEL](/docs/concepts/lcel).
+
+1. sync [stream](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable.stream) and async [astream](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable.astream): Use to stream outputs from individual Runnables (e.g., a chat model) as they are generated or stream any workflow created with LangGraph.
+2. The async-only [astream_events](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable.astream_events): Use this API to get access to custom events and intermediate outputs from LLM applications built entirely with [LCEL](/docs/concepts/lcel). Note that this API is available, but not needed when working with LangGraph.
+
+:::note
+In addition, there is a **legacy** async [astream_log](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable.astream_log) API. This API is not recommended for new projects, as it is more complex and less feature-rich than the other streaming APIs.
+:::
+
+### `stream()` and `astream()`
+
+The `stream()` method returns an iterator that yields chunks of output synchronously as they are produced. You can use a `for` loop to process each chunk in real-time. For example, when using an LLM, this allows the output to be streamed incrementally as it is generated, reducing the wait time for users.
+
+The type of chunk yielded by the `stream()` and `astream()` methods depends on the component being streamed. For example, when streaming from an [LLM](/docs/concepts/chat_models), each chunk will be an [AIMessageChunk](/docs/concepts/messages#aimessagechunk); however, for other components, the chunk may be different.
+
+The `stream()` method returns an iterator that yields these chunks as they are produced. For example,
+
+```python
+for chunk in component.stream(some_input):
+    # IMPORTANT: Keep the processing of each chunk as efficient as possible.
+    # While you're processing the current chunk, the upstream component is
+    # waiting to produce the next one. For example, if working with LangGraph,
+    # graph execution is paused while the current chunk is being processed.
+    # In extreme cases, this could even result in timeouts (e.g., when llm outputs are
+    # streamed from an API that has a timeout).
+    print(chunk)
+```
+
+The [asynchronous version](/docs/concepts/async), `astream()`, works similarly but is designed for non-blocking workflows. You can use it in asynchronous code to achieve the same real-time streaming behavior.
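+
+As a minimal sketch of the async variant, the example below consumes `astream()` with `async for`. The `emit_tokens` generator is a hypothetical stand-in for a real streaming component such as a chat model.
+
+```python
+import asyncio
+from typing import AsyncIterator
+
+from langchain_core.runnables import RunnableGenerator
+
+
+async def emit_tokens(chunks: AsyncIterator[str]) -> AsyncIterator[str]:
+    """Hypothetical streaming component: yields one word per chunk."""
+    async for chunk in chunks:
+        for token in chunk.split():
+            yield token
+
+
+component = RunnableGenerator(emit_tokens)
+
+
+async def collect(text: str) -> list[str]:
+    # Each chunk is available as soon as the component yields it.
+    return [chunk async for chunk in component.astream(text)]
+
+
+tokens = asyncio.run(collect("streamed one token at a time"))
+print(tokens)
+```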
+
+#### Usage with chat models
+
+When using `stream()` or `astream()` with chat models, the output is streamed as [AIMessageChunks](/docs/concepts/messages#aimessagechunk) as it is generated by the LLM. This allows you to present or process the LLM's output incrementally as it's being produced, which is particularly useful in interactive applications or interfaces.
+
+#### Usage with LangGraph
+
+[LangGraph](/docs/concepts/architecture#langgraph) compiled graphs are [Runnables](/docs/concepts/runnables) and support the standard streaming APIs.
+
+When using the *stream* and *astream* methods with LangGraph, you can choose **one or more** [streaming modes](https://langchain-ai.github.io/langgraph/reference/types/#langgraph.types.StreamMode), which allow you to control the type of output that is streamed. The available streaming modes are:
+
+- **"values"**: Emit all values of the [state](https://langchain-ai.github.io/langgraph/concepts/low_level/) for each step.
+- **"updates"**: Emit only the node name(s) and updates that were returned by the node(s) after each step.
+- **"debug"**: Emit debug events for each step.
+- **"messages"**: Emit LLM [messages](/docs/concepts/messages) [token-by-token](/docs/concepts/tokens).
+- **"custom"**: Emit custom output written using [LangGraph's StreamWriter](https://langchain-ai.github.io/langgraph/reference/types/#langgraph.types.StreamWriter).
+
+For more information, please see:
+* [LangGraph streaming conceptual guide](https://langchain-ai.github.io/langgraph/concepts/streaming/) for more information on how to stream when working with LangGraph.
+* [LangGraph streaming how-to guides](https://langchain-ai.github.io/langgraph/how-tos/#streaming) for specific examples of streaming in LangGraph.
+
+#### Usage with LCEL
+
+If you compose multiple Runnables using [LangChain’s Expression Language (LCEL)](/docs/concepts/lcel), the `stream()` and `astream()` methods will, by convention, stream the output of the last step in the chain. This allows the final processed result to be streamed incrementally. **LCEL** tries to optimize streaming latency in pipelines such that the streaming results from the last step are available as soon as possible.
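+
+The sketch below illustrates this convention with a two-step chain. Both functions are hypothetical: `normalize` returns a single value, and `split_into_words` is a generator, so the chunks produced by `stream()` are the chunks yielded by that last step.
+
+```python
+from typing import Iterator
+
+from langchain_core.runnables import RunnableGenerator, RunnableLambda
+
+
+def normalize(text: str) -> str:
+    """First step: returns its full output at once."""
+    return text.strip().lower()
+
+
+def split_into_words(chunks: Iterator[str]) -> Iterator[str]:
+    """Last step: yields one word at a time."""
+    for chunk in chunks:
+        yield from chunk.split()
+
+
+chain = RunnableLambda(normalize) | RunnableGenerator(split_into_words)
+
+# The stream contains only the output of the last step in the chain.
+chunks = list(chain.stream("  Streaming With LCEL  "))
+print(chunks)
+```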
+
+
+
+### `astream_events`
+<span data-heading-keywords="astream_events,stream_events,stream events"></span>
+
+:::tip
+Use the `astream_events` API to access custom data and intermediate outputs from LLM applications built entirely with [LCEL](/docs/concepts/lcel). 
+
+While this API is available for use with [LangGraph](/docs/concepts/architecture#langgraph) as well, it is usually not necessary when working with LangGraph, as the `stream` and `astream` methods provide comprehensive streaming capabilities for LangGraph graphs.
+:::
+
+For chains constructed using **LCEL**, the `.stream()` method only streams the output of the final step of the chain. This might be sufficient for some applications, but as you build more complex chains of several LLM calls together, you may want to use the intermediate values of the chain alongside the final output. For example, you may want to return sources alongside the final generation when building a chat-over-documents app.
+
+There are ways to do this [using callbacks](/docs/concepts/#callbacks-1), or by constructing your chain in such a way that it passes intermediate
+values to the end with something like chained [`.assign()`](/docs/how_to/passthrough/) calls, but LangChain also includes an
+`.astream_events()` method that combines the flexibility of callbacks with the ergonomics of `.stream()`. When called, it returns an iterator
+which yields [various types of events](/docs/how_to/streaming/#event-reference) that you can filter and process according
+to the needs of your project.
+
+Here's one small example that prints just events containing streamed chat model output:
+
+```python
+from langchain_core.output_parsers import StrOutputParser
+from langchain_core.prompts import ChatPromptTemplate
+from langchain_anthropic import ChatAnthropic
+
+model = ChatAnthropic(model="claude-3-sonnet-20240229")
+
+prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
+parser = StrOutputParser()
+chain = prompt | model | parser
+
+async for event in chain.astream_events({"topic": "parrot"}, version="v2"):
+    kind = event["event"]
+    if kind == "on_chat_model_stream":
+        print(event, end="|", flush=True)
+```
+
+You can roughly think of it as an iterator over callback events (though the format differs) - and you can use it on almost all LangChain components!
+
+See [this guide](/docs/how_to/streaming/#using-stream-events) for more detailed information on how to use `.astream_events()`, including a table listing available events.
+
+## Writing custom data to the stream
+
+To write custom data to the stream, you will need to choose one of the following methods based on the component you are working with:
+
+1. LangGraph's [StreamWriter](https://langchain-ai.github.io/langgraph/reference/types/#langgraph.types.StreamWriter) can be used to write custom data that will surface through **stream** and **astream** APIs when working with LangGraph. **Important:** this is a LangGraph feature, so it is not available when working with pure LCEL. See [how to stream custom data](https://langchain-ai.github.io/langgraph/how-tos/streaming-content/) for more information.
+2. [dispatch_custom_event](https://python.langchain.com/api_reference/core/callbacks/langchain_core.callbacks.manager.dispatch_custom_event.html#) / [adispatch_custom_event](https://python.langchain.com/api_reference/core/callbacks/langchain_core.callbacks.manager.adispatch_custom_event.html) can be used to write custom data that will be surfaced through the **astream_events** API. See [how to dispatch custom callback events](https://python.langchain.com/docs/how_to/callbacks_custom_events/#astream-events-api) for more information.
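+
+As a minimal sketch of the second option, the example below emits a custom event from inside a step using `adispatch_custom_event` and picks it up from `astream_events`. The `slow_step` function, the event name `progress`, and its payload are hypothetical. The `config` is passed explicitly so that the sketch does not rely on automatic propagation, which is unavailable in async code on Python 3.10 or lower.
+
+```python
+import asyncio
+
+from langchain_core.callbacks.manager import adispatch_custom_event
+from langchain_core.runnables import RunnableConfig, RunnableLambda
+
+
+async def slow_step(text: str, config: RunnableConfig) -> str:
+    """Hypothetical step that reports progress while it works."""
+    await adispatch_custom_event("progress", {"percent": 50}, config=config)
+    return text[::-1]
+
+
+step = RunnableLambda(slow_step)
+
+
+async def collect_custom_events(text: str) -> list:
+    # Keep only the custom events; the stream also contains start/end events.
+    return [
+        event
+        async for event in step.astream_events(text, version="v2")
+        if event["event"] == "on_custom_event"
+    ]
+
+
+custom_events = asyncio.run(collect_custom_events("abc"))
+print(custom_events[0]["name"], custom_events[0]["data"])
+```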
+
+## "Auto-Streaming" Chat Models
+
+LangChain simplifies streaming from [chat models](/docs/concepts/chat_models) by automatically enabling streaming mode in certain cases, even when you’re not explicitly calling the streaming methods. This is particularly useful when you use the non-streaming `invoke` method but still want to stream the entire application, including intermediate results from the chat model.
+
+### How It Works
+
+When you call the `invoke` (or `ainvoke`) method on a chat model, LangChain will automatically switch to streaming mode if it detects that you are trying to stream the overall application. 
+
+Under the hood, it'll have `invoke` (or `ainvoke`) use the `stream` (or `astream`) method to generate its output. The result of the invocation will be the same as far as the code that was using `invoke` is concerned; however, while the chat model is being streamed, LangChain will take care of invoking `on_llm_new_token` events in LangChain's [callback system](/docs/concepts/callbacks). These callback events
+allow LangGraph `stream`/`astream` and `astream_events` to surface the chat model's output in real-time.
+
+Example:
+
+```python
+def node(state):
+    ...
+    # The code below uses the invoke method, but LangChain will 
+    # automatically switch to streaming mode
+    # when it detects that the overall 
+    # application is being streamed.
+    ai_message = model.invoke(state["messages"])
+    ...
+
+for chunk in compiled_graph.stream(..., stream_mode="messages"):
+    ...
+```
+
+## Async Programming
+
+LangChain offers both synchronous (sync) and asynchronous (async) versions of many of its methods. The async methods are typically prefixed with an "a" (e.g., `ainvoke`, `astream`). When writing async code, it's crucial to consistently use these asynchronous methods to ensure non-blocking behavior and optimal performance.
+
+If streaming data fails to appear in real-time, please ensure that you are using the correct async methods for your workflow.
+
+Please review the [async programming in LangChain guide](/docs/concepts/async) for more information on writing async code with LangChain.
+
+## Related Resources
+
+Please see the following how-to guides for specific examples of streaming in LangChain:
+* [LangGraph conceptual guide on streaming](https://langchain-ai.github.io/langgraph/concepts/streaming/)
+* [LangGraph streaming how-to guides](https://langchain-ai.github.io/langgraph/how-tos/#streaming)
+* [How to stream runnables](/docs/how_to/streaming/): This how-to guide goes over common streaming patterns with LangChain components (e.g., chat models) and with [LCEL](/docs/concepts/lcel).
+* [How to stream chat models](/docs/how_to/chat_streaming/)
+* [How to stream tool calls](/docs/how_to/tool_streaming/)
+
+For writing custom data to the stream, please see the following resources:
+
+* If using LangGraph, see [how to stream custom data](https://langchain-ai.github.io/langgraph/how-tos/streaming-content/).
+* If using LCEL, see [how to dispatch custom callback events](https://python.langchain.com/docs/how_to/callbacks_custom_events/#astream-events-api).
--- a/docs/docs/concepts/structured_outputs.mdx
+++ b/docs/docs/concepts/structured_outputs.mdx
@@ -0,0 +1,148 @@
+# Structured outputs
+
+## Overview 
+
+For many applications, such as chatbots, models need to respond to users directly in natural language. 
+However, there are scenarios where we need models to output in a *structured format*. 
+For example, we might want to store the model output in a database and ensure that the output conforms to the database schema.
+This need motivates the concept of structured output, where models can be instructed to respond with a particular output structure.
+
+![Structured output](/img/structured_output.png)
+
+## Key concepts 
+
+- **(1) Schema definition:** The output structure is represented as a schema, which can be defined in several ways.
+- **(2) Returning structured output:** The model is given this schema, and is instructed to return output that conforms to it.
+
+## Recommended usage
+
+This pseudo-code illustrates the recommended workflow when using structured output.
+LangChain provides a method, [`with_structured_output()`](/docs/how_to/structured_output/#the-with_structured_output-method), that automates the process of binding the schema to the [model](/docs/concepts/chat_models/) and parsing the output.
+This helper function is available for all model providers that support structured output. 
+
+```python
+# Define schema
+schema = {"foo": "bar"}
+# Bind schema to model
+model_with_structure = model.with_structured_output(schema)
+# Invoke the model to produce structured output that matches the schema
+structured_output = model_with_structure.invoke(user_input)
+```
+
+## Schema definition
+
+The central concept is that the output structure of model responses needs to be represented in some way. 
+While the types of objects you can use depend on the model you're working with, there are common types of objects that are typically allowed or recommended for structured output in Python.
+
+The simplest and most common format for structured output is a JSON-like structure, which in Python can be represented as a dictionary (dict) or list (list).
+JSON objects (or dicts in Python) are often used directly when the tool requires raw, flexible, and minimal-overhead structured data.
+
+```json
+{
+  "answer": "The answer to the user's question",
+  "followup_question": "A followup question the user could ask"
+}
+```
+
+As a second example, [Pydantic](https://docs.pydantic.dev/latest/) is particularly useful for defining structured output schemas because it offers type hints and validation.
+Here's an example of a Pydantic schema: 
+
+```python
+from pydantic import BaseModel, Field
+class ResponseFormatter(BaseModel):
+    """Always use this tool to structure your response to the user."""
+    answer: str = Field(description="The answer to the user's question")
+    followup_question: str = Field(description="A followup question the user could ask")
+
+```
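+
+To illustrate the validation that Pydantic adds, here is a minimal sketch that reuses the `ResponseFormatter` schema. The input dictionaries are made-up examples: the first matches the schema, and the second is missing a required field and is rejected.
+
+```python
+from pydantic import BaseModel, Field, ValidationError
+
+
+class ResponseFormatter(BaseModel):
+    """Always use this tool to structure your response to the user."""
+
+    answer: str = Field(description="The answer to the user's question")
+    followup_question: str = Field(description="A followup question the user could ask")
+
+
+# A dictionary that matches the schema is parsed into a typed object.
+valid = ResponseFormatter.model_validate(
+    {"answer": "42", "followup_question": "Why 42?"}
+)
+
+# A dictionary that is missing a required field raises a ValidationError.
+try:
+    ResponseFormatter.model_validate({"answer": "42"})
+    missing_field_rejected = False
+except ValidationError:
+    missing_field_rejected = True
+
+print(valid.answer, missing_field_rejected)
+```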
+
+## Returning structured output
+
+With a schema defined, we need a way to instruct the model to use it.
+While one approach is to include this schema in the prompt and *ask nicely* for the model to use it, this is not recommended. 
+Several more powerful methods that utilize native features in the model provider's API are available.
+
+### Using tool calling
+
+Many [model providers support](/docs/integrations/chat/) tool calling, a concept discussed in more detail in our [tool calling guide](/docs/concepts/tool_calling/).
+In short, tool calling involves binding a tool to a model and, when appropriate, the model can *decide* to call this tool and ensure its response conforms to the tool's schema.
+With this in mind, the central concept is straightforward: *simply bind our schema to a model as a tool!*
+Here is an example using the `ResponseFormatter` schema defined above:
+
+```python
+from langchain_openai import ChatOpenAI
+model = ChatOpenAI(model="gpt-4o", temperature=0)
+# Bind the ResponseFormatter schema as a tool to the model
+model_with_tools = model.bind_tools([ResponseFormatter])
+# Invoke the model
+ai_msg = model_with_tools.invoke("What is the powerhouse of the cell?")
+```
+
+The arguments of the tool call are already extracted as a dictionary. 
+This dictionary can be optionally parsed into a Pydantic object, matching our original `ResponseFormatter` schema.
+
+```python
+# Get the tool call arguments
+ai_msg.tool_calls[0]["args"]
+{'answer': "The powerhouse of the cell is the mitochondrion. Mitochondria are organelles that generate most of the cell's supply of adenosine triphosphate (ATP), which is used as a source of chemical energy.",
+ 'followup_question': 'What is the function of ATP in the cell?'}
+# Parse the dictionary into a pydantic object
+pydantic_object = ResponseFormatter.model_validate(ai_msg.tool_calls[0]["args"])
+```
+
+### JSON mode
+
+In addition to tool calling, some model providers support a feature called `JSON mode`. 
+This supports JSON schema definition as input and forces the model to produce conforming JSON output.
+You can find a table of model providers that support JSON mode [here](/docs/integrations/chat/).
+Here is an example of how to use JSON mode with OpenAI:
+
+```python
+from langchain_openai import ChatOpenAI
+model = ChatOpenAI(model="gpt-4o", model_kwargs={ "response_format": { "type": "json_object" } })
+ai_msg = model.invoke("Return a JSON object with key 'random_ints' and a value of 10 random ints in [0-99]")
+ai_msg.content
+'\n{\n  "random_ints": [23, 47, 89, 15, 34, 76, 58, 3, 62, 91]\n}'
+```
+
+One important point to flag: the model *still* returns a string, which needs to be parsed into a JSON object.
+You can do this with the `json` library, or with a JSON output parser if you need more advanced functionality.
+See this [how-to guide on the JSON output parser](/docs/how_to/output_parser_json) for more details.
+
+```python
+import json
+json_object = json.loads(ai_msg.content)
+{'random_ints': [23, 47, 89, 15, 34, 76, 58, 3, 62, 91]}
+```
+
+## Structured output method 
+
+There are a few challenges when producing structured output with the above methods:
+
+(1) If using tool calling, tool call arguments need to be parsed from a dictionary back to the original schema.
+
+(2) In addition, the model needs to be instructed to *always* use the tool when we want to enforce structured output, which is a provider specific setting. 
+
+(3) If using JSON mode, the output needs to be parsed into a JSON object. 
+
+With these challenges in mind, LangChain provides a helper function (`with_structured_output()`) to streamline the process.
+
+![Diagram of with structured output](/img/with_structured_output.png)
+
+This both binds the schema to the model as a tool and parses the output to the specified output schema. 
+
+```python
+# Bind the schema to the model
+model_with_structure = model.with_structured_output(ResponseFormatter)
+# Invoke the model
+structured_output = model_with_structure.invoke("What is the powerhouse of the cell?")
+# Get back the pydantic object
+structured_output
+ResponseFormatter(answer="The powerhouse of the cell is the mitochondrion. Mitochondria are organelles that generate most of the cell's supply of adenosine triphosphate (ATP), which is used as a source of chemical energy.", followup_question='What is the function of ATP in the cell?')
+```
+
+:::info[Further reading]
+
+For more details on usage, see our [how-to guide](/docs/how_to/structured_output/#the-with_structured_output-method).
+
+:::
--- a/docs/docs/concepts/text_splitters.mdx
+++ b/docs/docs/concepts/text_splitters.mdx
@@ -0,0 +1,135 @@
+# Text splitters
+<span data-heading-keywords="text splitter,text splitting"></span>
+
+:::info[Prerequisites]
+
+* [Documents](/docs/concepts/retrievers/#interface)
+* [Tokenization](/docs/concepts/tokens)
+:::
+
+## Overview
+
+Document splitting is often a crucial preprocessing step for many applications.
+It involves breaking down large texts into smaller, manageable chunks.
+This process offers several benefits, such as ensuring consistent processing of varying document lengths, overcoming input size limitations of models, and improving the quality of text representations used in retrieval systems.
+There are several strategies for splitting documents, each with its own advantages.
+
+## Key concepts
+
+![Conceptual Overview](/img/text_splitters.png)
+
+Text splitters split documents into smaller chunks for use in downstream applications.
+
+## Why split documents?
+
+There are several reasons to split documents:
+
+- **Handling non-uniform document lengths**: Real-world document collections often contain texts of varying sizes. Splitting ensures consistent processing across all documents.
+- **Overcoming model limitations**: Many embedding models and language models have maximum input size constraints. Splitting allows us to process documents that would otherwise exceed these limits.
+- **Improving representation quality**: For longer documents, the quality of embeddings or other representations may degrade as they try to capture too much information. Splitting can lead to more focused and accurate representations of each section.
+- **Enhancing retrieval precision**: In information retrieval systems, splitting can improve the granularity of search results, allowing for more precise matching of queries to relevant document sections.
+- **Optimizing computational resources**: Working with smaller chunks of text can be more memory-efficient and allow for better parallelization of processing tasks.
+
+Now, the next question is *how* to split the documents into chunks! There are several strategies, each with its own advantages.
+
+:::info[Further reading]
+* See Greg Kamradt's [chunkviz](https://chunkviz.up.railway.app/) to visualize different splitting strategies discussed below.
+:::
+
+## Approaches
+
+### Length-based
+
+The most intuitive strategy is to split documents based on their length. This simple yet effective approach ensures that each chunk doesn't exceed a specified size limit.
+Key benefits of length-based splitting:
+- Straightforward implementation
+- Consistent chunk sizes
+- Easily adaptable to different model requirements
+
+Types of length-based splitting:
+- **Token-based**: Splits text based on the number of tokens, which is useful when working with language models.
+- **Character-based**: Splits text based on the number of characters, which can be more consistent across different types of text.
+
+Example implementation using LangChain's `CharacterTextSplitter` with token-based splitting:
+
+```python
+from langchain_text_splitters import CharacterTextSplitter
+text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
+    encoding_name="cl100k_base", chunk_size=100, chunk_overlap=0
+)
+texts = text_splitter.split_text(document)
+```
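+
+For comparison, character-based splitting can be sketched without any library. The helper below is a hypothetical illustration (`split_by_length` is not a LangChain function) of a fixed-size window with optional overlap; a real splitter such as `CharacterTextSplitter` also takes separators into account.
+
+```python
+def split_by_length(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
+    """Split text into chunks of at most chunk_size characters."""
+    if chunk_size <= 0:
+        raise ValueError("chunk_size must be positive")
+    if not 0 <= chunk_overlap < chunk_size:
+        raise ValueError("chunk_overlap must be in [0, chunk_size)")
+    step = chunk_size - chunk_overlap  # how far the window advances each time
+    chunks = []
+    for start in range(0, len(text), step):
+        chunks.append(text[start:start + chunk_size])
+        if start + chunk_size >= len(text):
+            break  # this window already reached the end of the text
+    return chunks
+
+
+print(split_by_length("abcdefghij", chunk_size=4, chunk_overlap=1))
+# ['abcd', 'defg', 'ghij']
+```
+
+Each chunk shares its first character with the end of the previous one, which is what `chunk_overlap=1` asks for.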
+
+:::info[Further reading]
+
+* See the how-to guide for [token-based](/docs/how_to/split_by_token/) splitting.
+* See the how-to guide for [character-based](/docs/how_to/character_text_splitter/) splitting.
+
+:::
+
+### Text-structured based
+
+Text is naturally organized into hierarchical units such as paragraphs, sentences, and words. 
+We can leverage this inherent structure to inform our splitting strategy, creating splits that maintain natural language flow, preserve semantic coherence within each split, and adapt to varying levels of text granularity.
+LangChain's [`RecursiveCharacterTextSplitter`](/docs/how_to/recursive_text_splitter/) implements this concept:
+- The `RecursiveCharacterTextSplitter` attempts to keep larger units (e.g., paragraphs) intact.
+- If a unit exceeds the chunk size, it moves to the next level (e.g., sentences).
+- This process continues down to the word level if necessary.
+
+Here is example usage:
+
+```python
+from langchain_text_splitters import RecursiveCharacterTextSplitter
+text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=0)
+texts = text_splitter.split_text(document)
+```
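+
+The fallback order described above (paragraphs, then sentences, then words) can be sketched in plain Python. This is a simplified illustration rather than LangChain's implementation: `recursive_split` and its default separators are assumptions, separators are dropped from the output, and small pieces are not merged back together the way `RecursiveCharacterTextSplitter` merges them.
+
+```python
+def recursive_split(text: str, chunk_size: int, separators=("\n\n", ". ", " ")) -> list[str]:
+    """Split on the coarsest separator first, recursing only into oversized pieces."""
+    if len(text) <= chunk_size:
+        return [text] if text else []
+    if not separators:
+        # No structure left to exploit: fall back to a hard cut.
+        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
+    head, rest = separators[0], separators[1:]
+    chunks = []
+    for piece in text.split(head):
+        # Pieces that already fit are kept whole; larger ones try the next separator.
+        chunks.extend(recursive_split(piece, chunk_size, rest))
+    return chunks
+
+
+print(recursive_split("One two three. Four five.\n\nSix.", chunk_size=15))
+# ['One two three', 'Four five.', 'Six.']
+```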
+
+:::info[Further reading]
+
+* See the how-to guide for [recursive text splitting](/docs/how_to/recursive_text_splitter/).
+
+:::
+
+### Document-structured based
+
+Some documents have an inherent structure, such as HTML, Markdown, or JSON files. 
+In these cases, it's beneficial to split the document based on its structure, as it often naturally groups semantically related text.
+Key benefits of structure-based splitting:
+- Preserves the logical organization of the document
+- Maintains context within each chunk
+- Can be more effective for downstream tasks like retrieval or summarization
+
+Examples of structure-based splitting:
+- **Markdown**: Split based on headers (e.g., #, ##, ###)
+- **HTML**: Split using tags
+- **JSON**: Split by object or array elements
+- **Code**: Split by functions, classes, or logical blocks
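+
+As a rough illustration of the Markdown case, the sketch below groups lines under the most recent header. The function name and the returned dictionary shape are assumptions made for this example, and it deliberately ignores edge cases such as `#` characters inside fenced code blocks.
+
+```python
+import re
+
+
+def split_markdown_by_headers(markdown: str) -> list[dict]:
+    """Group lines under the most recent ATX header (#, ##, ...)."""
+    sections = []
+    current = {"header": None, "content": []}
+    for line in markdown.splitlines():
+        match = re.match(r"^(#{1,6})\s+(.*)$", line)
+        if match:
+            # A new header closes the section collected so far (if any).
+            if current["header"] is not None or current["content"]:
+                sections.append(current)
+            current = {"header": match.group(2).strip(), "content": []}
+        else:
+            current["content"].append(line)
+    if current["header"] is not None or current["content"]:
+        sections.append(current)
+    return [
+        {"header": s["header"], "content": "\n".join(s["content"]).strip()}
+        for s in sections
+    ]
+
+
+doc = "# Intro\nHello.\n\n## Setup\nInstall it.\n"
+print(split_markdown_by_headers(doc))
+# [{'header': 'Intro', 'content': 'Hello.'}, {'header': 'Setup', 'content': 'Install it.'}]
+```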
+
+:::info[Further reading]
+
+* See the how-to guide for [Markdown splitting](/docs/how_to/markdown_header_metadata_splitter/).
+* See the how-to guide for [Recursive JSON splitting](/docs/how_to/recursive_json_splitter/).
+* See the how-to guide for [Code splitting](/docs/how_to/code_splitter/).
+* See the how-to guide for [HTML splitting](/docs/how_to/HTML_header_metadata_splitter/).
+
+:::
+
+### Semantic meaning based
+
+Unlike the previous methods, semantic-based splitting actually considers the *content* of the text. 
+While other approaches use document or text structure as proxies for semantic meaning, this method directly analyzes the text's semantics.
+There are several ways to implement this, but conceptually the approach is to split text when there are significant changes in text *meaning*.
+As an example, we can use a sliding window approach to generate embeddings, and compare the embeddings to find significant differences:
+
+- Start with the first few sentences and generate an embedding.
+- Move to the next group of sentences and generate another embedding (e.g., using a sliding window approach).
+- Compare the embeddings to find significant differences, which indicate potential "break points" between semantic sections.
+
+This technique helps create chunks that are more semantically coherent, potentially improving the quality of downstream tasks like retrieval or summarization.
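+
+The steps above can be sketched with a toy stand-in for an embedding model. Everything here is an assumption made for illustration: `embed` is a bag-of-words counter rather than a real embedding, the window is a single sentence, and the `threshold` value is arbitrary.
+
+```python
+import math
+import re
+from collections import Counter
+
+
+def embed(sentence: str) -> Counter:
+    """Toy stand-in for an embedding model: a bag-of-words count vector."""
+    return Counter(re.findall(r"[a-z]+", sentence.lower()))
+
+
+def cosine_similarity(a: Counter, b: Counter) -> float:
+    dot = sum(a[word] * b[word] for word in a)
+    norm_a = math.sqrt(sum(v * v for v in a.values()))
+    norm_b = math.sqrt(sum(v * v for v in b.values()))
+    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
+
+
+def semantic_split(sentences: list[str], threshold: float = 0.2) -> list[list[str]]:
+    """Start a new chunk whenever adjacent sentences are less similar than threshold."""
+    if not sentences:
+        return []
+    chunks = [[sentences[0]]]
+    for previous, current in zip(sentences, sentences[1:]):
+        if cosine_similarity(embed(previous), embed(current)) < threshold:
+            chunks.append([current])  # break point: the meaning shifted
+        else:
+            chunks[-1].append(current)
+    return chunks
+
+
+sentences = [
+    "Cats chase mice.",
+    "Cats chase birds too.",
+    "Stocks fell sharply today.",
+    "Stocks fell again on Friday.",
+]
+print(semantic_split(sentences))
+```
+
+With these toy vectors, the only break point falls between the second and third sentences, because they share no words.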
+
+:::info[Further reading]
+
+* See the how-to guide for [splitting text based on semantic meaning](/docs/how_to/semantic-chunker/).
+* See Greg Kamradt's [notebook](https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb) showcasing semantic splitting.
+
+:::
--- a/docs/docs/concepts/tokens.mdx
+++ b/docs/docs/concepts/tokens.mdx
@@ -0,0 +1,58 @@
+# Tokens
+
+Modern large language models (LLMs) are typically based on a transformer architecture that processes a sequence of units known as tokens. Tokens are the fundamental elements that models use to break down input and generate output. In this section, we'll discuss what tokens are and how they are used by language models.
+
+## What is a token?
+
+A **token** is the basic unit that a language model reads, processes, and generates. These units can vary based on how the model provider defines them, but in general, they could represent:
+
+* A whole word (e.g., "apple"),
+* A part of a word (e.g., "app"),
+* Or other linguistic components such as punctuation or spaces.
+
+The way the model tokenizes the input depends on its **tokenizer algorithm**, which converts the input into tokens. Similarly, the model’s output comes as a stream of tokens, which is then decoded back into human-readable text.
+
+## How tokens work in language models
+
+The reason language models use tokens is tied to how they understand and predict language. Rather than processing characters or entire sentences directly, language models focus on **tokens**, which represent meaningful linguistic units. Here's how the process works:
+
+1. **Input Tokenization**: When you provide a model with a prompt (e.g., "LangChain is cool!"), the tokenizer algorithm splits the text into tokens. For example, the sentence could be tokenized into parts like `["Lang", "Chain", " is", " cool", "!"]`. Note that token boundaries don’t always align with word boundaries.
+    ![](/img/tokenization.png)
+
+2. **Processing**: The transformer architecture behind these models processes tokens sequentially to predict the next token in a sentence. It does this by analyzing the relationships between tokens, capturing context and meaning from the input.
+3. **Output Generation**: The model generates new tokens one by one. These output tokens are then decoded back into human-readable text.
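+
+To make the encode and decode steps concrete, here is a toy tokenizer built around the example tokens above. The vocabulary and the greedy longest-match rule are assumptions for illustration only; production tokenizers use learned vocabularies and algorithms such as byte-pair encoding.
+
+```python
+VOCAB = ["Lang", "Chain", " is", " cool", "!"]  # hypothetical vocabulary
+TOKEN_TO_ID = {token: i for i, token in enumerate(VOCAB)}
+
+
+def encode(text: str) -> list[int]:
+    """Greedy longest-match tokenization against VOCAB."""
+    ids, position = [], 0
+    while position < len(text):
+        # Pick the longest vocabulary entry that matches at this position.
+        match = max(
+            (t for t in VOCAB if text.startswith(t, position)),
+            key=len,
+            default=None,
+        )
+        if match is None:
+            raise ValueError(f"no token covers position {position}")
+        ids.append(TOKEN_TO_ID[match])
+        position += len(match)
+    return ids
+
+
+def decode(ids: list[int]) -> str:
+    """Map token IDs back to text by concatenating their strings."""
+    return "".join(VOCAB[i] for i in ids)
+
+
+ids = encode("LangChain is cool!")
+print(ids)          # [0, 1, 2, 3, 4]
+print(decode(ids))  # LangChain is cool!
+```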
+
+Using tokens instead of raw characters allows the model to focus on linguistically meaningful units, which helps it capture grammar, structure, and context more effectively.
+
+## Tokens don’t have to be text
+
+Although tokens are most commonly used to represent text, they don’t have to be limited to textual data. Tokens can also serve as abstract representations of **multi-modal data**, such as:
+
+- **Images**,
+- **Audio**,
+- **Video**,
+- And other types of data.
+
+At the time of writing, virtually no models support **multi-modal output**, and only a few models can handle **multi-modal inputs** (e.g., text combined with images or audio). However, as advancements in AI continue, we expect **multi-modality** to become much more common. This would allow models to process and generate a broader range of media, significantly expanding the scope of what tokens can represent and how models can interact with diverse types of data.
+
+:::note
+In principle, **anything that can be represented as a sequence of tokens** could be modeled in a similar way. For example, **DNA sequences**—which are composed of a series of nucleotides (A, T, C, G)—can be tokenized and modeled to capture patterns, make predictions, or generate sequences. This flexibility allows transformer-based models to handle diverse types of sequential data, further broadening their potential applications across various domains, including bioinformatics, signal processing, and other fields that involve structured or unstructured sequences.
+:::
+
+Please see the [multimodality](/docs/concepts/multimodality) section for more information on multi-modal inputs and outputs.
+
+## Why not use characters?
+
+Using tokens instead of individual characters makes models both more efficient and better at understanding context and grammar. Tokens represent meaningful units, like whole words or parts of words, allowing models to capture language structure more effectively than by processing raw characters. Token-level processing also reduces the number of units the model has to handle, leading to faster computation.
+
+In contrast, character-level processing would require handling a much larger sequence of input, making it harder for the model to learn relationships and context. Tokens enable models to focus on linguistic meaning, making them more accurate and efficient in generating responses.
+
+## How tokens correspond to text
+
+Please see this post from [OpenAI](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them) for more details on how tokens are counted and how they correspond to text.
+
+According to the OpenAI post, the approximate token counts for English text are as follows:
+
+* 1 token ~= 4 chars in English
+* 1 token ~= ¾ words
+* 100 tokens ~= 75 words
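+
+These ratios can be turned into a quick back-of-the-envelope estimator. The helper names below are hypothetical, and the result is only a rough approximation for English text; use the model's actual tokenizer when an exact count matters.
+
+```python
+import math
+
+
+def estimate_tokens(text: str) -> int:
+    """Approximate token count for English text (about 4 characters per token)."""
+    return math.ceil(len(text) / 4)
+
+
+def estimate_words(token_count: int) -> int:
+    """Approximate word count for a token budget (about 3/4 of a word per token)."""
+    return token_count * 3 // 4
+
+
+print(estimate_tokens("LangChain is cool!"))  # 5 (18 characters / 4, rounded up)
+print(estimate_words(100))                    # 75
+```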
--- a/docs/docs/concepts/tool_calling.mdx
+++ b/docs/docs/concepts/tool_calling.mdx
@@ -0,0 +1,149 @@
+# Tool calling
+
+:::info[Prerequisites]
+* [Tools](/docs/concepts/tools)
+* [Chat Models](/docs/concepts/chat_models)
+:::
+
+
+## Overview 
+
+Many AI applications interact directly with humans. In these cases, it is appropriate for models to respond in natural language.
+But what about cases where we want a model to also interact *directly* with systems, such as databases or an API?
+These systems often have a particular input schema; for example, APIs frequently have a required payload structure.
+This need motivates the concept of *tool calling*. You can use [tool calling](https://platform.openai.com/docs/guides/function-calling/example-use-cases) to request model responses that match a particular schema.
+
+:::info
+You will sometimes hear the term `function calling`. We use this term interchangeably with `tool calling`. 
+:::
+
+![Conceptual overview of tool calling](/img/tool_calling_concept.png)
+
+## Key concepts 
+
+**(1) Tool Creation:** Use the [@tool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.convert.tool.html) decorator to create a [tool](/docs/concepts/tools). A tool is an association between a function and its schema.
+**(2) Tool Binding:** The tool needs to be connected to a model that supports tool calling. This gives the model awareness of the tool and the associated input schema required by the tool.
+**(3) Tool Calling:** When appropriate, the model can decide to call a tool and ensure its response conforms to the tool's input schema.
+**(4) Tool Execution:** The tool can be executed using the arguments provided by the model.
+
+![Conceptual parts of tool calling](/img/tool_calling_components.png)
+
+## Recommended usage
+
+This pseudo-code illustrates the recommended workflow for using tool calling. 
+Created tools are passed to the `.bind_tools()` method as a list.
+The resulting model can be called as usual. If a tool call is made, the model's response will contain the tool call arguments.
+The tool call arguments can be passed directly to the tool.
+
+```python
+# Tool creation
+tools = [my_tool]
+# Tool binding
+model_with_tools = model.bind_tools(tools)
+# Tool calling 
+response = model_with_tools.invoke(user_input)
+```
+
+## Tool creation
+
+The recommended way to create a tool is using the `@tool` decorator.
+
+```python
+from langchain_core.tools import tool
+
+@tool
+def multiply(a: int, b: int) -> int:
+    """Multiply a and b."""
+    return a * b
+```
+
+:::info[Further reading]
+
+* See our conceptual guide on [tools](/docs/concepts/tools/) for more details.
+* See our [model integrations](/docs/integrations/chat/) that support tool calling.
+* See our [how-to guide](/docs/how_to/tool_calling/) on tool calling.
+
+:::
+
+## Tool binding 
+
+Many [model providers](https://platform.openai.com/docs/guides/function-calling) support tool calling.
+
+:::tip
+See our [model integration page](/docs/integrations/chat/) for a list of providers that support tool calling.
+:::
+
+The central concept to understand is that LangChain provides a standardized interface for connecting tools to models. 
+The `.bind_tools()` method can be used to specify which tools are available for a model to call. 
+
+```python
+model_with_tools = model.bind_tools(tools_list)
+```
+
+As a specific example, let's take a function `multiply` and bind it as a tool to a model that supports tool calling.
+
+```python
+def multiply(a: int, b: int) -> int:
+    """Multiply a and b.
+
+    Args:
+        a: first int
+        b: second int
+    """
+    return a * b
+
+llm_with_tools = tool_calling_model.bind_tools([multiply])
+```
+
+## Tool calling
+
+![Diagram of a tool call by a model](/img/tool_call_example.png)
+
+A key principle of tool calling is that the model decides when to use a tool based on the input's relevance. The model doesn't always need to call a tool.
+For example, given an unrelated input, the model would not call the tool:
+
+```python
+result = llm_with_tools.invoke("Hello world!")
+```
+
+The result would be an `AIMessage` containing the model's response in natural language (e.g., "Hello!").
+However, if we pass an input *relevant to the tool*, the model should choose to call it:
+
+```python
+result = llm_with_tools.invoke("What is 2 multiplied by 3?")
+```
+
+As before, the output `result` will be an `AIMessage`. 
+But, if the tool was called, `result` will have a `tool_calls` attribute.
+This attribute includes everything needed to execute the tool, including the tool name and input arguments:
+
+```python
+result.tool_calls
+# [{'name': 'multiply', 'args': {'a': 2, 'b': 3}, 'id': 'xxx', 'type': 'tool_call'}]
+```
+
+For more details on usage, see our [how-to guides](/docs/how_to/#tools)!
+
+## Tool execution
+
+[Tools](/docs/concepts/tools/) implement the [Runnable](/docs/concepts/runnables/) interface, which means that they can be invoked (e.g., `tool.invoke(args)`) directly.
+
+[LangGraph](https://langchain-ai.github.io/langgraph/) offers pre-built components (e.g., [`ToolNode`](https://langchain-ai.github.io/langgraph/reference/prebuilt/#toolnode)) that will often invoke the tool on behalf of the user.
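+
+Conceptually, executing tool calls amounts to looking up each requested tool by name and passing it the model-provided arguments. The sketch below shows that idea in plain Python with no LangChain imports; the `TOOLS` registry, the `execute_tool_calls` helper, and the shape of the returned dictionaries are assumptions for illustration, not the behavior of `ToolNode`.
+
+```python
+def multiply(a: int, b: int) -> int:
+    """Multiply a and b."""
+    return a * b
+
+
+# Hypothetical registry mapping tool names to plain Python callables.
+TOOLS = {"multiply": multiply}
+
+
+def execute_tool_calls(tool_calls: list[dict]) -> list[dict]:
+    """Run each requested tool and pair its output with the call's ID."""
+    results = []
+    for call in tool_calls:
+        function = TOOLS[call["name"]]     # look up the tool by name
+        output = function(**call["args"])  # pass the model-provided arguments
+        results.append({"tool_call_id": call["id"], "content": str(output)})
+    return results
+
+
+tool_calls = [{"name": "multiply", "args": {"a": 2, "b": 3}, "id": "xxx", "type": "tool_call"}]
+print(execute_tool_calls(tool_calls))
+# [{'tool_call_id': 'xxx', 'content': '6'}]
+```
+
+Keeping the call ID alongside each result lets the output be matched back to the request that produced it.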
+
+:::info[Further reading]
+
+* See our [how-to guide](/docs/how_to/tool_calling/) on tool calling.
+* See the [LangGraph documentation on using ToolNode](https://langchain-ai.github.io/langgraph/how-tos/tool-calling/).
+
+:::
+
+## Best practices
+
+When designing [tools](/docs/concepts/tools/) to be used by a model, it is important to keep in mind that:
+
+* Models that have explicit [tool-calling APIs](/docs/concepts/#functiontool-calling) will be better at tool calling than non-fine-tuned models.
+* Models will perform better if the tools have well-chosen names and descriptions.
+* Simple, narrowly scoped tools are easier for models to use than complex tools.
+* Asking the model to select from a large list of tools poses challenges for the model.
+
+
--- a/docs/docs/concepts/tools.mdx
+++ b/docs/docs/concepts/tools.mdx
@@ -0,0 +1,211 @@
+# Tools
+
+:::info Prerequisites
+- [Chat models](/docs/concepts/chat_models/)
+:::
+
+## Overview
+
+The **tool** abstraction in LangChain associates a Python **function** with a **schema** that defines the function's **name**, **description** and **input**.
+
+**Tools** can be passed to [chat models](/docs/concepts/chat_models) that support [tool calling](/docs/concepts/tool_calling) allowing the model to request the execution of a specific function with specific inputs.
+
+## Key concepts
+
+- Tools encapsulate a function and its schema so that both can be passed to a chat model.
+- Create tools using the [@tool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.convert.tool.html) decorator, which simplifies tool creation and supports the following:
+   - Automatically inferring the tool's **name**, **description** and **inputs**, while also supporting customization.
+   - Defining tools that return **artifacts** (e.g. images, dataframes, etc.)
+   - Hiding input arguments from the schema (and hence from the model) using **injected tool arguments**.
+
+## Tool interface
+
+The tool interface is defined in the [BaseTool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.base.BaseTool.html#langchain_core.tools.base.BaseTool) class which is a subclass of the [Runnable Interface](/docs/concepts/runnables).
+
+The key attributes that correspond to the tool's **schema**:
+
+- **name**: The name of the tool.
+- **description**: A description of what the tool does.
+- **args**: Property that returns the JSON schema for the tool's arguments.
+
+The key methods to execute the function associated with the **tool**:
+
+- **invoke**: Invokes the tool with the given arguments.
+- **ainvoke**: Invokes the tool with the given arguments, asynchronously. Used for [async programming with LangChain](/docs/concepts/async).
+
+## Create tools using the `@tool` decorator
+
+The recommended way to create tools is using the [@tool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.convert.tool.html) decorator. This decorator is designed to simplify the process of tool creation and should be used in most cases. After defining a function, you can decorate it with [@tool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.convert.tool.html) to create a tool that implements the [Tool Interface](#tool-interface).
+
+```python
+from langchain_core.tools import tool
+
+@tool
+def multiply(a: int, b: int) -> int:
+   """Multiply two numbers."""
+   return a * b
+```
+
+For more details on how to create tools, see the [how to create custom tools](/docs/how_to/custom_tools/) guide.
+
+:::note
+LangChain has a few other ways to create tools; e.g., by sub-classing the [BaseTool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.base.BaseTool.html#langchain_core.tools.base.BaseTool) class or by using `StructuredTool`. These methods are shown in the [how to create custom tools guide](/docs/how_to/custom_tools/), but
+we generally recommend using the `@tool` decorator for most cases.
+:::
+
+## Use the tool directly
+
+Once you have defined a tool, you can use it directly by calling its `invoke` method. For example, to use the `multiply` tool defined above:
+
+```python
+multiply.invoke({"a": 2, "b": 3})
+```
+
+### Inspect
+
+You can also inspect the tool's schema and other properties:
+
+```python
+print(multiply.name) # multiply
+print(multiply.description) # Multiply two numbers.
+print(multiply.args) 
+# {
+# 'type': 'object', 
+# 'properties': {'a': {'type': 'integer'}, 'b': {'type': 'integer'}}, 
+# 'required': ['a', 'b']
+# }
+```
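+
+To give a feel for how a schema like this can be derived from a function, here is a plain-Python sketch based on the standard `inspect` and `typing` modules. It is not LangChain's implementation: `infer_schema`, the `JSON_TYPES` mapping, and the fallback to `"string"` for unknown types are all assumptions made for this example.
+
+```python
+import inspect
+from typing import get_type_hints
+
+JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}
+
+
+def infer_schema(function) -> dict:
+    """Build a small JSON-schema-like description from a function's signature."""
+    hints = get_type_hints(function)
+    hints.pop("return", None)  # the return type is not an input argument
+    signature = inspect.signature(function)
+    return {
+        "name": function.__name__,
+        "description": inspect.getdoc(function) or "",
+        "args": {
+            "type": "object",
+            "properties": {
+                name: {"type": JSON_TYPES.get(hint, "string")}
+                for name, hint in hints.items()
+            },
+            # Arguments without a default value are treated as required.
+            "required": [
+                name
+                for name, parameter in signature.parameters.items()
+                if parameter.default is inspect.Parameter.empty
+            ],
+        },
+    }
+
+
+def multiply(a: int, b: int) -> int:
+    """Multiply two numbers."""
+    return a * b
+
+
+print(infer_schema(multiply))
+```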
+
+:::note
+If you're using pre-built LangChain or LangGraph components like [create_react_agent](https://langchain-ai.github.io/langgraph/reference/prebuilt/#langgraph.prebuilt.chat_agent_executor.create_react_agent), you might not need to interact with tools directly. However, understanding how to use them can be valuable for debugging and testing. Additionally, when building custom LangGraph workflows, you may find it necessary to work with tools directly.
+:::
+
+## Configuring the schema
+
+The `@tool` decorator offers additional options to configure the schema of the tool (e.g., modify name, description
+or parse the function's doc-string to infer the schema).
+
+Please see the [API reference for @tool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.convert.tool.html) for more details and review the [how to create custom tools](/docs/how_to/custom_tools/) guide for examples.
+
+## Tool artifacts
+
+**Tools** are utilities that can be called by a model, and whose outputs are designed to be fed back to a model. Sometimes, however, there are artifacts of a tool's execution that we want to make accessible to downstream components in our chain or agent, but that we don't want to expose to the model itself. For example, if a tool returns a custom object, a dataframe or an image, we may want to pass some metadata about this output to the model without passing the actual output to the model. At the same time, we may want to be able to access this full output elsewhere, for example in downstream tools.
+
+```python
+@tool(response_format="content_and_artifact")
+def some_tool(...) -> Tuple[str, Any]:
+    """Tool that does something."""
+    ...
+    return 'Message for chat model', some_artifact 
+```
+
+See [how to return artifacts from tools](/docs/how_to/tool_artifacts/) for more details.
+
+## Special type annotations
+
+There are a number of special type annotations that can be used in the tool's function signature to configure the run time behavior of the tool.
+
+The following type annotations will end up **removing** the argument from the tool's schema. This can be useful for arguments that should not be exposed to the model and that the model should not be able to control.
+
+- **InjectedToolArg**: Value should be injected manually at runtime using `.invoke` or `.ainvoke`.
+- **RunnableConfig**: Pass in the RunnableConfig object to the tool.
+- **InjectedState**: Pass in the overall state of the LangGraph graph to the tool.
+- **InjectedStore**: Pass in the LangGraph store object to the tool.
+
+You can also use the `Annotated` type with a string literal to provide a **description** for the corresponding argument that **WILL** be exposed in the tool's schema.
+
+- **Annotated[..., "string literal"]** -- Adds a description to the argument that will be exposed in the tool's schema.
+
+### InjectedToolArg
+
+There are cases where certain arguments need to be passed to a tool at runtime but should not be generated by the model itself. For this, we use the `InjectedToolArg` annotation, which allows certain parameters to be hidden from the tool's schema.
+
+For example, if a tool requires a `user_id` to be injected dynamically at runtime, it can be structured in this way:
+
+```python
+from typing import Annotated
+
+from langchain_core.tools import tool, InjectedToolArg
+
+@tool
+def user_specific_tool(input_data: str, user_id: Annotated[str, InjectedToolArg]) -> str:
+    """Tool that processes input data."""
+    return f"User {user_id} processed {input_data}"
+```
+
+Annotating the `user_id` argument with `InjectedToolArg` tells LangChain that this argument should not be exposed as part of the
+tool's schema.
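+
+The underlying idea, a marker in the type annotation that removes an argument from the schema, can be sketched with the standard library alone. The `Injected` class and the `visible_arguments` helper below are hypothetical stand-ins, not LangChain APIs.
+
+```python
+from typing import Annotated, get_type_hints
+
+
+class Injected:
+    """Hypothetical marker, playing the role that InjectedToolArg plays in LangChain."""
+
+
+def visible_arguments(function) -> list[str]:
+    """Names of the arguments that would be exposed in the schema."""
+    hints = get_type_hints(function, include_extras=True)
+    hints.pop("return", None)
+    return [
+        name
+        for name, hint in hints.items()
+        # Annotated types carry their extra markers in __metadata__.
+        if Injected not in getattr(hint, "__metadata__", ())
+    ]
+
+
+def user_specific_tool(input_data: str, user_id: Annotated[str, Injected]) -> str:
+    """Tool that processes input data."""
+    return f"User {user_id} processed {input_data}"
+
+
+print(visible_arguments(user_specific_tool))           # ['input_data']
+print(user_specific_tool("report", user_id="user-1"))  # User user-1 processed report
+```
+
+The model only ever sees `input_data`; the caller supplies `user_id` when the function is actually run.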
+
+See [how to pass run time values to tools](https://python.langchain.com/docs/how_to/tool_runtime/) for more details on how to use `InjectedToolArg`.  
+
+
+### RunnableConfig
+
+You can use the `RunnableConfig` object to pass custom run time values to tools.
+
+If you need to access the [RunnableConfig](/docs/concepts/runnables/#RunnableConfig) object from within a tool, you can do so by using the `RunnableConfig` annotation in the tool's function signature.
+
+```python
+from langchain_core.runnables import RunnableConfig
+
+@tool
+async def some_func(..., config: RunnableConfig) -> ...:
+    """Tool that does something."""
+    # do something with config
+    ...
+
+await some_func.ainvoke(..., config={"configurable": {"value": "some_value"}})
+```
+
+The `config` will not be part of the tool's schema and will be injected at runtime with appropriate values.
+
+:::note
+You may need to access the `config` object to manually propagate it to sub-calls. This happens if you're working with Python 3.9 / 3.10 in an [async](/docs/concepts/async) environment.
+
+Please read [Propagation RunnableConfig](/docs/concepts/runnables#propagation-RunnableConfig) to learn how to propagate the `RunnableConfig` down the call chain manually (or upgrade to Python 3.11 where this is no longer an issue).
+:::
+
+### InjectedState
+
+Please see the [InjectedState](https://langchain-ai.github.io/langgraph/reference/prebuilt/#langgraph.prebuilt.tool_node.InjectedState) documentation for more details.
+
+### InjectedStore
+
+Please see the [InjectedStore](https://langchain-ai.github.io/langgraph/reference/prebuilt/#langgraph.prebuilt.tool_node.InjectedStore) documentation for more details.
+
+## Best practices
+
+When designing tools to be used by models, keep the following in mind:
+
+- Tools that are well-named, correctly-documented and properly type-hinted are easier for models to use.
+- Design simple and narrowly scoped tools, as they are easier for models to use correctly.
+- Use chat models that support [tool-calling](/docs/concepts/tool_calling) APIs to take advantage of tools.
+
+
+## Toolkits
+<span data-heading-keywords="toolkit,toolkits"></span>
+
+LangChain has a concept of **toolkits**. A toolkit is a very thin abstraction that groups tools
+designed to be used together for specific tasks.
+
+### Interface
+
+All Toolkits expose a `get_tools` method which returns a list of tools. You can therefore do:
+
+```python
+# Initialize a toolkit
+toolkit = ExampleToolkit(...)
+
+# Get list of tools
+tools = toolkit.get_tools()
+```
+
+## Related resources
+
+See the following resources for more information:
+
+- [API Reference for @tool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.convert.tool.html)
+- [How to create custom tools](https://python.langchain.com/docs/how_to/custom_tools/)
+- [How to pass run time values to tools](https://python.langchain.com/docs/how_to/tool_runtime/)
+- [All LangChain tool how-to guides](https://docs.langchain.com/docs/how_to/#tools)
+- [Additional how-to guides that show usage with LangGraph](https://langchain-ai.github.io/langgraph/how-tos/tool-calling/)
+- Tool integrations, see the [tool integration docs](https://docs.langchain.com/docs/integrations/tools/).
+
--- a/docs/docs/concepts/tracing.mdx
+++ b/docs/docs/concepts/tracing.mdx
@@ -0,0 +1,10 @@
+# Tracing
+
+<span data-heading-keywords="trace,tracing"></span>
+
+A trace is essentially a series of steps that your application takes to go from input to output.
+Traces contain individual steps called `runs`. These can be individual calls from a model, retriever,
+tool, or sub-chains.
+Tracing gives you observability inside your chains and agents, and is vital in diagnosing issues.
+
+For a deeper dive, check out [this LangSmith conceptual guide](https://docs.smith.langchain.com/concepts/tracing).
--- a/docs/docs/concepts/vectorstores.mdx
+++ b/docs/docs/concepts/vectorstores.mdx
@@ -0,0 +1,191 @@
+# Vector stores
+<span data-heading-keywords="vector,vectorstore,vectorstores,vector store,vector stores"></span>
+
+:::info[Prerequisites]
+
+* [Embeddings](/docs/concepts/embedding_models/)
+* [Text splitters](/docs/concepts/text_splitters/)
+
+:::
+:::info[Note]
+
+This conceptual overview focuses on text-based indexing and retrieval for simplicity. 
+However, embedding models can be [multi-modal](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-multimodal-embeddings)
+and vector stores can be used to store and retrieve a variety of data types beyond text.
+:::
+
+## Overview
+
+Vector stores are specialized data stores that enable indexing and retrieving information based on vector representations.
+
+These vectors, called [embeddings](/docs/concepts/embedding_models/), capture the semantic meaning of data that has been embedded.
+
+Vector stores are frequently used to search over unstructured data, such as text, images, and audio, to retrieve relevant information based on semantic similarity rather than exact keyword matches.
+
+![Vectorstores](/img/vectorstores.png)
+
+## Integrations
+
+LangChain has a large number of vectorstore integrations, allowing users to easily switch between different vectorstore implementations.
+
+Please see the [full list of LangChain vectorstore integrations](/docs/integrations/vectorstores/).
+
+## Interface
+
+LangChain provides a standard interface for working with vector stores, allowing users to easily switch between different vectorstore implementations.
+
+The interface consists of basic methods for writing, deleting and searching for documents in the vector store.
+
+The key methods are:
+
+- `add_documents`: Add a list of documents to the vector store.
+- `delete`: Delete a list of documents from the vector store.
+- `similarity_search`: Search for similar documents to a given query.
+
+
+## Initialization
+
+Most vector stores in LangChain accept an embedding model as an argument when initializing the vector store.
+
+We will use LangChain's [InMemoryVectorStore](https://python.langchain.com/api_reference/core/vectorstores/langchain_core.vectorstores.in_memory.InMemoryVectorStore.html) implementation to illustrate the API.
+
+```python
+from langchain_core.vectorstores import InMemoryVectorStore
+# Initialize with an embedding model
+vector_store = InMemoryVectorStore(embedding=SomeEmbeddingModel())
+```
+
+## Adding documents
+
+To add documents, use the `add_documents` method.
+
+This API works with a list of [Document](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html) objects.
+`Document` objects all have `page_content` and `metadata` attributes, making them a universal way to store unstructured text and associated metadata.
+
+```python
+from langchain_core.documents import Document
+
+document_1 = Document(
+    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
+    metadata={"source": "tweet"},
+)
+
+document_2 = Document(
+    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
+    metadata={"source": "news"},
+)
+
+documents = [document_1, document_2]
+
+vector_store.add_documents(documents=documents)
+```
+
+You should usually provide IDs for the documents you add to the vector store, so
+that instead of adding the same document multiple times, you can update the existing document.
+
+```python
+vector_store.add_documents(documents=documents, ids=["doc1", "doc2"])
+```
+
+## Delete
+
+To delete documents, use the `delete` method, which takes a list of document IDs to delete.
+
+```python
+vector_store.delete(ids=["doc1"])
+```
+
+## Search
+
+Vectorstores embed and store the documents that are added.
+If we pass in a query, the vectorstore will embed the query, perform a similarity search over the embedded documents, and return the most similar ones.
+This captures two important concepts: first, there needs to be a way to measure the similarity between the query and *any* [embedded](/docs/concepts/embedding_models/) document.
+Second, there needs to be an algorithm to efficiently perform this similarity search across *all* embedded documents.
+
+### Similarity metrics
+
+A critical advantage of embedding vectors is that they can be compared using many simple mathematical operations:
+
+- **Cosine Similarity**: Measures the cosine of the angle between two vectors.
+- **Euclidean Distance**: Measures the straight-line distance between two points.
+- **Dot Product**: Measures the projection of one vector onto another.
+
+The similarity metric can sometimes be selected when initializing the vectorstore. Please refer
+to the documentation of the specific vectorstore you are using to see what similarity metrics are supported.
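+
+For reference, the three metrics listed above can be written in a few lines of plain Python. This is a minimal sketch for small vectors; vector stores typically rely on optimized numerical libraries instead.
+
+```python
+import math
+
+
+def dot_product(a: list[float], b: list[float]) -> float:
+    return sum(x * y for x, y in zip(a, b))
+
+
+def euclidean_distance(a: list[float], b: list[float]) -> float:
+    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
+
+
+def cosine_similarity(a: list[float], b: list[float]) -> float:
+    # Dot product divided by the product of the two vector lengths.
+    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))
+
+
+a, b = [3.0, 4.0], [4.0, 3.0]
+print(dot_product(a, b))         # 24.0
+print(euclidean_distance(a, b))  # about 1.414
+print(cosine_similarity(a, b))   # 0.96
+```
+
+Note that cosine similarity and dot product grow with similarity, while Euclidean distance shrinks, so the sort order differs depending on the metric.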
+
+:::info[Further reading]
+
+* See [this documentation](https://developers.google.com/machine-learning/clustering/dnn-clustering/supervised-similarity) from Google on similarity metrics to consider with embeddings.
+* See Pinecone's [blog post](https://www.pinecone.io/learn/vector-similarity/) on similarity metrics.
+* See OpenAI's [FAQ](https://platform.openai.com/docs/guides/embeddings/faq) on what similarity metric to use with OpenAI embeddings.
+
+:::
+
+### Similarity search
+
+Given a similarity metric to measure the distance between the embedded query and any embedded document, we need an algorithm to efficiently search over *all* the embedded documents to find the most similar ones.
+There are various ways to do this. As an example, many vectorstores implement [HNSW (Hierarchical Navigable Small World)](https://www.pinecone.io/learn/series/faiss/hnsw/), a graph-based index structure that allows for efficient similarity search.
+Regardless of the search algorithm used under the hood, the LangChain vectorstore interface has a `similarity_search` method for all integrations. 
+This will take the search query, create an embedding, find similar documents, and return them as a list of [Documents](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html).
+
+```python
+query = "my query"
+docs = vectorstore.similarity_search(query)
+```
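+
+Conceptually, the call above embeds the query, scores it against every stored document, and returns the top `k`. The sketch below illustrates that flow with an exhaustive scan rather than an index such as HNSW. `ToyVectorStore` and `toy_embed` are hypothetical names invented for this illustration (a bag-of-words count stands in for a real embedding model); they are not LangChain classes.
+
+```python
+import math
+from collections import Counter
+
+
+def toy_embed(text):
+    # Hypothetical stand-in for an embedding model: a bag-of-words count.
+    return Counter(text.lower().split())
+
+
+def cosine(a, b):
+    # Counters return 0 for missing words, so this is a sparse dot product.
+    dot = sum(a[w] * b[w] for w in a)
+    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
+        sum(v * v for v in b.values())
+    )
+    return dot / norm if norm else 0.0
+
+
+class ToyVectorStore:
+    def __init__(self):
+        self._entries = []  # (text, embedding) pairs
+
+    def add_texts(self, texts):
+        for text in texts:
+            self._entries.append((text, toy_embed(text)))
+
+    def similarity_search(self, query, k=4):
+        # Embed the query, score every stored document, return the top k.
+        q = toy_embed(query)
+        ranked = sorted(self._entries, key=lambda e: cosine(q, e[1]), reverse=True)
+        return [text for text, _ in ranked[:k]]
+
+
+store = ToyVectorStore()
+store.add_texts(["cats purr softly", "dogs bark loudly", "cats and dogs play"])
+results = store.similarity_search("cats purr", k=2)
+```
+
+An exhaustive scan like this costs time proportional to the number of stored documents, which is why production vectorstores rely on approximate indexes instead.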
+
+Many vectorstores support search parameters to be passed with the `similarity_search` method. See the documentation for the specific vectorstore you are using to see what parameters are supported.
+Many vectorstores support [the `k`](/docs/integrations/vectorstores/pinecone/#query-directly) parameter, which controls the number of Documents to return, and `filter`, which allows for filtering documents by metadata.
+As an example, the [Pinecone](https://python.langchain.com/api_reference/pinecone/vectorstores/langchain_pinecone.vectorstores.PineconeVectorStore.html#langchain_pinecone.vectorstores.PineconeVectorStore.similarity_search) `similarity_search` method accepts several parameters that illustrate these general concepts:
+
+- `query (str) – Text to look up documents similar to.`
+- `k (int) – Number of Documents to return. Defaults to 4.`
+- `filter (dict | None) – Dictionary of argument(s) to filter on metadata`
+
+:::info[Further reading]
+
+* See the [how-to guide](/docs/how_to/vectorstores/) for more details on how to use the `similarity_search` method.
+* See the [integrations page](/docs/integrations/vectorstores/) for more details on arguments that can be passed in to the `similarity_search` method for specific vectorstores.
+
+:::
+
+### Metadata filtering
+
+While vectorstores implement a search algorithm to efficiently search over *all* the embedded documents to find the most similar ones, many also support filtering on metadata.
+This allows structured filters to reduce the size of the similarity search space. These two concepts work well together:
+
+1. **Semantic search**: Query the unstructured data directly, often via embedding or keyword similarity.
+2. **Metadata search**: Apply a structured query to the metadata, filtering for specific documents.
+
+Vectorstore support for metadata filtering is typically dependent on the underlying vector store implementation.
+
+Here is an example with [Pinecone](/docs/integrations/vectorstores/pinecone/#query-directly) that filters for all documents whose metadata key `source` has the value `tweet`.
+
+```python
+vectorstore.similarity_search(
+    "LangChain provides abstractions to make working with LLMs easy",
+    k=2,
+    filter={"source": "tweet"},
+)
+```  
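+
+To illustrate how the two concepts combine, the sketch below first narrows the candidates with an exact-match metadata filter and then ranks what remains. Everything here is an assumption made for illustration: `filtered_search`, `matches`, and `word_overlap` are hypothetical helpers, word overlap stands in for embedding similarity, and real vectorstores often support richer filter syntaxes than exact matching.
+
+```python
+def matches(metadata, flt):
+    # Exact-match filter: every key in the filter must be present with the same value.
+    return all(metadata.get(key) == value for key, value in flt.items())
+
+
+def word_overlap(query, text):
+    # Hypothetical stand-in for embedding similarity: count of shared words.
+    return len(set(query.lower().split()) & set(text.lower().split()))
+
+
+def filtered_search(docs, query, k=4, filter=None):
+    # Step 1: the structured filter shrinks the search space.
+    candidates = [d for d in docs if filter is None or matches(d["metadata"], filter)]
+    # Step 2: similarity ranking runs only over the remaining candidates.
+    ranked = sorted(candidates, key=lambda d: word_overlap(query, d["text"]), reverse=True)
+    return ranked[:k]
+
+
+docs = [
+    {"text": "LangChain makes working with LLMs easy", "metadata": {"source": "tweet"}},
+    {"text": "LangChain makes working with LLMs easy and fun", "metadata": {"source": "news"}},
+    {"text": "The weather is nice today", "metadata": {"source": "tweet"}},
+]
+hits = filtered_search(docs, "working with LLMs", k=2, filter={"source": "tweet"})
+```
+
+The `news` document is excluded before ranking even though its text matches the query well, so `hits` contains only documents whose `source` is `tweet`.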
+
+:::info[Further reading]
+
+* See Pinecone's [documentation](https://docs.pinecone.io/guides/data/filter-with-metadata) on filtering with metadata.
+* See the [list of LangChain vectorstore integrations](/docs/integrations/retrievers/self_query/) that support metadata filtering.
+
+:::
+
+## Advanced search and retrieval techniques
+
+While algorithms like HNSW provide the foundation for efficient similarity search in many cases, additional techniques can be employed to improve search quality and diversity.
+For example, [maximal marginal relevance](https://python.langchain.com/v0.1/docs/modules/model_io/prompts/example_selectors/mmr/) is a re-ranking algorithm that is applied after the initial similarity search to diversify the returned results.
+As a second example, some [vector stores](/docs/integrations/retrievers/pinecone_hybrid_search/) offer built-in [hybrid-search](https://docs.pinecone.io/guides/data/understanding-hybrid-search) to combine keyword and semantic similarity search, which marries the benefits of both approaches. 
+At the moment, there is no unified way to perform hybrid search using LangChain vectorstores, but it is generally exposed as a keyword argument that is passed in with `similarity_search`.
+See this [how-to guide on hybrid search](/docs/how_to/hybrid/) for more details.
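+
+As a rough illustration of the re-ranking idea behind maximal marginal relevance, the sketch below greedily picks the document that best balances relevance to the query against redundancy with documents already selected. This is a minimal sketch of the general technique, not the implementation used by any particular vectorstore; the function name, the `lambda_mult` weighting parameter, and the toy vectors are assumptions chosen for illustration.
+
+```python
+import math
+
+
+def cosine(a, b):
+    dot = sum(x * y for x, y in zip(a, b))
+    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
+
+
+def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
+    # Returns the indices of the selected documents, in selection order.
+    selected = []
+    remaining = list(range(len(doc_vecs)))
+    while remaining and len(selected) < k:
+
+        def score(i):
+            relevance = cosine(query_vec, doc_vecs[i])
+            # Redundancy: highest similarity to anything already selected.
+            redundancy = max(
+                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
+            )
+            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
+
+        best = max(remaining, key=score)
+        selected.append(best)
+        remaining.remove(best)
+    return selected
+
+
+query_vec = [1.0, 0.0]
+# Documents 0 and 1 are near-duplicates; document 2 is less relevant but different.
+doc_vecs = [[1.0, 0.1], [1.0, 0.11], [0.6, -0.8]]
+picked = mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5)
+```
+
+A plain similarity ranking would return the two near-duplicates (indices 0 and 1). Here the second pick is index 2, because the redundancy penalty outweighs the small relevance advantage of document 1. Setting `lambda_mult=1.0` disables the penalty and recovers the plain ranking.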
+
+| Name                                                                                                              | When to use                                           | Description                                                                                                                                  |
+|-------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------|
+| [Hybrid search](/docs/integrations/retrievers/pinecone_hybrid_search/)                                            | When combining keyword-based and semantic similarity. | Hybrid search combines keyword and semantic similarity, marrying the benefits of both approaches. [Paper](https://arxiv.org/abs/2210.11934). |
+| [Maximal Marginal Relevance (MMR)](/docs/integrations/vectorstores/pinecone/#maximal-marginal-relevance-searches) | When needing to diversify search results.             | MMR attempts to diversify the results of a search to avoid returning similar and redundant documents.                                        |
+
+ 
--- a/docs/docs/concepts/why_langchain.mdx
+++ b/docs/docs/concepts/why_langchain.mdx
@@ -0,0 +1,109 @@
+# Why langchain?
+
+The goal of `langchain` the Python package and LangChain the company is to make it as easy as possible for developers to build applications that reason.
+While LangChain originally started as a single open source package, it has evolved into a company and a whole ecosystem.
+This page will talk about the LangChain ecosystem as a whole.
+Most of the components within the LangChain ecosystem can be used by themselves - so if you feel particularly drawn to certain components but not others, that is totally fine! Pick and choose whichever components you like best.
+
+## Features
+
+There are several primary needs that LangChain aims to address:
+
+1. **Standardized component interfaces:** The growing number of [models](/docs/integrations/chat/) and [related components](/docs/integrations/vectorstores/) for AI applications has resulted in a wide variety of different APIs that developers need to learn and use.
+This diversity can make it challenging for developers to switch between providers or combine components when building applications.
+LangChain exposes a standard interface for key components, making it easy to switch between providers.
+
+2. **Orchestration:** As applications become more complex, combining multiple components and models, there's [a growing need to efficiently connect these elements into control flows](https://lilianweng.github.io/posts/2023-06-23-agent/) that can [accomplish diverse tasks](https://www.sequoiacap.com/article/generative-ais-act-o1/).
+[Orchestration](https://en.wikipedia.org/wiki/Orchestration_(computing)) is crucial for building such applications.
+
+3. **Observability and evaluation:** As applications become more complex, it becomes increasingly difficult to understand what is happening within them.
+Furthermore, the pace of development can become rate-limited by the [paradox of choice](https://en.wikipedia.org/wiki/Paradox_of_choice):
+for example, developers often wonder how to engineer their prompt or which LLM best balances accuracy, latency, and cost. 
+[Observability](https://en.wikipedia.org/wiki/Observability) and evaluations can help developers monitor their applications and rapidly answer these types of questions with confidence.
+
+
+## Standardized component interfaces
+
+LangChain provides common interfaces for components that are central to many AI applications.
+As an example, all [chat models](/docs/concepts/chat_models/) implement the [BaseChatModel](https://python.langchain.com/api_reference/core/language_models/langchain_core.language_models.chat_models.BaseChatModel.html) interface.
+This provides a standard way to interact with chat models, supporting important but often provider-specific features like [tool calling](/docs/concepts/tool_calling/) and [structured outputs](/docs/concepts/structured_outputs/).
+
+
+### Example: chat models 
+
+Many [model providers](/docs/concepts/chat_models/) support [tool calling](/docs/concepts/tool_calling/), a critical feature for many applications (e.g., [agents](https://langchain-ai.github.io/langgraph/concepts/agentic_concepts/)) that allows a developer to request model responses that match a particular schema.
+The APIs for each provider differ. 
+LangChain's [chat model](/docs/concepts/chat_models/) interface provides a common way to bind [tools](/docs/concepts/tools) to a model in order to support [tool calling](/docs/concepts/tool_calling/):
+
+```python
+# Tool creation
+tools = [my_tool]
+# Tool binding
+model_with_tools = model.bind_tools(tools)
+```
+
+Similarly, getting models to produce [structured outputs](/docs/concepts/structured_outputs/) is an extremely common use case. 
+Providers support different approaches for this, including [JSON mode or tool calling](https://platform.openai.com/docs/guides/structured-outputs), with different APIs.
+LangChain's [chat model](/docs/concepts/chat_models/) interface provides a common way to produce structured outputs using the `with_structured_output()` method:
+
+```python
+# Define schema
+schema = ...
+# Bind schema to model
+model_with_structure = model.with_structured_output(schema)
+```
+
+### Example: retrievers
+
+In the context of [RAG](/docs/concepts/rag/) and LLM application components, LangChain's [retriever](/docs/concepts/retrievers/) interface provides a standard way to connect to many different types of data services (e.g., [vector stores](/docs/concepts/vectorstores) or databases).
+The underlying implementation of the retriever depends on the type of data store or database you are connecting to, but all retrievers implement the [runnable interface](/docs/concepts/runnables/), meaning they can be invoked in a common manner.
+
+```python
+documents = my_retriever.invoke("What is the meaning of life?")
+```
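+
+To show why a shared `invoke` method is useful, here is a toy sketch in which two retrievers with different backing stores are called through the same code path. The classes below are hypothetical and written only for this illustration; they are not LangChain's retriever classes and do not implement the full runnable interface.
+
+```python
+from typing import Protocol
+
+
+class SupportsInvoke(Protocol):
+    # Hypothetical minimal interface: anything with an `invoke` method.
+    def invoke(self, query: str) -> list[str]: ...
+
+
+class KeywordRetriever:
+    # Backed by a list of texts; matches on shared words.
+    def __init__(self, texts):
+        self.texts = texts
+
+    def invoke(self, query):
+        words = set(query.lower().split())
+        return [t for t in self.texts if words & set(t.lower().split())]
+
+
+class LookupRetriever:
+    # Backed by a dictionary; matches on an exact key.
+    def __init__(self, table):
+        self.table = table
+
+    def invoke(self, query):
+        return self.table.get(query.lower(), [])
+
+
+def answer_sources(retriever: SupportsInvoke, query: str) -> list[str]:
+    # Calling code depends only on `invoke`, not on the backing store.
+    return retriever.invoke(query)
+
+
+keyword = KeywordRetriever(["life is short", "art is long"])
+lookup = LookupRetriever({"life": ["life is short"]})
+```
+
+Both `answer_sources(keyword, "life")` and `answer_sources(lookup, "life")` go through the same call, so swapping one data source for another does not require changing the calling code.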
+
+## Orchestration 
+
+While standardization for individual components is useful, we've increasingly seen that developers want to *combine* components into more complex applications. 
+This motivates the need for [orchestration](https://en.wikipedia.org/wiki/Orchestration_(computing)).
+There are several common characteristics of LLM applications that this orchestration layer should support:
+
+* **Complex control flow:** The application requires complex patterns such as cycles (e.g., a loop that reiterates until a condition is met).
+* **[Persistence](https://langchain-ai.github.io/langgraph/concepts/persistence/):** The application needs to maintain [short-term and / or long-term memory](https://langchain-ai.github.io/langgraph/concepts/memory/).
+* **[Human-in-the-loop](https://langchain-ai.github.io/langgraph/concepts/human_in_the_loop/):** The application needs human interaction, e.g., pausing, reviewing, editing, approving certain steps.
+
+The recommended way to do orchestration for these complex applications is [LangGraph](https://langchain-ai.github.io/langgraph/concepts/high_level/).
+LangGraph is a library that gives developers a high degree of control by expressing the flow of the application as a set of nodes and edges.
+LangGraph comes with built-in support for [persistence](https://langchain-ai.github.io/langgraph/concepts/persistence/), [human-in-the-loop](https://langchain-ai.github.io/langgraph/concepts/human_in_the_loop/), [memory](https://langchain-ai.github.io/langgraph/concepts/memory/), and other features.
+It's particularly well suited for building [agents](https://langchain-ai.github.io/langgraph/concepts/agentic_concepts/) or [multi-agent](https://langchain-ai.github.io/langgraph/concepts/multi_agent/) applications.
+Importantly, individual LangChain components can be used within LangGraph nodes, but you can also use LangGraph **without** using LangChain components.
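+
+The idea of expressing control flow as nodes and edges, including a cycle that repeats until a condition is met, can be sketched in plain Python. This is a conceptual toy and deliberately does not use the LangGraph API; the node names, the `run` helper, and the state keys are all assumptions made for illustration.
+
+```python
+def draft(state):
+    # Node: produce a new attempt and count it.
+    return {**state, "attempts": state["attempts"] + 1}
+
+
+def review(state):
+    # Node: approve once enough attempts have been made (toy condition).
+    return {**state, "approved": state["attempts"] >= 3}
+
+
+def route_after_review(state):
+    # Conditional edge: loop back to `draft` until the state is approved.
+    return "END" if state["approved"] else "draft"
+
+
+nodes = {"draft": draft, "review": review}
+edges = {"draft": lambda state: "review", "review": route_after_review}
+
+
+def run(start, state, max_steps=20):
+    # Walk the graph: run the current node, then follow its outgoing edge.
+    current = start
+    for _ in range(max_steps):
+        if current == "END":
+            return state
+        state = nodes[current](state)
+        current = edges[current](state)
+    raise RuntimeError("graph did not finish within max_steps")
+
+
+final_state = run("draft", {"attempts": 0, "approved": False})
+```
+
+The `draft` and `review` nodes alternate until the third attempt is approved. Features such as persistence or human-in-the-loop review would hook in between steps of a loop like this, which is the kind of support an orchestration library provides out of the box.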
+
+:::info[Further reading]
+
+Have a look at our free course, [Introduction to LangGraph](https://academy.langchain.com/courses/intro-to-langgraph), to learn more about how to use LangGraph to build complex applications.
+
+:::
+
+## Observability and evaluation
+
+The pace of AI application development is often rate-limited by the availability of high-quality evaluations, because developers face a paradox of choice.
+Developers often wonder how to engineer their prompt or which LLM best balances accuracy, latency, and cost.
+High-quality tracing and evaluations can help you rapidly answer these types of questions with confidence.
+[LangSmith](https://docs.smith.langchain.com/) is our platform that supports observability and evaluation for AI applications.
+See our conceptual guides on [evaluations](https://docs.smith.langchain.com/concepts/evaluation) and [tracing](https://docs.smith.langchain.com/concepts/tracing) for more details.
+
+:::info[Further reading]
+
+See our video playlist on [LangSmith tracing and evaluations](https://youtube.com/playlist?list=PLfaIDFEXuae0um8Fj0V4dHG37fGFU8Q5S&feature=shared) for more details.
+
+:::
+
+## Conclusion
+
+LangChain offers standard interfaces for components that are central to many AI applications, which provides a few specific advantages:
+- **Ease of swapping providers:** It allows you to swap out different component providers without having to change the underlying code.
+- **Advanced features:** It provides common methods for more advanced features, such as [streaming](/docs/concepts/runnables/#streaming) and [tool calling](/docs/concepts/tool_calling/).
+
+[LangGraph](https://langchain-ai.github.io/langgraph/concepts/high_level/) makes it possible to orchestrate complex applications (e.g., [agents](/docs/concepts/agents/)) and provides features such as [persistence](https://langchain-ai.github.io/langgraph/concepts/persistence/), [human-in-the-loop](https://langchain-ai.github.io/langgraph/concepts/human_in_the_loop/), and [memory](https://langchain-ai.github.io/langgraph/concepts/memory/).
+
+[LangSmith](https://docs.smith.langchain.com/) makes it possible to iterate with confidence on your applications by providing LLM-specific observability and a framework for testing and evaluating your application.
--- a/docs/docs/integrations/chat/groq.ipynb
+++ b/docs/docs/integrations/chat/groq.ipynb
@@ -17,7 +17,7 @@
   "source": [
    "# ChatGroq\n",
    "\n",
-    "This will help you getting started with Groq [chat models](../../concepts.mdx#chat-models). For detailed documentation of all ChatGroq features and configurations head to the [API reference](https://python.langchain.com/api_reference/groq/chat_models/langchain_groq.chat_models.ChatGroq.html). For a list of all Groq models, visit this [link](https://console.groq.com/docs/models).\n",
+    "This will help you getting started with Groq [chat models](../../concepts/chat_models.mdx). For detailed documentation of all ChatGroq features and configurations head to the [API reference](https://python.langchain.com/api_reference/groq/chat_models/langchain_groq.chat_models.ChatGroq.html). For a list of all Groq models, visit this [link](https://console.groq.com/docs/models).\n",
    "\n",
    "## Overview\n",
    "### Integration details\n",
--- a/docs/docs/integrations/chat/together.ipynb
+++ b/docs/docs/integrations/chat/together.ipynb
@@ -18,7 +18,7 @@
    "# ChatTogether\n",
    "\n",
    "\n",
-    "This page will help you get started with Together AI [chat models](../../concepts.mdx#chat-models). For detailed documentation of all ChatTogether features and configurations head to the [API reference](https://python.langchain.com/api_reference/together/chat_models/langchain_together.chat_models.ChatTogether.html).\n",
+    "This page will help you get started with Together AI [chat models](../../concepts/chat_models.mdx). For detailed documentation of all ChatTogether features and configurations head to the [API reference](https://python.langchain.com/api_reference/together/chat_models/langchain_together.chat_models.ChatTogether.html).\n",
    "\n",
    "[Together AI](https://www.together.ai/) offers an API to query [50+ leading open-source models](https://docs.together.ai/docs/chat-models)\n",
    "\n",
--- a/docs/sidebars.js
+++ b/docs/sidebars.js
@@ -47,7 +47,17 @@ module.exports = {
        className: 'hidden',
      }],
    },
-    "concepts",
+    {
+      type: "category",
+      link: {type: 'doc', id: 'concepts/index'},
+      label: "Conceptual Guide",
+      collapsible: false,
+      items: [{
+        type: 'autogenerated',
+        dirName: 'concepts',
+        className: 'hidden',
+      }],
+    },
    {
      type: "category",
      label: "Ecosystem",
--- a/docs/static/img/agent_types.png
+++ b/docs/static/img/agent_types.png
--- a/docs/static/img/conversation_patterns.png
+++ b/docs/static/img/conversation_patterns.png
--- a/docs/static/img/embeddings_concept.png
+++ b/docs/static/img/embeddings_concept.png
--- a/docs/static/img/rag_concepts.png
+++ b/docs/static/img/rag_concepts.png
--- a/docs/static/img/retrieval_concept.png
+++ b/docs/static/img/retrieval_concept.png
--- a/docs/static/img/retrieval_high_level.png
+++ b/docs/static/img/retrieval_high_level.png
--- a/docs/static/img/retriever_concept.png
+++ b/docs/static/img/retriever_concept.png
--- a/docs/static/img/retriever_full_docs.png
+++ b/docs/static/img/retriever_full_docs.png
--- a/docs/static/img/structured_output.png
+++ b/docs/static/img/structured_output.png
--- a/docs/static/img/text_splitters.png
+++ b/docs/static/img/text_splitters.png
--- a/docs/static/img/tool_call_example.png
+++ b/docs/static/img/tool_call_example.png
--- a/docs/static/img/tool_calling_agent.png
+++ b/docs/static/img/tool_calling_agent.png
--- a/docs/static/img/tool_calling_components.png
+++ b/docs/static/img/tool_calling_components.png
--- a/docs/static/img/tool_calling_concept.png
+++ b/docs/static/img/tool_calling_concept.png
--- a/docs/static/img/vectorstores.png
+++ b/docs/static/img/vectorstores.png
--- a/docs/static/img/with_structured_output.png
+++ b/docs/static/img/with_structured_output.png