mirror of
https://github.com/hwchase17/langchain.git
synced 2025-06-19 05:13:46 +00:00
update

This commit is contained in:
parent 18c569f3e3
commit 4fe2ea01a1
17
docs/docs/concepts/callbacks.mdx
Normal file
@ -0,0 +1,17 @@
# Callbacks

The lowest level way to stream outputs from LLMs in LangChain is via the [callbacks](/docs/concepts/#callbacks) system. You can pass a callback handler that handles the [`on_llm_new_token`](https://python.langchain.com/api_reference/langchain/callbacks/langchain.callbacks.streaming_aiter.AsyncIteratorCallbackHandler.html#langchain.callbacks.streaming_aiter.AsyncIteratorCallbackHandler.on_llm_new_token) event into LangChain components. When that component is invoked, any [LLM](/docs/concepts/#llms) or [chat model](/docs/concepts/#chat-models) contained in the component calls the callback with the generated token. Within the callback, you could pipe the tokens into some other destination, e.g. an HTTP response. You can also handle the [`on_llm_end`](https://python.langchain.com/api_reference/langchain/callbacks/langchain.callbacks.streaming_aiter.AsyncIteratorCallbackHandler.html#langchain.callbacks.streaming_aiter.AsyncIteratorCallbackHandler.on_llm_end) event to perform any necessary cleanup.
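For illustration, a minimal sketch of such a handler might look like the following (the model name matches the examples elsewhere in these docs; `streaming=True` is assumed to be needed so the provider emits token-level events):

```python
from langchain_core.callbacks import BaseCallbackHandler
from langchain_anthropic import ChatAnthropic


class TokenPrinter(BaseCallbackHandler):
    """Pipes each streamed token somewhere; here, just stdout."""

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Called once per generated token while the model is streaming.
        print(token, end="|", flush=True)

    def on_llm_end(self, response, **kwargs) -> None:
        # Called when generation finishes; do any cleanup here.
        print("\n[done]")


model = ChatAnthropic(model="claude-3-sonnet-20240229", streaming=True)
model.invoke("what color is the sky?", config={"callbacks": [TokenPrinter()]})
```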
See [this how-to section](/docs/how_to/#callbacks) for more specifics on using callbacks.

Callbacks were the first technique for streaming introduced in LangChain. While powerful and generalizable, they can be unwieldy for developers. For example:

- You need to explicitly initialize and manage some aggregator or other stream to collect results.
- The execution order isn't explicitly guaranteed, and you could theoretically have a callback run after the `.invoke()` method finishes.
- Providers often require you to pass an additional parameter to stream outputs instead of returning them all at once.
- You often end up ignoring the result of the actual model call in favor of callback results.
@ -1 +1,36 @@
# Chat models
<span data-heading-keywords="chat model,chat models"></span>

Language models that use a sequence of messages as inputs and return chat messages as outputs (as opposed to using plain text).
These are traditionally newer models (older models are generally `LLMs`, see below).
Chat models support the assignment of distinct roles to conversation messages, helping to distinguish messages from the AI, users, and instructions such as system messages.

Although the underlying models are messages in, messages out, the LangChain wrappers also allow these models to take a string as input. This means you can easily use chat models in place of LLMs.

When a string is passed in as input, it is converted to a `HumanMessage` and then passed to the underlying model.
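For example (a small sketch; the model class and name are illustrative), the following two calls are equivalent:

```python
from langchain_core.messages import HumanMessage
from langchain_anthropic import ChatAnthropic

model = ChatAnthropic(model="claude-3-sonnet-20240229")

# A plain string input...
model.invoke("Tell me a joke")

# ...is converted to a single HumanMessage under the hood.
model.invoke([HumanMessage(content="Tell me a joke")])
```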
LangChain does not host any chat models; rather, we rely on third-party integrations.

We have some standardized parameters when constructing ChatModels:
- `model`: the name of the model
- `temperature`: the sampling temperature
- `timeout`: request timeout
- `max_tokens`: max tokens to generate
- `stop`: default stop sequences
- `max_retries`: max number of times to retry requests
- `api_key`: API key for the model provider
- `base_url`: endpoint to send requests to

Some important things to note:
- Standard params only apply to model providers that expose parameters with the intended functionality. For example, some providers do not expose a configuration for maximum output tokens, so `max_tokens` can't be supported on these.
- Standard params are currently only enforced on integrations that have their own integration packages (e.g. `langchain-openai`, `langchain-anthropic`, etc.); they're not enforced on models in `langchain-community`.

ChatModels also accept other parameters that are specific to that integration. To find all the parameters supported by a ChatModel, head to the API reference for that model.
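As a sketch of how these standard parameters are typically passed (assuming the `langchain-openai` package; the model name and values are illustrative):

```python
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="gpt-4o",      # name of the model
    temperature=0.2,     # sampling temperature
    timeout=30,          # request timeout in seconds
    max_tokens=512,      # max tokens to generate
    stop=["\n\n"],       # default stop sequences
    max_retries=2,       # max number of times to retry requests
    # api_key and base_url can also be passed explicitly if needed
)
```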
:::important
Some chat models have been fine-tuned for **tool calling** and provide a dedicated API for it.
Generally, such models are better at tool calling than non-fine-tuned models, and are recommended for use cases that require tool calling.
Please see the [tool calling section](/docs/concepts/#functiontool-calling) for more information.
:::
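As a rough sketch of what such a dedicated API looks like in LangChain (the tool schema and model name here are illustrative assumptions):

```python
from pydantic import BaseModel, Field
from langchain_anthropic import ChatAnthropic


class GetWeather(BaseModel):
    """Get the current weather for a given city."""

    city: str = Field(description="Name of the city")


model = ChatAnthropic(model="claude-3-sonnet-20240229")
model_with_tools = model.bind_tools([GetWeather])

response = model_with_tools.invoke("What's the weather in Paris?")
print(response.tool_calls)  # structured tool invocations requested by the model
```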
For specifics on how to use chat models, see the [relevant how-to guides here](/docs/how_to/#chat-models).
@ -438,64 +438,16 @@ For specifics on how to use callbacks, see the [relevant how-to guides here](/do
Conceptual Guide: [Streaming](/docs/concepts/streaming)

#### `.stream()` and `.astream()`

Conceptual Guide: [Streaming](/docs/concepts/streaming#stream)
TODO(concepts): Add URL fragment
#### `.astream_events()`
<span data-heading-keywords="astream_events,stream_events,stream events"></span>

While the `.stream()` method is intuitive, it can only return the final generated value of your chain. This is fine for single LLM calls, but as you build more complex chains of several LLM calls together, you may want to use the intermediate values of the chain alongside the final output - for example, returning sources alongside the final generation when building a chat-over-documents app.

There are ways to do this [using callbacks](/docs/concepts/#callbacks-1), or by constructing your chain in such a way that it passes intermediate values to the end with something like chained [`.assign()`](/docs/how_to/passthrough/) calls, but LangChain also includes an `.astream_events()` method that combines the flexibility of callbacks with the ergonomics of `.stream()`. When called, it returns an iterator which yields [various types of events](/docs/how_to/streaming/#event-reference) that you can filter and process according to the needs of your project.

Here's one small example that prints just events containing streamed chat model output:
```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_anthropic import ChatAnthropic

model = ChatAnthropic(model="claude-3-sonnet-20240229")

prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
parser = StrOutputParser()
chain = prompt | model | parser

async for event in chain.astream_events({"topic": "parrot"}, version="v2"):
    kind = event["event"]
    if kind == "on_chat_model_stream":
        print(event, end="|", flush=True)
```
You can roughly think of it as an iterator over callback events (though the format differs) - and you can use it on almost all LangChain components!

See [this guide](/docs/how_to/streaming/#using-stream-events) for more detailed information on how to use `.astream_events()`, including a table listing available events.
TODO(concepts): Add URL fragment
#### Callbacks

The lowest level way to stream outputs from LLMs in LangChain is via the [callbacks](/docs/concepts/#callbacks) system. You can pass a callback handler that handles the [`on_llm_new_token`](https://python.langchain.com/api_reference/langchain/callbacks/langchain.callbacks.streaming_aiter.AsyncIteratorCallbackHandler.html#langchain.callbacks.streaming_aiter.AsyncIteratorCallbackHandler.on_llm_new_token) event into LangChain components. When that component is invoked, any [LLM](/docs/concepts/#llms) or [chat model](/docs/concepts/#chat-models) contained in the component calls the callback with the generated token. Within the callback, you could pipe the tokens into some other destination, e.g. an HTTP response. You can also handle the [`on_llm_end`](https://python.langchain.com/api_reference/langchain/callbacks/langchain.callbacks.streaming_aiter.AsyncIteratorCallbackHandler.html#langchain.callbacks.streaming_aiter.AsyncIteratorCallbackHandler.on_llm_end) event to perform any necessary cleanup.

You can see [this how-to section](/docs/how_to/#callbacks) for more specifics on using callbacks.

Callbacks were the first technique for streaming introduced in LangChain. While powerful and generalizable, they can be unwieldy for developers. For example:

- You need to explicitly initialize and manage some aggregator or other stream to collect results.
- The execution order isn't explicitly guaranteed, and you could theoretically have a callback run after the `.invoke()` method finishes.
- Providers often require you to pass an additional parameter to stream outputs instead of returning them all at once.
- You often end up ignoring the result of the actual model call in favor of callback results.

* Conceptual Guide: [Callbacks](/docs/concepts/callbacks)
* How-to Guides: [How to use Callbacks](/docs/how_to/#callbacks)

#### Tokens
@ -1 +1,35 @@
# LCEL

## LangChain Expression Language (LCEL)
<span data-heading-keywords="lcel"></span>

`LangChain Expression Language`, or `LCEL`, is a declarative way to chain LangChain components.
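For example, a simple chain can be declared by piping components together (a minimal sketch; the model name matches the examples elsewhere in these docs):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_anthropic import ChatAnthropic

prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
model = ChatAnthropic(model="claude-3-sonnet-20240229")
parser = StrOutputParser()

# The | operator composes Runnables into a new Runnable.
chain = prompt | model | parser

print(chain.invoke({"topic": "parrots"}))
```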
LCEL was designed from day 1 to **support putting prototypes in production, with no code changes**, from the simplest “prompt + LLM” chain to the most complex chains (we’ve seen folks successfully run LCEL chains with 100s of steps in production). To highlight a few of the reasons you might want to use LCEL:

- **First-class streaming support:**
When you build your chains with LCEL you get the best possible time-to-first-token (time elapsed until the first chunk of output comes out). For some chains this means, e.g., that we stream tokens straight from an LLM to a streaming output parser, and you get back parsed, incremental chunks of output at the same rate as the LLM provider outputs the raw tokens.

- **Async support:**
Any chain built with LCEL can be called both with the synchronous API (e.g. in your Jupyter notebook while prototyping) and with the asynchronous API (e.g. in a [LangServe](/docs/langserve/) server). This enables using the same code for prototypes and in production, with great performance, and the ability to handle many concurrent requests in the same server.

- **Optimized parallel execution:**
Whenever your LCEL chains have steps that can be executed in parallel (e.g. if you fetch documents from multiple retrievers), we automatically do it, both in the sync and the async interfaces, for the smallest possible latency.

- **Retries and fallbacks:**
Configure retries and fallbacks for any part of your LCEL chain. This is a great way to make your chains more reliable at scale. We’re currently working on adding streaming support for retries/fallbacks, so you can get the added reliability without any latency cost.

- **Access intermediate results:**
For more complex chains it’s often very useful to access the results of intermediate steps even before the final output is produced. This can be used to let end-users know something is happening, or even just to debug your chain. You can stream intermediate results, and it’s available on every [LangServe](/docs/langserve) server.

- **Input and output schemas:**
Input and output schemas give every LCEL chain Pydantic and JSONSchema schemas inferred from the structure of your chain. This can be used for validation of inputs and outputs, and is an integral part of LangServe.

- [**Seamless LangSmith tracing**](https://docs.smith.langchain.com)
As your chains get more and more complex, it becomes increasingly important to understand what exactly is happening at every step.
With LCEL, **all** steps are automatically logged to [LangSmith](https://docs.smith.langchain.com/) for maximum observability and debuggability.
LCEL aims to provide consistency around behavior and customization over legacy subclassed chains such as `LLMChain` and `ConversationalRetrievalChain`. Many of these legacy chains hide important details like prompts, and as a wider variety of viable models emerge, customization has become more and more important.

If you are currently using one of these legacy chains, please see [this guide for guidance on how to migrate](/docs/versions/migrating_chains).

For guides on how to do specific tasks with LCEL, check out [the relevant how-to guides](/docs/how_to/#langchain-expression-language-lcel).
@ -1 +1,35 @@
# Runnable Interface
<span data-heading-keywords="invoke,runnable"></span>

To make it as easy as possible to create custom chains, we've implemented a ["Runnable"](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable) protocol. Many LangChain components implement the `Runnable` protocol, including chat models, LLMs, output parsers, retrievers, prompt templates, and more. There are also several useful primitives for working with runnables, which you can read about below.

This is a standard interface, which makes it easy to define custom chains as well as invoke them in a standard way.
The standard interface includes:

- `stream`: stream back chunks of the response
- `invoke`: call the chain on an input
- `batch`: call the chain on a list of inputs

These also have corresponding async methods that should be used with [asyncio](https://docs.python.org/3/library/asyncio.html) `await` syntax for concurrency:

- `astream`: stream back chunks of the response async
- `ainvoke`: call the chain on an input async
- `abatch`: call the chain on a list of inputs async
- `astream_log`: stream back intermediate steps as they happen, in addition to the final response
- `astream_events`: **beta** stream events as they happen in the chain (introduced in `langchain-core` 0.1.14)
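A small sketch of the standard interface in action, using `RunnableLambda` (a primitive that wraps a plain function) for illustration:

```python
from langchain_core.runnables import RunnableLambda

# Wrap an ordinary function so it exposes the standard Runnable interface.
runnable = RunnableLambda(lambda x: x * 2)

print(runnable.invoke(3))         # 6
print(runnable.batch([1, 2, 3]))  # [2, 4, 6]

# stream() yields chunks; a simple lambda yields just one.
for chunk in runnable.stream(3):
    print(chunk)                  # 6
```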
The **input type** and **output type** vary by component:

| Component | Input Type | Output Type |
|--------------|-------------------------------------------------------|-----------------------|
| Prompt | Dictionary | PromptValue |
| ChatModel | Single string, list of chat messages or a PromptValue | ChatMessage |
| LLM | Single string, list of chat messages or a PromptValue | String |
| OutputParser | The output of an LLM or ChatModel | Depends on the parser |
| Retriever | Single string | List of Documents |
| Tool | Single string or dictionary, depending on the tool | Depends on the tool |
All runnables expose input and output **schemas** to inspect the inputs and outputs:
- `input_schema`: an input Pydantic model auto-generated from the structure of the Runnable
- `output_schema`: an output Pydantic model auto-generated from the structure of the Runnable
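For example (a sketch; the exact schema-dumping method depends on your Pydantic version, e.g. `.schema()` vs `.model_json_schema()`):

```python
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")

# Pydantic models describing the expected input (a dict with "topic") and the output.
print(prompt.input_schema.schema())
print(prompt.output_schema.schema())
```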
74
docs/docs/concepts/streaming.mdx
Normal file
@ -0,0 +1,74 @@
# Streaming

<span data-heading-keywords="stream,streaming"></span>

Individual LLM calls often run for much longer than traditional resource requests.
This compounds when you build more complex chains or agents that require multiple reasoning steps.

Fortunately, LLMs generate output iteratively, which means it's possible to show sensible intermediate results before the final response is ready. Consuming output as soon as it becomes available has therefore become a vital part of the UX around building apps with LLMs to help alleviate latency issues, and LangChain aims to have first-class support for streaming.

Below, we'll discuss some concepts and considerations around streaming in LangChain.

## `.stream()` and `.astream()`

Most modules in LangChain include the `.stream()` method (and the equivalent `.astream()` method for [async](https://docs.python.org/3/library/asyncio.html) environments) as an ergonomic streaming interface.
`.stream()` returns an iterator, which you can consume with a simple `for` loop. Here's an example with a chat model:
```python
from langchain_anthropic import ChatAnthropic

model = ChatAnthropic(model="claude-3-sonnet-20240229")

for chunk in model.stream("what color is the sky?"):
    print(chunk.content, end="|", flush=True)
```
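In an async environment, `.astream()` works the same way (a minimal sketch; run it inside an event loop):

```python
import asyncio

from langchain_anthropic import ChatAnthropic

model = ChatAnthropic(model="claude-3-sonnet-20240229")


async def main():
    async for chunk in model.astream("what color is the sky?"):
        print(chunk.content, end="|", flush=True)


asyncio.run(main())
```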
For models (or other components) that don't support streaming natively, this iterator would just yield a single chunk, but you could still use the same general pattern when calling them. Using `.stream()` will also automatically call the model in streaming mode without the need to provide additional config.

The type of each output chunk depends on the type of component - for example, chat models yield [`AIMessageChunks`](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.ai.AIMessageChunk.html).
Because this method is part of [LangChain Expression Language](/docs/concepts/#langchain-expression-language-lcel), you can handle formatting differences from different outputs using an [output parser](/docs/concepts/#output-parsers) to transform each yielded chunk.

You can check out [this guide](/docs/how_to/streaming/#using-stream) for more detail on how to use `.stream()`.
## `.astream_events()`
<span data-heading-keywords="astream_events,stream_events,stream events"></span>

While the `.stream()` method is intuitive, it can only return the final generated value of your chain. This is fine for single LLM calls, but as you build more complex chains of several LLM calls together, you may want to use the intermediate values of the chain alongside the final output - for example, returning sources alongside the final generation when building a chat-over-documents app.

There are ways to do this [using callbacks](/docs/concepts/#callbacks-1), or by constructing your chain in such a way that it passes intermediate values to the end with something like chained [`.assign()`](/docs/how_to/passthrough/) calls, but LangChain also includes an `.astream_events()` method that combines the flexibility of callbacks with the ergonomics of `.stream()`. When called, it returns an iterator which yields [various types of events](/docs/how_to/streaming/#event-reference) that you can filter and process according to the needs of your project.

Here's one small example that prints just events containing streamed chat model output:
```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_anthropic import ChatAnthropic

model = ChatAnthropic(model="claude-3-sonnet-20240229")

prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
parser = StrOutputParser()
chain = prompt | model | parser

async for event in chain.astream_events({"topic": "parrot"}, version="v2"):
    kind = event["event"]
    if kind == "on_chat_model_stream":
        print(event, end="|", flush=True)
```
You can roughly think of it as an iterator over callback events (though the format differs) - and you can use it on almost all LangChain components!

See [this guide](/docs/how_to/streaming/#using-stream-events) for more detailed information on how to use `.astream_events()`, including a table listing available events.