From 1ee8cf7b203655215c1c6c57942a282a8c6441de Mon Sep 17 00:00:00 2001 From: Eugene Yurtsev Date: Thu, 4 Apr 2024 22:36:03 -0400 Subject: [PATCH] Docs: Update custom chat model (#19967) * Clean up in the existing tutorial * Add model_name to identifying params * Add table to summarize messages --- .../model_io/chat/custom_chat_model.ipynb | 320 +++++++----------- 1 file changed, 121 insertions(+), 199 deletions(-) diff --git a/docs/docs/modules/model_io/chat/custom_chat_model.ipynb b/docs/docs/modules/model_io/chat/custom_chat_model.ipynb index b91ca4cfd43..b410f837293 100644 --- a/docs/docs/modules/model_io/chat/custom_chat_model.ipynb +++ b/docs/docs/modules/model_io/chat/custom_chat_model.ipynb @@ -1,7 +1,6 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", "id": "e3da9a3f-f583-4ba6-994e-0e8c1158f5eb", "metadata": {}, @@ -10,13 +9,13 @@ "\n", "In this guide, we'll learn how to create a custom chat model using LangChain abstractions.\n", "\n", - "Wrapping your LLM with the standard `ChatModel` interface allow you to use your LLM in existing LangChain programs with minimal code modifications!\n", + "Wrapping your LLM with the standard `BaseChatModel` interface allows you to use your LLM in existing LangChain programs with minimal code modifications!\n", "\n", "As a bonus, your LLM will automatically become a LangChain `Runnable` and will benefit from some optimizations out of the box (e.g., batch via a threadpool), async support, the `astream_events` API, etc.\n", "\n", "## Inputs and outputs\n", "\n", - "First, we need to talk about messages which are the inputs and outputs of chat models.\n", + "First, we need to talk about **messages**, which are the inputs and outputs of chat models.\n", "\n", "### Messages\n", "\n", @@ -24,13 +23,17 @@ "\n", "LangChain has a few built-in message types:\n", "\n", - "- `SystemMessage`: Used for priming AI behavior, usually passed in as the first of a sequence of input messages.\n", - "- `HumanMessage`: Represents a message from a person interacting with the chat model.\n", - "- `AIMessage`: Represents a message from the chat model. This can be either text or a request to invoke a tool.\n", - "- `FunctionMessage` / `ToolMessage`: Message for passing the results of tool invocation back to the model.\n", + "| Message Type | Description |\n", + "|-----------------------|-------------------------------------------------------------------------------------------------|\n", + "| `SystemMessage` | Used for priming AI behavior, usually passed in as the first of a sequence of input messages. |\n", + "| `HumanMessage` | Represents a message from a person interacting with the chat model. |\n", + "| `AIMessage` | Represents a message from the chat model. This can be either text or a request to invoke a tool.|\n", + "| `FunctionMessage` / `ToolMessage` | Message for passing the results of tool invocation back to the model. |\n", + "| `AIMessageChunk` / `HumanMessageChunk` / ... | Chunk variant of each type of message. 
|\n", + "\n", "\n", "::: {.callout-note}\n", - "`ToolMessage` and `FunctionMessage` closely follow OpenAIs `function` and `tool` arguments.\n", + "`ToolMessage` and `FunctionMessage` closely follow OpenAI's `function` and `tool` roles.\n", "\n", "This is a rapidly developing field and as more models add function calling capabilities, expect that there will be additions to this schema.\n", ":::" ] }, { "cell_type": "code", "execution_count": 1, "id": "c5046e6a-8b09-4a99-b6e6-7a605aac5738", - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "from langchain_core.messages import (\n", @@ -67,7 +72,9 @@ "cell_type": "code", "execution_count": 2, "id": "d4656e9d-bfa1-4703-8f79-762fe6421294", - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "from langchain_core.messages import (\n", @@ -91,7 +98,9 @@ "cell_type": "code", "execution_count": 3, "id": "9c15c299-6f8a-49cf-a072-09924fd44396", - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [ { "data": { @@ -108,32 +117,6 @@ "AIMessageChunk(content=\"Hello\") + AIMessageChunk(content=\" World!\")" ] }, - { - "cell_type": "markdown", - "id": "8e952d64-6d38-4a2b-b996-8812c204a12c", - "metadata": {}, - "source": [ - "## Simple Chat Model\n", - "\n", - "Inherting from `SimpleChatModel` is great for prototyping!\n", - "\n", - "It won't allow you to implement all features that you might want out of a chat model, but it's quick to implement, and if you need more you can transition to `BaseChatModel` shown below.\n", - "\n", - "Let's implement a chat model that echoes back the last `n` characters of the prompt!\n", - "\n", - "You need to implement the following:\n", - "\n", - "* The method `_call` - Use to generate a chat result from a prompt.\n", - "\n", - "In addition, you have the option to specify the following:\n", - "\n", - "* The property `_identifying_params` - Represent model parameterization for logging purposes.\n", - "\n", - "Optional:\n", - "\n", - "* `_stream` - Use to implement streaming.\n" - ] - }, { "cell_type": "markdown", "id": "bbfebea1", "metadata": {}, "source": [ "## Base Chat Model\n", "\n", "Let's implement a chat model that echoes back the first `n` characters of the last message in the prompt!\n", "\n", - "To do so, we will inherit from `BaseChatModel` and we'll need to implement the following methods/properties:\n", + "To do so, we will inherit from `BaseChatModel` and we'll need to implement the following:\n", "\n", - "In addition, you have the option to specify the following:\n", - "\n", - "To do so inherit from `BaseChatModel` which is a lower level class and implement the methods:\n", - "\n", - "* `_generate` - Use to generate a chat result from a prompt\n", - "* The property `_llm_type` - Used to uniquely identify the type of the model. Used for logging.\n", - "\n", - "Optional:\n", - "\n", - "* `_stream` - Use to implement streaming.\n", - "* `_agenerate` - Use to implement a native async method.\n", - "* `_astream` - Use to implement async version of `_stream`.\n", - "* The property `_identifying_params` - Represent model parameterization for logging purposes.\n", + "| Method/Property | Description | Required/Optional |\n", + "|------------------------------------|-------------------------------------------------------------------|--------------------|\n", + "| `_generate` | Used to generate a chat result from a prompt | Required |\n", + "| `_llm_type` (property) | Used to uniquely identify the type of the model. 
Used for logging.| Required |\n", + "| `_identifying_params` (property) | Represents model parameterization for tracing purposes. | Optional |\n", + "| `_stream` | Used to implement streaming. | Optional |\n", + "| `_agenerate` | Used to implement a native async method. | Optional |\n", + "| `_astream` | Used to implement the async version of `_stream`. | Optional |\n", "\n", "\n", - ":::{.callout-caution}\n", + ":::{.callout-tip}\n", + "The `_astream` implementation uses `run_in_executor` to launch the sync `_stream` in a separate thread if `_stream` is implemented, otherwise it falls back to using `_agenerate`.\n", "\n", - "Currently, to get async streaming to work (via `astream`), you must provide an implementation of `_astream`.\n", - "\n", - "By default if `_astream` is not provided, then async streaming falls back on `_agenerate` which does not support\n", - "token by token streaming.\n", + "You can use this trick if you want to reuse the `_stream` implementation, but if you're able to implement code that's natively async, that's a better solution since that code will run with less overhead.\n", ":::" ] }, @@ -181,7 +157,9 @@ "cell_type": "code", "execution_count": 4, "id": "25ba32e5-5a6d-49f4-bb68-911827b84d61", - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "from typing import Any, AsyncIterator, Dict, Iterator, List, Optional\n", @@ -214,6 +192,8 @@ " [HumanMessage(content=\"world\")]])\n", " \"\"\"\n", "\n", + " model_name: str\n", + " \"\"\"The name of the model\"\"\"\n", " n: int\n", " \"\"\"The number of characters from the last message of the prompt to be echoed.\"\"\"\n", "\n", @@ -239,9 +219,19 @@ " downstream and understand why generation stopped.\n", " run_manager: A run manager with callbacks for the LLM.\n", " \"\"\"\n", + " # Replace this with actual logic to generate a response from a list\n", + " # of messages.\n", " last_message = messages[-1]\n", " tokens = last_message.content[: self.n]\n", - " message = AIMessage(content=tokens)\n", + " message = AIMessage(\n", + " content=tokens,\n", + " additional_kwargs={}, # Used to add additional payload (e.g., function calling request)\n", + " response_metadata={ # Use for response metadata\n", + " \"time_in_seconds\": 3,\n", + " },\n", + " )\n", + " ##\n", + "\n", " generation = ChatGeneration(message=message)\n", " return ChatResult(generations=[generation])\n", "\n", @@ -276,36 +266,21 @@ " chunk = ChatGenerationChunk(message=AIMessageChunk(content=token))\n", "\n", " if run_manager:\n", + " # This is optional in newer versions of LangChain\n", + " # The on_llm_new_token will be called automatically\n", " run_manager.on_llm_new_token(token, chunk=chunk)\n", "\n", " yield chunk\n", "\n", - " async def _astream(\n", - " self,\n", - " messages: List[BaseMessage],\n", - " stop: Optional[List[str]] = None,\n", - " run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,\n", - " **kwargs: Any,\n", - " ) -> AsyncIterator[ChatGenerationChunk]:\n", - " \"\"\"An async variant of astream.\n", - "\n", - " If not provided, the default behavior is to delegate to the _generate method.\n", - "\n", - " The implementation below instead will delegate to `_stream` and will\n", - " kick it off in a separate thread.\n", - "\n", - " If you're able to natively support async, then by all means do so!\n", - " \"\"\"\n", - " result = await run_in_executor(\n", - " None,\n", - " self._stream,\n", - " messages,\n", - " stop=stop,\n", - " run_manager=run_manager.get_sync() if run_manager else None,\n", - " **kwargs,\n", + " # 
Let's add some other information (e.g., response metadata)\n", + " chunk = ChatGenerationChunk(\n", + " message=AIMessageChunk(content=\"\", response_metadata={\"time_in_sec\": 3})\n", " )\n", - " for chunk in result:\n", - " yield chunk\n", + " if run_manager:\n", + " # This is optional in newer versions of LangChain\n", + " # The on_llm_new_token will be called automatically\n", + " run_manager.on_llm_new_token(token, chunk=chunk)\n", + " yield chunk\n", "\n", " @property\n", " def _llm_type(self) -> str:\n", @@ -314,21 +289,18 @@ "\n", " @property\n", " def _identifying_params(self) -> Dict[str, Any]:\n", - " \"\"\"Return a dictionary of identifying parameters.\"\"\"\n", - " return {\"n\": self.n}" ] }, - { - "cell_type": "markdown", - "id": "b3c3d030-8d8b-4891-962d-a2d39b331883", - "metadata": {}, - "source": [ - ":::{.callout-tip}\n", - "The `_astream` implementation uses `run_in_executor` to launch the sync `_stream` in a separate thread.\n", - "\n", - "You can use this trick if you want to reuse the `_stream` implementation, but if you're able to implement code\n", - "that's natively async that's a better solution since that code will run with less overhead.\n", - ":::" ] }, + " \"\"\"Return a dictionary of identifying parameters.\n", "\n", + " This information is used by the LangChain callback system, which\n", + " is used for tracing purposes and makes it possible to monitor LLMs.\n", + " \"\"\"\n", + " return {\n", + " # The model name allows users to specify custom token counting\n", + " # rules in LLM monitoring applications (e.g., in LangSmith users\n", + " # can provide per token pricing for their model and monitor\n", + " # costs for the given LLM.)\n", + " \"model_name\": self.model_name,\n", + " }" ] }, @@ -345,22 +317,26 @@ "cell_type": "code", "execution_count": 5, "id": "34bf2d48-556a-48be-aee7-496fb02332f3", - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "model = CustomChatModelAdvanced(n=3)" + "model = CustomChatModelAdvanced(n=3, model_name=\"my_custom_model\")" ] }, { "cell_type": "code", "execution_count": 6, "id": "27689f30-dcd2-466b-ba9d-f60b7d434110", - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [ { "data": { "text/plain": [ - "AIMessage(content='Meo')" + "AIMessage(content='Meo', response_metadata={'time_in_seconds': 3}, id='run-ddb42bd6-4fdd-4bd2-8be5-e11b67d3ac29-0')" ] }, "execution_count": 6, @@ -382,12 +358,14 @@ "cell_type": "code", "execution_count": 7, "id": "406436df-31bf-466b-9c3d-39db9d6b6407", - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [ { "data": { "text/plain": [ - "AIMessage(content='hel')" + "AIMessage(content='hel', response_metadata={'time_in_seconds': 3}, id='run-4d3cc912-44aa-454b-977b-ca02be06c12e-0')" ] }, "execution_count": 7, @@ -403,12 +381,15 @@ "cell_type": "code", "execution_count": 8, "id": "a72ffa46-6004-41ef-bbe4-56fa17a029e2", - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [ { "data": { "text/plain": [ - "[AIMessage(content='hel'), AIMessage(content='goo')]" + "[AIMessage(content='hel', response_metadata={'time_in_seconds': 3}, id='run-9620e228-1912-4582-8aa1-176813afec49-0'),\n", + " AIMessage(content='goo', response_metadata={'time_in_seconds': 3}, id='run-1ce8cdf8-6f75-448e-82f7-1bb4a121df93-0')]" ] }, "execution_count": 8, @@ -424,13 +405,15 @@ "cell_type": "code", "execution_count": 9, "id": "3633be2c-2ea0-42f9-a72f-3b5240690b55", - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - 
"c|a|t|" + "c|a|t||" ] } ], @@ -451,13 +434,15 @@ "cell_type": "code", "execution_count": 10, "id": "b7d73995-eeab-48c6-a7d8-32c98ba29fc2", - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "c|a|t|" + "c|a|t||" ] } ], @@ -478,24 +463,27 @@ "cell_type": "code", "execution_count": 11, "id": "17840eba-8ff4-4e73-8e4f-85f16eb1c9d0", - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "{'event': 'on_chat_model_start', 'run_id': 'e03c0b21-521f-4cb4-a837-02fed65cf1cf', 'name': 'CustomChatModelAdvanced', 'tags': [], 'metadata': {}, 'data': {'input': 'cat'}}\n", - "{'event': 'on_chat_model_stream', 'run_id': 'e03c0b21-521f-4cb4-a837-02fed65cf1cf', 'tags': [], 'metadata': {}, 'name': 'CustomChatModelAdvanced', 'data': {'chunk': AIMessageChunk(content='c')}}\n", - "{'event': 'on_chat_model_stream', 'run_id': 'e03c0b21-521f-4cb4-a837-02fed65cf1cf', 'tags': [], 'metadata': {}, 'name': 'CustomChatModelAdvanced', 'data': {'chunk': AIMessageChunk(content='a')}}\n", - "{'event': 'on_chat_model_stream', 'run_id': 'e03c0b21-521f-4cb4-a837-02fed65cf1cf', 'tags': [], 'metadata': {}, 'name': 'CustomChatModelAdvanced', 'data': {'chunk': AIMessageChunk(content='t')}}\n", - "{'event': 'on_chat_model_end', 'name': 'CustomChatModelAdvanced', 'run_id': 'e03c0b21-521f-4cb4-a837-02fed65cf1cf', 'tags': [], 'metadata': {}, 'data': {'output': AIMessageChunk(content='cat')}}\n" + "{'event': 'on_chat_model_start', 'run_id': '125a2a16-b9cd-40de-aa08-8aa9180b07d0', 'name': 'CustomChatModelAdvanced', 'tags': [], 'metadata': {}, 'data': {'input': 'cat'}}\n", + "{'event': 'on_chat_model_stream', 'run_id': '125a2a16-b9cd-40de-aa08-8aa9180b07d0', 'tags': [], 'metadata': {}, 'name': 'CustomChatModelAdvanced', 'data': {'chunk': AIMessageChunk(content='c', id='run-125a2a16-b9cd-40de-aa08-8aa9180b07d0')}}\n", + "{'event': 'on_chat_model_stream', 'run_id': '125a2a16-b9cd-40de-aa08-8aa9180b07d0', 'tags': [], 'metadata': {}, 'name': 'CustomChatModelAdvanced', 'data': {'chunk': AIMessageChunk(content='a', id='run-125a2a16-b9cd-40de-aa08-8aa9180b07d0')}}\n", + "{'event': 'on_chat_model_stream', 'run_id': '125a2a16-b9cd-40de-aa08-8aa9180b07d0', 'tags': [], 'metadata': {}, 'name': 'CustomChatModelAdvanced', 'data': {'chunk': AIMessageChunk(content='t', id='run-125a2a16-b9cd-40de-aa08-8aa9180b07d0')}}\n", + "{'event': 'on_chat_model_stream', 'run_id': '125a2a16-b9cd-40de-aa08-8aa9180b07d0', 'tags': [], 'metadata': {}, 'name': 'CustomChatModelAdvanced', 'data': {'chunk': AIMessageChunk(content='', response_metadata={'time_in_sec': 3}, id='run-125a2a16-b9cd-40de-aa08-8aa9180b07d0')}}\n", + "{'event': 'on_chat_model_end', 'name': 'CustomChatModelAdvanced', 'run_id': '125a2a16-b9cd-40de-aa08-8aa9180b07d0', 'tags': [], 'metadata': {}, 'data': {'output': AIMessageChunk(content='cat', response_metadata={'time_in_sec': 3}, id='run-125a2a16-b9cd-40de-aa08-8aa9180b07d0')}}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ - "/home/eugene/src/langchain/libs/core/langchain_core/_api/beta_decorator.py:86: LangChainBetaWarning: This API is in beta and may change in the future.\n", + "/home/eugene/src/langchain/libs/core/langchain_core/_api/beta_decorator.py:87: LangChainBetaWarning: This API is in beta and may change in the future.\n", " warn_beta(\n" ] } @@ -505,84 +493,6 @@ " print(event)" ] }, - { - "cell_type": "markdown", - "id": "42f9553f-7d8c-4277-aeb4-d80d77839d90", - "metadata": {}, - "source": [ 
- "## Identifying Params\n", - "\n", - "LangChain has a callback system which allows implementing loggers to monitor the behavior of LLM applications.\n", - "\n", - "Remember the `_identifying_params` property from earlier? \n", - "\n", - "It's passed to the callback system and is accessible for user specified loggers.\n", - "\n", - "Below we'll implement a handler with just a single `on_chat_model_start` event to see where `_identifying_params` appears." - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "cc7e6b5f-711b-48aa-9ebe-92a13e230c37", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "---\n", - "On chat model start.\n", - "{'invocation_params': {'n': 3, '_type': 'echoing-chat-model-advanced', 'stop': ['woof']}, 'options': {'stop': ['woof']}, 'name': None, 'batch_size': 1}\n" - ] - }, - { - "data": { - "text/plain": [ - "AIMessage(content='meo')" - ] - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from typing import Union\n", - "from uuid import UUID\n", - "\n", - "from langchain_core.callbacks import AsyncCallbackHandler\n", - "from langchain_core.outputs import (\n", - " ChatGenerationChunk,\n", - " ChatResult,\n", - " GenerationChunk,\n", - " LLMResult,\n", - ")\n", - "\n", - "\n", - "class SampleCallbackHandler(AsyncCallbackHandler):\n", - " \"\"\"Async callback handler that handles callbacks from LangChain.\"\"\"\n", - "\n", - " async def on_chat_model_start(\n", - " self,\n", - " serialized: Dict[str, Any],\n", - " messages: List[List[BaseMessage]],\n", - " *,\n", - " run_id: UUID,\n", - " parent_run_id: Optional[UUID] = None,\n", - " tags: Optional[List[str]] = None,\n", - " metadata: Optional[Dict[str, Any]] = None,\n", - " **kwargs: Any,\n", - " ) -> Any:\n", - " \"\"\"Run when a chat model starts running.\"\"\"\n", - " print(\"---\")\n", - " print(\"On chat model start.\")\n", - " print(kwargs)\n", - "\n", - "\n", - "model.invoke(\"meow\", stop=[\"woof\"], config={\"callbacks\": [SampleCallbackHandler()]})" - ] - }, { "cell_type": "markdown", "id": "44ee559b-b1da-4851-8c97-420ab394aff9", @@ -603,11 +513,10 @@ "\n", "* [ ] Add unit or integration tests to the overridden methods. Verify that `invoke`, `ainvoke`, `batch`, `stream` work if you've over-ridden the corresponding code.\n", "\n", + "\n", "Streaming (if you're implementing it):\n", "\n", - "* [ ] Provided an async implementation via `_astream`\n", - "* [ ] Make sure to invoke the `on_llm_new_token` callback\n", - "* [ ] `on_llm_new_token` is invoked BEFORE yielding the chunk\n", + "* [ ] Implement the _stream method to get streaming working\n", "\n", "Stop Token Behavior:\n", "\n", @@ -616,7 +525,20 @@ "\n", "Secret API Keys:\n", "\n", - "* [ ] If your model connects to an API it will likely accept API keys as part of its initialization. Use Pydantic's `SecretStr` type for secrets, so they don't get accidentally printed out when folks print the model." + "* [ ] If your model connects to an API it will likely accept API keys as part of its initialization. 
Use Pydantic's `SecretStr` type for secrets, so they don't get accidentally printed out when folks print the model.\n", + "\n", + "\n", + "Identifying Params:\n", + "\n", + "* [ ] Include a `model_name` in identifying params\n", + "\n", + "\n", + "Optimizations:\n", + "\n", + "Consider providing native async support to reduce the overhead from the model!\n", + " \n", + "* [ ] Provide a native async implementation of `_agenerate` (used by `ainvoke`)\n", + "* [ ] Provide a native async implementation of `_astream` (used by `astream`)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.2" + "version": "3.11.4" } }, "nbformat": 4,
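A short, illustrative sketch to go with the new "Optimizations" checklist items: the patch removes the example `_astream` override, so a natively async variant is left to the reader. The code below is not part of the patch; it is a minimal sketch that assumes the `CustomChatModelAdvanced` class and the `AsyncCallbackManagerForLLMRun` import already shown in the notebook.

```python
from typing import Any, AsyncIterator, List, Optional

from langchain_core.callbacks import AsyncCallbackManagerForLLMRun
from langchain_core.messages import AIMessageChunk, BaseMessage
from langchain_core.outputs import ChatGenerationChunk


class CustomChatModelAsync(CustomChatModelAdvanced):
    """Illustrative subclass that adds a natively async streaming path."""

    async def _astream(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> AsyncIterator[ChatGenerationChunk]:
        # Echo the first `n` characters of the last message, one token at a time,
        # without bouncing through a thread pool the way the run_in_executor
        # fallback would.
        last_message = messages[-1]
        for token in last_message.content[: self.n]:
            chunk = ChatGenerationChunk(message=AIMessageChunk(content=token))
            if run_manager:
                # The async callback manager's hook is a coroutine, so await it.
                await run_manager.on_llm_new_token(token, chunk=chunk)
            yield chunk
```

For this toy echo model the gain is negligible, but for a model that awaits an HTTP client, a native `_astream`/`_agenerate` avoids tying up executor threads.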
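Likewise, for the "Secret API Keys" checklist item, a minimal sketch of the `SecretStr` pattern. The `api_key` field and the `_call_remote_api` helper are hypothetical, and the `langchain_core.pydantic_v1` import path reflects the pydantic-v1-based models of this LangChain version; on newer releases `from pydantic import SecretStr` may apply instead.

```python
from langchain_core.pydantic_v1 import SecretStr


class MyApiChatModel(CustomChatModelAdvanced):
    """Illustrative only: store the key as a secret so printing the model redacts it."""

    api_key: SecretStr  # hypothetical field; repr() shows '**********' instead of the key

    def _call_remote_api(self, prompt: str) -> str:
        # Hypothetical helper: unwrap the secret only at the point of use.
        key = self.api_key.get_secret_value()
        raise NotImplementedError(f"wire up an HTTP client here (key length: {len(key)})")
```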