Compare commits

10 Commits

Author SHA1 Message Date
Erick Friis
ad87d24edc x 2024-11-21 20:02:19 -08:00
Erick Friis
254e59c2ce x 2024-11-21 19:54:14 -08:00
Erick Friis
01bf59679c x 2024-11-21 19:40:08 -08:00
Erick Friis
bffca0d5c2 x 2024-11-21 19:37:17 -08:00
Erick Friis
46ea6722f4 x 2024-11-21 19:00:38 -08:00
Erick Friis
d83000b5b8 x 2024-11-21 18:59:10 -08:00
Erick Friis
3dd6c05ce7 x 2024-11-21 18:48:59 -08:00
Erick Friis
1798d6e92e x 2024-11-21 18:41:56 -08:00
Erick Friis
9ac46cc264 x 2024-11-21 18:17:04 -08:00
Erick Friis
ec12b492f1 docs: poetry publish 2024-11-21 15:56:57 -08:00
12 changed files with 847 additions and 208 deletions

View File

@@ -6,4 +6,5 @@
## Integrations
- [**Start Here**](integrations/index.mdx): Help us integrate with your favorite vendors and tools.
- [**Package**](integrations/package): Publish an integration package to PyPi
- [**Standard Tests**](integrations/standard_tests): Ensure your integration passes an expected set of tests.

View File

@@ -1,3 +1,7 @@
---
pagination_next: null
pagination_prev: null
---
## How to add a community integration (not recommended)
:::danger

View File

@@ -1,3 +1,8 @@
---
pagination_next: null
pagination_prev: null
---
# How to publish an integration package from a template
:::danger

View File

@@ -1,5 +1,5 @@
---
sidebar_position: 5
pagination_next: contributing/how_to/integrations/package
---
# Contribute Integrations
@@ -66,7 +66,7 @@ that will render on this site (https://python.langchain.com/).
As a prerequisite to adding your integration to our documentation, you must:
1. Confirm that your integration is in the [list of components](#components-to-integrate) we are currently accepting.
2. Ensure that your integration is in a separate package that can be installed with `pip install <your-package>`.
2. [Publish your package to PyPi](./package.mdx) and make the repo public.
3. [Implement the standard tests](/docs/contributing/how_to/integrations/standard_tests) for your integration and successfully run them.
3. Write documentation for your integration in the `docs/docs/integrations/<component_type>` directory of the LangChain monorepo.
4. Add a provider page for your integration in the `docs/docs/integrations/providers` directory of the LangChain monorepo.
@@ -75,5 +75,4 @@ Once you have completed these steps, you can submit a PR to the LangChain monore
## Further Reading
If you're starting from scratch, you can follow the [Integration Template Guide](./from_template.mdx) to create and publish a new integration package
to the above spec.
To get started, let's learn [how to bootstrap a new integration package](./package.mdx) for LangChain.

View File

@@ -0,0 +1,260 @@
---
pagination_next: contributing/how_to/integrations/standard_tests
pagination_prev: contributing/how_to/integrations/index
---
# How to bootstrap a new integration package
This guide walks through the process of publishing a new LangChain integration
package to PyPi.
Integration packages are just Python packages that can be installed with `pip install <your-package>`,
which contain classes that are compatible with LangChain's core interfaces.
In this guide, we will be using [Poetry](https://python-poetry.org/) for
dependency management and packaging, but you're welcome to use any other tools you prefer.
## **Prerequisites**
- [GitHub](https://github.com) account
- [PyPi](https://pypi.org/) account
## Bootstrapping a new Python package with Poetry
First, install Poetry:
```bash
pip install poetry
```
Next, come up with a name for your package. For this guide, we'll use `langchain-parrot-link`.
You can confirm that the name is available on PyPi by searching for it on the [PyPi website](https://pypi.org/).
Next, create your new Python package with Poetry, and navigate into the new directory with `cd`:
```bash
poetry new langchain-parrot-link
cd langchain-parrot-link
```
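The project tree later in this guide shows the `langchain-parrot-link` distribution containing a `langchain_parrot_link` module directory. As a minimal sketch of that naming relationship, the hypothetical helpers below (not part of Poetry or LangChain) normalize a distribution name following the PEP 503 rule and derive the matching import name by swapping hyphens for underscores:

```python
import re


def normalize_distribution_name(name: str) -> str:
    """Normalize a distribution name following the PEP 503 rule:
    runs of '-', '_' and '.' collapse to a single '-', and the result is lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def import_name(name: str) -> str:
    """Derive the Python import name used for the package directory.

    Assumption: the module directory is the normalized distribution name
    with hyphens replaced by underscores, as in the tree shown in this guide.
    """
    return normalize_distribution_name(name).replace("-", "_")


if __name__ == "__main__":
    print(normalize_distribution_name("langchain-parrot-link"))
    print(import_name("langchain-parrot-link"))
```

Running a candidate name through the normalizer before searching PyPi helps catch names that differ only in punctuation or capitalization from one that is already taken.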
Add main dependencies using Poetry, which will add them to your `pyproject.toml` file:
```bash
poetry add langchain-core
```
We will also add some `test` dependencies in a separate poetry dependency group. If
you are not using Poetry, we recommend adding these in a way that won't package them
with your published package, or just installing them separately when you run tests.
`langchain-tests` will provide the [standard tests](../standard_tests) we will use later.
We recommend pinning these to the latest version: <img src="https://img.shields.io/pypi/v/langchain-tests" style={{position:"relative",top:4,left:3}} />
Note: Replace `<latest_version>` with the latest version of `langchain-tests` below.
```bash
poetry add --group test pytest pytest-socket langchain-tests==<latest_version>
```
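If you would rather look up the value for `<latest_version>` from a script, one option is PyPi's JSON endpoint, which reports the newest release under `info.version`. The sketch below is only an illustration: the helper names are made up for this guide, and `fetch_payload` needs network access, so it is defined but not called here.

```python
import json
from urllib.request import urlopen

# PyPi's per-project JSON endpoint.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def fetch_payload(name: str) -> dict:
    """Download the project metadata for `name` (requires network access)."""
    with urlopen(PYPI_JSON_URL.format(name=name), timeout=10) as response:
        return json.load(response)


def latest_version(payload: dict) -> str:
    """Read the latest released version out of a metadata payload."""
    return payload["info"]["version"]


def pinned_requirement(name: str, payload: dict) -> str:
    """Build an exact pin such as `langchain-tests==<latest_version>`."""
    return f"{name}=={latest_version(payload)}"
```

For example, `pinned_requirement("langchain-tests", fetch_payload("langchain-tests"))` yields a string that can be passed to `poetry add --group test`.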
You're now ready to start writing your integration package!
## Writing your integration
Let's say you're building a simple integration package that provides a `ChatParrotLink`
chat model integration for LangChain. Here's a simple example of what your project
structure might look like:
```plaintext
langchain-parrot-link/
├── langchain_parrot_link/
│ ├── __init__.py
│ └── chat_models.py
├── tests/
│ ├── __init__.py
│ └── test_chat_models.py
├── pyproject.toml
└── README.md
```
All of these files should already exist from the previous section, except for
`chat_models.py` and `test_chat_models.py`! We will implement `test_chat_models.py`
later, following the [standard tests](../standard_tests) guide.
To implement `chat_models.py`, let's copy the implementation from our
[Custom Chat Model Guide](../../../../how_to/custom_chat_model).
<details>
<summary>chat_models.py</summary>
```python title="langchain_parrot_link/chat_models.py"
from typing import Any, Dict, Iterator, List, Optional

from langchain_core.callbacks import (
    CallbackManagerForLLMRun,
)
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import (
    AIMessage,
    AIMessageChunk,
    BaseMessage,
)
from langchain_core.messages.ai import UsageMetadata
from langchain_core.outputs import ChatGeneration, ChatGenerationChunk, ChatResult
from pydantic import Field


class ChatParrotLink(BaseChatModel):
    """A custom chat model that echoes the first `parrot_buffer_length` characters
    of the input.

    When contributing an implementation to LangChain, carefully document
    the model including the initialization parameters, include
    an example of how to initialize the model and include any relevant
    links to the underlying model's documentation or API.

    Example:

        .. code-block:: python

            model = ChatParrotLink(parrot_buffer_length=2, model="bird-brain-001")
            result = model.invoke([HumanMessage(content="hello")])
            result = model.batch([[HumanMessage(content="hello")],
                                 [HumanMessage(content="world")]])
    """

    model_name: str = Field(alias="model")
    """The name of the model"""
    parrot_buffer_length: int
    """The number of characters from the last message of the prompt to be echoed."""
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None
    timeout: Optional[int] = None
    stop: Optional[List[str]] = None
    max_retries: int = 2

    def _generate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> ChatResult:
        """Override the _generate method to implement the chat model logic.

        This can be a call to an API, a call to a local model, or any other
        implementation that generates a response to the input prompt.

        Args:
            messages: the prompt composed of a list of messages.
            stop: a list of strings on which the model should stop generating.
                  If generation stops due to a stop token, the stop token itself
                  SHOULD BE INCLUDED as part of the output. This is not enforced
                  across models right now, but it's a good practice to follow since
                  it makes it much easier to parse the output of the model
                  downstream and understand why generation stopped.
            run_manager: A run manager with callbacks for the LLM.
        """
        # Replace this with actual logic to generate a response from a list
        # of messages.
        last_message = messages[-1]
        tokens = last_message.content[: self.parrot_buffer_length]
        ct_input_tokens = sum(len(message.content) for message in messages)
        ct_output_tokens = len(tokens)
        message = AIMessage(
            content=tokens,
            additional_kwargs={},  # Used to add additional payload to the message
            response_metadata={  # Use for response metadata
                "time_in_seconds": 3,
            },
            usage_metadata={
                "input_tokens": ct_input_tokens,
                "output_tokens": ct_output_tokens,
                "total_tokens": ct_input_tokens + ct_output_tokens,
            },
        )
        ##

        generation = ChatGeneration(message=message)
        return ChatResult(generations=[generation])

    def _stream(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> Iterator[ChatGenerationChunk]:
        """Stream the output of the model.

        This method should be implemented if the model can generate output
        in a streaming fashion. If the model does not support streaming,
        do not implement it. In that case streaming requests will be automatically
        handled by the _generate method.

        Args:
            messages: the prompt composed of a list of messages.
            stop: a list of strings on which the model should stop generating.
                  If generation stops due to a stop token, the stop token itself
                  SHOULD BE INCLUDED as part of the output. This is not enforced
                  across models right now, but it's a good practice to follow since
                  it makes it much easier to parse the output of the model
                  downstream and understand why generation stopped.
            run_manager: A run manager with callbacks for the LLM.
        """
        last_message = messages[-1]
        tokens = str(last_message.content[: self.parrot_buffer_length])
        ct_input_tokens = sum(len(message.content) for message in messages)

        for token in tokens:
            usage_metadata = UsageMetadata(
                {
                    "input_tokens": ct_input_tokens,
                    "output_tokens": 1,
                    "total_tokens": ct_input_tokens + 1,
                }
            )
            # Only the first chunk reports the input tokens.
            ct_input_tokens = 0
            chunk = ChatGenerationChunk(
                message=AIMessageChunk(content=token, usage_metadata=usage_metadata)
            )

            if run_manager:
                # This is optional in newer versions of LangChain
                # The on_llm_new_token will be called automatically
                run_manager.on_llm_new_token(token, chunk=chunk)

            yield chunk

        # Let's add some other information (e.g., response metadata)
        chunk = ChatGenerationChunk(
            message=AIMessageChunk(content="", response_metadata={"time_in_sec": 3})
        )
        if run_manager:
            # This is optional in newer versions of LangChain
            # The on_llm_new_token will be called automatically
            # The final chunk carries no text, so report an empty token
            # (`token` from the loop above is unbound when nothing was echoed).
            run_manager.on_llm_new_token("", chunk=chunk)
        yield chunk

    @property
    def _llm_type(self) -> str:
        """Get the type of language model used by this chat model."""
        return "echoing-chat-model-advanced"

    @property
    def _identifying_params(self) -> Dict[str, Any]:
        """Return a dictionary of identifying parameters.

        This information is used by the LangChain callback system, which
        is used for tracing purposes and makes it possible to monitor LLMs.
        """
        return {
            # The model name allows users to specify custom token counting
            # rules in LLM monitoring applications (e.g., in LangSmith users
            # can provide per token pricing for their model and monitor
            # costs for the given LLM.)
            "model_name": self.model_name,
        }
```
</details>
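In `_stream`, the input token count is attached to the first chunk only and then reset to zero, presumably so that adding up the per-chunk usage gives the same totals that `_generate` reports. The dependency-free sketch below mirrors that arithmetic with plain strings and dictionaries. It does not use LangChain, and all of its helper names are hypothetical:

```python
from typing import Dict, Iterable, Iterator, List

Usage = Dict[str, int]


def generate_usage(messages: List[str], buffer_length: int) -> Usage:
    """Usage for a single, non-streamed response (mirrors `_generate`)."""
    echoed = messages[-1][:buffer_length]
    input_tokens = sum(len(message) for message in messages)
    return {
        "input_tokens": input_tokens,
        "output_tokens": len(echoed),
        "total_tokens": input_tokens + len(echoed),
    }


def stream_usage(messages: List[str], buffer_length: int) -> Iterator[Usage]:
    """Per-chunk usage (mirrors `_stream`): one chunk per echoed character."""
    echoed = messages[-1][:buffer_length]
    input_tokens = sum(len(message) for message in messages)
    for _ in echoed:
        yield {
            "input_tokens": input_tokens,
            "output_tokens": 1,
            "total_tokens": input_tokens + 1,
        }
        # Only the first chunk carries the input tokens.
        input_tokens = 0


def add_usage(chunks: Iterable[Usage]) -> Usage:
    """Sum per-chunk usage dictionaries key by key."""
    total = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    for chunk in chunks:
        for key in total:
            total[key] += chunk[key]
    return total
```

With at least one echoed character, `add_usage(stream_usage(msgs, n))` equals `generate_usage(msgs, n)`. When nothing is echoed, the stream in this sketch yields no usage at all, so the input tokens go unreported.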
## Next Steps
Now that you've implemented your package, you can move on to [testing your integration](../standard_tests).

View File

@@ -0,0 +1,146 @@
---
pagination_prev: contributing/how_to/integrations/standard_tests
pagination_next: null
---
# Publishing your package
Now that your package is implemented and tested, you can:
1. Publish your package to PyPi
2. Add documentation for your package to the LangChain Monorepo
## Publishing your package to PyPi
This guide assumes you have already implemented your package and written tests for it. If you haven't done that yet, please refer to the [implementation guide](../package) and the [testing guide](../standard_tests).
Note that Poetry is not required to publish a package to PyPi; we use it end-to-end in this guide for convenience.
You are welcome to publish your package using any other method you prefer.
First, make sure you have a PyPi account and have logged in with Poetry:
<details>
<summary>How to create a PyPi Token</summary>
1. Go to the [PyPi website](https://pypi.org/) and create an account.
2. Verify your email address by clicking the link that PyPi emails to you.
3. Go to your account settings and click "Generate Recovery Codes" to enable 2FA. PyPi currently requires 2FA to be enabled before you can generate an API token.
4. Go to your account settings and [generate a new API token](https://pypi.org/manage/account/token/).
</details>
```bash
poetry config pypi-token.pypi <your-pypi-token>
```
Next, build your package:
```bash
poetry build
```
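By default, `poetry build` writes a source distribution (`.tar.gz`) and a wheel (`.whl`) into a `dist/` directory. As an optional sanity check before publishing, a small script along these lines can confirm that both artifacts exist. The function names are hypothetical, and the file names used in the example are illustrative only:

```python
from pathlib import Path
from typing import Dict, List


def built_artifacts(dist_dir: Path) -> Dict[str, List[str]]:
    """Group the files in `dist_dir` into source distributions and wheels."""
    files = sorted(path.name for path in dist_dir.iterdir() if path.is_file())
    return {
        "sdist": [name for name in files if name.endswith(".tar.gz")],
        "wheel": [name for name in files if name.endswith(".whl")],
    }


def ready_to_publish(dist_dir: Path) -> bool:
    """Return True only if at least one sdist and one wheel are present."""
    if not dist_dir.is_dir():
        return False
    artifacts = built_artifacts(dist_dir)
    return bool(artifacts["sdist"]) and bool(artifacts["wheel"])


if __name__ == "__main__":
    print(ready_to_publish(Path("dist")))
```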
Finally, publish your package to PyPi:
```bash
poetry publish
```
You're all set! Your package is now available on PyPi and can be installed with `pip install langchain-parrot-link`.
## Adding documentation to the LangChain Monorepo
To add documentation for your package to the LangChain Monorepo, you will need to:
1. Fork and clone the LangChain Monorepo
2. Make a "Provider Page" at `docs/docs/integrations/providers/<your-package-name>.ipynb`
3. Make "Component Pages" at `docs/docs/integrations/<component-type>/<your-package-name>.ipynb`
4. Register your package in `libs/packages.yml`
5. Submit a PR with **only these changes** to the LangChain Monorepo
### Fork and clone the LangChain Monorepo
First, fork the [LangChain Monorepo](https://github.com/langchain-ai/langchain) to your GitHub account.
Next, clone the repository to your local machine:
```bash
git clone https://github.com/<your-username>/langchain.git
```
You're now ready to make your PR!
### Bootstrap your documentation pages with the langchain-cli (recommended)
To make it easier to create the necessary documentation pages, you can use the `langchain-cli` to bootstrap them for you.
First, install the latest version of the `langchain-cli` package:
```bash
pip install --upgrade langchain-cli
```
To see the available commands to bootstrap your documentation pages, run:
```bash
langchain-cli integration create-doc --help
```
Let's bootstrap a provider page from the root of the monorepo:
```bash
langchain-cli integration create-doc \
--component-type Provider \
--destination-dir docs/docs/integrations/providers \
--name parrot-link \
--name-class ParrotLink
```
And a chat model component page:
```bash
langchain-cli integration create-doc \
--component-type ChatModel \
--destination-dir docs/docs/integrations/chat \
--name parrot-link \
--name-class ParrotLink
```
And a vector store component page:
```bash
langchain-cli integration create-doc \
--component-type VectorStore \
--destination-dir docs/docs/integrations/vectorstores \
--name parrot-link \
--name-class ParrotLink
```
These commands will create the following 3 files, which you should fill out with information about your package:
- `docs/docs/integrations/providers/parrot-link.ipynb`
- `docs/docs/integrations/chat/parrot-link.ipynb`
- `docs/docs/integrations/vectorstores/parrot-link.ipynb`
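Before opening a PR, it can help to confirm that the pages listed above actually exist in your clone. The sketch below assumes the same destination directories used in the commands in this section. The mapping and the helper names are written for this guide and are not part of `langchain-cli`:

```python
from pathlib import Path
from typing import Iterable, List

# Destination directories, copied from the `--destination-dir` values above.
DOC_DIRS = {
    "Provider": "docs/docs/integrations/providers",
    "ChatModel": "docs/docs/integrations/chat",
    "VectorStore": "docs/docs/integrations/vectorstores",
}


def expected_doc_pages(name: str, component_types: Iterable[str]) -> List[Path]:
    """Paths (relative to the monorepo root) where the pages should live."""
    return [Path(DOC_DIRS[kind]) / f"{name}.ipynb" for kind in component_types]


def missing_doc_pages(
    repo_root: Path, name: str, component_types: Iterable[str]
) -> List[Path]:
    """Expected pages that do not exist under `repo_root` yet."""
    return [
        page
        for page in expected_doc_pages(name, component_types)
        if not (repo_root / page).is_file()
    ]
```

Calling `missing_doc_pages(Path("."), "parrot-link", ["Provider", "ChatModel", "VectorStore"])` from the root of the monorepo returns an empty list once all three pages have been created.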
### Manually create your documentation pages (if you prefer)
If you prefer to create the documentation pages manually, you can create the same files listed
above and fill them out with information about your package.
You can view the templates that the CLI uses to create these files [here](https://github.com/langchain-ai/langchain/tree/master/libs/cli/langchain_cli/integration_template/docs) if helpful!
### Register your package in `libs/packages.yml`
Finally, add your package to the `libs/packages.yml` file in the LangChain Monorepo.
```yaml
packages:
  - name: langchain-parrot-link
    repo: <your github handle>/<your repo>
    path: .
```
For `path`, you can use `.` if your package is in the root of your repository, or specify a subdirectory (e.g. `libs/parrot-link`) if it is in a subdirectory.
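As a rough illustration of what a well-formed entry looks like, the following sketch checks an entry (already loaded into a dictionary) for the three keys shown above. The checks are assumptions based only on this example, not the monorepo's actual validation, and the function name is hypothetical:

```python
import re
from typing import Dict, List

# Keys taken from the example entry above; the real file may accept others.
REQUIRED_KEYS = ("name", "repo", "path")
# Assumed shape of `repo`: "<owner>/<repository>" with no scheme or host.
REPO_PATTERN = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")


def entry_problems(entry: Dict[str, str]) -> List[str]:
    """Return human-readable problems found in one `packages.yml` entry."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]
    repo = entry.get("repo", "")
    if repo and not REPO_PATTERN.match(repo):
        problems.append("repo should look like <owner>/<repo>")
    if entry.get("path", "").startswith("/"):
        problems.append("path should be relative to the repository root")
    return problems
```

An empty list means the entry matches the shape used in this guide.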
### Submit a PR with your changes
Once you have completed these steps, you can submit a PR to the LangChain Monorepo with **only these changes**.

View File

@@ -4,12 +4,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"pagination_next: contributing/how_to/integrations/publish\n",
"pagination_prev: contributing/how_to/integrations/package\n",
"---\n",
"# How to add standard tests to an integration\n",
"\n",
"When creating either a custom class for yourself or a new tool to publish in a LangChain integration, it is important to add standard tests to ensure it works as expected. This guide will show you how to add standard tests to a tool, and you can **[Skip to the test templates](#standard-test-templates-per-component)** for implementing tests for each integration.\n",
"Whether you are creating a custom class for your own use or one to publish in a LangChain integration, it is important to add standard tests to ensure it works as expected. This guide shows you how to add standard tests to a custom chat model, and you can **[skip to the test templates](#standard-test-templates-per-component)** for implementing tests for each integration type.\n",
"\n",
"## Setup\n",
"\n",
"If you're coming from the [previous guide](../package), you have already installed these dependencies, and you can skip this section.\n",
"\n",
"First, let's install 2 dependencies:\n",
"\n",
"- `langchain-core` will define the interfaces we want to import to define our custom tool.\n",
@@ -20,45 +26,36 @@
"Because added tests in new versions of `langchain-tests` can break your CI/CD pipelines, we recommend pinning the \n",
"version of `langchain-tests` to avoid unexpected changes.\n",
"\n",
":::"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -U langchain-core langchain-tests pytest pytest-socket"
":::\n",
"\n",
"import Tabs from '@theme/Tabs';\n",
"import TabItem from '@theme/TabItem';\n",
"\n",
"<Tabs>\n",
" <TabItem value=\"poetry\" label=\"Poetry\" default>\n",
"If you followed the [previous guide](../package), you should already have these dependencies installed!\n",
"\n",
"```bash\n",
"poetry add langchain-core\n",
"poetry add --group test pytest pytest-socket langchain-tests==<latest_version>\n",
"```\n",
" </TabItem>\n",
" <TabItem value=\"pip\" label=\"Pip\">\n",
"```bash\n",
"pip install -U langchain-core pytest pytest-socket langchain-tests\n",
"\n",
"# install current package in editable mode\n",
"pip install --editable .\n",
"```\n",
" </TabItem>\n",
"</Tabs>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's say we're publishing a package, `langchain_parrot_link`, that exposes a\n",
"tool called `ParrotMultiplyTool`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# title=\"langchain_parrot_link/tools.py\"\n",
"from langchain_core.tools import BaseTool\n",
"\n",
"\n",
"class ParrotMultiplyTool(BaseTool):\n",
" name: str = \"ParrotMultiplyTool\"\n",
" description: str = (\n",
" \"Multiply two numbers like a parrot. Parrots always add \"\n",
" \"eighty for their matey.\"\n",
" )\n",
"\n",
" def _run(self, a: int, b: int) -> int:\n",
" return a * b + 80"
"Let's say we're publishing a package, `langchain_parrot_link`, that exposes the chat model from the [guide on implementing the package](../package). We can add the standard tests to the package by following the steps below."
]
},
{
@@ -68,133 +65,33 @@
"And we'll assume you've structured your package the same way as the main LangChain\n",
"packages:\n",
"\n",
"```\n",
"/\n",
"```plaintext\n",
"langchain-parrot-link/\n",
"├── langchain_parrot_link/\n",
"│ ── tools.py\n",
"└── tests/\n",
" ├── unit_tests/\n",
" ── test_tools.py\n",
" └── integration_tests/\n",
" └── test_tools.py\n",
"│ ├── __init__.py\n",
"│ └── chat_models.py\n",
"├── tests/\n",
"│ ├── __init__.py\n",
"│ └── test_chat_models.py\n",
"├── pyproject.toml\n",
"└── README.md\n",
"```\n",
"\n",
"## Add and configure standard tests\n",
"\n",
"There are 2 namespaces in the `langchain-tests` package: \n",
"\n",
"- [unit tests](../../../concepts/testing.mdx#unit-tests) (`langchain_tests.unit_tests`): designed to be used to test the tool in isolation and without access to external services\n",
"- [integration tests](../../../concepts/testing.mdx#unit-tests) (`langchain_tests.integration_tests`): designed to be used to test the tool with access to external services (in particular, the external service that the tool is designed to interact with).\n",
"- [unit tests](../../../concepts/testing.mdx#unit-tests) (`langchain_tests.unit_tests`): designed to be used to test the component in isolation and without access to external services\n",
"- [integration tests](../../../concepts/testing.mdx#integration-tests) (`langchain_tests.integration_tests`): designed to be used to test the component with access to external services (in particular, the external service that the component is designed to interact with).\n",
"\n",
"Both types of tests are implemented as [`pytest` class-based test suites](https://docs.pytest.org/en/7.1.x/getting-started.html#group-multiple-tests-in-a-class).\n",
"\n",
"By subclassing the base classes for each type of standard test (see below), you get all of the standard tests for that type, and you\n",
"can override the properties that the test suite uses to configure the tests.\n",
"\n",
"### Standard tools tests\n",
"### Standard chat model tests\n",
"\n",
"Here's how you would configure the standard unit tests for the custom tool, e.g. in `tests/test_tools.py`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"title": "tests/test_custom_tool.py"
},
"outputs": [],
"source": [
"# title=\"tests/unit_tests/test_tools.py\"\n",
"from typing import Type\n",
"\n",
"from langchain_parrot_link.tools import ParrotMultiplyTool\n",
"from langchain_tests.unit_tests import ToolsUnitTests\n",
"\n",
"\n",
"class TestParrotMultiplyToolUnit(ToolsUnitTests):\n",
" @property\n",
" def tool_constructor(self) -> Type[ParrotMultiplyTool]:\n",
" return ParrotMultiplyTool\n",
"\n",
" @property\n",
" def tool_constructor_params(self) -> dict:\n",
" # if your tool constructor instead required initialization arguments like\n",
" # `def __init__(self, some_arg: int):`, you would return those here\n",
" # as a dictionary, e.g.: `return {'some_arg': 42}`\n",
" return {}\n",
"\n",
" @property\n",
" def tool_invoke_params_example(self) -> dict:\n",
" \"\"\"\n",
" Returns a dictionary representing the \"args\" of an example tool call.\n",
"\n",
" This should NOT be a ToolCall dict - i.e. it should not\n",
" have {\"name\", \"id\", \"args\"} keys.\n",
" \"\"\"\n",
" return {\"a\": 2, \"b\": 3}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# title=\"tests/integration_tests/test_tools.py\"\n",
"from typing import Type\n",
"\n",
"from langchain_parrot_link.tools import ParrotMultiplyTool\n",
"from langchain_tests.integration_tests import ToolsIntegrationTests\n",
"\n",
"\n",
"class TestParrotMultiplyToolIntegration(ToolsIntegrationTests):\n",
" @property\n",
" def tool_constructor(self) -> Type[ParrotMultiplyTool]:\n",
" return ParrotMultiplyTool\n",
"\n",
" @property\n",
" def tool_constructor_params(self) -> dict:\n",
" # if your tool constructor instead required initialization arguments like\n",
" # `def __init__(self, some_arg: int):`, you would return those here\n",
" # as a dictionary, e.g.: `return {'some_arg': 42}`\n",
" return {}\n",
"\n",
" @property\n",
" def tool_invoke_params_example(self) -> dict:\n",
" \"\"\"\n",
" Returns a dictionary representing the \"args\" of an example tool call.\n",
"\n",
" This should NOT be a ToolCall dict - i.e. it should not\n",
" have {\"name\", \"id\", \"args\"} keys.\n",
" \"\"\"\n",
" return {\"a\": 2, \"b\": 3}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"and you would run these with the following commands from your project root\n",
"\n",
"```bash\n",
"# run unit tests without network access\n",
"pytest --disable-socket --allow-unix-socket tests/unit_tests\n",
"\n",
"# run integration tests\n",
"pytest tests/integration_tests\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Standard test templates per component:\n",
"\n",
"Above, we implement the **unit** and **integration** standard tests for a tool. Below are the templates for implementing the standard tests for each component:\n",
"\n",
"<details>\n",
" <summary>Chat Models</summary>"
"Here's how you would configure the standard unit tests for the custom chat model:"
]
},
{
@@ -217,7 +114,11 @@
"\n",
" @property\n",
" def chat_model_params(self) -> dict:\n",
" return {\"model\": \"bird-brain-001\", \"temperature\": 0}"
" return {\n",
" \"model\": \"bird-brain-001\",\n",
" \"temperature\": 0,\n",
" \"parrot_buffer_length\": 50,\n",
" }"
]
},
{
@@ -240,7 +141,110 @@
"\n",
" @property\n",
" def chat_model_params(self) -> dict:\n",
" return {\"model\": \"bird-brain-001\", \"temperature\": 0}"
" return {\n",
" \"model\": \"bird-brain-001\",\n",
" \"temperature\": 0,\n",
" \"parrot_buffer_length\": 50,\n",
" }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"and you would run these with the following commands from your project root\n",
"\n",
"<Tabs>\n",
" <TabItem value=\"poetry\" label=\"Poetry\" default>\n",
"\n",
"```bash\n",
"# run unit tests without network access\n",
"poetry run pytest --disable-socket --allow-unix-socket tests/unit_tests\n",
"\n",
"# run integration tests\n",
"poetry run pytest tests/integration_tests\n",
"```\n",
"\n",
" </TabItem>\n",
" <TabItem value=\"pip\" label=\"Pip\">\n",
"\n",
"```bash\n",
"# run unit tests without network access\n",
"pytest --disable-socket --allow-unix-socket tests/unit_tests\n",
"\n",
"# run integration tests\n",
"pytest tests/integration_tests\n",
"```\n",
"\n",
" </TabItem>\n",
"</Tabs>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Standard test templates per component:\n",
"\n",
"Above, we implement the **unit** and **integration** standard tests for a chat model. Below are the templates for implementing the standard tests for each component:\n",
"\n",
"<details>\n",
" <summary>Chat Models</summary>\n",
" <p>Note: The standard tests for chat models are implemented in the example in the main body of this guide too.</p>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# title=\"tests/unit_tests/test_chat_models.py\"\n",
"from typing import Type\n",
"\n",
"from langchain_parrot_link.chat_models import ChatParrotLink\n",
"from langchain_tests.unit_tests import ChatModelUnitTests\n",
"\n",
"\n",
"class TestChatParrotLinkUnit(ChatModelUnitTests):\n",
" @property\n",
" def chat_model_class(self) -> Type[ChatParrotLink]:\n",
" return ChatParrotLink\n",
"\n",
" @property\n",
" def chat_model_params(self) -> dict:\n",
" return {\n",
" \"model\": \"bird-brain-001\",\n",
" \"temperature\": 0,\n",
" \"parrot_buffer_length\": 50,\n",
" }"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# title=\"tests/integration_tests/test_chat_models.py\"\n",
"from typing import Type\n",
"\n",
"from langchain_parrot_link.chat_models import ChatParrotLink\n",
"from langchain_tests.integration_tests import ChatModelIntegrationTests\n",
"\n",
"\n",
"class TestChatParrotLinkIntegration(ChatModelIntegrationTests):\n",
" @property\n",
" def chat_model_class(self) -> Type[ChatParrotLink]:\n",
" return ChatParrotLink\n",
"\n",
" @property\n",
" def chat_model_params(self) -> dict:\n",
" return {\n",
" \"model\": \"bird-brain-001\",\n",
" \"temperature\": 0,\n",
" \"parrot_buffer_length\": 50,\n",
" }"
]
},
{
@@ -304,8 +308,7 @@
"source": [
"</details>\n",
"<details>\n",
" <summary>Tools/Toolkits</summary>\n",
" <p>Note: The standard tests for tools/toolkits are implemented in the example in the main body of this guide too.</p>"
" <summary>Tools/Toolkits</summary>"
]
},
{

View File

@@ -48,7 +48,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 4,
"id": "c5046e6a-8b09-4a99-b6e6-7a605aac5738",
"metadata": {
"tags": []
@@ -162,27 +162,32 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"id": "25ba32e5-5a6d-49f4-bb68-911827b84d61",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from typing import Any, AsyncIterator, Dict, Iterator, List, Optional\n",
"from typing import Any, Dict, Iterator, List, Optional\n",
"\n",
"from langchain_core.callbacks import (\n",
" AsyncCallbackManagerForLLMRun,\n",
" CallbackManagerForLLMRun,\n",
")\n",
"from langchain_core.language_models import BaseChatModel, SimpleChatModel\n",
"from langchain_core.messages import AIMessageChunk, BaseMessage, HumanMessage\n",
"from langchain_core.language_models import BaseChatModel\n",
"from langchain_core.messages import (\n",
" AIMessage,\n",
" AIMessageChunk,\n",
" BaseMessage,\n",
")\n",
"from langchain_core.messages.ai import UsageMetadata\n",
"from langchain_core.outputs import ChatGeneration, ChatGenerationChunk, ChatResult\n",
"from langchain_core.runnables import run_in_executor\n",
"from pydantic import Field\n",
"\n",
"\n",
"class CustomChatModelAdvanced(BaseChatModel):\n",
" \"\"\"A custom chat model that echoes the first `n` characters of the input.\n",
"class ChatParrotLink(BaseChatModel):\n",
" \"\"\"A custom chat model that echoes the first `parrot_buffer_length` characters\n",
" of the input.\n",
"\n",
" When contributing an implementation to LangChain, carefully document\n",
" the model including the initialization parameters, include\n",
@@ -193,16 +198,21 @@
"\n",
" .. code-block:: python\n",
"\n",
" model = CustomChatModel(n=2)\n",
" model = ChatParrotLink(parrot_buffer_length=2, model=\"bird-brain-001\")\n",
" result = model.invoke([HumanMessage(content=\"hello\")])\n",
" result = model.batch([[HumanMessage(content=\"hello\")],\n",
" [HumanMessage(content=\"world\")]])\n",
" \"\"\"\n",
"\n",
" model_name: str\n",
" model_name: str = Field(alias=\"model\")\n",
" \"\"\"The name of the model\"\"\"\n",
" n: int\n",
" parrot_buffer_length: int\n",
" \"\"\"The number of characters from the last message of the prompt to be echoed.\"\"\"\n",
" temperature: Optional[float] = None\n",
" max_tokens: Optional[int] = None\n",
" timeout: Optional[int] = None\n",
" stop: Optional[List[str]] = None\n",
" max_retries: int = 2\n",
"\n",
" def _generate(\n",
" self,\n",
@@ -229,13 +239,20 @@
" # Replace this with actual logic to generate a response from a list\n",
" # of messages.\n",
" last_message = messages[-1]\n",
" tokens = last_message.content[: self.n]\n",
" tokens = last_message.content[: self.parrot_buffer_length]\n",
" ct_input_tokens = sum(len(message.content) for message in messages)\n",
" ct_output_tokens = len(tokens)\n",
" message = AIMessage(\n",
" content=tokens,\n",
" additional_kwargs={}, # Used to add additional payload (e.g., function calling request)\n",
" additional_kwargs={}, # Used to add additional payload to the message\n",
" response_metadata={ # Use for response metadata\n",
" \"time_in_seconds\": 3,\n",
" },\n",
" usage_metadata={\n",
" \"input_tokens\": ct_input_tokens,\n",
" \"output_tokens\": ct_output_tokens,\n",
" \"total_tokens\": ct_input_tokens + ct_output_tokens,\n",
" },\n",
" )\n",
" ##\n",
"\n",
@@ -267,10 +284,21 @@
" run_manager: A run manager with callbacks for the LLM.\n",
" \"\"\"\n",
" last_message = messages[-1]\n",
" tokens = last_message.content[: self.n]\n",
" tokens = str(last_message.content[: self.parrot_buffer_length])\n",
" ct_input_tokens = sum(len(message.content) for message in messages)\n",
"\n",
" for token in tokens:\n",
" chunk = ChatGenerationChunk(message=AIMessageChunk(content=token))\n",
" usage_metadata = UsageMetadata(\n",
" {\n",
" \"input_tokens\": ct_input_tokens,\n",
" \"output_tokens\": 1,\n",
" \"total_tokens\": ct_input_tokens + 1,\n",
" }\n",
" )\n",
" ct_input_tokens = 0\n",
" chunk = ChatGenerationChunk(\n",
" message=AIMessageChunk(content=token, usage_metadata=usage_metadata)\n",
" )\n",
"\n",
" if run_manager:\n",
" # This is optional in newer versions of LangChain\n",
@@ -322,7 +350,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 5,
"id": "27689f30-dcd2-466b-ba9d-f60b7d434110",
"metadata": {
"tags": []
@@ -331,16 +359,16 @@
{
"data": {
"text/plain": [
"AIMessage(content='Meo', response_metadata={'time_in_seconds': 3}, id='run-ddb42bd6-4fdd-4bd2-8be5-e11b67d3ac29-0')"
"AIMessage(content='Meo', additional_kwargs={}, response_metadata={'time_in_seconds': 3}, id='run-cf11aeb6-8ab6-43d7-8c68-c1ef89b6d78e-0', usage_metadata={'input_tokens': 26, 'output_tokens': 3, 'total_tokens': 29})"
]
},
"execution_count": 6,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = CustomChatModelAdvanced(n=3, model_name=\"my_custom_model\")\n",
"model = ChatParrotLink(parrot_buffer_length=3, model=\"my_custom_model\")\n",
"\n",
"model.invoke(\n",
" [\n",
@@ -353,7 +381,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 6,
"id": "406436df-31bf-466b-9c3d-39db9d6b6407",
"metadata": {
"tags": []
@@ -362,10 +390,10 @@
{
"data": {
"text/plain": [
"AIMessage(content='hel', response_metadata={'time_in_seconds': 3}, id='run-4d3cc912-44aa-454b-977b-ca02be06c12e-0')"
"AIMessage(content='hel', additional_kwargs={}, response_metadata={'time_in_seconds': 3}, id='run-618e5ed4-d611-4083-8cf1-c270726be8d9-0', usage_metadata={'input_tokens': 5, 'output_tokens': 3, 'total_tokens': 8})"
]
},
"execution_count": 7,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -376,7 +404,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 7,
"id": "a72ffa46-6004-41ef-bbe4-56fa17a029e2",
"metadata": {
"tags": []
@@ -385,11 +413,11 @@
{
"data": {
"text/plain": [
"[AIMessage(content='hel', response_metadata={'time_in_seconds': 3}, id='run-9620e228-1912-4582-8aa1-176813afec49-0'),\n",
" AIMessage(content='goo', response_metadata={'time_in_seconds': 3}, id='run-1ce8cdf8-6f75-448e-82f7-1bb4a121df93-0')]"
"[AIMessage(content='hel', additional_kwargs={}, response_metadata={'time_in_seconds': 3}, id='run-eea4ed7d-d750-48dc-90c0-7acca1ff388f-0', usage_metadata={'input_tokens': 5, 'output_tokens': 3, 'total_tokens': 8}),\n",
" AIMessage(content='goo', additional_kwargs={}, response_metadata={'time_in_seconds': 3}, id='run-07cfc5c1-3c62-485f-b1e0-3d46e1547287-0', usage_metadata={'input_tokens': 7, 'output_tokens': 3, 'total_tokens': 10})]"
]
},
"execution_count": 8,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
@@ -400,7 +428,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 8,
"id": "3633be2c-2ea0-42f9-a72f-3b5240690b55",
"metadata": {
"tags": []
@@ -429,7 +457,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 9,
"id": "b7d73995-eeab-48c6-a7d8-32c98ba29fc2",
"metadata": {
"tags": []
@@ -458,7 +486,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 10,
"id": "17840eba-8ff4-4e73-8e4f-85f16eb1c9d0",
"metadata": {
"tags": []
@@ -468,20 +496,12 @@
"name": "stdout",
"output_type": "stream",
"text": [
"{'event': 'on_chat_model_start', 'run_id': '125a2a16-b9cd-40de-aa08-8aa9180b07d0', 'name': 'CustomChatModelAdvanced', 'tags': [], 'metadata': {}, 'data': {'input': 'cat'}}\n",
"{'event': 'on_chat_model_stream', 'run_id': '125a2a16-b9cd-40de-aa08-8aa9180b07d0', 'tags': [], 'metadata': {}, 'name': 'CustomChatModelAdvanced', 'data': {'chunk': AIMessageChunk(content='c', id='run-125a2a16-b9cd-40de-aa08-8aa9180b07d0')}}\n",
"{'event': 'on_chat_model_stream', 'run_id': '125a2a16-b9cd-40de-aa08-8aa9180b07d0', 'tags': [], 'metadata': {}, 'name': 'CustomChatModelAdvanced', 'data': {'chunk': AIMessageChunk(content='a', id='run-125a2a16-b9cd-40de-aa08-8aa9180b07d0')}}\n",
"{'event': 'on_chat_model_stream', 'run_id': '125a2a16-b9cd-40de-aa08-8aa9180b07d0', 'tags': [], 'metadata': {}, 'name': 'CustomChatModelAdvanced', 'data': {'chunk': AIMessageChunk(content='t', id='run-125a2a16-b9cd-40de-aa08-8aa9180b07d0')}}\n",
"{'event': 'on_chat_model_stream', 'run_id': '125a2a16-b9cd-40de-aa08-8aa9180b07d0', 'tags': [], 'metadata': {}, 'name': 'CustomChatModelAdvanced', 'data': {'chunk': AIMessageChunk(content='', response_metadata={'time_in_sec': 3}, id='run-125a2a16-b9cd-40de-aa08-8aa9180b07d0')}}\n",
"{'event': 'on_chat_model_end', 'name': 'CustomChatModelAdvanced', 'run_id': '125a2a16-b9cd-40de-aa08-8aa9180b07d0', 'tags': [], 'metadata': {}, 'data': {'output': AIMessageChunk(content='cat', response_metadata={'time_in_sec': 3}, id='run-125a2a16-b9cd-40de-aa08-8aa9180b07d0')}}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/eugene/src/langchain/libs/core/langchain_core/_api/beta_decorator.py:87: LangChainBetaWarning: This API is in beta and may change in the future.\n",
" warn_beta(\n"
"{'event': 'on_chat_model_start', 'run_id': '3f0b5501-5c78-45b3-92fc-8322a6a5024a', 'name': 'ChatParrotLink', 'tags': [], 'metadata': {}, 'data': {'input': 'cat'}, 'parent_ids': []}\n",
"{'event': 'on_chat_model_stream', 'run_id': '3f0b5501-5c78-45b3-92fc-8322a6a5024a', 'tags': [], 'metadata': {}, 'name': 'ChatParrotLink', 'data': {'chunk': AIMessageChunk(content='c', additional_kwargs={}, response_metadata={}, id='run-3f0b5501-5c78-45b3-92fc-8322a6a5024a', usage_metadata={'input_tokens': 3, 'output_tokens': 1, 'total_tokens': 4})}, 'parent_ids': []}\n",
"{'event': 'on_chat_model_stream', 'run_id': '3f0b5501-5c78-45b3-92fc-8322a6a5024a', 'tags': [], 'metadata': {}, 'name': 'ChatParrotLink', 'data': {'chunk': AIMessageChunk(content='a', additional_kwargs={}, response_metadata={}, id='run-3f0b5501-5c78-45b3-92fc-8322a6a5024a', usage_metadata={'input_tokens': 0, 'output_tokens': 1, 'total_tokens': 1})}, 'parent_ids': []}\n",
"{'event': 'on_chat_model_stream', 'run_id': '3f0b5501-5c78-45b3-92fc-8322a6a5024a', 'tags': [], 'metadata': {}, 'name': 'ChatParrotLink', 'data': {'chunk': AIMessageChunk(content='t', additional_kwargs={}, response_metadata={}, id='run-3f0b5501-5c78-45b3-92fc-8322a6a5024a', usage_metadata={'input_tokens': 0, 'output_tokens': 1, 'total_tokens': 1})}, 'parent_ids': []}\n",
"{'event': 'on_chat_model_stream', 'run_id': '3f0b5501-5c78-45b3-92fc-8322a6a5024a', 'tags': [], 'metadata': {}, 'name': 'ChatParrotLink', 'data': {'chunk': AIMessageChunk(content='', additional_kwargs={}, response_metadata={'time_in_sec': 3}, id='run-3f0b5501-5c78-45b3-92fc-8322a6a5024a')}, 'parent_ids': []}\n",
"{'event': 'on_chat_model_end', 'name': 'ChatParrotLink', 'run_id': '3f0b5501-5c78-45b3-92fc-8322a6a5024a', 'tags': [], 'metadata': {}, 'data': {'output': AIMessageChunk(content='cat', additional_kwargs={}, response_metadata={'time_in_sec': 3}, id='run-3f0b5501-5c78-45b3-92fc-8322a6a5024a', usage_metadata={'input_tokens': 3, 'output_tokens': 3, 'total_tokens': 6})}, 'parent_ids': []}\n"
]
}
],
@@ -547,7 +567,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": ".venv",
"language": "python",
"name": "python3"
},
@@ -561,7 +581,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.11.4"
}
},
"nbformat": 4,


@@ -140,6 +140,8 @@ TEMPLATE_MAP: dict[str, str] = {
"Retriever": "retrievers.ipynb",
}
_component_types_str = ", ".join(f"`{k}`" for k in TEMPLATE_MAP.keys())
@integration_cli.command()
def create_doc(
@@ -170,8 +172,7 @@ def create_doc(
str,
typer.Option(
help=(
"The type of component. Currently only 'ChatModel', "
"'DocumentLoader', 'VectorStore' supported."
f"The type of component. Currently supported: {_component_types_str}."
),
),
] = "ChatModel",
@@ -220,8 +221,7 @@ def create_doc(
docs_template = template_dir / TEMPLATE_MAP[component_type]
else:
raise ValueError(
f"Unrecognized {component_type=}. Expected one of 'ChatModel', "
f"'DocumentLoader', 'Tool'."
f"Unrecognized {component_type=}. Expected one of {_component_types_str}."
)
shutil.copy(docs_template, destination_path)


@@ -493,9 +493,13 @@ class ChatModelIntegrationTests(ChatModelTests):
message=AIMessage(
content="Output text",
usage_metadata={
"input_tokens": 0,
"output_tokens": 240,
"total_tokens": 590,
"input_tokens": (
num_input_tokens if is_first_chunk else 0
),
"output_tokens": 11,
"total_tokens": (
11 + num_input_tokens if is_first_chunk else 11
),
"input_token_details": {
"audio": 10,
"cache_creation": 200,
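The updated docstring example reports `num_input_tokens` only on the first streamed chunk. A rough sketch of why this matters, assuming a consumer that aggregates usage by summing the per-chunk counts: the helper names `chunk_usage` and `add_usage`, the prompt size of 5, and the three chunks are all illustrative assumptions, not part of `langchain_tests`.

```python
from typing import Dict, Iterable

Usage = Dict[str, int]


def chunk_usage(num_input_tokens: int, output_tokens: int, is_first_chunk: bool) -> Usage:
    """Build per-chunk usage, counting the prompt only on the first chunk."""
    input_tokens = num_input_tokens if is_first_chunk else 0
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
    }


def add_usage(chunks: Iterable[Usage]) -> Usage:
    """Sum usage dictionaries key by key."""
    total = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    for usage in chunks:
        for key in total:
            total[key] += usage[key]
    return total


# Prompt counted once: the aggregate matches the real prompt size.
first_only = add_usage(chunk_usage(5, 11, is_first_chunk=(i == 0)) for i in range(3))

# Prompt repeated on every chunk: the aggregate over-counts the input.
repeated = add_usage(chunk_usage(5, 11, is_first_chunk=True) for _ in range(3))
```

Under these assumptions `first_only` reports 5 input tokens, while `repeated` reports 15 for the same prompt.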


@@ -0,0 +1,167 @@
from typing import Any, Dict, Iterator, List, Optional

from langchain_core.callbacks import (
    CallbackManagerForLLMRun,
)
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import (
    AIMessage,
    AIMessageChunk,
    BaseMessage,
)
from langchain_core.messages.ai import UsageMetadata
from langchain_core.outputs import ChatGeneration, ChatGenerationChunk, ChatResult
from pydantic import Field


class ChatParrotLink(BaseChatModel):
    """A custom chat model that echoes the first `parrot_buffer_length` characters
    of the input.

    When contributing an implementation to LangChain, carefully document
    the model including the initialization parameters, include
    an example of how to initialize the model and include any relevant
    links to the underlying model's documentation or API.

    Example:

        .. code-block:: python

            model = ChatParrotLink(parrot_buffer_length=2, model="bird-brain-001")
            result = model.invoke([HumanMessage(content="hello")])
            result = model.batch([[HumanMessage(content="hello")],
                                 [HumanMessage(content="world")]])
    """

    model_name: str = Field(alias="model")
    """The name of the model"""
    parrot_buffer_length: int
    """The number of characters from the last message of the prompt to be echoed."""
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None
    timeout: Optional[int] = None
    stop: Optional[List[str]] = None
    max_retries: int = 2

    def _generate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> ChatResult:
        """Override the _generate method to implement the chat model logic.

        This can be a call to an API, a call to a local model, or any other
        implementation that generates a response to the input prompt.

        Args:
            messages: the prompt composed of a list of messages.
            stop: a list of strings on which the model should stop generating.
                  If generation stops due to a stop token, the stop token itself
                  SHOULD BE INCLUDED as part of the output. This is not enforced
                  across models right now, but it's a good practice to follow since
                  it makes it much easier to parse the output of the model
                  downstream and understand why generation stopped.
            run_manager: A run manager with callbacks for the LLM.
        """
        # Replace this with actual logic to generate a response from a list
        # of messages.
        last_message = messages[-1]
        tokens = last_message.content[: self.parrot_buffer_length]
        ct_input_tokens = sum(len(message.content) for message in messages)
        ct_output_tokens = len(tokens)
        message = AIMessage(
            content=tokens,
            additional_kwargs={},  # Used to add additional payload to the message
            response_metadata={  # Use for response metadata
                "time_in_seconds": 3,
            },
            usage_metadata={
                "input_tokens": ct_input_tokens,
                "output_tokens": ct_output_tokens,
                "total_tokens": ct_input_tokens + ct_output_tokens,
            },
        )
        ##

        generation = ChatGeneration(message=message)
        return ChatResult(generations=[generation])

    def _stream(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> Iterator[ChatGenerationChunk]:
        """Stream the output of the model.

        This method should be implemented if the model can generate output
        in a streaming fashion. If the model does not support streaming,
        do not implement it. In that case streaming requests will be automatically
        handled by the _generate method.

        Args:
            messages: the prompt composed of a list of messages.
            stop: a list of strings on which the model should stop generating.
                  If generation stops due to a stop token, the stop token itself
                  SHOULD BE INCLUDED as part of the output. This is not enforced
                  across models right now, but it's a good practice to follow since
                  it makes it much easier to parse the output of the model
                  downstream and understand why generation stopped.
            run_manager: A run manager with callbacks for the LLM.
        """
        last_message = messages[-1]
        tokens = str(last_message.content[: self.parrot_buffer_length])
        ct_input_tokens = sum(len(message.content) for message in messages)

        for token in tokens:
            usage_metadata = UsageMetadata(
                {
                    "input_tokens": ct_input_tokens,
                    "output_tokens": 1,
                    "total_tokens": ct_input_tokens + 1,
                }
            )
            # Report the input tokens on the first chunk only.
            ct_input_tokens = 0
            chunk = ChatGenerationChunk(
                message=AIMessageChunk(content=token, usage_metadata=usage_metadata)
            )

            if run_manager:
                # This is optional in newer versions of LangChain
                # The on_llm_new_token will be called automatically
                run_manager.on_llm_new_token(token, chunk=chunk)

            yield chunk

        # Let's add some other information (e.g., response metadata)
        chunk = ChatGenerationChunk(
            message=AIMessageChunk(content="", response_metadata={"time_in_sec": 3})
        )
        if run_manager:
            # This is optional in newer versions of LangChain
            # The on_llm_new_token will be called automatically
            # Pass an empty string: this final chunk carries no text, and
            # `token` would be unbound if the loop above never ran.
            run_manager.on_llm_new_token("", chunk=chunk)
        yield chunk

    @property
    def _llm_type(self) -> str:
        """Get the type of language model used by this chat model."""
        return "echoing-chat-model-advanced"

    @property
    def _identifying_params(self) -> Dict[str, Any]:
        """Return a dictionary of identifying parameters.

        This information is used by the LangChain callback system, which
        is used for tracing and makes it possible to monitor LLMs.
        """
        return {
            # The model name allows users to specify custom token counting
            # rules in LLM monitoring applications (e.g., in LangSmith users
            # can provide per token pricing for their model and monitor
            # costs for the given LLM.)
            "model_name": self.model_name,
        }
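The `_generate` and `_stream` methods above should agree on both the echoed text and the token accounting. A dependency-free sketch of that invariant is below; it mirrors the character-counting logic with plain strings instead of LangChain message objects, and the function names `parrot_generate` and `parrot_stream` are hypothetical stand-ins, not LangChain APIs.

```python
from typing import Dict, Iterator, List, Tuple

Usage = Dict[str, int]


def parrot_generate(contents: List[str], buffer_length: int) -> Tuple[str, Usage]:
    """Echo the first `buffer_length` characters of the last message."""
    echoed = contents[-1][:buffer_length]
    input_tokens = sum(len(content) for content in contents)
    usage = {
        "input_tokens": input_tokens,
        "output_tokens": len(echoed),
        "total_tokens": input_tokens + len(echoed),
    }
    return echoed, usage


def parrot_stream(contents: List[str], buffer_length: int) -> Iterator[Tuple[str, Usage]]:
    """Yield one character at a time, reporting the prompt size once."""
    input_tokens = sum(len(content) for content in contents)
    for char in contents[-1][:buffer_length]:
        yield char, {
            "input_tokens": input_tokens,
            "output_tokens": 1,
            "total_tokens": input_tokens + 1,
        }
        input_tokens = 0  # later chunks carry no input tokens


text, usage = parrot_generate(["hello"], 3)
pieces = list(parrot_stream(["hello"], 3))
streamed_text = "".join(char for char, _ in pieces)
streamed_usage = {key: sum(u[key] for _, u in pieces) for key in usage}
```

For the prompt `"hello"` with a buffer of 3, both paths produce `"hel"` with 5 input, 3 output, and 8 total tokens, which matches the `usage_metadata` shown in the notebook output earlier in this diff.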


@@ -0,0 +1,30 @@
"""
Test the standard tests on the custom chat model in the docs
"""

from typing import Type

from langchain_tests.integration_tests import ChatModelIntegrationTests
from langchain_tests.unit_tests import ChatModelUnitTests

from .custom_chat_model import ChatParrotLink


class TestChatParrotLinkUnit(ChatModelUnitTests):
    @property
    def chat_model_class(self) -> Type[ChatParrotLink]:
        return ChatParrotLink

    @property
    def chat_model_params(self) -> dict:
        return {"model": "bird-brain-001", "temperature": 0, "parrot_buffer_length": 50}


class TestChatParrotLinkIntegration(ChatModelIntegrationTests):
    @property
    def chat_model_class(self) -> Type[ChatParrotLink]:
        return ChatParrotLink

    @property
    def chat_model_params(self) -> dict:
        return {"model": "bird-brain-001", "temperature": 0, "parrot_buffer_length": 50}