diff --git a/docs/docs/contributing/how_to/integrations/index.mdx b/docs/docs/contributing/how_to/integrations/index.mdx
index eeb0f73b3fd..d8cdf2788c0 100644
--- a/docs/docs/contributing/how_to/integrations/index.mdx
+++ b/docs/docs/contributing/how_to/integrations/index.mdx
@@ -4,14 +4,16 @@ pagination_next: contributing/how_to/integrations/package

 # Contribute Integrations

-LangChain integrations are packages that provide access to language models, vector stores, and other components that can be used in LangChain.
+Integrations are a core component of LangChain.
+LangChain provides standard interfaces for several different components (language models, vector stores, etc.) that are crucial when building LLM applications.

-This guide will walk you through how to contribute new integrations to LangChain, by
-publishing an integration package to PyPi, and adding documentation for it
-to the LangChain Monorepo.
-These instructions will evolve over the next few months as we improve our integration
-processes.
+## Why contribute an integration to LangChain?
+
+- **Discoverability:** LangChain is the most used framework for building LLM applications, with over 20 million monthly downloads. LangChain integrations are discoverable by a large community of GenAI builders.
+- **Interoperability:** LangChain components expose a standard interface, allowing developers to easily swap them for each other. If you implement a LangChain integration, any developer using a different component will easily be able to swap yours in.
+- **Best practices:** Through their standard interface, LangChain components encourage and facilitate best practices (streaming, async, etc.).
+

 ## Components to Integrate

@@ -22,8 +24,7 @@

 supported in LangChain

 :::

-While any component can be integrated into LangChain, at this time we are only accepting
-new integrations in the docs of the following kinds:
+While any component can be integrated into LangChain, there are specific types of integrations we encourage more than others:



@@ -60,18 +61,30 @@ new integrations in the docs of the following kinds:

 ## How to contribute an integration

-The only step necessary to "be" a LangChain integration is to add documentation
-that will render on this site (https://python.langchain.com/).
+In order to contribute an integration, you should follow these steps:

-As a prerequisite to adding your integration to our documentation, you must:
-
-1. Confirm that your integration is in the [list of components](#components-to-integrate) we are currently accepting.
-2. [Implement your package](./package.mdx) and publish it to a public github repository.
+1. Confirm that your integration is in the [list of components](#components-to-integrate) we are currently encouraging.
+2. [Implement your package](/docs/contributing/how_to/integrations/package/) and publish it to a public GitHub repository.
 3. [Implement the standard tests](./standard_tests) for your integration and successfully run them.
-4. [Publish your integration](./publish.mdx) by publishing the package to PyPi and add docs in the `docs/docs/integrations` directory of the LangChain monorepo.
+4. [Publish your integration](./publish.mdx) by publishing the package to PyPI.
+5. [Optional] Open and merge a PR to add documentation for your integration to the official LangChain docs in the `docs/docs/integrations` directory of the LangChain monorepo.
+6. [Optional] Engage with the LangChain team for joint co-marketing ([see below](#co-marketing)).

-Once you have completed these steps, you can submit a PR to the LangChain monorepo to add your integration to the documentation.
+## Co-Marketing
+
+With over 20 million monthly downloads, LangChain has a large audience of developers building LLM applications.
+Besides just adding integrations, we also like to show developers examples of cool tools or APIs they can use.
+
+While traditionally called "co-marketing", we like to think of this more as "co-education".
+For that reason, while we are happy to highlight your integration through our social media channels, we prefer to highlight examples that also serve some educational purpose.
+Our main social media channels are Twitter and LinkedIn.
+
+Here are some heuristics for types of content we are excited to promote:
+
+- **Integration announcement:** If you announce the integration with a link to the LangChain documentation page, we are happy to re-tweet/re-share on Twitter/LinkedIn.
+- **Educational content:** We highlight good educational content on the weekends: if you write a good blog post or make a good YouTube video, we are happy to share it! Note that we prefer content that is NOT framed as "here's how to use integration XYZ", but rather "here's how to do ABC", as we find that is more educational and helpful for developers.
+- **End-to-end applications:** End-to-end applications are great resources for developers looking to build. We prefer to highlight applications that are more complex/agentic in nature, and that use [LangGraph](https://github.com/langchain-ai/langgraph) as the orchestration framework. We get particularly excited about anything involving long-term memory, human-in-the-loop interaction patterns, or multi-agent architectures.
+- **Research:** We love highlighting novel research, whether it is built on top of LangChain or integrates with it.

 ## Further Reading

-
-To get started, let's learn [how to bootstrap a new integration package](./package.mdx) for LangChain.
+To get started, let's learn [how to implement an integration package](/docs/contributing/how_to/integrations/package/) for LangChain.
diff --git a/docs/docs/contributing/how_to/integrations/package.mdx b/docs/docs/contributing/how_to/integrations/package.mdx
index 8189ce97f8b..2b5bb97865e 100644
--- a/docs/docs/contributing/how_to/integrations/package.mdx
+++ b/docs/docs/contributing/how_to/integrations/package.mdx
@@ -2,23 +2,117 @@ pagination_next: contributing/how_to/integrations/standard_tests
 pagination_prev: contributing/how_to/integrations/index
 ---

-# How to bootstrap a new integration package
+# How to implement an integration package

-This guide walks through the process of publishing a new LangChain integration
-package to PyPi.
+This guide walks through the process of implementing a LangChain integration
+package.

 Integration packages are just Python packages that can be installed with
 `pip install <your-package>`, which contain classes that are compatible
 with LangChain's core interfaces.

+We will cover:
+
+1. How to implement components, such as [chat models](/docs/concepts/chat_models/) and [vector stores](/docs/concepts/vectorstores/), that adhere
+to the LangChain interface;
+2. (Optional) How to bootstrap a new integration package.
+
+## Implementing LangChain components
+
+LangChain components are subclasses of base classes in [langchain-core](/docs/concepts/architecture/#langchain-core).
+Examples include [chat models](/docs/concepts/chat_models/),
+[vector stores](/docs/concepts/vectorstores/), [tools](/docs/concepts/tools/),
+[embedding models](/docs/concepts/embedding_models/), and [retrievers](/docs/concepts/retrievers/).
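+
+For example, because `BaseChatModel` implements LangChain's standard `Runnable`
+interface, a chat model subclass that fills in a few required methods automatically
+gets `invoke`, `batch`, `stream`, and their async variants. As a rough sketch of the
+user-facing result, assuming the hypothetical `langchain-parrot-link` package that
+the rest of this guide builds:
+
+```python
+from langchain_parrot_link import ChatParrotLink
+
+# Any chat model is constructed and called the same way, so users can
+# swap this in for another provider's model with minimal code changes.
+model = ChatParrotLink(model="bird-brain-001", parrot_buffer_length=3)
+
+model.invoke("Hello, world!")           # single call
+model.batch(["Hello!", "Goodbye!"])     # batched calls
+for chunk in model.stream("Hello!"):    # streamed output
+    print(chunk.content, end="")
+```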
+
+Your integration package will typically implement a subclass of at least one of these
+components. Expand the sections below to see details on each.
+
+<details>
+    <summary>Chat models</summary>
+
+Refer to the [Custom Chat Model Guide](/docs/how_to/custom_chat_model) for
+detail on a starter chat model [implementation](/docs/how_to/custom_chat_model/#implementation).
+
+:::tip
+
+The model from the [Custom Chat Model Guide](/docs/how_to/custom_chat_model) is tested
+against the standard unit and integration tests in the LangChain GitHub repository.
+You can also access that implementation directly from GitHub
+[here](https://github.com/langchain-ai/langchain/blob/master/libs/standard-tests/tests/unit_tests/custom_chat_model.py).
+
+:::
+
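+If you just want a feel for the required surface, here is a heavily trimmed sketch of
+the `ChatParrotLink` model from that guide. This is illustrative only: the full,
+tested implementation linked above also covers streaming and usage metadata.
+
+```python
+from typing import Any, List, Optional
+
+from langchain_core.callbacks import CallbackManagerForLLMRun
+from langchain_core.language_models import BaseChatModel
+from langchain_core.messages import AIMessage, BaseMessage
+from langchain_core.outputs import ChatGeneration, ChatResult
+from pydantic import Field
+
+
+class ChatParrotLink(BaseChatModel):
+    """Chat model that echoes the first `parrot_buffer_length` characters of input."""
+
+    model_name: str = Field(alias="model")
+    parrot_buffer_length: int
+
+    def _generate(
+        self,
+        messages: List[BaseMessage],
+        stop: Optional[List[str]] = None,
+        run_manager: Optional[CallbackManagerForLLMRun] = None,
+        **kwargs: Any,
+    ) -> ChatResult:
+        # A real integration would call the provider's API here.
+        tokens = messages[-1].content[: self.parrot_buffer_length]
+        message = AIMessage(content=tokens)
+        return ChatResult(generations=[ChatGeneration(message=message)])
+
+    @property
+    def _llm_type(self) -> str:
+        # Identifier used by the LangChain callback/tracing system.
+        return "chat-parrot-link"
+```
+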
+</details>
+
+<details>
+    <summary>Vector stores</summary>
+
+Your vector store implementation will depend on your chosen database technology.
+`langchain-core` includes a minimal
+[in-memory vector store](https://python.langchain.com/api_reference/core/vectorstores/langchain_core.vectorstores.in_memory.InMemoryVectorStore.html)
+that we can use as a guide. You can access the code [here](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/vectorstores/in_memory.py).
+
+All vector stores must inherit from the [VectorStore](https://python.langchain.com/api_reference/core/vectorstores/langchain_core.vectorstores.base.VectorStore.html)
+base class. This interface consists of methods for writing, deleting, and searching
+for documents in the vector store.
+
+`VectorStore` supports a variety of synchronous and asynchronous search types (e.g.,
+nearest-neighbor or maximum marginal relevance), as well as interfaces for adding
+documents to the store. See the [API Reference](https://python.langchain.com/api_reference/core/vectorstores/langchain_core.vectorstores.base.VectorStore.html)
+for all supported methods. The required methods are tabulated below:
+
+| Method/Property         | Description                                            |
+|-------------------------|--------------------------------------------------------|
+| `add_documents`         | Add documents to the vector store.                     |
+| `delete`                | Delete selected documents from vector store (by IDs).  |
+| `get_by_ids`            | Get selected documents from vector store (by IDs).     |
+| `similarity_search`     | Get documents most similar to a query.                 |
+| `embeddings` (property) | Embeddings object for vector store.                    |
+| `from_texts`            | Instantiate vector store by adding texts.              |
+
+Note that `InMemoryVectorStore` implements some optional search types, as well as
+convenience methods for loading and dumping the object to a file, but these are not
+necessary for all implementations.
+
+:::tip
+
+The [in-memory vector store](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/vectorstores/in_memory.py)
+is tested against the standard tests in the LangChain GitHub repository.
+
+:::
+
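+To make the required methods concrete, here is a minimal sketch that keeps all state
+in Python dicts. The `ParrotVectorStore` name is a placeholder; a real integration
+would delegate each of these methods to your database's client instead.
+
+```python
+import uuid
+from typing import Any, Dict, List, Optional, Sequence
+
+from langchain_core.documents import Document
+from langchain_core.embeddings import Embeddings
+from langchain_core.vectorstores import VectorStore
+
+
+class ParrotVectorStore(VectorStore):
+    """Hypothetical vector store backed by in-memory dicts."""
+
+    def __init__(self, embedding: Embeddings) -> None:
+        self._embedding = embedding
+        self._docs: Dict[str, Document] = {}
+        self._vectors: Dict[str, List[float]] = {}
+
+    @property
+    def embeddings(self) -> Embeddings:
+        return self._embedding
+
+    def add_documents(
+        self, documents: List[Document], ids: Optional[List[str]] = None, **kwargs: Any
+    ) -> List[str]:
+        ids = ids or [str(uuid.uuid4()) for _ in documents]
+        vectors = self._embedding.embed_documents([d.page_content for d in documents])
+        for id_, doc, vector in zip(ids, documents, vectors):
+            self._docs[id_] = doc
+            self._vectors[id_] = vector
+        return ids
+
+    def delete(self, ids: Optional[List[str]] = None, **kwargs: Any) -> None:
+        for id_ in ids or []:
+            self._docs.pop(id_, None)
+            self._vectors.pop(id_, None)
+
+    def get_by_ids(self, ids: Sequence[str]) -> List[Document]:
+        return [self._docs[id_] for id_ in ids if id_ in self._docs]
+
+    def similarity_search(self, query: str, k: int = 4, **kwargs: Any) -> List[Document]:
+        # Rank stored vectors by dot-product similarity with the query vector.
+        query_vector = self._embedding.embed_query(query)
+        ranked = sorted(
+            self._vectors,
+            key=lambda id_: sum(q * v for q, v in zip(query_vector, self._vectors[id_])),
+            reverse=True,
+        )
+        return [self._docs[id_] for id_ in ranked[:k]]
+
+    @classmethod
+    def from_texts(
+        cls,
+        texts: List[str],
+        embedding: Embeddings,
+        metadatas: Optional[List[dict]] = None,
+        **kwargs: Any,
+    ) -> "ParrotVectorStore":
+        store = cls(embedding=embedding)
+        metadatas = metadatas or [{} for _ in texts]
+        docs = [Document(page_content=t, metadata=m) for t, m in zip(texts, metadatas)]
+        store.add_documents(docs)
+        return store
+```
+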
+</details>
+
+## (Optional) Bootstrapping a new integration package
+
 In this guide, we will be using [Poetry](https://python-poetry.org/) for dependency
 management and packaging, and you're welcome to use any other tools you prefer.

-## **Prerequisites**
+### Prerequisites

 - [GitHub](https://github.com) account
 - [PyPi](https://pypi.org/) account

-## Boostrapping a new Python package with Poetry
+### Bootstrapping a new Python package with Poetry

 First, install Poetry:

@@ -64,7 +158,7 @@ poetry install --with test

 You're now ready to start writing your integration package!

-## Writing your integration
+### Writing your integration

 Let's say you're building a simple integration package that provides a `ChatParrotLink`
 chat model integration for LangChain. Here's a simple example of what your project

@@ -86,183 +180,10 @@
 All of these files should already exist from step 1, except for
 `chat_models.py` and `test_chat_models.py`! We will implement `test_chat_models.py`
 later, following the [standard tests](../standard_tests) guide.

-To implement `chat_models.py`, let's copy the implementation from our
-[Custom Chat Model Guide](../../../../how_to/custom_chat_model).
+For `chat_models.py`, simply paste the full chat model implementation from the
+[Custom Chat Model Guide](/docs/how_to/custom_chat_model), as discussed [above](#implementing-langchain-components).

-<details>
- chat_models.py -```python title="langchain_parrot_link/chat_models.py" -from typing import Any, Dict, Iterator, List, Optional - -from langchain_core.callbacks import ( - CallbackManagerForLLMRun, -) -from langchain_core.language_models import BaseChatModel -from langchain_core.messages import ( - AIMessage, - AIMessageChunk, - BaseMessage, -) -from langchain_core.messages.ai import UsageMetadata -from langchain_core.outputs import ChatGeneration, ChatGenerationChunk, ChatResult -from pydantic import Field - - -class ChatParrotLink(BaseChatModel): - """A custom chat model that echoes the first `parrot_buffer_length` characters - of the input. - - When contributing an implementation to LangChain, carefully document - the model including the initialization parameters, include - an example of how to initialize the model and include any relevant - links to the underlying models documentation or API. - - Example: - - .. code-block:: python - - model = ChatParrotLink(parrot_buffer_length=2, model="bird-brain-001") - result = model.invoke([HumanMessage(content="hello")]) - result = model.batch([[HumanMessage(content="hello")], - [HumanMessage(content="world")]]) - """ - - model_name: str = Field(alias="model") - """The name of the model""" - parrot_buffer_length: int - """The number of characters from the last message of the prompt to be echoed.""" - temperature: Optional[float] = None - max_tokens: Optional[int] = None - timeout: Optional[int] = None - stop: Optional[List[str]] = None - max_retries: int = 2 - - def _generate( - self, - messages: List[BaseMessage], - stop: Optional[List[str]] = None, - run_manager: Optional[CallbackManagerForLLMRun] = None, - **kwargs: Any, - ) -> ChatResult: - """Override the _generate method to implement the chat model logic. - - This can be a call to an API, a call to a local model, or any other - implementation that generates a response to the input prompt. - - Args: - messages: the prompt composed of a list of messages. - stop: a list of strings on which the model should stop generating. - If generation stops due to a stop token, the stop token itself - SHOULD BE INCLUDED as part of the output. This is not enforced - across models right now, but it's a good practice to follow since - it makes it much easier to parse the output of the model - downstream and understand why generation stopped. - run_manager: A run manager with callbacks for the LLM. - """ - # Replace this with actual logic to generate a response from a list - # of messages. - last_message = messages[-1] - tokens = last_message.content[: self.parrot_buffer_length] - ct_input_tokens = sum(len(message.content) for message in messages) - ct_output_tokens = len(tokens) - message = AIMessage( - content=tokens, - additional_kwargs={}, # Used to add additional payload to the message - response_metadata={ # Use for response metadata - "time_in_seconds": 3, - }, - usage_metadata={ - "input_tokens": ct_input_tokens, - "output_tokens": ct_output_tokens, - "total_tokens": ct_input_tokens + ct_output_tokens, - }, - ) - ## - - generation = ChatGeneration(message=message) - return ChatResult(generations=[generation]) - - def _stream( - self, - messages: List[BaseMessage], - stop: Optional[List[str]] = None, - run_manager: Optional[CallbackManagerForLLMRun] = None, - **kwargs: Any, - ) -> Iterator[ChatGenerationChunk]: - """Stream the output of the model. - - This method should be implemented if the model can generate output - in a streaming fashion. 
If the model does not support streaming, - do not implement it. In that case streaming requests will be automatically - handled by the _generate method. - - Args: - messages: the prompt composed of a list of messages. - stop: a list of strings on which the model should stop generating. - If generation stops due to a stop token, the stop token itself - SHOULD BE INCLUDED as part of the output. This is not enforced - across models right now, but it's a good practice to follow since - it makes it much easier to parse the output of the model - downstream and understand why generation stopped. - run_manager: A run manager with callbacks for the LLM. - """ - last_message = messages[-1] - tokens = str(last_message.content[: self.parrot_buffer_length]) - ct_input_tokens = sum(len(message.content) for message in messages) - - for token in tokens: - usage_metadata = UsageMetadata( - { - "input_tokens": ct_input_tokens, - "output_tokens": 1, - "total_tokens": ct_input_tokens + 1, - } - ) - ct_input_tokens = 0 - chunk = ChatGenerationChunk( - message=AIMessageChunk(content=token, usage_metadata=usage_metadata) - ) - - if run_manager: - # This is optional in newer versions of LangChain - # The on_llm_new_token will be called automatically - run_manager.on_llm_new_token(token, chunk=chunk) - - yield chunk - - # Let's add some other information (e.g., response metadata) - chunk = ChatGenerationChunk( - message=AIMessageChunk(content="", response_metadata={"time_in_sec": 3}) - ) - if run_manager: - # This is optional in newer versions of LangChain - # The on_llm_new_token will be called automatically - run_manager.on_llm_new_token(token, chunk=chunk) - yield chunk - - @property - def _llm_type(self) -> str: - """Get the type of language model used by this chat model.""" - return "echoing-chat-model-advanced" - - @property - def _identifying_params(self) -> Dict[str, Any]: - """Return a dictionary of identifying parameters. - - This information is used by the LangChain callback system, which - is used for tracing purposes make it possible to monitor LLMs. - """ - return { - # The model name allows users to specify custom token counting - # rules in LLM monitoring applications (e.g., in LangSmith users - # can provide per token pricing for their model and monitor - # costs for the given LLM.) - "model_name": self.model_name, - } -``` -
-</details>
-
-## Push your package to a public Github repository
+### Push your package to a public GitHub repository

 This is only required if you want to publish your integration in the LangChain
 documentation.
diff --git a/docs/docs/contributing/how_to/integrations/standard_tests.ipynb b/docs/docs/contributing/how_to/integrations/standard_tests.ipynb
index 8bee8a45dd2..a01dc8a60a3 100644
--- a/docs/docs/contributing/how_to/integrations/standard_tests.ipynb
+++ b/docs/docs/contributing/how_to/integrations/standard_tests.ipynb
@@ -219,6 +219,41 @@
    "\n",
    "Note: The standard tests for chat models are implemented in the example in the main body of this guide too.\n",
    "\n",
    "</details>
" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Chat model standard tests test a range of behaviors, from the most basic requirements (generating a response to a query) to optional capabilities like multi-modal support and tool-calling. For a test run to be successful:\n", + "\n", + "1. If a feature is intended to be supported by the model, it should pass;\n", + "2. If a feature is not intended to be supported by the model, it should be skipped.\n", + "\n", + "Tests for \"optional\" capabilities are controlled via a set of properties that can be overridden on the test model subclass.\n", + "\n", + "You can see the entire list of properties in the API reference [here](https://python.langchain.com/api_reference/standard_tests/unit_tests/langchain_tests.unit_tests.chat_models.ChatModelTests.html). These properties are shared by both unit and integration tests.\n", + "\n", + "For example, to enable integration tests for image inputs, we can implement\n", + "\n", + "```python\n", + "@property\n", + "def supports_image_inputs(self) -> bool:\n", + " return True\n", + "```\n", + "\n", + "on the integration test class.\n", + "\n", + ":::note\n", + "\n", + "Details on what tests are run, how each test can be skipped, and troubleshooting tips for each test can be found in the API references. See details:\n", + "\n", + "- [Unit tests API reference](https://python.langchain.com/api_reference/standard_tests/unit_tests/langchain_tests.unit_tests.chat_models.ChatModelUnitTests.html)\n", + "- [Integration tests API reference](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.html)\n", + "\n", + ":::\n", + "\n", + "Unit test example:" + ] + }, { "cell_type": "code", "execution_count": null, @@ -246,6 +281,13 @@ " }" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Integration test example:" + ] + }, { "cell_type": "code", "execution_count": null, @@ -418,6 +460,14 @@ " Vector Stores" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here's how you would configure the standard tests for a typical vector store (using\n", + "`ParrotVectorStore` as a placeholder):" + ] + }, { "cell_type": "code", "execution_count": null, @@ -465,6 +515,59 @@ " pass" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are separate suites for testing synchronous and asynchronous methods.\n", + "Configuring the tests consists of implementing pytest fixtures for setting up an\n", + "empty vector store and tearing down the vector store after the test run ends.\n", + "\n", + "For example, below is the `ReadWriteTestSuite` for the [Chroma](https://python.langchain.com/docs/integrations/vectorstores/chroma/)\n", + "integration:\n", + "\n", + "```python\n", + "from typing import Generator\n", + "\n", + "import pytest\n", + "from langchain_core.vectorstores import VectorStore\n", + "from langchain_tests.integration_tests.vectorstores import ReadWriteTestSuite\n", + "\n", + "from langchain_chroma import Chroma\n", + "\n", + "\n", + "class TestSync(ReadWriteTestSuite):\n", + " @pytest.fixture()\n", + " def vectorstore(self) -> Generator[VectorStore, None, None]: # type: ignore\n", + " \"\"\"Get an empty vectorstore.\"\"\"\n", + " store = Chroma(embedding_function=self.get_embeddings())\n", + " try:\n", + " yield store\n", + " finally:\n", + " store.delete_collection()\n", + " pass\n", + "```\n", + "\n", + "Note that before the initial `yield`, we 
instantiate the vector store with an\n",
+    "[embeddings](/docs/concepts/embedding_models/) object. This is a pre-defined\n",
+    "[\"fake\" embeddings model](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.vectorstores.ReadWriteTestSuite.html#langchain_tests.integration_tests.vectorstores.ReadWriteTestSuite.get_embeddings)\n",
+    "that will generate short, arbitrary vectors for documents. You can use a different\n",
+    "embeddings object if desired.\n",
+    "\n",
+    "In the `finally` block, we call whatever integration-specific logic is needed to\n",
+    "bring the vector store to a clean state. This teardown logic is executed in\n",
+    "between each test, even when tests fail.\n",
+    "\n",
+    ":::note\n",
+    "\n",
+    "Details on what tests are run, how each test can be skipped, and troubleshooting tips for each test can be found in the API references. See details:\n",
+    "\n",
+    "- [Sync tests API reference](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.vectorstores.ReadWriteTestSuite.html)\n",
+    "- [Async tests API reference](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.vectorstores.AsyncReadWriteTestSuite.html)\n",
+    "\n",
+    ":::"
+   ]
+  },
  {
   "cell_type": "markdown",
   "metadata": {},