x

nit
docs[patch]: add vector store contribution guide (#28518)
2026-02-04 16:20:16 +00:00 · 2024-12-04 17:34:41 -08:00 · 2024-12-04 16:56:12 -05:00 · 2024-12-04 16:26:23 -05:00 · 2024-12-03 16:32:50 -05:00 · 2024-12-03 15:31:38 -05:00
14 changed files with 1452 additions and 743 deletions
--- a/docs/docs/concepts/testing.mdx
+++ b/docs/docs/concepts/testing.mdx
@@ -78,4 +78,4 @@ class TestParrotMultiplyToolUnit(ToolsUnitTests):
        return {"a": 2, "b": 3}
 ```

-To learn more, check out our guide on [how to add standard tests to an integration](../../contributing/how_to/integrations/standard_tests).
+To learn more, check out our guides on [contributing an integration](/docs/contributing/how_to/integrations/#standard-tests).
--- a/docs/docs/contributing/how_to/index.mdx
+++ b/docs/docs/contributing/how_to/index.mdx
@@ -7,4 +7,3 @@

 - [**Start Here**](integrations/index.mdx): Help us integrate with your favorite vendors and tools.
 - [**Package**](integrations/package): Publish an integration package to PyPi
-- [**Standard Tests**](integrations/standard_tests): Ensure your integration passes an expected set of tests.
--- a/docs/docs/contributing/how_to/integrations/chat_models.md
+++ b/docs/docs/contributing/how_to/integrations/chat_models.md
@@ -0,0 +1,355 @@
+---
+pagination_next: contributing/how_to/integrations/publish
+pagination_prev: contributing/how_to/integrations/index
+---
+# How to implement and test a chat model integration
+
+This guide walks through how to implement and test a custom [chat model](/docs/concepts/chat_models) that you have developed.
+
+For testing, we will rely on the `langchain-tests` dependency we added in the previous [bootstrapping guide](/docs/contributing/how_to/integrations/package).
+
+## Implementation
+
+Let's say you're building a simple integration package that provides a `ChatParrotLink`
+chat model integration for LangChain. Here's a simple example of what your project
+structure might look like:
+
+```plaintext
+langchain-parrot-link/
+├── langchain_parrot_link/
+│   ├── __init__.py
+│   └── chat_models.py
+├── tests/
+│   ├── __init__.py
+│   └── test_chat_models.py
+├── pyproject.toml
+└── README.md
+```
+
+Following the [bootstrapping guide](/docs/contributing/how_to/integrations/package),
+all of these files should already exist, except for
+`chat_models.py` and `test_chat_models.py`. We will implement these files in this guide.
+
+To implement `chat_models.py`, we copy the [implementation](/docs/how_to/custom_chat_model/#implementation) from our
+[Custom Chat Model Guide](/docs/how_to/custom_chat_model). Refer to that guide for more detail.
+
+<details>
+    <summary>chat_models.py</summary>
+```python title="langchain_parrot_link/chat_models.py"
+from typing import Any, Dict, Iterator, List, Optional
+
+from langchain_core.callbacks import (
+    CallbackManagerForLLMRun,
+)
+from langchain_core.language_models import BaseChatModel
+from langchain_core.messages import (
+    AIMessage,
+    AIMessageChunk,
+    BaseMessage,
+)
+from langchain_core.messages.ai import UsageMetadata
+from langchain_core.outputs import ChatGeneration, ChatGenerationChunk, ChatResult
+from pydantic import Field
+
+
+class ChatParrotLink(BaseChatModel):
+    """A custom chat model that echoes the first `parrot_buffer_length` characters
+    of the input.
+
+    When contributing an implementation to LangChain, carefully document
+    the model including the initialization parameters, include
+    an example of how to initialize the model and include any relevant
+    links to the underlying model's documentation or API.
+
+    Example:
+
+        .. code-block:: python
+
+            model = ChatParrotLink(parrot_buffer_length=2, model="bird-brain-001")
+            result = model.invoke([HumanMessage(content="hello")])
+            result = model.batch([[HumanMessage(content="hello")],
+                                 [HumanMessage(content="world")]])
+    """
+
+    model_name: str = Field(alias="model")
+    """The name of the model"""
+    parrot_buffer_length: int
+    """The number of characters from the last message of the prompt to be echoed."""
+    temperature: Optional[float] = None
+    max_tokens: Optional[int] = None
+    timeout: Optional[int] = None
+    stop: Optional[List[str]] = None
+    max_retries: int = 2
+
+    def _generate(
+        self,
+        messages: List[BaseMessage],
+        stop: Optional[List[str]] = None,
+        run_manager: Optional[CallbackManagerForLLMRun] = None,
+        **kwargs: Any,
+    ) -> ChatResult:
+        """Override the _generate method to implement the chat model logic.
+
+        This can be a call to an API, a call to a local model, or any other
+        implementation that generates a response to the input prompt.
+
+        Args:
+            messages: the prompt composed of a list of messages.
+            stop: a list of strings on which the model should stop generating.
+                  If generation stops due to a stop token, the stop token itself
+                  SHOULD BE INCLUDED as part of the output. This is not enforced
+                  across models right now, but it's a good practice to follow since
+                  it makes it much easier to parse the output of the model
+                  downstream and understand why generation stopped.
+            run_manager: A run manager with callbacks for the LLM.
+        """
+        # Replace this with actual logic to generate a response from a list
+        # of messages.
+        last_message = messages[-1]
+        tokens = last_message.content[: self.parrot_buffer_length]
+        ct_input_tokens = sum(len(message.content) for message in messages)
+        ct_output_tokens = len(tokens)
+        message = AIMessage(
+            content=tokens,
+            additional_kwargs={},  # Used to add additional payload to the message
+            response_metadata={  # Use for response metadata
+                "time_in_seconds": 3,
+            },
+            usage_metadata={
+                "input_tokens": ct_input_tokens,
+                "output_tokens": ct_output_tokens,
+                "total_tokens": ct_input_tokens + ct_output_tokens,
+            },
+        )
+        ##
+
+        generation = ChatGeneration(message=message)
+        return ChatResult(generations=[generation])
+
+    def _stream(
+        self,
+        messages: List[BaseMessage],
+        stop: Optional[List[str]] = None,
+        run_manager: Optional[CallbackManagerForLLMRun] = None,
+        **kwargs: Any,
+    ) -> Iterator[ChatGenerationChunk]:
+        """Stream the output of the model.
+
+        This method should be implemented if the model can generate output
+        in a streaming fashion. If the model does not support streaming,
+        do not implement it. In that case streaming requests will be automatically
+        handled by the _generate method.
+
+        Args:
+            messages: the prompt composed of a list of messages.
+            stop: a list of strings on which the model should stop generating.
+                  If generation stops due to a stop token, the stop token itself
+                  SHOULD BE INCLUDED as part of the output. This is not enforced
+                  across models right now, but it's a good practice to follow since
+                  it makes it much easier to parse the output of the model
+                  downstream and understand why generation stopped.
+            run_manager: A run manager with callbacks for the LLM.
+        """
+        last_message = messages[-1]
+        tokens = str(last_message.content[: self.parrot_buffer_length])
+        ct_input_tokens = sum(len(message.content) for message in messages)
+
+        for token in tokens:
+            usage_metadata = UsageMetadata(
+                {
+                    "input_tokens": ct_input_tokens,
+                    "output_tokens": 1,
+                    "total_tokens": ct_input_tokens + 1,
+                }
+            )
+            ct_input_tokens = 0
+            chunk = ChatGenerationChunk(
+                message=AIMessageChunk(content=token, usage_metadata=usage_metadata)
+            )
+
+            if run_manager:
+                # This is optional in newer versions of LangChain
+                # The on_llm_new_token will be called automatically
+                run_manager.on_llm_new_token(token, chunk=chunk)
+
+            yield chunk
+
+        # Let's add some other information (e.g., response metadata)
+        chunk = ChatGenerationChunk(
+            message=AIMessageChunk(content="", response_metadata={"time_in_sec": 3})
+        )
+        if run_manager:
+            # This is optional in newer versions of LangChain
+            # The on_llm_new_token will be called automatically
+            # The final chunk carries no text, so report an empty token.
+            run_manager.on_llm_new_token("", chunk=chunk)
+        yield chunk
+
+    @property
+    def _llm_type(self) -> str:
+        """Get the type of language model used by this chat model."""
+        return "echoing-chat-model-advanced"
+
+    @property
+    def _identifying_params(self) -> Dict[str, Any]:
+        """Return a dictionary of identifying parameters.
+
+        This information is used by the LangChain callback system, which
+        is used for tracing purposes and makes it possible to monitor LLMs.
+        """
+        return {
+            # The model name allows users to specify custom token counting
+            # rules in LLM monitoring applications (e.g., in LangSmith users
+            # can provide per token pricing for their model and monitor
+            # costs for the given LLM.)
+            "model_name": self.model_name,
+        }
+```
+</details>
+
+:::tip
+
+The model from the [Custom Chat Model Guide](/docs/how_to/custom_chat_model) is tested
+against the standard unit and integration tests in the LangChain GitHub repository.
+You can always use this as a starting point.
+
+- [Model implementation](https://github.com/langchain-ai/langchain/blob/master/libs/standard-tests/tests/unit_tests/custom_chat_model.py)
+- [Tests](https://github.com/langchain-ai/langchain/blob/master/libs/standard-tests/tests/unit_tests/test_custom_chat_model.py)
+
+:::
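
The token accounting in `_stream` above can be sketched in plain Python. This is a minimal illustration, not LangChain code: plain strings stand in for messages, and `add_usage` and `total_usage` are hypothetical helpers written for this example. It shows why the loop resets `ct_input_tokens` to 0 after the first chunk: assuming per-chunk usage is summed when chunks are merged, the streamed totals then match what `_generate` reports for the same input.

```python
from typing import Dict, Iterator, List

Usage = Dict[str, int]


def stream_usage(messages: List[str], buffer_length: int) -> Iterator[Usage]:
    """Yield one usage record per echoed character, mirroring `_stream`."""
    tokens = messages[-1][:buffer_length]
    ct_input_tokens = sum(len(m) for m in messages)
    for _ in tokens:
        yield {
            "input_tokens": ct_input_tokens,
            "output_tokens": 1,
            "total_tokens": ct_input_tokens + 1,
        }
        # Only the first chunk reports the prompt size; later chunks report 0
        # so that summing the chunks does not count the input repeatedly.
        ct_input_tokens = 0


def add_usage(left: Usage, right: Usage) -> Usage:
    """Hypothetical helper: merge two usage records by adding each field."""
    return {key: left[key] + right[key] for key in left}


def total_usage(chunks: Iterator[Usage]) -> Usage:
    """Hypothetical helper: fold all chunk records into one total."""
    total = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    for chunk in chunks:
        total = add_usage(total, chunk)
    return total


streamed = total_usage(stream_usage(["hi", "hello"], buffer_length=3))
print(streamed)
```

Here the prompt is 7 characters long and 3 characters are echoed, so the summed record has the same input, output, and total counts that the non-streaming path would compute in one step.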
+
+## Testing
+
+To implement our test files, we will subclass test classes from the `langchain_tests` package. These test classes contain the tests that will be run. We will just need to configure what model is tested, what parameters it is tested with, and specify any tests that should be skipped.
+
+### Setup
+
+First we need to install certain dependencies. These include:
+
+- `pytest`: For running tests
+- `pytest-socket`: For disabling network access while running unit tests
+- `pytest-asyncio`: For testing async functionality
+- `langchain-tests`: For importing standard tests
+- `langchain-core`: This should already be installed, but is needed to define our integration.
+
+If you followed the previous [bootstrapping guide](/docs/contributing/how_to/integrations/package/), these should already be installed.
+
+### Add and configure standard tests
+There are two namespaces in the `langchain-tests` package:
+
+- [Unit tests](../../../concepts/testing.mdx#unit-tests) (`langchain_tests.unit_tests`): designed to be used to test the component in isolation and without access to external services
+- [Integration tests](../../../concepts/testing.mdx#integration-tests) (`langchain_tests.integration_tests`): designed to be used to test the component with access to external services (in particular, the external service that the component is designed to interact with).
+
+Both types of tests are implemented as [pytest class-based test suites](https://docs.pytest.org/en/7.1.x/getting-started.html#group-multiple-tests-in-a-class).
+
+By subclassing the base classes for each type of standard test (see below), you get all of the standard tests for that type, and you can override the properties that the test suite uses to configure the tests.
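
The subclass-and-override pattern described above can be sketched without any LangChain dependency. Everything in this example is hypothetical (`EchoModel`, `StandardModelTests`, and the two test methods are invented for illustration and are not part of `langchain-tests`); it only shows how tests defined once on a base class pick up their configuration from properties overridden in a subclass.

```python
from typing import Any, Dict, Type


class EchoModel:
    """Hypothetical stand-in for a chat model; echoes a prefix of its input."""

    def __init__(self, model: str, parrot_buffer_length: int) -> None:
        self.model = model
        self.parrot_buffer_length = parrot_buffer_length

    def invoke(self, text: str) -> str:
        return text[: self.parrot_buffer_length]


class StandardModelTests:
    """Hypothetical base suite: tests live here, configuration in subclasses."""

    @property
    def chat_model_class(self) -> Type[Any]:
        raise NotImplementedError

    @property
    def chat_model_params(self) -> Dict[str, Any]:
        return {}

    def test_init(self) -> None:
        # Every subclass inherits this test and runs it against its own model.
        model = self.chat_model_class(**self.chat_model_params)
        assert model is not None

    def test_invoke_returns_str(self) -> None:
        model = self.chat_model_class(**self.chat_model_params)
        assert isinstance(model.invoke("hello"), str)


class TestEchoModel(StandardModelTests):
    """Only configuration is written here; the tests come from the base."""

    @property
    def chat_model_class(self) -> Type[EchoModel]:
        return EchoModel

    @property
    def chat_model_params(self) -> Dict[str, Any]:
        return {"model": "bird-brain-001", "parrot_buffer_length": 3}


# Calling the inherited tests directly; a test runner would discover them.
suite = TestEchoModel()
suite.test_init()
suite.test_invoke_returns_str()
```

Because the base class name does not start with `Test`, a runner using pytest's default naming rules would collect only the configured subclass.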
+
+Here's how you would configure the standard unit tests for the custom chat model:
+
+```python
+# title="tests/unit_tests/test_chat_models.py"
+from typing import Type
+
+from langchain_parrot_link.chat_models import ChatParrotLink
+from langchain_tests.unit_tests import ChatModelUnitTests
+
+
+class TestChatParrotLinkUnit(ChatModelUnitTests):
+    @property
+    def chat_model_class(self) -> Type[ChatParrotLink]:
+        return ChatParrotLink
+
+    @property
+    def chat_model_params(self) -> dict:
+        # These should be parameters used to initialize your integration for testing
+        return {
+            "model": "bird-brain-001",
+            "temperature": 0,
+            "parrot_buffer_length": 50,
+        }
+```
+
+And here is the corresponding snippet for integration tests:
+
+```python
+# title="tests/integration_tests/test_chat_models.py"
+from typing import Type
+
+from langchain_parrot_link.chat_models import ChatParrotLink
+from langchain_tests.integration_tests import ChatModelIntegrationTests
+
+
+class TestChatParrotLinkIntegration(ChatModelIntegrationTests):
+    @property
+    def chat_model_class(self) -> Type[ChatParrotLink]:
+        return ChatParrotLink
+
+    @property
+    def chat_model_params(self) -> dict:
+        # These should be parameters used to initialize your integration for testing
+        return {
+            "model": "bird-brain-001",
+            "temperature": 0,
+            "parrot_buffer_length": 50,
+        }
+```
+
+These two snippets should be written into `tests/unit_tests/test_chat_models.py` and `tests/integration_tests/test_chat_models.py`, respectively.
+
+:::note
+
+LangChain standard tests cover a range of behaviors, from the most basic requirements to optional capabilities like multi-modal support. The above implementation will likely need to be updated to specify any tests that should be skipped. See [below](#skipping-tests) for details.
+
+:::
+
+### Run standard tests
+
+After setting up the tests, run them with the following commands from your project root:
+
+```shell
+# run unit tests without network access
+pytest --disable-socket --allow-unix-socket --asyncio-mode=auto tests/unit_tests
+
+# run integration tests
+pytest --asyncio-mode=auto tests/integration_tests
+```
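
To make the effect of running unit tests "without network access" concrete, here is a conceptual sketch using only the standard library. It is not how `pytest-socket` is implemented; `sockets_disabled`, `SocketBlockedError`, and `try_open_socket` are names made up for this illustration. The idea is simply that any attempt to open a socket fails loudly while the guard is active.

```python
import socket
from contextlib import contextmanager
from typing import Iterator


class SocketBlockedError(RuntimeError):
    """Hypothetical error raised when a socket is opened while disabled."""


@contextmanager
def sockets_disabled() -> Iterator[None]:
    """Temporarily replace `socket.socket` with a function that raises."""
    original = socket.socket

    def guarded(*args: object, **kwargs: object) -> None:
        raise SocketBlockedError("network access is disabled in unit tests")

    socket.socket = guarded  # type: ignore[assignment,misc]
    try:
        yield
    finally:
        # Always restore the real class, even if the guarded code fails.
        socket.socket = original  # type: ignore[misc]


def try_open_socket() -> str:
    """Report whether creating a socket is currently possible."""
    try:
        sock = socket.socket()
    except SocketBlockedError:
        return "blocked"
    sock.close()
    return "allowed"


with sockets_disabled():
    status_inside = try_open_socket()

print(status_inside)
```

A unit test that accidentally reaches for an external service would fail under such a guard, which is the behavior the first command above relies on.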
+
+Our objective is for the pytest run to be successful. That is,
+
+1. If a feature is intended to be supported by the model, it passes;
+2. If a feature is not intended to be supported by the model, it is skipped.
+
+### Skipping tests
+
+LangChain standard tests cover a range of behaviors, from the most basic requirements (generating a response to a query) to optional capabilities like multi-modal support and tool-calling. Tests for "optional" capabilities are controlled via a set of properties that can be overridden on the test class.
+
+You can see the entire list of properties in the API reference [here](https://python.langchain.com/api_reference/standard_tests/unit_tests/langchain_tests.unit_tests.chat_models.ChatModelTests.html). These properties are shared by both unit and integration tests.
+
+For example, to enable integration tests for image inputs, we can implement
+
+```python
+@property
+def supports_image_inputs(self) -> bool:
+    return True
+```
+
+on the integration test class.
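
How a boolean property like this can switch a test between "run" and "skipped" is sketched below. This is an assumption-laden illustration rather than the `langchain-tests` implementation: the base class, the default value, and the `run` helper are all hypothetical, and the standard library's `unittest.SkipTest` is used to signal a skip.

```python
import unittest


class StandardFeatureTests:
    """Hypothetical base suite with one optional capability."""

    @property
    def supports_image_inputs(self) -> bool:
        # Optional capabilities default to "off" in this sketch.
        return False

    def test_image_inputs(self) -> str:
        if not self.supports_image_inputs:
            # An unsupported feature is skipped rather than failed.
            raise unittest.SkipTest("image inputs not supported")
        return "ran"


class TestTextOnlyModel(StandardFeatureTests):
    """Leaves the default in place, so the image test is skipped."""


class TestMultimodalModel(StandardFeatureTests):
    """Opts in to the image test by overriding the property."""

    @property
    def supports_image_inputs(self) -> bool:
        return True


def run(suite: StandardFeatureTests) -> str:
    """Hypothetical mini-runner that reports 'ran' or 'skipped'."""
    try:
        return suite.test_image_inputs()
    except unittest.SkipTest:
        return "skipped"


results = [run(TestTextOnlyModel()), run(TestMultimodalModel())]
print(results)
```

With this shape, a fully passing run means every supported feature passed and every unsupported one was skipped, which matches the objective stated earlier.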
+
+The API references for individual test methods include instructions on whether and how
+they can be skipped. See details:
+
+- [Unit tests API reference](https://python.langchain.com/api_reference/standard_tests/unit_tests/langchain_tests.unit_tests.chat_models.ChatModelUnitTests.html)
+- [Integration tests API reference](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.html)
+
+
+### Test suite information and troubleshooting
+
+Each test method documents:
+
+1. Troubleshooting tips;
+2. (If applicable) how the test can be skipped.
+
+This information along with the full set of tests that run can be found in the API
+reference. See details:
+
+- [Unit tests API reference](https://python.langchain.com/api_reference/standard_tests/unit_tests/langchain_tests.unit_tests.chat_models.ChatModelUnitTests.html)
+- [Integration tests API reference](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.html)
--- a/docs/docs/contributing/how_to/integrations/index.mdx
+++ b/docs/docs/contributing/how_to/integrations/index.mdx
@@ -4,14 +4,16 @@ pagination_next: contributing/how_to/integrations/package

 # Contribute Integrations

-LangChain integrations are packages that provide access to language models, vector stores, and other components that can be used in LangChain.
+Integrations are a core component of LangChain.
+LangChain provides standard interfaces for several different components (language models, vector stores, etc.) that are crucial when building LLM applications.

-This guide will walk you through how to contribute new integrations to LangChain, by
-publishing an integration package to PyPi, and adding documentation for it
-to the LangChain Monorepo.

-These instructions will evolve over the next few months as we improve our integration
-processes.
+## Why contribute an integration to LangChain?
+
+- **Discoverability:** LangChain is the most used framework for building LLM applications, with over 20 million monthly downloads. LangChain integrations are discoverable by a large community of GenAI builders.
+- **Interoperability:** LangChain components expose a standard interface, allowing developers to easily swap them for each other. If you implement a LangChain integration, any developer using a different component will easily be able to swap yours in.
+- **Best Practices:** Through their standard interface, LangChain components encourage and facilitate best practices (streaming, async, etc.).
+

 ## Components to Integrate

@@ -22,8 +24,7 @@ supported in LangChain

 :::

-While any component can be integrated into LangChain, at this time we are only accepting
-new integrations in the docs of the following kinds:
+While any component can be integrated into LangChain, there are specific types of integrations we encourage more:

 <table>
  <tr>
@@ -60,18 +61,53 @@ new integrations in the docs of the following kinds:

 ## How to contribute an integration

-The only step necessary to "be" a LangChain integration is to add documentation
-that will render on this site (https://python.langchain.com/).
+In order to contribute an integration, you should follow these steps:

-As a prerequisite to adding your integration to our documentation, you must:
+1. Confirm that your integration is in the [list of components](#components-to-integrate) we are currently encouraging.
+2. Create a package with the required dependencies (see example [here](/docs/contributing/how_to/integrations/package/)).
+3. Implement and test your integration following the [component-specific guides](#component-specific-guides).
+4. [Publish your integration](/docs/contributing/how_to/integrations/publish/) in a Python package to PyPi.
+5. [Optional] Open and merge a PR to add documentation for your integration to the official LangChain docs.
+6. [Optional] Engage with the LangChain team for joint co-marketing ([see below](#co-marketing)).

-1. Confirm that your integration is in the [list of components](#components-to-integrate) we are currently accepting.
-2. [Implement your package](./package.mdx) and publish it to a public github repository.
-3. [Implement the standard tests](./standard_tests) for your integration and successfully run them.
-4. [Publish your integration](./publish.mdx) by publishing the package to PyPi and add docs in the `docs/docs/integrations` directory of the LangChain monorepo.
+## Component-specific guides

-Once you have completed these steps, you can submit a PR to the LangChain monorepo to add your integration to the documentation.
+The guides below walk you through both implementing and testing specific components:

-## Further Reading
+- [Chat Models](/docs/contributing/how_to/integrations/chat_models)
+- [Tools]
+- [Toolkits]
+- [Retrievers]
+- [Document Loaders]
+- [Vector Stores](/docs/contributing/how_to/integrations/vector_stores)
+- [Embedding Models]

-To get started, let's learn [how to bootstrap a new integration package](./package.mdx) for LangChain.
+## Standard Tests
+
+Testing is a critical part of the development process that ensures your code works as expected and meets the desired quality standards.
+In the LangChain ecosystem, we have two main types of tests: **unit tests** and **integration tests**.
+These standard tests help maintain compatibility between different components and ensure reliability.
+
+**Unit Tests**: Unit tests are designed to validate the smallest parts of your code—individual functions or methods—ensuring they work as expected in isolation. They do not rely on external systems or integrations.
+
+**Integration Tests**: Integration tests validate that multiple components or systems work together as expected. For tools or integrations relying on external services, these tests often ensure end-to-end functionality.
+
+Each type of integration has its own set of standard tests. Please see the relevant [component-specific guide](#component-specific-guides) for details on testing your integration.
+
+## Co-Marketing
+
+With over 20 million monthly downloads, LangChain has a large audience of developers building LLM applications.
+Besides just adding integrations, we also like to show them examples of cool tools or APIs they can use.
+
+While traditionally called "co-marketing", we like to think of this more as "co-education".
+For that reason, while we are happy to highlight your integration through our social media channels, we prefer to highlight examples that also serve some educational purpose.
+Our main social media channels are Twitter and LinkedIn.
+
+Here are some heuristics for types of content we are excited to promote:
+
+- **Integration announcement:** If you announce the integration with a link to the LangChain documentation page, we are happy to re-tweet/re-share on Twitter/LinkedIn.
+- **Educational content:** We highlight good educational content on the weekends - if you write a good blog or make a good YouTube video, we are happy to share there! Note that we prefer content that is NOT framed as "here's how to use integration XYZ", but rather "here's how to do ABC", as we find that is more educational and helpful for developers.
+- **End-to-end applications:** End-to-end applications are great resources for developers looking to build. We prefer to highlight applications that are more complex/agentic in nature, and that use [LangGraph](https://github.com/langchain-ai/langgraph) as the orchestration framework. We get particularly excited about anything involving long-term memory, human-in-the-loop interaction patterns, or multi-agent architectures.
+- **Research:** We love highlighting novel research! Whether it is research built on top of LangChain or that integrates with it.
+
+// TODO: set up some form to gather these requests.
--- a/docs/docs/contributing/how_to/integrations/package.mdx
+++ b/docs/docs/contributing/how_to/integrations/package.mdx
@@ -1,6 +1,6 @@
 ---
-pagination_next: contributing/how_to/integrations/standard_tests
-pagination_prev: contributing/how_to/integrations/index
+pagination_next: contributing/how_to/integrations/index
+pagination_prev: null
 ---
 # How to bootstrap a new integration package

@@ -46,7 +46,7 @@ We will also add some `test` dependencies in a separate poetry dependency group.
 you are not using Poetry, we recommend adding these in a way that won't package them
 with your published package, or just installing them separately when you run tests.

-`langchain-tests` will provide the [standard tests](../standard_tests) we will use later. 
+`langchain-tests` will provide the [standard tests](/docs/contributing/how_to/integrations/#standard-tests) we will use later. 
 We recommend pinning these to the latest version: <img src="https://img.shields.io/pypi/v/langchain-tests" style={{position:"relative",top:4,left:3}} />

 Note: Replace `<latest_version>` with the latest version of `langchain-tests` below.
@@ -64,203 +64,7 @@ poetry install --with test

 You're now ready to start writing your integration package!

-## Writing your integration
-
-Let's say you're building a simple integration package that provides a `ChatParrotLink`
-chat model integration for LangChain. Here's a simple example of what your project
-structure might look like:
-
-```plaintext
-langchain-parrot-link/
-├── langchain_parrot_link/
-│   ├── __init__.py
-│   └── chat_models.py
-├── tests/
-│   ├── __init__.py
-│   └── test_chat_models.py
-├── pyproject.toml
-└── README.md
-```
-
-All of these files should already exist from step 1, except for 
-`chat_models.py` and `test_chat_models.py`! We will implement `test_chat_models.py` 
-later, following the [standard tests](../standard_tests) guide.
-
-To implement `chat_models.py`, let's copy the implementation from our
-[Custom Chat Model Guide](../../../../how_to/custom_chat_model).
-
-<details>
-    <summary>chat_models.py</summary>
-```python title="langchain_parrot_link/chat_models.py"
-from typing import Any, Dict, Iterator, List, Optional
-
-from langchain_core.callbacks import (
-    CallbackManagerForLLMRun,
-)
-from langchain_core.language_models import BaseChatModel
-from langchain_core.messages import (
-    AIMessage,
-    AIMessageChunk,
-    BaseMessage,
-)
-from langchain_core.messages.ai import UsageMetadata
-from langchain_core.outputs import ChatGeneration, ChatGenerationChunk, ChatResult
-from pydantic import Field
-
-
-class ChatParrotLink(BaseChatModel):
-    """A custom chat model that echoes the first `parrot_buffer_length` characters
-    of the input.
-
-    When contributing an implementation to LangChain, carefully document
-    the model including the initialization parameters, include
-    an example of how to initialize the model and include any relevant
-    links to the underlying models documentation or API.
-
-    Example:
-
-        .. code-block:: python
-
-            model = ChatParrotLink(parrot_buffer_length=2, model="bird-brain-001")
-            result = model.invoke([HumanMessage(content="hello")])
-            result = model.batch([[HumanMessage(content="hello")],
-                                 [HumanMessage(content="world")]])
-    """
-
-    model_name: str = Field(alias="model")
-    """The name of the model"""
-    parrot_buffer_length: int
-    """The number of characters from the last message of the prompt to be echoed."""
-    temperature: Optional[float] = None
-    max_tokens: Optional[int] = None
-    timeout: Optional[int] = None
-    stop: Optional[List[str]] = None
-    max_retries: int = 2
-
-    def _generate(
-        self,
-        messages: List[BaseMessage],
-        stop: Optional[List[str]] = None,
-        run_manager: Optional[CallbackManagerForLLMRun] = None,
-        **kwargs: Any,
-    ) -> ChatResult:
-        """Override the _generate method to implement the chat model logic.
-
-        This can be a call to an API, a call to a local model, or any other
-        implementation that generates a response to the input prompt.
-
-        Args:
-            messages: the prompt composed of a list of messages.
-            stop: a list of strings on which the model should stop generating.
-                  If generation stops due to a stop token, the stop token itself
-                  SHOULD BE INCLUDED as part of the output. This is not enforced
-                  across models right now, but it's a good practice to follow since
-                  it makes it much easier to parse the output of the model
-                  downstream and understand why generation stopped.
-            run_manager: A run manager with callbacks for the LLM.
-        """
-        # Replace this with actual logic to generate a response from a list
-        # of messages.
-        last_message = messages[-1]
-        tokens = last_message.content[: self.parrot_buffer_length]
-        ct_input_tokens = sum(len(message.content) for message in messages)
-        ct_output_tokens = len(tokens)
-        message = AIMessage(
-            content=tokens,
-            additional_kwargs={},  # Used to add additional payload to the message
-            response_metadata={  # Use for response metadata
-                "time_in_seconds": 3,
-            },
-            usage_metadata={
-                "input_tokens": ct_input_tokens,
-                "output_tokens": ct_output_tokens,
-                "total_tokens": ct_input_tokens + ct_output_tokens,
-            },
-        )
-        ##
-
-        generation = ChatGeneration(message=message)
-        return ChatResult(generations=[generation])
-
-    def _stream(
-        self,
-        messages: List[BaseMessage],
-        stop: Optional[List[str]] = None,
-        run_manager: Optional[CallbackManagerForLLMRun] = None,
-        **kwargs: Any,
-    ) -> Iterator[ChatGenerationChunk]:
-        """Stream the output of the model.
-
-        This method should be implemented if the model can generate output
-        in a streaming fashion. If the model does not support streaming,
-        do not implement it. In that case streaming requests will be automatically
-        handled by the _generate method.
-
-        Args:
-            messages: the prompt composed of a list of messages.
-            stop: a list of strings on which the model should stop generating.
-                  If generation stops due to a stop token, the stop token itself
-                  SHOULD BE INCLUDED as part of the output. This is not enforced
-                  across models right now, but it's a good practice to follow since
-                  it makes it much easier to parse the output of the model
-                  downstream and understand why generation stopped.
-            run_manager: A run manager with callbacks for the LLM.
-        """
-        last_message = messages[-1]
-        tokens = str(last_message.content[: self.parrot_buffer_length])
-        ct_input_tokens = sum(len(message.content) for message in messages)
-
-        for token in tokens:
-            usage_metadata = UsageMetadata(
-                {
-                    "input_tokens": ct_input_tokens,
-                    "output_tokens": 1,
-                    "total_tokens": ct_input_tokens + 1,
-                }
-            )
-            ct_input_tokens = 0
-            chunk = ChatGenerationChunk(
-                message=AIMessageChunk(content=token, usage_metadata=usage_metadata)
-            )
-
-            if run_manager:
-                # This is optional in newer versions of LangChain
-                # The on_llm_new_token will be called automatically
-                run_manager.on_llm_new_token(token, chunk=chunk)
-
-            yield chunk
-
-        # Let's add some other information (e.g., response metadata)
-        chunk = ChatGenerationChunk(
-            message=AIMessageChunk(content="", response_metadata={"time_in_sec": 3})
-        )
-        if run_manager:
-            # This is optional in newer versions of LangChain
-            # The on_llm_new_token will be called automatically
-            run_manager.on_llm_new_token(token, chunk=chunk)
-        yield chunk
-
-    @property
-    def _llm_type(self) -> str:
-        """Get the type of language model used by this chat model."""
-        return "echoing-chat-model-advanced"
-
-    @property
-    def _identifying_params(self) -> Dict[str, Any]:
-        """Return a dictionary of identifying parameters.
-
-        This information is used by the LangChain callback system, which
-        is used for tracing purposes make it possible to monitor LLMs.
-        """
-        return {
-            # The model name allows users to specify custom token counting
-            # rules in LLM monitoring applications (e.g., in LangSmith users
-            # can provide per token pricing for their model and monitor
-            # costs for the given LLM.)
-            "model_name": self.model_name,
-        }
-```
-</details>
+See our [component-specific guides](/docs/contributing/how_to/integrations/#component-specific-guides) for details on implementing and testing each component.

 ## Push your package to a public Github repository

@@ -272,4 +76,4 @@ This is only required if you want to publish your integration in the LangChain d

 ## Next Steps

-Now that you've implemented your package, you can move on to [testing your integration](../standard_tests) for your integration and successfully run them.
+Now that you've bootstrapped your package, you can move on to [implementing and testing](/docs/contributing/how_to/integrations/#component-specific-guides) your integration.
--- a/docs/docs/contributing/how_to/integrations/publish.mdx
+++ b/docs/docs/contributing/how_to/integrations/publish.mdx
@@ -1,5 +1,5 @@
 ---
-pagination_prev: contributing/how_to/integrations/standard_tests
+pagination_prev: contributing/how_to/integrations/index
 pagination_next: null
 ---

@@ -12,7 +12,7 @@ Now that your package is implemented and tested, you can:

 ## Publishing your package to PyPi

-This guide assumes you have already implemented your package and written tests for it. If you haven't done that yet, please refer to the [implementation guide](../package) and the [testing guide](../standard_tests).
+This guide assumes you have already implemented your package and written tests for it. If you haven't done that yet, please refer to the [component-specific guides](/docs/contributing/how_to/integrations/#component-specific-guides).

 Note that Poetry is not required to publish a package to PyPi, and we're using it in this guide end-to-end for convenience.
 You are welcome to publish your package using any other method you prefer.
--- a/docs/docs/contributing/how_to/integrations/retriever_guide.md
+++ b/docs/docs/contributing/how_to/integrations/retriever_guide.md
@@ -0,0 +1,206 @@
+---
+pagination_prev: contributing/how_to/integrations/index
+pagination_next: contributing/how_to/integrations/publish
+---
+# How to implement and test a retriever integration
+
+In this guide, we'll implement and test a custom [retriever](/docs/concepts/retrievers) that you have integrated with LangChain.
+
+For testing, we will rely on the `langchain-tests` dependency we added in the previous [package creation guide](/docs/contributing/how_to/integrations/package).
+
+## Implementation
+
+Let's say you're building a simple integration package that provides a `ParrotRetriever`
+retriever integration for LangChain. Here's a simple example of what your project
+structure might look like:
+
+```plaintext
+langchain-parrot-link/
+├── langchain_parrot_link/
+│   ├── __init__.py
+│   └── retrievers.py
+├── tests/
+│   └── integration_tests/
+│       ├── __init__.py
+│       └── test_retrievers.py
+├── pyproject.toml
+└── README.md
+```
+
+In this first step, we will implement the `retrievers.py` file.
+
+import CustomRetrieverIntro from '/docs/how_to/_custom_retriever_intro.mdx';
+
+<CustomRetrieverIntro />
+
+<details>
+    <summary>retrievers.py</summary>
+```python title="langchain_parrot_link/retrievers.py"
+from typing import Any
+
+from langchain_core.callbacks import CallbackManagerForRetrieverRun
+from langchain_core.documents import Document
+from langchain_core.retrievers import BaseRetriever
+
+class ParrotRetriever(BaseRetriever):
+    parrot_name: str
+    k: int = 3
+
+    def _get_relevant_documents(
+        self, query: str, *, run_manager: CallbackManagerForRetrieverRun, **kwargs: Any
+    ) -> list[Document]:
+        k = kwargs.get("k", self.k)
+        return [Document(page_content=f"{self.parrot_name} says: {query}")] * k
+```
+</details>
+
+:::tip
+
+The `ParrotRetriever` from this guide is tested against the standard unit and
+integration tests in the LangChain GitHub repository. You can use
+[that test file](https://github.com/langchain-ai/langchain/blob/master/libs/standard-tests/tests/unit_tests/test_basic_retriever.py) as a starting point.
+
+:::
+
+## Testing
+
+
+
+### 1. Create Your Retriever Class
+
+```python
+from typing import List
+
+from langchain_core.callbacks import (
+    AsyncCallbackManagerForRetrieverRun,
+    CallbackManagerForRetrieverRun,
+)
+from langchain_core.documents import Document
+from langchain_core.retrievers import BaseRetriever
+
+class MyCustomRetriever(BaseRetriever):
+    """Custom retriever implementation."""
+
+    def _get_relevant_documents(
+        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
+    ) -> List[Document]:
+        """Core implementation of retrieving relevant documents."""
+        # Your implementation here
+        pass
+
+    async def _aget_relevant_documents(
+        self, query: str, *, run_manager: AsyncCallbackManagerForRetrieverRun
+    ) -> List[Document]:
+        """Async implementation of retrieving relevant documents."""
+        # Your async implementation here
+        pass
+```
+
+### 2. Required Testing
+
+All retrievers must include the following tests:
+
+#### Basic Functionality Tests
+```python
+import pytest
+from langchain_core.documents import Document
+
+def test_get_relevant_documents():
+    retriever = MyCustomRetriever()
+    docs = retriever.get_relevant_documents("test query")
+    assert isinstance(docs, list)
+    assert all(isinstance(doc, Document) for doc in docs)
+
+@pytest.mark.asyncio
+async def test_aget_relevant_documents():
+    retriever = MyCustomRetriever()
+    docs = await retriever.aget_relevant_documents("test query")
+    assert isinstance(docs, list)
+    assert all(isinstance(doc, Document) for doc in docs)
+```
+
+#### Edge Cases
+- Empty query handling
+- Special character handling
+- Long query handling
+- Rate limiting (if applicable)
+- Error handling
+
+### 3. Documentation Requirements
+
+Your retriever should include:
+
+1. Class docstring with:
+   - General description
+   - Required dependencies
+   - Example usage
+   - Parameters explanation
+
+2. Integration documentation file:
+   - Installation instructions
+   - Basic usage example
+   - Advanced configuration
+   - Common issues and solutions
+
+### 4. Best Practices
+
+1. **Error Handling**
+   - Implement proper error handling for API calls
+   - Provide meaningful error messages
+   - Handle rate limits gracefully
+
+2. **Performance**
+   - Implement caching when appropriate
+   - Use batch operations where possible
+   - Consider implementing both sync and async methods
+
+3. **Configuration**
+   - Use environment variables for sensitive data
+   - Provide sensible defaults
+   - Allow for customization of key parameters
+
+4. **Type Hints**
+   - Use proper type hints throughout your code
+   - Document expected types in docstrings
+
+## Example Implementation
+
+Here's a minimal example of a custom retriever:
+
+```python
+from typing import List
+from langchain_core.callbacks import CallbackManagerForRetrieverRun
+from langchain_core.documents import Document
+from langchain_core.retrievers import BaseRetriever
+
+class SimpleKeywordRetriever(BaseRetriever):
+    """A simple retriever that matches documents based on keywords."""
+    
+    documents: List[Document]  # Store your documents here
+    
+    def _get_relevant_documents(
+        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
+    ) -> List[Document]:
+        """Return documents that contain the query string."""
+        return [
+            doc for doc in self.documents 
+            if query.lower() in doc.page_content.lower()
+        ]
+
+    async def _aget_relevant_documents(
+        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
+    ) -> List[Document]:
+        """Async version of get_relevant_documents."""
+        return self._get_relevant_documents(query, run_manager=run_manager)
+```
+
+## Submission Checklist
+
+- [ ] Implemented base retriever interface
+- [ ] Added comprehensive tests
+- [ ] Included proper documentation
+- [ ] Added type hints
+- [ ] Handled error cases
+- [ ] Implemented both sync and async methods
+- [ ] Added example usage
+- [ ] Followed code style guidelines
+- [ ] Added requirements.txt or setup.py updates
+
+## Getting Help
+
+If you need help while implementing your retriever:
+1. Check existing retriever implementations for reference
+2. Open a discussion in the GitHub repository
+3. Ask in the LangChain Discord community
+
+Remember to follow the existing patterns in the codebase and maintain consistency with other retrievers.
--- a/docs/docs/contributing/how_to/integrations/retriever_tests.md
+++ b/docs/docs/contributing/how_to/integrations/retriever_tests.md
@@ -0,0 +1,207 @@
+# Standard Tests for LangChain Retrievers
+
+This guide outlines the standard tests that should be implemented for all LangChain retrievers.
+
+## Test Structure
+
+### 1. Basic Functionality Tests
+
+```python
+import pytest
+from langchain_core.documents import Document
+from your_retriever import YourRetriever
+
+def test_basic_retrieval():
+    """Test basic document retrieval functionality."""
+    retriever = YourRetriever()
+    query = "test query"
+    docs = retriever.get_relevant_documents(query)
+    
+    assert isinstance(docs, list)
+    assert all(isinstance(doc, Document) for doc in docs)
+    assert len(docs) > 0  # Adjust if your retriever might return empty results
+
+@pytest.mark.asyncio
+async def test_async_retrieval():
+    """Test async document retrieval functionality."""
+    retriever = YourRetriever()
+    query = "test query"
+    docs = await retriever.aget_relevant_documents(query)
+    
+    assert isinstance(docs, list)
+    assert all(isinstance(doc, Document) for doc in docs)
+```
+
+### 2. Edge Cases
+
+```python
+def test_empty_query():
+    """Test behavior with empty query."""
+    retriever = YourRetriever()
+    docs = retriever.get_relevant_documents("")
+    assert isinstance(docs, list)
+
+def test_special_characters():
+    """Test handling of special characters."""
+    retriever = YourRetriever()
+    special_queries = [
+        "test!@#$%^&*()",
+        "múltiple áccents",
+        "中文测试",
+        "test\nwith\nnewlines",
+    ]
+    for query in special_queries:
+        docs = retriever.get_relevant_documents(query)
+        assert isinstance(docs, list)
+
+def test_long_query():
+    """Test handling of very long queries."""
+    retriever = YourRetriever()
+    long_query = "test " * 1000
+    docs = retriever.get_relevant_documents(long_query)
+    assert isinstance(docs, list)
+```
+
+### 3. Error Handling
+
+```python
+def test_invalid_configuration():
+    """Test behavior with invalid configuration."""
+    with pytest.raises(ValueError):
+        YourRetriever(invalid_param="invalid")
+
+def test_connection_error():
+    """Test behavior when connection fails (if applicable)."""
+    retriever = YourRetriever()
+    # Mock connection failure
+    with pytest.raises(ConnectionError):
+        retriever.get_relevant_documents("test")
+```
+
+### 4. Performance Tests (Optional)
+
+```python
+@pytest.mark.slow
+def test_large_scale_retrieval():
+    """Test retrieval with a large number of documents."""
+    retriever = YourRetriever()
+    # Test with a significant number of documents
+    docs = retriever.get_relevant_documents("test")
+    assert len(docs) <= YOUR_MAX_LIMIT  # If applicable
+
+@pytest.mark.slow
+def test_concurrent_requests():
+    """Test handling of concurrent requests."""
+    import asyncio
+    
+    async def run_concurrent_requests():
+        retriever = YourRetriever()
+        tasks = [
+            retriever.aget_relevant_documents("test")
+            for _ in range(5)
+        ]
+        results = await asyncio.gather(*tasks)
+        return results
+    
+    results = asyncio.run(run_concurrent_requests())
+    assert len(results) == 5
+```
+
+### 5. Integration Tests
+
+```python
+def test_chain_integration():
+    """Test integration with LangChain chains."""
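+    from langchain.chains import RetrievalQA
+    from langchain_core.language_models.fake import FakeListLLM
+
+    retriever = YourRetriever()
+    # FakeListLLM replays canned responses, so no real model is called
+    llm = FakeListLLM(responses=["test answer"])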
+    qa_chain = RetrievalQA.from_chain_type(
+        llm=llm,
+        retriever=retriever,
+        chain_type="stuff"
+    )
+    result = qa_chain.run("test query")
+    assert isinstance(result, str)
+```
+
+## Test Configuration
+
+```python
+# conftest.py
+import pytest
+from langchain_core.documents import Document
+from your_retriever import YourRetriever
+
+def pytest_configure(config):
+    config.addinivalue_line(
+        "markers", "slow: marks tests as slow (deselect with '-m \"not slow\"')"
+    )
+
+@pytest.fixture
+def sample_documents():
+    """Fixture providing sample documents for testing."""
+    return [
+        Document(page_content="test document 1", metadata={"source": "test1"}),
+        Document(page_content="test document 2", metadata={"source": "test2"}),
+    ]
+
+@pytest.fixture
+def mock_retriever(sample_documents):
+    """Fixture providing a retriever with sample documents."""
+    retriever = YourRetriever()
+    # Set up retriever with sample documents
+    return retriever
+```
+
+## Running Tests
+
+To run the tests:
+
+```bash
+# Run all tests
+pytest tests/retrievers/test_your_retriever.py
+
+# Run only fast tests
+pytest tests/retrievers/test_your_retriever.py -m "not slow"
+
+# Run with coverage
+pytest tests/retrievers/test_your_retriever.py --cov=your_retriever
+```
+
+## Best Practices
+
+1. **Isolation**: Each test should be independent and not rely on the state from other tests.
+
+2. **Mocking**: Use mocks for external services to avoid actual API calls during testing:
+   ```python
+   # The `mocker` fixture is provided by the pytest-mock plugin
+   @pytest.fixture
+   def mock_api(mocker):
+       return mocker.patch("your_retriever.api_client")
+   ```
+
+3. **Parametrization**: Use `pytest.mark.parametrize` for testing multiple scenarios:
+   ```python
+   @pytest.mark.parametrize("query,expected_count", [
+       ("test", 1),
+       ("invalid", 0),
+       ("multiple words", 2),
+   ])
+   def test_retrieval_counts(query, expected_count):
+       retriever = YourRetriever()
+       docs = retriever.get_relevant_documents(query)
+       assert len(docs) == expected_count
+   ```
+
+4. **Documentation**: Include docstrings in test functions explaining what they test.
+
+5. **Coverage**: Aim for high test coverage, especially for core functionality.
+
+## Common Pitfalls
+
+1. Not testing error cases
+2. Not testing async functionality
+3. Not handling rate limits in tests
+4. Missing edge cases
+5. Relying on external services in unit tests
+
+Remember to adapt these tests based on your retriever's specific functionality and requirements.
--- a/docs/docs/contributing/how_to/integrations/standard_tests.ipynb
+++ b/docs/docs/contributing/how_to/integrations/standard_tests.ipynb
@@ -1,497 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "raw",
-   "metadata": {
-    "vscode": {
-     "languageId": "raw"
-    }
-   },
-   "source": [
-    "---\n",
-    "pagination_next: contributing/how_to/integrations/publish\n",
-    "pagination_prev: contributing/how_to/integrations/package\n",
-    "---"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# How to add standard tests to an integration\n",
-    "\n",
-    "When creating either a custom class for yourself or to publish in a LangChain integration, it is important to add standard tests to ensure it works as expected. This guide will show you how to add standard tests to a custom chat model, and you can **[Skip to the test templates](#standard-test-templates-per-component)** for implementing tests for each integration type.\n",
-    "\n",
-    "## Setup\n",
-    "\n",
-    "If you're coming from the [previous guide](../package), you have already installed these dependencies, and you can skip this section.\n",
-    "\n",
-    "First, let's install 2 dependencies:\n",
-    "\n",
-    "- `langchain-core` will define the interfaces we want to import to define our custom tool.\n",
-    "- `langchain-tests` will provide the standard tests we want to use. Recommended to pin to the latest version: <img src=\"https://img.shields.io/pypi/v/langchain-tests\" style={{position:\"relative\",top:4,left:3}} />\n",
-    "\n",
-    ":::note\n",
-    "\n",
-    "Because added tests in new versions of `langchain-tests` can break your CI/CD pipelines, we recommend pinning the \n",
-    "version of `langchain-tests` to avoid unexpected changes.\n",
-    "\n",
-    ":::\n",
-    "\n",
-    "import Tabs from '@theme/Tabs';\n",
-    "import TabItem from '@theme/TabItem';\n",
-    "\n",
-    "<Tabs>\n",
-    "    <TabItem value=\"poetry\" label=\"Poetry\" default>\n",
-    "If you followed the [previous guide](../package), you should already have these dependencies installed!\n",
-    "\n",
-    "```bash\n",
-    "poetry add langchain-core\n",
-    "poetry add --group test pytest pytest-socket pytest-asyncio langchain-tests==<latest_version>\n",
-    "poetry install --with test\n",
-    "```\n",
-    "    </TabItem>\n",
-    "    <TabItem value=\"pip\" label=\"Pip\">\n",
-    "```bash\n",
-    "pip install -U langchain-core pytest pytest-socket pytest-asyncio langchain-tests\n",
-    "\n",
-    "# install current package in editable mode\n",
-    "pip install --editable .\n",
-    "```\n",
-    "    </TabItem>\n",
-    "</Tabs>"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Let's say we're publishing a package, `langchain_parrot_link`, that exposes the chat model from the [guide on implementing the package](../package). We can add the standard tests to the package by following the steps below."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "And we'll assume you've structured your package the same way as the main LangChain\n",
-    "packages:\n",
-    "\n",
-    "```plaintext\n",
-    "langchain-parrot-link/\n",
-    "├── langchain_parrot_link/\n",
-    "│   ├── __init__.py\n",
-    "│   └── chat_models.py\n",
-    "├── tests/\n",
-    "│   ├── __init__.py\n",
-    "│   └── test_chat_models.py\n",
-    "├── pyproject.toml\n",
-    "└── README.md\n",
-    "```\n",
-    "\n",
-    "## Add and configure standard tests\n",
-    "\n",
-    "There are 2 namespaces in the `langchain-tests` package: \n",
-    "\n",
-    "- [unit tests](../../../concepts/testing.mdx#unit-tests) (`langchain_tests.unit_tests`): designed to be used to test the component in isolation and without access to external services\n",
-    "- [integration tests](../../../concepts/testing.mdx#unit-tests) (`langchain_tests.integration_tests`): designed to be used to test the component with access to external services (in particular, the external service that the component is designed to interact with).\n",
-    "\n",
-    "Both types of tests are implemented as [`pytest` class-based test suites](https://docs.pytest.org/en/7.1.x/getting-started.html#group-multiple-tests-in-a-class).\n",
-    "\n",
-    "By subclassing the base classes for each type of standard test (see below), you get all of the standard tests for that type, and you\n",
-    "can override the properties that the test suite uses to configure the tests.\n",
-    "\n",
-    "### Standard chat model tests\n",
-    "\n",
-    "Here's how you would configure the standard unit tests for the custom chat model:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# title=\"tests/unit_tests/test_chat_models.py\"\n",
-    "from typing import Tuple, Type\n",
-    "\n",
-    "from langchain_parrot_link.chat_models import ChatParrotLink\n",
-    "from langchain_tests.unit_tests import ChatModelUnitTests\n",
-    "\n",
-    "\n",
-    "class TestChatParrotLinkUnit(ChatModelUnitTests):\n",
-    "    @property\n",
-    "    def chat_model_class(self) -> Type[ChatParrotLink]:\n",
-    "        return ChatParrotLink\n",
-    "\n",
-    "    @property\n",
-    "    def chat_model_params(self) -> dict:\n",
-    "        return {\n",
-    "            \"model\": \"bird-brain-001\",\n",
-    "            \"temperature\": 0,\n",
-    "            \"parrot_buffer_length\": 50,\n",
-    "        }"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# title=\"tests/integration_tests/test_chat_models.py\"\n",
-    "from typing import Type\n",
-    "\n",
-    "from langchain_parrot_link.chat_models import ChatParrotLink\n",
-    "from langchain_tests.integration_tests import ChatModelIntegrationTests\n",
-    "\n",
-    "\n",
-    "class TestChatParrotLinkIntegration(ChatModelIntegrationTests):\n",
-    "    @property\n",
-    "    def chat_model_class(self) -> Type[ChatParrotLink]:\n",
-    "        return ChatParrotLink\n",
-    "\n",
-    "    @property\n",
-    "    def chat_model_params(self) -> dict:\n",
-    "        return {\n",
-    "            \"model\": \"bird-brain-001\",\n",
-    "            \"temperature\": 0,\n",
-    "            \"parrot_buffer_length\": 50,\n",
-    "        }"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "and you would run these with the following commands from your project root\n",
-    "\n",
-    "<Tabs>\n",
-    "    <TabItem value=\"poetry\" label=\"Poetry\" default>\n",
-    "\n",
-    "```bash\n",
-    "# run unit tests without network access\n",
-    "poetry run pytest --disable-socket --allow-unix-socket --asyncio-mode=auto tests/unit_tests\n",
-    "\n",
-    "# run integration tests\n",
-    "poetry run pytest --asyncio-mode=auto tests/integration_tests\n",
-    "```\n",
-    "\n",
-    "    </TabItem>\n",
-    "    <TabItem value=\"pip\" label=\"Pip\">\n",
-    "\n",
-    "```bash\n",
-    "# run unit tests without network access\n",
-    "pytest --disable-socket --allow-unix-socket --asyncio-mode=auto tests/unit_tests\n",
-    "\n",
-    "# run integration tests\n",
-    "pytest --asyncio-mode=auto tests/integration_tests\n",
-    "```\n",
-    "\n",
-    "    </TabItem>\n",
-    "</Tabs>"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Test suite information and troubleshooting\n",
-    "\n",
-    "For a full list of the standard test suites that are available, as well as\n",
-    "information on which tests are included and how to troubleshoot common issues,\n",
-    "see the [Standard Tests API Reference](https://python.langchain.com/api_reference/standard_tests/index.html).\n",
-    "\n",
-    "An increasing number of troubleshooting guides are being added to this documentation,\n",
-    "and if you're interested in contributing, feel free to add docstrings to tests in \n",
-    "[Github](https://github.com/langchain-ai/langchain/tree/master/libs/standard-tests/langchain_tests)!"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Standard test templates per component:\n",
-    "\n",
-    "Above, we implement the **unit** and **integration** standard tests for a tool. Below are the templates for implementing the standard tests for each component:\n",
-    "\n",
-    "<details>\n",
-    "    <summary>Chat Models</summary>\n",
-    "    <p>Note: The standard tests for chat models are implemented in the example in the main body of this guide too.</p>"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# title=\"tests/unit_tests/test_chat_models.py\"\n",
-    "from typing import Type\n",
-    "\n",
-    "from langchain_parrot_link.chat_models import ChatParrotLink\n",
-    "from langchain_tests.unit_tests import ChatModelUnitTests\n",
-    "\n",
-    "\n",
-    "class TestChatParrotLinkUnit(ChatModelUnitTests):\n",
-    "    @property\n",
-    "    def chat_model_class(self) -> Type[ChatParrotLink]:\n",
-    "        return ChatParrotLink\n",
-    "\n",
-    "    @property\n",
-    "    def chat_model_params(self) -> dict:\n",
-    "        return {\n",
-    "            \"model\": \"bird-brain-001\",\n",
-    "            \"temperature\": 0,\n",
-    "            \"parrot_buffer_length\": 50,\n",
-    "        }"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# title=\"tests/integration_tests/test_chat_models.py\"\n",
-    "from typing import Type\n",
-    "\n",
-    "from langchain_parrot_link.chat_models import ChatParrotLink\n",
-    "from langchain_tests.integration_tests import ChatModelIntegrationTests\n",
-    "\n",
-    "\n",
-    "class TestChatParrotLinkIntegration(ChatModelIntegrationTests):\n",
-    "    @property\n",
-    "    def chat_model_class(self) -> Type[ChatParrotLink]:\n",
-    "        return ChatParrotLink\n",
-    "\n",
-    "    @property\n",
-    "    def chat_model_params(self) -> dict:\n",
-    "        return {\n",
-    "            \"model\": \"bird-brain-001\",\n",
-    "            \"temperature\": 0,\n",
-    "            \"parrot_buffer_length\": 50,\n",
-    "        }"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "</details>\n",
-    "<details>\n",
-    "    <summary>Embedding Models</summary>"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# title=\"tests/unit_tests/test_embeddings.py\"\n",
-    "from typing import Tuple, Type\n",
-    "\n",
-    "from langchain_parrot_link.embeddings import ParrotLinkEmbeddings\n",
-    "from langchain_tests.unit_tests import EmbeddingsUnitTests\n",
-    "\n",
-    "\n",
-    "class TestParrotLinkEmbeddingsUnit(EmbeddingsUnitTests):\n",
-    "    @property\n",
-    "    def embeddings_class(self) -> Type[ParrotLinkEmbeddings]:\n",
-    "        return ParrotLinkEmbeddings\n",
-    "\n",
-    "    @property\n",
-    "    def embedding_model_params(self) -> dict:\n",
-    "        return {\"model\": \"nest-embed-001\", \"temperature\": 0}"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# title=\"tests/integration_tests/test_embeddings.py\"\n",
-    "from typing import Type\n",
-    "\n",
-    "from langchain_parrot_link.embeddings import ParrotLinkEmbeddings\n",
-    "from langchain_tests.integration_tests import EmbeddingsIntegrationTests\n",
-    "\n",
-    "\n",
-    "class TestParrotLinkEmbeddingsIntegration(EmbeddingsIntegrationTests):\n",
-    "    @property\n",
-    "    def embeddings_class(self) -> Type[ParrotLinkEmbeddings]:\n",
-    "        return ParrotLinkEmbeddings\n",
-    "\n",
-    "    @property\n",
-    "    def embedding_model_params(self) -> dict:\n",
-    "        return {\"model\": \"nest-embed-001\"}"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "</details>\n",
-    "<details>\n",
-    "    <summary>Tools/Toolkits</summary>"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# title=\"tests/unit_tests/test_tools.py\"\n",
-    "from typing import Type\n",
-    "\n",
-    "from langchain_parrot_link.tools import ParrotMultiplyTool\n",
-    "from langchain_tests.unit_tests import ToolsUnitTests\n",
-    "\n",
-    "\n",
-    "class TestParrotMultiplyToolUnit(ToolsUnitTests):\n",
-    "    @property\n",
-    "    def tool_constructor(self) -> Type[ParrotMultiplyTool]:\n",
-    "        return ParrotMultiplyTool\n",
-    "\n",
-    "    @property\n",
-    "    def tool_constructor_params(self) -> dict:\n",
-    "        # if your tool constructor instead required initialization arguments like\n",
-    "        # `def __init__(self, some_arg: int):`, you would return those here\n",
-    "        # as a dictionary, e.g.: `return {'some_arg': 42}`\n",
-    "        return {}\n",
-    "\n",
-    "    @property\n",
-    "    def tool_invoke_params_example(self) -> dict:\n",
-    "        \"\"\"\n",
-    "        Returns a dictionary representing the \"args\" of an example tool call.\n",
-    "\n",
-    "        This should NOT be a ToolCall dict - i.e. it should not\n",
-    "        have {\"name\", \"id\", \"args\"} keys.\n",
-    "        \"\"\"\n",
-    "        return {\"a\": 2, \"b\": 3}"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# title=\"tests/integration_tests/test_tools.py\"\n",
-    "from typing import Type\n",
-    "\n",
-    "from langchain_parrot_link.tools import ParrotMultiplyTool\n",
-    "from langchain_tests.integration_tests import ToolsIntegrationTests\n",
-    "\n",
-    "\n",
-    "class TestParrotMultiplyToolIntegration(ToolsIntegrationTests):\n",
-    "    @property\n",
-    "    def tool_constructor(self) -> Type[ParrotMultiplyTool]:\n",
-    "        return ParrotMultiplyTool\n",
-    "\n",
-    "    @property\n",
-    "    def tool_constructor_params(self) -> dict:\n",
-    "        # if your tool constructor instead required initialization arguments like\n",
-    "        # `def __init__(self, some_arg: int):`, you would return those here\n",
-    "        # as a dictionary, e.g.: `return {'some_arg': 42}`\n",
-    "        return {}\n",
-    "\n",
-    "    @property\n",
-    "    def tool_invoke_params_example(self) -> dict:\n",
-    "        \"\"\"\n",
-    "        Returns a dictionary representing the \"args\" of an example tool call.\n",
-    "\n",
-    "        This should NOT be a ToolCall dict - i.e. it should not\n",
-    "        have {\"name\", \"id\", \"args\"} keys.\n",
-    "        \"\"\"\n",
-    "        return {\"a\": 2, \"b\": 3}"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "</details>\n",
-    "<details>\n",
-    "    <summary>Vector Stores</summary>"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# title=\"tests/integration_tests/test_vectorstores_sync.py\"\n",
-    "\n",
-    "from typing import AsyncGenerator, Generator\n",
-    "\n",
-    "import pytest\n",
-    "from langchain_core.vectorstores import VectorStore\n",
-    "from langchain_parrot_link.vectorstores import ParrotVectorStore\n",
-    "from langchain_standard_tests.integration_tests.vectorstores import (\n",
-    "    AsyncReadWriteTestSuite,\n",
-    "    ReadWriteTestSuite,\n",
-    ")\n",
-    "\n",
-    "\n",
-    "class TestSync(ReadWriteTestSuite):\n",
-    "    @pytest.fixture()\n",
-    "    def vectorstore(self) -> Generator[VectorStore, None, None]:  # type: ignore\n",
-    "        \"\"\"Get an empty vectorstore for unit tests.\"\"\"\n",
-    "        store = ParrotVectorStore()\n",
-    "        # note: store should be EMPTY at this point\n",
-    "        # if you need to delete data, you may do so here\n",
-    "        try:\n",
-    "            yield store\n",
-    "        finally:\n",
-    "            # cleanup operations, or deleting data\n",
-    "            pass\n",
-    "\n",
-    "\n",
-    "class TestAsync(AsyncReadWriteTestSuite):\n",
-    "    @pytest.fixture()\n",
-    "    async def vectorstore(self) -> AsyncGenerator[VectorStore, None]:  # type: ignore\n",
-    "        \"\"\"Get an empty vectorstore for unit tests.\"\"\"\n",
-    "        store = ParrotVectorStore()\n",
-    "        # note: store should be EMPTY at this point\n",
-    "        # if you need to delete data, you may do so here\n",
-    "        try:\n",
-    "            yield store\n",
-    "        finally:\n",
-    "            # cleanup operations, or deleting data\n",
-    "            pass"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "</details>"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": ".venv",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.11.4"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
--- a/docs/docs/contributing/how_to/integrations/vector_stores.mdx
+++ b/docs/docs/contributing/how_to/integrations/vector_stores.mdx
@@ -0,0 +1,593 @@
+---
+pagination_next: contributing/how_to/integrations/publish
+pagination_prev: contributing/how_to/integrations/index
+---
+# How to implement and test a vector store integration
+
+This guide walks through how to implement and test a custom [vector store](/docs/concepts/vectorstores) that you have developed.
+
+For testing, we will rely on the `langchain-tests` dependency we added in the previous [bootstrapping guide](/docs/contributing/how_to/integrations/package).
+
+## Implementation
+
+Let's say you're building a simple integration package that provides a `ParrotVectorStore`
+vector store integration for LangChain. Here's a simple example of what your project
+structure might look like:
+
+```plaintext
+langchain-parrot-link/
+├── langchain_parrot_link/
+│   ├── __init__.py
+│   └── vectorstores.py
+├── tests/
+│   ├── __init__.py
+│   └── test_vectorstores.py
+├── pyproject.toml
+└── README.md
+```
+
+Following the [bootstrapping guide](/docs/contributing/how_to/integrations/package),
+all of these files should already exist, except for
+`vectorstores.py` and `test_vectorstores.py`. We will implement these files in this guide.
+
+First we need an implementation for our vector store. This implementation will depend
+on your chosen database technology. `langchain-core` includes a minimal
+[in-memory vector store](https://python.langchain.com/api_reference/core/vectorstores/langchain_core.vectorstores.in_memory.InMemoryVectorStore.html)
+that we can use as a guide. You can access the code [here](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/vectorstores/in_memory.py).
+For convenience, we also provide it below.
+
+<details>
+    <summary>vectorstores.py</summary>
+```python title="langchain_parrot_link/vectorstores.py"
+from __future__ import annotations
+
+import json
+import uuid
+from collections.abc import Iterator, Sequence
+from pathlib import Path
+from typing import Any, Callable, Optional
+
+from langchain_core.documents import Document
+from langchain_core.embeddings import Embeddings
+from langchain_core.load import dumpd, load
+from langchain_core.vectorstores import VectorStore
+from langchain_core.vectorstores.utils import _cosine_similarity as cosine_similarity
+from langchain_core.vectorstores.utils import maximal_marginal_relevance
+
+
+class InMemoryVectorStore(VectorStore):
+    """In-memory vector store implementation.
+
+    Uses a dictionary, and computes cosine similarity for search using numpy.
+    """
+
+    def __init__(self, embedding: Embeddings) -> None:
+        """Initialize with the given embedding function.
+
+        Args:
+            embedding: embedding function to use.
+        """
+        self.store: dict[str, dict[str, Any]] = {}
+        self.embedding = embedding
+
+    @property
+    def embeddings(self) -> Embeddings:
+        return self.embedding
+
+    def delete(self, ids: Optional[Sequence[str]] = None, **kwargs: Any) -> None:
+        if ids:
+            for _id in ids:
+                self.store.pop(_id, None)
+
+    async def adelete(self, ids: Optional[Sequence[str]] = None, **kwargs: Any) -> None:
+        self.delete(ids)
+
+    def add_documents(
+        self,
+        documents: list[Document],
+        ids: Optional[list[str]] = None,
+        **kwargs: Any,
+    ) -> list[str]:
+        """Add documents to the store."""
+        texts = [doc.page_content for doc in documents]
+        vectors = self.embedding.embed_documents(texts)
+
+        if ids and len(ids) != len(texts):
+            msg = (
+                f"ids must be the same length as texts. "
+                f"Got {len(ids)} ids and {len(texts)} texts."
+            )
+            raise ValueError(msg)
+
+        id_iterator: Iterator[Optional[str]] = (
+            iter(ids) if ids else iter(doc.id for doc in documents)
+        )
+
+        ids_ = []
+
+        for doc, vector in zip(documents, vectors):
+            doc_id = next(id_iterator)
+            doc_id_ = doc_id if doc_id else str(uuid.uuid4())
+            ids_.append(doc_id_)
+            self.store[doc_id_] = {
+                "id": doc_id_,
+                "vector": vector,
+                "text": doc.page_content,
+                "metadata": doc.metadata,
+            }
+
+        return ids_
+
+    async def aadd_documents(
+        self, documents: list[Document], ids: Optional[list[str]] = None, **kwargs: Any
+    ) -> list[str]:
+        """Add documents to the store."""
+        texts = [doc.page_content for doc in documents]
+        vectors = await self.embedding.aembed_documents(texts)
+
+        if ids and len(ids) != len(texts):
+            msg = (
+                f"ids must be the same length as texts. "
+                f"Got {len(ids)} ids and {len(texts)} texts."
+            )
+            raise ValueError(msg)
+
+        id_iterator: Iterator[Optional[str]] = (
+            iter(ids) if ids else iter(doc.id for doc in documents)
+        )
+        ids_: list[str] = []
+
+        for doc, vector in zip(documents, vectors):
+            doc_id = next(id_iterator)
+            doc_id_ = doc_id if doc_id else str(uuid.uuid4())
+            ids_.append(doc_id_)
+            self.store[doc_id_] = {
+                "id": doc_id_,
+                "vector": vector,
+                "text": doc.page_content,
+                "metadata": doc.metadata,
+            }
+
+        return ids_
+
+    def get_by_ids(self, ids: Sequence[str], /) -> list[Document]:
+        """Get documents by their ids.
+
+        Args:
+            ids: The ids of the documents to get.
+
+        Returns:
+            A list of Document objects.
+        """
+        documents = []
+
+        for doc_id in ids:
+            doc = self.store.get(doc_id)
+            if doc:
+                documents.append(
+                    Document(
+                        id=doc["id"],
+                        page_content=doc["text"],
+                        metadata=doc["metadata"],
+                    )
+                )
+        return documents
+
+    async def aget_by_ids(self, ids: Sequence[str], /) -> list[Document]:
+        """Async get documents by their ids.
+
+        Args:
+            ids: The ids of the documents to get.
+
+        Returns:
+            A list of Document objects.
+        """
+        return self.get_by_ids(ids)
+
+    def _similarity_search_with_score_by_vector(
+        self,
+        embedding: list[float],
+        k: int = 4,
+        filter: Optional[Callable[[Document], bool]] = None,
+        **kwargs: Any,
+    ) -> list[tuple[Document, float, list[float]]]:
+        # get all docs with fixed order in list
+        docs = list(self.store.values())
+
+        if filter is not None:
+            docs = [
+                doc
+                for doc in docs
+                if filter(Document(page_content=doc["text"], metadata=doc["metadata"]))
+            ]
+
+        if not docs:
+            return []
+
+        similarity = cosine_similarity([embedding], [doc["vector"] for doc in docs])[0]
+
+        # get the indices ordered by similarity score
+        top_k_idx = similarity.argsort()[::-1][:k]
+
+        return [
+            (
+                Document(
+                    id=doc_dict["id"],
+                    page_content=doc_dict["text"],
+                    metadata=doc_dict["metadata"],
+                ),
+                float(similarity[idx].item()),
+                doc_dict["vector"],
+            )
+            for idx in top_k_idx
+            # Assign using walrus operator to avoid multiple lookups
+            if (doc_dict := docs[idx])
+        ]
+
+    def similarity_search_with_score_by_vector(
+        self,
+        embedding: list[float],
+        k: int = 4,
+        filter: Optional[Callable[[Document], bool]] = None,
+        **kwargs: Any,
+    ) -> list[tuple[Document, float]]:
+        return [
+            (doc, similarity)
+            for doc, similarity, _ in self._similarity_search_with_score_by_vector(
+                embedding=embedding, k=k, filter=filter, **kwargs
+            )
+        ]
+
+    def similarity_search_with_score(
+        self,
+        query: str,
+        k: int = 4,
+        **kwargs: Any,
+    ) -> list[tuple[Document, float]]:
+        embedding = self.embedding.embed_query(query)
+        docs = self.similarity_search_with_score_by_vector(
+            embedding,
+            k,
+            **kwargs,
+        )
+        return docs
+
+    async def asimilarity_search_with_score(
+        self, query: str, k: int = 4, **kwargs: Any
+    ) -> list[tuple[Document, float]]:
+        embedding = await self.embedding.aembed_query(query)
+        docs = self.similarity_search_with_score_by_vector(
+            embedding,
+            k,
+            **kwargs,
+        )
+        return docs
+
+    def similarity_search_by_vector(
+        self,
+        embedding: list[float],
+        k: int = 4,
+        **kwargs: Any,
+    ) -> list[Document]:
+        docs_and_scores = self.similarity_search_with_score_by_vector(
+            embedding,
+            k,
+            **kwargs,
+        )
+        return [doc for doc, _ in docs_and_scores]
+
+    async def asimilarity_search_by_vector(
+        self, embedding: list[float], k: int = 4, **kwargs: Any
+    ) -> list[Document]:
+        return self.similarity_search_by_vector(embedding, k, **kwargs)
+
+    def similarity_search(
+        self, query: str, k: int = 4, **kwargs: Any
+    ) -> list[Document]:
+        return [doc for doc, _ in self.similarity_search_with_score(query, k, **kwargs)]
+
+    async def asimilarity_search(
+        self, query: str, k: int = 4, **kwargs: Any
+    ) -> list[Document]:
+        return [
+            doc
+            for doc, _ in await self.asimilarity_search_with_score(query, k, **kwargs)
+        ]
+
+    def max_marginal_relevance_search_by_vector(
+        self,
+        embedding: list[float],
+        k: int = 4,
+        fetch_k: int = 20,
+        lambda_mult: float = 0.5,
+        **kwargs: Any,
+    ) -> list[Document]:
+        prefetch_hits = self._similarity_search_with_score_by_vector(
+            embedding=embedding,
+            k=fetch_k,
+            **kwargs,
+        )
+
+        try:
+            import numpy as np
+        except ImportError as e:
+            msg = (
+                "numpy must be installed to use max_marginal_relevance_search "
+                "pip install numpy"
+            )
+            raise ImportError(msg) from e
+
+        mmr_chosen_indices = maximal_marginal_relevance(
+            np.array(embedding, dtype=np.float32),
+            [vector for _, _, vector in prefetch_hits],
+            k=k,
+            lambda_mult=lambda_mult,
+        )
+        return [prefetch_hits[idx][0] for idx in mmr_chosen_indices]
+
+    def max_marginal_relevance_search(
+        self,
+        query: str,
+        k: int = 4,
+        fetch_k: int = 20,
+        lambda_mult: float = 0.5,
+        **kwargs: Any,
+    ) -> list[Document]:
+        embedding_vector = self.embedding.embed_query(query)
+        return self.max_marginal_relevance_search_by_vector(
+            embedding_vector,
+            k,
+            fetch_k,
+            lambda_mult=lambda_mult,
+            **kwargs,
+        )
+
+    async def amax_marginal_relevance_search(
+        self,
+        query: str,
+        k: int = 4,
+        fetch_k: int = 20,
+        lambda_mult: float = 0.5,
+        **kwargs: Any,
+    ) -> list[Document]:
+        embedding_vector = await self.embedding.aembed_query(query)
+        return self.max_marginal_relevance_search_by_vector(
+            embedding_vector,
+            k,
+            fetch_k,
+            lambda_mult=lambda_mult,
+            **kwargs,
+        )
+
+    @classmethod
+    def from_texts(
+        cls,
+        texts: list[str],
+        embedding: Embeddings,
+        metadatas: Optional[list[dict]] = None,
+        **kwargs: Any,
+    ) -> InMemoryVectorStore:
+        store = cls(
+            embedding=embedding,
+        )
+        store.add_texts(texts=texts, metadatas=metadatas, **kwargs)
+        return store
+
+    @classmethod
+    async def afrom_texts(
+        cls,
+        texts: list[str],
+        embedding: Embeddings,
+        metadatas: Optional[list[dict]] = None,
+        **kwargs: Any,
+    ) -> InMemoryVectorStore:
+        store = cls(
+            embedding=embedding,
+        )
+        await store.aadd_texts(texts=texts, metadatas=metadatas, **kwargs)
+        return store
+
+    @classmethod
+    def load(
+        cls, path: str, embedding: Embeddings, **kwargs: Any
+    ) -> InMemoryVectorStore:
+        """Load a vector store from a file.
+
+        Args:
+            path: The path to load the vector store from.
+            embedding: The embedding to use.
+            kwargs: Additional arguments to pass to the constructor.
+
+        Returns:
+            A VectorStore object.
+        """
+        _path: Path = Path(path)
+        with _path.open("r") as f:
+            store = load(json.load(f))
+        vectorstore = cls(embedding=embedding, **kwargs)
+        vectorstore.store = store
+        return vectorstore
+
+    def dump(self, path: str) -> None:
+        """Dump the vector store to a file.
+
+        Args:
+            path: The path to dump the vector store to.
+        """
+        _path: Path = Path(path)
+        _path.parent.mkdir(exist_ok=True, parents=True)
+        with _path.open("w") as f:
+            json.dump(dumpd(self.store), f, indent=2)
+
+```
+</details>
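
The search path in the listing above (score every stored vector against the query by cosine similarity, then keep the `k` best) can be sketched without numpy. This is only an illustration under stated assumptions: the helper names `cosine_similarity` and `top_k` and the sample vectors are hypothetical, and the real implementation delegates to `langchain_core.vectorstores.utils` rather than to code like this.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: list[float], vectors: dict[str, list[float]], k: int = 4
) -> list[tuple[str, float]]:
    """Return up to k (id, score) pairs, highest similarity first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical stored vectors keyed by document id.
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
results = top_k([1.0, 0.0], vectors, k=2)
```

Here `results` ranks `"a"` first (identical direction to the query) and `"c"` second; `"b"` is orthogonal to the query and falls outside the top two.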
+
+All vector stores must inherit from the [VectorStore](https://python.langchain.com/api_reference/core/vectorstores/langchain_core.vectorstores.base.VectorStore.html)
+base class. This interface consists of methods for writing, deleting and searching
+for documents in the vector store.
+
+`VectorStore` supports a variety of synchronous and asynchronous search types (e.g., 
+nearest-neighbor or maximum marginal relevance), as well as interfaces for adding
+documents to the store. See the [API Reference](https://python.langchain.com/api_reference/core/vectorstores/langchain_core.vectorstores.base.VectorStore.html)
+for all supported methods. The required methods are tabulated below:
+
+| Method/Property         | Description                                          |
+|------------------------ |------------------------------------------------------|
+| `add_documents`         | Add documents to the vector store.                   |
+| `delete`                | Delete selected documents from vector store (by IDs) |
+| `get_by_ids`            | Get selected documents from vector store (by IDs)    |
+| `similarity_search`     | Get documents most similar to a query.               |
+| `embeddings` (property) | Embeddings object for vector store.                  |
+| `from_texts`            | Instantiate vector store via adding texts.           |
+
+Note that `InMemoryVectorStore` implements some optional search types, as well as
+convenience methods for loading and dumping the object to a file, but this is not
+necessary for all implementations.
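
As a rough sketch of the write, delete, and lookup methods in the table, the toy class below mirrors the method names and the ID-resolution order used in the listing above (explicit `ids`, then the document's own `id`, then a fresh UUID). It is deliberately independent of `langchain-core`: `Doc` and `ToyStore` are hypothetical stand-ins, and embedding and search are omitted.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""

    page_content: str
    id: Optional[str] = None
    metadata: dict = field(default_factory=dict)


class ToyStore:
    """Dictionary-backed store; not a real VectorStore subclass."""

    def __init__(self) -> None:
        self.store: dict[str, Doc] = {}

    def add_documents(
        self, documents: list[Doc], ids: Optional[list[str]] = None
    ) -> list[str]:
        if ids and len(ids) != len(documents):
            raise ValueError("ids must be the same length as documents.")
        assigned: list[str] = []
        for i, doc in enumerate(documents):
            # Prefer an explicit id, then the document's id, then a new UUID.
            doc_id = (ids[i] if ids else doc.id) or str(uuid.uuid4())
            self.store[doc_id] = Doc(doc.page_content, doc_id, doc.metadata)
            assigned.append(doc_id)
        return assigned

    def delete(self, ids: Optional[list[str]] = None) -> None:
        # Unknown ids are ignored rather than raising.
        for doc_id in ids or []:
            self.store.pop(doc_id, None)

    def get_by_ids(self, ids: list[str]) -> list[Doc]:
        # Missing ids are skipped, so the result may be shorter than `ids`.
        return [self.store[doc_id] for doc_id in ids if doc_id in self.store]


store = ToyStore()
ids = store.add_documents([Doc("hello", id="doc-1"), Doc("world")])
store.delete(["doc-1", "missing"])
remaining = store.get_by_ids(ids)
```

After the calls above, only the second document (which received a generated UUID) remains in the store.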
+
+:::tip
+
+The in-memory vector store is tested against the standard tests in the LangChain
+GitHub repository. You can always use this as a starting point.
+
+- [Vector store implementation](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/vectorstores/in_memory.py)
+- [Tests](https://github.com/langchain-ai/langchain/blob/master/libs/standard-tests/tests/unit_tests/test_in_memory_vectorstore.py)
+
+:::
+
+## Testing
+
+To implement our test files, we will subclass test classes from the `langchain_tests`
+package. These test classes contain the tests that will be run. We will just need to
+configure what vector store implementation is tested.
+
+### Setup
+
+First we need to install certain dependencies. These include:
+
+- `pytest`: For running tests
+- `pytest-asyncio`: For testing async functionality
+- `langchain-tests`: For importing standard tests
+- `langchain-core`: This should already be installed, but is needed to define our integration.
+
+If you followed the previous [bootstrapping guide](/docs/contributing/how_to/integrations/package/),
+these should already be installed.
+
+### Add and configure standard tests
+
+The `langchain-tests` package implements suites of tests for testing vector store
+integrations. By subclassing the base classes for each standard test, you
+get all of the standard tests for that type.
+
+:::note
+
+The full set of tests that run can be found in the API reference. See details:
+
+- [Sync tests API reference](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.vectorstores.ReadWriteTestSuite.html)
+- [Async tests API reference](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.vectorstores.AsyncReadWriteTestSuite.html)
+
+:::
+
+Here's how you would configure the standard tests for a typical vector store (using
+`ParrotVectorStore` as a placeholder):
+
+```python
+# title="tests/integration_tests/test_vectorstores.py"
+
+from typing import AsyncGenerator, Generator
+
+import pytest
+from langchain_core.vectorstores import VectorStore
+from langchain_parrot_link.vectorstores import ParrotVectorStore
+from langchain_tests.integration_tests.vectorstores import (
+    AsyncReadWriteTestSuite,
+    ReadWriteTestSuite,
+)
+
+
+class TestSync(ReadWriteTestSuite):
+    @pytest.fixture()
+    def vectorstore(self) -> Generator[VectorStore, None, None]:  # type: ignore
+        """Get an empty vectorstore."""
+        store = ParrotVectorStore(self.get_embeddings())
+        # note: store should be EMPTY at this point
+        # if you need to delete data, you may do so here
+        try:
+            yield store
+        finally:
+            # cleanup operations, or deleting data
+            pass
+
+
+class TestAsync(AsyncReadWriteTestSuite):
+    @pytest.fixture()
+    async def vectorstore(self) -> AsyncGenerator[VectorStore, None]:  # type: ignore
+        """Get an empty vectorstore."""
+        store = ParrotVectorStore(self.get_embeddings())
+        # note: store should be EMPTY at this point
+        # if you need to delete data, you may do so here
+        try:
+            yield store
+        finally:
+            # cleanup operations, or deleting data
+            pass
+```
+
+There are separate suites for testing synchronous and asynchronous methods.
+Configuring the tests consists of implementing pytest fixtures for setting up an
+empty vector store and tearing down the vector store after the test run ends.
+
+For example, below is the `ReadWriteTestSuite` for the [Chroma](https://python.langchain.com/docs/integrations/vectorstores/chroma/)
+integration:
+
+```python
+from typing import Generator
+
+import pytest
+from langchain_core.vectorstores import VectorStore
+from langchain_tests.integration_tests.vectorstores import ReadWriteTestSuite
+
+from langchain_chroma import Chroma
+
+
+class TestSync(ReadWriteTestSuite):
+    @pytest.fixture()
+    def vectorstore(self) -> Generator[VectorStore, None, None]:  # type: ignore
+        """Get an empty vectorstore."""
+        store = Chroma(embedding_function=self.get_embeddings())
+        try:
+            yield store
+        finally:
+            # remove the collection so the next test starts from an empty store
+            store.delete_collection()
+```
+
+Note that before the initial `yield`, we instantiate the vector store with an
+[embeddings](/docs/concepts/embedding_models/) object. This is a pre-defined
+["fake" embeddings model](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.vectorstores.ReadWriteTestSuite.html#langchain_tests.integration_tests.vectorstores.ReadWriteTestSuite.get_embeddings)
+that will generate short, arbitrary vectors for documents. You can use a different
+embeddings object if desired.
+
+In the `finally` block, we call whatever integration-specific logic is needed to
+bring the vector store to a clean state. This logic runs after each test,
+even if the test fails.
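
The setup/teardown ordering can be illustrated without pytest. The sketch below uses `contextlib.contextmanager`, which follows the same generator pattern as a yield-style fixture; it is an analogy under that assumption, not pytest's implementation, and the dictionary-backed `vectorstore` and the `events` log are hypothetical.

```python
from contextlib import contextmanager
from typing import Iterator

# Records the order in which setup, teardown, and the failure are observed.
events: list[str] = []


@contextmanager
def vectorstore() -> Iterator[dict]:
    store: dict = {}  # stand-in for an empty vector store
    events.append("setup")
    try:
        yield store
    finally:
        # Runs whether or not the body raised.
        store.clear()
        events.append("teardown")


try:
    with vectorstore() as store:
        store["doc-1"] = "hello"
        raise AssertionError("simulated test failure")
except AssertionError:
    events.append("test failed")
```

Teardown is recorded before the failure reaches the caller, which is why cleanup placed in `finally` leaves the store empty for the next test.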
+
+### Run standard tests
+
+After setting tests up, you would run them with the following command from your project root:
+
+```shell
+pytest --asyncio-mode=auto tests/integration_tests
+```
+
+### Test suite information and troubleshooting
+
+Each test method documents:
+
+1. Troubleshooting tips;
+2. (If applicable) how the test can be skipped.
+
+This information, along with the full set of tests that run, can be found in the API
+reference:
+
+- [Sync tests API reference](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.vectorstores.ReadWriteTestSuite.html)
+- [Async tests API reference](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.vectorstores.AsyncReadWriteTestSuite.html)
--- a/docs/docs/contributing/index.mdx
+++ b/docs/docs/contributing/index.mdx
@@ -17,7 +17,6 @@ More coming soon! We are working on tutorials to help you make your first contri
 - [**Documentation**](how_to/documentation/index.mdx): Help improve our docs, including this one!
 - [**Code**](how_to/code/index.mdx): Help us write code, fix bugs, or improve our infrastructure.
 - [**Integrations**](how_to/integrations/index.mdx): Help us integrate with your favorite vendors and tools.
- [**Standard Tests**](how_to/integrations/standard_tests): Ensure your integration passes an expected set of tests.

 ## Reference

--- a/docs/docs/how_to/_custom_retriever_intro.mdx
+++ b/docs/docs/how_to/_custom_retriever_intro.mdx
@@ -0,0 +1,23 @@
+To create your own retriever, you need to extend the `BaseRetriever` class and implement the following methods:
+
+| Method                         | Description                                      | Required/Optional |
+|--------------------------------|--------------------------------------------------|-------------------|
+| `_get_relevant_documents`      | Get documents relevant to a query.               | Required          |
+| `_aget_relevant_documents`     | Implement to provide async native support.       | Optional          |
+
+
+The logic inside of `_get_relevant_documents` can involve arbitrary calls to a database or to the web using requests.
+
+:::tip
+By inheriting from `BaseRetriever`, your retriever automatically becomes a LangChain [Runnable](/docs/concepts/runnables) and will gain the standard `Runnable` functionality out of the box!
+:::
+
+
+:::info
+You can use a `RunnableLambda` or `RunnableGenerator` to implement a retriever.
+
+The main benefit of implementing a retriever as a `BaseRetriever` vs. a `RunnableLambda` (a custom [runnable function](/docs/how_to/functions)) is that a `BaseRetriever` is a well
+known LangChain entity so some tooling for monitoring may implement specialized behavior for retrievers. Another difference
+is that a `BaseRetriever` will behave slightly differently from `RunnableLambda` in some APIs; e.g., the `start` event
+in `astream_events` API will be `on_retriever_start` instead of `on_chain_start`.
+:::
--- a/docs/docs/how_to/custom_retriever.ipynb
+++ b/docs/docs/how_to/custom_retriever.ipynb
@@ -27,29 +27,9 @@
    "\n",
    "## Interface\n",
    "\n",
-    "To create your own retriever, you need to extend the `BaseRetriever` class and implement the following methods:\n",
+    "import CustomRetrieverIntro from './_custom_retriever_intro.mdx';\n",
    "\n",
-    "| Method                         | Description                                      | Required/Optional |\n",
-    "|--------------------------------|--------------------------------------------------|-------------------|\n",
-    "| `_get_relevant_documents`      | Get documents relevant to a query.               | Required          |\n",
-    "| `_aget_relevant_documents`     | Implement to provide async native support.       | Optional          |\n",
-    "\n",
-    "\n",
-    "The logic inside of `_get_relevant_documents` can involve arbitrary calls to a database or to the web using requests.\n",
-    "\n",
-    ":::tip\n",
-    "By inherting from `BaseRetriever`, your retriever automatically becomes a LangChain [Runnable](/docs/concepts/runnables) and will gain the standard `Runnable` functionality out of the box!\n",
-    ":::\n",
-    "\n",
-    "\n",
-    ":::info\n",
-    "You can use a `RunnableLambda` or `RunnableGenerator` to implement a retriever.\n",
-    "\n",
-    "The main benefit of implementing a retriever as a `BaseRetriever` vs. a `RunnableLambda` (a custom [runnable function](/docs/how_to/functions)) is that a `BaseRetriever` is a well\n",
-    "known LangChain entity so some tooling for monitoring may implement specialized behavior for retrievers. Another difference\n",
-    "is that a `BaseRetriever` will behave slightly differently from `RunnableLambda` in some APIs; e.g., the `start` event\n",
-    "in `astream_events` API will be `on_retriever_start` instead of `on_chain_start`.\n",
-    ":::\n"
+    "<CustomRetrieverIntro />"
   ]
  },
  {
--- a/docs/vercel.json
+++ b/docs/vercel.json
@@ -113,6 +113,10 @@
    {
      "source": "/docs/contributing/:path((?:faq|repo_structure|review_process)/?)",
      "destination": "/docs/contributing/reference/:path"
+    },
+    {
+      "source": "/docs/contributing/how_to/integrations/standard_tests(/?)",
+      "destination": "/docs/contributing/how_to/integrations/#standard-tests"
    }
  ]
 }
Author	SHA1	Message	Date
Erick Friis	df4e0e6d81	x	2024-12-04 17:34:41 -08:00
Chester Curme	0ab8e5cfe0	nit	2024-12-04 16:56:12 -05:00
ccurme	3fe135f242	docs[patch]: add vector store contribution guide (#28518 )	2024-12-04 16:26:23 -05:00
ccurme	01045580f9	docs[patch]: add to chat model contributing guide (#28490 )	2024-12-03 16:32:50 -05:00
ccurme	fcbca18342	Merge branch 'master' into harrison/improve-integration-docs	2024-12-03 15:31:38 -05:00
ccurme	ff214eb503	Update docs/docs/contributing/how_to/integrations/index.mdx Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-12-03 15:31:20 -05:00
ccurme	027f49c661	docs[patch]: reorganize integration guides (#28457 ) Proposal is: - Each type of component (chat models, embeddings, etc) has a dedicated guide. - This guide contains detail on both implementation and testing via langchain-tests. - We delete the monolithic standard-tests guide.	2024-12-02 17:55:57 -05:00
Harrison Chase	ce6e4bb645	improve integration docs	2024-11-29 12:40:35 -05:00