docs: tool, retriever contributing docs (#28602)

2025-06-23 07:09:31 +00:00 · 2024-12-06 16:36:55 -08:00 · 2024-12-06 16:36:55 -08:00 · 9b848491c8
commit 9b848491c8
parent 5e8553c31a
2 changed files with 100 additions and 1 deletions
--- a/docs/docs/contributing/how_to/integrations/index.mdx
+++ b/docs/docs/contributing/how_to/integrations/index.mdx
@ -1,4 +1,5 @@
 ---
+pagination_prev: null
 pagination_next: contributing/how_to/integrations/package
 ---

@ -37,7 +38,6 @@ While any component can be integrated into LangChain, there are specific types o
        <li>Chat Models</li>
        <li>Tools/Toolkits</li>
        <li>Retrievers</li>
-        <li>Document Loaders</li>
        <li>Vector Stores</li>
        <li>Embedding Models</li>
      </ul>
@ -45,6 +45,7 @@ While any component can be integrated into LangChain, there are specific types o
    <td>
      <ul>
        <li>LLMs (Text-Completion Models)</li>
+        <li>Document Loaders</li>
        <li>Key-Value Stores</li>
        <li>Document Transformers</li>
        <li>Model Caches</li>
--- a/docs/docs/contributing/how_to/integrations/package.mdx
+++ b/docs/docs/contributing/how_to/integrations/package.mdx
@ -175,6 +175,60 @@ import EmbeddingsSource from '/src/theme/integration_template/integration_templa
    </TabItem>
    <TabItem value="tools" label="Tools">

+Tools are used in 2 main ways:
+
+1. To define an "input schema" or "args schema" to pass to a chat model's tool calling
+feature along with a text request, such that the chat model can generate a "tool call",
+or parameters to call the tool with.
+2. To take a "tool call" as generated above, and take some action and return a response
+that can be passed back to the chat model as a ToolMessage.
+
+Your tool class must inherit from the [BaseTool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.base.BaseTool.html#langchain_core.tools.base.BaseTool) base class. This interface has 3 properties and 2 methods that should be implemented in a
+subclass.
+
+| Method/Property         | Description                                          |
+|------------------------ |------------------------------------------------------|
+| `name`                  | Name of the tool (passed to the LLM too).            |
+| `description`           | Description of the tool (passed to the LLM too).     |
+| `args_schema`           | Define the schema for the tool's input arguments.    |
+| `_run`                  | Run the tool with the given arguments.               |
+| `_arun`                 | Asynchronously run the tool with the given arguments.|
+
+### Properties
+
+`name`, `description`, and `args_schema` are all properties that should be implemented
+in the subclass. `name` and `description` are strings that are used to identify the tool
+and provide a description of what the tool does. Both of these are passed to the LLM,
+and users may override these values depending on the LLM they are using as a form of
+"prompt engineering." Giving these a concise and LLM-usable name and description is
+important for the initial user experience of the tool.
+
+`args_schema` is a Pydantic `BaseModel` that defines the schema for the tool's input
+arguments. This is used to validate the input arguments to the tool, and to provide
+a schema for the LLM to fill out when calling the tool. Similar to the `name` and
+`description` of the overall Tool class, the fields' names (the variable name) and
+description (part of `Field(..., description="description")`) are passed to the LLM, 
+and the values in these fields should be concise and LLM-usable.
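To make the `args_schema` description concrete, here is a minimal sketch of an input schema and the JSON schema Pydantic derives from it. The `WeatherInput` model and its fields are hypothetical names chosen for illustration, and the sketch assumes Pydantic v2 (`model_json_schema`). It only shows where the field names and descriptions end up; how a given chat model consumes that schema is outside this example.

```python
from pydantic import BaseModel, Field


class WeatherInput(BaseModel):
    """Hypothetical input schema for a weather lookup tool."""

    # The field name ("city") and its description are what the LLM sees.
    city: str = Field(..., description="City name, e.g. 'Paris'.")
    # A default makes the argument optional for the LLM to fill in.
    units: str = Field("celsius", description="Either 'celsius' or 'fahrenheit'.")


# JSON schema derived from the model (Pydantic v2 API).
schema = WeatherInput.model_json_schema()
print(schema["properties"]["city"]["description"])
print(schema["required"])
```

Keeping field names short and descriptions specific matters here for the same reason it does for `name` and `description`: they are part of the prompt the model works from.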
+
+### Run Methods
+
+`_run` is the main method that should be implemented in the subclass. This method
+takes in the arguments from `args_schema` and runs the tool, returning a string
+response. This method is usually called in a LangGraph [`ToolNode`](https://langchain-ai.github.io/langgraph/how-tos/tool-calling/), and can also be called in a legacy
+`langchain.agents.AgentExecutor`.
+
+`_arun` is optional because by default, `_run` will be run in an async executor.
+However, if your tool calls any APIs or does any async work, you should implement
+this method to run the tool asynchronously in addition to `_run`.
+
+### Implementation
+
+The `langchain-cli` package contains [template integrations](https://github.com/langchain-ai/langchain/tree/master/libs/cli/langchain_cli/integration_template/integration_template)
+for major LangChain components that are tested against the standard unit and
+integration tests in the LangChain GitHub repository. You can access the starter
+tool implementation [here](https://github.com/langchain-ai/langchain/blob/master/libs/cli/langchain_cli/integration_template/integration_template/tools.py).
+For convenience, we also include the code below.
+
        <details>
            <summary>Example tool code</summary>

@ -194,6 +248,50 @@ import ToolSource from '/src/theme/integration_template/integration_template/too
    </TabItem>
    <TabItem value="retrievers" label="Retrievers">

+Retrievers are used to retrieve documents from APIs, databases, or other sources
+based on a query. Your retriever class must inherit from the [BaseRetriever](https://python.langchain.com/api_reference/core/retrievers/langchain_core.retrievers.BaseRetriever.html) base class. This interface has 1 attribute and 2 methods that should be implemented in a subclass.
+
+| Method/Property            | Description                                             |
+|----------------------------|---------------------------------------------------------|
+| `k`                        | Default number of documents to retrieve (configurable). |
+| `_get_relevant_documents`  | Retrieve documents based on a query.                    |
+| `_aget_relevant_documents` | Asynchronously retrieve documents based on a query.     |
+
+### Attributes
+
+`k` is an attribute that should be implemented in the subclass. This attribute
+can simply be defined at the top of the class with a default value like
+`k: int = 5`. This attribute is the default number of documents to retrieve
+from the retriever, and can be overridden by the user when constructing or calling
+the retriever.
+
+### Methods
+
+`_get_relevant_documents` is the main method that should be implemented in the subclass.
+
+This method takes in a query and returns a list of `Document` objects, which have 2
+main properties:
+
+- `page_content` - the text content of the document
+- `metadata` - a dictionary of metadata about the document
+
+Retrievers are typically invoked directly by a user, e.g. as
+`MyRetriever(k=4).invoke("query")`, which automatically calls `_get_relevant_documents`
+under the hood.
+
+`_aget_relevant_documents` is optional because by default, `_get_relevant_documents` will
+be run in an async executor. However, if your retriever calls any APIs or does
+any async work, you should implement this method to run the retriever asynchronously
+in addition to `_get_relevant_documents` for performance reasons.
+
+### Implementation
+
+The `langchain-cli` package contains [template integrations](https://github.com/langchain-ai/langchain/tree/master/libs/cli/langchain_cli/integration_template/integration_template)
+for major LangChain components that are tested against the standard unit and
+integration tests in the LangChain GitHub repository. You can access the starter
+retriever implementation [here](https://github.com/langchain-ai/langchain/blob/master/libs/cli/langchain_cli/integration_template/integration_template/retrievers.py).
+For convenience, we also include the code below.
+
        <details>
            <summary>Example retriever code</summary>