mirror of
https://github.com/hwchase17/langchain.git
synced 2026-02-09 18:51:07 +00:00
Compare commits
53 Commits
langchain-
...
langchain-
| SHA1 |
|---|
| 534b8f4364 |
| 6815981578 |
| ce3b69aa05 |
| 242fee11be |
| 85114b4f3a |
| 9fcd203556 |
| bdb4cf7cc0 |
| b476fdb54a |
| b64d846347 |
| 4c70ffff01 |
| ffb5c1905a |
| 6e6061fe73 |
| ec9b41431e |
| cef21a0b49 |
| 90f162efb6 |
| b53f07bfb9 |
| eabe587787 |
| 481c4bfaba |
| 54fba7e520 |
| 079c7ea0fc |
| e8508fb4c6 |
| 220b33df7f |
| 1fc4ac32f0 |
| 2354bb7bfa |
| 317a38b83e |
| fbf0704e48 |
| 524ee6d9ac |
| dd0085a9ff |
| 9b848491c8 |
| 5e8553c31a |
| d801c6ffc7 |
| a32035d17d |
| 07c2ac765a |
| 4a7dc6ec4c |
| 80a88f8f04 |
| 0eb7ab65f1 |
| b7c2029e84 |
| 925ca75ca5 |
| f943205ebf |
| 9e2abcd152 |
| 246c10a1cc |
| 1cedf401a7 |
| 791d7e965e |
| 4f99952129 |
| 221ab03fe4 |
| e6663b69f3 |
| c38b845d7e |
| 2c6bc74cb1 |
| dda9f90047 |
| 15cbc36a23 |
| f3dc142d3c |
| 5277a021c1 |
| 18386c16c7 |
46
README.md
@@ -38,18 +38,21 @@ conda install langchain -c conda-forge

For these applications, LangChain simplifies the entire application lifecycle:

- **Open-source libraries**: Build your applications using LangChain's open-source [building blocks](https://python.langchain.com/docs/concepts/#langchain-expression-language-lcel), [components](https://python.langchain.com/docs/concepts/), and [third-party integrations](https://python.langchain.com/docs/integrations/providers/).

- **Open-source libraries**: Build your applications using LangChain's open-source
[components](https://python.langchain.com/docs/concepts/) and
[third-party integrations](https://python.langchain.com/docs/integrations/providers/).
Use [LangGraph](https://langchain-ai.github.io/langgraph/) to build stateful agents with first-class streaming and human-in-the-loop support.
- **Productionization**: Inspect, monitor, and evaluate your apps with [LangSmith](https://docs.smith.langchain.com/) so that you can constantly optimize and deploy with confidence.
- **Deployment**: Turn your LangGraph applications into production-ready APIs and Assistants with [LangGraph Cloud](https://langchain-ai.github.io/langgraph/cloud/).
- **Deployment**: Turn your LangGraph applications into production-ready APIs and Assistants with [LangGraph Platform](https://langchain-ai.github.io/langgraph/cloud/).

### Open-source libraries

- **`langchain-core`**: Base abstractions and LangChain Expression Language.
- **`langchain-community`**: Third party integrations.
- Some integrations have been further split into **partner packages** that only rely on **`langchain-core`**. Examples include **`langchain_openai`** and **`langchain_anthropic`**.
- **`langchain-core`**: Base abstractions.
- **Integration packages** (e.g. **`langchain-openai`**, **`langchain-anthropic`**, etc.): Important integrations have been split into lightweight packages that are co-maintained by the LangChain team and the integration developers.
- **`langchain`**: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.
- **[`LangGraph`](https://langchain-ai.github.io/langgraph/)**: A library for building robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph. Integrates smoothly with LangChain, but can be used without it. To learn more about LangGraph, check out our first LangChain Academy course, *Introduction to LangGraph*, available [here](https://academy.langchain.com/courses/intro-to-langgraph).
- **`langchain-community`**: Third-party integrations that are community maintained.
- **[LangGraph](https://langchain-ai.github.io/langgraph)**: Build robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph. Integrates smoothly with LangChain, but can be used without it. To learn more about LangGraph, check out our first LangChain Academy course, *Introduction to LangGraph*, available [here](https://academy.langchain.com/courses/intro-to-langgraph).

### Productionization:

@@ -57,7 +60,7 @@ For these applications, LangChain simplifies the entire application lifecycle:

### Deployment:

- **[LangGraph Cloud](https://langchain-ai.github.io/langgraph/cloud/)**: Turn your LangGraph applications into production-ready APIs and Assistants.
- **[LangGraph Platform](https://langchain-ai.github.io/langgraph/cloud/)**: Turn your LangGraph applications into production-ready APIs and Assistants.

@@ -85,19 +88,12 @@ And much more! Head to the [Tutorials](https://python.langchain.com/docs/tutoria

The main value props of the LangChain libraries are:

1. **Components**: composable building blocks, tools and integrations for working with language models. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not
2. **Off-the-shelf chains**: built-in assemblages of components for accomplishing higher-level tasks

Off-the-shelf chains make it easy to get started. Components make it easy to customize existing chains and build new ones.

## LangChain Expression Language (LCEL)

LCEL is a key part of LangChain, allowing you to build and organize chains of processes in a straightforward, declarative manner. It was designed to support taking prototypes directly into production without needing to alter any code. This means you can use LCEL to set up everything from basic "prompt + LLM" setups to intricate, multi-step workflows.

- **[Overview](https://python.langchain.com/docs/concepts/#langchain-expression-language-lcel)**: LCEL and its benefits
- **[Interface](https://python.langchain.com/docs/concepts/#runnable-interface)**: The standard Runnable interface for LCEL objects
- **[Primitives](https://python.langchain.com/docs/how_to/#langchain-expression-language-lcel)**: More on the primitives LCEL includes
- **[Cheatsheet](https://python.langchain.com/docs/how_to/lcel_cheatsheet/)**: Quick overview of the most common usage patterns
1. **Components**: composable building blocks, tools and integrations for working with language models. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not.
2. **Easy orchestration with LangGraph**: [LangGraph](https://langchain-ai.github.io/langgraph/),
built on top of `langchain-core`, has built-in support for [messages](https://python.langchain.com/docs/concepts/messages/), [tools](https://python.langchain.com/docs/concepts/tools/),
and other LangChain abstractions. This makes it easy to combine components into
production-ready applications with persistence, streaming, and other key features.
Check out the LangChain [tutorials page](https://python.langchain.com/docs/tutorials/#orchestration) for examples.

## Components

@@ -105,15 +101,19 @@ Components fall into the following **modules**:

**📃 Model I/O**

This includes [prompt management](https://python.langchain.com/docs/concepts/#prompt-templates), [prompt optimization](https://python.langchain.com/docs/concepts/#example-selectors), a generic interface for [chat models](https://python.langchain.com/docs/concepts/#chat-models) and [LLMs](https://python.langchain.com/docs/concepts/#llms), and common utilities for working with [model outputs](https://python.langchain.com/docs/concepts/#output-parsers).
This includes [prompt management](https://python.langchain.com/docs/concepts/prompt_templates/)
and a generic interface for [chat models](https://python.langchain.com/docs/concepts/chat_models/), including a consistent interface for [tool-calling](https://python.langchain.com/docs/concepts/tool_calling/) and [structured output](https://python.langchain.com/docs/concepts/structured_outputs/) across model providers.

**📚 Retrieval**

Retrieval Augmented Generation involves [loading data](https://python.langchain.com/docs/concepts/#document-loaders) from a variety of sources, [preparing it](https://python.langchain.com/docs/concepts/#text-splitters), then [searching over (a.k.a. retrieving from)](https://python.langchain.com/docs/concepts/#retrievers) it for use in the generation step.
Retrieval Augmented Generation involves [loading data](https://python.langchain.com/docs/concepts/document_loaders/) from a variety of sources, [preparing it](https://python.langchain.com/docs/concepts/text_splitters/), then [searching over (a.k.a. retrieving from)](https://python.langchain.com/docs/concepts/retrievers/) it for use in the generation step.

**🤖 Agents**

Agents allow an LLM autonomy over how a task is accomplished. Agents make decisions about which Actions to take, then take that Action, observe the result, and repeat until the task is complete. LangChain provides a [standard interface for agents](https://python.langchain.com/docs/concepts/#agents), along with [LangGraph](https://github.com/langchain-ai/langgraph) for building custom agents.
Agents allow an LLM autonomy over how a task is accomplished. Agents make decisions about which Actions to take, then take that Action, observe the result, and repeat until the task is complete. [LangGraph](https://langchain-ai.github.io/langgraph/) makes it easy to use
LangChain components to build both [custom](https://langchain-ai.github.io/langgraph/tutorials/)
and [built-in](https://langchain-ai.github.io/langgraph/how-tos/create-react-agent/)
LLM agents.

## 📖 Documentation


@@ -60,6 +60,7 @@ copy-infra:
	cp package.json $(OUTPUT_NEW_DIR)
	cp sidebars.js $(OUTPUT_NEW_DIR)
	cp -r static $(OUTPUT_NEW_DIR)
	cp -r ../libs/cli/langchain_cli/integration_template $(OUTPUT_NEW_DIR)/src/theme
	cp yarn.lock $(OUTPUT_NEW_DIR)

render:
@@ -81,6 +82,7 @@ build: install-py-deps generate-files copy-infra render md-sync append-related
vercel-build: install-vercel-deps build generate-references
	rm -rf docs
	mv $(OUTPUT_NEW_DOCS_DIR) docs
	cp -r ../libs/cli/langchain_cli/integration_template src/theme
	rm -rf build
	mkdir static/api_reference
	git clone --depth=1 https://github.com/langchain-ai/langchain-api-docs-html.git

@@ -87,6 +87,18 @@ class Beta(BaseAdmonition):
def setup(app):
    app.add_directive("example_links", ExampleLinksDirective)
    app.add_directive("beta", Beta)
    app.connect("autodoc-skip-member", skip_private_members)


def skip_private_members(app, what, name, obj, skip, options):
    if skip:
        return True
    if hasattr(obj, "__doc__") and obj.__doc__ and ":private:" in obj.__doc__:
        return True
    if name == "__init__" and getattr(obj, "__objclass__", None) is object:
        # don't document the default __init__ inherited from object
        return True
    return None


# -- Project information -----------------------------------------------------
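To see what the `autodoc-skip-member` hook above filters, here is a small standalone sketch that calls it directly instead of through Sphinx. The `app`, `what`, and `options` arguments are unused by the hook, so `None` is passed for them; the classes `Hidden`, `Public`, and `WithInit` are hypothetical. The `__objclass__` lookup uses `getattr` here so that a class-defined `__init__` (a plain function, which has no `__objclass__`) does not raise.

```python
def skip_private_members(app, what, name, obj, skip, options):
    """Return True to hide a member from autodoc, None to defer to Sphinx."""
    if skip:
        return True
    # Anything whose docstring contains the ":private:" marker is hidden.
    if hasattr(obj, "__doc__") and obj.__doc__ and ":private:" in obj.__doc__:
        return True
    # object.__init__ is a slot wrapper whose __objclass__ is object;
    # a class-defined __init__ is a plain function without that attribute.
    if name == "__init__" and getattr(obj, "__objclass__", None) is object:
        return True
    return None


class Hidden:
    """:private: internal helper that should not appear in the API reference."""


class Public:
    """A public class that keeps the default object.__init__."""


class WithInit:
    """A public class that defines its own __init__."""

    def __init__(self, value: int) -> None:
        self.value = value


# Call the hook the way Sphinx would, with unused arguments set to None.
print(skip_private_members(None, "class", "Hidden", Hidden, False, None))  # True
print(skip_private_members(None, "class", "Public", Public, False, None))  # None
print(skip_private_members(None, "method", "__init__", Public.__init__, False, None))  # True
print(skip_private_members(None, "method", "__init__", WithInit.__init__, False, None))  # None
```

Returning `None` rather than `False` lets autodoc fall back to its default skipping behavior for every member the hook does not explicitly hide.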

@@ -72,14 +72,21 @@ def _load_module_members(module_path: str, namespace: str) -> ModuleMembers:
    Returns:
        list: A list of loaded module objects.
    """

    classes_: List[ClassInfo] = []
    functions: List[FunctionInfo] = []
    module = importlib.import_module(module_path)

    if ":private:" in (module.__doc__ or ""):
        return ModuleMembers(classes_=[], functions=[])

    for name, type_ in inspect.getmembers(module):
        if not hasattr(type_, "__module__"):
            continue
        if type_.__module__ != module_path:
            continue
        if ":private:" in (type_.__doc__ or ""):
            continue

        if inspect.isclass(type_):
            # The type of the class is used to select a template

@@ -65,7 +65,7 @@ A package to deploy LangChain chains as REST APIs. Makes it easy to get a produc
:::important
LangServe is designed to primarily deploy simple Runnables and work with well-known primitives in langchain-core.

If you need a deployment option for LangGraph, you should instead be looking at LangGraph Cloud (beta) which will be better suited for deploying LangGraph applications.
If you need a deployment option for LangGraph, you should instead be looking at LangGraph Platform (beta) which will be better suited for deploying LangGraph applications.
:::

For more information, see the [LangServe documentation](/docs/langserve).

@@ -1,4 +1,5 @@
---
pagination_prev: null
pagination_next: contributing/how_to/integrations/package
---

@@ -37,7 +38,6 @@ While any component can be integrated into LangChain, there are specific types o
<li>Chat Models</li>
<li>Tools/Toolkits</li>
<li>Retrievers</li>
<li>Document Loaders</li>
<li>Vector Stores</li>
<li>Embedding Models</li>
</ul>
@@ -45,6 +45,7 @@ While any component can be integrated into LangChain, there are specific types o
<td>
<ul>
<li>LLMs (Text-Completion Models)</li>
<li>Document Loaders</li>
<li>Key-Value Stores</li>
<li>Document Transformers</li>
<li>Model Caches</li>

@@ -12,98 +12,90 @@ which contain classes that are compatible with LangChain's core interfaces.

We will cover:

1. How to implement components, such as [chat models](/docs/concepts/chat_models/) and [vector stores](/docs/concepts/vectorstores/), that adhere
1. (Optional) How to bootstrap a new integration package
2. How to implement components, such as [chat models](/docs/concepts/chat_models/) and [vector stores](/docs/concepts/vectorstores/), that adhere
to the LangChain interface;
2. (Optional) How to bootstrap a new integration package.

## Implementing LangChain components

LangChain components are subclasses of base classes in [langchain-core](/docs/concepts/architecture/#langchain-core).
Examples include [chat models](/docs/concepts/chat_models/),
[vector stores](/docs/concepts/vectorstores/), [tools](/docs/concepts/tools/),
[embedding models](/docs/concepts/embedding_models/) and [retrievers](/docs/concepts/retrievers/).

Your integration package will typically implement a subclass of at least one of these
components. Expand the tabs below to see details on each.

<details>
<summary>Chat models</summary>

Refer to the [Custom Chat Model Guide](/docs/how_to/custom_chat_model) for
details on a starter chat model [implementation](/docs/how_to/custom_chat_model/#implementation).

:::tip

The model from the [Custom Chat Model Guide](/docs/how_to/custom_chat_model) is tested
against the standard unit and integration tests in the LangChain GitHub repository.
You can also access that implementation directly from GitHub
[here](https://github.com/langchain-ai/langchain/blob/master/libs/standard-tests/tests/unit_tests/custom_chat_model.py).

:::

</details>

<details>
<summary>Vector stores</summary>

Your vector store implementation will depend on your chosen database technology.
`langchain-core` includes a minimal
[in-memory vector store](https://python.langchain.com/api_reference/core/vectorstores/langchain_core.vectorstores.in_memory.InMemoryVectorStore.html)
that we can use as a guide. You can access the code [here](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/vectorstores/in_memory.py).

All vector stores must inherit from the [VectorStore](https://python.langchain.com/api_reference/core/vectorstores/langchain_core.vectorstores.base.VectorStore.html)
base class. This interface consists of methods for writing, deleting and searching
for documents in the vector store.

`VectorStore` supports a variety of synchronous and asynchronous search types (e.g.,
nearest-neighbor or maximum marginal relevance), as well as interfaces for adding
documents to the store. See the [API Reference](https://python.langchain.com/api_reference/core/vectorstores/langchain_core.vectorstores.base.VectorStore.html)
for all supported methods. The required methods are tabulated below:

| Method/Property | Description |
|---|---|
| `add_documents` | Add documents to the vector store. |
| `delete` | Delete selected documents from the vector store (by IDs). |
| `get_by_ids` | Get selected documents from the vector store (by IDs). |
| `similarity_search` | Get documents most similar to a query. |
| `embeddings` (property) | Embeddings object for the vector store. |
| `from_texts` | Instantiate a vector store by adding texts. |

Note that `InMemoryVectorStore` implements some optional search types, as well as
convenience methods for loading and dumping the object to a file, but this is not
necessary for all implementations.

:::tip

The [in-memory vector store](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/vectorstores/in_memory.py)
is tested against the standard tests in the LangChain GitHub repository.

:::

</details>

<!-- <details>
<summary>Embeddings</summary>

</details>

<details>
<summary>Tools</summary>

</details>

<details>
<summary>Retrievers</summary>

</details>

<details>
<summary>Document Loaders</summary>

</details> -->

## (Optional) bootstrapping a new integration package

In this section, we will outline 2 options for bootstrapping a new integration package,
and you're welcome to use other tools if you prefer!

1. **langchain-cli**: This is a command-line tool that can be used to bootstrap a new integration package with a template for LangChain components and Poetry for dependency management.
2. **Poetry**: This is a Python dependency management tool that can be used to bootstrap a new Python package with dependencies. You can then add LangChain components to this package.

<details>
<summary>Option 1: langchain-cli (recommended)</summary>

In this guide, we will be using the `langchain-cli` to create a new integration package
from a template, which can be edited to implement your LangChain components.

### **Prerequisites**

- [GitHub](https://github.com) account
- [PyPI](https://pypi.org/) account

### Bootstrapping a new Python package with langchain-cli

First, install `langchain-cli` and `poetry`:

```bash
pip install langchain-cli poetry
```

Next, come up with a name for your package. For this guide, we'll use `langchain-parrot-link`.
You can confirm that the name is available on PyPI by searching for it on the [PyPI website](https://pypi.org/).

Next, create your new Python package with `langchain-cli`, and navigate into the new directory with `cd`:

```bash
langchain-cli integration new

> The name of the integration to create (e.g. `my-integration`): parrot-link
> Name of integration in PascalCase [ParrotLink]:

cd parrot-link
```

Next, let's add any dependencies we need:

```bash
poetry add my-integration-sdk
```

We can also add some `typing` or `test` dependencies in a separate Poetry dependency group.

```bash
poetry add --group typing my-typing-dep
poetry add --group test my-test-dep
```

And finally, have Poetry set up a virtual environment with your dependencies, as well
as your integration package:

```bash
poetry install --with lint,typing,test,test_integration
```

You now have a new Python package with a template for LangChain components! This
template comes with files for each integration type, and you're welcome to duplicate or
delete any of these files as needed (including the associated test files).

To create any individual files from the [template], you can run, e.g.:

```bash
langchain-cli integration new \
    --name parrot-link \
    --name-class ParrotLink \
    --src integration_template/chat_models.py \
    --dst langchain_parrot_link/chat_models_2.py
```

</details>

<details>
<summary>Option 2: Poetry (manual)</summary>

In this guide, we will be using [Poetry](https://python-poetry.org/) for
dependency management and packaging, and you're welcome to use any other tools you prefer.

@@ -183,6 +175,8 @@ later, following the [standard tests](../standard_tests) guide.
For `chat_models.py`, simply paste the contents of the chat model implementation
[above](#implementing-langchain-components).

</details>

### Push your package to a public GitHub repository

This is only required if you want to publish your integration in the LangChain documentation.
@@ -191,6 +185,319 @@ This is only required if you want to publish your integration in the LangChain d
2. Push your code to the repository.
3. Confirm that your repository is viewable by the public (e.g. in a private browsing window, where you're not logged into GitHub).

## Implementing LangChain components

LangChain components are subclasses of base classes in [langchain-core](/docs/concepts/architecture/#langchain-core).
Examples include [chat models](/docs/concepts/chat_models/),
[vector stores](/docs/concepts/vectorstores/), [tools](/docs/concepts/tools/),
[embedding models](/docs/concepts/embedding_models/) and [retrievers](/docs/concepts/retrievers/).

Your integration package will typically implement a subclass of at least one of these
components. Select a tab below to see details on each.

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';

<Tabs>

<TabItem value="chat_models" label="Chat models">

Refer to the [Custom Chat Model Guide](/docs/how_to/custom_chat_model) for
details on a starter chat model [implementation](/docs/how_to/custom_chat_model/#implementation).

You can start from the following template or langchain-cli command:

```bash
langchain-cli integration new \
    --name parrot-link \
    --name-class ParrotLink \
    --src integration_template/chat_models.py \
    --dst langchain_parrot_link/chat_models.py
```

<details>
<summary>Example chat model code</summary>

import ChatModelSource from '../../../../src/theme/integration_template/integration_template/chat_models.py';

<CodeBlock language="python" title="langchain_parrot_link/chat_models.py">
{
  ChatModelSource.replaceAll('__ModuleName__', 'ParrotLink')
    .replaceAll('__package_name__', 'langchain-parrot-link')
    .replaceAll('__MODULE_NAME__', 'PARROT_LINK')
    .replaceAll('__module_name__', 'langchain_parrot_link')
}
</CodeBlock>

</details>
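The `CodeBlock` above renders the template by chaining four `replaceAll` calls over placeholder strings. If you want to preview a rendered template outside the docs site, a minimal Python equivalent might look like the sketch below. `render_template` and `PARROT_LINK_NAMES` are hypothetical names for this sketch (not part of `langchain-cli`), and the two-line `template` string is made up for illustration.

```python
# Placeholder -> value pairs, mirroring the replaceAll chain in the MDX above.
PARROT_LINK_NAMES = {
    "__ModuleName__": "ParrotLink",
    "__package_name__": "langchain-parrot-link",
    "__MODULE_NAME__": "PARROT_LINK",
    "__module_name__": "langchain_parrot_link",
}


def render_template(source: str, names: dict[str, str] = PARROT_LINK_NAMES) -> str:
    """Replace every occurrence of each placeholder, like JavaScript's replaceAll."""
    for placeholder, value in names.items():
        # str.replace replaces all occurrences by default.
        source = source.replace(placeholder, value)
    return source


template = (
    "from __module_name__.chat_models import Chat__ModuleName__\n"
    'API_KEY_ENV = "__MODULE_NAME___API_KEY"\n'
)
print(render_template(template))
```

Placeholder matching is case-sensitive, so `__ModuleName__`, `__MODULE_NAME__`, and `__module_name__` never collide with one another.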

</TabItem>
<TabItem value="vector_stores" label="Vector stores">

Your vector store implementation will depend on your chosen database technology.
`langchain-core` includes a minimal
[in-memory vector store](https://python.langchain.com/api_reference/core/vectorstores/langchain_core.vectorstores.in_memory.InMemoryVectorStore.html)
that we can use as a guide. You can access the code [here](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/vectorstores/in_memory.py).

All vector stores must inherit from the [VectorStore](https://python.langchain.com/api_reference/core/vectorstores/langchain_core.vectorstores.base.VectorStore.html)
base class. This interface consists of methods for writing, deleting and searching
for documents in the vector store.

`VectorStore` supports a variety of synchronous and asynchronous search types (e.g.,
nearest-neighbor or maximum marginal relevance), as well as interfaces for adding
documents to the store. See the [API Reference](https://python.langchain.com/api_reference/core/vectorstores/langchain_core.vectorstores.base.VectorStore.html)
for all supported methods. The required methods are tabulated below:

| Method/Property | Description |
|---|---|
| `add_documents` | Add documents to the vector store. |
| `delete` | Delete selected documents from the vector store (by IDs). |
| `get_by_ids` | Get selected documents from the vector store (by IDs). |
| `similarity_search` | Get documents most similar to a query. |
| `embeddings` (property) | Embeddings object for the vector store. |
| `from_texts` | Instantiate a vector store by adding texts. |

Note that `InMemoryVectorStore` implements some optional search types, as well as
convenience methods for loading and dumping the object to a file, but this is not
necessary for all implementations.
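To make the required methods in the table concrete, here is a dependency-free toy store that mirrors their names and roles. It is a sketch under simplifying assumptions, not a `VectorStore` subclass: `Doc` stands in for `langchain_core.documents.Document`, the embedding is a plain callable instead of an `Embeddings` object, and `letter_counts` is a hypothetical character-count embedding used only so the example runs without a model. `ToyVectorStore` is likewise a hypothetical name.

```python
import math
import uuid
from dataclasses import dataclass, field
from typing import Callable, Optional

Embed = Callable[[str], list[float]]


@dataclass
class Doc:
    """Stand-in for langchain_core.documents.Document (assumption for this sketch)."""

    page_content: str
    metadata: dict = field(default_factory=dict)
    id: Optional[str] = None


def letter_counts(text: str) -> list[float]:
    """Hypothetical toy embedding: counts of each letter a-z."""
    text = text.lower()
    return [float(text.count(c)) for c in "abcdefghijklmnopqrstuvwxyz"]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self, embedding: Embed) -> None:
        self._embedding = embedding
        # id -> (stored document, its vector)
        self._rows: dict[str, tuple[Doc, list[float]]] = {}

    @property
    def embeddings(self) -> Embed:
        return self._embedding

    def add_documents(self, documents: list[Doc]) -> list[str]:
        ids = []
        for doc in documents:
            doc_id = doc.id or str(uuid.uuid4())
            stored = Doc(doc.page_content, dict(doc.metadata), doc_id)
            self._rows[doc_id] = (stored, self._embedding(doc.page_content))
            ids.append(doc_id)
        return ids

    def delete(self, ids: list[str]) -> None:
        for doc_id in ids:
            self._rows.pop(doc_id, None)

    def get_by_ids(self, ids: list[str]) -> list[Doc]:
        return [self._rows[i][0] for i in ids if i in self._rows]

    def similarity_search(self, query: str, k: int = 4) -> list[Doc]:
        query_vector = self._embedding(query)
        # Brute force: score every stored vector, highest cosine similarity first.
        ranked = sorted(
            self._rows.values(),
            key=lambda row: cosine(query_vector, row[1]),
            reverse=True,
        )
        return [doc for doc, _ in ranked[:k]]

    @classmethod
    def from_texts(
        cls, texts: list[str], embedding: Embed, metadatas: Optional[list[dict]] = None
    ) -> "ToyVectorStore":
        store = cls(embedding)
        metadatas = metadatas or [{} for _ in texts]
        store.add_documents([Doc(t, m) for t, m in zip(texts, metadatas)])
        return store


store = ToyVectorStore.from_texts(["parrot", "link", "cat"], letter_counts)
print(store.similarity_search("parrots", k=1)[0].page_content)  # parrot
```

The brute-force scan is fine for a toy; a real integration would delegate search to the database's own index.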

:::tip

The [in-memory vector store](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/vectorstores/in_memory.py)
is tested against the standard tests in the LangChain GitHub repository.

:::

<details>
<summary>Example vector store code</summary>

import VectorstoreSource from '../../../../src/theme/integration_template/integration_template/vectorstores.py';

<CodeBlock language="python" title="langchain_parrot_link/vectorstores.py">
{
  VectorstoreSource.replaceAll('__ModuleName__', 'ParrotLink')
    .replaceAll('__package_name__', 'langchain-parrot-link')
    .replaceAll('__MODULE_NAME__', 'PARROT_LINK')
    .replaceAll('__module_name__', 'langchain_parrot_link')
}
</CodeBlock>

</details>

</TabItem>
<TabItem value="embeddings" label="Embeddings">

Embeddings are used to convert `str` objects from `Document.page_content` fields
into a vector representation (represented as a list of floats).

Your embeddings class must inherit from the [Embeddings](https://python.langchain.com/api_reference/core/embeddings/langchain_core.embeddings.embeddings.Embeddings.html#langchain_core.embeddings.embeddings.Embeddings)
base class. This interface has 5 methods that can be implemented.

| Method/Property | Description |
|---|---|
| `__init__` | Initialize the embeddings object. (optional) |
| `embed_query` | Embed a single query string. (required) |
| `embed_documents` | Embed a list of document strings. (required) |
| `aembed_query` | Asynchronously embed a single query string. (optional) |
| `aembed_documents` | Asynchronously embed a list of document strings. (optional) |

### Constructor

The `__init__` constructor is optional but common. It can be used to set up any necessary attributes
that a user can pass in when initializing the embeddings object. Common attributes include:

- `model`: the ID of the model to use for embeddings

### Embedding queries vs documents

The `embed_query` and `embed_documents` methods are required. Both methods operate
on string inputs; for legacy reasons, accessing the `Document.page_content` attribute
is handled by the vector store that uses the embedding model.

`embed_query` takes in a single string and returns a single embedding as a list of floats.
If your model has different modes for embedding queries vs the underlying documents, you can
implement this method to handle that.

`embed_documents` takes in a list of strings and returns a list of embeddings as a list of lists of floats.
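A minimal sketch of the two required methods, assuming `langchain-core` is installed (a small abstract stand-in is defined otherwise, so the snippet still runs). `ParrotLinkEmbeddings`, its `model` default, and the hash-based vectors are hypothetical placeholders for a real model call; the `"query: "` prefix in `embed_query` illustrates a model with separate query and document modes.

```python
import hashlib
from abc import ABC, abstractmethod

try:
    from langchain_core.embeddings import Embeddings
except ImportError:  # stand-in so the sketch runs without langchain-core

    class Embeddings(ABC):
        @abstractmethod
        def embed_documents(self, texts: list[str]) -> list[list[float]]: ...

        @abstractmethod
        def embed_query(self, text: str) -> list[float]: ...


class ParrotLinkEmbeddings(Embeddings):
    """Hypothetical embeddings integration; hashes text instead of calling a model."""

    def __init__(self, model: str = "parrot-embed-v1", dimensions: int = 8) -> None:
        self.model = model
        self.dimensions = dimensions  # at most 32, the size of a SHA-256 digest

    def _embed(self, text: str) -> list[float]:
        digest = hashlib.sha256(f"{self.model}:{text}".encode()).digest()
        # Scale each byte into [0, 1] to get a deterministic float vector.
        return [byte / 255 for byte in digest[: self.dimensions]]

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # One vector per input string, in the same order.
        return [self._embed(text) for text in texts]

    def embed_query(self, text: str) -> list[float]:
        # Queries get their own "mode" via a prefix, so they differ from documents.
        return self._embed(f"query: {text}")


embeddings = ParrotLinkEmbeddings()
print(len(embeddings.embed_query("What is a parrot?")))  # 8
print(len(embeddings.embed_documents(["Parrots talk.", "Links connect."])))  # 2
```

The async methods are optional, so this sketch leaves them to the base class; in `langchain-core` the defaults run the synchronous methods in an executor.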
|
||||
|
||||
### Implementation
|
||||
|
||||
You can start from the following template or langchain-cli command:
|
||||
|
||||
```bash
|
||||
langchain-cli integration new \
|
||||
--name parrot-link \
|
||||
--name-class ParrotLink \
|
||||
--src integration_template/embeddings.py \
|
||||
--dst langchain_parrot_link/embeddings.py
|
||||
```
|
||||
|
||||
<details>
|
||||
<summary>Example embeddings code</summary>
|
||||
|
||||
import EmbeddingsSource from '/src/theme/integration_template/integration_template/embeddings.py';
|
||||
|
||||
<CodeBlock language="python" title="langchain_parrot_link/embeddings.py">
|
||||
{
|
||||
EmbeddingsSource.replaceAll('__ModuleName__', 'ParrotLink')
|
||||
.replaceAll('__package_name__', 'langchain-parrot-link')
|
||||
.replaceAll('__MODULE_NAME__', 'PARROT_LINK')
|
||||
.replaceAll('__module_name__', 'langchain_parrot_link')
|
||||
}
|
||||
</CodeBlock>
|
||||
|
||||
</details>
|
||||
|
||||
</TabItem>
|
||||
<TabItem value="tools" label="Tools">
|
||||
|
||||
Tools are used in 2 main ways:
|
||||
|
||||
1. To define an "input schema" or "args schema" to pass to a chat model's tool calling
|
||||
feature along with a text request, such that the chat model can generate a "tool call",
|
||||
or parameters to call the tool with.
|
||||
2. To take a "tool call" as generated above, and take some action and return a response
|
||||
that can be passed back to the chat model as a ToolMessage.

Your tool class must inherit from the [BaseTool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.base.BaseTool.html#langchain_core.tools.base.BaseTool) base class. This interface has 3 properties and 2 methods that should be implemented in a
subclass.

| Method/Property | Description |
|------------------------ |------------------------------------------------------|
| `name` | Name of the tool (passed to the LLM too). |
| `description` | Description of the tool (passed to the LLM too). |
| `args_schema` | Schema for the tool's input arguments. |
| `_run` | Run the tool with the given arguments. |
| `_arun` | Asynchronously run the tool with the given arguments. |

### Properties

`name`, `description`, and `args_schema` are all properties that should be implemented
in the subclass. `name` and `description` are strings that identify the tool
and describe what it does. Both are passed to the LLM,
and users may override these values depending on the LLM they are using, as a form of
"prompt engineering." Giving the tool a concise, LLM-usable name and description is
important for its initial user experience.

`args_schema` is a Pydantic `BaseModel` that defines the schema for the tool's input
arguments. It is used to validate the input arguments to the tool and to provide
a schema for the LLM to fill out when calling the tool. As with the `name` and
`description` of the tool class itself, each field's name (the variable name) and
description (set via `Field(..., description="description")`) are passed to the LLM,
so these values should also be concise and LLM-usable.

### Run Methods

`_run` is the main method that should be implemented in the subclass. This method
takes in the arguments from `args_schema`, runs the tool, and returns a string
response. It is usually called from a LangGraph [`ToolNode`](https://langchain-ai.github.io/langgraph/how-tos/tool-calling/), and can also be called from a legacy
`langchain.agents.AgentExecutor`.

`_arun` is optional because, by default, asynchronous calls run `_run` in an executor (a separate thread).
However, if your tool calls any APIs or does other I/O-bound or async work, you should implement
this method so that the tool runs natively asynchronously, in addition to implementing `_run`.

### Implementation

You can start from the following template or langchain-cli command:

```bash
langchain-cli integration new \
    --name parrot-link \
    --name-class ParrotLink \
    --src integration_template/tools.py \
    --dst langchain_parrot_link/tools.py
```

<details>
<summary>Example tool code</summary>

import ToolSource from '/src/theme/integration_template/integration_template/tools.py';

<CodeBlock language="python" title="langchain_parrot_link/tools.py">
{
ToolSource.replaceAll('__ModuleName__', 'ParrotLink')
.replaceAll('__package_name__', 'langchain-parrot-link')
.replaceAll('__MODULE_NAME__', 'PARROT_LINK')
.replaceAll('__module_name__', 'langchain_parrot_link')
}
</CodeBlock>

</details>

</TabItem>
<TabItem value="retrievers" label="Retrievers">

Retrievers are used to retrieve documents from APIs, databases, or other sources
based on a query. Your retriever class must inherit from the [BaseRetriever](https://python.langchain.com/api_reference/core/retrievers/langchain_core.retrievers.BaseRetriever.html) base class. This interface has 1 attribute and 2 methods that should be implemented in a subclass.

| Method/Property | Description |
|------------------------ |------------------------------------------------------|
| `k` | Default number of documents to retrieve (configurable). |
| `_get_relevant_documents` | Retrieve documents based on a query. |
| `_aget_relevant_documents` | Asynchronously retrieve documents based on a query. |

### Attributes

`k` is an attribute that should be implemented in the subclass. It
can simply be defined at the top of the class with a default value, such as
`k: int = 5`. This attribute is the default number of documents to retrieve
from the retriever, and the user can override it when constructing or calling
the retriever.

### Methods

`_get_relevant_documents` is the main method that should be implemented in the subclass.
It takes in a query and returns a list of `Document` objects, which have 2
main properties:

- `page_content` - the text content of the document
- `metadata` - a dictionary of metadata about the document

Retrievers are typically invoked directly by a user, e.g. as
`MyRetriever(k=4).invoke("query")`, which automatically calls `_get_relevant_documents`
under the hood.

`_aget_relevant_documents` is optional because, by default, asynchronous calls run `_get_relevant_documents`
in an executor (a separate thread). However, if your retriever calls any APIs or does
other I/O-bound or async work, you should implement this method so that the retriever runs natively asynchronously,
in addition to implementing `_get_relevant_documents`, for performance reasons.

### Implementation

You can start from the following template or langchain-cli command:

```bash
langchain-cli integration new \
    --name parrot-link \
    --name-class ParrotLink \
    --src integration_template/retrievers.py \
    --dst langchain_parrot_link/retrievers.py
```

<details>
<summary>Example retriever code</summary>

import RetrieverSource from '/src/theme/integration_template/integration_template/retrievers.py';

<CodeBlock language="python" title="langchain_parrot_link/retrievers.py">
{
RetrieverSource.replaceAll('__ModuleName__', 'ParrotLink')
.replaceAll('__package_name__', 'langchain-parrot-link')
.replaceAll('__MODULE_NAME__', 'PARROT_LINK')
.replaceAll('__module_name__', 'langchain_parrot_link')
}
</CodeBlock>

</details>

</TabItem>
</Tabs>

---

## Next Steps

Now that you've implemented your package, you can move on to [testing your integration](../standard_tests): adding the standard tests and running them successfully.
@@ -1,600 +0,0 @@
{
"cells": [
{
"cell_type": "raw",
"metadata": {
"vscode": {
"languageId": "raw"
}
},
"source": [
"---\n",
"pagination_next: contributing/how_to/integrations/publish\n",
"pagination_prev: contributing/how_to/integrations/package\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# How to add standard tests to an integration\n",
"\n",
"When creating either a custom class for yourself or to publish in a LangChain integration, it is important to add standard tests to ensure it works as expected. This guide will show you how to add standard tests to a custom chat model, and you can **[Skip to the test templates](#standard-test-templates-per-component)** for implementing tests for each integration type.\n",
"\n",
"## Setup\n",
"\n",
"If you're coming from the [previous guide](../package), you have already installed these dependencies, and you can skip this section.\n",
"\n",
"First, let's install 2 dependencies:\n",
"\n",
"- `langchain-core` will define the interfaces we want to import to define our custom tool.\n",
"- `langchain-tests` will provide the standard tests we want to use. Recommended to pin to the latest version: <img src=\"https://img.shields.io/pypi/v/langchain-tests\" style={{position:\"relative\",top:4,left:3}} />\n",
"\n",
":::note\n",
"\n",
"Because added tests in new versions of `langchain-tests` can break your CI/CD pipelines, we recommend pinning the \n",
"version of `langchain-tests` to avoid unexpected changes.\n",
"\n",
":::\n",
"\n",
"import Tabs from '@theme/Tabs';\n",
"import TabItem from '@theme/TabItem';\n",
"\n",
"<Tabs>\n",
" <TabItem value=\"poetry\" label=\"Poetry\" default>\n",
"If you followed the [previous guide](../package), you should already have these dependencies installed!\n",
"\n",
"```bash\n",
"poetry add langchain-core\n",
"poetry add --group test pytest pytest-socket pytest-asyncio langchain-tests==<latest_version>\n",
"poetry install --with test\n",
"```\n",
" </TabItem>\n",
" <TabItem value=\"pip\" label=\"Pip\">\n",
"```bash\n",
"pip install -U langchain-core pytest pytest-socket pytest-asyncio langchain-tests\n",
"\n",
"# install current package in editable mode\n",
"pip install --editable .\n",
"```\n",
" </TabItem>\n",
"</Tabs>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's say we're publishing a package, `langchain_parrot_link`, that exposes the chat model from the [guide on implementing the package](../package). We can add the standard tests to the package by following the steps below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And we'll assume you've structured your package the same way as the main LangChain\n",
"packages:\n",
"\n",
"```plaintext\n",
"langchain-parrot-link/\n",
"├── langchain_parrot_link/\n",
"│ ├── __init__.py\n",
"│ └── chat_models.py\n",
"├── tests/\n",
"│ ├── __init__.py\n",
"│ └── test_chat_models.py\n",
"├── pyproject.toml\n",
"└── README.md\n",
"```\n",
"\n",
"## Add and configure standard tests\n",
"\n",
"There are 2 namespaces in the `langchain-tests` package: \n",
"\n",
"- [unit tests](../../../concepts/testing.mdx#unit-tests) (`langchain_tests.unit_tests`): designed to be used to test the component in isolation and without access to external services\n",
"- [integration tests](../../../concepts/testing.mdx#unit-tests) (`langchain_tests.integration_tests`): designed to be used to test the component with access to external services (in particular, the external service that the component is designed to interact with).\n",
"\n",
"Both types of tests are implemented as [`pytest` class-based test suites](https://docs.pytest.org/en/7.1.x/getting-started.html#group-multiple-tests-in-a-class).\n",
"\n",
"By subclassing the base classes for each type of standard test (see below), you get all of the standard tests for that type, and you\n",
"can override the properties that the test suite uses to configure the tests.\n",
"\n",
"### Standard chat model tests\n",
"\n",
"Here's how you would configure the standard unit tests for the custom chat model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# title=\"tests/unit_tests/test_chat_models.py\"\n",
"from typing import Tuple, Type\n",
"\n",
"from langchain_parrot_link.chat_models import ChatParrotLink\n",
"from langchain_tests.unit_tests import ChatModelUnitTests\n",
"\n",
"\n",
"class TestChatParrotLinkUnit(ChatModelUnitTests):\n",
" @property\n",
" def chat_model_class(self) -> Type[ChatParrotLink]:\n",
" return ChatParrotLink\n",
"\n",
" @property\n",
" def chat_model_params(self) -> dict:\n",
" return {\n",
" \"model\": \"bird-brain-001\",\n",
" \"temperature\": 0,\n",
" \"parrot_buffer_length\": 50,\n",
" }"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# title=\"tests/integration_tests/test_chat_models.py\"\n",
"from typing import Type\n",
"\n",
"from langchain_parrot_link.chat_models import ChatParrotLink\n",
"from langchain_tests.integration_tests import ChatModelIntegrationTests\n",
"\n",
"\n",
"class TestChatParrotLinkIntegration(ChatModelIntegrationTests):\n",
" @property\n",
" def chat_model_class(self) -> Type[ChatParrotLink]:\n",
" return ChatParrotLink\n",
"\n",
" @property\n",
" def chat_model_params(self) -> dict:\n",
" return {\n",
" \"model\": \"bird-brain-001\",\n",
" \"temperature\": 0,\n",
" \"parrot_buffer_length\": 50,\n",
" }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"and you would run these with the following commands from your project root\n",
"\n",
"<Tabs>\n",
" <TabItem value=\"poetry\" label=\"Poetry\" default>\n",
"\n",
"```bash\n",
"# run unit tests without network access\n",
"poetry run pytest --disable-socket --allow-unix-socket --asyncio-mode=auto tests/unit_tests\n",
"\n",
"# run integration tests\n",
"poetry run pytest --asyncio-mode=auto tests/integration_tests\n",
"```\n",
"\n",
" </TabItem>\n",
" <TabItem value=\"pip\" label=\"Pip\">\n",
"\n",
"```bash\n",
"# run unit tests without network access\n",
"pytest --disable-socket --allow-unix-socket --asyncio-mode=auto tests/unit_tests\n",
"\n",
"# run integration tests\n",
"pytest --asyncio-mode=auto tests/integration_tests\n",
"```\n",
"\n",
" </TabItem>\n",
"</Tabs>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test suite information and troubleshooting\n",
"\n",
"For a full list of the standard test suites that are available, as well as\n",
"information on which tests are included and how to troubleshoot common issues,\n",
"see the [Standard Tests API Reference](https://python.langchain.com/api_reference/standard_tests/index.html).\n",
"\n",
"An increasing number of troubleshooting guides are being added to this documentation,\n",
"and if you're interested in contributing, feel free to add docstrings to tests in \n",
"[Github](https://github.com/langchain-ai/langchain/tree/master/libs/standard-tests/langchain_tests)!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Standard test templates per component:\n",
"\n",
"Above, we implement the **unit** and **integration** standard tests for a tool. Below are the templates for implementing the standard tests for each component:\n",
"\n",
"<details>\n",
" <summary>Chat Models</summary>\n",
" <p>Note: The standard tests for chat models are implemented in the example in the main body of this guide too.</p>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Chat model standard tests test a range of behaviors, from the most basic requirements (generating a response to a query) to optional capabilities like multi-modal support and tool-calling. For a test run to be successful:\n",
"\n",
"1. If a feature is intended to be supported by the model, it should pass;\n",
"2. If a feature is not intended to be supported by the model, it should be skipped.\n",
"\n",
"Tests for \"optional\" capabilities are controlled via a set of properties that can be overridden on the test model subclass.\n",
"\n",
"You can see the entire list of properties in the API reference [here](https://python.langchain.com/api_reference/standard_tests/unit_tests/langchain_tests.unit_tests.chat_models.ChatModelTests.html). These properties are shared by both unit and integration tests.\n",
"\n",
"For example, to enable integration tests for image inputs, we can implement\n",
"\n",
"```python\n",
"@property\n",
"def supports_image_inputs(self) -> bool:\n",
" return True\n",
"```\n",
"\n",
"on the integration test class.\n",
"\n",
":::note\n",
"\n",
"Details on what tests are run, how each test can be skipped, and troubleshooting tips for each test can be found in the API references. See details:\n",
"\n",
"- [Unit tests API reference](https://python.langchain.com/api_reference/standard_tests/unit_tests/langchain_tests.unit_tests.chat_models.ChatModelUnitTests.html)\n",
"- [Integration tests API reference](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.html)\n",
"\n",
":::\n",
"\n",
"Unit test example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# title=\"tests/unit_tests/test_chat_models.py\"\n",
"from typing import Type\n",
"\n",
"from langchain_parrot_link.chat_models import ChatParrotLink\n",
"from langchain_tests.unit_tests import ChatModelUnitTests\n",
"\n",
"\n",
"class TestChatParrotLinkUnit(ChatModelUnitTests):\n",
" @property\n",
" def chat_model_class(self) -> Type[ChatParrotLink]:\n",
" return ChatParrotLink\n",
"\n",
" @property\n",
" def chat_model_params(self) -> dict:\n",
" return {\n",
" \"model\": \"bird-brain-001\",\n",
" \"temperature\": 0,\n",
" \"parrot_buffer_length\": 50,\n",
" }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Integration test example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# title=\"tests/integration_tests/test_chat_models.py\"\n",
"from typing import Type\n",
"\n",
"from langchain_parrot_link.chat_models import ChatParrotLink\n",
"from langchain_tests.integration_tests import ChatModelIntegrationTests\n",
"\n",
"\n",
"class TestChatParrotLinkIntegration(ChatModelIntegrationTests):\n",
" @property\n",
" def chat_model_class(self) -> Type[ChatParrotLink]:\n",
" return ChatParrotLink\n",
"\n",
" @property\n",
" def chat_model_params(self) -> dict:\n",
" return {\n",
" \"model\": \"bird-brain-001\",\n",
" \"temperature\": 0,\n",
" \"parrot_buffer_length\": 50,\n",
" }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"</details>\n",
"<details>\n",
" <summary>Embedding Models</summary>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# title=\"tests/unit_tests/test_embeddings.py\"\n",
"from typing import Tuple, Type\n",
"\n",
"from langchain_parrot_link.embeddings import ParrotLinkEmbeddings\n",
"from langchain_tests.unit_tests import EmbeddingsUnitTests\n",
"\n",
"\n",
"class TestParrotLinkEmbeddingsUnit(EmbeddingsUnitTests):\n",
" @property\n",
" def embeddings_class(self) -> Type[ParrotLinkEmbeddings]:\n",
" return ParrotLinkEmbeddings\n",
"\n",
" @property\n",
" def embedding_model_params(self) -> dict:\n",
" return {\"model\": \"nest-embed-001\", \"temperature\": 0}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# title=\"tests/integration_tests/test_embeddings.py\"\n",
"from typing import Type\n",
"\n",
"from langchain_parrot_link.embeddings import ParrotLinkEmbeddings\n",
"from langchain_tests.integration_tests import EmbeddingsIntegrationTests\n",
"\n",
"\n",
"class TestParrotLinkEmbeddingsIntegration(EmbeddingsIntegrationTests):\n",
" @property\n",
" def embeddings_class(self) -> Type[ParrotLinkEmbeddings]:\n",
" return ParrotLinkEmbeddings\n",
"\n",
" @property\n",
" def embedding_model_params(self) -> dict:\n",
" return {\"model\": \"nest-embed-001\"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"</details>\n",
"<details>\n",
" <summary>Tools/Toolkits</summary>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# title=\"tests/unit_tests/test_tools.py\"\n",
"from typing import Type\n",
"\n",
"from langchain_parrot_link.tools import ParrotMultiplyTool\n",
"from langchain_tests.unit_tests import ToolsUnitTests\n",
"\n",
"\n",
"class TestParrotMultiplyToolUnit(ToolsUnitTests):\n",
" @property\n",
" def tool_constructor(self) -> Type[ParrotMultiplyTool]:\n",
" return ParrotMultiplyTool\n",
"\n",
" @property\n",
" def tool_constructor_params(self) -> dict:\n",
" # if your tool constructor instead required initialization arguments like\n",
" # `def __init__(self, some_arg: int):`, you would return those here\n",
" # as a dictionary, e.g.: `return {'some_arg': 42}`\n",
" return {}\n",
"\n",
" @property\n",
" def tool_invoke_params_example(self) -> dict:\n",
" \"\"\"\n",
" Returns a dictionary representing the \"args\" of an example tool call.\n",
"\n",
" This should NOT be a ToolCall dict - i.e. it should not\n",
" have {\"name\", \"id\", \"args\"} keys.\n",
" \"\"\"\n",
" return {\"a\": 2, \"b\": 3}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# title=\"tests/integration_tests/test_tools.py\"\n",
"from typing import Type\n",
"\n",
"from langchain_parrot_link.tools import ParrotMultiplyTool\n",
"from langchain_tests.integration_tests import ToolsIntegrationTests\n",
"\n",
"\n",
"class TestParrotMultiplyToolIntegration(ToolsIntegrationTests):\n",
" @property\n",
" def tool_constructor(self) -> Type[ParrotMultiplyTool]:\n",
" return ParrotMultiplyTool\n",
"\n",
" @property\n",
" def tool_constructor_params(self) -> dict:\n",
" # if your tool constructor instead required initialization arguments like\n",
" # `def __init__(self, some_arg: int):`, you would return those here\n",
" # as a dictionary, e.g.: `return {'some_arg': 42}`\n",
" return {}\n",
"\n",
" @property\n",
" def tool_invoke_params_example(self) -> dict:\n",
" \"\"\"\n",
" Returns a dictionary representing the \"args\" of an example tool call.\n",
"\n",
" This should NOT be a ToolCall dict - i.e. it should not\n",
" have {\"name\", \"id\", \"args\"} keys.\n",
" \"\"\"\n",
" return {\"a\": 2, \"b\": 3}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"</details>\n",
"<details>\n",
" <summary>Vector Stores</summary>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's how you would configure the standard tests for a typical vector store (using\n",
"`ParrotVectorStore` as a placeholder):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# title=\"tests/integration_tests/test_vectorstores_sync.py\"\n",
"\n",
"from typing import AsyncGenerator, Generator\n",
"\n",
"import pytest\n",
"from langchain_core.vectorstores import VectorStore\n",
"from langchain_parrot_link.vectorstores import ParrotVectorStore\n",
"from langchain_standard_tests.integration_tests.vectorstores import (\n",
" AsyncReadWriteTestSuite,\n",
" ReadWriteTestSuite,\n",
")\n",
"\n",
"\n",
"class TestSync(ReadWriteTestSuite):\n",
" @pytest.fixture()\n",
" def vectorstore(self) -> Generator[VectorStore, None, None]: # type: ignore\n",
" \"\"\"Get an empty vectorstore for unit tests.\"\"\"\n",
" store = ParrotVectorStore()\n",
" # note: store should be EMPTY at this point\n",
" # if you need to delete data, you may do so here\n",
" try:\n",
" yield store\n",
" finally:\n",
" # cleanup operations, or deleting data\n",
" pass\n",
"\n",
"\n",
"class TestAsync(AsyncReadWriteTestSuite):\n",
" @pytest.fixture()\n",
" async def vectorstore(self) -> AsyncGenerator[VectorStore, None]: # type: ignore\n",
" \"\"\"Get an empty vectorstore for unit tests.\"\"\"\n",
" store = ParrotVectorStore()\n",
" # note: store should be EMPTY at this point\n",
" # if you need to delete data, you may do so here\n",
" try:\n",
" yield store\n",
" finally:\n",
" # cleanup operations, or deleting data\n",
" pass"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are separate suites for testing synchronous and asynchronous methods.\n",
"Configuring the tests consists of implementing pytest fixtures for setting up an\n",
"empty vector store and tearing down the vector store after the test run ends.\n",
"\n",
"For example, below is the `ReadWriteTestSuite` for the [Chroma](https://python.langchain.com/docs/integrations/vectorstores/chroma/)\n",
"integration:\n",
"\n",
"```python\n",
"from typing import Generator\n",
"\n",
"import pytest\n",
"from langchain_core.vectorstores import VectorStore\n",
"from langchain_tests.integration_tests.vectorstores import ReadWriteTestSuite\n",
"\n",
"from langchain_chroma import Chroma\n",
"\n",
"\n",
"class TestSync(ReadWriteTestSuite):\n",
" @pytest.fixture()\n",
" def vectorstore(self) -> Generator[VectorStore, None, None]: # type: ignore\n",
" \"\"\"Get an empty vectorstore.\"\"\"\n",
" store = Chroma(embedding_function=self.get_embeddings())\n",
" try:\n",
" yield store\n",
" finally:\n",
" store.delete_collection()\n",
" pass\n",
"```\n",
"\n",
"Note that before the initial `yield`, we instantiate the vector store with an\n",
"[embeddings](/docs/concepts/embedding_models/) object. This is a pre-defined\n",
"[\"fake\" embeddings model](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.vectorstores.ReadWriteTestSuite.html#langchain_tests.integration_tests.vectorstores.ReadWriteTestSuite.get_embeddings)\n",
"that will generate short, arbitrary vectors for documents. You can use a different\n",
"embeddings object if desired.\n",
"\n",
"In the `finally` block, we call whatever integration-specific logic is needed to\n",
"bring the vector store to a clean state. This logic is executed in between each test\n",
"(e.g., even if tests fail).\n",
"\n",
":::note\n",
"\n",
"Details on what tests are run, how each test can be skipped, and troubleshooting tips for each test can be found in the API references. See details:\n",
"\n",
"- [Sync tests API reference](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.vectorstores.ReadWriteTestSuite.html)\n",
"- [Async tests API reference](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.vectorstores.AsyncReadWriteTestSuite.html)\n",
"\n",
":::"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"</details>"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
393
docs/docs/contributing/how_to/integrations/standard_tests.mdx
Normal file
@@ -0,0 +1,393 @@
|
||||
---
|
||||
pagination_next: contributing/how_to/integrations/publish
|
||||
pagination_prev: contributing/how_to/integrations/package
|
||||
---
|
||||
# How to add standard tests to an integration
|
||||
|
||||
When creating either a custom class for yourself or to publish in a LangChain integration, it is important to add standard tests to ensure it works as expected. This guide will show you how to add standard tests to each integration type.
|
||||
|
||||
## Setup
|
||||
|
||||
First, let's install 2 dependencies:
|
||||
|
||||
- `langchain-core` will define the interfaces we want to import to define our custom tool.
|
||||
- `langchain-tests` will provide the standard tests we want to use, as well as pytest plugins necessary to run them. Recommended to pin to the latest version: <img src="https://img.shields.io/pypi/v/langchain-tests" style={{position:"relative",top:4,left:3}} />
|
||||
|
||||
:::note
|
||||
|
||||
Because added tests in new versions of `langchain-tests` can break your CI/CD pipelines, we recommend pinning the
|
||||
version of `langchain-tests` to avoid unexpected changes.
|
||||
|
||||
:::
|
||||
|
||||
import Tabs from '@theme/Tabs';
|
||||
import TabItem from '@theme/TabItem';
|
||||
|
||||
<Tabs>
|
||||
<TabItem value="poetry" label="Poetry" default>
|
||||
If you followed the [previous guide](../package), you should already have these dependencies installed!
|
||||
|
||||
```bash
|
||||
poetry add langchain-core
|
||||
poetry add --group test langchain-tests==<latest_version>
|
||||
poetry install --with test
|
||||
```
|
||||
</TabItem>
|
||||
<TabItem value="pip" label="Pip">
|
||||
```bash
|
||||
pip install -U langchain-core langchain-tests
|
||||
|
||||
# install current package in editable mode
|
||||
pip install --editable .
|
||||
```
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
|
||||
## Add and configure standard tests
|
||||
|
||||
There are 2 namespaces in the `langchain-tests` package:
|
||||
|
||||
- [unit tests](../../../concepts/testing.mdx#unit-tests) (`langchain_tests.unit_tests`): designed to be used to test the component in isolation and without access to external services
|
||||
- [integration tests](../../../concepts/testing.mdx#integration-tests) (`langchain_tests.integration_tests`): designed to be used to test the component with access to external services (in particular, the external service that the component is designed to interact with).
|
||||
|
||||
Both types of tests are implemented as [`pytest` class-based test suites](https://docs.pytest.org/en/7.1.x/getting-started.html#group-multiple-tests-in-a-class).
|
||||
|
||||
By subclassing the base classes for each type of standard test (see below), you get all of the standard tests for that type, and you
|
||||
can override the properties that the test suite uses to configure the tests.
|
||||
|
||||
In order to run the tests in the same way as this guide, we recommend subclassing these
|
||||
classes in test files under two test subdirectories:
|
||||
|
||||
- `tests/unit_tests` for unit tests
|
||||
- `tests/integration_tests` for integration tests
|
||||
|
||||
### Implementing standard tests
|
||||
|
||||
import CodeBlock from '@theme/CodeBlock';
|
||||
|
||||
In the following tabs, we show how to implement the standard tests for
|
||||
each component type:
|
||||
|
||||
<Tabs>
|
||||
|
||||
<TabItem value="chat_models" label="Chat models">
|
||||
|
||||
To configure standard tests for a chat model, we subclass `ChatModelUnitTests` and `ChatModelIntegrationTests`. On each subclass, we override the following `@property` methods to specify the chat model to be tested and the chat model's configuration:
|
||||
|
||||
| Property | Description |
|
||||
| --- | --- |
|
||||
| `chat_model_class` | The class for the chat model to be tested |
|
||||
| `chat_model_params` | The parameters to pass to the chat model's constructor |
|
||||
|
||||
Additionally, chat model standard tests test a range of behaviors, from the most basic requirements (generating a response to a query) to optional capabilities like multi-modal support and tool-calling. For a test run to be successful:
|
||||
|
||||
1. If a feature is intended to be supported by the model, it should pass;
|
||||
2. If a feature is not intended to be supported by the model, it should be skipped.
|
||||
|
||||
Tests for "optional" capabilities are controlled via a set of properties that can be overridden on your test subclass.
|
||||
|
||||
You can see the **entire list of configurable capabilities** in the API references for
|
||||
[unit tests](https://python.langchain.com/api_reference/standard_tests/unit_tests/langchain_tests.unit_tests.chat_models.ChatModelUnitTests.html)
|
||||
and [integration tests](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.html).
|
||||
|
||||
For example, to enable integration tests for image inputs, we can implement
|
||||
|
||||
```python
@property
def supports_image_inputs(self) -> bool:
    return True
```
|
||||
|
||||
on the integration test class.
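The pass-or-skip contract above can be illustrated with a small stdlib `unittest` sketch. It is an approximation, not the `langchain-tests` implementation: the real suites are pytest classes, and every name below (`ChatModelSuiteSketch`, `describe_image`, `run_suite`, and the two model test classes) is hypothetical. The base class leaves the optional capability off and skips the test; a subclass that overrides the property opts in, and the same test must then pass.

```python
import unittest


class ChatModelSuiteSketch(unittest.TestCase):
    """Hypothetical base suite standing in for a standard-test class."""

    @property
    def supports_image_inputs(self) -> bool:
        # Optional capability: disabled unless a subclass opts in.
        return False

    def describe_image(self) -> str:
        # Placeholder for a real chat-model call that sends an image.
        return "a parrot on a branch"

    def test_image_inputs(self) -> None:
        # Unsupported feature: skip instead of failing.
        if not self.supports_image_inputs:
            self.skipTest("Model does not support image inputs.")
        self.assertTrue(self.describe_image())


class TextOnlyModelTests(ChatModelSuiteSketch):
    """Keeps the default, so the image test is skipped."""


class MultimodalModelTests(ChatModelSuiteSketch):
    @property
    def supports_image_inputs(self) -> bool:
        return True  # opted in, so the image test must now pass


def run_suite(case: type) -> unittest.TestResult:
    # Load every `test_*` method of one class and collect the results.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Running `run_suite(TextOnlyModelTests)` records one skipped test, while `run_suite(MultimodalModelTests)` records one passing test, which mirrors the two rules listed earlier.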
|
||||
|
||||
:::note
|
||||
|
||||
Details on what tests are run, how each test can be skipped, and troubleshooting tips for each test can be found in the API references. See details:
|
||||
|
||||
- [Unit tests API reference](https://python.langchain.com/api_reference/standard_tests/unit_tests/langchain_tests.unit_tests.chat_models.ChatModelUnitTests.html)
|
||||
- [Integration tests API reference](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.html)
|
||||
|
||||
:::
|
||||
|
||||
Unit test example:
|
||||
|
||||
import ChatUnitSource from '../../../../src/theme/integration_template/tests/unit_tests/test_chat_models.py';
|
||||
|
||||
<CodeBlock language="python" title="tests/unit_tests/test_chat_models.py">
|
||||
{
|
||||
ChatUnitSource.replaceAll('__ModuleName__', 'ParrotLink')
|
||||
.replaceAll('__package_name__', 'langchain-parrot-link')
|
||||
.replaceAll('__MODULE_NAME__', 'PARROT_LINK')
|
||||
.replaceAll('__module_name__', 'langchain_parrot_link')
|
||||
}
|
||||
</CodeBlock>
|
||||
|
||||
Integration test example:
|
||||
|
||||
|
||||
import ChatIntegrationSource from '../../../../src/theme/integration_template/tests/integration_tests/test_chat_models.py';
|
||||
|
||||
<CodeBlock language="python" title="tests/integration_tests/test_chat_models.py">
|
||||
{
|
||||
ChatIntegrationSource.replaceAll('__ModuleName__', 'ParrotLink')
|
||||
.replaceAll('__package_name__', 'langchain-parrot-link')
|
||||
.replaceAll('__MODULE_NAME__', 'PARROT_LINK')
|
||||
.replaceAll('__module_name__', 'langchain_parrot_link')
|
||||
}
|
||||
</CodeBlock>
|
||||
|
||||
</TabItem>
|
||||
<TabItem value="vector_stores" label="Vector stores">
|
||||
|
||||
|
||||
Here's how you would configure the standard tests for a typical vector store (using
|
||||
`ParrotVectorStore` as a placeholder):
|
||||
|
||||
Vector store tests do not have optional capabilities to be configured at this time.
|
||||
|
||||
import VectorStoreIntegrationSource from '../../../../src/theme/integration_template/tests/integration_tests/test_vectorstores.py';
|
||||
|
||||
<CodeBlock language="python" title="tests/integration_tests/test_vectorstores.py">
|
||||
{
|
||||
VectorStoreIntegrationSource.replaceAll('__ModuleName__', 'Parrot')
|
||||
.replaceAll('__package_name__', 'langchain-parrot-link')
|
||||
.replaceAll('__MODULE_NAME__', 'PARROT')
|
||||
.replaceAll('__module_name__', 'langchain_parrot_link')
|
||||
}
|
||||
</CodeBlock>
|
||||
|
||||
Configuring the tests consists of implementing pytest fixtures for setting up an
|
||||
empty vector store and tearing down the vector store after the test run ends.
|
||||
|
||||
| Fixture | Description |
|
||||
| --- | --- |
|
||||
| `vectorstore` | A generator that yields an empty vector store for each test. The vector store is cleaned up after each test finishes. |
|
||||
|
||||
For example, below is the `VectorStoreIntegrationTests` class for the [Chroma](https://python.langchain.com/docs/integrations/vectorstores/chroma/)
|
||||
integration:
|
||||
|
||||
```python
from typing import Generator

import pytest
from langchain_core.vectorstores import VectorStore
from langchain_tests.integration_tests.vectorstores import VectorStoreIntegrationTests

from langchain_chroma import Chroma


class TestChromaStandard(VectorStoreIntegrationTests):
    @pytest.fixture()
    def vectorstore(self) -> Generator[VectorStore, None, None]:  # type: ignore
        """Get an empty vectorstore for unit tests."""
        store = Chroma(embedding_function=self.get_embeddings())
        try:
            yield store
        finally:
            store.delete_collection()
```
|
||||
|
||||
Note that before the initial `yield`, we instantiate the vector store with an
|
||||
[embeddings](/docs/concepts/embedding_models/) object. This is a pre-defined
|
||||
["fake" embeddings model](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.vectorstores.VectorStoreIntegrationTests.html#langchain_tests.integration_tests.vectorstores.VectorStoreIntegrationTests.get_embeddings)
|
||||
that will generate short, arbitrary vectors for documents. You can use a different
|
||||
embeddings object if desired.
|
||||
|
||||
In the `finally` block, we call whatever integration-specific logic is needed to
bring the vector store to a clean state. Because it sits in a `finally` block, this logic runs after each test,
even if the test fails.
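To make the setup/teardown order concrete, here is a stdlib-only sketch of the same yield-fixture pattern. `FakeStore`, `vectorstore`, and `run_with_fixture` are hypothetical names, and the toy runner only approximates how pytest drives a function-scoped yield fixture (advance to `yield` for setup, resume after the test for teardown); it is not pytest's actual implementation.

```python
from typing import Callable, Generator, List, Tuple


class FakeStore:
    """Hypothetical in-memory stand-in for a vector store (not a LangChain class)."""

    def __init__(self) -> None:
        self.docs: List[str] = []
        self.deleted = False

    def add_texts(self, texts: List[str]) -> None:
        self.docs.extend(texts)

    def delete_collection(self) -> None:
        self.docs.clear()
        self.deleted = True


def vectorstore() -> Generator[FakeStore, None, None]:
    # Setup: runs before the test body.
    store = FakeStore()
    try:
        yield store
    finally:
        # Teardown: runs after the test body, whether it passed or failed.
        store.delete_collection()


def run_with_fixture(test: Callable[[FakeStore], None]) -> Tuple[FakeStore, bool]:
    """Toy runner approximating how pytest drives a function-scoped yield fixture."""
    gen = vectorstore()
    store = next(gen)  # advance to `yield`: setup is done
    passed = True
    try:
        test(store)
    except AssertionError:
        passed = False
    finally:
        next(gen, None)  # resume after `yield`: the `finally` block runs
    return store, passed


def failing_test(store: FakeStore) -> None:
    store.add_texts(["hello"])
    raise AssertionError("simulated test failure")
```

Calling `run_with_fixture(failing_test)` reports the failure, yet the returned store has already been emptied by the teardown code, which is the guarantee the `finally` block provides.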
|
||||
|
||||
:::note
|
||||
|
||||
Details on what tests are run and troubleshooting tips for each test can be found in the [API reference](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.vectorstores.VectorStoreIntegrationTests.html).
|
||||
|
||||
:::
|
||||
|
||||
|
||||
</TabItem>
|
||||
<TabItem value="embeddings" label="Embeddings">
|
||||
|
||||
To configure standard tests for an embeddings model, we subclass `EmbeddingsUnitTests` and `EmbeddingsIntegrationTests`. On each subclass, we override the following `@property` methods to specify the embeddings model to be tested and the embeddings model's configuration:
|
||||
|
||||
| Property | Description |
|
||||
| --- | --- |
|
||||
| `embeddings_class` | The class for the embeddings model to be tested |
|
||||
| `embedding_model_params` | The parameters to pass to the embeddings model's constructor |
|
||||
|
||||
:::note
|
||||
|
||||
Details on what tests are run, how each test can be skipped, and troubleshooting tips for each test can be found in the API references. See details:
|
||||
|
||||
- [Unit tests API reference](https://python.langchain.com/api_reference/standard_tests/unit_tests/langchain_tests.unit_tests.embeddings.EmbeddingsUnitTests.html)
|
||||
- [Integration tests API reference](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.embeddings.EmbeddingsIntegrationTests.html)
|
||||
|
||||
:::
|
||||
|
||||
Unit test example:
|
||||
|
||||
import EmbeddingsUnitSource from '../../../../src/theme/integration_template/tests/unit_tests/test_embeddings.py';
|
||||
|
||||
<CodeBlock language="python" title="tests/unit_tests/test_embeddings.py">
|
||||
{
|
||||
EmbeddingsUnitSource.replaceAll('__ModuleName__', 'ParrotLink')
|
||||
.replaceAll('__package_name__', 'langchain-parrot-link')
|
||||
.replaceAll('__MODULE_NAME__', 'PARROT_LINK')
|
||||
.replaceAll('__module_name__', 'langchain_parrot_link')
|
||||
}
|
||||
</CodeBlock>
|
||||
|
||||
Integration test example:
|
||||
|
||||
|
||||
```python title="tests/integration_tests/test_embeddings.py"
from typing import Type

from langchain_parrot_link.embeddings import ParrotLinkEmbeddings
from langchain_tests.integration_tests import EmbeddingsIntegrationTests


class TestParrotLinkEmbeddingsIntegration(EmbeddingsIntegrationTests):
    @property
    def embeddings_class(self) -> Type[ParrotLinkEmbeddings]:
        return ParrotLinkEmbeddings

    @property
    def embedding_model_params(self) -> dict:
        return {"model": "nest-embed-001"}
```
|
||||
|
||||
import EmbeddingsIntegrationSource from '../../../../src/theme/integration_template/tests/integration_tests/test_embeddings.py';
|
||||
|
||||
<CodeBlock language="python" title="tests/integration_tests/test_embeddings.py">
|
||||
{
|
||||
EmbeddingsIntegrationSource.replaceAll('__ModuleName__', 'ParrotLink')
|
||||
.replaceAll('__package_name__', 'langchain-parrot-link')
|
||||
.replaceAll('__MODULE_NAME__', 'PARROT_LINK')
|
||||
.replaceAll('__module_name__', 'langchain_parrot_link')
|
||||
}
|
||||
</CodeBlock>
|
||||
|
||||
</TabItem>
|
||||
<TabItem value="tools" label="Tools">
|
||||
|
||||
To configure standard tests for a tool, we subclass `ToolsUnitTests` and
|
||||
`ToolsIntegrationTests`. On each subclass, we override the following `@property` methods
|
||||
to specify the tool to be tested and the tool's configuration:
|
||||
|
||||
| Property | Description |
|
||||
| --- | --- |
|
||||
| `tool_constructor` | The constructor for the tool to be tested, or an instantiated tool. |
|
||||
| `tool_constructor_params` | The parameters to pass to the tool (optional). |
|
||||
| `tool_invoke_params_example` | An example of the parameters to pass to the tool's `invoke` method. |
|
||||
|
||||
If you are testing a tool class and pass a class like `MyTool` to `tool_constructor`, you can pass the parameters to the constructor in `tool_constructor_params`.
|
||||
|
||||
If you are testing an instantiated tool, pass the instance to `tool_constructor` and do not
override `tool_constructor_params`.
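The class-versus-instance rule can be sketched as a small resolution step. This is a minimal sketch under the assumption that an instance is used as-is and a class is called with `tool_constructor_params`; `ParrotTool` and `resolve_tool` are hypothetical, and the actual `langchain-tests` fixture may differ in details such as how it reports conflicting settings.

```python
from typing import Any, Dict, Type, Union


class ParrotTool:
    """Hypothetical tool class used only for illustration."""

    def __init__(self, repeats: int = 1) -> None:
        self.repeats = repeats

    def invoke(self, params: Dict[str, Any]) -> str:
        # Echo the input text `repeats` times.
        return " ".join([params["text"]] * self.repeats)


def resolve_tool(
    constructor: Union[Type[ParrotTool], ParrotTool],
    constructor_params: Dict[str, Any],
) -> ParrotTool:
    # An already-built instance is used as-is; params would be ignored, so reject them.
    if isinstance(constructor, ParrotTool):
        if constructor_params:
            raise ValueError("Do not set constructor params for an instantiated tool.")
        return constructor
    # Otherwise treat it as a class and build it from the params.
    return constructor(**constructor_params)
```

For example, `resolve_tool(ParrotTool, {"repeats": 2}).invoke({"text": "hi"})` builds a new tool and returns `"hi hi"`, while `resolve_tool(ParrotTool(repeats=3), {})` returns the instance unchanged.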
|
||||
|
||||
:::note
|
||||
|
||||
Details on what tests are run, how each test can be skipped, and troubleshooting tips for each test can be found in the API references. See details:
|
||||
|
||||
- [Unit tests API reference](https://python.langchain.com/api_reference/standard_tests/unit_tests/langchain_tests.unit_tests.tools.ToolsUnitTests.html)
|
||||
- [Integration tests API reference](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.tools.ToolsIntegrationTests.html)
|
||||
|
||||
:::
|
||||
|
||||
import ToolsUnitSource from '../../../../src/theme/integration_template/tests/unit_tests/test_tools.py';
|
||||
|
||||
<CodeBlock language="python" title="tests/unit_tests/test_tools.py">
|
||||
{
|
||||
ToolsUnitSource.replaceAll('__ModuleName__', 'Parrot')
|
||||
.replaceAll('__package_name__', 'langchain-parrot-link')
|
||||
.replaceAll('__MODULE_NAME__', 'PARROT')
|
||||
.replaceAll('__module_name__', 'langchain_parrot_link')
|
||||
}
|
||||
</CodeBlock>
|
||||
|
||||
import ToolsIntegrationSource from '../../../../src/theme/integration_template/tests/integration_tests/test_tools.py';
|
||||
|
||||
<CodeBlock language="python" title="tests/integration_tests/test_tools.py">
|
||||
{
|
||||
ToolsIntegrationSource.replaceAll('__ModuleName__', 'Parrot')
|
||||
.replaceAll('__package_name__', 'langchain-parrot-link')
|
||||
.replaceAll('__MODULE_NAME__', 'PARROT')
|
||||
.replaceAll('__module_name__', 'langchain_parrot_link')
|
||||
}
|
||||
</CodeBlock>
|
||||
|
||||
</TabItem>
|
||||
|
||||
<TabItem value="retrievers" label="Retrievers">
|
||||
|
||||
To configure standard tests for a retriever, we subclass `RetrieversUnitTests` and
`RetrieversIntegrationTests`. On each subclass, we override the following `@property` methods to specify the retriever to be tested and its configuration:
|
||||
|
||||
| Property | Description |
|
||||
| --- | --- |
|
||||
| `retriever_constructor` | The class for the retriever to be tested |
|
||||
| `retriever_constructor_params` | The parameters to pass to the retriever's constructor |
|
||||
| `retriever_query_example` | An example of the query to pass to the retriever's `invoke` method |
|
||||
|
||||
:::note
|
||||
|
||||
Details on what tests are run and troubleshooting tips for each test can be found in the [API reference](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.retrievers.RetrieversIntegrationTests.html).
|
||||
|
||||
:::
|
||||
|
||||
import RetrieverIntegrationSource from '../../../../src/theme/integration_template/tests/integration_tests/test_retrievers.py';
|
||||
|
||||
<CodeBlock language="python" title="tests/integration_tests/test_retrievers.py">
|
||||
{
|
||||
RetrieverIntegrationSource.replaceAll('__ModuleName__', 'Parrot')
|
||||
.replaceAll('__package_name__', 'langchain-parrot-link')
|
||||
.replaceAll('__MODULE_NAME__', 'PARROT')
|
||||
.replaceAll('__module_name__', 'langchain_parrot_link')
|
||||
}
|
||||
</CodeBlock>
|
||||
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
### Running the tests
|
||||
|
||||
You can run these tests with the following commands from your project root:
|
||||
|
||||
<Tabs>
|
||||
<TabItem value="poetry" label="Poetry" default>
|
||||
|
||||
```bash
|
||||
# run unit tests without network access
|
||||
poetry run pytest --disable-socket --allow-unix-socket --asyncio-mode=auto tests/unit_tests
|
||||
|
||||
# run integration tests
|
||||
poetry run pytest --asyncio-mode=auto tests/integration_tests
|
||||
```
|
||||
|
||||
</TabItem>
|
||||
<TabItem value="pip" label="Pip">
|
||||
|
||||
```bash
|
||||
# run unit tests without network access
|
||||
pytest --disable-socket --allow-unix-socket --asyncio-mode=auto tests/unit_tests
|
||||
|
||||
# run integration tests
|
||||
pytest --asyncio-mode=auto tests/integration_tests
|
||||
```
|
||||
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
|
||||
## Test suite information and troubleshooting
|
||||
|
||||
For a full list of the standard test suites that are available, as well as
|
||||
information on which tests are included and how to troubleshoot common issues,
|
||||
see the [Standard Tests API Reference](https://python.langchain.com/api_reference/standard_tests/index.html).
|
||||
|
||||
You can see troubleshooting guides under the individual test suites listed in that API Reference. For example,
|
||||
[here is the guide for `ChatModelIntegrationTests.test_usage_metadata`](https://python.langchain.com/api_reference/standard_tests/integration_tests/langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.html#langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests.test_usage_metadata).
|
||||
@@ -802,7 +802,7 @@
|
||||
"That's a wrap! In this quick start we covered how to create a simple agent. Agents are a complex topic, and there's a lot to learn! \n",
|
||||
"\n",
|
||||
":::important\n",
|
||||
"This section covered building with LangChain Agents. LangChain Agents are fine for getting started, but past a certain point you will likely want flexibility and control that they do not offer. For working with more advanced agents, we'd reccommend checking out [LangGraph](/docs/concepts/architecture/#langgraph)\n",
|
||||
"This section covered building with LangChain Agents. They are fine for getting started, but past a certain point you will likely want flexibility and control which they do not offer. To develop more advanced agents, we recommend checking out [LangGraph](/docs/concepts/architecture/#langgraph)\n",
|
||||
":::\n",
|
||||
"\n",
|
||||
"If you want to continue using LangChain agents, some good advanced guides are:\n",
|
||||
|
||||
@@ -294,7 +294,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
":::caution\n",
|
||||
"By default, `@tool(parse_docstring=True)` will raise `ValueError` if the docstring does not parse correctly. See [API Reference](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.tool.html) for detail and examples.\n",
|
||||
"By default, `@tool(parse_docstring=True)` will raise `ValueError` if the docstring does not parse correctly. See [API Reference](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.convert.tool.html) for detail and examples.\n",
|
||||
":::"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -1,459 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "raw",
|
||||
"id": "5e61b0f2-15b9-4241-9ab5-ff0f3f732232",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"---\n",
|
||||
"sidebar_position: 1\n",
|
||||
"---"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "846ef4f4-ee38-4a42-a7d3-1a23826e4830",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# How to map values to a graph database\n",
|
||||
"\n",
|
||||
"In this guide we'll go over strategies to improve graph database query generation by mapping values from user inputs to the database.\n",
|
||||
"When using the built-in graph chains, the LLM is aware of the graph schema, but has no information about the values of properties stored in the database.\n",
|
||||
"Therefore, we can introduce a new step in the graph database QA system to accurately map values.\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"First, get required packages and set environment variables:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "18294435-182d-48da-bcab-5b8945b6d9cf",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install --upgrade --quiet langchain langchain-neo4j langchain-openai neo4j"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d86dd771-4001-4a34-8680-22e9b50e1e88",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We default to OpenAI models in this guide, but you can swap them out for the model provider of your choice."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "9346f8e9-78bf-4667-b3d3-72807a73b718",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdin",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" ········\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import getpass\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n",
|
||||
"\n",
|
||||
"# Uncomment the below to use LangSmith. Not required.\n",
|
||||
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()\n",
|
||||
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "271c8a23-e51c-4ead-a76e-cf21107db47e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Next, we need to define Neo4j credentials.\n",
|
||||
"Follow [these installation steps](https://neo4j.com/docs/operations-manual/current/installation/) to set up a Neo4j database."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "a2a3bb65-05c7-4daf-bac2-b25ae7fe2751",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"os.environ[\"NEO4J_URI\"] = \"bolt://localhost:7687\"\n",
|
||||
"os.environ[\"NEO4J_USERNAME\"] = \"neo4j\"\n",
|
||||
"os.environ[\"NEO4J_PASSWORD\"] = \"password\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "50fa4510-29b7-49b6-8496-5e86f694e81f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The below example will create a connection with a Neo4j database and will populate it with example data about movies and their actors."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "4ee9ef7a-eef9-4289-b9fd-8fbc31041688",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[]"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_neo4j import Neo4jGraph\n",
|
||||
"\n",
|
||||
"graph = Neo4jGraph()\n",
|
||||
"\n",
|
||||
"# Import movie information\n",
|
||||
"\n",
|
||||
"movies_query = \"\"\"\n",
|
||||
"LOAD CSV WITH HEADERS FROM \n",
|
||||
"'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'\n",
|
||||
"AS row\n",
|
||||
"MERGE (m:Movie {id:row.movieId})\n",
|
||||
"SET m.released = date(row.released),\n",
|
||||
" m.title = row.title,\n",
|
||||
" m.imdbRating = toFloat(row.imdbRating)\n",
|
||||
"FOREACH (director in split(row.director, '|') | \n",
|
||||
" MERGE (p:Person {name:trim(director)})\n",
|
||||
" MERGE (p)-[:DIRECTED]->(m))\n",
|
||||
"FOREACH (actor in split(row.actors, '|') | \n",
|
||||
" MERGE (p:Person {name:trim(actor)})\n",
|
||||
" MERGE (p)-[:ACTED_IN]->(m))\n",
|
||||
"FOREACH (genre in split(row.genres, '|') | \n",
|
||||
" MERGE (g:Genre {name:trim(genre)})\n",
|
||||
" MERGE (m)-[:IN_GENRE]->(g))\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"graph.query(movies_query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0cb0ea30-ca55-4f35-aad6-beb57453de66",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Detecting entities in the user input\n",
|
||||
"We have to extract the types of entities/values we want to map to a graph database. In this example, we are dealing with a movie graph, so we can map movies and people to the database."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "e1a19424-6046-40c2-81d1-f3b88193a293",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import List, Optional\n",
|
||||
"\n",
|
||||
"from langchain_core.prompts import ChatPromptTemplate\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"from pydantic import BaseModel, Field\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Entities(BaseModel):\n",
|
||||
" \"\"\"Identifying information about entities.\"\"\"\n",
|
||||
"\n",
|
||||
" names: List[str] = Field(\n",
|
||||
" ...,\n",
|
||||
" description=\"All the person or movies appearing in the text\",\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"prompt = ChatPromptTemplate.from_messages(\n",
|
||||
" [\n",
|
||||
" (\n",
|
||||
" \"system\",\n",
|
||||
" \"You are extracting person and movies from the text.\",\n",
|
||||
" ),\n",
|
||||
" (\n",
|
||||
" \"human\",\n",
|
||||
" \"Use the given format to extract information from the following \"\n",
|
||||
" \"input: {question}\",\n",
|
||||
" ),\n",
|
||||
" ]\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"entity_chain = prompt | llm.with_structured_output(Entities)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9c14084c-37a7-4a9c-a026-74e12961c781",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can test the entity extraction chain."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "bbfe0d8f-982e-46e6-88fb-8a4f0d850b07",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Entities(names=['Casino'])"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"entities = entity_chain.invoke({\"question\": \"Who played in Casino movie?\"})\n",
|
||||
"entities"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a8afbf13-05d0-4383-8050-f88b8c2f6fab",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We will utilize a simple `CONTAINS` clause to match entities to the database. In practice, you might want to use a fuzzy search or a fulltext index to allow for minor misspellings."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "6f92929f-74fb-4db2-b7e1-eb1e9d386a67",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Casino maps to Casino Movie in database\\n'"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"match_query = \"\"\"MATCH (p:Person|Movie)\n",
|
||||
"WHERE p.name CONTAINS $value OR p.title CONTAINS $value\n",
|
||||
"RETURN coalesce(p.name, p.title) AS result, labels(p)[0] AS type\n",
|
||||
"LIMIT 1\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def map_to_database(entities: Entities) -> Optional[str]:\n",
|
||||
" result = \"\"\n",
|
||||
" for entity in entities.names:\n",
|
||||
" response = graph.query(match_query, {\"value\": entity})\n",
|
||||
" try:\n",
|
||||
" result += f\"{entity} maps to {response[0]['result']} {response[0]['type']} in database\\n\"\n",
|
||||
" except IndexError:\n",
|
||||
" pass\n",
|
||||
" return result\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"map_to_database(entities)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f66c6756-6efb-4b1e-9b5d-87ed914a5212",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Custom Cypher generating chain\n",
|
||||
"\n",
|
||||
"We need to define a custom Cypher prompt that takes the entity mapping information along with the schema and the user question to construct a Cypher statement.\n",
|
||||
"We will be using the LangChain expression language to accomplish that."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "8ef3e21d-f1c2-45e2-9511-4920d1cf6e7e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_core.output_parsers import StrOutputParser\n",
|
||||
"from langchain_core.runnables import RunnablePassthrough\n",
|
||||
"\n",
|
||||
"# Generate Cypher statement based on natural language input\n",
|
||||
"cypher_template = \"\"\"Based on the Neo4j graph schema below, write a Cypher query that would answer the user's question:\n",
|
||||
"{schema}\n",
|
||||
"Entities in the question map to the following database values:\n",
|
||||
"{entities_list}\n",
|
||||
"Question: {question}\n",
|
||||
"Cypher query:\"\"\"\n",
|
||||
"\n",
|
||||
"cypher_prompt = ChatPromptTemplate.from_messages(\n",
|
||||
" [\n",
|
||||
" (\n",
|
||||
" \"system\",\n",
|
||||
" \"Given an input question, convert it to a Cypher query. No pre-amble.\",\n",
|
||||
" ),\n",
|
||||
" (\"human\", cypher_template),\n",
|
||||
" ]\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"cypher_response = (\n",
|
||||
" RunnablePassthrough.assign(names=entity_chain)\n",
|
||||
" | RunnablePassthrough.assign(\n",
|
||||
" entities_list=lambda x: map_to_database(x[\"names\"]),\n",
|
||||
" schema=lambda _: graph.get_schema,\n",
|
||||
" )\n",
|
||||
" | cypher_prompt\n",
|
||||
" | llm.bind(stop=[\"\\nCypherResult:\"])\n",
|
||||
" | StrOutputParser()\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "1f0011e3-9660-4975-af2a-486b1bc3b954",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'MATCH (:Movie {title: \"Casino\"})<-[:ACTED_IN]-(actor)\\nRETURN actor.name'"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"cypher = cypher_response.invoke({\"question\": \"Who played in Casino movie?\"})\n",
|
||||
"cypher"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "38095678-611f-4847-a4de-e51ef7ef727c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Generating answers based on database results\n",
|
||||
"\n",
|
||||
"Now that we have a chain that generates the Cypher statement, we need to execute the Cypher statement against the database and send the database results back to an LLM to generate the final answer.\n",
|
||||
"Again, we will be using LCEL."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "d1fa97c0-1c9c-41d3-9ee1-5f1905d17434",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_neo4j.chains.graph_qa.cypher_utils import (\n",
|
||||
" CypherQueryCorrector,\n",
|
||||
" Schema,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"graph.refresh_schema()\n",
|
||||
"# Cypher validation tool for relationship directions\n",
|
||||
"corrector_schema = [\n",
|
||||
" Schema(el[\"start\"], el[\"type\"], el[\"end\"])\n",
|
||||
" for el in graph.structured_schema.get(\"relationships\")\n",
|
||||
"]\n",
|
||||
"cypher_validation = CypherQueryCorrector(corrector_schema)\n",
|
||||
"\n",
|
||||
"# Generate natural language response based on database results\n",
|
||||
"response_template = \"\"\"Based on the question, Cypher query, and Cypher response, write a natural language response:\n",
|
||||
"Question: {question}\n",
|
||||
"Cypher query: {query}\n",
|
||||
"Cypher Response: {response}\"\"\"\n",
|
||||
"\n",
|
||||
"response_prompt = ChatPromptTemplate.from_messages(\n",
|
||||
" [\n",
|
||||
" (\n",
|
||||
" \"system\",\n",
|
||||
" \"Given an input question and Cypher response, convert it to a natural\"\n",
|
||||
" \" language answer. No pre-amble.\",\n",
|
||||
" ),\n",
|
||||
" (\"human\", response_template),\n",
|
||||
" ]\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"chain = (\n",
|
||||
" RunnablePassthrough.assign(query=cypher_response)\n",
|
||||
" | RunnablePassthrough.assign(\n",
|
||||
" response=lambda x: graph.query(cypher_validation(x[\"query\"])),\n",
|
||||
" )\n",
|
||||
" | response_prompt\n",
|
||||
" | llm\n",
|
||||
" | StrOutputParser()\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "918146e5-7918-46d2-a774-53f9547d8fcb",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Robert De Niro, James Woods, Joe Pesci, and Sharon Stone played in the movie \"Casino\".'"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.invoke({\"question\": \"Who played in Casino movie?\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c7ba75cd-8399-4e54-a6f8-8a411f159f56",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.18"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
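The response chain above uses two `RunnablePassthrough.assign` steps. Each step keeps the incoming keys and adds a new one (`query`, then `response`) before the dict reaches `response_prompt`. The following is a minimal plain-Python sketch of that data flow. It does not use LangChain. It assumes hypothetical stubs `generate_cypher`, `validate_cypher`, and `run_query` in place of the LLM-backed `cypher_response` chain, `cypher_validation`, and `graph.query`, and it only builds the human message that would be sent to the model.

```python
from typing import Any, Callable, Dict, List

Step = Callable[[Dict[str, Any]], Any]


def assign(**steps: Step) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Plain-Python stand-in for RunnablePassthrough.assign: keep every input key
    and add one new key per step, each computed from the same incoming dict."""
    def run(inputs: Dict[str, Any]) -> Dict[str, Any]:
        out = dict(inputs)
        for key, fn in steps.items():
            out[key] = fn(inputs)
        return out
    return run


def generate_cypher(x: Dict[str, Any]) -> str:
    # Hypothetical stub for the LLM-backed `cypher_response` chain.
    return "MATCH (m:Movie {title: 'Casino'})<-[:ACTED_IN]-(p:Person) RETURN p.name"


def validate_cypher(query: str) -> str:
    # Hypothetical stub for `cypher_validation`; here it only trims whitespace.
    return query.strip()


def run_query(query: str) -> List[Dict[str, Any]]:
    # Hypothetical stub for `graph.query`; returns canned rows instead of calling Neo4j.
    return [{"p.name": "Robert De Niro"}, {"p.name": "Sharon Stone"}]


RESPONSE_TEMPLATE = (
    "Based on the question, Cypher query, and Cypher response, "
    "write a natural language response:\n"
    "Question: {question}\n"
    "Cypher query: {query}\n"
    "Cypher Response: {response}"
)


def prepare(inputs: Dict[str, Any]) -> Dict[str, Any]:
    # Step 1 adds "query"; step 2 adds "response" computed from that query.
    state = assign(query=generate_cypher)(inputs)
    state = assign(response=lambda x: run_query(validate_cypher(x["query"])))(state)
    return state


state = prepare({"question": "Who played in Casino movie?"})
# Substituted values are not re-parsed, so the braces inside the query are safe here.
human_message = RESPONSE_TEMPLATE.format(**state)
print(human_message)
```

In the real chain, `response_prompt | llm | StrOutputParser()` would then turn this message into the final natural-language answer.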
|
||||
@@ -1,548 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "raw",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"---\n",
|
||||
"sidebar_position: 2\n",
|
||||
"---"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# How to best prompt for Graph-RAG\n",
|
||||
"\n",
|
||||
"In this guide we'll go over prompting strategies to improve graph database query generation. We'll largely focus on methods for getting relevant database-specific information in your prompt.\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"First, get required packages and set environment variables:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Note: you may need to restart the kernel to use updated packages.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%pip install --upgrade --quiet langchain langchain-neo4j langchain-openai neo4j"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We default to OpenAI models in this guide, but you can swap them out for the model provider of your choice."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdin",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" ········\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import getpass\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n",
|
||||
"\n",
|
||||
"# Uncomment the below to use LangSmith. Not required.\n",
|
||||
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()\n",
|
||||
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Next, we need to define Neo4j credentials.\n",
|
||||
"Follow [these installation steps](https://neo4j.com/docs/operations-manual/current/installation/) to set up a Neo4j database."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"os.environ[\"NEO4J_URI\"] = \"bolt://localhost:7687\"\n",
|
||||
"os.environ[\"NEO4J_USERNAME\"] = \"neo4j\"\n",
|
||||
"os.environ[\"NEO4J_PASSWORD\"] = \"password\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The example below creates a connection to a Neo4j database and populates it with example data about movies and their actors."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[]"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_neo4j import Neo4jGraph\n",
|
||||
"\n",
|
||||
"graph = Neo4jGraph()\n",
|
||||
"\n",
|
||||
"# Import movie information\n",
|
||||
"\n",
|
||||
"movies_query = \"\"\"\n",
|
||||
"LOAD CSV WITH HEADERS FROM \n",
|
||||
"'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'\n",
|
||||
"AS row\n",
|
||||
"MERGE (m:Movie {id:row.movieId})\n",
|
||||
"SET m.released = date(row.released),\n",
|
||||
" m.title = row.title,\n",
|
||||
" m.imdbRating = toFloat(row.imdbRating)\n",
|
||||
"FOREACH (director in split(row.director, '|') | \n",
|
||||
" MERGE (p:Person {name:trim(director)})\n",
|
||||
" MERGE (p)-[:DIRECTED]->(m))\n",
|
||||
"FOREACH (actor in split(row.actors, '|') | \n",
|
||||
" MERGE (p:Person {name:trim(actor)})\n",
|
||||
" MERGE (p)-[:ACTED_IN]->(m))\n",
|
||||
"FOREACH (genre in split(row.genres, '|') | \n",
|
||||
" MERGE (g:Genre {name:trim(genre)})\n",
|
||||
" MERGE (m)-[:IN_GENRE]->(g))\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"graph.query(movies_query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Filtering graph schema\n",
|
||||
"\n",
|
||||
"At times, you may need to focus on a specific subset of the graph schema while generating Cypher statements.\n",
|
||||
"Let's say we are dealing with the following graph schema:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Node properties are the following:\n",
|
||||
"Movie {imdbRating: FLOAT, id: STRING, released: DATE, title: STRING},Person {name: STRING},Genre {name: STRING}\n",
|
||||
"Relationship properties are the following:\n",
|
||||
"\n",
|
||||
"The relationships are the following:\n",
|
||||
"(:Movie)-[:IN_GENRE]->(:Genre),(:Person)-[:DIRECTED]->(:Movie),(:Person)-[:ACTED_IN]->(:Movie)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"graph.refresh_schema()\n",
|
||||
"print(graph.schema)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's say we want to exclude the _Genre_ node from the schema representation we pass to an LLM.\n",
|
||||
"We can achieve that using the `exclude_types` parameter of `GraphCypherQAChain`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_neo4j import GraphCypherQAChain\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
|
||||
"chain = GraphCypherQAChain.from_llm(\n",
|
||||
" graph=graph,\n",
|
||||
" llm=llm,\n",
|
||||
" exclude_types=[\"Genre\"],\n",
|
||||
" verbose=True,\n",
|
||||
" allow_dangerous_requests=True,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Node properties are the following:\n",
|
||||
"Movie {imdbRating: FLOAT, id: STRING, released: DATE, title: STRING},Person {name: STRING}\n",
|
||||
"Relationship properties are the following:\n",
|
||||
"\n",
|
||||
"The relationships are the following:\n",
|
||||
"(:Person)-[:DIRECTED]->(:Movie),(:Person)-[:ACTED_IN]->(:Movie)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(chain.graph_schema)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Few-shot examples\n",
|
||||
"\n",
|
||||
"Including examples of natural language questions being converted to valid Cypher queries against our database in the prompt will often improve model performance, especially for complex queries.\n",
|
||||
"\n",
|
||||
"Let's say we have the following examples:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"examples = [\n",
|
||||
" {\n",
|
||||
" \"question\": \"How many artists are there?\",\n",
|
||||
" \"query\": \"MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)\",\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"question\": \"Which actors played in the movie Casino?\",\n",
|
||||
" \"query\": \"MATCH (m:Movie {{title: 'Casino'}})<-[:ACTED_IN]-(a) RETURN a.name\",\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"question\": \"How many movies has Tom Hanks acted in?\",\n",
|
||||
" \"query\": \"MATCH (a:Person {{name: 'Tom Hanks'}})-[:ACTED_IN]->(m:Movie) RETURN count(m)\",\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"question\": \"List all the genres of the movie Schindler's List\",\n",
|
||||
" \"query\": \"MATCH (m:Movie {{title: 'Schindler\\\\'s List'}})-[:IN_GENRE]->(g:Genre) RETURN g.name\",\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"question\": \"Which actors have worked in movies from both the comedy and action genres?\",\n",
|
||||
" \"query\": \"MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name\",\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"question\": \"Which directors have made movies with at least three different actors named 'John'?\",\n",
|
||||
" \"query\": \"MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person) WHERE a.name STARTS WITH 'John' WITH d, COUNT(DISTINCT a) AS JohnsCount WHERE JohnsCount >= 3 RETURN d.name\",\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"question\": \"Identify movies where directors also played a role in the film.\",\n",
|
||||
" \"query\": \"MATCH (p:Person)-[:DIRECTED]->(m:Movie), (p)-[:ACTED_IN]->(m) RETURN m.title, p.name\",\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"question\": \"Find the actor with the highest number of movies in the database.\",\n",
|
||||
" \"query\": \"MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) RETURN a.name, COUNT(m) AS movieCount ORDER BY movieCount DESC LIMIT 1\",\n",
|
||||
" },\n",
|
||||
"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can create a few-shot prompt with them like so:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate\n",
|
||||
"\n",
|
||||
"example_prompt = PromptTemplate.from_template(\n",
|
||||
" \"User input: {question}\\nCypher query: {query}\"\n",
|
||||
")\n",
|
||||
"prompt = FewShotPromptTemplate(\n",
|
||||
" examples=examples[:5],\n",
|
||||
" example_prompt=example_prompt,\n",
|
||||
" prefix=\"You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.\\n\\nHere is the schema information\\n{schema}.\\n\\nBelow are a number of examples of questions and their corresponding Cypher queries.\",\n",
|
||||
" suffix=\"User input: {question}\\nCypher query: \",\n",
|
||||
" input_variables=[\"question\", \"schema\"],\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.\n",
|
||||
"\n",
|
||||
"Here is the schema information\n",
|
||||
"foo.\n",
|
||||
"\n",
|
||||
"Below are a number of examples of questions and their corresponding Cypher queries.\n",
|
||||
"\n",
|
||||
"User input: How many artists are there?\n",
|
||||
"Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)\n",
|
||||
"\n",
|
||||
"User input: Which actors played in the movie Casino?\n",
|
||||
"Cypher query: MATCH (m:Movie {title: 'Casino'})<-[:ACTED_IN]-(a) RETURN a.name\n",
|
||||
"\n",
|
||||
"User input: How many movies has Tom Hanks acted in?\n",
|
||||
"Cypher query: MATCH (a:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN count(m)\n",
|
||||
"\n",
|
||||
"User input: List all the genres of the movie Schindler's List\n",
|
||||
"Cypher query: MATCH (m:Movie {title: 'Schindler\\'s List'})-[:IN_GENRE]->(g:Genre) RETURN g.name\n",
|
||||
"\n",
|
||||
"User input: Which actors have worked in movies from both the comedy and action genres?\n",
|
||||
"Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name\n",
|
||||
"\n",
|
||||
"User input: How many artists are there?\n",
|
||||
"Cypher query: \n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(prompt.format(question=\"How many artists are there?\", schema=\"foo\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Dynamic few-shot examples\n",
|
||||
"\n",
|
||||
"If we have enough examples, we may want to include only the most relevant ones in the prompt, either because they don't all fit in the model's context window or because the long tail of examples distracts the model. Specifically, for any given input we want to include the examples most relevant to that input.\n",
|
||||
"\n",
|
||||
"We can do this with an `ExampleSelector`. Here we'll use a [SemanticSimilarityExampleSelector](https://python.langchain.com/api_reference/core/example_selectors/langchain_core.example_selectors.semantic_similarity.SemanticSimilarityExampleSelector.html), which stores the examples in a vector database of our choosing (here, Neo4j via `Neo4jVector`). At runtime it performs a similarity search between the input and our examples and returns the most semantically similar ones:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_core.example_selectors import SemanticSimilarityExampleSelector\n",
|
||||
"from langchain_neo4j import Neo4jVector\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"example_selector = SemanticSimilarityExampleSelector.from_examples(\n",
|
||||
" examples,\n",
|
||||
" OpenAIEmbeddings(),\n",
|
||||
" Neo4jVector,\n",
|
||||
" k=5,\n",
|
||||
" input_keys=[\"question\"],\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'query': 'MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)',\n",
|
||||
" 'question': 'How many artists are there?'},\n",
|
||||
" {'query': \"MATCH (a:Person {{name: 'Tom Hanks'}})-[:ACTED_IN]->(m:Movie) RETURN count(m)\",\n",
|
||||
" 'question': 'How many movies has Tom Hanks acted in?'},\n",
|
||||
" {'query': \"MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name\",\n",
|
||||
" 'question': 'Which actors have worked in movies from both the comedy and action genres?'},\n",
|
||||
" {'query': \"MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person) WHERE a.name STARTS WITH 'John' WITH d, COUNT(DISTINCT a) AS JohnsCount WHERE JohnsCount >= 3 RETURN d.name\",\n",
|
||||
" 'question': \"Which directors have made movies with at least three different actors named 'John'?\"},\n",
|
||||
" {'query': 'MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) RETURN a.name, COUNT(m) AS movieCount ORDER BY movieCount DESC LIMIT 1',\n",
|
||||
" 'question': 'Find the actor with the highest number of movies in the database.'}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"example_selector.select_examples({\"question\": \"how many artists are there?\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To use it, we pass the `ExampleSelector` directly into our `FewShotPromptTemplate`:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"prompt = FewShotPromptTemplate(\n",
|
||||
" example_selector=example_selector,\n",
|
||||
" example_prompt=example_prompt,\n",
|
||||
" prefix=\"You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.\\n\\nHere is the schema information\\n{schema}.\\n\\nBelow are a number of examples of questions and their corresponding Cypher queries.\",\n",
|
||||
" suffix=\"User input: {question}\\nCypher query: \",\n",
|
||||
" input_variables=[\"question\", \"schema\"],\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.\n",
|
||||
"\n",
|
||||
"Here is the schema information\n",
|
||||
"foo.\n",
|
||||
"\n",
|
||||
"Below are a number of examples of questions and their corresponding Cypher queries.\n",
|
||||
"\n",
|
||||
"User input: How many artists are there?\n",
|
||||
"Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)\n",
|
||||
"\n",
|
||||
"User input: How many movies has Tom Hanks acted in?\n",
|
||||
"Cypher query: MATCH (a:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN count(m)\n",
|
||||
"\n",
|
||||
"User input: Which actors have worked in movies from both the comedy and action genres?\n",
|
||||
"Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name\n",
|
||||
"\n",
|
||||
"User input: Which directors have made movies with at least three different actors named 'John'?\n",
|
||||
"Cypher query: MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person) WHERE a.name STARTS WITH 'John' WITH d, COUNT(DISTINCT a) AS JohnsCount WHERE JohnsCount >= 3 RETURN d.name\n",
|
||||
"\n",
|
||||
"User input: Find the actor with the highest number of movies in the database.\n",
|
||||
"Cypher query: MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) RETURN a.name, COUNT(m) AS movieCount ORDER BY movieCount DESC LIMIT 1\n",
|
||||
"\n",
|
||||
"User input: how many artists are there?\n",
|
||||
"Cypher query: \n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(prompt.format(question=\"how many artists are there?\", schema=\"foo\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
|
||||
"chain = GraphCypherQAChain.from_llm(\n",
|
||||
" graph=graph,\n",
|
||||
" llm=llm,\n",
|
||||
" cypher_prompt=prompt,\n",
|
||||
" verbose=True,\n",
|
||||
" allow_dangerous_requests=True,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
|
||||
"Generated Cypher:\n",
|
||||
"\u001b[32;1m\u001b[1;3mMATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)\u001b[0m\n",
|
||||
"Full Context:\n",
|
||||
"\u001b[32;1m\u001b[1;3m[{'count(DISTINCT a)': 967}]\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'query': 'How many actors are in the graph?',\n",
|
||||
" 'result': 'There are 967 actors in the graph.'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.invoke(\"How many actors are in the graph?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
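To see what `exclude_types=["Genre"]` does to the schema string passed to the LLM, here is a plain-Python sketch that drops the excluded label from the node properties and drops every relationship whose start label, type, or end label is excluded. The dict/tuple schema layout and the `filter_schema`/`render_schema` helpers are hypothetical. They only reproduce the effect visible in the two printed schemas above, not `langchain_neo4j`'s actual implementation.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical in-memory copy of the movie schema printed by graph.schema.
NodeProps = Dict[str, Dict[str, str]]
Relationship = Tuple[str, str, str]  # (start label, relationship type, end label)

NODE_PROPS: NodeProps = {
    "Movie": {"imdbRating": "FLOAT", "id": "STRING", "released": "DATE", "title": "STRING"},
    "Person": {"name": "STRING"},
    "Genre": {"name": "STRING"},
}
RELATIONSHIPS: List[Relationship] = [
    ("Movie", "IN_GENRE", "Genre"),
    ("Person", "DIRECTED", "Movie"),
    ("Person", "ACTED_IN", "Movie"),
]


def filter_schema(
    nodes: NodeProps, rels: List[Relationship], exclude_types: Iterable[str]
) -> Tuple[NodeProps, List[Relationship]]:
    """Drop excluded node labels and any relationship that touches an excluded type."""
    excluded = set(exclude_types)
    kept_nodes = {label: props for label, props in nodes.items() if label not in excluded}
    kept_rels = [
        (start, rel_type, end)
        for start, rel_type, end in rels
        if not excluded & {start, rel_type, end}  # keep only if no part is excluded
    ]
    return kept_nodes, kept_rels


def render_schema(nodes: NodeProps, rels: List[Relationship]) -> str:
    """Format the schema in the same textual layout as the printed examples."""
    node_str = ",".join(
        label + " {" + ", ".join(f"{key}: {value}" for key, value in props.items()) + "}"
        for label, props in nodes.items()
    )
    rel_str = ",".join(f"(:{start})-[:{rel_type}]->(:{end})" for start, rel_type, end in rels)
    return "\n".join(
        [
            "Node properties are the following:",
            node_str,
            "Relationship properties are the following:",
            "",
            "The relationships are the following:",
            rel_str,
        ]
    )


schema_text = render_schema(*filter_schema(NODE_PROPS, RELATIONSHIPS, ["Genre"]))
print(schema_text)
```

Dropping the `IN_GENRE` relationship along with the `Genre` label keeps the schema self-consistent, so the LLM is never shown a relationship that points at a node type it cannot see.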
|
||||
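Two details of the few-shot sections are easy to miss: the example queries double their braces (`{{title: 'Casino'}}`), and the dynamic selector ranks examples by similarity to the input. The sketch below reproduces both in plain Python. It assumes a toy bag-of-words cosine similarity in place of `OpenAIEmbeddings` plus `Neo4jVector`, uses a three-item subset of the examples, and relies on a hypothetical `build_prompt` helper that mimics how the rendered prompts above appear to be assembled: each example is filled in first (substituted values are not re-parsed, so the doubled braces survive), then prefix, examples, and suffix are joined and formatted again, which collapses `{{` to `{`.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Subset of the notebook's examples; braces are doubled for the second formatting pass.
EXAMPLES: List[Dict[str, str]] = [
    {"question": "How many artists are there?",
     "query": "MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)"},
    {"question": "Which actors played in the movie Casino?",
     "query": "MATCH (m:Movie {{title: 'Casino'}})<-[:ACTED_IN]-(a) RETURN a.name"},
    {"question": "How many movies has Tom Hanks acted in?",
     "query": "MATCH (a:Person {{name: 'Tom Hanks'}})-[:ACTED_IN]->(m:Movie) RETURN count(m)"},
]

EXAMPLE_TEMPLATE = "User input: {question}\nCypher query: {query}"
PREFIX = (
    "You are a Neo4j expert. Given an input question, create a syntactically correct "
    "Cypher query to run.\n\nHere is the schema information\n{schema}.\n\n"
    "Below are a number of examples of questions and their corresponding Cypher queries."
)
SUFFIX = "User input: {question}\nCypher query: "


def bag_of_words(text: str) -> Counter:
    # Toy stand-in for an embedding: lowercase word counts.
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_examples(question: str, examples: List[Dict[str, str]], k: int) -> List[Dict[str, str]]:
    # Rank examples by similarity of their question to the input and keep the top k.
    q = bag_of_words(question)
    ranked = sorted(examples, key=lambda ex: cosine(q, bag_of_words(ex["question"])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, schema: str, k: int = 2) -> str:
    # Pass 1: fill each example; the doubled braces in the query values survive untouched.
    example_strings = [EXAMPLE_TEMPLATE.format(**ex) for ex in select_examples(question, EXAMPLES, k)]
    # Pass 2: join everything and format once more; {{ }} collapses to { }.
    # Single braces in an example would raise KeyError here, hence the doubling.
    template = "\n\n".join([PREFIX, *example_strings, SUFFIX])
    return template.format(question=question, schema=schema)


prompt_text = build_prompt("how many artists are there?", schema="foo", k=2)
print(prompt_text)
```

With real embeddings the ranking will differ from this word-overlap score, but the assembly and brace handling are the same idea.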
@@ -316,9 +316,7 @@ For a high-level tutorial, check out [this guide](/docs/tutorials/sql_qa/).
|
||||
You can use an LLM to do question answering over graph databases.
|
||||
For a high-level tutorial, check out [this guide](/docs/tutorials/graph/).
|
||||
|
||||
- [How to: map values to a database](/docs/how_to/graph_mapping)
|
||||
- [How to: add a semantic layer over the database](/docs/how_to/graph_semantic)
|
||||
- [How to: improve results with prompting](/docs/how_to/graph_prompting)
|
||||
- [How to: construct knowledge graphs](/docs/how_to/graph_constructing)
|
||||
|
||||
### Summarization
|
||||
|
||||
@@ -12,7 +12,7 @@
|
||||
"There are two ways to implement a custom parser:\n",
|
||||
"\n",
|
||||
"1. Using `RunnableLambda` or `RunnableGenerator` in [LCEL](/docs/concepts/lcel/) -- we strongly recommend this for most use cases\n",
|
||||
"2. By inherting from one of the base classes for out parsing -- this is the hard way of doing things\n",
|
||||
"2. By inheriting from one of the base classes for out parsing -- this is the hard way of doing things\n",
|
||||
"\n",
|
||||
"The difference between the two approaches is mostly superficial and lies mainly in which callbacks are triggered (e.g., `on_chain_start` vs. `on_parser_start`) and how a runnable lambda vs. a parser might be visualized in a tracing platform like LangSmith."
|
||||
]
|
||||
@@ -200,7 +200,7 @@
|
||||
"id": "24067447-8a5a-4d6b-86a3-4b9cc4b4369b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Inherting from Parsing Base Classes"
|
||||
"## Inheriting from Parsing Base Classes"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -208,7 +208,7 @@
|
||||
"id": "9713f547-b2e4-48eb-807f-a0f6f6d0e7e0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Another approach to implement a parser is by inherting from `BaseOutputParser`, `BaseGenerationOutputParser` or another one of the base parsers depending on what you need to do.\n",
|
||||
"Another approach to implement a parser is by inheriting from `BaseOutputParser`, `BaseGenerationOutputParser` or another one of the base parsers depending on what you need to do.\n",
|
||||
"\n",
|
||||
"In general, we **do not** recommend this approach for most use cases as it results in more code to write without significant benefits.\n",
|
||||
"\n",
|
||||
|
||||
@@ -55,7 +55,7 @@
|
||||
"* Run `.read Chinook_Sqlite.sql`\n",
|
||||
"* Test `SELECT * FROM Artist LIMIT 10;`\n",
|
||||
"\n",
|
||||
"Now, `Chinhook.db` is in our directory and we can interface with it using the SQLAlchemy-driven [SQLDatabase](https://python.langchain.com/api_reference/community/utilities/langchain_community.utilities.sql_database.SQLDatabase.html) class:"
|
||||
"Now, `Chinook.db` is in our directory and we can interface with it using the SQLAlchemy-driven [SQLDatabase](https://python.langchain.com/api_reference/community/utilities/langchain_community.utilities.sql_database.SQLDatabase.html) class:"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -51,7 +51,7 @@
|
||||
"* Run `.read Chinook_Sqlite.sql`\n",
|
||||
"* Test `SELECT * FROM Artist LIMIT 10;`\n",
|
||||
"\n",
|
||||
"Now, `Chinhook.db` is in our directory and we can interface with it using the SQLAlchemy-driven `SQLDatabase` class:"
|
||||
"Now, `Chinook.db` is in our directory and we can interface with it using the SQLAlchemy-driven `SQLDatabase` class:"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -54,7 +54,7 @@
|
||||
"* Run `.read Chinook_Sqlite.sql`\n",
|
||||
"* Test `SELECT * FROM Artist LIMIT 10;`\n",
|
||||
"\n",
|
||||
"Now, `Chinhook.db` is in our directory and we can interface with it using the SQLAlchemy-driven `SQLDatabase` class:"
|
||||
"Now, `Chinook.db` is in our directory and we can interface with it using the SQLAlchemy-driven `SQLDatabase` class:"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -336,7 +336,7 @@
|
||||
"\n",
|
||||
"The **MultiQueryRetriever** is used to tackle the problem that the RAG pipeline might not return the best set of documents based on the query. It generates multiple queries that mean the same as the original query and then fetches documents for each.\n",
|
||||
"\n",
|
||||
"To evluate this retriever, UpTrain will run the following evaluation:\n",
|
||||
"To evaluate this retriever, UpTrain will run the following evaluation:\n",
|
||||
"- **[Multi Query Accuracy](https://docs.uptrain.ai/predefined-evaluations/query-quality/multi-query-accuracy)**: Checks if the multi-queries generated mean the same as the original query."
|
||||
]
|
||||
},
|
||||
|
||||
@@ -36,7 +36,7 @@
|
||||
"### Integration details\n",
|
||||
"| Class | Package | Local | Serializable | [JS support](https://js.langchain.com/docs/integrations/chat/ibm/) | Package downloads | Package latest |\n",
|
||||
"| :--- | :--- | :---: | :---: | :---: | :---: | :---: |\n",
|
||||
"| [ChatWatsonx](https://python.langchain.com/api_reference/ibm/chat_models/langchain_ibm.chat_models.ChatWatsonx.html#langchain_ibm.chat_models.ChatWatsonx) | [langchain-ibm](https://python.langchain.com/api_reference/ibm/index.html) | ❌ | ❌ | ✅ |  |  |\n",
|
||||
"| [ChatWatsonx](https://python.langchain.com/api_reference/ibm/chat_models/langchain_ibm.chat_models.ChatWatsonx.html) | [langchain-ibm](https://python.langchain.com/api_reference/ibm/index.html) | ❌ | ❌ | ✅ |  |  |\n",
|
||||
"\n",
|
||||
"### Model features\n",
|
||||
"| [Tool calling](/docs/how_to/tool_calling/) | [Structured output](/docs/how_to/structured_output/) | JSON mode | Image input | Audio input | Video input | [Token-level streaming](/docs/how_to/chat_streaming/) | Native async | [Token usage](/docs/how_to/chat_token_usage_tracking/) | [Logprobs](/docs/how_to/logprobs/) |\n",
|
||||
|
||||
41
docs/docs/integrations/providers/scrapegraph.mdx
Normal file
@@ -0,0 +1,41 @@
|
||||
# ScrapeGraph AI
|
||||
|
||||
>[ScrapeGraph AI](https://scrapegraphai.com) is a service that provides AI-powered web scraping capabilities.
|
||||
>It offers tools for extracting structured data, converting webpages to markdown, and processing local HTML content
|
||||
>using natural language prompts.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
Install the required packages:
|
||||
|
||||
```bash
|
||||
pip install langchain-scrapegraph
|
||||
```
|
||||
|
||||
Set up your API key:
|
||||
|
||||
```bash
|
||||
export SGAI_API_KEY="your-scrapegraph-api-key"
|
||||
```
|
||||
|
||||
## Tools
|
||||
|
||||
See a [usage example](/docs/integrations/tools/scrapegraph).
|
||||
|
||||
There are four tools available:
|
||||
|
||||
```python
|
||||
from langchain_scrapegraph.tools import (
|
||||
SmartScraperTool, # Extract structured data from websites
|
||||
MarkdownifyTool, # Convert webpages to markdown
|
||||
LocalScraperTool, # Process local HTML content
|
||||
GetCreditsTool, # Check remaining API credits
|
||||
)
|
||||
```
|
||||
|
||||
Each tool serves a specific purpose:
|
||||
|
||||
- `SmartScraperTool`: Extract structured data from websites given a URL, prompt and optional output schema
|
||||
- `MarkdownifyTool`: Convert any webpage to clean markdown format
|
||||
- `LocalScraperTool`: Extract structured data from a local HTML file given a prompt and optional output schema
|
||||
- `GetCreditsTool`: Check your remaining ScrapeGraph AI credits
|
||||
@@ -8,7 +8,7 @@
|
||||
"\n",
|
||||
">[Upstage](https://upstage.ai) is a leading artificial intelligence (AI) company specializing in delivering above-human-grade performance LLM components.\n",
|
||||
">\n",
|
||||
">**Solar Mini Chat** is a fast yet powerful advanced large language model focusing on English and Korean. It has been specifically fine-tuned for multi-turn chat purposes, showing enhanced performance across a wide range of natural language processing tasks, like multi-turn conversation or tasks that require an understanding of long contexts, such as RAG (Retrieval-Augmented Generation), compared to other models of a similar size. This fine-tuning equips it with the ability to handle longer conversations more effectively, making it particularly adept for interactive applications.\n",
|
||||
">**Solar Pro** is an enterprise-grade LLM optimized for single-GPU deployment, excelling in instruction-following and processing structured formats like HTML and Markdown. It supports English, Korean, and Japanese with top multilingual performance and offers domain expertise in finance, healthcare, and legal.\n",
|
||||
"\n",
|
||||
">Other than Solar, Upstage also offers features for real-world RAG (retrieval-augmented generation), such as **Document Parse** and **Groundedness Check**. \n"
|
||||
]
|
||||
@@ -21,12 +21,12 @@
|
||||
"\n",
|
||||
"| API | Description | Import | Example usage |\n",
|
||||
"| --- | --- | --- | --- |\n",
|
||||
"| Chat | Build assistants using Solar Mini Chat | `from langchain_upstage import ChatUpstage` | [Go](../../chat/upstage) |\n",
|
||||
"| Chat | Build assistants using Solar Chat | `from langchain_upstage import ChatUpstage` | [Go](../../chat/upstage) |\n",
|
||||
"| Text Embedding | Embed strings to vectors | `from langchain_upstage import UpstageEmbeddings` | [Go](../../text_embedding/upstage) |\n",
|
||||
"| Groundedness Check | Verify groundedness of assistant's response | `from langchain_upstage import UpstageGroundednessCheck` | [Go](../../tools/upstage_groundedness_check) |\n",
|
||||
"| Document Parse | Serialize documents with tables and figures | `from langchain_upstage import UpstageDocumentParseLoader` | [Go](../../document_loaders/upstage) |\n",
|
||||
"\n",
|
||||
"See [documentations](https://developers.upstage.ai/) for more details about the features."
|
||||
"See [documentations](https://console.upstage.ai/docs/getting-started/overview) for more details about the models and features."
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -35,9 +35,9 @@
|
||||
"\n",
|
||||
"### Integration details\n",
|
||||
"\n",
|
||||
"| Class | Package | JS support | Package downloads | Package latest |\n",
|
||||
"| Class | Package | [JS support](https://js.langchain.com/docs/integrations/document_compressors/ibm/) | Package downloads | Package latest |\n",
|
||||
"| :--- | :--- | :---: | :---: | :---: |\n",
|
||||
"| [WatsonxRerank](https://python.langchain.com/api_reference/ibm/chat_models/langchain_ibm.rerank.WatsonxRerank.html) | [langchain-ibm](https://python.langchain.com/api_reference/ibm/index.html) | ❌ |  |  |"
|
||||
"| [WatsonxRerank](https://python.langchain.com/api_reference/ibm/rerank/langchain_ibm.rerank.WatsonxRerank.html) | [langchain-ibm](https://python.langchain.com/api_reference/ibm/index.html) | ✅ |  |  |"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -445,7 +445,7 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"display_name": "langchain_ibm",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
|
||||
@@ -194,7 +194,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
|
||||
@@ -146,7 +146,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
|
||||
@@ -164,7 +164,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
|
||||
@@ -185,7 +185,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_community.llms import Tongyi\n",
|
||||
"\n",
|
||||
|
||||
@@ -282,7 +282,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
|
||||
@@ -196,7 +196,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
|
||||
@@ -125,7 +125,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_community.query_constructors.hanavector import HanaTranslator\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
|
||||
@@ -119,7 +119,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
|
||||
@@ -160,7 +160,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
|
||||
@@ -165,7 +165,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"\n",
|
||||
|
||||
@@ -168,7 +168,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
|
||||
@@ -135,7 +135,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
|
||||
@@ -141,7 +141,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
|
||||
@@ -190,7 +190,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
|
||||
@@ -144,7 +144,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
|
||||
@@ -194,7 +194,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
|
||||
@@ -308,7 +308,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
|
||||
@@ -218,7 +218,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"\n",
|
||||
|
||||
@@ -249,7 +249,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
|
||||
@@ -91,7 +91,7 @@
|
||||
"os.environ[\"VECTARA_CORPUS_ID\"] = \"<YOUR_VECTARA_CORPUS_ID>\"\n",
|
||||
"os.environ[\"VECTARA_CUSTOMER_ID\"] = \"<YOUR_VECTARA_CUSTOMER_ID>\"\n",
|
||||
"\n",
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_community.vectorstores import Vectara\n",
|
||||
"from langchain_openai.chat_models import ChatOpenAI"
|
||||
|
||||
@@ -115,7 +115,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
|
||||
@@ -327,7 +327,7 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "langchain",
|
||||
"display_name": "langchain_ibm",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
|
||||
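Across the self-query notebooks above, the `AttributeInfo` import changes from `langchain.chains.query_constructor.base` to `langchain.chains.query_constructor.schema`. Code that has to run against `langchain` releases on either side of that change may want a version-tolerant import. Below is a minimal sketch that assumes a small helper of our own, `import_first`. It is hypothetical and not a LangChain API. It tries each module path in order and returns the first match.

```python
import importlib
from typing import Any, List, Sequence


def import_first(module_paths: Sequence[str], attr: str) -> Any:
    """Return `attr` from the first module in `module_paths` that provides it.

    Hypothetical helper: each dotted path is imported in order; a missing
    module (ImportError) or a missing attribute moves on to the next path.
    """
    errors: List[str] = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError as exc:  # includes ModuleNotFoundError
            errors.append(f"{path}: {exc}")
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
        errors.append(f"{path}: no attribute {attr!r}")
    raise ImportError(f"could not import {attr!r}; tried: " + "; ".join(errors))


# Usage (requires langchain): prefer the newer `schema` module, fall back to `base`.
# AttributeInfo = import_first(
#     [
#         "langchain.chains.query_constructor.schema",
#         "langchain.chains.query_constructor.base",
#     ],
#     "AttributeInfo",
# )
```

Pinning a single `langchain` version and using the plain `schema` import, as the notebooks do, is simpler. The fallback only matters for code shared across versions.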
201
docs/docs/integrations/text_embedding/model2vec.ipynb
Normal file
@@ -0,0 +1,201 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e8712110",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Overview\n",
|
||||
"\n",
|
||||
"Model2Vec is a technique for turning any sentence transformer into a very small static model.\n",
|
||||
"[model2vec](https://github.com/MinishLab/model2vec) can be used to generate embeddings."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "266dd424",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"```bash\n",
|
||||
"pip install -U langchain-community\n",
|
||||
"```\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "78ab91a6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Instantiation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d06e7719",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Ensure that `model2vec` is installed:\n",
|
||||
"\n",
|
||||
"```bash\n",
|
||||
"pip install -U model2vec\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f8ea1ed5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Indexing and Retrieval"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "d25dc22d-b656-46c6-a42d-eace958590cd",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-24T15:13:17.176956Z",
|
||||
"start_time": "2023-05-24T15:13:15.399076Z"
|
||||
},
|
||||
"execution": {
|
||||
"iopub.execute_input": "2024-03-29T15:39:19.252281Z",
|
||||
"iopub.status.busy": "2024-03-29T15:39:19.252101Z",
|
||||
"iopub.status.idle": "2024-03-29T15:39:19.339106Z",
|
||||
"shell.execute_reply": "2024-03-29T15:39:19.338614Z",
|
||||
"shell.execute_reply.started": "2024-03-29T15:39:19.252260Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.embeddings import Model2vecEmbeddings"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "8397b91f-a1f9-4be6-a699-fedaada7c37a",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-24T15:13:17.193751Z",
|
||||
"start_time": "2023-05-24T15:13:17.182053Z"
|
||||
},
|
||||
"execution": {
|
||||
"iopub.execute_input": "2024-03-29T15:39:19.901573Z",
|
||||
"iopub.status.busy": "2024-03-29T15:39:19.900935Z",
|
||||
"iopub.status.idle": "2024-03-29T15:39:19.906540Z",
|
||||
"shell.execute_reply": "2024-03-29T15:39:19.905345Z",
|
||||
"shell.execute_reply.started": "2024-03-29T15:39:19.901529Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"embeddings = Model2vecEmbeddings(\"minishlab/potion-base-8M\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "abcf98b7-424c-4691-a1cd-862c3d53be11",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-24T15:13:17.844903Z",
|
||||
"start_time": "2023-05-24T15:13:17.198751Z"
|
||||
},
|
||||
"execution": {
|
||||
"iopub.execute_input": "2024-03-29T15:39:20.434581Z",
|
||||
"iopub.status.busy": "2024-03-29T15:39:20.433117Z",
|
||||
"iopub.status.idle": "2024-03-29T15:39:22.178650Z",
|
||||
"shell.execute_reply": "2024-03-29T15:39:22.176058Z",
|
||||
"shell.execute_reply.started": "2024-03-29T15:39:20.434501Z"
|
||||
},
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query_text = \"This is a test query.\"\n",
|
||||
"query_result = embeddings.embed_query(query_text)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "98897454-b280-4ee1-bbb9-2c6c15342f87",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-24T15:13:18.605339Z",
|
||||
"start_time": "2023-05-24T15:13:17.845906Z"
|
||||
},
|
||||
"execution": {
|
||||
"iopub.execute_input": "2024-03-29T15:39:28.164009Z",
|
||||
"iopub.status.busy": "2024-03-29T15:39:28.161759Z",
|
||||
"iopub.status.idle": "2024-03-29T15:39:30.217232Z",
|
||||
"shell.execute_reply": "2024-03-29T15:39:30.215348Z",
|
||||
"shell.execute_reply.started": "2024-03-29T15:39:28.163876Z"
|
||||
},
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"document_text = \"This is a test document.\"\n",
|
||||
"document_result = embeddings.embed_documents([document_text])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "11bac134",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Direct Usage\n",
|
||||
"\n",
|
||||
"Here's how to use `model2vec` directly:\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"from model2vec import StaticModel\n",
|
||||
"\n",
|
||||
"# Load a model from the Hugging Face Hub (in this case, the potion-base-8M model)\n",
|
||||
"model = StaticModel.from_pretrained(\"minishlab/potion-base-8M\")\n",
|
||||
"\n",
|
||||
"# Make embeddings\n",
|
||||
"embeddings = model.encode([\"It's dangerous to go alone!\", \"It's a secret to everybody.\"])\n",
|
||||
"\n",
|
||||
"# Make sequences of token embeddings\n",
|
||||
"token_embeddings = model.encode_as_sequence([\"It's dangerous to go alone!\", \"It's a secret to everybody.\"])\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d81e21aa",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API Reference\n",
|
||||
"\n",
|
||||
"For more information, check out the model2vec GitHub [repo](https://github.com/MinishLab/model2vec)."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
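The notebook describes Model2Vec as turning a sentence transformer into a small *static* model. Each token has a fixed vector, and a sentence embedding is pooled from those vectors without running a transformer at inference time. The toy sketch below illustrates that idea and the retrieval step behind "Indexing and Retrieval". Several pieces are hypothetical:

- the vocabulary and vectors,
- the whitespace tokenizer,
- the class name `ToyStaticEmbeddings`.

Mean pooling is a simplifying assumption. Real Model2Vec models ship a learned tokenizer and vocabulary.

```python
import math
from typing import Dict, List

# Hypothetical toy vocabulary: every token maps to one fixed ("static") vector.
# A real model such as minishlab/potion-base-8M loads learned vectors instead.
TOKEN_VECTORS: Dict[str, List[float]] = {
    "test": [1.0, 0.0, 0.0],
    "query": [0.0, 1.0, 0.0],
    "document": [0.0, 0.0, 1.0],
    "this": [0.1, 0.1, 0.1],
    "is": [0.1, 0.1, 0.1],
    "a": [0.1, 0.1, 0.1],
}
DIM = 3


class ToyStaticEmbeddings:
    """Duck-typed to mirror the embed_query / embed_documents methods used above."""

    def _embed(self, text: str) -> List[float]:
        # Naive whitespace tokenizer with punctuation stripped; unknown tokens are skipped.
        tokens = [t.strip(".,!?").lower() for t in text.split()]
        vectors = [TOKEN_VECTORS[t] for t in tokens if t in TOKEN_VECTORS]
        if not vectors:
            return [0.0] * DIM
        # Mean pooling over token vectors: just lookups and an average, no attention.
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]

    def embed_query(self, text: str) -> List[float]:
        return self._embed(text)

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self._embed(t) for t in texts]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(embeddings: ToyStaticEmbeddings, query: str, docs: List[str]) -> List[str]:
    """Return `docs` sorted by cosine similarity to `query`, best match first."""
    q = embeddings.embed_query(query)
    scored = zip(docs, embeddings.embed_documents(docs))
    return [d for d, v in sorted(scored, key=lambda p: cosine(q, p[1]), reverse=True)]
```

For example, `rank(ToyStaticEmbeddings(), "test", ["document", "test document", "test"])` puts `"test"` first. `Model2vecEmbeddings("minishlab/potion-base-8M")` exposes the same two embedding methods, so it could presumably be passed to `rank` in place of the toy class.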
|
||||
380
docs/docs/integrations/tools/scrapegraph.ipynb
Normal file
@@ -0,0 +1,380 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "raw",
|
||||
"id": "10238e62-3465-4973-9279-606cbb7ccf16",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"---\n",
|
||||
"sidebar_label: ScrapeGraph\n",
|
||||
"---"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a6f91f20",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# ScrapeGraph\n",
|
||||
"\n",
|
||||
"This notebook provides a quick overview for getting started with ScrapeGraph [tools](/docs/integrations/tools/). For detailed documentation of all ScrapeGraph features and configurations, head to the [API reference](https://python.langchain.com/docs/integrations/tools/scrapegraph).\n",
|
||||
"\n",
|
||||
"For more information about ScrapeGraph AI:\n",
|
||||
"- [ScrapeGraph AI Website](https://scrapegraphai.com)\n",
|
||||
"- [Open Source Project](https://github.com/ScrapeGraphAI/Scrapegraph-ai)\n",
|
||||
"\n",
|
||||
"## Overview\n",
|
||||
"\n",
|
||||
"### Integration details\n",
|
||||
"\n",
|
||||
"| Class | Package | Serializable | JS support | Package latest |\n",
|
||||
"| :--- | :--- | :---: | :---: | :---: |\n",
|
||||
"| [SmartScraperTool](https://python.langchain.com/docs/integrations/tools/scrapegraph) | langchain-scrapegraph | ✅ | ❌ |  |\n",
|
||||
"| [MarkdownifyTool](https://python.langchain.com/docs/integrations/tools/scrapegraph) | langchain-scrapegraph | ✅ | ❌ |  |\n",
|
||||
"| [LocalScraperTool](https://python.langchain.com/docs/integrations/tools/scrapegraph) | langchain-scrapegraph | ✅ | ❌ |  |\n",
|
||||
"| [GetCreditsTool](https://python.langchain.com/docs/integrations/tools/scrapegraph) | langchain-scrapegraph | ✅ | ❌ |  |\n",
|
||||
"\n",
|
||||
"### Tool features\n",
|
||||
"\n",
|
||||
"| Tool | Purpose | Input | Output |\n",
|
||||
"| :--- | :--- | :--- | :--- |\n",
|
||||
"| SmartScraperTool | Extract structured data from websites | URL + prompt | JSON |\n",
|
||||
"| MarkdownifyTool | Convert webpages to markdown | URL | Markdown text |\n",
|
||||
"| LocalScraperTool | Extract data from HTML content | HTML + prompt | JSON |\n",
|
||||
"| GetCreditsTool | Check API credits | None | Credit info |\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"The integration requires the following packages:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "f85b4089",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Note: you may need to restart the kernel to use updated packages.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%pip install --quiet -U langchain-scrapegraph"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b15e9266",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Credentials\n",
|
||||
"\n",
|
||||
"You'll need a ScrapeGraph AI API key to use these tools. Get one at [scrapegraphai.com](https://scrapegraphai.com)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "e0b178a2",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import getpass\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"if not os.environ.get(\"SGAI_API_KEY\"):\n",
|
||||
" os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"ScrapeGraph AI API key:\\n\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bc5ab717",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"It's also helpful (but not needed) to set up [LangSmith](https://smith.langchain.com/) for best-in-class observability:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a6c2f136",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
|
||||
"os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1c97218f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Instantiation\n",
|
||||
"\n",
|
||||
"Here we show how to instantiate instances of the ScrapeGraph tools:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "8b3ddfe9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_scrapegraph.tools import (\n",
|
||||
" GetCreditsTool,\n",
|
||||
" LocalScraperTool,\n",
|
||||
" MarkdownifyTool,\n",
|
||||
" SmartScraperTool,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"smartscraper = SmartScraperTool()\n",
|
||||
"markdownify = MarkdownifyTool()\n",
|
||||
"localscraper = LocalScraperTool()\n",
|
||||
"credits = GetCreditsTool()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "74147a1a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Invocation\n",
|
||||
"\n",
|
||||
"### [Invoke directly with args](/docs/concepts/tools)\n",
|
||||
"\n",
|
||||
"Let's try each tool individually:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "65310a8b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"SmartScraper Result: {'company_name': 'ScrapeGraphAI', 'description': \"ScrapeGraphAI is a powerful AI web scraping tool that turns entire websites into clean, structured data through a simple API. It's designed to help developers and AI companies extract valuable data from websites efficiently and transform it into formats that are ready for use in LLM applications and data analysis.\"}\n",
|
||||
"\n",
|
||||
"Markdownify Result (first 200 chars): [ScrapeGraphAI](https://scrapegraphai.com/)\n",
|
||||
"\n",
|
||||
"PartnersPricingFAQ[Blog](https://scrapegraphai.com/blog)DocsLog inSign up\n",
|
||||
"\n",
|
||||
"Op\n",
|
||||
"LocalScraper Result: {'company_name': 'Company Name', 'description': 'We are a technology company focused on AI solutions.', 'contact': {'email': 'contact@example.com', 'phone': '(555) 123-4567'}}\n",
|
||||
"\n",
|
||||
"Credits Info: {'remaining_credits': 49679, 'total_credits_used': 914}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# SmartScraper\n",
|
||||
"result = smartscraper.invoke(\n",
|
||||
" {\n",
|
||||
" \"user_prompt\": \"Extract the company name and description\",\n",
|
||||
" \"website_url\": \"https://scrapegraphai.com\",\n",
|
||||
" }\n",
|
||||
")\n",
|
||||
"print(\"SmartScraper Result:\", result)\n",
|
||||
"\n",
|
||||
"# Markdownify\n",
|
||||
"markdown = markdownify.invoke({\"website_url\": \"https://scrapegraphai.com\"})\n",
|
||||
"print(\"\\nMarkdownify Result (first 200 chars):\", markdown[:200])\n",
|
||||
"\n",
|
||||
"local_html = \"\"\"\n",
|
||||
"<html>\n",
|
||||
" <body>\n",
|
||||
" <h1>Company Name</h1>\n",
|
||||
" <p>We are a technology company focused on AI solutions.</p>\n",
|
||||
" <div class=\"contact\">\n",
|
||||
" <p>Email: contact@example.com</p>\n",
|
||||
" <p>Phone: (555) 123-4567</p>\n",
|
||||
" </div>\n",
|
||||
" </body>\n",
|
||||
"</html>\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"# LocalScraper\n",
|
||||
"result_local = localscraper.invoke(\n",
|
||||
" {\n",
|
||||
" \"user_prompt\": \"Make a summary of the webpage and extract the email and phone number\",\n",
|
||||
" \"website_html\": local_html,\n",
|
||||
" }\n",
|
||||
")\n",
|
||||
"print(\"LocalScraper Result:\", result_local)\n",
|
||||
"\n",
|
||||
"# Check credits\n",
|
||||
"credits_info = credits.invoke({})\n",
|
||||
"print(\"\\nCredits Info:\", credits_info)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d6e73897",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### [Invoke with ToolCall](/docs/concepts/tools)\n",
|
||||
"\n",
|
||||
"We can also invoke the tool with a model-generated ToolCall:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "f90e33a7",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"ToolMessage(content='{\"main_heading\": \"Get the data you need from any website\", \"description\": \"Easily extract and gather information with just a few lines of code with a simple api. Turn websites into clean and usable structured data.\"}', name='SmartScraper', tool_call_id='1')"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"model_generated_tool_call = {\n",
|
||||
" \"args\": {\n",
|
||||
" \"user_prompt\": \"Extract the main heading and description\",\n",
|
||||
" \"website_url\": \"https://scrapegraphai.com\",\n",
|
||||
" },\n",
|
||||
" \"id\": \"1\",\n",
|
||||
" \"name\": smartscraper.name,\n",
|
||||
" \"type\": \"tool_call\",\n",
|
||||
"}\n",
|
||||
"smartscraper.invoke(model_generated_tool_call)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "659f9fbd",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Chaining\n",
|
||||
"\n",
|
||||
"Let's use our tools with an LLM to analyze a website:\n",
|
||||
"\n",
|
||||
"import ChatModelTabs from \"@theme/ChatModelTabs\";\n",
|
||||
"\n",
|
||||
"<ChatModelTabs customVarName=\"llm\" />"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "af3123ad",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Note: you may need to restart the kernel to use updated packages.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# | output: false\n",
|
||||
"# | echo: false\n",
|
||||
"\n",
|
||||
"# %pip install -qU langchain langchain-openai\n",
|
||||
"from langchain.chat_models import init_chat_model\n",
|
||||
"\n",
|
||||
"llm = init_chat_model(model=\"gpt-4o\", model_provider=\"openai\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "fdbf35b5",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content='ScrapeGraph AI is an AI-powered web scraping tool that efficiently extracts and converts website data into structured formats via a simple API. It caters to developers, data scientists, and AI researchers, offering features like easy integration, support for dynamic content, and scalability for large projects. It supports various website types, including business, e-commerce, and educational sites. Contact: contact@scrapegraphai.com.', additional_kwargs={'tool_calls': [{'id': 'call_shkRPyjyAtfjH9ffG5rSy9xj', 'function': {'arguments': '{\"user_prompt\":\"Extract details about the products, services, and key features offered by ScrapeGraph AI, as well as any unique selling points or innovations mentioned on the website.\",\"website_url\":\"https://scrapegraphai.com\"}', 'name': 'SmartScraper'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 47, 'prompt_tokens': 480, 'total_tokens': 527, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_c7ca0ebaca', 'finish_reason': 'stop', 'logprobs': None}, id='run-45a12c86-d499-4273-8c59-0db926799bc7-0', tool_calls=[{'name': 'SmartScraper', 'args': {'user_prompt': 'Extract details about the products, services, and key features offered by ScrapeGraph AI, as well as any unique selling points or innovations mentioned on the website.', 'website_url': 'https://scrapegraphai.com'}, 'id': 'call_shkRPyjyAtfjH9ffG5rSy9xj', 'type': 'tool_call'}], usage_metadata={'input_tokens': 480, 'output_tokens': 47, 'total_tokens': 527, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_core.prompts import ChatPromptTemplate\n",
|
||||
"from langchain_core.runnables import RunnableConfig, chain\n",
|
||||
"\n",
|
||||
"prompt = ChatPromptTemplate(\n",
|
||||
" [\n",
|
||||
" (\n",
|
||||
" \"system\",\n",
|
||||
" \"You are a helpful assistant that can use tools to extract structured information from websites.\",\n",
|
||||
" ),\n",
|
||||
" (\"human\", \"{user_input}\"),\n",
|
||||
" (\"placeholder\", \"{messages}\"),\n",
|
||||
" ]\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"llm_with_tools = llm.bind_tools([smartscraper], tool_choice=smartscraper.name)\n",
|
||||
"llm_chain = prompt | llm_with_tools\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"@chain\n",
|
||||
"def tool_chain(user_input: str, config: RunnableConfig):\n",
|
||||
" input_ = {\"user_input\": user_input}\n",
|
||||
" ai_msg = llm_chain.invoke(input_, config=config)\n",
|
||||
" tool_msgs = smartscraper.batch(ai_msg.tool_calls, config=config)\n",
|
||||
" return llm_chain.invoke({**input_, \"messages\": [ai_msg, *tool_msgs]}, config=config)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"tool_chain.invoke(\n",
|
||||
" \"What does ScrapeGraph AI do? Extract this information from their website https://scrapegraphai.com\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4ac8146c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
||||
"For detailed documentation of all ScrapeGraph features and configurations, head to the LangChain API reference: https://python.langchain.com/docs/integrations/tools/scrapegraph\n",
|
||||
"\n",
|
||||
"Or to the official SDK repo: https://github.com/ScrapeGraphAI/langchain-scrapegraph"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
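`tool_chain` in the ScrapeGraph notebook follows a two-pass pattern:

1. The model emits `tool_calls`.
2. Each call is executed with `smartscraper.batch(ai_msg.tool_calls)`.
3. The resulting tool messages are sent back to the model along with the original input.

The sketch below mimics only step 2, offline, so it runs without an API key. `fake_smartscraper`, `TOOLS`, `run_tool_calls` and the plain-dict message shape are hypothetical stand-ins. They are not the `langchain-scrapegraph` or `langchain-core` API. In LangChain, invoking a tool with a ToolCall returns a `ToolMessage`, as the "Invoke with ToolCall" cell shows.

```python
import json
from typing import Any, Callable, Dict, List


def fake_smartscraper(user_prompt: str, website_url: str) -> Dict[str, Any]:
    """Hypothetical offline stand-in for SmartScraperTool (no network, no API key)."""
    return {"prompt": user_prompt, "url": website_url}


# Registry mapping tool names (as the model emits them) to callables.
TOOLS: Dict[str, Callable[..., Any]] = {"SmartScraper": fake_smartscraper}


def run_tool_calls(tool_calls: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Execute each tool call and return tool-message-shaped dicts.

    Each call uses the same keys as the ToolCall in the notebook:
    "name", "args", "id" (plus "type": "tool_call").
    """
    messages = []
    for call in tool_calls:
        tool = TOOLS.get(call["name"])
        if tool is None:
            # Report unknown tools back to the model instead of raising.
            content = json.dumps({"error": f"unknown tool {call['name']!r}"})
        else:
            content = json.dumps(tool(**call["args"]))
        # Echo the call id so each result can be matched to its request.
        messages.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],
                "name": call["name"],
                "content": content,
            }
        )
    return messages
```

Returning an error payload for an unknown tool name, rather than raising, gives the model a chance to correct itself on the second pass.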
|
||||
@@ -552,7 +552,7 @@
|
||||
"id": "66690c78",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"A known limitation of large languag models (LLMs) is that their training data can be outdated, or not include the specific domain knowledge that you require.\n",
|
||||
"A known limitation of large language models (LLMs) is that their training data can be outdated, or not include the specific domain knowledge that you require.\n",
|
||||
"\n",
|
||||
"Take a look at the example below:"
|
||||
]
|
||||
|
||||
@@ -11,7 +11,7 @@ LangChain simplifies every stage of the LLM application lifecycle:
|
||||
- **Development**: Build your applications using LangChain's open-source [building blocks](/docs/concepts/lcel), [components](/docs/concepts), and [third-party integrations](/docs/integrations/providers/).
|
||||
Use [LangGraph](/docs/concepts/architecture/#langgraph) to build stateful agents with first-class streaming and human-in-the-loop support.
|
||||
- **Productionization**: Use [LangSmith](https://docs.smith.langchain.com/) to inspect, monitor and evaluate your chains, so that you can continuously optimize and deploy with confidence.
|
||||
- **Deployment**: Turn your LangGraph applications into production-ready APIs and Assistants with [LangGraph Cloud](https://langchain-ai.github.io/langgraph/cloud/).
|
||||
- **Deployment**: Turn your LangGraph applications into production-ready APIs and Assistants with [LangGraph Platform](https://langchain-ai.github.io/langgraph/cloud/).
|
||||
|
||||
import ThemedImage from '@theme/ThemedImage';
|
||||
import useBaseUrl from '@docusaurus/useBaseUrl';
|
||||
@@ -29,11 +29,11 @@ import useBaseUrl from '@docusaurus/useBaseUrl';
Concretely, the framework consists of the following open-source libraries:

- **`langchain-core`**: Base abstractions and LangChain Expression Language.
- Integration packages (e.g. **`langchain-openai`**, **`langchain-anthropic`**, etc.): Important integrations have been split into lightweight packages that are co-maintained by the LangChain team and the integration developers.
- **Integration packages** (e.g. `langchain-openai`, `langchain-anthropic`, etc.): Important integrations have been split into lightweight packages that are co-maintained by the LangChain team and the integration developers.
- **`langchain`**: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.
- **`langchain-community`**: Third-party integrations that are community maintained.
- **[LangGraph](https://langchain-ai.github.io/langgraph)**: Build robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph. Integrates smoothly with LangChain, but can be used without it.
- **[LangGraphPlatform](https://langchain-ai.github.io/langgraph/concepts/#langgraph-platform)**: Deploy LLM applications built with LangGraph to production.
- **[LangGraph](https://langchain-ai.github.io/langgraph)**: Build robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph. Integrates smoothly with LangChain, but can be used without it. To learn more about LangGraph, check out our first LangChain Academy course, *Introduction to LangGraph*, available [here](https://academy.langchain.com/courses/intro-to-langgraph).
- **[LangGraph Platform](https://langchain-ai.github.io/langgraph/concepts/#langgraph-platform)**: Deploy LLM applications built with LangGraph to production.
- **[LangSmith](https://docs.smith.langchain.com)**: A developer platform that lets you debug, test, evaluate, and monitor LLM applications.
File diff suppressed because one or more lines are too long
@@ -35,6 +35,7 @@
"json-loader": "^0.5.7",
"prism-react-renderer": "^2.1.0",
"process": "^0.11.10",
"raw-loader": "^4.0.2",
"react": "^18",
"react-dom": "^18",
"typescript": "^5.2.2",
@@ -25,8 +25,6 @@ NOTEBOOKS_NO_EXECUTION = [
"docs/docs/how_to/example_selectors_langsmith.ipynb", # TODO: add langchain-benchmarks; fix cassette issue
"docs/docs/how_to/extraction_long_text.ipynb", # Non-determinism due to batch
"docs/docs/how_to/graph_constructing.ipynb", # Requires local neo4j
"docs/docs/how_to/graph_mapping.ipynb", # Requires local neo4j
"docs/docs/how_to/graph_prompting.ipynb", # Requires local neo4j
"docs/docs/how_to/graph_semantic.ipynb", # Requires local neo4j
"docs/docs/how_to/hybrid.ipynb", # Requires AstraDB instance
"docs/docs/how_to/indexing.ipynb", # Requires local Elasticsearch
BIN docs/static/img/langgraph_text2cypher.webp (vendored, normal file)
Binary file not shown; size after the change: 10 KiB.
@@ -62,6 +62,14 @@
"source": "/docs/tutorials/local_rag",
"destination": "/docs/tutorials/rag"
},
{
"source": "/docs/how_to/graph_mapping(/?)",
"destination": "/docs/tutorials/graph#query-validation"
},
{
"source": "/docs/how_to/graph_prompting(/?)",
"destination": "/docs/tutorials/graph#few-shot-prompting"
},
{
"source": "/docs/tutorials/data_generation",
"destination": "https://python.langchain.com/v0.2/docs/tutorials/data_generation/"
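The two new redirect entries use `(/?)` in `source` so that a request with or without a trailing slash is redirected. The sketch below approximates that matching with Python's `re` module. It assumes these entries live in a Vercel-style `redirects` config, and `resolve` is a hypothetical helper; the hosting platform applies its own path-matching rules rather than `re.fullmatch`.

```python
import re
from typing import Dict

# Patterns copied from the redirect entries; "(/?)" is a regex group that
# matches an optional trailing slash.
REDIRECTS: Dict[str, str] = {
    r"/docs/how_to/graph_mapping(/?)": "/docs/tutorials/graph#query-validation",
    r"/docs/how_to/graph_prompting(/?)": "/docs/tutorials/graph#few-shot-prompting",
}


def resolve(path: str) -> str:
    """Return the redirect destination for `path`, or `path` if nothing matches.

    Hypothetical helper: only an approximation of the platform's matching.
    """
    for pattern, destination in REDIRECTS.items():
        # fullmatch: the whole path must match, so deeper paths are not redirected.
        if re.fullmatch(pattern, path):
            return destination
    return path
```

Because the whole path has to match, `/docs/how_to/graph_mapping/extra` would not be redirected under this approximation.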
@@ -9043,6 +9043,14 @@ raw-body@2.5.2:
iconv-lite "0.4.24"
unpipe "1.0.0"

raw-loader@^4.0.2:
version "4.0.2"
resolved "https://registry.yarnpkg.com/raw-loader/-/raw-loader-4.0.2.tgz#1aac6b7d1ad1501e66efdac1522c73e59a584eb6"
integrity sha512-ZnScIV3ag9A4wPX/ZayxL/jZH+euYb6FcUinPcgiQW0+UBtEv0O6Q3lGd3cqJ+GHH+rksEv3Pj99oxJ3u3VIKA==
dependencies:
loader-utils "^2.0.0"
schema-utils "^3.0.0"

rc@1.2.8:
version "1.2.8"
resolved "https://registry.yarnpkg.com/rc/-/rc-1.2.8.tgz#cd924bf5200a075b83c188cd6b9e211b7fc0d3ed"
@@ -45,5 +45,4 @@ _e2e_test:
poetry run pip install -e ../../../standard-tests && \
make format lint tests && \
poetry install --with test_integration && \
rm tests/integration_tests/test_vectorstores.py && \
make integration_test
@@ -2,23 +2,23 @@

from __future__ import annotations

import uuid
from typing import (
TYPE_CHECKING,
Any,
Callable,
Iterable,
Iterator,
List,
Optional,
Sequence,
Tuple,
Type,
TypeVar,
)

from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
from langchain_core.vectorstores import VectorStore

if TYPE_CHECKING:
from langchain_core.documents import Document
from langchain_core.vectorstores.utils import _cosine_similarity as cosine_similarity

VST = TypeVar("VST", bound=VectorStore)
@@ -158,40 +158,184 @@ class __ModuleName__VectorStore(VectorStore):

""" # noqa: E501

_database: dict[str, tuple[Document, list[float]]] = {}
def __init__(self, embedding: Embeddings) -> None:
"""Initialize with the given embedding function.

def add_texts(
self,
texts: Iterable[str],
Args:
embedding: embedding function to use.
"""
self._database: dict[str, dict[str, Any]] = {}
self.embedding = embedding

@classmethod
def from_texts(
cls: Type[__ModuleName__VectorStore],
texts: List[str],
embedding: Embeddings,
metadatas: Optional[List[dict]] = None,
**kwargs: Any,
) -> List[str]:
raise NotImplementedError
) -> __ModuleName__VectorStore:
store = cls(
embedding=embedding,
)
store.add_texts(texts=texts, metadatas=metadatas, **kwargs)
return store

# optional: add custom async implementations
# async def aadd_texts(
# self,
# texts: Iterable[str],
# @classmethod
# async def afrom_texts(
# cls: Type[VST],
# texts: List[str],
# embedding: Embeddings,
# metadatas: Optional[List[dict]] = None,
# **kwargs: Any,
# ) -> List[str]:
# ) -> VST:
# return await asyncio.get_running_loop().run_in_executor(
# None, partial(self.add_texts, **kwargs), texts, metadatas
# None, partial(cls.from_texts, **kwargs), texts, embedding, metadatas
# )

def delete(self, ids: Optional[List[str]] = None, **kwargs: Any) -> Optional[bool]:
raise NotImplementedError
@property
def embeddings(self) -> Embeddings:
return self.embedding

def add_documents(
self,
documents: List[Document],
ids: Optional[List[str]] = None,
**kwargs: Any,
) -> List[str]:
"""Add documents to the store."""
texts = [doc.page_content for doc in documents]
vectors = self.embedding.embed_documents(texts)

if ids and len(ids) != len(texts):
msg = (
f"ids must be the same length as texts. "
f"Got {len(ids)} ids and {len(texts)} texts."
)
raise ValueError(msg)

id_iterator: Iterator[Optional[str]] = (
iter(ids) if ids else iter(doc.id for doc in documents)
)

ids_ = []

for doc, vector in zip(documents, vectors):
doc_id = next(id_iterator)
doc_id_ = doc_id if doc_id else str(uuid.uuid4())
ids_.append(doc_id_)
self._database[doc_id_] = {
"id": doc_id_,
"vector": vector,
"text": doc.page_content,
"metadata": doc.metadata,
}

return ids_

# optional: add custom async implementations
# async def aadd_documents(
# self,
# documents: List[Document],
# ids: Optional[List[str]] = None,
# **kwargs: Any,
# ) -> List[str]:
# raise NotImplementedError

def delete(self, ids: Optional[List[str]] = None, **kwargs: Any) -> None:
if ids:
for _id in ids:
self._database.pop(_id, None)

# optional: add custom async implementations
# async def adelete(
# self, ids: Optional[List[str]] = None, **kwargs: Any
# ) -> Optional[bool]:
# ) -> None:
# raise NotImplementedError

def get_by_ids(self, ids: Sequence[str], /) -> list[Document]:
"""Get documents by their ids.

Args:
ids: The ids of the documents to get.

Returns:
A list of Document objects.
"""
documents = []

for doc_id in ids:
doc = self._database.get(doc_id)
if doc:
documents.append(
Document(
id=doc["id"],
page_content=doc["text"],
metadata=doc["metadata"],
)
)
return documents

# optional: add custom async implementations
# async def aget_by_ids(self, ids: Sequence[str], /) -> list[Document]:
# raise NotImplementedError

# NOTE: the below helper method implements similarity search for in-memory
# storage. It is optional and not a part of the vector store interface.
def _similarity_search_with_score_by_vector(
self,
embedding: List[float],
k: int = 4,
filter: Optional[Callable[[Document], bool]] = None,
**kwargs: Any,
) -> List[tuple[Document, float, List[float]]]:
# get all docs with fixed order in list
docs = list(self._database.values())

if filter is not None:
docs = [
doc
for doc in docs
if filter(Document(page_content=doc["text"], metadata=doc["metadata"]))
]

if not docs:
return []

similarity = cosine_similarity([embedding], [doc["vector"] for doc in docs])[0]

# get the indices ordered by similarity score
top_k_idx = similarity.argsort()[::-1][:k]

return [
(
# Document
Document(
id=doc_dict["id"],
page_content=doc_dict["text"],
metadata=doc_dict["metadata"],
),
# Score
float(similarity[idx].item()),
# Embedding vector
doc_dict["vector"],
)
for idx in top_k_idx
# Assign using walrus operator to avoid multiple lookups
if (doc_dict := docs[idx])
]

def similarity_search(
self, query: str, k: int = 4, **kwargs: Any
) -> List[Document]:
raise NotImplementedError
embedding = self.embedding.embed_query(query)
return [
doc
for doc, _, _ in self._similarity_search_with_score_by_vector(
embedding=embedding, k=k, **kwargs
)
]

# optional: add custom async implementations
# async def asimilarity_search(
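The hunk above turns the template into a working in-memory store: documents live in a dict keyed by id, missing ids fall back to `uuid.uuid4()`, and search ranks every stored vector by cosine similarity before keeping the top `k`. A dependency-free sketch of the same idea follows. `TinyVectorStore` and `cosine` are hypothetical names, vectors are passed in directly instead of coming from an `Embeddings` object, and the filter receives the raw record dict rather than a `Document`.

```python
import math
import uuid
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical record layout, mirroring the template's dict entries.
Record = Dict[str, Any]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical dict-backed store; vectors are supplied directly."""

    def __init__(self) -> None:
        self._database: Dict[str, Record] = {}

    def add(
        self,
        texts: List[str],
        vectors: List[List[float]],
        ids: Optional[List[str]] = None,
    ) -> List[str]:
        if ids and len(ids) != len(texts):
            raise ValueError(f"Got {len(ids)} ids and {len(texts)} texts.")
        added = []
        for i, (text, vector) in enumerate(zip(texts, vectors)):
            # Prefer the caller's id; fall back to a random UUID4 string.
            doc_id = ids[i] if ids and ids[i] else str(uuid.uuid4())
            self._database[doc_id] = {
                "id": doc_id, "vector": vector, "text": text, "metadata": {},
            }
            added.append(doc_id)
        return added

    def search(
        self,
        query: Sequence[float],
        k: int = 4,
        filter: Optional[Callable[[Record], bool]] = None,
    ) -> List[Tuple[str, float]]:
        records = list(self._database.values())
        if filter is not None:
            records = [r for r in records if filter(r)]
        scored = [(r["id"], cosine(query, r["vector"])) for r in records]
        # Highest similarity first, then keep the top k.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

Like the template, this scans every stored vector on each query, which suits tests and small corpora rather than large collections.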
@@ -204,9 +348,15 @@ class __ModuleName__VectorStore(VectorStore):
# return await asyncio.get_event_loop().run_in_executor(None, func)

def similarity_search_with_score(
self, *args: Any, **kwargs: Any
self, query: str, k: int = 4, **kwargs: Any
) -> List[Tuple[Document, float]]:
raise NotImplementedError
embedding = self.embedding.embed_query(query)
return [
(doc, similarity)
for doc, similarity, _ in self._similarity_search_with_score_by_vector(
embedding=embedding, k=k, **kwargs
)
]

# optional: add custom async implementations
# async def asimilarity_search_with_score(
@@ -218,10 +368,12 @@ class __ModuleName__VectorStore(VectorStore):
# func = partial(self.similarity_search_with_score, *args, **kwargs)
# return await asyncio.get_event_loop().run_in_executor(None, func)

def similarity_search_by_vector(
self, embedding: List[float], k: int = 4, **kwargs: Any
) -> List[Document]:
raise NotImplementedError
### ADDITIONAL OPTIONAL SEARCH METHODS BELOW ###

# def similarity_search_by_vector(
# self, embedding: List[float], k: int = 4, **kwargs: Any
# ) -> List[Document]:
# raise NotImplementedError

# optional: add custom async implementations
# async def asimilarity_search_by_vector(
@@ -233,15 +385,15 @@ class __ModuleName__VectorStore(VectorStore):
# func = partial(self.similarity_search_by_vector, embedding, k=k, **kwargs)
# return await asyncio.get_event_loop().run_in_executor(None, func)

def max_marginal_relevance_search(
self,
query: str,
k: int = 4,
fetch_k: int = 20,
lambda_mult: float = 0.5,
**kwargs: Any,
) -> List[Document]:
raise NotImplementedError
# def max_marginal_relevance_search(
# self,
# query: str,
# k: int = 4,
# fetch_k: int = 20,
# lambda_mult: float = 0.5,
# **kwargs: Any,
# ) -> List[Document]:
# raise NotImplementedError

# optional: add custom async implementations
# async def amax_marginal_relevance_search(
@@ -265,15 +417,15 @@ class __ModuleName__VectorStore(VectorStore):
# )
# return await asyncio.get_event_loop().run_in_executor(None, func)

def max_marginal_relevance_search_by_vector(
self,
embedding: List[float],
k: int = 4,
fetch_k: int = 20,
lambda_mult: float = 0.5,
**kwargs: Any,
) -> List[Document]:
raise NotImplementedError
# def max_marginal_relevance_search_by_vector(
# self,
# embedding: List[float],
# k: int = 4,
# fetch_k: int = 20,
# lambda_mult: float = 0.5,
# **kwargs: Any,
# ) -> List[Document]:
# raise NotImplementedError

# optional: add custom async implementations
# async def amax_marginal_relevance_search_by_vector(
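The template leaves the max-marginal-relevance methods as optional stubs. Maximal marginal relevance is usually computed greedily over the `fetch_k` nearest candidates: each step picks the candidate maximizing `lambda_mult * relevance - (1 - lambda_mult) * redundancy`, where redundancy is the highest similarity to anything already picked. The sketch below assumes cosine similarity and uses a hypothetical `mmr_select` helper that returns candidate indices; it is not LangChain's implementation.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedily pick up to k candidate indices by maximal marginal relevance.

    lambda_mult=1.0 ranks purely by relevance to the query; lower values
    penalize candidates that are similar to ones already selected.
    """
    relevance = [cosine(query, c) for c in candidates]
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def mmr_score(i: int) -> float:
        # Redundancy: highest similarity to anything already selected.
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, with candidates `[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]` and query `[1.0, 0.9]`, `k=2` returns `[0, 1]` at `lambda_mult=1.0` but `[0, 2]` at `0.5`, because the exact duplicate at index 1 is penalized.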
@@ -285,29 +437,3 @@ class __ModuleName__VectorStore(VectorStore):
# **kwargs: Any,
# ) -> List[Document]:
# raise NotImplementedError

@classmethod
def from_texts(
cls: Type[VST],
texts: List[str],
embedding: Embeddings,
metadatas: Optional[List[dict]] = None,
**kwargs: Any,
) -> VST:
raise NotImplementedError

# optional: add custom async implementations
# @classmethod
# async def afrom_texts(
# cls: Type[VST],
# texts: List[str],
# embedding: Embeddings,
# metadatas: Optional[List[dict]] = None,
# **kwargs: Any,
# ) -> VST:
# return await asyncio.get_running_loop().run_in_executor(
# None, partial(cls.from_texts, **kwargs), texts, embedding, metadatas
# )

def _select_relevance_score_fn(self) -> Callable[[float], float]:
raise NotImplementedError
@@ -19,6 +19,6 @@ class Test__ModuleName__Retriever(RetrieversIntegrationTests):
@property
def retriever_query_example(self) -> str:
"""
Returns a dictionary representing the "args" of an example retriever call.
Returns a str representing the "query" of an example retriever call.
"""
return "example query"
@@ -1,33 +1,16 @@
from typing import AsyncGenerator, Generator
from typing import Generator

import pytest
from __module_name__.vectorstores import __ModuleName__VectorStore
from langchain_core.vectorstores import VectorStore
from langchain_tests.integration_tests import (
AsyncReadWriteTestSuite,
ReadWriteTestSuite,
)
from langchain_tests.integration_tests import VectorStoreIntegrationTests


class Test__ModuleName__VectorStoreSync(ReadWriteTestSuite):
class Test__ModuleName__VectorStore(VectorStoreIntegrationTests):
@pytest.fixture()
def vectorstore(self) -> Generator[VectorStore, None, None]: # type: ignore
"""Get an empty vectorstore for unit tests."""
store = __ModuleName__VectorStore()
# note: store should be EMPTY at this point
# if you need to delete data, you may do so here
try:
yield store
finally:
# cleanup operations, or deleting data
pass


class Test__ModuleName__VectorStoreAsync(AsyncReadWriteTestSuite):
@pytest.fixture()
async def vectorstore(self) -> AsyncGenerator[VectorStore, None]: # type: ignore
"""Get an empty vectorstore for unit tests."""
store = __ModuleName__VectorStore()
store = __ModuleName__VectorStore(self.get_embeddings())
# note: store should be EMPTY at this point
# if you need to delete data, you may do so here
try:
@@ -5,6 +5,7 @@ Manage LangChain apps
import shutil
import subprocess
import sys
import warnings
from pathlib import Path
from typing import Dict, List, Optional, Tuple
@@ -163,6 +164,12 @@ def add(
langchain app add git+ssh://git@github.com/efriis/simple-pirate.git
"""

if not branch and not repo:
warnings.warn(
"Adding templates from the default branch and repo is deprecated."
" At a minimum, you will have to add `--branch v0.2` for this to work"
)

parsed_deps = parse_dependencies(dependencies, repo, branch, api_path)

project_root = get_package_root(project_dir)
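Because `warnings.warn` is called without a category, the new check emits a `UserWarning`, which Python displays by default (a `DeprecationWarning` would be hidden outside `__main__`). A minimal sketch of the check and of one way to observe it in a test, assuming a hypothetical standalone `check_template_source` function in place of the CLI's `add` command:

```python
import warnings
from typing import Optional


def check_template_source(branch: Optional[str], repo: Optional[str]) -> None:
    """Hypothetical stand-in for the check at the top of `langchain app add`."""
    if not branch and not repo:
        # No category is passed, so this is a UserWarning (shown by default).
        warnings.warn(
            "Adding templates from the default branch and repo is deprecated."
            " At a minimum, you will have to add `--branch v0.2` for this to work"
        )


# Observing the warning, e.g. inside a test:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_template_source(branch=None, repo=None)  # warns
    check_template_source(branch="v0.2", repo=None)  # silent
```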
@@ -30,10 +30,12 @@ MODEL_COST_PER_1K_TOKENS = {
"gpt-4o": 0.0025,
"gpt-4o-2024-05-13": 0.005,
"gpt-4o-2024-08-06": 0.0025,
"gpt-4o-2024-11-20": 0.0025,
# GPT-4o output
"gpt-4o-completion": 0.01,
"gpt-4o-2024-05-13-completion": 0.015,
"gpt-4o-2024-08-06-completion": 0.01,
"gpt-4o-2024-11-20-completion": 0.01,
# GPT-4 input
"gpt-4": 0.03,
"gpt-4-0314": 0.03,
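The table stores separate per-1K-token prices for prompt and completion tokens, with completion entries keyed by a `-completion` suffix. A sketch of how such a table can price a single call, assuming the prices are in USD and using a hypothetical `estimate_cost` helper (LangChain's own cost callback may normalize model names or round differently):

```python
# Subset of the per-1K-token prices from the hunk above (assumed to be USD).
MODEL_COST_PER_1K_TOKENS = {
    "gpt-4o-2024-11-20": 0.0025,
    "gpt-4o-2024-11-20-completion": 0.01,
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Hypothetical helper: price prompt and completion tokens separately."""
    prompt_rate = MODEL_COST_PER_1K_TOKENS[model]
    # Completion prices are stored under the same key plus a "-completion" suffix.
    completion_rate = MODEL_COST_PER_1K_TOKENS[f"{model}-completion"]
    return (
        prompt_tokens / 1000 * prompt_rate
        + completion_tokens / 1000 * completion_rate
    )
```

For example, 1,000 prompt tokens plus 500 completion tokens on `gpt-4o-2024-11-20` come to 1 × 0.0025 + 0.5 × 0.01 = 0.0075.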
@@ -27,8 +27,9 @@ logger = logging.getLogger(__name__)
PINECONE = "Pinecone"
QDRANT = "Qdrant"
PGVECTOR = "PGVector"
PINECONE_VECTOR_STORE = "PineconeVectorStore"

SUPPORTED_VECTORSTORES = {PINECONE, QDRANT, PGVECTOR}
SUPPORTED_VECTORSTORES = {PINECONE, QDRANT, PGVECTOR, PINECONE_VECTOR_STORE}


def clear_enforcement_filters(retriever: VectorStoreRetriever) -> None:
@@ -505,7 +506,7 @@ def _set_identity_enforcement_filter(
of the retriever based on the type of the vectorstore.
"""
search_kwargs = retriever.search_kwargs
if retriever.vectorstore.__class__.__name__ == PINECONE:
if retriever.vectorstore.__class__.__name__ in [PINECONE, PINECONE_VECTOR_STORE]:
_apply_pinecone_authorization_filter(search_kwargs, auth_context)
elif retriever.vectorstore.__class__.__name__ == QDRANT:
_apply_qdrant_authorization_filter(search_kwargs, auth_context)
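The filter dispatch compares the vector store's class name as a string, so adding `PINECONE_VECTOR_STORE` lets the newer `PineconeVectorStore` class take the same Pinecone path as the legacy `Pinecone` class without importing either package. A self-contained sketch of that dispatch, using hypothetical empty classes and a hypothetical `filter_kind` function that returns a label instead of applying a filter:

```python
from typing import Any

PINECONE = "Pinecone"
QDRANT = "Qdrant"
PINECONE_VECTOR_STORE = "PineconeVectorStore"


def filter_kind(vectorstore: Any) -> str:
    """Hypothetical: report which filter branch the dispatch would take."""
    name = vectorstore.__class__.__name__
    if name in [PINECONE, PINECONE_VECTOR_STORE]:
        return "pinecone"
    elif name == QDRANT:
        return "qdrant"
    return "unsupported"


# Hypothetical empty classes standing in for the real vector store classes.
class Pinecone: ...


class PineconeVectorStore: ...


class Qdrant: ...
```

One consequence of matching on `__class__.__name__` is that a subclass with a different name, such as a user-defined wrapper around `PineconeVectorStore`, does not match either branch.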
@@ -11,6 +11,7 @@ from typing import (
Dict,
Iterator,
List,
Literal,
Mapping,
Optional,
Sequence,
@@ -212,6 +213,33 @@ def _convert_message_to_dict(message: BaseMessage) -> dict:
return message_dict


_OPENAI_MODELS = [
"o1-mini",
"o1-preview",
"gpt-4o-mini",
"gpt-4o-mini-2024-07-18",
"gpt-4o",
"gpt-4o-2024-08-06",
"gpt-4o-2024-05-13",
"gpt-4-turbo",
"gpt-4-turbo-preview",
"gpt-4-0125-preview",
"gpt-4-1106-preview",
"gpt-3.5-turbo-1106",
"gpt-3.5-turbo",
"gpt-3.5-turbo-0301",
"gpt-3.5-turbo-0613",
"gpt-3.5-turbo-16k",
"gpt-3.5-turbo-16k-0613",
"gpt-4",
"gpt-4-0314",
"gpt-4-0613",
"gpt-4-32k",
"gpt-4-32k-0314",
"gpt-4-32k-0613",
]


class ChatLiteLLM(BaseChatModel):
"""Chat model that uses the LiteLLM API."""
@@ -465,6 +493,9 @@ class ChatLiteLLM(BaseChatModel):
def bind_tools(
self,
tools: Sequence[Union[Dict[str, Any], Type[BaseModel], Callable, BaseTool]],
tool_choice: Optional[
Union[dict, str, Literal["auto", "none", "required", "any"], bool]
] = None,
**kwargs: Any,
) -> Runnable[LanguageModelInput, BaseMessage]:
"""Bind tool-like objects to this chat model.
@@ -476,17 +507,47 @@ class ChatLiteLLM(BaseChatModel):
Can be a dictionary, pydantic model, callable, or BaseTool. Pydantic
models, callables, and BaseTools will be automatically converted to
their schema dictionary representation.
tool_choice: Which tool to require the model to call.
Must be the name of the single provided function or
"auto" to automatically determine which function to call
(if any), or a dict of the form:
{"type": "function", "function": {"name": <<tool_name>>}}.
tool_choice: Which tool to require the model to call. Options are:
- str of the form ``"<<tool_name>>"``: calls <<tool_name>> tool.
- ``"auto"``:
automatically selects a tool (including no tool).
- ``"none"``:
does not call a tool.
- ``"any"`` or ``"required"`` or ``True``:
forces at least one tool to be called.
- dict of the form:
``{"type": "function", "function": {"name": <<tool_name>>}}``
- ``False`` or ``None``: no effect
**kwargs: Any additional parameters to pass to the
:class:`~langchain_core.runnables.Runnable` constructor.
"""

formatted_tools = [convert_to_openai_tool(tool) for tool in tools]
return super().bind(tools=formatted_tools, **kwargs)

# For OpenAI and Azure models, map a tool_choice of `any` or a bool to
# `required`, which is the value OpenAI supports.
if (
(self.model is not None and "azure" in self.model)
or (self.model_name is not None and "azure" in self.model_name)
or (self.model is not None and self.model in _OPENAI_MODELS)
or (self.model_name is not None and self.model_name in _OPENAI_MODELS)
) and (tool_choice == "any" or isinstance(tool_choice, bool)):
tool_choice = "required"
# For other providers, map a bool tool_choice to `any`.
elif isinstance(tool_choice, bool):
tool_choice = "any"
elif isinstance(tool_choice, dict):
tool_names = [
formatted_tool["function"]["name"] for formatted_tool in formatted_tools
]
if not any(
tool_name == tool_choice["function"]["name"] for tool_name in tool_names
):
raise ValueError(
f"Tool choice {tool_choice} was specified, but the only "
f"provided tools were {tool_names}."
)
return super().bind(tools=formatted_tools, tool_choice=tool_choice, **kwargs)

@property
def _identifying_params(self) -> Dict[str, Any]:
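The new `bind_tools` body rewrites `tool_choice` before binding: OpenAI and Azure models get `"required"` for `"any"` or a bool, other models get `"any"` for a bool, and a dict choice must name one of the bound tools. As written, `isinstance(tool_choice, bool)` is also true for `False`, so `False` is rewritten to `"required"` or `"any"` even though the docstring lists it as having no effect. The sketch below is a simplified standalone version (a hypothetical `normalize_tool_choice`, one `model` string instead of `model`/`model_name`, an abbreviated model list) that treats `False` like `None`:

```python
from typing import Any, Dict, List, Optional, Union

# Hypothetical, abbreviated stand-in for the _OPENAI_MODELS list in the diff.
OPENAI_MODELS = ["gpt-4o", "gpt-4o-mini", "gpt-3.5-turbo"]

ToolChoice = Optional[Union[str, bool, Dict[str, Any]]]


def normalize_tool_choice(
    model: Optional[str], tool_choice: ToolChoice, tool_names: List[str]
) -> ToolChoice:
    """Sketch of the tool_choice mapping in ChatLiteLLM.bind_tools."""
    # Per the docstring, False and None have no effect.
    if tool_choice is None or tool_choice is False:
        return None
    is_openai = model is not None and ("azure" in model or model in OPENAI_MODELS)
    if is_openai and (tool_choice == "any" or tool_choice is True):
        return "required"  # OpenAI expects "required" rather than "any"
    if tool_choice is True:
        return "any"
    if isinstance(tool_choice, dict):
        # A dict choice must name one of the bound tools.
        name = tool_choice["function"]["name"]
        if name not in tool_names:
            raise ValueError(
                f"Tool choice {tool_choice} was specified, but the only "
                f"provided tools were {tool_names}."
            )
    # Anything else (tool names, "auto", "none", ...) passes through unchanged.
    return tool_choice
```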
@@ -13,21 +13,142 @@ from langchain_community.llms.moonshot import MOONSHOT_SERVICE_URL_BASE, Moonsho


class MoonshotChat(MoonshotCommon, ChatOpenAI): # type: ignore[misc, override, override]
"""Moonshot large language models.
"""Moonshot chat model integration.

To use, you should have the ``openai`` python package installed, and the
environment variable ``MOONSHOT_API_KEY`` set with your API key.
(Moonshot's chat API is compatible with OpenAI's SDK.)
Setup:
Install ``openai`` and set the environment variable ``MOONSHOT_API_KEY``.

Referenced from https://platform.moonshot.cn/docs
.. code-block:: bash

Example:
pip install openai
export MOONSHOT_API_KEY="your-api-key"

Key init args — completion params:
model: str
Name of Moonshot model to use.
temperature: float
Sampling temperature.
max_tokens: Optional[int]
Max number of tokens to generate.

Key init args — client params:
api_key: Optional[str]
Moonshot API key. If not passed in, it is read from the env var ``MOONSHOT_API_KEY``.
api_base: Optional[str]
Base URL for API requests.

See the full list of supported init args and their descriptions in the params section.

Instantiate:
.. code-block:: python

from langchain_community.chat_models.moonshot import MoonshotChat
from langchain_community.chat_models import MoonshotChat

moonshot = MoonshotChat(model="moonshot-v1-8k")
"""
chat = MoonshotChat(
temperature=0.5,
api_key="your-api-key",
model="moonshot-v1-8k",
# api_base="...",
# other params...
)

Invoke:
.. code-block:: python

messages = [
("system", "你是一名专业的翻译家,可以将用户的中文翻译为英文。"), # "You are a professional translator who translates the user's Chinese into English."
("human", "我喜欢编程。"), # "I like programming."
]
chat.invoke(messages)

.. code-block:: python

AIMessage(
content='I like programming.',
additional_kwargs={},
response_metadata={
'token_usage': {
'completion_tokens': 5,
'prompt_tokens': 27,
'total_tokens': 32
},
'model_name': 'moonshot-v1-8k',
'system_fingerprint': None,
'finish_reason': 'stop',
'logprobs': None
},
id='run-71c03f4e-6628-41d5-beb6-d2559ae68266-0'
)

Stream:
.. code-block:: python

for chunk in chat.stream(messages):
print(chunk)

.. code-block:: python

content='' additional_kwargs={} response_metadata={} id='run-80d77096-8b83-4c39-a84d-71d9c746da92'
content='I' additional_kwargs={} response_metadata={} id='run-80d77096-8b83-4c39-a84d-71d9c746da92'
content=' like' additional_kwargs={} response_metadata={} id='run-80d77096-8b83-4c39-a84d-71d9c746da92'
content=' programming' additional_kwargs={} response_metadata={} id='run-80d77096-8b83-4c39-a84d-71d9c746da92'
content='.' additional_kwargs={} response_metadata={} id='run-80d77096-8b83-4c39-a84d-71d9c746da92'
content='' additional_kwargs={} response_metadata={'finish_reason': 'stop'} id='run-80d77096-8b83-4c39-a84d-71d9c746da92'

.. code-block:: python

stream = chat.stream(messages)
full = next(stream)
for chunk in stream:
full += chunk
full

.. code-block:: python

AIMessageChunk(
content='I like programming.',
additional_kwargs={},
response_metadata={'finish_reason': 'stop'},
id='run-10c80976-7aa5-4ff7-ba3e-1251665557ef'
)

Async:
.. code-block:: python

await chat.ainvoke(messages)

# stream:
# async for chunk in chat.astream(messages):
# print(chunk)

# batch:
# await chat.abatch([messages])

.. code-block:: python

[AIMessage(content='I like programming.', additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 5, 'prompt_tokens': 27, 'total_tokens': 32}, 'model_name': 'moonshot-v1-8k', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-2938b005-9204-4b9f-b273-1c3272fce9e5-0')]

Response metadata:
.. code-block:: python

ai_msg = chat.invoke(messages)
ai_msg.response_metadata

.. code-block:: python

{
'token_usage': {
'completion_tokens': 5,
'prompt_tokens': 27,
'total_tokens': 32
},
'model_name': 'moonshot-v1-8k',
'system_fingerprint': None,
'finish_reason': 'stop',
'logprobs': None
}

""" # noqa: E501

@pre_init
def validate_environment(cls, values: Dict) -> Dict:
@@ -166,6 +166,7 @@ class ConfluenceLoader(BaseLoader):
include_archived_content: bool = False,
include_attachments: bool = False,
include_comments: bool = False,
include_labels: bool = False,
content_format: ContentFormat = ContentFormat.STORAGE,
limit: Optional[int] = 50,
max_pages: Optional[int] = 1000,
@@ -181,6 +182,7 @@ class ConfluenceLoader(BaseLoader):
self.include_archived_content = include_archived_content
self.include_attachments = include_attachments
self.include_comments = include_comments
self.include_labels = include_labels
self.content_format = content_format
self.limit = limit
self.max_pages = max_pages
@@ -327,12 +329,20 @@ class ConfluenceLoader(BaseLoader):
)
include_attachments = self._resolve_param("include_attachments", kwargs)
include_comments = self._resolve_param("include_comments", kwargs)
include_labels = self._resolve_param("include_labels", kwargs)
content_format = self._resolve_param("content_format", kwargs)
limit = self._resolve_param("limit", kwargs)
max_pages = self._resolve_param("max_pages", kwargs)
ocr_languages = self._resolve_param("ocr_languages", kwargs)
keep_markdown_format = self._resolve_param("keep_markdown_format", kwargs)
keep_newlines = self._resolve_param("keep_newlines", kwargs)
expand = ",".join(
[
content_format.value,
"version",
*(["metadata.labels"] if include_labels else []),
]
)

if not space_key and not page_ids and not label and not cql:
raise ValueError(
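With `include_labels=True`, the loader now asks Confluence to expand `metadata.labels` alongside the body format and `version`, and the hunks that follow pass this `expand` string to the page-fetching calls instead of the hard-coded `f"{content_format.value},version"`. A sketch of the string it builds; the `ContentFormat` values below are assumptions modeled on Confluence's `body.*` expansions, and `build_expand` is a hypothetical helper:

```python
from enum import Enum


class ContentFormat(Enum):
    """Hypothetical stand-in; values assumed to follow Confluence's body.* expansions."""

    STORAGE = "body.storage"
    VIEW = "body.view"


def build_expand(content_format: ContentFormat, include_labels: bool) -> str:
    """Assemble the comma-separated `expand` parameter as in the hunk above."""
    return ",".join(
        [
            content_format.value,
            "version",
            # Only request label expansion when labels will be used.
            *(["metadata.labels"] if include_labels else []),
        ]
    )
```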
@@ -347,13 +357,14 @@ class ConfluenceLoader(BaseLoader):
limit=limit,
max_pages=max_pages,
status="any" if include_archived_content else "current",
expand=f"{content_format.value},version",
expand=expand,
)
yield from self.process_pages(
pages,
include_restricted_content,
include_attachments,
include_comments,
include_labels,
content_format,
ocr_languages=ocr_languages,
keep_markdown_format=keep_markdown_format,
@@ -380,13 +391,14 @@ class ConfluenceLoader(BaseLoader):
limit=limit,
max_pages=max_pages,
include_archived_spaces=include_archived_content,
expand=f"{content_format.value},version",
expand=expand,
)
yield from self.process_pages(
pages,
include_restricted_content,
include_attachments,
include_comments,
False, # labels are not included in the search results
content_format,
ocr_languages,
keep_markdown_format,
@@ -408,7 +420,8 @@ class ConfluenceLoader(BaseLoader):
before_sleep=before_sleep_log(logger, logging.WARNING),
)(self.confluence.get_page_by_id)
page = get_page(
page_id=page_id, expand=f"{content_format.value},version"
page_id=page_id,
expand=expand,
)
if not include_restricted_content and not self.is_public_page(page):
continue
@@ -416,6 +429,7 @@ class ConfluenceLoader(BaseLoader):
page,
include_attachments,
include_comments,
include_labels,
content_format,
ocr_languages,
keep_markdown_format,
@@ -498,6 +512,7 @@ class ConfluenceLoader(BaseLoader):
include_restricted_content: bool,
include_attachments: bool,
include_comments: bool,
include_labels: bool,
content_format: ContentFormat,
ocr_languages: Optional[str] = None,
keep_markdown_format: Optional[bool] = False,
@@ -511,6 +526,7 @@ class ConfluenceLoader(BaseLoader):
page,
include_attachments,
include_comments,
include_labels,
content_format,
ocr_languages=ocr_languages,
keep_markdown_format=keep_markdown_format,
@@ -522,6 +538,7 @@ class ConfluenceLoader(BaseLoader):
page: dict,
include_attachments: bool,
include_comments: bool,
include_labels: bool,
content_format: ContentFormat,
ocr_languages: Optional[str] = None,
keep_markdown_format: Optional[bool] = False,
@@ -575,10 +592,19 @@ class ConfluenceLoader(BaseLoader):
]
text = text + "".join(comment_texts)

if include_labels:
labels = [
label["name"]
for label in page.get("metadata", {})
.get("labels", {})
.get("results", [])
]

metadata = {
"title": page["title"],
"id": page["id"],
"source": self.base_url.strip("/") + page["_links"]["webui"],
**({"labels": labels} if include_labels else {}),
}

if "version" in page and "when" in page["version"]:
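Label names are read defensively from `page["metadata"]["labels"]["results"]`, so a page returned without the expanded labels simply yields an empty list, and the `labels` key is only added to the document metadata when `include_labels` is set. A sketch of both steps as hypothetical standalone functions, assuming page dicts shaped like the ones the loader handles:

```python
from typing import Any, Dict, List


def extract_labels(page: Dict[str, Any]) -> List[str]:
    """Label names from page["metadata"]["labels"]["results"], or [] if absent."""
    return [
        label["name"]
        for label in page.get("metadata", {}).get("labels", {}).get("results", [])
    ]


def build_metadata(
    page: Dict[str, Any], base_url: str, include_labels: bool
) -> Dict[str, Any]:
    """Hypothetical mirror of the loader's metadata dict; `labels` only when requested."""
    return {
        "title": page["title"],
        "id": page["id"],
        "source": base_url.strip("/") + page["_links"]["webui"],
        **({"labels": extract_labels(page)} if include_labels else {}),
    }
```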
@@ -145,6 +145,9 @@ if TYPE_CHECKING:
from langchain_community.embeddings.mlflow_gateway import (
MlflowAIGatewayEmbeddings,
)
from langchain_community.embeddings.model2vec import (
Model2vecEmbeddings,
)
from langchain_community.embeddings.modelscope_hub import (
ModelScopeEmbeddings,
)
@@ -289,6 +292,7 @@ __all__ = [
"MlflowAIGatewayEmbeddings",
"MlflowCohereEmbeddings",
"MlflowEmbeddings",
"Model2vecEmbeddings",
"ModelScopeEmbeddings",
"MosaicMLInstructorEmbeddings",
"NLPCloudEmbeddings",
@@ -372,6 +376,7 @@ _module_lookup = {
"MlflowAIGatewayEmbeddings": "langchain_community.embeddings.mlflow_gateway",
"MlflowCohereEmbeddings": "langchain_community.embeddings.mlflow",
"MlflowEmbeddings": "langchain_community.embeddings.mlflow",
"Model2vecEmbeddings": "langchain_community.embeddings.model2vec",
"ModelScopeEmbeddings": "langchain_community.embeddings.modelscope_hub",
"MosaicMLInstructorEmbeddings": "langchain_community.embeddings.mosaicml",
"NLPCloudEmbeddings": "langchain_community.embeddings.nlpcloud",

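The three hunks above register `Model2vecEmbeddings` in three places: the `TYPE_CHECKING` imports, `__all__`, and `_module_lookup`. `_module_lookup` maps each exported name to the module that defines it. The code that consumes `_module_lookup` is not part of this diff. A common way to use such a table, assumed here, is a module-level `__getattr__` (PEP 562) that imports the target module only on first access. The sketch below shows that pattern with a hypothetical table pointing at standard-library modules, so it runs without `langchain_community` installed.

```python
import importlib
import types
from typing import Any, Dict

# Hypothetical lookup table in the same shape as _module_lookup
# (exported name -> module path), pointing at standard-library modules.
_module_lookup: Dict[str, str] = {
    "OrderedDict": "collections",
    "dedent": "textwrap",
}


def _lazy_getattr(name: str) -> Any:
    # Import the defining module only when the name is first requested.
    if name in _module_lookup:
        module = importlib.import_module(_module_lookup[name])
        return getattr(module, name)
    raise AttributeError(f"module has no attribute {name!r}")


# Build a throwaway module object and attach a PEP 562 module-level
# __getattr__, which Python calls only when normal attribute lookup fails.
lazy = types.ModuleType("lazy_exports")
lazy.__getattr__ = _lazy_getattr  # type: ignore[attr-defined]

print(lazy.dedent("    indented"))
```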
66
libs/community/langchain_community/embeddings/model2vec.py
Normal file
@@ -0,0 +1,66 @@
"""Wrapper around model2vec embedding models."""

from typing import List

from langchain_core.embeddings import Embeddings


class Model2vecEmbeddings(Embeddings):
"""model2vec embedding models.

Install model2vec first by running 'pip install -U model2vec'.
The GitHub repository for model2vec is: https://github.com/MinishLab/model2vec

Example:
.. code-block:: python

from langchain_community.embeddings import Model2vecEmbeddings

embedding = Model2vecEmbeddings("minishlab/potion-base-8M")
embedding.embed_documents([
"It's dangerous to go alone!",
"It's a secret to everybody.",
])
embedding.embed_query(
"Take this with you."
)
"""

def __init__(self, model: str):
"""Initialize embeddings.

Args:
model: Model name.
"""
try:
from model2vec import StaticModel
except ImportError as e:
raise ImportError(
"Unable to import model2vec, please install with "
"`pip install -U model2vec`."
) from e
self._model = StaticModel.from_pretrained(model)

def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Embed documents using the model2vec embeddings model.

Args:
texts: The list of texts to embed.

Returns:
List of embeddings, one for each text.
"""

return self._model.encode_as_sequence(texts)

def embed_query(self, text: str) -> List[float]:
"""Embed a query using the model2vec embeddings model.

Args:
text: The text to embed.

Returns:
Embeddings for the text.
"""

return self._model.encode(text)
1082
libs/community/poetry.lock
generated
File diff suppressed because it is too large
@@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "langchain-community"
version = "0.3.9"
version = "0.3.10"
description = "Community contributed LangChain integrations."
authors = []
license = "MIT"
@@ -30,8 +30,8 @@ ignore-words-list = "momento,collison,ned,foor,reworkd,parth,whats,aapply,mysogy

[tool.poetry.dependencies]
python = ">=3.9,<4.0"
langchain-core = "^0.3.21"
langchain = "^0.3.8"
langchain-core = "^0.3.22"
langchain = "^0.3.10"
SQLAlchemy = ">=1.4,<3"
requests = "^2"
PyYAML = ">=5.3"

@@ -3,27 +3,15 @@
import uuid

import pytest
from langchain_tests.integration_tests.vectorstores import (
AsyncReadWriteTestSuite,
ReadWriteTestSuite,
)
from langchain_tests.integration_tests.vectorstores import VectorStoreIntegrationTests

from langchain_community.vectorstores import ApertureDB


class TestApertureDBReadWriteTestSuite(ReadWriteTestSuite):
class TestApertureStandard(VectorStoreIntegrationTests):
@pytest.fixture
def vectorstore(self) -> ApertureDB:
descriptor_set = uuid.uuid4().hex # Fresh descriptor set for each test
return ApertureDB(
embeddings=self.get_embeddings(), descriptor_set=descriptor_set
)


class TestAsyncApertureDBReadWriteTestSuite(AsyncReadWriteTestSuite):
@pytest.fixture
async def vectorstore(self) -> ApertureDB:
descriptor_set = uuid.uuid4().hex # Fresh descriptor set for each test
return ApertureDB(
embeddings=self.get_embeddings(), descriptor_set=descriptor_set
)

@@ -195,6 +195,36 @@ class TestConfluenceLoader:
assert mock_confluence.cql.call_count == 0
assert mock_confluence.get_page_child_by_type.call_count == 0

@pytest.mark.requires("markdownify")
def test_confluence_loader_when_include_lables_set_to_true(
self, mock_confluence: MagicMock
) -> None:
# one response with two pages
mock_confluence.get_all_pages_from_space.return_value = [
self._get_mock_page("123", include_labels=True),
self._get_mock_page("456", include_labels=False),
]
mock_confluence.get_all_restrictions_for_content.side_effect = [
self._get_mock_page_restrictions("123"),
self._get_mock_page_restrictions("456"),
]

conflence_loader = self._get_mock_confluence_loader(
mock_confluence,
space_key=self.MOCK_SPACE_KEY,
include_labels=True,
max_pages=2,
)

documents = conflence_loader.load()

assert mock_confluence.get_all_pages_from_space.call_count == 1

assert len(documents) == 2
assert all(isinstance(doc, Document) for doc in documents)
assert documents[0].metadata["labels"] == ["l1", "l2"]
assert documents[1].metadata["labels"] == []

def _get_mock_confluence_loader(
self, mock_confluence: MagicMock, **kwargs: Any
) -> ConfluenceLoader:
@@ -208,7 +238,10 @@ class TestConfluenceLoader:
return confluence_loader

def _get_mock_page(
self, page_id: str, content_format: ContentFormat = ContentFormat.STORAGE
self,
page_id: str,
content_format: ContentFormat = ContentFormat.STORAGE,
include_labels: bool = False,
) -> Dict:
return {
"id": f"{page_id}",
@@ -216,6 +249,20 @@ class TestConfluenceLoader:
"body": {
f"{content_format.name.lower()}": {"value": f"<p>Content {page_id}</p>"}
},
**(
{
"metadata": {
"labels": {
"results": [
{"prefix": "global", "name": "l1", "id": "111"},
{"prefix": "global", "name": "l2", "id": "222"},
]
}
}
if include_labels
else {},
}
),
"status": "current",
"type": "page",
"_links": {

@@ -26,6 +26,7 @@ EXPECTED_ALL = [
"MlflowAIGatewayEmbeddings",
"MlflowEmbeddings",
"MlflowCohereEmbeddings",
"Model2vecEmbeddings",
"ModelScopeEmbeddings",
"TensorflowHubEmbeddings",
"SagemakerEndpointEmbeddings",

11
libs/community/tests/unit_tests/embeddings/test_model2vec.py
Normal file
@@ -0,0 +1,11 @@
from langchain_community.embeddings.model2vec import Model2vecEmbeddings


def test_hugginggface_inferenceapi_embedding_documents_init() -> None:
"""Test model2vec embeddings."""
try:
embedding = Model2vecEmbeddings("minishlab/potion-base-8M")
assert len(embedding.embed_query("hi")) == 256
except Exception:
# model2vec is not installed
assert True
@@ -3,10 +3,7 @@ from typing import Any

import pytest
from langchain_core.documents import Document
from langchain_tests.integration_tests.vectorstores import (
AsyncReadWriteTestSuite,
ReadWriteTestSuite,
)
from langchain_tests.integration_tests.vectorstores import VectorStoreIntegrationTests

from langchain_community.vectorstores.inmemory import InMemoryVectorStore
from tests.integration_tests.vectorstores.fake_embeddings import (
@@ -26,18 +23,12 @@ def _AnyDocument(**kwargs: Any) -> Document:
return doc


class TestInMemoryReadWriteTestSuite(ReadWriteTestSuite):
class TestInMemoryStandard(VectorStoreIntegrationTests):
@pytest.fixture
def vectorstore(self) -> InMemoryVectorStore:
return InMemoryVectorStore(embedding=self.get_embeddings())


class TestAsyncInMemoryReadWriteTestSuite(AsyncReadWriteTestSuite):
@pytest.fixture
async def vectorstore(self) -> InMemoryVectorStore:
return InMemoryVectorStore(embedding=self.get_embeddings())


async def test_inmemory() -> None:
"""Test end to end construction and search."""
store = await InMemoryVectorStore.afrom_texts(

@@ -27,7 +27,7 @@ from inspect import signature
from typing import TYPE_CHECKING, Any, Optional

from pydantic import ConfigDict
from typing_extensions import TypedDict
from typing_extensions import Self, TypedDict

from langchain_core._api import deprecated
from langchain_core.documents import Document
@@ -180,6 +180,18 @@ class BaseRetriever(RunnableSerializable[RetrieverInput, RetrieverOutput], ABC):
cls._aget_relevant_documents = aswap # type: ignore[assignment]
parameters = signature(cls._get_relevant_documents).parameters
cls._new_arg_supported = parameters.get("run_manager") is not None
if (
not cls._new_arg_supported
and cls._aget_relevant_documents == BaseRetriever._aget_relevant_documents
):
# we need to tolerate no run_manager in _aget_relevant_documents signature
async def _aget_relevant_documents(
self: Self, query: str
) -> list[Document]:
return await run_in_executor(None, self._get_relevant_documents, query) # type: ignore

cls._aget_relevant_documents = _aget_relevant_documents # type: ignore[assignment]

# If a V1 retriever broke the interface and expects additional arguments
cls._expects_other_args = (
len(set(parameters.keys()) - {"self", "query", "run_manager"}) > 0

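The new branch in `BaseRetriever.__init_subclass__` targets legacy retrievers that meet two conditions: their synchronous `_get_relevant_documents` has no `run_manager` parameter, and they do not override `_aget_relevant_documents`. Such subclasses get an async method that runs the sync one in an executor and passes only the query. The sketch below reproduces that decision using only the standard library. `Base`, `LegacyRetriever`, and `ModernRetriever` are hypothetical. `asyncio`'s `loop.run_in_executor` stands in for LangChain's own `run_in_executor` helper, and the base async method here is a simplified assumption.

```python
import asyncio
import functools
from inspect import signature
from typing import Any, List, Optional


class Base:
    """Hypothetical stand-in for BaseRetriever (not LangChain code)."""

    _new_arg_supported: bool = True

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        parameters = signature(cls._get_relevant_documents).parameters
        cls._new_arg_supported = parameters.get("run_manager") is not None
        if (
            not cls._new_arg_supported
            and cls._aget_relevant_documents is Base._aget_relevant_documents
        ):
            # The inherited async method would pass run_manager to a sync
            # method that does not accept it, so install a query-only fallback.
            async def _aget_relevant_documents(self: "Base", query: str) -> List[str]:
                loop = asyncio.get_running_loop()
                return await loop.run_in_executor(
                    None, self._get_relevant_documents, query
                )

            cls._aget_relevant_documents = _aget_relevant_documents  # type: ignore[method-assign]

    def _get_relevant_documents(
        self, query: str, *, run_manager: Optional[Any] = None
    ) -> List[str]:
        raise NotImplementedError

    async def _aget_relevant_documents(
        self, query: str, *, run_manager: Optional[Any] = None
    ) -> List[str]:
        # Default async path: run the sync method in a thread, forwarding run_manager.
        loop = asyncio.get_running_loop()
        call = functools.partial(
            self._get_relevant_documents, query, run_manager=run_manager
        )
        return await loop.run_in_executor(None, call)


class LegacyRetriever(Base):
    # Old-style signature without run_manager.
    def _get_relevant_documents(self, query: str) -> List[str]:  # type: ignore[override]
        return [f"doc about {query}"]


class ModernRetriever(Base):
    def _get_relevant_documents(
        self, query: str, *, run_manager: Optional[Any] = None
    ) -> List[str]:
        return [query.upper()]


print(asyncio.run(LegacyRetriever()._aget_relevant_documents("cats")))
```

Without the fallback, `LegacyRetriever` would inherit the base async method and fail with a `TypeError` on the unexpected `run_manager` keyword.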
@@ -470,14 +470,22 @@ class Graph:
"""Remove the first node if it exists and has a single outgoing edge,
i.e., if removing it would not leave the graph without a "first" node."""
first_node = self.first_node()
if first_node and _first_node(self, exclude=[first_node.id]):
if (
first_node
and _first_node(self, exclude=[first_node.id])
and len({e for e in self.edges if e.source == first_node.id}) == 1
):
self.remove_node(first_node)

def trim_last_node(self) -> None:
"""Remove the last node if it exists and has a single incoming edge,
i.e., if removing it would not leave the graph without a "last" node."""
last_node = self.last_node()
if last_node and _last_node(self, exclude=[last_node.id]):
if (
last_node
and _last_node(self, exclude=[last_node.id])
and len({e for e in self.edges if e.target == last_node.id}) == 1
):
self.remove_node(last_node)

def draw_ascii(self) -> str:

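Both trim methods now also require the boundary node to have exactly one outgoing (or incoming) edge. The new `test_trim_multi_edge` test shows why. Take `__start__ -> a -> __end__` plus a direct `__start__ -> __end__` edge. Once `__start__` is excluded, `a` still qualifies as a first node, so the old check would have removed `__start__` and its direct edge. The self-contained sketch below models the new check on a hypothetical `MiniGraph` of string node ids. It mirrors the condition in `trim_first_node` but is not LangChain's `Graph`. The helper `_first` is an assumed simplification of `_first_node`, which is outside this diff.

```python
from typing import List, Optional, Sequence, Tuple

Edge = Tuple[str, str]  # (source, target)


class MiniGraph:
    """Hypothetical minimal graph; not LangChain's Graph class."""

    def __init__(self) -> None:
        self.nodes: List[str] = []
        self.edges: List[Edge] = []

    def add_edge(self, source: str, target: str) -> None:
        for node in (source, target):
            if node not in self.nodes:
                self.nodes.append(node)
        self.edges.append((source, target))

    def _first(self, exclude: Sequence[str] = ()) -> Optional[str]:
        # The unique node that is not the target of any edge (ignoring edges
        # that leave excluded nodes); None if there are zero or several.
        targets = {t for s, t in self.edges if s not in exclude}
        found = [n for n in self.nodes if n not in exclude and n not in targets]
        return found[0] if len(found) == 1 else None

    def remove_node(self, node: str) -> None:
        self.nodes.remove(node)
        self.edges = [e for e in self.edges if node not in e]

    def trim_first_node(self) -> None:
        first = self._first()
        if (
            first is not None
            and self._first(exclude=[first]) is not None
            # New guard: only trim when the first node has exactly one outgoing edge.
            and len([e for e in self.edges if e[0] == first]) == 1
        ):
            self.remove_node(first)


multi = MiniGraph()
multi.add_edge("__start__", "a")
multi.add_edge("a", "__end__")
multi.add_edge("__start__", "__end__")
multi.trim_first_node()  # kept: __start__ has two outgoing edges

linear = MiniGraph()
linear.add_edge("__start__", "a")
linear.add_edge("a", "__end__")
linear.trim_first_node()  # removed: single outgoing edge, and "a" becomes first

print(multi.nodes, linear.nodes)
```

`trim_last_node` is the mirror image, counting incoming edges of the last node instead.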
@@ -609,7 +609,7 @@ class ChildTool(BaseTool):
run_id: The id of the run. Defaults to None.
config: The configuration for the tool. Defaults to None.
tool_call_id: The id of the tool call. Defaults to None.
kwargs: Additional arguments to pass to the tool
kwargs: Keyword arguments to be passed to tool callbacks

Returns:
The output of the tool.
@@ -721,7 +721,7 @@ class ChildTool(BaseTool):
run_id: The id of the run. Defaults to None.
config: The configuration for the tool. Defaults to None.
tool_call_id: The id of the tool call. Defaults to None.
kwargs: Additional arguments to pass to the tool
kwargs: Keyword arguments to be passed to tool callbacks

Returns:
The output of the tool.

@@ -1,10 +1,10 @@
[build-system]
requires = ["poetry-core>=1.0.0"]
requires = [ "poetry-core>=1.0.0",]
build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "langchain-core"
version = "0.3.21"
version = "0.3.22"
description = "Building applications with LLMs through composability"
authors = []
license = "MIT"
@@ -12,16 +12,10 @@ readme = "README.md"
repository = "https://github.com/langchain-ai/langchain"

[tool.mypy]
exclude = [
"notebooks",
"examples",
"example_data",
"langchain_core/pydantic",
"tests/unit_tests/utils/test_function_calling.py",
]
exclude = [ "notebooks", "examples", "example_data", "langchain_core/pydantic", "tests/unit_tests/utils/test_function_calling.py",]
disallow_untyped_defs = "True"
[[tool.mypy.overrides]]
module = ["numpy", "pytest"]
module = [ "numpy", "pytest",]
ignore_missing_imports = true

[tool.ruff]
@@ -50,53 +44,17 @@ python = ">=3.12.4"
[tool.poetry.extras]

[tool.ruff.lint]
select = [
"ASYNC",
"B",
"C4",
"COM",
"DJ",
"E",
"EM",
"EXE",
"F",
"FLY",
"FURB",
"I",
"ICN",
"INT",
"LOG",
"N",
"NPY",
"PD",
"PIE",
"Q",
"RSE",
"S",
"SIM",
"SLOT",
"T10",
"T201",
"TID",
"UP",
"W",
"YTT",
]
ignore = ["COM812", "UP007", "W293", "S101", "S110", "S112"]
select = [ "ASYNC", "B", "C4", "COM", "DJ", "E", "EM", "EXE", "F", "FLY", "FURB", "I", "ICN", "INT", "LOG", "N", "NPY", "PD", "PIE", "Q", "RSE", "S", "SIM", "SLOT", "T10", "T201", "TID", "UP", "W", "YTT",]
ignore = [ "COM812", "UP007", "W293", "S101", "S110", "S112",]

[tool.coverage.run]
omit = ["tests/*"]
omit = [ "tests/*",]

[tool.pytest.ini_options]
addopts = "--snapshot-warn-unused --strict-markers --strict-config --durations=5"
markers = [
"requires: mark tests as requiring a specific library",
"compile: mark placeholder test used to compile integration tests without running them",
]
markers = [ "requires: mark tests as requiring a specific library", "compile: mark placeholder test used to compile integration tests without running them",]
asyncio_mode = "auto"
filterwarnings = [
"ignore::langchain_core._api.beta_decorator.LangChainBetaWarning",
]
filterwarnings = [ "ignore::langchain_core._api.beta_decorator.LangChainBetaWarning",]

[tool.poetry.group.lint]
optional = true
@@ -114,37 +72,29 @@ optional = true
optional = true

[tool.ruff.lint.pep8-naming]
classmethod-decorators = [
"classmethod",
"langchain_core.utils.pydantic.pre_init",
"pydantic.field_validator",
"pydantic.v1.root_validator",
]
classmethod-decorators = [ "classmethod", "langchain_core.utils.pydantic.pre_init", "pydantic.field_validator", "pydantic.v1.root_validator",]

[tool.ruff.lint.per-file-ignores]
"tests/unit_tests/prompts/test_chat.py" = ["E501"]
"tests/unit_tests/runnables/test_runnable.py" = ["E501"]
"tests/unit_tests/runnables/test_graph.py" = ["E501"]
"tests/**" = ["S"]
"scripts/**" = ["S"]
"tests/unit_tests/prompts/test_chat.py" = [ "E501",]
"tests/unit_tests/runnables/test_runnable.py" = [ "E501",]
"tests/unit_tests/runnables/test_graph.py" = [ "E501",]
"tests/**" = [ "S",]
"scripts/**" = [ "S",]

[tool.poetry.group.lint.dependencies]
ruff = "^0.5"


[tool.poetry.group.typing.dependencies]
mypy = ">=1.10,<1.11"
types-pyyaml = "^6.0.12.2"
types-requests = "^2.28.11.5"
types-jinja2 = "^2.11.9"


[tool.poetry.group.dev.dependencies]
jupyter = "^1.0.0"
setuptools = "^67.6.1"
grandalf = "^0.8"


[tool.poetry.group.test.dependencies]
pytest = "^8"
freezegun = "^1.2.2"
@@ -163,15 +113,12 @@ python = "<3.12"
version = ">=1.26.0,<3"
python = ">=3.12"


[tool.poetry.group.test_integration.dependencies]


[tool.poetry.group.typing.dependencies.langchain-text-splitters]
path = "../text-splitters"
develop = true


[tool.poetry.group.test.dependencies.langchain-tests]
path = "../standard-tests"
develop = true

@@ -10,7 +10,11 @@ ROOT = HERE.parent.parent.parent
def test_as_import_path() -> None:
"""Test that the path is converted to a LangChain import path."""
# Verify that default paths are correct
assert path.PACKAGE_DIR == ROOT / "langchain_core"

# if editable install, check directory structure
if path.PACKAGE_DIR == ROOT / "langchain_core":
assert path.PACKAGE_DIR == ROOT / "langchain_core"

# Verify that as import path works correctly
assert path.as_import_path(HERE, relative_to=ROOT) == "tests.unit_tests._api"
assert (

@@ -69,6 +69,26 @@ def test_trim(snapshot: SnapshotAssertion) -> None:
assert graph.last_node() is end


def test_trim_multi_edge() -> None:
class Scheme(BaseModel):
a: str

graph = Graph()
start = graph.add_node(Scheme, id="__start__")
a = graph.add_node(Scheme, id="a")
last = graph.add_node(Scheme, id="__end__")

graph.add_edge(start, a)
graph.add_edge(a, last)
graph.add_edge(start, last)

graph.trim_first_node() # should not remove __start__ since it has 2 outgoing edges
assert graph.first_node() is start

graph.trim_last_node() # should not remove the __end__ node since it has 2 incoming edges
assert graph.last_node() is last


def test_graph_sequence(snapshot: SnapshotAssertion) -> None:
fake_llm = FakeListLLM(responses=["a"])
prompt = PromptTemplate.from_template("Hello, {name}!")

0
libs/core/tests/unit_tests/test_retrievers.py
Normal file
@@ -2,10 +2,7 @@ from pathlib import Path
from unittest.mock import AsyncMock, Mock

import pytest
from langchain_tests.integration_tests.vectorstores import (
AsyncReadWriteTestSuite,
ReadWriteTestSuite,
)
from langchain_tests.integration_tests.vectorstores import VectorStoreIntegrationTests

from langchain_core.documents import Document
from langchain_core.embeddings.fake import DeterministicFakeEmbedding
@@ -13,18 +10,12 @@ from langchain_core.vectorstores import InMemoryVectorStore
from tests.unit_tests.stubs import _any_id_document


class TestInMemoryReadWriteTestSuite(ReadWriteTestSuite):
class TestInMemoryStandard(VectorStoreIntegrationTests):
@pytest.fixture
def vectorstore(self) -> InMemoryVectorStore:
return InMemoryVectorStore(embedding=self.get_embeddings())


class TestAsyncInMemoryReadWriteTestSuite(AsyncReadWriteTestSuite):
@pytest.fixture
async def vectorstore(self) -> InMemoryVectorStore:
return InMemoryVectorStore(embedding=self.get_embeddings())


async def test_inmemory_similarity_search() -> None:
"""Test end to end similarity search."""
store = await InMemoryVectorStore.afrom_texts(

1250
libs/langchain/poetry.lock
generated
File diff suppressed because it is too large
@@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "langchain"
version = "0.3.9"
version = "0.3.10"
description = "Building applications with LLMs through composability"
authors = []
license = "MIT"
@@ -33,7 +33,7 @@ langchain-server = "langchain.server:main"

[tool.poetry.dependencies]
python = ">=3.9,<4.0"
langchain-core = "^0.3.21"
langchain-core = "^0.3.22"
langchain-text-splitters = "^0.3.0"
langsmith = "^0.1.17"
pydantic = "^2.7.4"

@@ -68,6 +68,9 @@ packages:
- name: langchain-qdrant
repo: langchain-ai/langchain
path: libs/partners/qdrant
- name: langchain-scrapegraph
repo: ScrapeGraphAI/langchain-scrapegraph
path: .
- name: langchain-sema4
repo: langchain-ai/langchain-sema4
path: libs/sema4

@@ -1,16 +1,13 @@
from typing import AsyncGenerator, Generator
from typing import Generator

import pytest
from langchain_core.vectorstores import VectorStore
from langchain_tests.integration_tests.vectorstores import (
AsyncReadWriteTestSuite,
ReadWriteTestSuite,
)
from langchain_tests.integration_tests.vectorstores import VectorStoreIntegrationTests

from langchain_chroma import Chroma


class TestSync(ReadWriteTestSuite):
class TestChromaStandard(VectorStoreIntegrationTests):
@pytest.fixture()
def vectorstore(self) -> Generator[VectorStore, None, None]: # type: ignore
"""Get an empty vectorstore for unit tests."""
@@ -20,15 +17,3 @@ class TestSync(ReadWriteTestSuite):
finally:
store.delete_collection()
pass


class TestAsync(AsyncReadWriteTestSuite):
@pytest.fixture()
async def vectorstore(self) -> AsyncGenerator[VectorStore, None]: # type: ignore
"""Get an empty vectorstore for unit tests."""
store = Chroma(embedding_function=self.get_embeddings())
try:
yield store
finally:
store.delete_collection()
pass

@@ -4,6 +4,7 @@ from typing import Type

import pytest
from langchain_core.language_models import BaseChatModel
from langchain_core.tools import BaseTool
from langchain_tests.integration_tests import ( # type: ignore[import-not-found]
ChatModelIntegrationTests, # type: ignore[import-not-found]
)
@@ -24,5 +25,7 @@ class TestFireworksStandard(ChatModelIntegrationTests):
}

@pytest.mark.xfail(reason="Not yet implemented.")
def test_tool_message_histories_list_content(self, model: BaseChatModel) -> None:
super().test_tool_message_histories_list_content(model)
def test_tool_message_histories_list_content(
self, model: BaseChatModel, my_adder_tool: BaseTool
) -> None:
super().test_tool_message_histories_list_content(model, my_adder_tool)

@@ -5,6 +5,7 @@ from typing import Optional, Type
import pytest
from langchain_core.language_models import BaseChatModel
from langchain_core.rate_limiters import InMemoryRateLimiter
from langchain_core.tools import BaseTool
from langchain_tests.integration_tests import (
ChatModelIntegrationTests,
)
@@ -20,8 +21,10 @@ class BaseTestGroq(ChatModelIntegrationTests):
return ChatGroq

@pytest.mark.xfail(reason="Not yet implemented.")
def test_tool_message_histories_list_content(self, model: BaseChatModel) -> None:
super().test_tool_message_histories_list_content(model)
def test_tool_message_histories_list_content(
self, model: BaseChatModel, my_adder_tool: BaseTool
) -> None:
super().test_tool_message_histories_list_content(model, my_adder_tool)


class TestGroqLlama(BaseTestGroq):
@@ -47,8 +50,10 @@ class TestGroqLlama(BaseTestGroq):
@pytest.mark.xfail(
reason=("Fails with 'Failed to call a function. Please adjust your prompt.'")
)
def test_tool_message_histories_string_content(self, model: BaseChatModel) -> None:
super().test_tool_message_histories_string_content(model)
def test_tool_message_histories_string_content(
self, model: BaseChatModel, my_adder_tool: BaseTool
) -> None:
super().test_tool_message_histories_string_content(model, my_adder_tool)

@pytest.mark.xfail(
reason=(

@@ -595,7 +595,7 @@ class ChatMistralAI(BaseChatModel):
for chunk in self.completion_with_retry(
messages=message_dicts, run_manager=run_manager, **params
):
if len(chunk["choices"]) == 0:
if len(chunk.get("choices", [])) == 0:
continue
new_chunk = _convert_chunk_to_message_chunk(chunk, default_chunk_class)
# make future chunks same type as first chunk
@@ -621,7 +621,7 @@ class ChatMistralAI(BaseChatModel):
async for chunk in await acompletion_with_retry(
self, messages=message_dicts, run_manager=run_manager, **params
):
if len(chunk["choices"]) == 0:
if len(chunk.get("choices", [])) == 0:
continue
new_chunk = _convert_chunk_to_message_chunk(chunk, default_chunk_class)
# make future chunks same type as first chunk

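The only change in these two hunks replaces `chunk["choices"]` with `chunk.get("choices", [])`. As a result, a streamed chunk with no `choices` key at all is skipped, just like one with an empty list, instead of raising `KeyError`. The sketch below applies the same guard to a list of hypothetical chunk dicts. Their shape, including the usage-only chunk without `choices`, is invented for illustration and is not taken from Mistral's API documentation.

```python
from typing import Any, Dict, Iterable, Iterator, List


def content_deltas(chunks: Iterable[Dict[str, Any]]) -> Iterator[str]:
    for chunk in chunks:
        # Skip chunks whose "choices" is missing or empty; indexing with
        # chunk["choices"] would raise KeyError on the former.
        if len(chunk.get("choices", [])) == 0:
            continue
        yield chunk["choices"][0]["delta"].get("content", "")


stream: List[Dict[str, Any]] = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": []},
    {"choices": [{"delta": {"content": "lo"}}]},
    {"usage": {"prompt_tokens": 3, "completion_tokens": 2}},  # no "choices" key
]

print("".join(content_deltas(stream)))
```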
16
libs/partners/openai/poetry.lock
generated
@@ -495,7 +495,7 @@ files = [

[[package]]
name = "langchain-core"
version = "0.3.21"
version = "0.3.22"
description = "Building applications with LLMs through composability"
optional = false
python-versions = ">=3.9,<4.0"
@@ -520,7 +520,7 @@ url = "../../core"

[[package]]
name = "langchain-tests"
version = "0.3.4"
version = "0.3.6"
description = "Standard tests for LangChain implementations"
optional = false
python-versions = ">=3.9,<4.0"
@@ -528,9 +528,15 @@ files = []
develop = true

[package.dependencies]
httpx = "^0.27.0"
langchain-core = "^0.3.19"
httpx = ">=0.25.0,<1"
langchain-core = "^0.3.22"
numpy = [
{version = ">=1.24.0,<2.0.0", markers = "python_version < \"3.12\""},
{version = ">=1.26.2,<3", markers = "python_version >= \"3.12\""},
]
pytest = ">=7,<9"
pytest-asyncio = ">=0.20,<1"
pytest-socket = ">=0.6.0,<1"
syrupy = "^4"

[package.source]
@@ -1639,4 +1645,4 @@ watchmedo = ["PyYAML (>=3.10)"]
[metadata]
lock-version = "2.0"
python-versions = ">=3.9,<4.0"
content-hash = "ded25b72c77fad9a869f3308c1bba084b58f54eb13df2785f061bc340d6ec748"
content-hash = "6fb8c9f98c76ba402d53234ac2ac78bcebafbe818e64cd849e0ae26cafcd5ba4"

@@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "langchain-openai"
version = "0.2.11"
version = "0.2.12"
description = "An integration package connecting OpenAI and LangChain"
authors = []
readme = "README.md"
@@ -24,7 +24,7 @@ ignore_missing_imports = true
[tool.poetry.dependencies]
python = ">=3.9,<4.0"
langchain-core = "^0.3.21"
openai = "^1.54.0"
openai = "^1.55.3"
tiktoken = ">=0.7,<1"

[tool.ruff.lint]

@@ -4,6 +4,7 @@ from typing import Tuple, Type

import pytest
from langchain_core.language_models import BaseChatModel
from langchain_core.tools import BaseTool
from langchain_tests.unit_tests import ChatModelUnitTests

from langchain_openai import AzureChatOpenAI
@@ -23,8 +24,10 @@ class TestOpenAIStandard(ChatModelUnitTests):
}

@pytest.mark.xfail(reason="AzureOpenAI does not support tool_choice='any'")
def test_bind_tool_pydantic(self, model: BaseChatModel) -> None:
super().test_bind_tool_pydantic(model)
def test_bind_tool_pydantic(
self, model: BaseChatModel, my_adder_tool: BaseTool
) -> None:
super().test_bind_tool_pydantic(model, my_adder_tool)

@property
def init_from_env_params(self) -> Tuple[dict, dict, dict]:

@@ -5,6 +5,7 @@ from typing import Optional, Type
import pytest # type: ignore[import-not-found]
from langchain_core.language_models import BaseChatModel
from langchain_core.rate_limiters import InMemoryRateLimiter
from langchain_core.tools import BaseTool
from langchain_tests.integration_tests import ( # type: ignore[import-not-found]
ChatModelIntegrationTests, # type: ignore[import-not-found]
)
@@ -40,13 +41,19 @@ class TestXAIStandard(ChatModelIntegrationTests):
super().test_usage_metadata_streaming(model)

@pytest.mark.xfail(reason="Can't handle AIMessage with empty content.")
def test_tool_message_error_status(self, model: BaseChatModel) -> None:
super().test_tool_message_error_status(model)
def test_tool_message_error_status(
self, model: BaseChatModel, my_adder_tool: BaseTool
) -> None:
super().test_tool_message_error_status(model, my_adder_tool)

@pytest.mark.xfail(reason="Can't handle AIMessage with empty content.")
def test_structured_few_shot_examples(self, model: BaseChatModel) -> None:
super().test_structured_few_shot_examples(model)
def test_structured_few_shot_examples(
self, model: BaseChatModel, my_adder_tool: BaseTool
) -> None:
super().test_structured_few_shot_examples(model, my_adder_tool)

@pytest.mark.xfail(reason="Can't handle AIMessage with empty content.")
def test_tool_message_histories_string_content(self, model: BaseChatModel) -> None:
super().test_tool_message_histories_string_content(model)
def test_tool_message_histories_string_content(
self, model: BaseChatModel, my_adder_tool: BaseTool
) -> None:
super().test_tool_message_histories_string_content(model, my_adder_tool)

@@ -0,0 +1,7 @@
"""
Base Test classes for standard testing.

To learn how to use these classes, see the
`Integration standard testing <https://python.langchain.com/docs/contributing/how_to/integrations/standard_tests/>`_
guide.
"""

@@ -3,9 +3,15 @@ from typing import Type


class BaseStandardTests(ABC):
"""
:private:
"""

def test_no_overrides_DO_NOT_OVERRIDE(self) -> None:
"""
Test that no standard tests are overridden.

:private:
"""
# find path to standard test implementations
comparison_class = None

@@ -23,7 +23,7 @@ from .chat_models import ChatModelIntegrationTests
from .embeddings import EmbeddingsIntegrationTests
from .retrievers import RetrieversIntegrationTests
from .tools import ToolsIntegrationTests
from .vectorstores import AsyncReadWriteTestSuite, ReadWriteTestSuite
from .vectorstores import VectorStoreIntegrationTests

__all__ = [
"ChatModelIntegrationTests",
@@ -33,7 +33,6 @@ __all__ = [
|
||||
"BaseStoreSyncTests",
|
||||
"AsyncCacheTestSuite",
|
||||
"SyncCacheTestSuite",
|
||||
"AsyncReadWriteTestSuite",
|
||||
"ReadWriteTestSuite",
|
||||
"VectorStoreIntegrationTests",
|
||||
"RetrieversIntegrationTests",
|
||||
]
|
||||
|
||||
@@ -1,3 +1,11 @@
|
||||
"""
|
||||
Standard tests for the BaseStore abstraction
|
||||
|
||||
We don't recommend implementing externally managed BaseStore abstractions at this time.
|
||||
|
||||
:private:
|
||||
"""
|
||||
|
||||
from abc import abstractmethod
|
||||
from typing import AsyncGenerator, Generator, Generic, Tuple, TypeVar
|
||||
|
||||
|
||||
@@ -1,3 +1,11 @@
|
||||
"""
|
||||
Standard tests for the BaseCache abstraction
|
||||
|
||||
We don't recommend implementing externally managed BaseCache abstractions at this time.
|
||||
|
||||
:private:
|
||||
"""
|
||||
|
||||
from abc import abstractmethod
|
||||
|
||||
import pytest
|
||||
|
||||
@@ -16,7 +16,7 @@ from langchain_core.messages import (
|
||||
)
|
||||
from langchain_core.output_parsers import StrOutputParser
|
||||
from langchain_core.prompts import ChatPromptTemplate
|
||||
from langchain_core.tools import tool
|
||||
from langchain_core.tools import BaseTool, tool
|
||||
from langchain_core.utils.function_calling import tool_example_to_messages
|
||||
from pydantic import BaseModel, Field
|
||||
from pydantic.v1 import BaseModel as BaseModelV1
|
||||
@@ -24,16 +24,29 @@ from pydantic.v1 import Field as FieldV1
|
||||
|
||||
from langchain_tests.unit_tests.chat_models import (
|
||||
ChatModelTests,
|
||||
my_adder_tool,
|
||||
)
|
||||
from langchain_tests.utils.pydantic import PYDANTIC_MAJOR_VERSION
|
||||
|
||||
|
||||
class MagicFunctionSchema(BaseModel):
|
||||
def _get_joke_class() -> type[BaseModel]:
|
||||
"""
|
||||
:private:
|
||||
"""
|
||||
|
||||
class Joke(BaseModel):
|
||||
"""Joke to tell user."""
|
||||
|
||||
setup: str = Field(description="question to set up a joke")
|
||||
punchline: str = Field(description="answer to resolve the joke")
|
||||
|
||||
return Joke
|
||||
|
||||
|
||||
class _MagicFunctionSchema(BaseModel):
|
||||
input: int = Field(..., gt=-1000, lt=1000)
|
||||
|
||||
|
||||
@tool(args_schema=MagicFunctionSchema)
|
||||
@tool(args_schema=_MagicFunctionSchema)
|
||||
def magic_function(input: int) -> int:
|
||||
"""Applies a magic function to an input."""
|
||||
return input + 2
|
||||
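`_MagicFunctionSchema` bounds `input` with `gt=-1000, lt=1000`; in pydantic these are strict (exclusive) bounds. As a stdlib-only sketch, and not the `@tool` wrapper the tests actually use, the same contract can be written as an explicit check. `magic_function_checked` is a hypothetical name.

```python
def magic_function_checked(input: int) -> int:
    """Applies a magic function to an input, enforcing -1000 < input < 1000."""
    # gt/lt are strict: -1000 and 1000 themselves are rejected.
    if not -1000 < input < 1000:
        raise ValueError(f"input must satisfy -1000 < input < 1000, got {input}")
    return input + 2


print(magic_function_checked(3))
print(magic_function_checked(999))
```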
@@ -45,13 +58,6 @@ def magic_function_no_args() -> int:
return 5


class Joke(BaseModel):
"""Joke to tell user."""

setup: str = Field(description="question to set up a joke")
punchline: str = Field(description="answer to resolve the joke")


def _validate_tool_call_message(message: BaseMessage) -> None:
assert isinstance(message, AIMessage)
assert len(message.tool_calls) == 1
@@ -103,17 +109,214 @@ class ChatModelIntegrationTests(ChatModelTests):
.. note::
API references for individual test methods include troubleshooting tips.

.. note::
Test subclasses can control what features are tested (such as tool
calling or multi-modality) by selectively overriding the properties on the
class. Relevant properties are mentioned in the references for each method.
See this page for details on all properties:
https://python.langchain.com/api_reference/standard_tests/unit_tests/langchain_tests.unit_tests.chat_models.ChatModelTests.html

Test subclasses must implement the following two properties:

chat_model_class
The chat model class to test, e.g., ``ChatParrotLink``.

Example:

.. code-block:: python

@property
def chat_model_class(self) -> Type[ChatParrotLink]:
return ChatParrotLink

chat_model_params
Initialization parameters for the chat model.

Example:

.. code-block:: python

@property
def chat_model_params(self) -> dict:
return {"model": "bird-brain-001", "temperature": 0}

In addition, test subclasses can control what features are tested (such as tool
calling or multi-modality) by selectively overriding the following properties.
Expand to see details:

.. dropdown:: has_tool_calling

Boolean property indicating whether the chat model supports tool calling.

By default, this is determined by whether the chat model's `bind_tools` method
is overridden. It typically does not need to be overridden on the test class.

Example override:

.. code-block:: python

@property
def has_tool_calling(self) -> bool:
return True

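The default for `has_tool_calling` is described as checking whether `bind_tools` is overridden. A minimal sketch of that kind of check, assuming it compares the class attribute against the base implementation; `BaseChat`, `ParrotChat`, `EchoChat`, and `overrides_method` are hypothetical stand-ins, not LangChain classes.

```python
from typing import Any, Type


class BaseChat:
    def bind_tools(self, tools: Any) -> Any:
        raise NotImplementedError


class ParrotChat(BaseChat):
    # Provider that implements tool binding itself.
    def bind_tools(self, tools: Any) -> Any:
        return self


class EchoChat(BaseChat):
    # Provider that inherits the base implementation.
    pass


def overrides_method(cls: Type[Any], base: Type[Any], name: str) -> bool:
    """True if ``cls`` provides its own ``name`` instead of inheriting ``base``'s."""
    return getattr(cls, name) is not getattr(base, name)


print(overrides_method(ParrotChat, BaseChat, "bind_tools"))
print(overrides_method(EchoChat, BaseChat, "bind_tools"))
```

The same comparison explains the `has_structured_output` entry: a model that relies on the inherited `with_structured_output` is not detected by such a check, so the test class has to set that property explicitly.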
.. dropdown:: tool_choice_value

Value to use for tool choice when used in tests.

Some tests for tool calling features attempt to force tool calling via a
`tool_choice` parameter. A common value for this parameter is "any". Defaults
to `None`.

Note: if the value is set to "tool_name", the name of the tool used in each
test will be set as the value for `tool_choice`.

Example:

.. code-block:: python

@property
def tool_choice_value(self) -> Optional[str]:
return "any"

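The `tool_choice_value` rules above can be sketched as a small resolver: `"tool_name"` is replaced by the current test's tool name, other values pass through, and `None` presumably means no forcing at all. `resolve_tool_choice` and `bind_kwargs` are hypothetical helpers, not part of `langchain_tests`.

```python
from typing import Any, Dict, Optional


def resolve_tool_choice(tool_choice_value: Optional[str], tool_name: str) -> Optional[str]:
    """Map the test-class setting to the value passed as ``tool_choice``."""
    if tool_choice_value == "tool_name":
        # Force the specific tool used by the current test.
        return tool_name
    return tool_choice_value


def bind_kwargs(tool_choice_value: Optional[str], tool_name: str) -> Dict[str, Any]:
    """Only include ``tool_choice`` when there is something to force."""
    choice = resolve_tool_choice(tool_choice_value, tool_name)
    return {} if choice is None else {"tool_choice": choice}


print(bind_kwargs(None, "magic_function"))
print(bind_kwargs("any", "magic_function"))
print(bind_kwargs("tool_name", "magic_function"))
```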
.. dropdown:: has_structured_output

Boolean property indicating whether the chat model supports structured
output.

By default, this is determined by whether the chat model's
`with_structured_output` method is overridden. If the base implementation is
intended to be used, this property should be overridden on the test class.

See: https://python.langchain.com/docs/concepts/structured_outputs/

Example:

.. code-block:: python

@property
def has_structured_output(self) -> bool:
return True

.. dropdown:: supports_image_inputs

Boolean property indicating whether the chat model supports image inputs.
Defaults to ``False``.

If set to ``True``, the chat model will be tested using content blocks of the
form

.. code-block:: python

[
{"type": "text", "text": "describe the weather in this image"},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
},
]

See https://python.langchain.com/docs/concepts/multimodality/

Example:

.. code-block:: python

@property
def supports_image_inputs(self) -> bool:
return True

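The `image_data` placeholder in the content blocks above is base64-encoded image bytes embedded in a data URL. A stdlib sketch; `image_content_blocks` is a hypothetical helper, and the three bytes used below are only the JPEG signature, not a real image.

```python
import base64
from typing import Any, Dict, List


def image_content_blocks(
    prompt: str, image_bytes: bytes, mime: str = "image/jpeg"
) -> List[Dict[str, Any]]:
    """Build the text + image_url content blocks shown in the docstring."""
    image_data = base64.b64encode(image_bytes).decode("utf-8")
    return [
        {"type": "text", "text": prompt},
        {
            "type": "image_url",
            "image_url": {"url": f"data:{mime};base64,{image_data}"},
        },
    ]


blocks = image_content_blocks("describe the weather in this image", b"\xff\xd8\xff")
print(blocks[1]["image_url"]["url"])
```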
.. dropdown:: supports_video_inputs

Boolean property indicating whether the chat model supports video inputs.
Defaults to ``False``. No current tests are written for this feature.

.. dropdown:: returns_usage_metadata

Boolean property indicating whether the chat model returns usage metadata
on invoke and streaming responses.

``usage_metadata`` is an optional dict attribute on AIMessages that tracks input
and output tokens: https://python.langchain.com/api_reference/core/messages/langchain_core.messages.ai.UsageMetadata.html

Example:

.. code-block:: python

@property
def returns_usage_metadata(self) -> bool:
return False

.. dropdown:: supports_anthropic_inputs

Boolean property indicating whether the chat model supports Anthropic-style
inputs.

These inputs might feature "tool use" and "tool result" content blocks, e.g.,

.. code-block:: python

[
{"type": "text", "text": "Hmm let me think about that"},
{
"type": "tool_use",
"input": {"fav_color": "green"},
"id": "foo",
"name": "color_picker",
},
]

If set to ``True``, the chat model will be tested using content blocks of this
form.

Example:

.. code-block:: python

@property
def supports_anthropic_inputs(self) -> bool:
return False

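In the Anthropic-style list above, tool calls travel as `tool_use` blocks next to text blocks inside a list-valued `content`. A hypothetical helper that pulls them out; the function name and the `(id, name, input)` tuple shape are illustrative assumptions, not LangChain APIs.

```python
from typing import Any, Dict, List, Tuple, Union

Content = Union[str, List[Dict[str, Any]]]


def extract_tool_uses(content: Content) -> List[Tuple[str, str, Dict[str, Any]]]:
    """Return (id, name, input) for every ``tool_use`` block in ``content``."""
    if isinstance(content, str):
        # Plain-string content (OpenAI style) carries no tool_use blocks.
        return []
    return [
        (block["id"], block["name"], block["input"])
        for block in content
        if block.get("type") == "tool_use"
    ]


content = [
    {"type": "text", "text": "Hmm let me think about that"},
    {
        "type": "tool_use",
        "input": {"fav_color": "green"},
        "id": "foo",
        "name": "color_picker",
    },
]
print(extract_tool_uses(content))
```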
.. dropdown:: supports_image_tool_message

Boolean property indicating whether the chat model supports ToolMessages
that include image content, e.g.,

.. code-block:: python

ToolMessage(
content=[
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
},
],
tool_call_id="1",
name="random_image",
)

If set to ``True``, the chat model will be tested with message sequences that
include ToolMessages of this form.

Example:

.. code-block:: python

@property
def supports_image_tool_message(self) -> bool:
return False

.. dropdown:: supported_usage_metadata_details

Property controlling what usage metadata details are emitted in both invoke
and stream.

``usage_metadata`` is an optional dict attribute on AIMessages that tracks input
and output tokens: https://python.langchain.com/api_reference/core/messages/langchain_core.messages.ai.UsageMetadata.html

It includes optional keys ``input_token_details`` and ``output_token_details``
that can track usage details associated with special types of tokens, such as
cached, audio, or reasoning.

Only needs to be overridden if these details are supplied.
"""
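For orientation, here is a stdlib mirror of the `usage_metadata` shape described above. The key names are assumed to follow the linked `UsageMetadata` reference (`input_tokens`, `output_tokens`, `total_tokens`, with optional `input_token_details` holding `audio`/`cache_read`/`cache_creation` and `output_token_details` holding `audio`/`reasoning`); verify them against the reference before relying on them. `check_usage` is a hypothetical helper that assumes `total_tokens` is the sum of input and output tokens.

```python
from typing import TypedDict


class InputTokenDetails(TypedDict, total=False):
    audio: int
    cache_read: int
    cache_creation: int


class OutputTokenDetails(TypedDict, total=False):
    audio: int
    reasoning: int


class _RequiredUsage(TypedDict):
    input_tokens: int
    output_tokens: int
    total_tokens: int


class UsageMetadata(_RequiredUsage, total=False):
    # Optional breakdowns for special token types (cached, audio, reasoning).
    input_token_details: InputTokenDetails
    output_token_details: OutputTokenDetails


def check_usage(usage: UsageMetadata) -> bool:
    """Sanity-check counts: non-negative, and total equal to input + output."""
    counts = (usage["input_tokens"], usage["output_tokens"], usage["total_tokens"])
    if any(count < 0 for count in counts):
        return False
    return usage["input_tokens"] + usage["output_tokens"] == usage["total_tokens"]


usage: UsageMetadata = {
    "input_tokens": 120,
    "output_tokens": 30,
    "total_tokens": 150,
    "input_token_details": {"cache_read": 100},
    "output_token_details": {"reasoning": 12},
}
print(check_usage(usage))
```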

@property
def standard_chat_model_params(self) -> dict:
""":meta private:"""
""":private:"""
return {}

def test_invoke(self, model: BaseChatModel) -> None:
@@ -908,6 +1111,7 @@ class ChatModelIntegrationTests(ChatModelTests):
if not self.has_tool_calling:
pytest.skip("Test requires tool calling.")

Joke = _get_joke_class()
# Pydantic class
# Type ignoring since the interface only officially supports pydantic 1
# or pydantic.v1.BaseModel but not pydantic.BaseModel from pydantic 2.
@@ -960,6 +1164,8 @@ class ChatModelIntegrationTests(ChatModelTests):
if not self.has_tool_calling:
pytest.skip("Test requires tool calling.")

Joke = _get_joke_class()

# Pydantic class
# Type ignoring since the interface only officially supports pydantic 1
# or pydantic.v1.BaseModel but not pydantic.BaseModel from pydantic 2.
@@ -1089,7 +1295,9 @@ class ChatModelIntegrationTests(ChatModelTests):
joke_result = chat.invoke("Give me a joke about cats, include the punchline.")
assert isinstance(joke_result, Joke)

def test_tool_message_histories_string_content(self, model: BaseChatModel) -> None:
def test_tool_message_histories_string_content(
self, model: BaseChatModel, my_adder_tool: BaseTool
) -> None:
"""Test that message histories are compatible with string tool contents
(e.g. OpenAI format). If a model passes this test, it should be compatible
with messages generated from providers following OpenAI format.
@@ -1123,8 +1331,8 @@ class ChatModelIntegrationTests(ChatModelTests):
.. code-block:: python

@pytest.mark.xfail(reason=("Not implemented."))
def test_tool_message_histories_string_content(self, model: BaseChatModel) -> None:
super().test_tool_message_histories_string_content(model)
def test_tool_message_histories_string_content(self, *args: Any) -> None:
super().test_tool_message_histories_string_content(*args)
""" # noqa: E501
if not self.has_tool_calling:
pytest.skip("Test requires tool calling.")
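The updated docstring examples override standard tests as `def test_...(self, *args: Any)` under `@pytest.mark.xfail`. pytest injects fixtures only for named parameters, so a `*args` override requests none, and the forwarded `super()` call fails with a `TypeError`. Because the test is marked xfail, that failure is reported as expected, so the override does not need updating when the base signature gains a fixture, as it does in this diff. The toy runner below imitates only that reporting step; `run_test` and both suite classes are hypothetical, not pytest APIs.

```python
from typing import Any, Callable


class StandardSuite:
    # Base test after this diff: it needs two fixtures.
    def test_structured_few_shot_examples(self, model: Any, my_adder_tool: Any) -> str:
        return "passed"


class ProviderSuite(StandardSuite):
    # Signature-agnostic override, as in the updated docstring examples.
    def test_structured_few_shot_examples(self, *args: Any) -> str:
        return super().test_structured_few_shot_examples(*args)


def run_test(test: Callable[..., Any], *fixtures: Any, xfail: bool = False) -> str:
    """Toy runner: call the test and classify the outcome."""
    try:
        test(*fixtures)
    except Exception:
        return "xfailed" if xfail else "failed"
    return "xpassed" if xfail else "passed"


suite = ProviderSuite()
# A ``*args`` signature names no fixtures, so nothing is injected.
print(run_test(suite.test_structured_few_shot_examples, xfail=True))
print(run_test(suite.test_structured_few_shot_examples, xfail=False))
```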
@@ -1158,6 +1366,7 @@ class ChatModelIntegrationTests(ChatModelTests):
def test_tool_message_histories_list_content(
self,
model: BaseChatModel,
my_adder_tool: BaseTool,
) -> None:
"""Test that message histories are compatible with list tool contents
(e.g. Anthropic format).
@@ -1206,8 +1415,8 @@ class ChatModelIntegrationTests(ChatModelTests):
.. code-block:: python

@pytest.mark.xfail(reason=("Not implemented."))
def test_tool_message_histories_list_content(self, model: BaseChatModel) -> None:
super().test_tool_message_histories_list_content(model)
def test_tool_message_histories_list_content(self, *args: Any) -> None:
super().test_tool_message_histories_list_content(*args)
""" # noqa: E501
if not self.has_tool_calling:
pytest.skip("Test requires tool calling.")
@@ -1246,7 +1455,9 @@ class ChatModelIntegrationTests(ChatModelTests):
result_list_content = model_with_tools.invoke(messages_list_content)
assert isinstance(result_list_content, AIMessage)

def test_structured_few_shot_examples(self, model: BaseChatModel) -> None:
def test_structured_few_shot_examples(
self, model: BaseChatModel, my_adder_tool: BaseTool
) -> None:
"""Test that the model can process few-shot examples with tool calls.

These are represented as a sequence of messages of the following form:
@@ -1286,8 +1497,8 @@ class ChatModelIntegrationTests(ChatModelTests):
.. code-block:: python

@pytest.mark.xfail(reason=("Not implemented."))
def test_structured_few_shot_examples(self, model: BaseChatModel) -> None:
super().test_structured_few_shot_examples(model)
def test_structured_few_shot_examples(self, *args: Any) -> None:
super().test_structured_few_shot_examples(*args)
""" # noqa: E501
if not self.has_tool_calling:
pytest.skip("Test requires tool calling.")
@@ -1557,7 +1768,9 @@ class ChatModelIntegrationTests(ChatModelTests):
]
model.bind_tools([color_picker]).invoke(messages)

def test_tool_message_error_status(self, model: BaseChatModel) -> None:
def test_tool_message_error_status(
self, model: BaseChatModel, my_adder_tool: BaseTool
) -> None:
"""Test that ToolMessage with ``status="error"`` can be handled.

These messages may take the form:
@@ -1647,16 +1860,21 @@ class ChatModelIntegrationTests(ChatModelTests):
assert len(result.content) > 0

def invoke_with_audio_input(self, *, stream: bool = False) -> AIMessage:
""":private:"""
raise NotImplementedError()

def invoke_with_audio_output(self, *, stream: bool = False) -> AIMessage:
""":private:"""
raise NotImplementedError()

def invoke_with_reasoning_output(self, *, stream: bool = False) -> AIMessage:
""":private:"""
raise NotImplementedError()

def invoke_with_cache_read_input(self, *, stream: bool = False) -> AIMessage:
""":private:"""
raise NotImplementedError()

def invoke_with_cache_creation_input(self, *, stream: bool = False) -> AIMessage:
""":private:"""
raise NotImplementedError()

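The `invoke_with_*` hooks above raise `NotImplementedError` by default; a subclass implements the ones matching the details it declares in `supported_usage_metadata_details`. The sketch assumes that property maps `"invoke"` and `"stream"` to lists of detail names that mirror the hook suffixes (`"reasoning_output"`, `"cache_read_input"`, and so on). The name-based dispatcher `hooks_to_run` is hypothetical, and `ProviderTests` returns plain dicts instead of `AIMessage` objects so the example runs without LangChain.

```python
from typing import Any, Callable, Dict, List


class BaseTests:
    @property
    def supported_usage_metadata_details(self) -> Dict[str, List[str]]:
        # Nothing extra is reported unless a subclass says so.
        return {"invoke": [], "stream": []}

    def invoke_with_reasoning_output(self, *, stream: bool = False) -> Dict[str, Any]:
        raise NotImplementedError()

    def invoke_with_cache_read_input(self, *, stream: bool = False) -> Dict[str, Any]:
        raise NotImplementedError()


class ProviderTests(BaseTests):
    @property
    def supported_usage_metadata_details(self) -> Dict[str, List[str]]:
        return {"invoke": ["reasoning_output"], "stream": []}

    def invoke_with_reasoning_output(self, *, stream: bool = False) -> Dict[str, Any]:
        # A real implementation would call the model and return an AIMessage.
        return {"output_token_details": {"reasoning": 12}, "stream": stream}


def hooks_to_run(tests: BaseTests, mode: str) -> List[Callable[..., Dict[str, Any]]]:
    """Look up ``invoke_with_<detail>`` for each detail declared for ``mode``."""
    details = tests.supported_usage_metadata_details[mode]
    return [getattr(tests, f"invoke_with_{detail}") for detail in details]


tests = ProviderTests()
for hook in hooks_to_run(tests, "invoke"):
    print(hook(stream=False))
print(hooks_to_run(tests, "stream"))
```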
@@ -1,4 +1,12 @@
"""Test suite to check index implementations."""
"""Test suite to check index implementations.

Standard tests for the DocumentIndex abstraction

We don't recommend implementing externally managed DocumentIndex abstractions at this
time.

:private:
"""

import inspect
import uuid

Some files were not shown because too many files have changed in this diff.