Add dashvector vectorstore (#9163)

## Description
Add `DashVector` vectorstore for LangChain

- [dashvector quick start](https://help.aliyun.com/document_detail/2510223.html)
- [dashvector package description](https://pypi.org/project/dashvector/)

## How to use
```python
from langchain.vectorstores.dashvector import DashVector

dashvector = DashVector.from_documents(docs, embeddings)
```

---------

Co-authored-by: smallrain.xuxy <smallrain.xuxy@alibaba-inc.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Xiaoyu Xee
2023-08-16 07:19:30 +08:00
committed by GitHub
parent bfbb97b74c
commit b30f449dae
5 changed files with 702 additions and 0 deletions


@@ -0,0 +1,24 @@
# DashVector
> [DashVector](https://help.aliyun.com/document_detail/2510225.html) is a fully-managed vectorDB service that supports high-dimension dense and sparse vectors, real-time insertion and filtered search. It is built to scale automatically and can adapt to different application requirements.

This document demonstrates how to leverage DashVector within the LangChain ecosystem. In particular, it shows how to install DashVector and how to use it as a VectorStore plugin in LangChain.
It is broken into two parts: installation and setup, and then references to specific DashVector wrappers.
## Installation and Setup
Install the Python SDK:
```bash
pip install dashvector
```
## VectorStore
A DashVector Collection is wrapped as a familiar VectorStore for native usage within LangChain,
which allows it to be readily used for various scenarios, such as semantic search or example selection.
You may import the vectorstore by:
```python
from langchain.vectorstores import DashVector
```
For a detailed walkthrough of the DashVector wrapper, please refer to [this notebook](/docs/integrations/vectorstores/dashvector.html).
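
The wrapper usage described above can be sketched roughly as follows. This is a minimal sketch, not the definitive way to use the integration. It assumes that `DASHVECTOR_API_KEY` and `DASHSCOPE_API_KEY` are set in the environment and that `Document` can be imported from `langchain.docstore.document`. The helper names `build_records`, `key_filter`, and `run_demo`, along with the seed document, are hypothetical and exist only for illustration.

```python
import os


def build_records(texts):
    """Pair each text with a small metadata dict and a string id (illustrative helper)."""
    metadatas = [{"key": i} for i in range(len(texts))]
    ids = [str(i) for i in range(len(texts))]
    return metadatas, ids


def key_filter(value):
    """Build a filter string in the `key = <value>` form (illustrative helper)."""
    return f"key = {value}"


def run_demo():
    # Imports are deferred so the helpers above work without the SDKs installed.
    # Assumption: this import path for Document is available in your LangChain version.
    from langchain.docstore.document import Document
    from langchain.embeddings.dashscope import DashScopeEmbeddings
    from langchain.vectorstores import DashVector

    texts = ["foo", "bar", "baz"]
    metadatas, ids = build_records(texts)

    # Hypothetical seed document, used only to create the store.
    seed = [Document(page_content="hello", metadata={"key": -1})]
    store = DashVector.from_documents(seed, DashScopeEmbeddings())

    # Add texts with metadata and ids, then search with a metadata filter.
    store.add_texts(texts, metadatas=metadatas, ids=ids)
    return store.similarity_search("foo", filter=key_filter(2))


if __name__ == "__main__":
    if os.environ.get("DASHVECTOR_API_KEY") and os.environ.get("DASHSCOPE_API_KEY"):
        print(run_demo())
    else:
        print("Set DASHVECTOR_API_KEY and DASHSCOPE_API_KEY to run the demo.")
```

The remote calls are kept behind an environment check so the script can be run safely without credentials. With both keys set, the search should be restricted to entries whose `key` metadata equals 2.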


@@ -0,0 +1,236 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"# DashVector\n",
"\n",
"> [DashVector](https://help.aliyun.com/document_detail/2510225.html) is a fully-managed vectorDB service that supports high-dimension dense and sparse vectors, real-time insertion and filtered search. It is built to scale automatically and can adapt to different application requirements.\n",
"\n",
"This notebook shows how to use functionality related to the `DashVector` vector database.\n",
"\n",
"To use DashVector, you must have an API key.\n",
"Here are the [installation instructions](https://help.aliyun.com/document_detail/2510223.html)."
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"## Install"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"!pip install dashvector dashscope"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"We want to use `DashScopeEmbeddings`, so we also need a DashScope API key."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"pycharm": {
"name": "#%%\n",
"is_executing": true
},
"ExecuteTime": {
"end_time": "2023-08-11T10:37:15.091585Z",
"start_time": "2023-08-11T10:36:51.859753Z"
}
},
"outputs": [],
"source": [
"import os\n",
"import getpass\n",
"\n",
"os.environ[\"DASHVECTOR_API_KEY\"] = getpass.getpass(\"DashVector API Key:\")\n",
"os.environ[\"DASHSCOPE_API_KEY\"] = getpass.getpass(\"DashScope API Key:\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"## Example"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"pycharm": {
"name": "#%%\n",
"is_executing": true
},
"ExecuteTime": {
"end_time": "2023-08-11T10:42:30.243460Z",
"start_time": "2023-08-11T10:42:27.783785Z"
}
},
"outputs": [],
"source": [
"from langchain.embeddings.dashscope import DashScopeEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import DashVector"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"pycharm": {
"is_executing": true,
"name": "#%%\n"
},
"ExecuteTime": {
"end_time": "2023-08-11T10:42:30.391580Z",
"start_time": "2023-08-11T10:42:30.249021Z"
}
},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"\n",
"loader = TextLoader(\"../../modules/state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = DashScopeEmbeddings()"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"We can create a DashVector store from documents."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"dashvector = DashVector.from_documents(docs, embeddings)\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = dashvector.similarity_search(query)\n",
"print(docs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"We can add texts with metadata and IDs, and search with a metadata filter."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"pycharm": {
"name": "#%%\n"
},
"ExecuteTime": {
"end_time": "2023-08-11T10:42:51.641309Z",
"start_time": "2023-08-11T10:42:51.132109Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Document(page_content='baz', metadata={'key': 2})]\n"
]
}
],
"source": [
"texts = [\"foo\", \"bar\", \"baz\"]\n",
"metadatas = [{\"key\": i} for i in range(len(texts))]\n",
"ids = [\"0\", \"1\", \"2\"]\n",
"\n",
"dashvector.add_texts(texts, metadatas=metadatas, ids=ids)\n",
"\n",
"docs = dashvector.similarity_search(\"foo\", filter=\"key = 2\")\n",
"print(docs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [],
"metadata": {
"collapsed": false
}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
}
},
"nbformat": 4,
"nbformat_minor": 1
}