mirror of
https://github.com/hwchase17/langchain.git
synced 2025-09-06 05:25:04 +00:00
community[minor]: add aerospike vectorstore integration (#21735)
Please let me know if you see any possible areas of improvement. I would very much appreciate your constructive criticism if time allows. **Description:** - Added a aerospike vector store integration that utilizes [Aerospike-Vector-Search](https://aerospike.com/products/vector-database-search-llm/) add-on. - Added both unit tests and integration tests - Added a docker compose file for spinning up a test environment - Added a notebook **Dependencies:** any dependencies required for this change - aerospike-vector-search **Twitter handle:** - No twitter, you can use my GitHub handle or LinkedIn if you'd like Thanks! --------- Co-authored-by: Jesse Schumacher <jschumacher@aerospike.com> Co-authored-by: Bagatur <baskaryan@gmail.com>
This commit is contained in:
@@ -60,7 +60,7 @@
|
||||
" * document addition by id (`add_documents` method with `ids` argument)\n",
|
||||
" * delete by id (`delete` method with `ids` argument)\n",
|
||||
"\n",
|
||||
"Compatible Vectorstores: `AnalyticDB`, `AstraDB`, `AwaDB`, `Bagel`, `Cassandra`, `Chroma`, `CouchbaseVectorStore`, `DashVector`, `DatabricksVectorSearch`, `DeepLake`, `Dingo`, `ElasticVectorSearch`, `ElasticsearchStore`, `FAISS`, `HanaDB`, `Milvus`, `MyScale`, `OpenSearchVectorSearch`, `PGVector`, `Pinecone`, `Qdrant`, `Redis`, `Rockset`, `ScaNN`, `SupabaseVectorStore`, `SurrealDBStore`, `TimescaleVector`, `Vald`, `VDMS`, `Vearch`, `VespaStore`, `Weaviate`, `ZepVectorStore`, `TencentVectorDB`, `OpenSearchVectorSearch`.\n",
|
||||
"Compatible Vectorstores: `Aerospike`, `AnalyticDB`, `AstraDB`, `AwaDB`, `Bagel`, `Cassandra`, `Chroma`, `CouchbaseVectorStore`, `DashVector`, `DatabricksVectorSearch`, `DeepLake`, `Dingo`, `ElasticVectorSearch`, `ElasticsearchStore`, `FAISS`, `HanaDB`, `Milvus`, `MyScale`, `OpenSearchVectorSearch`, `PGVector`, `Pinecone`, `Qdrant`, `Redis`, `Rockset`, `ScaNN`, `SupabaseVectorStore`, `SurrealDBStore`, `TimescaleVector`, `Vald`, `VDMS`, `Vearch`, `VespaStore`, `Weaviate`, `ZepVectorStore`, `TencentVectorDB`, `OpenSearchVectorSearch`.\n",
|
||||
" \n",
|
||||
"## Caution\n",
|
||||
"\n",
|
||||
|
706
docs/docs/integrations/vectorstores/aerospike.ipynb
Normal file
706
docs/docs/integrations/vectorstores/aerospike.ipynb
Normal file
@@ -0,0 +1,706 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Aerospike\n",
|
||||
"\n",
|
||||
"[Aerospike Vector Search](https://aerospike.com/docs/vector) (AVS) is an\n",
|
||||
"extension to the Aerospike Database that enables searches across very large\n",
|
||||
"datasets stored in Aerospike. This new service lives outside of Aerospike and\n",
|
||||
"builds an index to perform those searches.\n",
|
||||
"\n",
|
||||
"This notebook showcases the functionality of the LangChain Aerospike VectorStore\n",
|
||||
"integration.\n",
|
||||
"\n",
|
||||
"## Install AVS\n",
|
||||
"\n",
|
||||
"Before using this notebook, we need to have a running AVS instance. Use one of\n",
|
||||
"the [available installation methods](https://aerospike.com/docs/vector/install). \n",
|
||||
"\n",
|
||||
"When finished, store your AVS instance's IP address and port to use later\n",
|
||||
"in this demo:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"PROXIMUS_HOST = \"<avs-ip>\"\n",
|
||||
"PROXIMUS_PORT = 5000"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Install Dependencies \n",
|
||||
"The `sentence-transformers` dependency is large. This step could take several minutes to complete."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"vscode": {
|
||||
"languageId": "shellscript"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install --upgrade --quiet aerospike-vector-search==0.6.1 sentence-transformers langchain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Download Quotes Dataset\n",
|
||||
"\n",
|
||||
"We will download a dataset of approximately 100,000 quotes and use a subset of those quotes for semantic search."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"--2024-05-10 17:28:17-- https://github.com/aerospike/aerospike-vector-search-examples/raw/7dfab0fccca0852a511c6803aba46578729694b5/quote-semantic-search/container-volumes/quote-search/data/quotes.csv.tgz\n",
|
||||
"Resolving github.com (github.com)... 140.82.116.4\n",
|
||||
"Connecting to github.com (github.com)|140.82.116.4|:443... connected.\n",
|
||||
"HTTP request sent, awaiting response... 302 Found\n",
|
||||
"Location: https://raw.githubusercontent.com/aerospike/aerospike-vector-search-examples/7dfab0fccca0852a511c6803aba46578729694b5/quote-semantic-search/container-volumes/quote-search/data/quotes.csv.tgz [following]\n",
|
||||
"--2024-05-10 17:28:17-- https://raw.githubusercontent.com/aerospike/aerospike-vector-search-examples/7dfab0fccca0852a511c6803aba46578729694b5/quote-semantic-search/container-volumes/quote-search/data/quotes.csv.tgz\n",
|
||||
"Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...\n",
|
||||
"Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.\n",
|
||||
"HTTP request sent, awaiting response... 200 OK\n",
|
||||
"Length: 11597643 (11M) [application/octet-stream]\n",
|
||||
"Saving to: ‘quotes.csv.tgz’\n",
|
||||
"\n",
|
||||
"quotes.csv.tgz 100%[===================>] 11.06M 1.94MB/s in 6.1s \n",
|
||||
"\n",
|
||||
"2024-05-10 17:28:23 (1.81 MB/s) - ‘quotes.csv.tgz’ saved [11597643/11597643]\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"!wget https://github.com/aerospike/aerospike-vector-search-examples/raw/7dfab0fccca0852a511c6803aba46578729694b5/quote-semantic-search/container-volumes/quote-search/data/quotes.csv.tgz"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Load the Quotes Into Documents\n",
|
||||
"\n",
|
||||
"We will load our quotes dataset using the `CSVLoader` document loader. In this case, `lazy_load` returns an iterator to ingest our quotes more efficiently. In this example, we only load 5,000 quotes."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import itertools\n",
|
||||
"import os\n",
|
||||
"import tarfile\n",
|
||||
"\n",
|
||||
"from langchain_community.document_loaders.csv_loader import CSVLoader\n",
|
||||
"\n",
|
||||
"filename = \"./quotes.csv\"\n",
|
||||
"\n",
|
||||
"if not os.path.exists(filename) and os.path.exists(filename + \".tgz\"):\n",
|
||||
" # Untar the file\n",
|
||||
" with tarfile.open(filename + \".tgz\", \"r:gz\") as tar:\n",
|
||||
" tar.extractall(path=os.path.dirname(filename))\n",
|
||||
"\n",
|
||||
"NUM_QUOTES = 5000\n",
|
||||
"documents = CSVLoader(filename, metadata_columns=[\"author\", \"category\"]).lazy_load()\n",
|
||||
"documents = list(\n",
|
||||
" itertools.islice(documents, NUM_QUOTES)\n",
|
||||
") # Allows us to slice an iterator"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"page_content=\"quote: I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.\" metadata={'source': './quotes.csv', 'row': 0, 'author': 'Marilyn Monroe', 'category': 'attributed-no-source, best, life, love, mistakes, out-of-control, truth, worst'}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(documents[0])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create your Embedder\n",
|
||||
"\n",
|
||||
"In this step, we use HuggingFaceEmbeddings and the \"all-MiniLM-L6-v2\" sentence transformer model to embed our documents so we can perform a vector search."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "60662fc2676a46a2ac48fbf30d9c85fe",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"modules.json: 0%| | 0.00/349 [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "319412217d3944488f135c8bf8bca73b",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"config_sentence_transformers.json: 0%| | 0.00/116 [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "eb020ec2e2f4486294f85c490ef4a387",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"README.md: 0%| | 0.00/10.7k [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "65d248263e4049bea4f6b554640a6aae",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"sentence_bert_config.json: 0%| | 0.00/53.0 [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/opt/conda/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
|
||||
" warnings.warn(\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "c6b09a49fbd84c799ea28ace296406e3",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"config.json: 0%| | 0.00/612 [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/opt/conda/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
|
||||
" warnings.warn(\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "7e649688c67544d5af6bdd883c47d315",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"model.safetensors: 0%| | 0.00/90.9M [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "de447c7e4df1485ead14efae1faf96d6",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"tokenizer_config.json: 0%| | 0.00/350 [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "83ad1f289cd04f73aafca01a8e68e63b",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"vocab.txt: 0%| | 0.00/232k [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "2b612221e29e433cb50a54a6b838f5af",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"tokenizer.json: 0%| | 0.00/466k [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "1f5f0c29c58642478cd665731728dad0",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"special_tokens_map.json: 0%| | 0.00/112 [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "dff1d16a5a6d4d20ac39adb5c9425cf6",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"1_Pooling/config.json: 0%| | 0.00/190 [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from aerospike_vector_search.types import VectorDistanceMetric\n",
|
||||
"from langchain_community.embeddings import HuggingFaceEmbeddings\n",
|
||||
"\n",
|
||||
"MODEL_DIM = 384\n",
|
||||
"MODEL_DISTANCE_CALC = VectorDistanceMetric.COSINE\n",
|
||||
"embedder = HuggingFaceEmbeddings(model_name=\"all-MiniLM-L6-v2\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create an Aerospike Index and Embed Documents\n",
|
||||
"\n",
|
||||
"Before we add documents, we need to create an index in the Aerospike Database. In the example below, we use some convenience code that checks to see if the expected index already exists."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"quote-miniLM-L6-v2 does not exist. Creating index\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from aerospike_vector_search import AdminClient, Client, HostPort\n",
|
||||
"from aerospike_vector_search.types import VectorDistanceMetric\n",
|
||||
"from langchain_community.vectorstores import Aerospike\n",
|
||||
"\n",
|
||||
"# Here we are using the AVS host and port you configured earlier\n",
|
||||
"seed = HostPort(host=PROXIMUS_HOST, port=PROXIMUS_PORT)\n",
|
||||
"\n",
|
||||
"# The namespace of where to place our vectors. This should match the vector configured in your docstore.conf file.\n",
|
||||
"NAMESPACE = \"test\"\n",
|
||||
"\n",
|
||||
"# The name of our new index.\n",
|
||||
"INDEX_NAME = \"quote-miniLM-L6-v2\"\n",
|
||||
"\n",
|
||||
"# AVS needs to know which metadata key contains our vector when creating the index and inserting documents.\n",
|
||||
"VECTOR_KEY = \"vector\"\n",
|
||||
"\n",
|
||||
"client = Client(seeds=seed)\n",
|
||||
"admin_client = AdminClient(\n",
|
||||
" seeds=seed,\n",
|
||||
")\n",
|
||||
"index_exists = False\n",
|
||||
"\n",
|
||||
"# Check if the index already exists. If not, create it\n",
|
||||
"for index in admin_client.index_list():\n",
|
||||
" if index[\"id\"][\"namespace\"] == NAMESPACE and index[\"id\"][\"name\"] == INDEX_NAME:\n",
|
||||
" index_exists = True\n",
|
||||
" print(f\"{INDEX_NAME} already exists. Skipping creation\")\n",
|
||||
" break\n",
|
||||
"\n",
|
||||
"if not index_exists:\n",
|
||||
" print(f\"{INDEX_NAME} does not exist. Creating index\")\n",
|
||||
" admin_client.index_create(\n",
|
||||
" namespace=NAMESPACE,\n",
|
||||
" name=INDEX_NAME,\n",
|
||||
" vector_field=VECTOR_KEY,\n",
|
||||
" vector_distance_metric=MODEL_DISTANCE_CALC,\n",
|
||||
" dimensions=MODEL_DIM,\n",
|
||||
" index_meta_data={\n",
|
||||
" \"model\": \"miniLM-L6-v2\",\n",
|
||||
" \"date\": \"05/04/2024\",\n",
|
||||
" \"dim\": str(MODEL_DIM),\n",
|
||||
" \"distance\": \"cosine\",\n",
|
||||
" },\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"admin_client.close()\n",
|
||||
"\n",
|
||||
"docstore = Aerospike.from_documents(\n",
|
||||
" documents,\n",
|
||||
" embedder,\n",
|
||||
" client=client,\n",
|
||||
" namespace=NAMESPACE,\n",
|
||||
" vector_key=VECTOR_KEY,\n",
|
||||
" index_name=INDEX_NAME,\n",
|
||||
" distance_strategy=MODEL_DISTANCE_CALC,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Search the Documents\n",
|
||||
"Now that we have embedded our vectors, we can use vector search on our quotes."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"~~~~ Document 0 ~~~~\n",
|
||||
"auto-generated id: f53589dd-e3e0-4f55-8214-766ca8dc082f\n",
|
||||
"author: Carl Sagan, Cosmos\n",
|
||||
"quote: The Cosmos is all that is or was or ever will be. Our feeblest contemplations of the Cosmos stir us -- there is a tingling in the spine, a catch in the voice, a faint sensation, as if a distant memory, of falling from a height. We know we are approaching the greatest of mysteries.\n",
|
||||
"~~~~~~~~~~~~~~~~~~~~\n",
|
||||
"\n",
|
||||
"~~~~ Document 1 ~~~~\n",
|
||||
"auto-generated id: dde3e5d1-30b7-47b4-aab7-e319d14e1810\n",
|
||||
"author: Elizabeth Gilbert\n",
|
||||
"quote: The love that moves the sun and the other stars.\n",
|
||||
"~~~~~~~~~~~~~~~~~~~~\n",
|
||||
"\n",
|
||||
"~~~~ Document 2 ~~~~\n",
|
||||
"auto-generated id: fd56575b-2091-45e7-91c1-9efff2fe5359\n",
|
||||
"author: Renee Ahdieh, The Rose & the Dagger\n",
|
||||
"quote: From the stars, to the stars.\n",
|
||||
"~~~~~~~~~~~~~~~~~~~~\n",
|
||||
"\n",
|
||||
"~~~~ Document 3 ~~~~\n",
|
||||
"auto-generated id: 8567ed4e-885b-44a7-b993-e0caf422b3c9\n",
|
||||
"author: Dante Alighieri, Paradiso\n",
|
||||
"quote: Love, that moves the sun and the other stars\n",
|
||||
"~~~~~~~~~~~~~~~~~~~~\n",
|
||||
"\n",
|
||||
"~~~~ Document 4 ~~~~\n",
|
||||
"auto-generated id: f868c25e-c54d-48cd-a5a8-14bf402f9ea8\n",
|
||||
"author: Thich Nhat Hanh, Teachings on Love\n",
|
||||
"quote: Through my love for you, I want to express my love for the whole cosmos, the whole of humanity, and all beings. By living with you, I want to learn to love everyone and all species. If I succeed in loving you, I will be able to love everyone and all species on Earth... This is the real message of love.\n",
|
||||
"~~~~~~~~~~~~~~~~~~~~\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = \"A quote about the beauty of the cosmos\"\n",
|
||||
"docs = docstore.similarity_search(\n",
|
||||
" query, k=5, index_name=INDEX_NAME, metadata_keys=[\"_id\", \"author\"]\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def print_documents(docs):\n",
|
||||
" for i, doc in enumerate(docs):\n",
|
||||
" print(\"~~~~ Document\", i, \"~~~~\")\n",
|
||||
" print(\"auto-generated id:\", doc.metadata[\"_id\"])\n",
|
||||
" print(\"author: \", doc.metadata[\"author\"])\n",
|
||||
" print(doc.page_content)\n",
|
||||
" print(\"~~~~~~~~~~~~~~~~~~~~\\n\")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"print_documents(docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Embedding Additional Quotes as Text\n",
|
||||
"\n",
|
||||
"We can use `add_texts` to add additional quotes."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"New IDs\n",
|
||||
"['972846bd-87ae-493b-8ba3-a3d023c03948', '8171122e-cbda-4eb7-a711-6625b120893b', '53b54409-ac19-4d90-b518-d7c40bf5ee5d']\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docstore = Aerospike(\n",
|
||||
" client,\n",
|
||||
" embedder,\n",
|
||||
" NAMESPACE,\n",
|
||||
" index_name=INDEX_NAME,\n",
|
||||
" vector_key=VECTOR_KEY,\n",
|
||||
" distance_strategy=MODEL_DISTANCE_CALC,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"ids = docstore.add_texts(\n",
|
||||
" [\n",
|
||||
" \"quote: Rebellions are built on hope.\",\n",
|
||||
" \"quote: Logic is the beginning of wisdom, not the end.\",\n",
|
||||
" \"quote: If wishes were fishes, we’d all cast nets.\",\n",
|
||||
" ],\n",
|
||||
" metadatas=[\n",
|
||||
" {\"author\": \"Jyn Erso, Rogue One\"},\n",
|
||||
" {\"author\": \"Spock, Star Trek\"},\n",
|
||||
" {\"author\": \"Frank Herbert, Dune\"},\n",
|
||||
" ],\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"print(\"New IDs\")\n",
|
||||
"print(ids)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Search Documents Using Max Marginal Relevance Search\n",
|
||||
"\n",
|
||||
"We can use max marginal relevance search to find vectors that are similar to our query but dissimilar to each other. In this example, we create a retriever object using `as_retriever`, but this could be done just as easily by calling `docstore.max_marginal_relevance_search` directly. The `lambda_mult` search argument determines the diversity of our query response. 0 corresponds to maximum diversity and 1 to minimum diversity."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"~~~~ Document 0 ~~~~\n",
|
||||
"auto-generated id: 67d5b23f-b2d2-4872-80ad-5834ea08aa64\n",
|
||||
"author: John Grogan, Marley and Me: Life and Love With the World's Worst Dog\n",
|
||||
"quote: Such short little lives our pets have to spend with us, and they spend most of it waiting for us to come home each day. It is amazing how much love and laughter they bring into our lives and even how much closer we become with each other because of them.\n",
|
||||
"~~~~~~~~~~~~~~~~~~~~\n",
|
||||
"\n",
|
||||
"~~~~ Document 1 ~~~~\n",
|
||||
"auto-generated id: a9b28eb0-a21c-45bf-9e60-ab2b80e988d8\n",
|
||||
"author: John Grogan, Marley and Me: Life and Love With the World's Worst Dog\n",
|
||||
"quote: Dogs are great. Bad dogs, if you can really call them that, are perhaps the greatest of them all.\n",
|
||||
"~~~~~~~~~~~~~~~~~~~~\n",
|
||||
"\n",
|
||||
"~~~~ Document 2 ~~~~\n",
|
||||
"auto-generated id: ee7434c8-2551-4651-8a22-58514980fb4a\n",
|
||||
"author: Colleen Houck, Tiger's Curse\n",
|
||||
"quote: He then put both hands on the door on either side of my head and leaned in close, pinning me against it. I trembled like a downy rabbit caught in the clutches of a wolf. The wolf came closer. He bent his head and began nuzzling my cheek. The problem was…I wanted the wolf to devour me.\n",
|
||||
"~~~~~~~~~~~~~~~~~~~~\n",
|
||||
"\n",
|
||||
"~~~~ Document 3 ~~~~\n",
|
||||
"auto-generated id: 9170804c-a155-473b-ab93-8a561dd48f91\n",
|
||||
"author: Ray Bradbury\n",
|
||||
"quote: Stuff your eyes with wonder,\" he said, \"live as if you'd drop dead in ten seconds. See the world. It's more fantastic than any dream made or paid for in factories. Ask no guarantees, ask for no security, there never was such an animal. And if there were, it would be related to the great sloth which hangs upside down in a tree all day every day, sleeping its life away. To hell with that,\" he said, \"shake the tree and knock the great sloth down on his ass.\n",
|
||||
"~~~~~~~~~~~~~~~~~~~~\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = \"A quote about our favorite four-legged pets\"\n",
|
||||
"retriever = docstore.as_retriever(\n",
|
||||
" search_type=\"mmr\", search_kwargs={\"fetch_k\": 20, \"lambda_mult\": 0.7}\n",
|
||||
")\n",
|
||||
"matched_docs = retriever.invoke(query)\n",
|
||||
"\n",
|
||||
"print_documents(matched_docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Search Documents with a Relevance Threshold\n",
|
||||
"\n",
|
||||
"Another useful feature is a similarity search with a relevance threshold. Generally, we only want results that are most similar to our query but also within some range of proximity. A relevance of 1 is most similar and a relevance of 0 is most dissimilar."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"~~~~ Document 0 ~~~~\n",
|
||||
"auto-generated id: 2c1d6ee1-b742-45ea-bed6-24a1f655c849\n",
|
||||
"author: Roy T. Bennett, The Light in the Heart\n",
|
||||
"quote: Never lose hope. Storms make people stronger and never last forever.\n",
|
||||
"~~~~~~~~~~~~~~~~~~~~\n",
|
||||
"\n",
|
||||
"~~~~ Document 1 ~~~~\n",
|
||||
"auto-generated id: 5962c2cf-ffb5-4e03-9257-bdd630b5c7e9\n",
|
||||
"author: Roy T. Bennett, The Light in the Heart\n",
|
||||
"quote: Difficulties and adversities viciously force all their might on us and cause us to fall apart, but they are necessary elements of individual growth and reveal our true potential. We have got to endure and overcome them, and move forward. Never lose hope. Storms make people stronger and never last forever.\n",
|
||||
"~~~~~~~~~~~~~~~~~~~~\n",
|
||||
"\n",
|
||||
"~~~~ Document 2 ~~~~\n",
|
||||
"auto-generated id: 3bbcc4ca-de89-4196-9a46-190a50bf6c47\n",
|
||||
"author: Vincent van Gogh, The Letters of Vincent van Gogh\n",
|
||||
"quote: There is peace even in the storm\n",
|
||||
"~~~~~~~~~~~~~~~~~~~~\n",
|
||||
"\n",
|
||||
"~~~~ Document 3 ~~~~\n",
|
||||
"auto-generated id: 37d8cf02-fc2f-429d-b2b6-260a05286108\n",
|
||||
"author: Edwin Morgan, A Book of Lives\n",
|
||||
"quote: Valentine WeatherKiss me with rain on your eyelashes,come on, let us sway together,under the trees, and to hell with thunder.\n",
|
||||
"~~~~~~~~~~~~~~~~~~~~\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = \"A quote about stormy weather\"\n",
|
||||
"retriever = docstore.as_retriever(\n",
|
||||
" search_type=\"similarity_score_threshold\",\n",
|
||||
" search_kwargs={\n",
|
||||
" \"score_threshold\": 0.4\n",
|
||||
" }, # A greater value returns items with more relevance\n",
|
||||
")\n",
|
||||
"matched_docs = retriever.invoke(query)\n",
|
||||
"\n",
|
||||
"print_documents(matched_docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Clean up\n",
|
||||
"\n",
|
||||
"We need to make sure we close our client to release resources and clean up threads."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"client.close()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Ready. Set. Search!\n",
|
||||
"\n",
|
||||
"Now that you are up to speed with Aerospike Vector Search's LangChain integration, you have the power of the Aerospike Database and the LangChain ecosystem at your finger tips. Happy building!"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
Reference in New Issue
Block a user