mirror of
https://github.com/hwchase17/langchain.git
synced 2025-09-09 15:03:21 +00:00
community[minor]: added jaguar vector store (#14838)
Description: A new vector store, Jaguar, is being added, along with its class, test scripts, and documentation. Issue: None -- this is the first PR contributing to LangChain. Dependencies: depends on the "jaguardb-http-client" HTTP client package (pip install -U jaguardb-http-client). Tag maintainer: @baskaryan, @eyurtsev, @hwchase1 Twitter handle: @workbot --------- Co-authored-by: JY <jyjy@jaguardb> Co-authored-by: Bagatur <baskaryan@gmail.com>
This commit is contained in:
62
docs/docs/integrations/providers/jaguar.mdx
Normal file
@@ -0,0 +1,62 @@
# Jaguar

This page describes how to use the Jaguar vector database within LangChain.

It contains three sections: introduction, installation and setup, and the Jaguar API.

## Introduction

The Jaguar vector database has the following characteristics:

1. It is a distributed vector database
2. The “ZeroMove” feature of JaguarDB enables instant horizontal scalability
3. Multimodal: embeddings, text, images, videos, PDFs, audio, time series, and geospatial
4. All-masters: allows both parallel reads and writes
5. Anomaly detection capabilities
6. RAG support: combines LLMs with proprietary and real-time data
7. Shared metadata: sharing of metadata across multiple vector indexes
8. Distance metrics: Euclidean, Cosine, InnerProduct, Manhattan, Chebyshev, Hamming, Jaccard, Minkowski
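The distance metrics in the list above can be illustrated with plain Python. A minimal sketch (not using Jaguar itself) of four of the metrics on small vectors:

```python
import math

def euclidean(a, b):
    # L2 distance: square root of the summed squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # L1 distance: summed absolute differences
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # L-infinity distance: largest per-coordinate difference
    return max(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 minus the cosine similarity of the two vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

a, b = [1.0, 0.0, 2.0], [0.0, 1.0, 2.0]
print(euclidean(a, b))  # → 1.4142135623730951
print(manhattan(a, b))  # → 2.0
print(chebyshev(a, b))  # → 1.0
```

In a vector store, the chosen metric determines which stored embeddings count as "nearest" to the query embedding; cosine (used in the notebook examples below via `vector_type = "cosine_fraction_float"`) compares direction rather than magnitude.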
[Overview of the Jaguar scalable vector database](http://www.jaguardb.com)

You can run JaguarDB in a Docker container, or download the software and run it on-cloud or off-cloud.

## Installation and Setup

- Install JaguarDB on one host or multiple hosts
- Install the Jaguar HTTP Gateway server on one host
- Install the JaguarDB HTTP Client package

The steps are described in the [Jaguar Documents](http://www.jaguardb.com/support.html).

Environment variables in client programs:

export OPENAI_API_KEY="......"

export JAGUAR_API_KEY="......"
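Per the notebooks below, the client looks for its API key in the JAGUAR_API_KEY environment variable or in the file $HOME/.jagrc. That lookup order can be sketched in a few lines; `resolve_jaguar_api_key` is a hypothetical helper for illustration, not part of the client package:

```python
import os
from pathlib import Path

def resolve_jaguar_api_key(env=None, rc_path=None):
    # Hypothetical helper: prefer the JAGUAR_API_KEY environment
    # variable, then fall back to the contents of ~/.jagrc.
    env = os.environ if env is None else env
    rc_path = Path.home() / ".jagrc" if rc_path is None else rc_path
    key = env.get("JAGUAR_API_KEY", "").strip()
    if key:
        return key
    if rc_path.exists():
        return rc_path.read_text().strip()
    raise RuntimeError("set JAGUAR_API_KEY or create ~/.jagrc")

print(resolve_jaguar_api_key(env={"JAGUAR_API_KEY": "demo-key"}))  # → demo-key
```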
## Jaguar API

Together with LangChain, a Jaguar client class is provided by importing it in Python:

```python
from langchain_community.vectorstores.jaguar import Jaguar
```

Supported API functions of the Jaguar class are:

- `add_texts`
- `add_documents`
- `from_texts`
- `from_documents`
- `similarity_search`
- `is_anomalous`
- `create`
- `delete`
- `clear`
- `drop`
- `login`
- `logout`

For more details of the Jaguar API, please refer to [this notebook](/docs/integrations/vectorstores/jaguar).
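The typical call order over these functions, as exercised in the notebooks below (login, create, add_texts, similarity_search, clear, drop, logout), can be sketched with an in-memory stand-in. `FakeJaguar` is illustrative only, not the real client, which talks to the HTTP gateway:

```python
class FakeJaguar:
    # Illustrative in-memory stand-in showing the call sequence only.
    def __init__(self):
        self.texts, self.authorized, self.created = [], False, False

    def login(self):
        # Real client: authorizes against JaguarDB using the API key
        self.authorized = True

    def create(self, metadata, text_size):
        # Real client: creates the store on the server (run once)
        self.created = True

    def add_texts(self, texts, metadatas=None):
        self.texts.extend(texts)
        return list(range(len(texts)))

    def similarity_search(self, query, k=1):
        # Real client ranks stored texts by vector distance to the
        # query embedding; here we just return the first k texts.
        return self.texts[:k]

    def clear(self):
        self.texts = []

    def drop(self):
        self.created = False

    def logout(self):
        self.authorized = False

store = FakeJaguar()
store.login()
store.create("category char(16)", 1024)
store.add_texts(["foo", "bar"])
print(store.similarity_search("foo", k=1))  # → ['foo']
store.clear()
store.drop()
store.logout()
```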
246
docs/docs/integrations/retrievers/jaguar.ipynb
Normal file
@@ -0,0 +1,246 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "671e9ec1-fa00-4c92-a2fb-ceb142168ea9",
   "metadata": {},
   "source": [
    "# Jaguar Vector Database\n",
    "\n",
    "1. It is a distributed vector database\n",
    "2. The “ZeroMove” feature of JaguarDB enables instant horizontal scalability\n",
    "3. Multimodal: embeddings, text, images, videos, PDFs, audio, time series, and geospatial\n",
    "4. All-masters: allows both parallel reads and writes\n",
    "5. Anomaly detection capabilities\n",
    "6. RAG support: combines LLMs with proprietary and real-time data\n",
    "7. Shared metadata: sharing of metadata across multiple vector indexes\n",
    "8. Distance metrics: Euclidean, Cosine, InnerProduct, Manhattan, Chebyshev, Hamming, Jaccard, Minkowski"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a87dc28-1344-4003-b31a-13e4cb71bf48",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "\n",
    "There are two requirements for running the examples in this file.\n",
    "1. You must install and set up the JaguarDB server and its HTTP gateway server.\n",
    "   Please refer to the instructions in:\n",
    "   [www.jaguardb.com](http://www.jaguardb.com)\n",
    "\n",
    "2. You must install the HTTP client package for JaguarDB:\n",
    "   ```\n",
    "   pip install -U jaguardb-http-client\n",
    "   ```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7d56993-4809-4e42-a409-94d3a7305ad8",
   "metadata": {},
   "source": [
    "## RAG with LangChain\n",
    "\n",
    "This section demonstrates chatting with an LLM together with Jaguar in the LangChain software stack.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d62c2393-5c7c-4bb6-8367-c4389fa36a4e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import TextLoader\n",
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "from langchain.text_splitter import CharacterTextSplitter\n",
    "from langchain_community.vectorstores.jaguar import Jaguar\n",
    "\n",
    "\"\"\"\n",
    "Load a text file into a set of documents\n",
    "\"\"\"\n",
    "loader = TextLoader(\"../../modules/state_of_the_union.txt\")\n",
    "documents = loader.load()\n",
    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=300)\n",
    "docs = text_splitter.split_documents(documents)\n",
    "\n",
    "\"\"\"\n",
    "Instantiate a Jaguar vector store\n",
    "\"\"\"\n",
    "### Jaguar HTTP endpoint\n",
    "url = \"http://192.168.5.88:8080/fwww/\"\n",
    "\n",
    "### Use OpenAI embedding model\n",
    "embeddings = OpenAIEmbeddings()\n",
    "\n",
    "### Pod is a database for vectors\n",
    "pod = \"vdb\"\n",
    "\n",
    "### Vector store name\n",
    "store = \"langchain_rag_store\"\n",
    "\n",
    "### Vector index name\n",
    "vector_index = \"v\"\n",
    "\n",
    "### Type of the vector index\n",
    "# cosine: distance metric\n",
    "# fraction: embedding vectors are decimal numbers\n",
    "# float: values stored with floating-point numbers\n",
    "vector_type = \"cosine_fraction_float\"\n",
    "\n",
    "### Dimension of each embedding vector\n",
    "vector_dimension = 1536\n",
    "\n",
    "### Instantiate a Jaguar store object\n",
    "vectorstore = Jaguar(\n",
    "    pod, store, vector_index, vector_type, vector_dimension, url, embeddings\n",
    ")\n",
    "\n",
    "\"\"\"\n",
    "Login must be performed to authorize the client.\n",
    "The environment variable JAGUAR_API_KEY or file $HOME/.jagrc\n",
    "should contain the API key for accessing JaguarDB servers.\n",
    "\"\"\"\n",
    "vectorstore.login()\n",
    "\n",
    "\n",
    "\"\"\"\n",
    "Create the vector store on the JaguarDB database server.\n",
    "This should be done only once.\n",
    "\"\"\"\n",
    "# Extra metadata fields for the vector store\n",
    "metadata = \"category char(16)\"\n",
    "\n",
    "# Number of characters for the text field of the store\n",
    "text_size = 4096\n",
    "\n",
    "# Create a vector store on the server\n",
    "vectorstore.create(metadata, text_size)\n",
    "\n",
    "\"\"\"\n",
    "Add the texts from the text splitter to our vectorstore\n",
    "\"\"\"\n",
    "vectorstore.add_documents(docs)\n",
    "\n",
    "\"\"\" Get the retriever object \"\"\"\n",
    "retriever = vectorstore.as_retriever()\n",
    "# retriever = vectorstore.as_retriever(search_kwargs={\"where\": \"m1='123' and m2='abc'\"})\n",
    "\n",
    "\"\"\" The retriever object can be used with LangChain and an LLM \"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11178867-d143-4a10-93bf-278f5f10dc1a",
   "metadata": {},
   "source": [
    "## Interaction With Jaguar Vector Store\n",
    "\n",
    "Users can interact directly with the Jaguar vector store for similarity search and anomaly detection.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c9a53cb5-e298-4125-9ace-0d851198869a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "from langchain_community.vectorstores.jaguar import Jaguar\n",
    "\n",
    "# Instantiate a Jaguar vector store object\n",
    "url = \"http://192.168.3.88:8080/fwww/\"\n",
    "pod = \"vdb\"\n",
    "store = \"langchain_test_store\"\n",
    "vector_index = \"v\"\n",
    "vector_type = \"cosine_fraction_float\"\n",
    "vector_dimension = 10\n",
    "embeddings = OpenAIEmbeddings()\n",
    "vectorstore = Jaguar(\n",
    "    pod, store, vector_index, vector_type, vector_dimension, url, embeddings\n",
    ")\n",
    "\n",
    "# Login for authorization\n",
    "vectorstore.login()\n",
    "\n",
    "# Create the vector store with two metadata fields\n",
    "# This needs to be run only once.\n",
    "metadata_str = \"author char(32), category char(16)\"\n",
    "vectorstore.create(metadata_str, 1024)\n",
    "\n",
    "# Add a list of texts\n",
    "texts = [\"foo\", \"bar\", \"baz\"]\n",
    "metadatas = [\n",
    "    {\"author\": \"Adam\", \"category\": \"Music\"},\n",
    "    {\"author\": \"Eve\", \"category\": \"Music\"},\n",
    "    {\"author\": \"John\", \"category\": \"History\"},\n",
    "]\n",
    "ids = vectorstore.add_texts(texts=texts, metadatas=metadatas)\n",
    "\n",
    "# Search similar text\n",
    "output = vectorstore.similarity_search(\n",
    "    query=\"foo\",\n",
    "    k=1,\n",
    "    metadatas=[\"author\", \"category\"],\n",
    ")\n",
    "assert output[0].page_content == \"foo\"\n",
    "assert output[0].metadata[\"author\"] == \"Adam\"\n",
    "assert output[0].metadata[\"category\"] == \"Music\"\n",
    "assert len(output) == 1\n",
    "\n",
    "# Search with filtering (where)\n",
    "where = \"author='Eve'\"\n",
    "output = vectorstore.similarity_search(\n",
    "    query=\"foo\",\n",
    "    k=3,\n",
    "    fetch_k=9,\n",
    "    where=where,\n",
    "    metadatas=[\"author\", \"category\"],\n",
    ")\n",
    "assert output[0].page_content == \"bar\"\n",
    "assert output[0].metadata[\"author\"] == \"Eve\"\n",
    "assert output[0].metadata[\"category\"] == \"Music\"\n",
    "assert len(output) == 1\n",
    "\n",
    "# Anomaly detection\n",
    "result = vectorstore.is_anomalous(\n",
    "    query=\"dogs can jump high\",\n",
    ")\n",
    "assert result is False\n",
    "\n",
    "# Remove all data in the store\n",
    "vectorstore.clear()\n",
    "assert vectorstore.count() == 0\n",
    "\n",
    "# Remove the store completely\n",
    "vectorstore.drop()\n",
    "\n",
    "# Logout\n",
    "vectorstore.logout()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
271
docs/docs/integrations/vectorstores/jaguar.ipynb
Normal file
@@ -0,0 +1,271 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "671e9ec1-fa00-4c92-a2fb-ceb142168ea9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Jaguar Vector Database\n",
|
||||
"\n",
|
||||
"1. It is a distributed vector database\n",
|
||||
"2. The “ZeroMove” feature of JaguarDB enables instant horizontal scalability\n",
|
||||
"3. Multimodal: embeddings, text, images, videos, PDFs, audio, time series, and geospatial\n",
|
||||
"4. All-masters: allows both parallel reads and writes\n",
|
||||
"5. Anomaly detection capabilities\n",
|
||||
"6. RAG support: combines LLM with proprietary and real-time data\n",
|
||||
"7. Shared metadata: sharing of metadata across multiple vector indexes\n",
|
||||
"8. Distance metrics: Euclidean, Cosine, InnerProduct, Manhatten, Chebyshev, Hamming, Jeccard, Minkowski"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1a87dc28-1344-4003-b31a-13e4cb71bf48",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Prerequisites\n",
|
||||
"\n",
|
||||
"There are two requirements for running the examples in this file.\n",
|
||||
"1. You must install and set up the JaguarDB server and its HTTP gateway server.\n",
|
||||
" Please refer to the instructions in:\n",
|
||||
" [www.jaguardb.com](http://www.jaguardb.com)\n",
|
||||
"\n",
|
||||
"2. You must install the http client package for JaguarDB:\n",
|
||||
" ```\n",
|
||||
" pip install -U jaguardb-http-client\n",
|
||||
" ```\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c7d56993-4809-4e42-a409-94d3a7305ad8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## RAG With Langchain\n",
|
||||
"\n",
|
||||
"This section demonstrates chatting with LLM together with Jaguar in the langchain software stack.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "d62c2393-5c7c-4bb6-8367-c4389fa36a4e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import RetrievalQAWithSourcesChain\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.prompts import ChatPromptTemplate\n",
|
||||
"from langchain.schema.output_parser import StrOutputParser\n",
|
||||
"from langchain.schema.runnable import RunnablePassthrough\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain_community.vectorstores.jaguar import Jaguar\n",
|
||||
"\n",
|
||||
"\"\"\" \n",
|
||||
"Load a text file into a set of documents \n",
|
||||
"\"\"\"\n",
|
||||
"loader = TextLoader(\"../../modules/state_of_the_union.txt\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=300)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"\"\"\"\n",
|
||||
"Instantiate a Jaguar vector store\n",
|
||||
"\"\"\"\n",
|
||||
"### Jaguar HTTP endpoint\n",
|
||||
"url = \"http://192.168.5.88:8080/fwww/\"\n",
|
||||
"\n",
|
||||
"### Use OpenAI embedding model\n",
|
||||
"embeddings = OpenAIEmbeddings()\n",
|
||||
"\n",
|
||||
"### Pod is a database for vectors\n",
|
||||
"pod = \"vdb\"\n",
|
||||
"\n",
|
||||
"### Vector store name\n",
|
||||
"store = \"langchain_rag_store\"\n",
|
||||
"\n",
|
||||
"### Vector index name\n",
|
||||
"vector_index = \"v\"\n",
|
||||
"\n",
|
||||
"### Type of the vector index\n",
|
||||
"# cosine: distance metric\n",
|
||||
"# fraction: embedding vectors are decimal numbers\n",
|
||||
"# float: values stored with floating-point numbers\n",
|
||||
"vector_type = \"cosine_fraction_float\"\n",
|
||||
"\n",
|
||||
"### Dimension of each embedding vector\n",
|
||||
"vector_dimension = 1536\n",
|
||||
"\n",
|
||||
"### Instantiate a Jaguar store object\n",
|
||||
"vectorstore = Jaguar(\n",
|
||||
" pod, store, vector_index, vector_type, vector_dimension, url, embeddings\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"\"\"\"\n",
|
||||
"Login must be performed to authorize the client.\n",
|
||||
"The environment variable JAGUAR_API_KEY or file $HOME/.jagrc\n",
|
||||
"should contain the API key for accessing JaguarDB servers.\n",
|
||||
"\"\"\"\n",
|
||||
"vectorstore.login()\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\"\"\"\n",
|
||||
"Create vector store on the JaguarDB database server.\n",
|
||||
"This should be done only once.\n",
|
||||
"\"\"\"\n",
|
||||
"# Extra metadata fields for the vector store\n",
|
||||
"metadata = \"category char(16)\"\n",
|
||||
"\n",
|
||||
"# Number of characters for the text field of the store\n",
|
||||
"text_size = 4096\n",
|
||||
"\n",
|
||||
"# Create a vector store on the server\n",
|
||||
"vectorstore.create(metadata, text_size)\n",
|
||||
"\n",
|
||||
"\"\"\"\n",
|
||||
"Add the texts from the text splitter to our vectorstore\n",
|
||||
"\"\"\"\n",
|
||||
"vectorstore.add_documents(docs)\n",
|
||||
"\n",
|
||||
"\"\"\" Get the retriever object \"\"\"\n",
|
||||
"retriever = vectorstore.as_retriever()\n",
|
||||
"# retriever = vectorstore.as_retriever(search_kwargs={\"where\": \"m1='123' and m2='abc'\"})\n",
|
||||
"\n",
|
||||
"template = \"\"\"You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\n",
|
||||
"Question: {question}\n",
|
||||
"Context: {context}\n",
|
||||
"Answer:\n",
|
||||
"\"\"\"\n",
|
||||
"prompt = ChatPromptTemplate.from_template(template)\n",
|
||||
"\n",
|
||||
"\"\"\" Obtain a Large Language Model \"\"\"\n",
|
||||
"LLM = ChatOpenAI(model_name=\"gpt-3.5-turbo\", temperature=0)\n",
|
||||
"\n",
|
||||
"\"\"\" Create a chain for the RAG flow \"\"\"\n",
|
||||
"rag_chain = (\n",
|
||||
" {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
|
||||
" | prompt\n",
|
||||
" | LLM\n",
|
||||
" | StrOutputParser()\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"resp = rag_chain.invoke(\"What did the president say about Justice Breyer?\")\n",
|
||||
"print(resp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "11178867-d143-4a10-93bf-278f5f10dc1a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Interaction With Jaguar Vector Store\n",
|
||||
"\n",
|
||||
"Users can interact directly with the Jaguar vector store for similarity search and anomaly detection.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c9a53cb5-e298-4125-9ace-0d851198869a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain_community.vectorstores.jaguar import Jaguar\n",
|
||||
"\n",
|
||||
"# Instantiate a Jaguar vector store object\n",
|
||||
"url = \"http://192.168.3.88:8080/fwww/\"\n",
|
||||
"pod = \"vdb\"\n",
|
||||
"store = \"langchain_test_store\"\n",
|
||||
"vector_index = \"v\"\n",
|
||||
"vector_type = \"cosine_fraction_float\"\n",
|
||||
"vector_dimension = 10\n",
|
||||
"embeddings = OpenAIEmbeddings()\n",
|
||||
"vectorstore = Jaguar(\n",
|
||||
" pod, store, vector_index, vector_type, vector_dimension, url, embeddings\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Login for authorization\n",
|
||||
"vectorstore.login()\n",
|
||||
"\n",
|
||||
"# Create the vector store with two metadata fields\n",
|
||||
"# This needs to be run only once.\n",
|
||||
"metadata_str = \"author char(32), category char(16)\"\n",
|
||||
"vectorstore.create(metadata_str, 1024)\n",
|
||||
"\n",
|
||||
"# Add a list of texts\n",
|
||||
"texts = [\"foo\", \"bar\", \"baz\"]\n",
|
||||
"metadatas = [\n",
|
||||
" {\"author\": \"Adam\", \"category\": \"Music\"},\n",
|
||||
" {\"author\": \"Eve\", \"category\": \"Music\"},\n",
|
||||
" {\"author\": \"John\", \"category\": \"History\"},\n",
|
||||
"]\n",
|
||||
"ids = vectorstore.add_texts(texts=texts, metadatas=metadatas)\n",
|
||||
"\n",
|
||||
"# Search similar text\n",
|
||||
"output = vectorstore.similarity_search(\n",
|
||||
" query=\"foo\",\n",
|
||||
" k=1,\n",
|
||||
" metadatas=[\"author\", \"category\"],\n",
|
||||
")\n",
|
||||
"assert output[0].page_content == \"foo\"\n",
|
||||
"assert output[0].metadata[\"author\"] == \"Adam\"\n",
|
||||
"assert output[0].metadata[\"category\"] == \"Music\"\n",
|
||||
"assert len(output) == 1\n",
|
||||
"\n",
|
||||
"# Search with filtering (where)\n",
|
||||
"where = \"author='Eve'\"\n",
|
||||
"output = vectorstore.similarity_search(\n",
|
||||
" query=\"foo\",\n",
|
||||
" k=3,\n",
|
||||
" fetch_k=9,\n",
|
||||
" where=where,\n",
|
||||
" metadatas=[\"author\", \"category\"],\n",
|
||||
")\n",
|
||||
"assert output[0].page_content == \"bar\"\n",
|
||||
"assert output[0].metadata[\"author\"] == \"Eve\"\n",
|
||||
"assert output[0].metadata[\"category\"] == \"Music\"\n",
|
||||
"assert len(output) == 1\n",
|
||||
"\n",
|
||||
"# Anomaly detection\n",
|
||||
"result = vectorstore.is_anomalous(\n",
|
||||
" query=\"dogs can jump high\",\n",
|
||||
")\n",
|
||||
"assert result is False\n",
|
||||
"\n",
|
||||
"# Remove all data in the store\n",
|
||||
"vectorstore.clear()\n",
|
||||
"assert vectorstore.count() == 0\n",
|
||||
"\n",
|
||||
"# Remove the store completely\n",
|
||||
"vectorstore.drop()\n",
|
||||
"\n",
|
||||
"# Logout\n",
|
||||
"vectorstore.logout()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}