mirror of
https://github.com/hwchase17/langchain.git
synced 2025-09-09 23:12:38 +00:00
Added AwaDB vector store, a wrapper over AwaDB that can be used as vector storage and supports efficient similarity search. Added integration tests for the vector store and a Jupyter notebook with an example. Deleted an unneeded empty file and resolved the conflict (https://github.com/hwchase17/langchain/pull/5886). Please check, thanks! @dev2049 @hwchase17

Co-authored-by: ljeagle <vincent_jieli@yeah.net>
Co-authored-by: vincent <awadb.vincent@gmail.com>
21 docs/integrations/awadb.md Normal file
@@ -0,0 +1,21 @@
# AwaDB

>[AwaDB](https://github.com/awa-ai/awadb) is an AI Native database for the search and storage of embedding vectors used by LLM Applications.

## Installation and Setup

```bash
pip install awadb
```

## VectorStore

There exists a wrapper around AwaDB vector databases, allowing you to use it as a vectorstore, whether for semantic search or example selection.

```python
from langchain.vectorstores import AwaDB
```

For a more detailed walkthrough of the AwaDB wrapper, see [this notebook](../modules/indexes/vectorstores/examples/awadb.ipynb).
194 docs/modules/indexes/vectorstores/examples/awadb.ipynb Normal file
@@ -0,0 +1,194 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "833c4789",
   "metadata": {},
   "source": [
    "# AwaDB\n",
    "[AwaDB](https://github.com/awa-ai/awadb) is an AI Native database for the search and storage of embedding vectors used by LLM Applications.\n",
    "This notebook shows how to use functionality related to AwaDB."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "252930ea",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install awadb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2b71a47",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.text_splitter import CharacterTextSplitter\n",
    "from langchain.vectorstores import AwaDB\n",
    "from langchain.document_loaders import TextLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "49be0bac",
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = TextLoader('../../../state_of_the_union.txt')\n",
    "documents = loader.load()\n",
    "text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)\n",
    "docs = text_splitter.split_documents(documents)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18714278",
   "metadata": {},
   "outputs": [],
   "source": [
    "db = AwaDB.from_documents(docs)\n",
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "docs = db.similarity_search(query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "62b7a4c5",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(docs[0].page_content)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9b4be48",
   "metadata": {},
   "source": [
    "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "87fec6b5",
   "metadata": {},
   "source": [
    "## Similarity search with score"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17231924",
   "metadata": {},
   "source": [
    "The returned distance score is between 0 and 1: 0 is dissimilar, 1 is most similar."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f40ddae1",
   "metadata": {},
   "outputs": [],
   "source": [
    "docs = db.similarity_search_with_score(query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f0045583",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(docs[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c2da99d",
   "metadata": {},
   "source": [
    "(Document(page_content='And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'}), 0.561813814013747)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0b49fb59",
   "metadata": {},
   "source": [
    "## Restore the previously created and populated table"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1bfa6e25",
   "metadata": {},
   "source": [
    "AwaDB automatically persists added document data."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2a0f3b35",
   "metadata": {},
   "source": [
    "If you want to restore the table that you previously created and added data to, you can do so as below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1fd4b5b0",
   "metadata": {},
   "outputs": [],
   "source": [
    "import awadb\n",
    "\n",
    "awadb_client = awadb.Client()\n",
    "ret = awadb_client.Load('langchain_awadb')\n",
    "if ret:\n",
    "    print('awadb load table success')\n",
    "else:\n",
    "    print('awadb load table failed')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ae9a9dd",
   "metadata": {},
   "source": [
    "awadb load table success"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
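The `from_documents` flow shown in the notebook ultimately packs each document into a small record before handing it to the AwaDB client: the raw text plus its precomputed embedding when an embedding model is available, otherwise an `{"embedding_text": ...}` dict so AwaDB embeds the text itself, with any metadata appended last. A minimal pure-Python sketch of that packing, mirroring the `add_texts` implementation in `langchain/vectorstores/awadb.py` below (the helper name `build_awadb_record` is illustrative, not part of the library):

```python
from typing import Any, Dict, List, Optional


def build_awadb_record(
    text: str,
    embedding: Optional[List[float]] = None,
    metadata: Optional[Dict[str, Any]] = None,
) -> List[Any]:
    """Illustrative sketch of how one document is packed for AwaDB."""
    doc: List[Any] = []
    if embedding is not None:
        # Precomputed vector: store the text alongside its embedding.
        doc.append(text)
        doc.append(embedding)
    else:
        # No embedding model: let AwaDB embed the text itself.
        doc.append({"embedding_text": text})
    if metadata is not None:
        doc.append(metadata)
    return doc
```
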
@@ -2,6 +2,7 @@
 from langchain.vectorstores.analyticdb import AnalyticDB
 from langchain.vectorstores.annoy import Annoy
 from langchain.vectorstores.atlas import AtlasDB
+from langchain.vectorstores.awadb import AwaDB
 from langchain.vectorstores.base import VectorStore
 from langchain.vectorstores.chroma import Chroma
 from langchain.vectorstores.clickhouse import Clickhouse, ClickhouseSettings
@@ -60,4 +61,5 @@ __all__ = [
     "ClickhouseSettings",
     "Tigris",
     "MatchingEngine",
+    "AwaDB",
 ]
284 langchain/vectorstores/awadb.py Normal file
@@ -0,0 +1,284 @@
"""Wrapper around AwaDB for embedding vectors."""
from __future__ import annotations

import logging
from typing import TYPE_CHECKING, Any, Iterable, List, Optional, Tuple, Type

from langchain.docstore.document import Document
from langchain.embeddings.base import Embeddings
from langchain.vectorstores.base import VectorStore

if TYPE_CHECKING:
    import awadb

logger = logging.getLogger()
DEFAULT_TOPN = 4


class AwaDB(VectorStore):
    """Interface implemented by AwaDB vector stores."""

    _DEFAULT_TABLE_NAME = "langchain_awadb"

    def __init__(
        self,
        table_name: str = _DEFAULT_TABLE_NAME,
        embedding_model: Optional[Embeddings] = None,
        log_and_data_dir: Optional[str] = None,
        client: Optional[awadb.Client] = None,
    ) -> None:
        """Initialize with AwaDB client."""
        try:
            import awadb
        except ImportError:
            raise ValueError(
                "Could not import awadb python package. "
                "Please install it with `pip install awadb`."
            )

        if client is not None:
            self.awadb_client = client
        elif log_and_data_dir is not None:
            self.awadb_client = awadb.Client(log_and_data_dir)
        else:
            self.awadb_client = awadb.Client()

        self.awadb_client.Create(table_name)
        # Always set the attribute so later `is not None` checks cannot raise
        # an AttributeError when no embedding model is supplied.
        self.embedding_model = embedding_model
        self.added_doc_count = 0

    def add_texts(
        self,
        texts: Iterable[str],
        metadatas: Optional[List[dict]] = None,
        **kwargs: Any,
    ) -> List[str]:
        """Run more texts through the embeddings and add to the vectorstore.

        Args:
            texts: Iterable of strings to add to the vectorstore.
            metadatas: Optional list of metadatas associated with the texts.
            kwargs: vectorstore specific parameters

        Returns:
            List of ids from adding the texts into the vectorstore.
        """
        if self.awadb_client is None:
            raise ValueError("AwaDB client is None!")

        # Materialize the iterable so it can be traversed more than once.
        texts = list(texts)
        embeddings = None
        if self.embedding_model is not None:
            embeddings = self.embedding_model.embed_documents(texts)

        added_results: List[str] = []
        for doc_no, text in enumerate(texts):
            doc: List[Any] = []
            if embeddings is not None:
                # Store the raw text together with its precomputed embedding.
                doc.append(text)
                doc.append(embeddings[doc_no])
            else:
                # No embedding model: let AwaDB embed the text itself.
                doc.append({"embedding_text": text})

            if metadatas is not None and doc_no < len(metadatas):
                doc.append(metadatas[doc_no])
            self.awadb_client.Add(doc)
            added_results.append(str(self.added_doc_count))
            self.added_doc_count += 1

        return added_results

    def load_local(
        self,
        table_name: str = _DEFAULT_TABLE_NAME,
        **kwargs: Any,
    ) -> bool:
        if self.awadb_client is None:
            raise ValueError("AwaDB client is None!")

        return self.awadb_client.Load(table_name)

    def similarity_search(
        self,
        query: str,
        k: int = DEFAULT_TOPN,
        **kwargs: Any,
    ) -> List[Document]:
        """Return docs most similar to query."""
        if self.awadb_client is None:
            raise ValueError("AwaDB client is None!")

        embedding = None
        if self.embedding_model is not None:
            embedding = self.embedding_model.embed_query(query)

        return self.similarity_search_by_vector(embedding, k)

    def similarity_search_with_score(
        self,
        query: str,
        k: int = DEFAULT_TOPN,
        **kwargs: Any,
    ) -> List[Tuple[Document, float]]:
        """Return docs and relevance scores, normalized on a scale from 0 to 1.

        0 is dissimilar, 1 is most similar.
        """
        if self.awadb_client is None:
            raise ValueError("AwaDB client is None!")

        embedding = None
        if self.embedding_model is not None:
            embedding = self.embedding_model.embed_query(query)

        show_results = self.awadb_client.Search(embedding, k)

        results: List[Tuple[Document, float]] = []
        if len(show_results) == 0:
            return results

        scores: List[float] = []
        retrieval_docs = self.similarity_search_by_vector(embedding, k, scores)

        # Normalize each raw L2 distance by the L2 norm of the distance batch.
        l2_norm = pow(sum(score * score for score in scores), 0.5)
        for doc, score in zip(retrieval_docs, scores):
            results.append((doc, 1 - score / l2_norm))

        return results

    def similarity_search_with_relevance_scores(
        self,
        query: str,
        k: int = DEFAULT_TOPN,
        **kwargs: Any,
    ) -> List[Tuple[Document, float]]:
        """Return docs and relevance scores, normalized on a scale from 0 to 1.

        0 is dissimilar, 1 is most similar.
        """
        if self.awadb_client is None:
            raise ValueError("AwaDB client is None!")

        embedding = None
        if self.embedding_model is not None:
            embedding = self.embedding_model.embed_query(query)

        show_results = self.awadb_client.Search(embedding, k)

        results: List[Tuple[Document, float]] = []
        if len(show_results) == 0:
            return results

        scores: List[float] = []
        retrieval_docs = self.similarity_search_by_vector(embedding, k, scores)

        # Normalize each raw L2 distance by the L2 norm of the distance batch.
        l2_norm = pow(sum(score * score for score in scores), 0.5)
        for doc, score in zip(retrieval_docs, scores):
            results.append((doc, 1 - score / l2_norm))

        return results

    def similarity_search_by_vector(
        self,
        embedding: List[float],
        k: int = DEFAULT_TOPN,
        scores: Optional[list] = None,
        **kwargs: Any,
    ) -> List[Document]:
        """Return docs most similar to embedding vector.

        Args:
            embedding: Embedding to look up documents similar to.
            k: Number of Documents to return. Defaults to 4.

        Returns:
            List of Documents most similar to the query vector.
        """
        if self.awadb_client is None:
            raise ValueError("AwaDB client is None!")

        show_results = self.awadb_client.Search(embedding, k)

        results: List[Document] = []
        if len(show_results) == 0:
            return results

        for item_detail in show_results[0]["ResultItems"]:
            content = ""
            meta_data = {}
            for item_key in item_detail:
                if item_key == "Field@0":  # text of the document
                    content = item_detail[item_key]
                elif item_key == "Field@1":  # embedding field of the document
                    continue
                elif item_key == "score":  # L2 distance
                    if scores is not None:
                        scores.append(item_detail[item_key])
                else:
                    meta_data[item_key] = item_detail[item_key]
            results.append(Document(page_content=content, metadata=meta_data))
        return results

    @classmethod
    def from_texts(
        cls: Type[AwaDB],
        texts: List[str],
        embedding: Optional[Embeddings] = None,
        metadatas: Optional[List[dict]] = None,
        table_name: str = _DEFAULT_TABLE_NAME,
        logging_and_data_dir: Optional[str] = None,
        client: Optional[awadb.Client] = None,
        **kwargs: Any,
    ) -> AwaDB:
        """Create an AwaDB vectorstore from raw documents.

        Args:
            texts (List[str]): List of texts to add to the table.
            embedding (Optional[Embeddings]): Embedding function. Defaults to None.
            metadatas (Optional[List[dict]]): List of metadatas. Defaults to None.
            table_name (str): Name of the table to create.
            logging_and_data_dir (Optional[str]): Directory for logging and persistence.
            client (Optional[awadb.Client]): AwaDB client.

        Returns:
            AwaDB: AwaDB vectorstore.
        """
        awadb_client = cls(
            table_name=table_name,
            embedding_model=embedding,
            log_and_data_dir=logging_and_data_dir,
            client=client,
        )
        awadb_client.add_texts(texts=texts, metadatas=metadatas)
        return awadb_client
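The 0-to-1 relevance scores returned by `similarity_search_with_score` come from normalizing each raw L2 distance by the L2 norm of the whole returned batch and subtracting the result from 1. A standalone sketch of just that normalization step (the function name is illustrative, not part of the PR):

```python
import math
from typing import List


def normalize_distances(distances: List[float]) -> List[float]:
    """Map raw L2 distances to 0-1 relevance scores, batch-relative."""
    # L2 norm of the batch of distances returned for one query.
    l2_norm = math.sqrt(sum(d * d for d in distances))
    # Smaller distance -> score closer to 1 (more similar).
    return [1 - d / l2_norm for d in distances]
```

Note that the scores are relative to the returned batch: the same document can receive a different score when retrieved alongside different neighbors.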
61 poetry.lock generated
@@ -570,6 +570,26 @@ dev = ["coverage (>=5,<6)", "flake8 (>=3,<4)", "pytest (>=6,<7)", "sphinx-copybu
 docs = ["sphinx-copybutton (>=0.4,<0.5)", "sphinx-rtd-theme (>=1.0,<2.0)", "sphinx-tabs (>=3,<4)", "sphinxcontrib-mermaid (>=0.7,<0.8)"]
 test = ["coverage (>=5,<6)", "pytest (>=6,<7)"]
+
+[[package]]
+name = "awadb"
+version = "0.3.2"
+description = "The AI Native database for embedding vectors"
+category = "main"
+optional = true
+python-versions = ">=3.6"
+files = [
+    {file = "awadb-0.3.2-cp310-cp310-manylinux1_x86_64.whl", hash = "sha256:f3ce3b066198782fa413f452c56001c58ebec71a1e1dca0eee68f73321ba15a9"},
+    {file = "awadb-0.3.2-cp311-cp311-macosx_10_13_universal2.whl", hash = "sha256:c96b5e263c32b2563b1fa027035bdcf50540808ad303071cc1aed3471c3c39b7"},
+    {file = "awadb-0.3.2-cp311-cp311-manylinux1_x86_64.whl", hash = "sha256:3e43b5a74753261857d0b146543a4620e00938833181259f138f07457fa84812"},
+    {file = "awadb-0.3.2-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:6330b4d18a814c1562113b3b7897db629c2ac9b5818236ead0fc5f3445b6b7fb"},
+    {file = "awadb-0.3.2-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:82b4e61cc905339868a9f833d0988098f56411b42e0f8dd571aec7c8d6a3f1fa"},
+    {file = "awadb-0.3.2-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:5efaa93d69c467f16ec4f65ed250ec26015781826c0d059c8a54613a5d3e2c3e"},
+    {file = "awadb-0.3.2-cp39-cp39-manylinux1_x86_64.whl", hash = "sha256:7be0811550d72f49018e4790d290cf521f92ffa84d65ef1073e621f225d142ec"},
+]
+
+[package.extras]
+test = ["pytest (>=6.0)"]
 
 [[package]]
 name = "azure-ai-formrecognizer"
 version = "3.2.1"
@@ -6030,14 +6050,51 @@ optional = true
 python-versions = ">=3.7"
 files = [
     {file = "orjson-3.9.1-cp310-cp310-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:c4434b7b786fdc394b95d029fb99949d7c2b05bbd4bf5cb5e3906be96ffeee3b"},
+    {file = "orjson-3.9.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:09faf14f74ed47e773fa56833be118e04aa534956f661eb491522970b7478e3b"},
+    {file = "orjson-3.9.1-cp310-cp310-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:503eb86a8d53a187fe66aa80c69295a3ca35475804da89a9547e4fce5f803822"},
+    {file = "orjson-3.9.1-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:20f2804b5a1dbd3609c086041bd243519224d47716efd7429db6c03ed28b7cc3"},
+    {file = "orjson-3.9.1-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:0fd828e0656615a711c4cc4da70f3cac142e66a6703ba876c20156a14e28e3fa"},
+    {file = "orjson-3.9.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ec53d648176f873203b9c700a0abacab33ca1ab595066e9d616f98cdc56f4434"},
+    {file = "orjson-3.9.1-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:e186ae76b0d97c505500664193ddf508c13c1e675d9b25f1f4414a7606100da6"},
+    {file = "orjson-3.9.1-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:d4edee78503016f4df30aeede0d999b3cb11fb56f47e9db0e487bce0aaca9285"},
+    {file = "orjson-3.9.1-cp310-none-win_amd64.whl", hash = "sha256:a4cc5d21e68af982d9a2528ac61e604f092c60eed27aef3324969c68f182ec7e"},
     {file = "orjson-3.9.1-cp311-cp311-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:761b6efd33c49de20dd73ce64cc59da62c0dab10aa6015f582680e0663cc792c"},
+    {file = "orjson-3.9.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:31229f9d0b8dc2ef7ee7e4393f2e4433a28e16582d4b25afbfccc9d68dc768f8"},
+    {file = "orjson-3.9.1-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:0b7ab18d55ecb1de543d452f0a5f8094b52282b916aa4097ac11a4c79f317b86"},
+    {file = "orjson-3.9.1-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:db774344c39041f4801c7dfe03483df9203cbd6c84e601a65908e5552228dd25"},
+    {file = "orjson-3.9.1-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:ae47ef8c0fe89c4677db7e9e1fb2093ca6e66c3acbee5442d84d74e727edad5e"},
+    {file = "orjson-3.9.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:103952c21575b9805803c98add2eaecd005580a1e746292ed2ec0d76dd3b9746"},
+    {file = "orjson-3.9.1-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:2cb0121e6f2c9da3eddf049b99b95fef0adf8480ea7cb544ce858706cdf916eb"},
+    {file = "orjson-3.9.1-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:24d4ddaa2876e657c0fd32902b5c451fd2afc35159d66a58da7837357044b8c2"},
     {file = "orjson-3.9.1-cp311-none-win_amd64.whl", hash = "sha256:0b53b5f72cf536dd8aa4fc4c95e7e09a7adb119f8ff8ee6cc60f735d7740ad6a"},
     {file = "orjson-3.9.1-cp37-cp37m-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:d4b68d01a506242316a07f1d2f29fb0a8b36cee30a7c35076f1ef59dce0890c1"},
+    {file = "orjson-3.9.1-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d9dd4abe6c6fd352f00f4246d85228f6a9847d0cc14f4d54ee553718c225388f"},
+    {file = "orjson-3.9.1-cp37-cp37m-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:9e20bca5e13041e31ceba7a09bf142e6d63c8a7467f5a9c974f8c13377c75af2"},
+    {file = "orjson-3.9.1-cp37-cp37m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:d8ae0467d01eb1e4bcffef4486d964bfd1c2e608103e75f7074ed34be5df48cc"},
+    {file = "orjson-3.9.1-cp37-cp37m-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:06f6ab4697fab090517f295915318763a97a12ee8186054adf21c1e6f6abbd3d"},
+    {file = "orjson-3.9.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8515867713301fa065c58ec4c9053ba1a22c35113ab4acad555317b8fd802e50"},
+    {file = "orjson-3.9.1-cp37-cp37m-musllinux_1_1_aarch64.whl", hash = "sha256:393d0697d1dfa18d27d193e980c04fdfb672c87f7765b87952f550521e21b627"},
+    {file = "orjson-3.9.1-cp37-cp37m-musllinux_1_1_x86_64.whl", hash = "sha256:d96747662d3666f79119e5d28c124e7d356c7dc195cd4b09faea4031c9079dc9"},
     {file = "orjson-3.9.1-cp37-none-win_amd64.whl", hash = "sha256:6d173d3921dd58a068c88ec22baea7dbc87a137411501618b1292a9d6252318e"},
     {file = "orjson-3.9.1-cp38-cp38-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:d1c2b0b4246c992ce2529fc610a446b945f1429445ece1c1f826a234c829a918"},
+    {file = "orjson-3.9.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:19f70ba1f441e1c4bb1a581f0baa092e8b3e3ce5b2aac2e1e090f0ac097966da"},
+    {file = "orjson-3.9.1-cp38-cp38-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:375d65f002e686212aac42680aed044872c45ee4bc656cf63d4a215137a6124a"},
+    {file = "orjson-3.9.1-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:4751cee4a7b1daeacb90a7f5adf2170ccab893c3ab7c5cea58b45a13f89b30b3"},
+    {file = "orjson-3.9.1-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:78d9a2a4b2302d5ebc3695498ebc305c3568e5ad4f3501eb30a6405a32d8af22"},
+    {file = "orjson-3.9.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:46b4facc32643b2689dfc292c0c463985dac4b6ab504799cf51fc3c6959ed668"},
+    {file = "orjson-3.9.1-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:ec7c8a0f1bf35da0d5fd14f8956f3b82a9a6918a3c6963d718dfd414d6d3b604"},
+    {file = "orjson-3.9.1-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:d3a40b0fbe06ccd4d6a99e523d20b47985655bcada8d1eba485b1b32a43e4904"},
     {file = "orjson-3.9.1-cp38-none-win_amd64.whl", hash = "sha256:402f9d3edfec4560a98880224ec10eba4c5f7b4791e4bc0d4f4d8df5faf2a006"},
     {file = "orjson-3.9.1-cp39-cp39-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:49c0d78dcd34626e2e934f1192d7c052b94e0ecadc5f386fd2bda6d2e03dadf5"},
+    {file = "orjson-3.9.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:125f63e56d38393daa0a1a6dc6fedefca16c538614b66ea5997c3bd3af35ef26"},
+    {file = "orjson-3.9.1-cp39-cp39-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:08927970365d2e1f3ce4894f9ff928a7b865d53f26768f1bbdd85dd4fee3e966"},
+    {file = "orjson-3.9.1-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:f9a744e212d4780ecd67f4b6b128b2e727bee1df03e7059cddb2dfe1083e7dc4"},
+    {file = "orjson-3.9.1-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:5d1dbf36db7240c61eec98c8d21545d671bce70be0730deb2c0d772e06b71af3"},
+    {file = "orjson-3.9.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:80a1e384626f76b66df615f7bb622a79a25c166d08c5d2151ffd41f24c4cc104"},
+    {file = "orjson-3.9.1-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:15d28872fb055bf17ffca913826e618af61b2f689d2b170f72ecae1a86f80d52"},
+    {file = "orjson-3.9.1-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:1e4d905338f9ef32c67566929dfbfbb23cc80287af8a2c38930fb0eda3d40b76"},
     {file = "orjson-3.9.1-cp39-none-win_amd64.whl", hash = "sha256:48a27da6c7306965846565cc385611d03382bbd84120008653aa2f6741e2105d"},
+    {file = "orjson-3.9.1.tar.gz", hash = "sha256:db373a25ec4a4fccf8186f9a72a1b3442837e40807a736a815ab42481e83b7d0"},
 ]
 
 [[package]]
@@ -11366,7 +11423,7 @@ cffi = {version = ">=1.11", markers = "platform_python_implementation == \"PyPy\
 cffi = ["cffi (>=1.11)"]
 
 [extras]
-all = ["anthropic", "cohere", "openai", "nlpcloud", "huggingface_hub", "jina", "manifest-ml", "elasticsearch", "opensearch-py", "google-search-results", "faiss-cpu", "sentence-transformers", "transformers", "spacy", "nltk", "wikipedia", "beautifulsoup4", "tiktoken", "torch", "jinja2", "pinecone-client", "pinecone-text", "pymongo", "weaviate-client", "redis", "google-api-python-client", "google-auth", "wolframalpha", "qdrant-client", "tensorflow-text", "pypdf", "networkx", "nomic", "aleph-alpha-client", "deeplake", "pgvector", "psycopg2-binary", "pyowm", "pytesseract", "html2text", "atlassian-python-api", "gptcache", "duckduckgo-search", "arxiv", "azure-identity", "clickhouse-connect", "azure-cosmos", "lancedb", "langkit", "lark", "pexpect", "pyvespa", "O365", "jq", "docarray", "steamship", "pdfminer-six", "lxml", "requests-toolbelt", "neo4j", "openlm", "azure-ai-formrecognizer", "azure-ai-vision", "azure-cognitiveservices-speech", "momento", "singlestoredb", "tigrisdb", "nebula3-python"]
+all = ["anthropic", "cohere", "openai", "nlpcloud", "huggingface_hub", "jina", "manifest-ml", "elasticsearch", "opensearch-py", "google-search-results", "faiss-cpu", "sentence-transformers", "transformers", "spacy", "nltk", "wikipedia", "beautifulsoup4", "tiktoken", "torch", "jinja2", "pinecone-client", "pinecone-text", "pymongo", "weaviate-client", "redis", "google-api-python-client", "google-auth", "wolframalpha", "qdrant-client", "tensorflow-text", "pypdf", "networkx", "nomic", "aleph-alpha-client", "deeplake", "pgvector", "psycopg2-binary", "pyowm", "pytesseract", "html2text", "atlassian-python-api", "gptcache", "duckduckgo-search", "arxiv", "azure-identity", "clickhouse-connect", "azure-cosmos", "lancedb", "langkit", "lark", "pexpect", "pyvespa", "O365", "jq", "docarray", "steamship", "pdfminer-six", "lxml", "requests-toolbelt", "neo4j", "openlm", "azure-ai-formrecognizer", "azure-ai-vision", "azure-cognitiveservices-speech", "momento", "singlestoredb", "tigrisdb", "nebula3-python", "awadb"]
 azure = ["azure-identity", "azure-cosmos", "openai", "azure-core", "azure-ai-formrecognizer", "azure-ai-vision", "azure-cognitiveservices-speech"]
 cohere = ["cohere"]
 docarray = ["docarray"]
@@ -11380,4 +11437,4 @@ text-helpers = ["chardet"]
 [metadata]
 lock-version = "2.0"
 python-versions = ">=3.8.1,<4.0"
-content-hash = "dbbaa2907bf2ac09ed111ce712772bba0fe56901627f41c53aef71ae5a38d1c6"
+content-hash = "ecf7086e83cc0ff19e6851c0b63170b082b267c1c1c00f47700fd3a8c8bb46c5"
@@ -106,6 +106,7 @@ pyspark = {version = "^3.4.0", optional = true}
 tigrisdb = {version = "^1.0.0b6", optional = true}
 nebula3-python = {version = "^3.4.0", optional = true}
 langchainplus-sdk = ">=0.0.7"
+awadb = {version = "^0.3.2", optional = true}
 
 
 [tool.poetry.group.docs.dependencies]
@@ -286,6 +287,7 @@ all = [
     "singlestoredb",
     "tigrisdb",
     "nebula3-python",
+    "awadb",
 ]
 
 # An extra used to be able to add extended testing.
tests/integration_tests/vectorstores/test_awadb.py (new file, 55 lines)
@@ -0,0 +1,55 @@
"""Test AwaDB functionality."""

from langchain.docstore.document import Document
from langchain.vectorstores import AwaDB
from tests.integration_tests.vectorstores.fake_embeddings import FakeEmbeddings


def test_awadb() -> None:
    """Test end to end construction and search."""
    texts = ["foo", "bar", "baz"]
    docsearch = AwaDB.from_texts(
        table_name="test_awadb", texts=texts, embedding=FakeEmbeddings()
    )
    output = docsearch.similarity_search("foo", k=1)
    assert output == [Document(page_content="foo")]


def test_awadb_with_metadatas() -> None:
    """Test end to end construction and search."""
    texts = ["foo", "bar", "baz"]
    metadatas = [{"page": i} for i in range(len(texts))]
    docsearch = AwaDB.from_texts(
        table_name="test_awadb",
        texts=texts,
        embedding=FakeEmbeddings(),
        metadatas=metadatas,
    )
    output = docsearch.similarity_search("foo", k=1)
    assert output == [Document(page_content="foo", metadata={"page": 0})]


def test_awadb_with_metadatas_with_scores() -> None:
    """Test end to end construction and scored search."""
    texts = ["foo", "bar", "baz"]
    metadatas = [{"page": str(i)} for i in range(len(texts))]
    docsearch = AwaDB.from_texts(
        table_name="test_awadb",
        texts=texts,
        embedding=FakeEmbeddings(),
        metadatas=metadatas,
    )
    output = docsearch.similarity_search_with_score("foo", k=1)
    assert output == [(Document(page_content="foo", metadata={"page": "0"}), 0.0)]


def test_awadb_add_texts() -> None:
    """Test end to end adding of texts."""
    # Create initial doc store.
    texts = ["foo", "bar", "baz"]
    docsearch = AwaDB.from_texts(
        table_name="test_awadb", texts=texts, embedding=FakeEmbeddings()
    )
    # Test adding a similar document as before.
    docsearch.add_texts(["foo"])
    output = docsearch.similarity_search("foo", k=2)
    assert output == [Document(page_content="foo"), Document(page_content="foo")]
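The exact-match assertions in the tests above (a query of "foo" returning the "foo" document with score 0.0) work because the test suite's embedding helper is deterministic. The following is a self-contained sketch of that idea, using a simplified stand-in embedding and a brute-force nearest-neighbor loop; `fake_embed` here is a hypothetical illustration, not the actual `FakeEmbeddings` helper or the AwaDB search implementation.

```python
import math
from typing import List, Tuple

def fake_embed(texts: List[str], vocab: List[str]) -> List[List[float]]:
    # Deterministic stand-in embedding: each text maps to a fixed vector
    # whose last component is its index in a known vocabulary.
    return [[1.0] * 9 + [float(vocab.index(t))] for t in texts]

def similarity_search_with_score(
    query: str, docs: List[str], vocab: List[str], k: int = 1
) -> List[Tuple[str, float]]:
    # Brute-force L2 nearest neighbors: the simplest form of what a
    # vector store's scored similarity search computes. A distance of
    # 0.0 means the stored vector matches the query vector exactly.
    q = fake_embed([query], vocab)[0]
    scored = []
    for doc in docs:
        v = fake_embed([doc], vocab)[0]
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, v)))
        scored.append((doc, dist))
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

docs = ["foo", "bar", "baz"]
results = similarity_search_with_score("foo", docs, vocab=docs, k=1)
print(results)  # [('foo', 0.0)]
```

Because the embedding is deterministic, the query's vector is identical to the stored vector for the same text, which is why the tests can assert an exact score of 0.0 rather than a tolerance.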