community[minor]: Adds a vector store for Azure Cosmos DB for NoSQL (#21676)

This PR adds support for the Azure Cosmos DB for NoSQL vector store.

Summary:

Description: Added a vector store integration for Azure Cosmos DB for
NoSQL.
Dependencies: azure-cosmos
Tag maintainer: @hwchase17, @baskaryan @efriis @eyurtsev

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
This commit is contained in:
Aayush Kataria
2024-06-11 10:34:01 -07:00
committed by GitHub
parent 36cad5d25c
commit 71811e0547
14 changed files with 917 additions and 97 deletions


@@ -60,7 +60,7 @@
" * document addition by id (`add_documents` method with `ids` argument)\n",
" * delete by id (`delete` method with `ids` argument)\n",
"\n",
"Compatible Vectorstores: `Aerospike`, `AnalyticDB`, `AstraDB`, `AwaDB`, `Bagel`, `Cassandra`, `Chroma`, `CouchbaseVectorStore`, `DashVector`, `DatabricksVectorSearch`, `DeepLake`, `Dingo`, `ElasticVectorSearch`, `ElasticsearchStore`, `FAISS`, `HanaDB`, `Milvus`, `MyScale`, `OpenSearchVectorSearch`, `PGVector`, `Pinecone`, `Qdrant`, `Redis`, `Rockset`, `ScaNN`, `SupabaseVectorStore`, `SurrealDBStore`, `TimescaleVector`, `Vald`, `VDMS`, `Vearch`, `VespaStore`, `Weaviate`, `Yellowbrick`, `ZepVectorStore`, `TencentVectorDB`, `OpenSearchVectorSearch`.\n",
"Compatible Vectorstores: `Aerospike`, `AnalyticDB`, `AstraDB`, `AwaDB`, `AzureCosmosDBNoSqlVectorSearch`, `AzureCosmosDBVectorSearch`, `Bagel`, `Cassandra`, `Chroma`, `CouchbaseVectorStore`, `DashVector`, `DatabricksVectorSearch`, `DeepLake`, `Dingo`, `ElasticVectorSearch`, `ElasticsearchStore`, `FAISS`, `HanaDB`, `Milvus`, `MyScale`, `OpenSearchVectorSearch`, `PGVector`, `Pinecone`, `Qdrant`, `Redis`, `Rockset`, `ScaNN`, `SupabaseVectorStore`, `SurrealDBStore`, `TimescaleVector`, `Vald`, `VDMS`, `Vearch`, `VespaStore`, `Weaviate`, `Yellowbrick`, `ZepVectorStore`, `TencentVectorDB`, `OpenSearchVectorSearch`.\n",
" \n",
"## Caution\n",
"\n",


@@ -225,7 +225,7 @@ from langchain_community.document_loaders.onenote import OneNoteLoader
## Vector stores
### Azure Cosmos DB
### Azure Cosmos DB MongoDB vCore
>[Azure Cosmos DB for MongoDB vCore](https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/) makes it easy to create a database with full native MongoDB support.
> You can apply your MongoDB experience and continue to use your favorite MongoDB drivers, SDKs, and tools by pointing your application to the API for MongoDB vCore account's connection string.
@@ -255,6 +255,38 @@ See a [usage example](/docs/integrations/vectorstores/azure_cosmos_db).
from langchain_community.vectorstores import AzureCosmosDBVectorSearch
```
### Azure Cosmos DB NoSQL
>[Azure Cosmos DB for NoSQL](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/vector-search) now offers vector indexing and search in preview.
This feature is designed to handle high-dimensional vectors, enabling efficient and accurate vector search at any scale. You can now store vectors
directly in the documents alongside your data. This means that each document in your database can contain not only traditional schema-free data,
but also high-dimensional vectors as other properties of the documents. This colocation of data and vectors allows for efficient indexing and searching,
as the vectors are stored in the same logical unit as the data they represent. This simplifies data management and AI application architectures, and
improves the efficiency of vector-based operations.
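
As a rough illustration of storing vectors in the same document as the data they describe, the sketch below uses plain Python dictionaries and a brute-force cosine ranking. It does not call Azure Cosmos DB; the `embedding` property name, the three-dimensional vectors, and the helper functions are assumptions made up for this example.

```python
import math

# Hypothetical documents: ordinary fields plus an "embedding" property
# stored in the same document (tiny 3-dimensional vectors for readability).
documents = [
    {"id": "1", "text": "Cosmos DB stores JSON documents.", "embedding": [1.0, 0.0, 0.0]},
    {"id": "2", "text": "Vectors live next to the data.", "embedding": [0.0, 1.0, 0.0]},
    {"id": "3", "text": "Search ranks documents by similarity.", "embedding": [0.6, 0.8, 0.0]},
]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, docs, k=2):
    """Exhaustively rank documents by similarity to the query vector."""
    ranked = sorted(
        docs,
        key=lambda d: cosine_similarity(query_vector, d["embedding"]),
        reverse=True,
    )
    return ranked[:k]


results = top_k([0.0, 1.0, 0.0], documents)
print([doc["id"] for doc in results])  # prints ['2', '3']
```

A real vector index avoids this exhaustive scan, but the document shape is the point here: each result already carries its own `text` and other fields, so no second lookup is needed.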
#### Installation and Setup
See [detailed configuration instructions](/docs/integrations/vectorstores/azure_cosmos_db_no_sql).
We need to install the `azure-cosmos` Python package.
```bash
pip install azure-cosmos
```
#### Deploy Azure Cosmos DB on Microsoft Azure
Azure Cosmos DB offers a solution for modern apps and intelligent workloads, with fast response times and dynamic, elastic autoscale. It is available
in every Azure region and can automatically replicate data closer to users. It offers SLA-guaranteed low latency and high availability.
[Sign Up](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/quickstart-python?pivots=devcontainer-codespace) for free to get started today.
See a [usage example](/docs/integrations/vectorstores/azure_cosmos_db_no_sql).
```python
from langchain_community.vectorstores import AzureCosmosDBNoSqlVectorSearch
```
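
Before the vector store can be used, the container typically needs a vector embedding policy and a matching vector index. The sketch below shows what those two dictionaries might look like. The field names follow our reading of the Azure Cosmos DB vector search documentation and should be verified against the current docs; the `/embedding` path, the 1536 dimension count, and the `check_policies` helper are assumptions for illustration only.

```python
# Hypothetical property path that holds each document's vector.
EMBEDDING_PATH = "/embedding"

# Describes the vectors stored in the container (assumed field names).
vector_embedding_policy = {
    "vectorEmbeddings": [
        {
            "path": EMBEDDING_PATH,
            "dataType": "float32",
            "distanceFunction": "cosine",
            "dimensions": 1536,  # assumed embedding size
        }
    ]
}

# Tells the container to build a vector index on the same path.
indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": '/"_etag"/?'}],
    "vectorIndexes": [{"path": EMBEDDING_PATH, "type": "quantizedFlat"}],
}


def check_policies(embedding_policy, index_policy):
    """Return True when every vector index path has an embedding definition."""
    embedded = {e["path"] for e in embedding_policy["vectorEmbeddings"]}
    indexed = {i["path"] for i in index_policy["vectorIndexes"]}
    return indexed <= embedded


print(check_policies(vector_embedding_policy, indexing_policy))  # prints True
```

Keeping the path in one constant avoids the easy mistake of indexing a property that the embedding policy does not describe.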
## Retrievers
### Azure AI Search


@@ -3,11 +3,9 @@
{
"cell_type": "markdown",
"id": "245c0aa70db77606",
"metadata": {
"collapsed": false
},
"metadata": {},
"source": [
"# Azure Cosmos DB\n",
"# Azure Cosmos DB Mongo vCore\n",
"\n",
"This notebook shows you how to leverage this integrated [vector database](https://learn.microsoft.com/en-us/azure/cosmos-db/vector-database) to store documents in collections, create indices, and perform vector search queries using approximate nearest neighbor search with similarity metrics such as COS (cosine distance), L2 (Euclidean distance), and IP (inner product) to locate documents close to the query vectors. \n",
" \n",
@@ -22,9 +20,7 @@
{
"cell_type": "markdown",
"id": "8c493e205ce1dda5",
"metadata": {
"collapsed": false
},
"metadata": {},
"source": []
},
{
@@ -35,8 +31,7 @@
"ExecuteTime": {
"end_time": "2024-02-08T18:25:05.278480Z",
"start_time": "2024-02-08T18:24:51.560677Z"
},
"collapsed": false
}
},
"outputs": [
{
@@ -62,8 +57,7 @@
"ExecuteTime": {
"end_time": "2024-02-08T18:25:56.926147Z",
"start_time": "2024-02-08T18:25:56.900087Z"
},
"collapsed": false
}
},
"outputs": [],
"source": [
@@ -78,9 +72,7 @@
{
"cell_type": "markdown",
"id": "f2e66b097c6ce2e3",
"metadata": {
"collapsed": false
},
"metadata": {},
"source": [
"We want to use `OpenAIEmbeddings` so we need to set up our Azure OpenAI API Key alongside other environment variables. "
]
@@ -93,8 +85,7 @@
"ExecuteTime": {
"end_time": "2024-02-08T18:26:06.558294Z",
"start_time": "2024-02-08T18:26:06.550008Z"
},
"collapsed": false
}
},
"outputs": [],
"source": [
@@ -114,9 +105,7 @@
{
"cell_type": "markdown",
"id": "ebaa28c6e2b35063",
"metadata": {
"collapsed": false
},
"metadata": {},
"source": [
"Now, we need to load the documents into the collection, create the index and then run our queries against the index to retrieve matches.\n",
"\n",
@@ -131,8 +120,7 @@
"ExecuteTime": {
"end_time": "2024-02-08T18:27:00.782280Z",
"start_time": "2024-02-08T18:26:47.339151Z"
},
"collapsed": false
}
},
"outputs": [],
"source": [
@@ -172,8 +160,7 @@
"ExecuteTime": {
"end_time": "2024-02-08T18:31:13.486173Z",
"start_time": "2024-02-08T18:30:54.175890Z"
},
"collapsed": false
}
},
"outputs": [
{
@@ -236,8 +223,7 @@
"ExecuteTime": {
"end_time": "2024-02-08T18:31:47.468902Z",
"start_time": "2024-02-08T18:31:46.053602Z"
},
"collapsed": false
}
},
"outputs": [],
"source": [
@@ -254,8 +240,7 @@
"ExecuteTime": {
"end_time": "2024-02-08T18:31:50.982598Z",
"start_time": "2024-02-08T18:31:50.977605Z"
},
"collapsed": false
}
},
"outputs": [
{
@@ -279,9 +264,7 @@
{
"cell_type": "markdown",
"id": "37e4df8c7d7db851",
"metadata": {
"collapsed": false
},
"metadata": {},
"source": [
"Once the documents have been loaded and the index has been created, you can now instantiate the vector store directly and run queries against the index"
]
@@ -294,8 +277,7 @@
"ExecuteTime": {
"end_time": "2024-02-08T18:32:14.299599Z",
"start_time": "2024-02-08T18:32:12.923464Z"
},
"collapsed": false
}
},
"outputs": [
{
@@ -332,8 +314,7 @@
"ExecuteTime": {
"end_time": "2024-02-08T18:32:24.021434Z",
"start_time": "2024-02-08T18:32:22.867658Z"
},
"collapsed": false
}
},
"outputs": [
{
@@ -366,30 +347,28 @@
"cell_type": "code",
"execution_count": null,
"id": "b63c73c7e905001c",
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,

File diff suppressed because one or more lines are too long