mirror of https://github.com/hwchase17/langchain.git (synced 2025-09-07 22:11:51 +00:00)
Community[minor]: Update VDMS vectorstore (#23729)
**Description:** This PR exposes some functions in the VDMS vectorstore, updates the VDMS-related notebooks and tests, and upgrades the required VDMS version (>=0.0.20).
**Issue:** N/A
**Dependencies:** Update vdms>=0.0.20
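The updated notebook stores a date in document metadata and then filters on it. A minimal sketch of the dictionary shapes involved, assuming only what the diff shows (a `{"_date": <ISO-8601 string>}` value and `[operator, value]` constraints); `date_property` is a hypothetical helper for illustration and is not part of the VDMS or LangChain API:

```python
from datetime import datetime


def date_property(moment: datetime) -> dict:
    """Hypothetical helper: wrap a datetime in the {"_date": ...} form seen in the notebook."""
    return {"_date": moment.isoformat()}


# Metadata value attached to the document, mirroring the updated notebook cell.
last_date_read = date_property(datetime(2024, 5, 1, 14, 30, 0))

# Constraints in the [operator, value] form that the notebook passes to `get`.
constraints = {
    "id": ["==", "32"],
    "last_date_read": [">=", date_property(datetime(2024, 5, 1))],
}

print(last_date_read)
print(constraints)
```

The sketch only builds plain dictionaries; sending them to a running VDMS server is left to the notebook's own `update_document` and `get` calls.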
committed by GitHub
parent 703491e824
commit 69eacaa887
@@ -12,7 +12,8 @@
"VDMS supports:\n",
"* K nearest neighbor search\n",
"* Euclidean distance (L2) and inner product (IP)\n",
"* Libraries for indexing and computing distances: TileDBDense, TileDBSparse, FaissFlat (Default), FaissIVFFlat\n",
"* Libraries for indexing and computing distances: TileDBDense, TileDBSparse, FaissFlat (Default), FaissIVFFlat, Flinng\n",
"* Embeddings for text, images, and video\n",
"* Vector and metadata searches\n",
"\n",
"VDMS has server and client components. To setup the server, see the [installation instructions](https://github.com/IntelLabs/vdms/blob/master/INSTALL.md) or use the [docker image](https://hub.docker.com/r/intellabs/vdms).\n",
@@ -40,7 +41,7 @@
],
"source": [
"# Pip install necessary package\n",
"%pip install --upgrade --quiet pip sentence-transformers vdms \"unstructured-inference==0.6.6\";"
"%pip install --upgrade --quiet pip vdms sentence-transformers langchain-huggingface > /dev/null"
]
},
{
@@ -62,7 +63,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"e6061b270eef87de5319a6c5af709b36badcad8118069a8f6b577d2e01ad5e2d\n"
"b26917ffac236673ef1d035ab9c91fe999e29c9eb24aa6c7103d7baa6bf2f72d\n"
]
}
],
@@ -92,6 +93,9 @@
"outputs": [],
"source": [
"import time\n",
"import warnings\n",
"\n",
"warnings.filterwarnings(\"ignore\")\n",
"\n",
"from langchain_community.document_loaders.text import TextLoader\n",
"from langchain_community.vectorstores import VDMS\n",
@@ -290,7 +294,7 @@
"source": [
"# add data\n",
"collection_name = \"my_collection_faiss_L2\"\n",
"db = VDMS.from_documents(\n",
"db_FaissFlat = VDMS.from_documents(\n",
" docs,\n",
" client=vdms_client,\n",
" ids=ids,\n",
@@ -301,7 +305,7 @@
"# Query (No metadata filtering)\n",
"k = 3\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"returned_docs = db.similarity_search(query, k=k, filter=None)\n",
"returned_docs = db_FaissFlat.similarity_search(query, k=k, filter=None)\n",
"print_results(returned_docs, score=False)"
]
},
@@ -392,25 +396,24 @@
"k = 3\n",
"constraints = {\"page_number\": [\">\", 30], \"president_included\": [\"==\", True]}\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"returned_docs = db.similarity_search(query, k=k, filter=constraints)\n",
"returned_docs = db_FaissFlat.similarity_search(query, k=k, filter=constraints)\n",
"print_results(returned_docs, score=False)"
]
},
{
"cell_type": "markdown",
"id": "a5984766",
"id": "92ab3370",
"metadata": {},
"source": [
"### Similarity Search using TileDBDense and Euclidean Distance\n",
"### Similarity Search using Faiss IVFFlat and Inner Product (IP) Distance\n",
"\n",
"In this section, we add the documents to VDMS using TileDB Dense indexing and L2 as the distance metric for similarity search. We search for three documents (`k=3`) related to the query `What did the president say about Ketanji Brown Jackson` and also return the score along with the document.\n",
"\n"
"In this section, we add the documents to VDMS using Faiss IndexIVFFlat indexing and IP as the distance metric for similarity search. We search for three documents (`k=3`) related to the query `What did the president say about Ketanji Brown Jackson` and also return the score along with the document.\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "3001ba6e",
"id": "78f502cf",
"metadata": {},
"outputs": [
{
@@ -419,7 +422,7 @@
"text": [
"--------------------------------------------------\n",
"\n",
"Score:\t1.2032090425491333\n",
"Score:\t1.2032090425\n",
"\n",
"Content:\n",
"\tTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
@@ -437,7 +440,7 @@
"\tsource:\t../../how_to/state_of_the_union.txt\n",
"--------------------------------------------------\n",
"\n",
"Score:\t1.495247483253479\n",
"Score:\t1.4952471256\n",
"\n",
"Content:\n",
"\tAs Frances Haugen, who is here with us tonight, has shown, we must hold social media platforms accountable for the national experiment they’re conducting on our children for profit. \n",
@@ -463,7 +466,224 @@
"\tsource:\t../../how_to/state_of_the_union.txt\n",
"--------------------------------------------------\n",
"\n",
"Score:\t1.5008409023284912\n",
"Score:\t1.5008399487\n",
"\n",
"Content:\n",
"\tA former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
"\n",
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n",
"\n",
"We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling. \n",
"\n",
"We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n",
"\n",
"We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n",
"\n",
"We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.\n",
"\n",
"Metadata:\n",
"\tid:\t33\n",
"\tpage_number:\t33\n",
"\tpresident_included:\tFalse\n",
"\tsource:\t../../how_to/state_of_the_union.txt\n",
"--------------------------------------------------\n",
"\n"
]
}
],
"source": [
"db_FaissIVFFlat = VDMS.from_documents(\n",
" docs,\n",
" client=vdms_client,\n",
" ids=ids,\n",
" collection_name=\"my_collection_FaissIVFFlat_IP\",\n",
" embedding=embedding,\n",
" engine=\"FaissIVFFlat\",\n",
" distance_strategy=\"IP\",\n",
")\n",
"# Query\n",
"k = 3\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs_with_score = db_FaissIVFFlat.similarity_search_with_score(query, k=k, filter=None)\n",
"print_results(docs_with_score)"
]
},
{
"cell_type": "markdown",
"id": "e66d9125",
"metadata": {},
"source": [
"### Similarity Search using FLINNG and IP Distance\n",
"\n",
"In this section, we add the documents to VDMS using Filters to Identify Near-Neighbor Groups (FLINNG) indexing and IP as the distance metric for similarity search. We search for three documents (`k=3`) related to the query `What did the president say about Ketanji Brown Jackson` and also return the score along with the document."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "add81beb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------------\n",
"\n",
"Score:\t1.2032090425\n",
"\n",
"Content:\n",
"\tTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n",
"\n",
"Metadata:\n",
"\tid:\t32\n",
"\tpage_number:\t32\n",
"\tpresident_included:\tTrue\n",
"\tsource:\t../../how_to/state_of_the_union.txt\n",
"--------------------------------------------------\n",
"\n",
"Score:\t1.4952471256\n",
"\n",
"Content:\n",
"\tAs Frances Haugen, who is here with us tonight, has shown, we must hold social media platforms accountable for the national experiment they’re conducting on our children for profit. \n",
"\n",
"It’s time to strengthen privacy protections, ban targeted advertising to children, demand tech companies stop collecting personal data on our children. \n",
"\n",
"And let’s get all Americans the mental health services they need. More people they can turn to for help, and full parity between physical and mental health care. \n",
"\n",
"Third, support our veterans. \n",
"\n",
"Veterans are the best of us. \n",
"\n",
"I’ve always believed that we have a sacred obligation to equip all those we send to war and care for them and their families when they come home. \n",
"\n",
"My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free. \n",
"\n",
"Our troops in Iraq and Afghanistan faced many dangers.\n",
"\n",
"Metadata:\n",
"\tid:\t37\n",
"\tpage_number:\t37\n",
"\tpresident_included:\tFalse\n",
"\tsource:\t../../how_to/state_of_the_union.txt\n",
"--------------------------------------------------\n",
"\n",
"Score:\t1.5008399487\n",
"\n",
"Content:\n",
"\tA former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
"\n",
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n",
"\n",
"We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling. \n",
"\n",
"We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n",
"\n",
"We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n",
"\n",
"We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.\n",
"\n",
"Metadata:\n",
"\tid:\t33\n",
"\tpage_number:\t33\n",
"\tpresident_included:\tFalse\n",
"\tsource:\t../../how_to/state_of_the_union.txt\n",
"--------------------------------------------------\n",
"\n"
]
}
],
"source": [
"db_Flinng = VDMS.from_documents(\n",
" docs,\n",
" client=vdms_client,\n",
" ids=ids,\n",
" collection_name=\"my_collection_Flinng_IP\",\n",
" embedding=embedding,\n",
" engine=\"Flinng\",\n",
" distance_strategy=\"IP\",\n",
")\n",
"# Query\n",
"k = 3\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs_with_score = db_Flinng.similarity_search_with_score(query, k=k, filter=None)\n",
"print_results(docs_with_score)"
]
},
{
"cell_type": "markdown",
"id": "a5984766",
"metadata": {},
"source": [
"### Similarity Search using TileDBDense and Euclidean Distance\n",
"\n",
"In this section, we add the documents to VDMS using TileDB Dense indexing and L2 as the distance metric for similarity search. We search for three documents (`k=3`) related to the query `What did the president say about Ketanji Brown Jackson` and also return the score along with the document.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "3001ba6e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------------\n",
"\n",
"Score:\t1.2032090425\n",
"\n",
"Content:\n",
"\tTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n",
"\n",
"Metadata:\n",
"\tid:\t32\n",
"\tpage_number:\t32\n",
"\tpresident_included:\tTrue\n",
"\tsource:\t../../how_to/state_of_the_union.txt\n",
"--------------------------------------------------\n",
"\n",
"Score:\t1.4952471256\n",
"\n",
"Content:\n",
"\tAs Frances Haugen, who is here with us tonight, has shown, we must hold social media platforms accountable for the national experiment they’re conducting on our children for profit. \n",
"\n",
"It’s time to strengthen privacy protections, ban targeted advertising to children, demand tech companies stop collecting personal data on our children. \n",
"\n",
"And let’s get all Americans the mental health services they need. More people they can turn to for help, and full parity between physical and mental health care. \n",
"\n",
"Third, support our veterans. \n",
"\n",
"Veterans are the best of us. \n",
"\n",
"I’ve always believed that we have a sacred obligation to equip all those we send to war and care for them and their families when they come home. \n",
"\n",
"My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free. \n",
"\n",
"Our troops in Iraq and Afghanistan faced many dangers.\n",
"\n",
"Metadata:\n",
"\tid:\t37\n",
"\tpage_number:\t37\n",
"\tpresident_included:\tFalse\n",
"\tsource:\t../../how_to/state_of_the_union.txt\n",
"--------------------------------------------------\n",
"\n",
"Score:\t1.5008399487\n",
"\n",
"Content:\n",
"\tA former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
@@ -505,114 +725,6 @@
"print_results(docs_with_score)"
]
},
{
"cell_type": "markdown",
"id": "92ab3370",
"metadata": {},
"source": [
"### Similarity Search using Faiss IVFFlat and Euclidean Distance\n",
"\n",
"In this section, we add the documents to VDMS using Faiss IndexIVFFlat indexing and L2 as the distance metric for similarity search. We search for three documents (`k=3`) related to the query `What did the president say about Ketanji Brown Jackson` and also return the score along with the document.\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "78f502cf",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------------\n",
"\n",
"Score:\t1.2032090425491333\n",
"\n",
"Content:\n",
"\tTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n",
"\n",
"Metadata:\n",
"\tid:\t32\n",
"\tpage_number:\t32\n",
"\tpresident_included:\tTrue\n",
"\tsource:\t../../how_to/state_of_the_union.txt\n",
"--------------------------------------------------\n",
"\n",
"Score:\t1.495247483253479\n",
"\n",
"Content:\n",
"\tAs Frances Haugen, who is here with us tonight, has shown, we must hold social media platforms accountable for the national experiment they’re conducting on our children for profit. \n",
"\n",
"It’s time to strengthen privacy protections, ban targeted advertising to children, demand tech companies stop collecting personal data on our children. \n",
"\n",
"And let’s get all Americans the mental health services they need. More people they can turn to for help, and full parity between physical and mental health care. \n",
"\n",
"Third, support our veterans. \n",
"\n",
"Veterans are the best of us. \n",
"\n",
"I’ve always believed that we have a sacred obligation to equip all those we send to war and care for them and their families when they come home. \n",
"\n",
"My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free. \n",
"\n",
"Our troops in Iraq and Afghanistan faced many dangers.\n",
"\n",
"Metadata:\n",
"\tid:\t37\n",
"\tpage_number:\t37\n",
"\tpresident_included:\tFalse\n",
"\tsource:\t../../how_to/state_of_the_union.txt\n",
"--------------------------------------------------\n",
"\n",
"Score:\t1.5008409023284912\n",
"\n",
"Content:\n",
"\tA former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
"\n",
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n",
"\n",
"We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling. \n",
"\n",
"We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n",
"\n",
"We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n",
"\n",
"We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.\n",
"\n",
"Metadata:\n",
"\tid:\t33\n",
"\tpage_number:\t33\n",
"\tpresident_included:\tFalse\n",
"\tsource:\t../../how_to/state_of_the_union.txt\n",
"--------------------------------------------------\n",
"\n"
]
}
],
"source": [
"db_FaissIVFFlat = VDMS.from_documents(\n",
" docs,\n",
" client=vdms_client,\n",
" ids=ids,\n",
" collection_name=\"my_collection_FaissIVFFlat_L2\",\n",
" embedding=embedding,\n",
" engine=\"FaissIVFFlat\",\n",
" distance_strategy=\"L2\",\n",
")\n",
"# Query\n",
"k = 3\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs_with_score = db_FaissIVFFlat.similarity_search_with_score(query, k=k, filter=None)\n",
"print_results(docs_with_score)"
]
},
{
"cell_type": "markdown",
"id": "9ed3ec50",
@@ -622,12 +734,12 @@
"\n",
"While building toward a real application, you want to go beyond adding data, and also update and delete data.\n",
"\n",
"Here is a basic example showing how to do so. First, we will update the metadata for the document most relevant to the query."
"Here is a basic example showing how to do so. First, we will update the metadata for the document most relevant to the query by adding a date. "
]
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 11,
"id": "81a02810",
"metadata": {},
"outputs": [
@@ -638,7 +750,7 @@
"Original metadata: \n",
"\t{'id': '32', 'page_number': 32, 'president_included': True, 'source': '../../how_to/state_of_the_union.txt'}\n",
"new metadata: \n",
"\t{'id': '32', 'page_number': 32, 'president_included': True, 'source': '../../how_to/state_of_the_union.txt', 'new_value': 'hello world'}\n",
"\t{'id': '32', 'page_number': 32, 'president_included': True, 'source': '../../how_to/state_of_the_union.txt', 'last_date_read': {'_date': '2024-05-01T14:30:00'}}\n",
"--------------------------------------------------\n",
"\n",
"UPDATED ENTRY (id=32):\n",
@@ -655,8 +767,8 @@
"id:\n",
"\t32\n",
"\n",
"new_value:\n",
"\thello world\n",
"last_date_read:\n",
"\t2024-05-01T14:30:00+00:00\n",
"\n",
"page_number:\n",
"\t32\n",
@@ -672,19 +784,26 @@
}
],
"source": [
"doc = db.similarity_search(query)[0]\n",
"from datetime import datetime\n",
"\n",
"doc = db_FaissFlat.similarity_search(query)[0]\n",
"print(f\"Original metadata: \\n\\t{doc.metadata}\")\n",
"\n",
"# update the metadata for a document\n",
"doc.metadata[\"new_value\"] = \"hello world\"\n",
"# Update the metadata for a document by adding last datetime document read\n",
"datetime_str = datetime(2024, 5, 1, 14, 30, 0).isoformat()\n",
"doc.metadata[\"last_date_read\"] = {\"_date\": datetime_str}\n",
"print(f\"new metadata: \\n\\t{doc.metadata}\")\n",
"print(f\"{DELIMITER}\\n\")\n",
"\n",
"# Update document in VDMS\n",
"id_to_update = doc.metadata[\"id\"]\n",
"db.update_document(collection_name, id_to_update, doc)\n",
"response, response_array = db.get(\n",
" collection_name, constraints={\"id\": [\"==\", id_to_update]}\n",
"db_FaissFlat.update_document(collection_name, id_to_update, doc)\n",
"response, response_array = db_FaissFlat.get(\n",
" collection_name,\n",
" constraints={\n",
" \"id\": [\"==\", id_to_update],\n",
" \"last_date_read\": [\">=\", {\"_date\": \"2024-05-01T00:00:00\"}],\n",
" },\n",
")\n",
"\n",
"# Display Results\n",
@@ -702,7 +821,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 12,
"id": "95537fe8",
"metadata": {},
"outputs": [
@@ -716,11 +835,13 @@
}
],
"source": [
"print(\"Documents before deletion: \", db.count(collection_name))\n",
"print(\"Documents before deletion: \", db_FaissFlat.count(collection_name))\n",
"\n",
"id_to_remove = ids[-1]\n",
"db.delete(collection_name=collection_name, ids=[id_to_remove])\n",
"print(f\"Documents after deletion (id={id_to_remove}): {db.count(collection_name)}\")"
"db_FaissFlat.delete(collection_name=collection_name, ids=[id_to_remove])\n",
"print(\n",
" f\"Documents after deletion (id={id_to_remove}): {db_FaissFlat.count(collection_name)}\"\n",
")"
]
},
{
@@ -739,7 +860,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 13,
"id": "1db4d6ed",
"metadata": {},
"outputs": [
@@ -758,7 +879,7 @@
"\n",
"Metadata:\n",
"\tid:\t32\n",
"\tnew_value:\thello world\n",
"\tlast_date_read:\t2024-05-01T14:30:00+00:00\n",
"\tpage_number:\t32\n",
"\tpresident_included:\tTrue\n",
"\tsource:\t../../how_to/state_of_the_union.txt\n"
@@ -767,7 +888,7 @@
],
"source": [
"embedding_vector = embedding.embed_query(query)\n",
"returned_docs = db.similarity_search_by_vector(embedding_vector)\n",
"returned_docs = db_FaissFlat.similarity_search_by_vector(embedding_vector)\n",
"\n",
"# Print Results\n",
"print_document_details(returned_docs[0])"
@@ -787,7 +908,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 14,
"id": "2bc0313b",
"metadata": {},
"outputs": [
@@ -795,7 +916,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Returned entry:\n",
"Deleted entry:\n",
"\n",
"blob:\n",
"\tTrue\n",
@@ -838,18 +959,18 @@
}
],
"source": [
"response, response_array = db.get(\n",
"response, response_array = db_FaissFlat.get(\n",
" collection_name,\n",
" limit=1,\n",
" include=[\"metadata\", \"embeddings\"],\n",
" constraints={\"id\": [\"==\", \"2\"]},\n",
")\n",
"\n",
"print(\"Returned entry:\")\n",
"print_response([response[0][\"FindDescriptor\"][\"entities\"][0]])\n",
"\n",
"# Delete id=2\n",
"db.delete(collection_name=collection_name, ids=[\"2\"])"
"db_FaissFlat.delete(collection_name=collection_name, ids=[\"2\"])\n",
"\n",
"print(\"Deleted entry:\")\n",
"print_response([response[0][\"FindDescriptor\"][\"entities\"][0]])"
]
},
{
@@ -869,7 +990,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 15,
"id": "120f55eb",
"metadata": {},
"outputs": [
@@ -888,7 +1009,7 @@
"\n",
"Metadata:\n",
"\tid:\t32\n",
"\tnew_value:\thello world\n",
"\tlast_date_read:\t2024-05-01T14:30:00+00:00\n",
"\tpage_number:\t32\n",
"\tpresident_included:\tTrue\n",
"\tsource:\t../../how_to/state_of_the_union.txt\n"
@@ -896,7 +1017,7 @@
}
],
"source": [
"retriever = db.as_retriever()\n",
"retriever = db_FaissFlat.as_retriever()\n",
"relevant_docs = retriever.invoke(query)[0]\n",
"\n",
"print_document_details(relevant_docs)"
@@ -914,7 +1035,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 16,
"id": "f00be6d0",
"metadata": {},
"outputs": [
@@ -933,7 +1054,7 @@
"\n",
"Metadata:\n",
"\tid:\t32\n",
"\tnew_value:\thello world\n",
"\tlast_date_read:\t2024-05-01T14:30:00+00:00\n",
"\tpage_number:\t32\n",
"\tpresident_included:\tTrue\n",
"\tsource:\t../../how_to/state_of_the_union.txt\n"
@@ -941,7 +1062,7 @@
}
],
"source": [
"retriever = db.as_retriever(search_type=\"mmr\")\n",
"retriever = db_FaissFlat.as_retriever(search_type=\"mmr\")\n",
"relevant_docs = retriever.invoke(query)[0]\n",
"\n",
"print_document_details(relevant_docs)"
@@ -957,7 +1078,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 17,
"id": "ab911470",
"metadata": {},
"outputs": [
@@ -967,7 +1088,7 @@
"text": [
"--------------------------------------------------\n",
"\n",
"Score:\t1.2032092809677124\n",
"Score:\t1.2032091618\n",
"\n",
"Content:\n",
"\tTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
@@ -980,13 +1101,13 @@
"\n",
"Metadata:\n",
"\tid:\t32\n",
"\tnew_value:\thello world\n",
"\tlast_date_read:\t2024-05-01T14:30:00+00:00\n",
"\tpage_number:\t32\n",
"\tpresident_included:\tTrue\n",
"\tsource:\t../../how_to/state_of_the_union.txt\n",
"--------------------------------------------------\n",
"\n",
"Score:\t1.507053256034851\n",
"Score:\t1.50705266\n",
"\n",
"Content:\n",
"\tBut cancer from prolonged exposure to burn pits ravaged Heath’s lungs and body. \n",
@@ -1022,7 +1143,7 @@
}
],
"source": [
"mmr_resp = db.max_marginal_relevance_search_with_score(query, k=2, fetch_k=10)\n",
"mmr_resp = db_FaissFlat.max_marginal_relevance_search_with_score(query, k=2, fetch_k=10)\n",
"print_results(mmr_resp)"
]
},
@@ -1037,7 +1158,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 18,
"id": "874e7af9",
"metadata": {},
"outputs": [
@@ -1051,11 +1172,11 @@
}
],
"source": [
"print(\"Documents before deletion: \", db.count(collection_name))\n",
"print(\"Documents before deletion: \", db_FaissFlat.count(collection_name))\n",
"\n",
"db.delete(collection_name=collection_name)\n",
"db_FaissFlat.delete(collection_name=collection_name)\n",
"\n",
"print(\"Documents after deletion: \", db.count(collection_name))"
"print(\"Documents after deletion: \", db_FaissFlat.count(collection_name))"
]
},
{
@@ -1068,7 +1189,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 19,
"id": "08931796",
"metadata": {},
"outputs": [
@@ -1097,7 +1218,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "0386ea81",
"id": "a60725a6",
"metadata": {},
"outputs": [],
"source": []
@@ -1119,7 +1240,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.11.9"
}
},
"nbformat": 4,