langchain/docs/versioned_docs/version-0.2.x/integrations/vectorstores/vlite.ipynb
Jacob Lee aff771923a Jacob/new docs (#20570)
Use docusaurus versioning with a callout, merged master as well

2024-04-18 11:10:55 -07:00


{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# vlite\n",
"\n",
"VLite is a simple, fast vector database that lets you store and retrieve data semantically using embeddings. Built on NumPy, vlite is a lightweight, batteries-included database for adding RAG, similarity search, and embeddings to your projects.\n",
"\n",
"## Installation\n",
"\n",
"To use VLite in LangChain, you need to install the `vlite` package:\n",
"\n",
"```bash\n",
"pip install vlite\n",
"```\n",
"\n",
"## Importing VLite\n",
"\n",
"```python\n",
"from langchain_community.vectorstores import VLite\n",
"```\n",
"\n",
"## Basic Example\n",
"\n",
"In this basic example, we load a text document and store it in the VLite vector database. Then we perform a similarity search to retrieve relevant documents based on a query.\n",
"\n",
"VLite handles chunking and embedding of the text for you. You can override this behavior by pre-chunking the text and/or embedding those chunks yourself before adding them to the VLite database.\n",
"\n",
"```python\n",
"from langchain_community.document_loaders import TextLoader\n",
"\n",
"# Load the document (no manual splitting here; VLite chunks the text when it is added)\n",
"loader = TextLoader(\"path/to/document.txt\")\n",
"documents = loader.load()\n",
"\n",
"# Create a VLite instance\n",
"vlite = VLite(collection=\"my_collection\")\n",
"\n",
"# Add documents to the VLite vector database\n",
"vlite.add_documents(documents)\n",
"\n",
"# Perform a similarity search\n",
"query = \"What is the main topic of the document?\"\n",
"docs = vlite.similarity_search(query)\n",
"\n",
"# Print the most relevant document\n",
"print(docs[0].page_content)\n",
"```\n",
"\n",
"## Adding Texts and Documents\n",
"\n",
"You can add texts or documents to the VLite vector database using the `add_texts` and `add_documents` methods, respectively.\n",
"\n",
"```python\n",
"from langchain_core.documents import Document\n",
"\n",
"# Add texts to the VLite vector database\n",
"texts = [\"This is the first text.\", \"This is the second text.\"]\n",
"vlite.add_texts(texts)\n",
"\n",
"# Add documents to the VLite vector database\n",
"documents = [Document(page_content=\"This is a document.\", metadata={\"source\": \"example.txt\"})]\n",
"vlite.add_documents(documents)\n",
"```\n",
"\n",
"## Similarity Search\n",
"\n",
"VLite provides methods for performing similarity search on the stored documents.\n",
"\n",
"```python\n",
"# Perform a similarity search\n",
"query = \"What is the main topic of the document?\"\n",
"docs = vlite.similarity_search(query, k=3)\n",
"\n",
"# Perform a similarity search with scores\n",
"docs_with_scores = vlite.similarity_search_with_score(query, k=3)\n",
"```\n",
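"\n",
"As a rough illustration of what a scored similarity search computes, here is a minimal, self-contained sketch that ranks vectors by cosine similarity. It does not use VLite itself: `cosine` and `top_k_with_scores` are hypothetical helpers, the vectors are toy data, and whether VLite reports a similarity or a distance as its score is not stated here, so treat the score semantics as an assumption.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def cosine(a, b):\n",
"    # Cosine similarity between two equal-length vectors (0.0 if either is all zeros).\n",
"    dot = sum(x * y for x, y in zip(a, b))\n",
"    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))\n",
"    return dot / norm if norm else 0.0\n",
"\n",
"\n",
"def top_k_with_scores(query_vec, doc_vecs, k=3):\n",
"    # Score every stored vector against the query, then keep the k best (index, score) pairs.\n",
"    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]\n",
"    scored.sort(key=lambda pair: pair[1], reverse=True)\n",
"    return scored[:k]\n",
"\n",
"\n",
"# Toy 2-D embeddings standing in for stored chunks (hypothetical data).\n",
"doc_vecs = [[0.0, 1.0], [1.0, 0.0], [0.6, 0.8]]\n",
"results = top_k_with_scores([1.0, 0.0], doc_vecs, k=2)\n",
"for index, score in results:\n",
"    print(index, round(score, 3))\n",
"```\n",
"\n",
"In this toy data the vector identical to the query ranks first, followed by the one that partially overlaps with it.\n",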
"\n",
"## Max Marginal Relevance Search\n",
"\n",
"VLite also supports Max Marginal Relevance (MMR) search, which optimizes for both similarity to the query and diversity among the retrieved documents.\n",
"\n",
"```python\n",
"# Perform an MMR search\n",
"docs = vlite.max_marginal_relevance_search(query, k=3)\n",
"```\n",
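"\n",
"To make the relevance/diversity trade-off concrete, the following is a minimal sketch of the generic greedy MMR selection rule in plain Python. It is not VLite's implementation: `cosine`, `mmr_select`, and the `lambda_mult` weight are names assumed for illustration, and the vectors are toy data.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def cosine(a, b):\n",
"    # Cosine similarity between two equal-length vectors (0.0 if either is all zeros).\n",
"    dot = sum(x * y for x, y in zip(a, b))\n",
"    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))\n",
"    return dot / norm if norm else 0.0\n",
"\n",
"\n",
"def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5):\n",
"    # Greedily pick up to k indices; each pick maximizes\n",
"    # lambda_mult * relevance - (1 - lambda_mult) * redundancy.\n",
"    selected = []\n",
"    candidates = list(range(len(doc_vecs)))\n",
"    while candidates and len(selected) < k:\n",
"        def score(i):\n",
"            relevance = cosine(query_vec, doc_vecs[i])\n",
"            # Redundancy is the highest similarity to anything already selected.\n",
"            redundancy = max(\n",
"                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0\n",
"            )\n",
"            return lambda_mult * relevance - (1 - lambda_mult) * redundancy\n",
"\n",
"        best = max(candidates, key=score)\n",
"        selected.append(best)\n",
"        candidates.remove(best)\n",
"    return selected\n",
"\n",
"\n",
"# Vectors 0 and 1 are near-duplicates; vector 2 is less relevant but different.\n",
"doc_vecs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]\n",
"picked = mmr_select([1.0, 0.0], doc_vecs, k=2, lambda_mult=0.3)\n",
"print(picked)\n",
"```\n",
"\n",
"With a low `lambda_mult`, the second pick skips the near-duplicate vector in favor of the more distinct one, whereas a plain similarity ranking would return the two near-duplicates.\n",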
"\n",
"## Updating and Deleting Documents\n",
"\n",
"You can update or delete documents in the VLite vector database using the `update_document` and `delete` methods.\n",
"\n",
"```python\n",
"# Update a document\n",
"document_id = \"doc_id_1\"\n",
"updated_document = Document(page_content=\"Updated content\", metadata={\"source\": \"updated.txt\"})\n",
"vlite.update_document(document_id, updated_document)\n",
"\n",
"# Delete documents\n",
"document_ids = [\"doc_id_1\", \"doc_id_2\"]\n",
"vlite.delete(document_ids)\n",
"```\n",
"\n",
"## Retrieving Documents\n",
"\n",
"You can retrieve documents from the VLite vector database based on their IDs or metadata using the `get` method.\n",
"\n",
"```python\n",
"# Retrieve documents by IDs\n",
"document_ids = [\"doc_id_1\", \"doc_id_2\"]\n",
"docs = vlite.get(ids=document_ids)\n",
"\n",
"# Retrieve documents by metadata\n",
"metadata_filter = {\"source\": \"example.txt\"}\n",
"docs = vlite.get(where=metadata_filter)\n",
"```\n",
"\n",
"## Creating VLite Instances\n",
"\n",
"You can create VLite instances using various methods:\n",
"\n",
"```python\n",
"# Create a VLite instance from texts\n",
"vlite = VLite.from_texts(texts)\n",
"\n",
"# Create a VLite instance from documents\n",
"vlite = VLite.from_documents(documents)\n",
"\n",
"# Create a VLite instance from an existing index\n",
"vlite = VLite.from_existing_index(collection=\"existing_collection\")\n",
"```\n",
"\n",
"## Additional Features\n",
"\n",
"VLite provides additional features for managing the vector database:\n",
"\n",
"```python\n",
"from langchain_community.vectorstores import VLite\n",
"vlite = VLite(collection=\"my_collection\")\n",
"\n",
"# Get the number of items in the collection\n",
"count = vlite.count()\n",
"\n",
"# Save the collection\n",
"vlite.save()\n",
"\n",
"# Clear the collection\n",
"vlite.clear()\n",
"\n",
"# Get collection information\n",
"vlite.info()\n",
"\n",
"# Dump the collection data\n",
"data = vlite.dump()\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}