{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "# vlite\n", "\n", "VLite is a simple, fast vector database that lets you store and retrieve data semantically using embeddings. Built on NumPy, VLite is a lightweight, batteries-included database for adding RAG, similarity search, and embeddings to your projects.\n", "\n", "## Installation\n", "\n", "To use VLite in LangChain, install the `vlite` package:\n", "\n", "```bash\n", "pip install vlite\n", "```\n", "\n", "## Importing VLite\n", "\n", "```python\n", "from langchain.vectorstores import VLite\n", "```\n", "\n", "## Basic Example\n", "\n", "In this basic example, we load a text document and store it in the VLite vector database. Then, we perform a similarity search to retrieve relevant documents based on a query.\n", "\n", "VLite handles chunking and embedding of the text for you. To control these steps, you can pre-chunk the text and/or embed the chunks yourself before adding them to the VLite database.\n", "\n", "```python\n", "from langchain.document_loaders import TextLoader\n", "from langchain.vectorstores import VLite\n", "\n", "# Load the document (VLite chunks and embeds it when it is added)\n", "loader = TextLoader(\"path/to/document.txt\")\n", "documents = loader.load()\n", "\n", "# Create a VLite instance\n", "vlite = VLite(collection=\"my_collection\")\n", "\n", "# Add documents to the VLite vector database\n", "vlite.add_documents(documents)\n", "\n", "# Perform a similarity search\n", "query = \"What is the main topic of the document?\"\n", "docs = vlite.similarity_search(query)\n", "\n", "# Print the most relevant document\n", "print(docs[0].page_content)\n", "```\n", "\n", "## Adding Texts and Documents\n", "\n", "You can add texts or documents to the VLite vector database using the `add_texts` and `add_documents` methods, respectively.\n", "\n", "```python\n", "# Add texts to the VLite vector database\n", "texts = [\"This is the first 
text.\", \"This is the second text.\"]\n", "vlite.add_texts(texts)\n", "\n", "# Document is needed to build documents by hand\n", "from langchain_core.documents import Document\n", "\n", "# Add documents to the VLite vector database\n", "documents = [Document(page_content=\"This is a document.\", metadata={\"source\": \"example.txt\"})]\n", "vlite.add_documents(documents)\n", "```\n", "\n", "## Similarity Search\n", "\n", "VLite provides methods for performing similarity search on the stored documents. Use `k` to set how many results are returned.\n", "\n", "```python\n", "# Perform a similarity search\n", "query = \"What is the main topic of the document?\"\n", "docs = vlite.similarity_search(query, k=3)\n", "\n", "# Perform a similarity search with scores\n", "docs_with_scores = vlite.similarity_search_with_score(query, k=3)\n", "```\n", "\n", "## Max Marginal Relevance Search\n", "\n", "VLite also supports Max Marginal Relevance (MMR) search, which optimizes for both similarity to the query and diversity among the retrieved documents.\n", "\n", "```python\n", "# Perform an MMR search\n", "docs = vlite.max_marginal_relevance_search(query, k=3)\n", "```\n", "\n", "## Updating and Deleting Documents\n", "\n", "You can update or delete documents in the VLite vector database using the `update_document` and `delete` methods.\n", "\n", "```python\n", "# Update a document\n", "document_id = \"doc_id_1\"\n", "updated_document = Document(page_content=\"Updated content\", metadata={\"source\": \"updated.txt\"})\n", "vlite.update_document(document_id, updated_document)\n", "\n", "# Delete documents\n", "document_ids = [\"doc_id_1\", \"doc_id_2\"]\n", "vlite.delete(document_ids)\n", "```\n", "\n", "## Retrieving Documents\n", "\n", "You can retrieve documents from the VLite vector database by ID or by metadata using the `get` method.\n", "\n", "```python\n", "# Retrieve documents by IDs\n", "document_ids = [\"doc_id_1\", \"doc_id_2\"]\n", "docs = vlite.get(ids=document_ids)\n", "\n", "# Retrieve documents by metadata\n", "metadata_filter = {\"source\": \"example.txt\"}\n", "docs = vlite.get(where=metadata_filter)\n", 
"```\n", "\n", "## Creating VLite Instances\n", "\n", "You can create VLite instances using various methods:\n", "\n", "```python\n", "# Create a VLite instance from texts\n", "vlite = VLite.from_texts(texts)\n", "\n", "# Create a VLite instance from documents\n", "vlite = VLite.from_documents(documents)\n", "\n", "# Create a VLite instance from an existing index\n", "vlite = VLite.from_existing_index(collection=\"existing_collection\")\n", "```\n", "\n", "## Additional Features\n", "\n", "VLite provides additional features for managing the vector database:\n", "\n", "```python\n", "from langchain.vectorstores import VLite\n", "vlite = VLite(collection=\"my_collection\")\n", "\n", "# Get the number of items in the collection\n", "count = vlite.count()\n", "\n", "# Save the collection\n", "vlite.save()\n", "\n", "# Clear the collection\n", "vlite.clear()\n", "\n", "# Get collection information\n", "vlite.info()\n", "\n", "# Dump the collection data\n", "data = vlite.dump()\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 4 }