From aa37893c008cb7265bace8a8d5dad05c8bdfbe90 Mon Sep 17 00:00:00 2001 From: diego dupin Date: Fri, 4 Apr 2025 16:56:25 +0200 Subject: [PATCH] MariaDB vector store documentation addition (#30229) ### New Feature Since version 11.7.1, MariaDB support vector. This is a super fast implementation (see [some perf blog](https://smalldatum.blogspot.com/2025/01/evaluating-vector-indexes-in-mariadb.html) The goal is to support MariaDB with langchain Implementation is done in https://github.com/mariadb-corporation/langchain-mariadb, published in https://pypi.org/project/langchain-mariadb/ This concerns the doc addition (initial PR https://github.com/langchain-ai/langchain/pull/29989) --------- Co-authored-by: ccurme Co-authored-by: Oskar Stark --- docs/docs/integrations/providers/mariadb.mdx | 41 +++ .../integrations/vectorstores/mariadb.ipynb | 298 ++++++++++++++++++ libs/packages.yml | 3 + 3 files changed, 342 insertions(+) create mode 100644 docs/docs/integrations/providers/mariadb.mdx create mode 100644 docs/docs/integrations/vectorstores/mariadb.ipynb diff --git a/docs/docs/integrations/providers/mariadb.mdx b/docs/docs/integrations/providers/mariadb.mdx new file mode 100644 index 00000000000..dfbbf2e8706 --- /dev/null +++ b/docs/docs/integrations/providers/mariadb.mdx @@ -0,0 +1,41 @@ +# MariaDB + +This page covers how to use the [MariaDB](https://github.com/mariadb/) ecosystem within LangChain. +It is broken into two parts: installation and setup, and then references to specific PGVector wrappers. + +## Installation +- Install c/c connector + +on Debian, Ubuntu +```bash +sudo apt install libmariadb3 libmariadb-dev +``` + +on CentOS, RHEL, Rocky Linux +```bash +sudo yum install MariaDB-shared MariaDB-devel +``` + +- Install the Python connector package with `pip install mariadb` + + +## Setup +1. The first step is to have a MariaDB 11.7.1 or later installed. + + The docker image is the easiest way to get started. + +## Wrappers + +### VectorStore + +There exists a wrapper around MariaDB vector databases, allowing you to use it as a vectorstore, +whether for semantic search or example selection. + +To import this vectorstore: +```python +from langchain_mariadb import MariaDBStore +``` + +### Usage + +For a more detailed walkthrough of the MariaDB wrapper, see [this notebook](/docs/integrations/vectorstores/mariadb) diff --git a/docs/docs/integrations/vectorstores/mariadb.ipynb b/docs/docs/integrations/vectorstores/mariadb.ipynb new file mode 100644 index 00000000000..8b061a6cf4f --- /dev/null +++ b/docs/docs/integrations/vectorstores/mariadb.ipynb @@ -0,0 +1,298 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "7679dd7b-7ed4-4755-a499-824deadba708", + "metadata": {}, + "source": [ + "# MariaDB\n", + "\n", + "LangChain's MariaDB integration (langchain-mariadb) provides vector capabilities for working with MariaDB version 11.7.1 and above, distributed under the MIT license. Users can use the provided implementations as-is or customize them for specific needs.\n", + " Key features include:\n", + "\n", + " * Built-in vector similarity search\n", + " * Support for cosine and euclidean distance metrics\n", + " * Robust metadata filtering options\n", + " * Performance optimization through connection pooling\n", + " * Configurable table and column settings\n", + "\n", + "## Setup\n", + "\n", + "Launch a MariaDB Docker container with:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "92df32f0", + "metadata": {}, + "outputs": [], + "source": [ + "!docker run --name mariadb-container -e MARIADB_ROOT_PASSWORD=langchain -e MARIADB_DATABASE=langchain -p 3306:3306 -d mariadb:11.7" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Installing the Package\n", + "\n", + "The package uses SQLAlchemy but works best with the MariaDB connector, which requires C/C++ components:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2acbaf9b", + "metadata": {}, + "outputs": [], + "source": [ + "# Debian, Ubuntu\n", + "!sudo apt install libmariadb3 libmariadb-dev\n", + "\n", + "# CentOS, RHEL, Rocky Linux\n", + "!sudo yum install MariaDB-shared MariaDB-devel\n", + "\n", + "# Install Python connector\n", + "!pip install -U mariadb" + ] + }, + { + "cell_type": "markdown", + "id": "0dd87fcc", + "metadata": {}, + "source": [ + "Then install `langchain-mariadb` package\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2acbaf9b", + "metadata": {}, + "outputs": [], + "source": [ + "pip install -U langchain-mariadb\n" + ] + }, + { + "cell_type": "markdown", + "id": "0dd87fca", + "metadata": {}, + "source": [ + "VectorStore works along with an LLM model, here using `langchain-openai` as example.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2acbaf9b", + "metadata": {}, + "outputs": [], + "source": [ + "pip install langchain-openai\n", + "export OPENAI_API_KEY=...\n" + ] + }, + { + "cell_type": "markdown", + "id": "0dd87fcc", + "metadata": {}, + "source": [ + "## Initialization" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "94f5c129", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_core.documents import Document\n", + "from langchain_mariadb import MariaDBStore\n", + "from langchain_openai import OpenAIEmbeddings\n", + "\n", + "# connection string\n", + "url = f\"mariadb+mariadbconnector://myuser:mypassword@localhost/langchain\"\n", + "\n", + "# Initialize vector store\n", + "vectorstore = MariaDBStore(\n", + " embeddings=OpenAIEmbeddings(),\n", + " embedding_length=1536,\n", + " datasource=url,\n", + " collection_name=\"my_docs\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "61a224a1-d70b-4daf-86ba-ab6e43c08b50", + "metadata": {}, + "source": [ + "## Manage vector store\n", + "\n", + "### Adding Data\n", + "You can add data as documents with metadata:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "94f5d129", + "metadata": {}, + "outputs": [], + "source": [ + "docs = [\n", + " Document(\n", + " page_content=\"there are cats in the pond\",\n", + " metadata={\"id\": 1, \"location\": \"pond\", \"topic\": \"animals\"},\n", + " ),\n", + " Document(\n", + " page_content=\"ducks are also found in the pond\",\n", + " metadata={\"id\": 2, \"location\": \"pond\", \"topic\": \"animals\"},\n", + " ),\n", + " # More documents...\n", + "]\n", + "vectorstore.add_documents(docs)" + ] + }, + { + "cell_type": "markdown", + "id": "0c712fa3", + "metadata": {}, + "source": [ + "\n", + "Or as plain text with optional metadata:" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "a5b2b71f-49eb-407d-b03a-dea4c0a517d6", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "texts = [\n", + " \"a sculpture exhibit is also at the museum\",\n", + " \"a new coffee shop opened on Main Street\",\n", + "]\n", + "metadatas = [\n", + " {\"id\": 6, \"location\": \"museum\", \"topic\": \"art\"},\n", + " {\"id\": 7, \"location\": \"Main Street\", \"topic\": \"food\"},\n", + "]\n", + "\n", + "vectorstore.add_texts(texts=texts, metadatas=metadatas)" + ] + }, + { + "cell_type": "markdown", + "id": "59f82250-7903-4279-8300-062542c83416", + "metadata": {}, + "source": [ + "## Query vector store" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "f15a2359-6dc3-4099-8214-785f167a9ca4", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Basic similarity search\n", + "results = vectorstore.similarity_search(\"Hello\", k=2)\n", + "\n", + "# Search with metadata filtering\n", + "results = vectorstore.similarity_search(\"Hello\", filter={\"category\": \"greeting\"})" + ] + }, + { + "cell_type": "markdown", + "id": "7ecd77a0", + "metadata": {}, + "source": [ + "\n", + "### Filter Options\n", + "\n", + "The system supports various filtering operations on metadata:\n", + "\n", + "* Equality: $eq\n", + "* Inequality: $ne\n", + "* Comparisons: $lt, $lte, $gt, $gte\n", + "* List operations: $in, $nin\n", + "* Text matching: $like, $nlike\n", + "* Logical operations: $and, $or, $not\n", + "\n", + "Example:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "f15a2359-6dc3-4099-8214-785f167a9cb4", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Search with simple filter\n", + "results = vectorstore.similarity_search(\n", + " \"kitty\", k=10, filter={\"id\": {\"$in\": [1, 5, 2, 9]}}\n", + ")\n", + "\n", + "# Search with multiple conditions (AND)\n", + "results = vectorstore.similarity_search(\n", + " \"ducks\",\n", + " k=10,\n", + " filter={\"id\": {\"$in\": [1, 5, 2, 9]}, \"location\": {\"$in\": [\"pond\", \"market\"]}},\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "90a65b31", + "metadata": {}, + "source": [ + "## Usage for retrieval-augmented generation\n", + "\n", + "TODO: document example" + ] + }, + { + "cell_type": "markdown", + "id": "f08d3a3c", + "metadata": {}, + "source": [ + "## API reference\n", + "\n", + "See the repo [here](https://github.com/mariadb-corporation/langchain-mariadb) for more detail." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/libs/packages.yml b/libs/packages.yml index f5fc826876c..7d634c5c176 100644 --- a/libs/packages.yml +++ b/libs/packages.yml @@ -576,3 +576,6 @@ packages: path: . name_title: RunPod provider_page: runpod +- name: langchain-mariadb + path: . + repo: mariadb-corporation/langchain-mariadb