docs: YDB Vector Store docs (#30636)

This PR adds docs about how to use YDB as a vector store [YDB](https://ydb.tech/) is a versatile open-source distributed SQL database. It supports [vector search](https://ydb.tech/docs/en/yql/reference/udf/list/knn) which means it can be used as a vector store with langchain. YDB vectore store comes with [langchain-ydb](https://pypi.org/project/langchain-ydb/) pypi package. Co-authored-by: ccurme <chester.curme@gmail.com>
2025-08-14 23:26:34 +00:00 · 2025-04-11 04:33:56 +03:00 · 2025-04-11 04:33:56 +03:00 · b6fe7e8c10
commit b6fe7e8c10
parent 7a4ae6fbff
3 changed files with 517 additions and 0 deletions
--- a/docs/docs/integrations/providers/ydb.mdx
+++ b/docs/docs/integrations/providers/ydb.mdx
@ -0,0 +1,23 @@
+# YDB
+
+All functionality related to YDB.
+
+> [YDB](https://ydb.tech/) is a versatile open source Distributed SQL Database that combines
+> high availability and scalability with strong consistency and ACID transactions.
+> It accommodates transactional (OLTP), analytical (OLAP), and streaming workloads simultaneously.
+
+## Installation and Setup
+
+```bash
+pip install langchain-ydb
+```
+
+## Vector Store
+
+To import YDB vector store:
+
+```python
+from langchain_ydb.vectorstores import YDB
+```
+
+For a more detailed walkthrough of the YDB vector store, see [this notebook](/docs/integrations/vectorstores/ydb).
--- a/docs/docs/integrations/vectorstores/ydb.ipynb
+++ b/docs/docs/integrations/vectorstores/ydb.ipynb
@ -0,0 +1,491 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "683953b3",
+   "metadata": {},
+   "source": [
+    "# YDB\n",
+    "\n",
+    "> [YDB](https://ydb.tech/) is a versatile open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions. It accommodates transactional (OLTP), analytical (OLAP), and streaming workloads simultaneously.\n",
+    "\n",
+    "This notebook shows how to use functionality related to the `YDB` vector store.\n",
+    "\n",
+    "## Setup\n",
+    "\n",
+    "First, set up a local YDB with Docker:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "da75f17c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "! docker run -d -p 2136:2136 --name ydb-langchain -e YDB_USE_IN_MEMORY_PDISKS=true -h localhost ydbplatform/local-ydb:trunk"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0acb2a8d",
+   "metadata": {},
+   "source": [
+    "You'll need to install `langchain-ydb` to use this integration"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d454fb7c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "! pip install -qU langchain-ydb"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3df5501b",
+   "metadata": {},
+   "source": [
+    "### Credentials\n",
+    "\n",
+    "There are no credentials for this notebook, just make sure you have installed the packages as shown above."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "54d5276f",
+   "metadata": {},
+   "source": [
+    "If you want to get best in-class automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f6fd5b03",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
+    "# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2b87fe34",
+   "metadata": {},
+   "source": [
+    "## Initialization\n",
+    "\n",
+    "import EmbeddingTabs from \"@theme/EmbeddingTabs\";\n",
+    "\n",
+    "<EmbeddingTabs/>\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "60276097",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/Users/ovcharuk/opensource/langchain/.venv/lib/python3.13/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
+      "  from .autonotebook import tqdm as notebook_tqdm\n"
+     ]
+    }
+   ],
+   "source": [
+    "# | output: false\n",
+    "# | echo: false\n",
+    "from langchain_openai import OpenAIEmbeddings\n",
+    "\n",
+    "embeddings = OpenAIEmbeddings(model=\"text-embedding-3-large\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "aac9563e",
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2023-06-03T08:33:31.554934Z",
+     "start_time": "2023-06-03T08:33:31.549590Z"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "from langchain_ydb.vectorstores import YDB, YDBSearchStrategy, YDBSettings\n",
+    "\n",
+    "settings = YDBSettings(\n",
+    "    table=\"ydb_example\",\n",
+    "    strategy=YDBSearchStrategy.COSINE_SIMILARITY,\n",
+    ")\n",
+    "vector_store = YDB(embeddings, config=settings)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "32dd3f67",
+   "metadata": {},
+   "source": [
+    "## Manage vector store\n",
+    "\n",
+    "Once you have created your vector store, you can interact with it by adding and deleting different items.\n",
+    "\n",
+    "### Add items to vector store\n",
+    "\n",
+    "Prepare documents to work with:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "944743ee",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from uuid import uuid4\n",
+    "\n",
+    "from langchain_core.documents import Document\n",
+    "\n",
+    "document_1 = Document(\n",
+    "    page_content=\"I had chocalate chip pancakes and scrambled eggs for breakfast this morning.\",\n",
+    "    metadata={\"source\": \"tweet\"},\n",
+    ")\n",
+    "\n",
+    "document_2 = Document(\n",
+    "    page_content=\"The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.\",\n",
+    "    metadata={\"source\": \"news\"},\n",
+    ")\n",
+    "\n",
+    "document_3 = Document(\n",
+    "    page_content=\"Building an exciting new project with LangChain - come check it out!\",\n",
+    "    metadata={\"source\": \"tweet\"},\n",
+    ")\n",
+    "\n",
+    "document_4 = Document(\n",
+    "    page_content=\"Robbers broke into the city bank and stole $1 million in cash.\",\n",
+    "    metadata={\"source\": \"news\"},\n",
+    ")\n",
+    "\n",
+    "document_5 = Document(\n",
+    "    page_content=\"Wow! That was an amazing movie. I can't wait to see it again.\",\n",
+    "    metadata={\"source\": \"tweet\"},\n",
+    ")\n",
+    "\n",
+    "document_6 = Document(\n",
+    "    page_content=\"Is the new iPhone worth the price? Read this review to find out.\",\n",
+    "    metadata={\"source\": \"website\"},\n",
+    ")\n",
+    "\n",
+    "document_7 = Document(\n",
+    "    page_content=\"The top 10 soccer players in the world right now.\",\n",
+    "    metadata={\"source\": \"website\"},\n",
+    ")\n",
+    "\n",
+    "document_8 = Document(\n",
+    "    page_content=\"LangGraph is the best framework for building stateful, agentic applications!\",\n",
+    "    metadata={\"source\": \"tweet\"},\n",
+    ")\n",
+    "\n",
+    "document_9 = Document(\n",
+    "    page_content=\"The stock market is down 500 points today due to fears of a recession.\",\n",
+    "    metadata={\"source\": \"news\"},\n",
+    ")\n",
+    "\n",
+    "document_10 = Document(\n",
+    "    page_content=\"I have a bad feeling I am going to get deleted :(\",\n",
+    "    metadata={\"source\": \"tweet\"},\n",
+    ")\n",
+    "\n",
+    "documents = [\n",
+    "    document_1,\n",
+    "    document_2,\n",
+    "    document_3,\n",
+    "    document_4,\n",
+    "    document_5,\n",
+    "    document_6,\n",
+    "    document_7,\n",
+    "    document_8,\n",
+    "    document_9,\n",
+    "    document_10,\n",
+    "]\n",
+    "uuids = [str(uuid4()) for _ in range(len(documents))]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c616b7f4",
+   "metadata": {},
+   "source": [
+    "You can add items to your vector store by using the `add_documents` function."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "3f632996",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Inserting data...: 100%|██████████| 10/10 [00:00<00:00, 14.67it/s]\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "['947be6aa-d489-44c5-910e-62e4d58d2ffb',\n",
+       " '7a62904d-9db3-412b-83b6-f01b34dd7de3',\n",
+       " 'e5a49c64-c985-4ed7-ac58-5ffa31ade699',\n",
+       " '99cf4104-36ab-4bd5-b0da-e210d260e512',\n",
+       " '5810bcd0-b46e-443e-a663-e888c9e028d1',\n",
+       " '190c193d-844e-4dbb-9a4b-b8f5f16cfae6',\n",
+       " 'f8912944-f80a-4178-954e-4595bf59e341',\n",
+       " '34fc7b09-6000-42c9-95f7-7d49f430b904',\n",
+       " '0f6b6783-f300-4a4d-bb04-8025c4dfd409',\n",
+       " '46c37ba9-7cf2-4ac8-9bd1-d84e2cb1155c']"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "vector_store.add_documents(documents=documents, ids=uuids)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "18af81cc",
+   "metadata": {},
+   "source": [
+    "### Delete items from vector store\n",
+    "\n",
+    "You can delete items from your vector store by ID using the `delete` function."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "12b32762",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "True"
+      ]
+     },
+     "execution_count": 6,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "vector_store.delete(ids=[uuids[-1]])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ada27577",
+   "metadata": {},
+   "source": [
+    "## Query vector store\n",
+    "\n",
+    "Once your vector store has been created and relevant documents have been added, you will likely want to query it during the execution of your chain or agent.\n",
+    "\n",
+    "### Query directly\n",
+    "\n",
+    "#### Similarity search\n",
+    "\n",
+    "A simple similarity search can be performed as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "015831a3",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]\n",
+      "* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]\n"
+     ]
+    }
+   ],
+   "source": [
+    "results = vector_store.similarity_search(\n",
+    "    \"LangChain provides abstractions to make working with LLMs easy\", k=2\n",
+    ")\n",
+    "for res in results:\n",
+    "    print(f\"* {res.page_content} [{res.metadata}]\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "623d3b9d",
+   "metadata": {},
+   "source": [
+    "#### Similarity search with score\n",
+    "\n",
+    "You can also perform a search with a score:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "e7d43430",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "* [SIM=0.595] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}]\n",
+      "* [SIM=0.212] I had chocalate chip pancakes and scrambled eggs for breakfast this morning. [{'source': 'tweet'}]\n",
+      "* [SIM=0.118] Wow! That was an amazing movie. I can't wait to see it again. [{'source': 'tweet'}]\n"
+     ]
+    }
+   ],
+   "source": [
+    "results = vector_store.similarity_search_with_score(\"Will it be hot tomorrow?\", k=3)\n",
+    "for res, score in results:\n",
+    "    print(f\"* [SIM={score:.3f}] {res.page_content} [{res.metadata}]\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f5a90c12",
+   "metadata": {},
+   "source": [
+    "### Filtering\n",
+    "\n",
+    "You can search with filters as described below:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "169d01d1",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "* I had chocalate chip pancakes and scrambled eggs for breakfast this morning. [{'source': 'tweet'}]\n",
+      "* Wow! That was an amazing movie. I can't wait to see it again. [{'source': 'tweet'}]\n",
+      "* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]\n",
+      "* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]\n"
+     ]
+    }
+   ],
+   "source": [
+    "results = vector_store.similarity_search_with_score(\n",
+    "    \"What did I eat for breakfast?\",\n",
+    "    k=4,\n",
+    "    filter={\"source\": \"tweet\"},\n",
+    ")\n",
+    "for res, _ in results:\n",
+    "    print(f\"* {res.page_content} [{res.metadata}]\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "afacfd4e",
+   "metadata": {},
+   "source": [
+    "### Query by turning into retriever\n",
+    "\n",
+    "You can also transform the vector store into a retriever for easier usage in your chains.\n",
+    "\n",
+    "Here's how to transform your vector store into a retriever and then invoke the retriever with a simple query and filter."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "97187188",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "* Robbers broke into the city bank and stole $1 million in cash. [{'source': 'news'}]\n",
+      "* The stock market is down 500 points today due to fears of a recession. [{'source': 'news'}]\n"
+     ]
+    }
+   ],
+   "source": [
+    "retriever = vector_store.as_retriever(\n",
+    "    search_kwargs={\"k\": 2},\n",
+    ")\n",
+    "results = retriever.invoke(\n",
+    "    \"Stealing from the bank is a crime\", filter={\"source\": \"news\"}\n",
+    ")\n",
+    "for res in results:\n",
+    "    print(f\"* {res.page_content} [{res.metadata}]\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "57fade30",
+   "metadata": {},
+   "source": [
+    "## Usage for retrieval-augmented generation\n",
+    "\n",
+    "For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
+    "\n",
+    "- [Tutorials](/docs/tutorials/)\n",
+    "- [How-to: Question and answer with RAG](https://python.langchain.com/docs/how_to/#qa-with-rag)\n",
+    "- [Retrieval conceptual docs](https://python.langchain.com/docs/concepts/retrieval)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "02452d34",
+   "metadata": {},
+   "source": [
+    "## API reference\n",
+    "\n",
+    "For detailed documentation of all `YDB` features and configurations head to the API reference:https://python.langchain.com/api_reference/community/vectorstores/langchain_community.vectorstores.ydb.YDB.html"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.13.2"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
--- a/libs/packages.yml
+++ b/libs/packages.yml
@ -603,3 +603,6 @@ packages:
 - name: langchain-cloudflare
  repo: cloudflare/langchain-cloudflare
  path: .
+- name: langchain-ydb
+  path: .
+  repo: ydb-platform/langchain-ydb