docs: YDB Vector Store docs (#30636)

This PR adds docs about how to use YDB as a vector store

[YDB](https://ydb.tech/) is a versatile open-source distributed SQL
database. It supports [vector
search](https://ydb.tech/docs/en/yql/reference/udf/list/knn) which means
it can be used as a vector store with langchain.

YDB vectore store comes with
[langchain-ydb](https://pypi.org/project/langchain-ydb/) pypi package.

Co-authored-by: ccurme <chester.curme@gmail.com>
This commit is contained in:
Oleg Ovcharuk 2025-04-11 04:33:56 +03:00 committed by GitHub
parent 7a4ae6fbff
commit b6fe7e8c10
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
3 changed files with 517 additions and 0 deletions

View File

@ -0,0 +1,23 @@
# YDB
All functionality related to YDB.
> [YDB](https://ydb.tech/) is a versatile open source Distributed SQL Database that combines
> high availability and scalability with strong consistency and ACID transactions.
> It accommodates transactional (OLTP), analytical (OLAP), and streaming workloads simultaneously.
## Installation and Setup
```bash
pip install langchain-ydb
```
## Vector Store
To import YDB vector store:
```python
from langchain_ydb.vectorstores import YDB
```
For a more detailed walkthrough of the YDB vector store, see [this notebook](/docs/integrations/vectorstores/ydb).

View File

@ -0,0 +1,491 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "683953b3",
"metadata": {},
"source": [
"# YDB\n",
"\n",
"> [YDB](https://ydb.tech/) is a versatile open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions. It accommodates transactional (OLTP), analytical (OLAP), and streaming workloads simultaneously.\n",
"\n",
"This notebook shows how to use functionality related to the `YDB` vector store.\n",
"\n",
"## Setup\n",
"\n",
"First, set up a local YDB with Docker:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "da75f17c",
"metadata": {},
"outputs": [],
"source": [
"! docker run -d -p 2136:2136 --name ydb-langchain -e YDB_USE_IN_MEMORY_PDISKS=true -h localhost ydbplatform/local-ydb:trunk"
]
},
{
"cell_type": "markdown",
"id": "0acb2a8d",
"metadata": {},
"source": [
"You'll need to install `langchain-ydb` to use this integration"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d454fb7c",
"metadata": {},
"outputs": [],
"source": [
"! pip install -qU langchain-ydb"
]
},
{
"cell_type": "markdown",
"id": "3df5501b",
"metadata": {},
"source": [
"### Credentials\n",
"\n",
"There are no credentials for this notebook, just make sure you have installed the packages as shown above."
]
},
{
"cell_type": "markdown",
"id": "54d5276f",
"metadata": {},
"source": [
"If you want to get best in-class automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f6fd5b03",
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
"cell_type": "markdown",
"id": "2b87fe34",
"metadata": {},
"source": [
"## Initialization\n",
"\n",
"import EmbeddingTabs from \"@theme/EmbeddingTabs\";\n",
"\n",
"<EmbeddingTabs/>\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "60276097",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/ovcharuk/opensource/langchain/.venv/lib/python3.13/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n"
]
}
],
"source": [
"# | output: false\n",
"# | echo: false\n",
"from langchain_openai import OpenAIEmbeddings\n",
"\n",
"embeddings = OpenAIEmbeddings(model=\"text-embedding-3-large\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aac9563e",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-03T08:33:31.554934Z",
"start_time": "2023-06-03T08:33:31.549590Z"
},
"tags": []
},
"outputs": [],
"source": [
"from langchain_ydb.vectorstores import YDB, YDBSearchStrategy, YDBSettings\n",
"\n",
"settings = YDBSettings(\n",
" table=\"ydb_example\",\n",
" strategy=YDBSearchStrategy.COSINE_SIMILARITY,\n",
")\n",
"vector_store = YDB(embeddings, config=settings)"
]
},
{
"cell_type": "markdown",
"id": "32dd3f67",
"metadata": {},
"source": [
"## Manage vector store\n",
"\n",
"Once you have created your vector store, you can interact with it by adding and deleting different items.\n",
"\n",
"### Add items to vector store\n",
"\n",
"Prepare documents to work with:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "944743ee",
"metadata": {},
"outputs": [],
"source": [
"from uuid import uuid4\n",
"\n",
"from langchain_core.documents import Document\n",
"\n",
"document_1 = Document(\n",
" page_content=\"I had chocalate chip pancakes and scrambled eggs for breakfast this morning.\",\n",
" metadata={\"source\": \"tweet\"},\n",
")\n",
"\n",
"document_2 = Document(\n",
" page_content=\"The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.\",\n",
" metadata={\"source\": \"news\"},\n",
")\n",
"\n",
"document_3 = Document(\n",
" page_content=\"Building an exciting new project with LangChain - come check it out!\",\n",
" metadata={\"source\": \"tweet\"},\n",
")\n",
"\n",
"document_4 = Document(\n",
" page_content=\"Robbers broke into the city bank and stole $1 million in cash.\",\n",
" metadata={\"source\": \"news\"},\n",
")\n",
"\n",
"document_5 = Document(\n",
" page_content=\"Wow! That was an amazing movie. I can't wait to see it again.\",\n",
" metadata={\"source\": \"tweet\"},\n",
")\n",
"\n",
"document_6 = Document(\n",
" page_content=\"Is the new iPhone worth the price? Read this review to find out.\",\n",
" metadata={\"source\": \"website\"},\n",
")\n",
"\n",
"document_7 = Document(\n",
" page_content=\"The top 10 soccer players in the world right now.\",\n",
" metadata={\"source\": \"website\"},\n",
")\n",
"\n",
"document_8 = Document(\n",
" page_content=\"LangGraph is the best framework for building stateful, agentic applications!\",\n",
" metadata={\"source\": \"tweet\"},\n",
")\n",
"\n",
"document_9 = Document(\n",
" page_content=\"The stock market is down 500 points today due to fears of a recession.\",\n",
" metadata={\"source\": \"news\"},\n",
")\n",
"\n",
"document_10 = Document(\n",
" page_content=\"I have a bad feeling I am going to get deleted :(\",\n",
" metadata={\"source\": \"tweet\"},\n",
")\n",
"\n",
"documents = [\n",
" document_1,\n",
" document_2,\n",
" document_3,\n",
" document_4,\n",
" document_5,\n",
" document_6,\n",
" document_7,\n",
" document_8,\n",
" document_9,\n",
" document_10,\n",
"]\n",
"uuids = [str(uuid4()) for _ in range(len(documents))]"
]
},
{
"cell_type": "markdown",
"id": "c616b7f4",
"metadata": {},
"source": [
"You can add items to your vector store by using the `add_documents` function."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "3f632996",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Inserting data...: 100%|██████████| 10/10 [00:00<00:00, 14.67it/s]\n"
]
},
{
"data": {
"text/plain": [
"['947be6aa-d489-44c5-910e-62e4d58d2ffb',\n",
" '7a62904d-9db3-412b-83b6-f01b34dd7de3',\n",
" 'e5a49c64-c985-4ed7-ac58-5ffa31ade699',\n",
" '99cf4104-36ab-4bd5-b0da-e210d260e512',\n",
" '5810bcd0-b46e-443e-a663-e888c9e028d1',\n",
" '190c193d-844e-4dbb-9a4b-b8f5f16cfae6',\n",
" 'f8912944-f80a-4178-954e-4595bf59e341',\n",
" '34fc7b09-6000-42c9-95f7-7d49f430b904',\n",
" '0f6b6783-f300-4a4d-bb04-8025c4dfd409',\n",
" '46c37ba9-7cf2-4ac8-9bd1-d84e2cb1155c']"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vector_store.add_documents(documents=documents, ids=uuids)"
]
},
{
"cell_type": "markdown",
"id": "18af81cc",
"metadata": {},
"source": [
"### Delete items from vector store\n",
"\n",
"You can delete items from your vector store by ID using the `delete` function."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "12b32762",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vector_store.delete(ids=[uuids[-1]])"
]
},
{
"cell_type": "markdown",
"id": "ada27577",
"metadata": {},
"source": [
"## Query vector store\n",
"\n",
"Once your vector store has been created and relevant documents have been added, you will likely want to query it during the execution of your chain or agent.\n",
"\n",
"### Query directly\n",
"\n",
"#### Similarity search\n",
"\n",
"A simple similarity search can be performed as follows:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "015831a3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]\n",
"* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]\n"
]
}
],
"source": [
"results = vector_store.similarity_search(\n",
" \"LangChain provides abstractions to make working with LLMs easy\", k=2\n",
")\n",
"for res in results:\n",
" print(f\"* {res.page_content} [{res.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "623d3b9d",
"metadata": {},
"source": [
"#### Similarity search with score\n",
"\n",
"You can also perform a search with a score:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "e7d43430",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* [SIM=0.595] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}]\n",
"* [SIM=0.212] I had chocalate chip pancakes and scrambled eggs for breakfast this morning. [{'source': 'tweet'}]\n",
"* [SIM=0.118] Wow! That was an amazing movie. I can't wait to see it again. [{'source': 'tweet'}]\n"
]
}
],
"source": [
"results = vector_store.similarity_search_with_score(\"Will it be hot tomorrow?\", k=3)\n",
"for res, score in results:\n",
" print(f\"* [SIM={score:.3f}] {res.page_content} [{res.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "f5a90c12",
"metadata": {},
"source": [
"### Filtering\n",
"\n",
"You can search with filters as described below:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "169d01d1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* I had chocalate chip pancakes and scrambled eggs for breakfast this morning. [{'source': 'tweet'}]\n",
"* Wow! That was an amazing movie. I can't wait to see it again. [{'source': 'tweet'}]\n",
"* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]\n",
"* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]\n"
]
}
],
"source": [
"results = vector_store.similarity_search_with_score(\n",
" \"What did I eat for breakfast?\",\n",
" k=4,\n",
" filter={\"source\": \"tweet\"},\n",
")\n",
"for res, _ in results:\n",
" print(f\"* {res.page_content} [{res.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "afacfd4e",
"metadata": {},
"source": [
"### Query by turning into retriever\n",
"\n",
"You can also transform the vector store into a retriever for easier usage in your chains.\n",
"\n",
"Here's how to transform your vector store into a retriever and then invoke the retriever with a simple query and filter."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "97187188",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Robbers broke into the city bank and stole $1 million in cash. [{'source': 'news'}]\n",
"* The stock market is down 500 points today due to fears of a recession. [{'source': 'news'}]\n"
]
}
],
"source": [
"retriever = vector_store.as_retriever(\n",
" search_kwargs={\"k\": 2},\n",
")\n",
"results = retriever.invoke(\n",
" \"Stealing from the bank is a crime\", filter={\"source\": \"news\"}\n",
")\n",
"for res in results:\n",
" print(f\"* {res.page_content} [{res.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "57fade30",
"metadata": {},
"source": [
"## Usage for retrieval-augmented generation\n",
"\n",
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
"\n",
"- [Tutorials](/docs/tutorials/)\n",
"- [How-to: Question and answer with RAG](https://python.langchain.com/docs/how_to/#qa-with-rag)\n",
"- [Retrieval conceptual docs](https://python.langchain.com/docs/concepts/retrieval)"
]
},
{
"cell_type": "markdown",
"id": "02452d34",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all `YDB` features and configurations head to the API reference:https://python.langchain.com/api_reference/community/vectorstores/langchain_community.vectorstores.ydb.YDB.html"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@ -603,3 +603,6 @@ packages:
- name: langchain-cloudflare
repo: cloudflare/langchain-cloudflare
path: .
- name: langchain-ydb
path: .
repo: ydb-platform/langchain-ydb