mirror of
https://github.com/hwchase17/langchain.git
synced 2025-08-14 23:26:34 +00:00
docs: YDB Vector Store docs (#30636)
This PR adds docs about how to use YDB as a vector store [YDB](https://ydb.tech/) is a versatile open-source distributed SQL database. It supports [vector search](https://ydb.tech/docs/en/yql/reference/udf/list/knn) which means it can be used as a vector store with langchain. YDB vectore store comes with [langchain-ydb](https://pypi.org/project/langchain-ydb/) pypi package. Co-authored-by: ccurme <chester.curme@gmail.com>
This commit is contained in:
parent
7a4ae6fbff
commit
b6fe7e8c10
23
docs/docs/integrations/providers/ydb.mdx
Normal file
23
docs/docs/integrations/providers/ydb.mdx
Normal file
@ -0,0 +1,23 @@
|
||||
# YDB
|
||||
|
||||
All functionality related to YDB.
|
||||
|
||||
> [YDB](https://ydb.tech/) is a versatile open source Distributed SQL Database that combines
|
||||
> high availability and scalability with strong consistency and ACID transactions.
|
||||
> It accommodates transactional (OLTP), analytical (OLAP), and streaming workloads simultaneously.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
```bash
|
||||
pip install langchain-ydb
|
||||
```
|
||||
|
||||
## Vector Store
|
||||
|
||||
To import YDB vector store:
|
||||
|
||||
```python
|
||||
from langchain_ydb.vectorstores import YDB
|
||||
```
|
||||
|
||||
For a more detailed walkthrough of the YDB vector store, see [this notebook](/docs/integrations/vectorstores/ydb).
|
491
docs/docs/integrations/vectorstores/ydb.ipynb
Normal file
491
docs/docs/integrations/vectorstores/ydb.ipynb
Normal file
@ -0,0 +1,491 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "683953b3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# YDB\n",
|
||||
"\n",
|
||||
"> [YDB](https://ydb.tech/) is a versatile open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions. It accommodates transactional (OLTP), analytical (OLAP), and streaming workloads simultaneously.\n",
|
||||
"\n",
|
||||
"This notebook shows how to use functionality related to the `YDB` vector store.\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"First, set up a local YDB with Docker:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "da75f17c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! docker run -d -p 2136:2136 --name ydb-langchain -e YDB_USE_IN_MEMORY_PDISKS=true -h localhost ydbplatform/local-ydb:trunk"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0acb2a8d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You'll need to install `langchain-ydb` to use this integration"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "d454fb7c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install -qU langchain-ydb"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3df5501b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Credentials\n",
|
||||
"\n",
|
||||
"There are no credentials for this notebook, just make sure you have installed the packages as shown above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "54d5276f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you want to get best in-class automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "f6fd5b03",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
|
||||
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2b87fe34",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Initialization\n",
|
||||
"\n",
|
||||
"import EmbeddingTabs from \"@theme/EmbeddingTabs\";\n",
|
||||
"\n",
|
||||
"<EmbeddingTabs/>\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "60276097",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/Users/ovcharuk/opensource/langchain/.venv/lib/python3.13/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
|
||||
" from .autonotebook import tqdm as notebook_tqdm\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# | output: false\n",
|
||||
"# | echo: false\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings(model=\"text-embedding-3-large\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "aac9563e",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-06-03T08:33:31.554934Z",
|
||||
"start_time": "2023-06-03T08:33:31.549590Z"
|
||||
},
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_ydb.vectorstores import YDB, YDBSearchStrategy, YDBSettings\n",
|
||||
"\n",
|
||||
"settings = YDBSettings(\n",
|
||||
" table=\"ydb_example\",\n",
|
||||
" strategy=YDBSearchStrategy.COSINE_SIMILARITY,\n",
|
||||
")\n",
|
||||
"vector_store = YDB(embeddings, config=settings)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "32dd3f67",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Manage vector store\n",
|
||||
"\n",
|
||||
"Once you have created your vector store, you can interact with it by adding and deleting different items.\n",
|
||||
"\n",
|
||||
"### Add items to vector store\n",
|
||||
"\n",
|
||||
"Prepare documents to work with:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "944743ee",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from uuid import uuid4\n",
|
||||
"\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"\n",
|
||||
"document_1 = Document(\n",
|
||||
" page_content=\"I had chocalate chip pancakes and scrambled eggs for breakfast this morning.\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_2 = Document(\n",
|
||||
" page_content=\"The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_3 = Document(\n",
|
||||
" page_content=\"Building an exciting new project with LangChain - come check it out!\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_4 = Document(\n",
|
||||
" page_content=\"Robbers broke into the city bank and stole $1 million in cash.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_5 = Document(\n",
|
||||
" page_content=\"Wow! That was an amazing movie. I can't wait to see it again.\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_6 = Document(\n",
|
||||
" page_content=\"Is the new iPhone worth the price? Read this review to find out.\",\n",
|
||||
" metadata={\"source\": \"website\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_7 = Document(\n",
|
||||
" page_content=\"The top 10 soccer players in the world right now.\",\n",
|
||||
" metadata={\"source\": \"website\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_8 = Document(\n",
|
||||
" page_content=\"LangGraph is the best framework for building stateful, agentic applications!\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_9 = Document(\n",
|
||||
" page_content=\"The stock market is down 500 points today due to fears of a recession.\",\n",
|
||||
" metadata={\"source\": \"news\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_10 = Document(\n",
|
||||
" page_content=\"I have a bad feeling I am going to get deleted :(\",\n",
|
||||
" metadata={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"documents = [\n",
|
||||
" document_1,\n",
|
||||
" document_2,\n",
|
||||
" document_3,\n",
|
||||
" document_4,\n",
|
||||
" document_5,\n",
|
||||
" document_6,\n",
|
||||
" document_7,\n",
|
||||
" document_8,\n",
|
||||
" document_9,\n",
|
||||
" document_10,\n",
|
||||
"]\n",
|
||||
"uuids = [str(uuid4()) for _ in range(len(documents))]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c616b7f4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can add items to your vector store by using the `add_documents` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "3f632996",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Inserting data...: 100%|██████████| 10/10 [00:00<00:00, 14.67it/s]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['947be6aa-d489-44c5-910e-62e4d58d2ffb',\n",
|
||||
" '7a62904d-9db3-412b-83b6-f01b34dd7de3',\n",
|
||||
" 'e5a49c64-c985-4ed7-ac58-5ffa31ade699',\n",
|
||||
" '99cf4104-36ab-4bd5-b0da-e210d260e512',\n",
|
||||
" '5810bcd0-b46e-443e-a663-e888c9e028d1',\n",
|
||||
" '190c193d-844e-4dbb-9a4b-b8f5f16cfae6',\n",
|
||||
" 'f8912944-f80a-4178-954e-4595bf59e341',\n",
|
||||
" '34fc7b09-6000-42c9-95f7-7d49f430b904',\n",
|
||||
" '0f6b6783-f300-4a4d-bb04-8025c4dfd409',\n",
|
||||
" '46c37ba9-7cf2-4ac8-9bd1-d84e2cb1155c']"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"vector_store.add_documents(documents=documents, ids=uuids)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "18af81cc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Delete items from vector store\n",
|
||||
"\n",
|
||||
"You can delete items from your vector store by ID using the `delete` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "12b32762",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"True"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"vector_store.delete(ids=[uuids[-1]])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ada27577",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Query vector store\n",
|
||||
"\n",
|
||||
"Once your vector store has been created and relevant documents have been added, you will likely want to query it during the execution of your chain or agent.\n",
|
||||
"\n",
|
||||
"### Query directly\n",
|
||||
"\n",
|
||||
"#### Similarity search\n",
|
||||
"\n",
|
||||
"A simple similarity search can be performed as follows:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "015831a3",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]\n",
|
||||
"* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search(\n",
|
||||
" \"LangChain provides abstractions to make working with LLMs easy\", k=2\n",
|
||||
")\n",
|
||||
"for res in results:\n",
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "623d3b9d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Similarity search with score\n",
|
||||
"\n",
|
||||
"You can also perform a search with a score:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "e7d43430",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* [SIM=0.595] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}]\n",
|
||||
"* [SIM=0.212] I had chocalate chip pancakes and scrambled eggs for breakfast this morning. [{'source': 'tweet'}]\n",
|
||||
"* [SIM=0.118] Wow! That was an amazing movie. I can't wait to see it again. [{'source': 'tweet'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search_with_score(\"Will it be hot tomorrow?\", k=3)\n",
|
||||
"for res, score in results:\n",
|
||||
" print(f\"* [SIM={score:.3f}] {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f5a90c12",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Filtering\n",
|
||||
"\n",
|
||||
"You can search with filters as described below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "169d01d1",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* I had chocalate chip pancakes and scrambled eggs for breakfast this morning. [{'source': 'tweet'}]\n",
|
||||
"* Wow! That was an amazing movie. I can't wait to see it again. [{'source': 'tweet'}]\n",
|
||||
"* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]\n",
|
||||
"* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"results = vector_store.similarity_search_with_score(\n",
|
||||
" \"What did I eat for breakfast?\",\n",
|
||||
" k=4,\n",
|
||||
" filter={\"source\": \"tweet\"},\n",
|
||||
")\n",
|
||||
"for res, _ in results:\n",
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "afacfd4e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Query by turning into retriever\n",
|
||||
"\n",
|
||||
"You can also transform the vector store into a retriever for easier usage in your chains.\n",
|
||||
"\n",
|
||||
"Here's how to transform your vector store into a retriever and then invoke the retriever with a simple query and filter."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "97187188",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"* Robbers broke into the city bank and stole $1 million in cash. [{'source': 'news'}]\n",
|
||||
"* The stock market is down 500 points today due to fears of a recession. [{'source': 'news'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"retriever = vector_store.as_retriever(\n",
|
||||
" search_kwargs={\"k\": 2},\n",
|
||||
")\n",
|
||||
"results = retriever.invoke(\n",
|
||||
" \"Stealing from the bank is a crime\", filter={\"source\": \"news\"}\n",
|
||||
")\n",
|
||||
"for res in results:\n",
|
||||
" print(f\"* {res.page_content} [{res.metadata}]\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "57fade30",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Usage for retrieval-augmented generation\n",
|
||||
"\n",
|
||||
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
|
||||
"\n",
|
||||
"- [Tutorials](/docs/tutorials/)\n",
|
||||
"- [How-to: Question and answer with RAG](https://python.langchain.com/docs/how_to/#qa-with-rag)\n",
|
||||
"- [Retrieval conceptual docs](https://python.langchain.com/docs/concepts/retrieval)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "02452d34",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
||||
"For detailed documentation of all `YDB` features and configurations head to the API reference:https://python.langchain.com/api_reference/community/vectorstores/langchain_community.vectorstores.ydb.YDB.html"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": ".venv",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.13.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -603,3 +603,6 @@ packages:
|
||||
- name: langchain-cloudflare
|
||||
repo: cloudflare/langchain-cloudflare
|
||||
path: .
|
||||
- name: langchain-ydb
|
||||
path: .
|
||||
repo: ydb-platform/langchain-ydb
|
||||
|
Loading…
Reference in New Issue
Block a user