mirror of
https://github.com/hwchase17/langchain.git
synced 2025-09-10 07:21:03 +00:00
community: SQLStrStore/SQLDocStore provide an easy SQL alternative to InMemoryStore
to persist data remotely in a SQL storage (#15909)
**Description:** - Implement `SQLStrStore` and `SQLDocStore` classes that inherits from `BaseStore` to allow to persist data remotely on a SQL server. - SQL is widely used and sometimes we do not want to install a caching solution like Redis. - Multiple issues/comments complain that there is no easy remote and persistent solution that are not in memory (users want to replace InMemoryStore), e.g., https://github.com/langchain-ai/langchain/issues/14267, https://github.com/langchain-ai/langchain/issues/15633, https://github.com/langchain-ai/langchain/issues/14643, https://stackoverflow.com/questions/77385587/persist-parentdocumentretriever-of-langchain - This is particularly painful when wanting to use `ParentDocumentRetriever ` - This implementation is particularly useful when: * it's expensive to construct an InMemoryDocstore/dict * you want to retrieve documents from remote sources * you just want to reuse existing objects - This implementation integrates well with PGVector, indeed, when using PGVector, you already have a SQL instance running. `SQLDocStore` is a convenient way of using this instance to store documents associated to vectors. An integration example with ParentDocumentRetriever and PGVector is provided in docs/docs/integrations/stores/sql.ipynb or [here](https://github.com/gcheron/langchain/blob/sql-store/docs/docs/integrations/stores/sql.ipynb). - It persists `str` and `Document` objects but can be easily extended. **Issue:** Provide an easy SQL alternative to `InMemoryStore`. --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
This commit is contained in:
186
docs/docs/integrations/stores/sql.ipynb
Normal file
186
docs/docs/integrations/stores/sql.ipynb
Normal file
@@ -0,0 +1,186 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "raw",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"---\n",
|
||||
"sidebar_label: SQL\n",
|
||||
"---"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# SQLStore\n",
|
||||
"\n",
|
||||
"The `SQLStrStore` and `SQLDocStore` implement remote data access and persistence to store strings or LangChain documents in your SQL instance."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"['value1', 'value2']\n",
|
||||
"['key2']\n",
|
||||
"['key2']\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_community.storage import SQLStrStore\n",
|
||||
"\n",
|
||||
"# simple example using an SQLStrStore to store strings\n",
|
||||
"# same as you would use in \"InMemoryStore\" but using SQL persistence\n",
|
||||
"CONNECTION_STRING = \"postgresql+psycopg2://user:pass@localhost:5432/db\"\n",
|
||||
"COLLECTION_NAME = \"test_collection\"\n",
|
||||
"\n",
|
||||
"store = SQLStrStore(\n",
|
||||
" collection_name=COLLECTION_NAME,\n",
|
||||
" connection_string=CONNECTION_STRING,\n",
|
||||
")\n",
|
||||
"store.mset([(\"key1\", \"value1\"), (\"key2\", \"value2\")])\n",
|
||||
"print(store.mget([\"key1\", \"key2\"]))\n",
|
||||
"# ['value1', 'value2']\n",
|
||||
"store.mdelete([\"key1\"])\n",
|
||||
"print(list(store.yield_keys()))\n",
|
||||
"# ['key2']\n",
|
||||
"print(list(store.yield_keys(prefix=\"k\")))\n",
|
||||
"# ['key2']\n",
|
||||
"# delete the COLLECTION_NAME collection"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Integration with ParentRetriever and PGVector\n",
|
||||
"\n",
|
||||
"When using PGVector, you already have a SQL instance running. Here is a convenient way of using this instance to store documents associated to vectors. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Prepare the PGVector vectorestore with something like this:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.vectorstores import PGVector\n",
|
||||
"from langchain_openai import OpenAIEmbeddings"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"embeddings = OpenAIEmbeddings()\n",
|
||||
"vector_db = PGVector.from_existing_index(\n",
|
||||
" embedding=embeddings,\n",
|
||||
" collection_name=COLLECTION_NAME,\n",
|
||||
" connection_string=CONNECTION_STRING,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Then create the parent retiever using `SQLDocStore` to persist the documents"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"from langchain.retrievers import ParentDocumentRetriever\n",
|
||||
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
|
||||
"from langchain_community.storage import SQLDocStore\n",
|
||||
"\n",
|
||||
"CONNECTION_STRING = \"postgresql+psycopg2://user:pass@localhost:5432/db\"\n",
|
||||
"COLLECTION_NAME = \"state_of_the_union_test\"\n",
|
||||
"docstore = SQLDocStore(\n",
|
||||
" collection_name=COLLECTION_NAME,\n",
|
||||
" connection_string=CONNECTION_STRING,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"loader = TextLoader(\"./state_of_the_union.txt\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"\n",
|
||||
"parent_splitter = RecursiveCharacterTextSplitter(chunk_size=400)\n",
|
||||
"child_splitter = RecursiveCharacterTextSplitter(chunk_size=50)\n",
|
||||
"\n",
|
||||
"retriever = ParentDocumentRetriever(\n",
|
||||
" vectorstore=vector_db,\n",
|
||||
" docstore=docstore,\n",
|
||||
" child_splitter=child_splitter,\n",
|
||||
" parent_splitter=parent_splitter,\n",
|
||||
")\n",
|
||||
"retriever.add_documents(documents)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Delete a collection"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.storage import SQLStrStore\n",
|
||||
"\n",
|
||||
"# delete the COLLECTION_NAME collection\n",
|
||||
"CONNECTION_STRING = \"postgresql+psycopg2://user:pass@localhost:5432/db\"\n",
|
||||
"COLLECTION_NAME = \"test_collection\"\n",
|
||||
"store = SQLStrStore(\n",
|
||||
" collection_name=COLLECTION_NAME,\n",
|
||||
" connection_string=CONNECTION_STRING,\n",
|
||||
")\n",
|
||||
"store.delete_collection()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
Reference in New Issue
Block a user