community[minor]: Added document loader for SurrealDB (#15995)

Added a simple document loader to work with SurrealDB.
This commit is contained in:
Karim Lalani 2024-01-15 12:32:42 -06:00 committed by GitHub
parent 768e5e33bc
commit 14244bd7e5
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
4 changed files with 336 additions and 0 deletions

View File

@ -0,0 +1,236 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "5812b612-3e77-4be2-aefb-fbb16141ab79",
"metadata": {},
"source": [
"# SurrealDB\n",
"\n",
">[SurrealDB](https://surrealdb.com/) is an end-to-end cloud-native database designed for modern applications, including web, mobile, serverless, Jamstack, backend, and traditional applications. With SurrealDB, you can simplify your database and API infrastructure, reduce development time, and build secure, performant apps quickly and cost-effectively.\n",
">\n",
">**Key features of SurrealDB include:**\n",
">\n",
">* **Reduces development time:** SurrealDB simplifies your database and API stack by removing the need for most server-side components, allowing you to build secure, performant apps faster and cheaper.\n",
">* **Real-time collaborative API backend service:** SurrealDB functions as both a database and an API backend service, enabling real-time collaboration.\n",
">* **Support for multiple querying languages:** SurrealDB supports SQL querying from client devices, GraphQL, ACID transactions, WebSocket connections, structured and unstructured data, graph querying, full-text indexing, and geospatial querying.\n",
">* **Granular access control:** SurrealDB provides row-level permissions-based access control, giving you the ability to manage data access with precision.\n",
">\n",
">View the [features](https://surrealdb.com/features), the latest [releases](https://surrealdb.com/releases), and [documentation](https://surrealdb.com/docs).\n",
"\n",
"This notebook shows how to use functionality related to the `SurrealDBLoader`."
]
},
{
"cell_type": "markdown",
"id": "f56ccec5-24b3-4762-91a6-91385e041fee",
"metadata": {},
"source": [
"## Overview\n",
"\n",
"The SurrealDB Document Loader returns a list of Langchain Documents from a SurrealDB database.\n",
"\n",
"The Document Loader takes the following optional parameters:\n",
"\n",
"* `dburl`: connection string to the websocket endpoint. default: `ws://localhost:8000/rpc`\n",
"* `ns`: name of the namespace. default: `langchain`\n",
"* `db`: name of the database. default: `database`\n",
"* `table`: name of the table. default: `documents`\n",
"* `db_user`: SurrealDB credentials if needed: db username.\n",
"* `db_pass`: SurrealDB credentails if needed: db password.\n",
"* `filter_criteria`: dictionary to construct the `WHERE` clause for filtering results from table.\n",
"\n",
"The output `Document` takes the following shape:\n",
"```\n",
"Document(\n",
" page_content=<json encoded string containing the result document>,\n",
" metadata={\n",
" 'id': <document id>,\n",
" 'ns': <namespace name>,\n",
" 'db': <database_name>,\n",
" 'table': <table name>,\n",
" ... <additional fields from metadata property of the document>\n",
" }\n",
")\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "77b024e0-c804-4b19-9f5e-0099eb61ba79",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"Uncomment the below cells to install surrealdb and langchain."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "508bc4f3-3aa2-45d3-8e59-cd7d0ffec379",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# %pip install --upgrade --quiet surrealdb langchain langchain-community"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "3ee3d767-b9ba-4be4-9e80-8fa6376beaba",
"metadata": {},
"outputs": [],
"source": [
"# add this import for running in jupyter notebook\n",
"import nest_asyncio\n",
"\n",
"nest_asyncio.apply()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "1ec629f4-b99a-44f1-a938-29de7439f121",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"from langchain_community.document_loaders.surrealdb import SurrealDBLoader"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "8deb90ac-7d4e-422c-a87a-8e6e41390a6d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"42"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loader = SurrealDBLoader(\n",
" dburl=\"ws://localhost:8000/rpc\",\n",
" ns=\"langchain\",\n",
" db=\"database\",\n",
" table=\"documents\",\n",
" db_user=\"root\",\n",
" db_pass=\"root\",\n",
" filter_criteria={},\n",
")\n",
"docs = loader.load()\n",
"len(docs)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "0aa9d3f7-56b3-464d-9d3d-1df7164122ba",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'id': 'documents:zzz434sa584xl3b4ohvk',\n",
" 'source': '../../modules/state_of_the_union.txt',\n",
" 'ns': 'langchain',\n",
" 'db': 'database',\n",
" 'table': 'documents'}"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"doc = docs[-1]\n",
"doc.metadata"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "0378dd34-c690-4b8e-8816-90a8acc2f227",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"18078"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(doc.page_content)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "f30f1141-329b-4674-acb4-36d9d5a9ef0a",
"metadata": {},
"outputs": [],
"source": [
"page_content = json.loads(doc.page_content)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "2a58496f-a831-40ec-be6b-92ce70f78133",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'When we use taxpayer dollars to rebuild America we are going to Buy American: buy American products to support American jobs. \\n\\nThe federal government spends about $600 Billion a year to keep the country safe and secure. \\n\\nTheres been a law on the books for almost a century \\nto make sure taxpayers dollars support American jobs and businesses. \\n\\nEvery Administration says theyll do it, but we are actually doing it. \\n\\nWe will buy American to make sure everything from the deck of an aircraft carrier to the steel on highway guardrails are made in America. \\n\\nBut to compete for the best jobs of the future, we also need to level the playing field with China and other competitors. \\n\\nThats why it is so important to pass the Bipartisan Innovation Act sitting in Congress that will make record investments in emerging technologies and American manufacturing. \\n\\nLet me give you one example of why its so important to pass it.'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"page_content[\"text\"]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@ -180,6 +180,7 @@ from langchain_community.document_loaders.snowflake_loader import SnowflakeLoade
from langchain_community.document_loaders.spreedly import SpreedlyLoader
from langchain_community.document_loaders.srt import SRTLoader
from langchain_community.document_loaders.stripe import StripeLoader
from langchain_community.document_loaders.surrealdb import SurrealDBLoader
from langchain_community.document_loaders.telegram import (
TelegramChatApiLoader,
TelegramChatFileLoader,
@ -360,6 +361,7 @@ __all__ = [
"SnowflakeLoader",
"SpreedlyLoader",
"StripeLoader",
"SurrealDBLoader",
"TelegramChatApiLoader",
"TelegramChatFileLoader",
"TelegramChatLoader",

View File

@ -0,0 +1,97 @@
import asyncio
import json
import logging
from typing import Any, Dict, List, Optional
from langchain_core.documents import Document
from langchain_community.document_loaders.base import BaseLoader
logger = logging.getLogger(__name__)
class SurrealDBLoader(BaseLoader):
"""Load SurrealDB documents."""
def __init__(
self,
filter_criteria: Optional[Dict] = None,
**kwargs: Any,
) -> None:
try:
from surrealdb import Surreal
except ImportError as e:
raise ImportError(
"""Cannot import from surrealdb.
please install with `pip install surrealdb`."""
) from e
self.dburl = kwargs.pop("dburl", "ws://localhost:8000/rpc")
if self.dburl[0:2] == "ws":
self.sdb = Surreal(self.dburl)
else:
raise ValueError("Only websocket connections are supported at this time.")
self.filter_criteria = filter_criteria or {}
if "table" in self.filter_criteria:
raise ValueError(
"key `table` is not a valid criteria for `filter_criteria` argument."
)
self.ns = kwargs.pop("ns", "langchain")
self.db = kwargs.pop("db", "database")
self.table = kwargs.pop("table", "documents")
self.sdb = Surreal(self.dburl)
self.kwargs = kwargs
asyncio.run(self.initialize())
async def initialize(self) -> None:
"""
Initialize connection to surrealdb database
and authenticate if credentials are provided
"""
await self.sdb.connect()
if "db_user" in self.kwargs and "db_pass" in self.kwargs:
user = self.kwargs.get("db_user")
password = self.kwargs.get("db_pass")
await self.sdb.signin({"user": user, "pass": password})
await self.sdb.use(self.ns, self.db)
def load(self) -> List[Document]:
async def _load() -> List[Document]:
await self.initialize()
return await self.aload()
return asyncio.run(_load())
async def aload(self) -> List[Document]:
"""Load data into Document objects."""
query = "SELECT * FROM type::table($table)"
if self.filter_criteria is not None and len(self.filter_criteria) > 0:
query += " WHERE "
for idx, key in enumerate(self.filter_criteria):
query += f""" {"AND" if idx > 0 else ""} {key} = ${key}"""
metadata = {
"ns": self.ns,
"db": self.db,
"table": self.table,
}
results = await self.sdb.query(
query, {"table": self.table, **self.filter_criteria}
)
return [
(
Document(
page_content=json.dumps(result),
metadata={"id": result["id"], **result["metadata"], **metadata},
)
)
for result in results[0]["result"]
]

View File

@ -132,6 +132,7 @@ EXPECTED_ALL = [
"SnowflakeLoader",
"SpreedlyLoader",
"StripeLoader",
"SurrealDBLoader",
"TelegramChatApiLoader",
"TelegramChatFileLoader",
"TelegramChatLoader",