Xata as a chat message memory store (#9719)

This adds Xata as a memory store to the Python version of LangChain as well, similar to the [one for LangChain.js](https://github.com/hwchase17/langchainjs/pull/2217).

I have added a Jupyter Notebook with a simple example and a more complex one that uses an agent.

To run the integration test, execute something like:

```
XATA_API_KEY='xau_...' XATA_DB_URL="https://demo-uni3q8.eu-west-1.xata.sh/db/langchain" poetry run pytest tests/integration_tests/memory/test_xata.py
```

Where `langchain` is the database you create in Xata.
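
A malformed `XATA_DB_URL` is an easy way to make the integration test fail for the wrong reason, so the value can be sanity-checked first. The following is a minimal sketch, assuming the URL shape `https://<workspace>.<region>.xata.sh/db/<database>`, which is inferred only from the example URL above. `parse_xata_db_url` is a hypothetical helper and is not part of LangChain or the Xata SDK.

```python
from urllib.parse import urlparse


def parse_xata_db_url(db_url: str) -> dict:
    """Split a Xata database URL into workspace, region, and database name.

    Assumes the shape https://<workspace>.<region>.xata.sh/db/<database>,
    inferred from the example URL; hypothetical helper, not a library API.
    """
    parsed = urlparse(db_url)
    host_parts = (parsed.hostname or "").split(".")
    path_parts = [part for part in parsed.path.split("/") if part]

    # Expect exactly <workspace>.<region>.xata.sh as the host.
    if len(host_parts) != 4 or host_parts[2:] != ["xata", "sh"]:
        raise ValueError(f"unexpected host in {db_url!r}")
    # Expect exactly /db/<database> as the path.
    if len(path_parts) != 2 or path_parts[0] != "db":
        raise ValueError(f"unexpected path in {db_url!r}")

    return {
        "workspace": host_parts[0],
        "region": host_parts[1],
        "database": path_parts[1],
    }


info = parse_xata_db_url("https://demo-uni3q8.eu-west-1.xata.sh/db/langchain")
print(info)
```

If the database was created under a different name, the `database` field makes the mismatch visible before any request is sent.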
2025-09-16 06:53:16 +00:00 · 2023-08-25 01:37:46 +01:00
parent dff00ea91e
commit dc30edf51c
5 changed files with 503 additions and 0 deletions
--- a/docs/extras/integrations/memory/xata_chat_message_history.ipynb
+++ b/docs/extras/integrations/memory/xata_chat_message_history.ipynb
@@ -0,0 +1,326 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Xata chat memory\n",
+    "\n",
+    "[Xata](https://xata.io) is a serverless data platform, based on PostgreSQL and Elasticsearch. It provides a Python SDK for interacting with your database, and a UI for managing your data. With the `XataChatMessageHistory` class, you can use Xata databases for longer-term persistence of chat sessions.\n",
+    "\n",
+    "This notebook covers:\n",
+    "\n",
+    "* A simple example showing what `XataChatMessageHistory` does.\n",
+    "* A more complex example using a ReAct agent that answers questions based on a knowledge base or documentation (stored in Xata as a vector store) and also has a long-term, searchable history of its past messages (stored in Xata as a memory store)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Setup\n",
+    "\n",
+    "### Create a database\n",
+    "\n",
+    "In the [Xata UI](https://app.xata.io) create a new database. You can name it whatever you want; in this notebook we'll use `langchain`. The LangChain integration can auto-create the table used for storing the memory, and this is what we'll use in this example. If you want to pre-create the table, ensure it has the right schema and set `create_table` to `False` when creating the class. Pre-creating the table saves one round-trip to the database during each session initialization."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's first install our dependencies:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install xata==1.0.0rc0 openai langchain"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next, we need to get the environment variables for Xata. You can create a new API key by visiting your [account settings](https://app.xata.io/settings). To find the database URL, go to the Settings page of the database that you have created. The database URL should look something like this: `https://demo-uni3q8.eu-west-1.xata.sh/db/langchain`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import getpass\n",
+    "\n",
+    "api_key = getpass.getpass(\"Xata API key: \")\n",
+    "db_url = input(\"Xata database URL (copy it from your DB settings):\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Create a simple memory store\n",
+    "\n",
+    "To test the memory store functionality in isolation, let's use the following code snippet:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.memory import XataChatMessageHistory\n",
+    "\n",
+    "history = XataChatMessageHistory(\n",
+    "    session_id=\"session-1\",\n",
+    "    api_key=api_key,\n",
+    "    db_url=db_url,\n",
+    "    table_name=\"memory\"\n",
+    ")\n",
+    "\n",
+    "history.add_user_message(\"hi!\")\n",
+    "\n",
+    "history.add_ai_message(\"whats up?\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The above code creates a session with the ID `session-1` and stores two messages in it. After running the above, if you visit the Xata UI, you should see a table named `memory` and the two messages added to it.\n",
+    "\n",
+    "You can retrieve the message history for a particular session with the following code:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "history.messages"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Conversational Q&A chain on your data with memory\n",
+    "\n",
+    "Let's now see a more complex example in which we combine OpenAI, the Xata Vector Store integration, and the Xata memory store integration to create a Q&A chat bot on your data, with follow-up questions and history."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We're going to need to access the OpenAI API, so let's configure the API key:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To store the documents that the chatbot will search for answers, add a table named `docs` to your `langchain` database using the Xata UI, and add the following columns:\n",
+    "\n",
+    "* `content` of type \"Text\". This is used to store the `Document.page_content` values.\n",
+    "* `embedding` of type \"Vector\". Use the dimension used by the model you plan to use. In this notebook we use OpenAI embeddings, which have 1536 dimensions."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's create the vector store and add some sample docs to it:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
+    "from langchain.vectorstores.xata import XataVectorStore\n",
+    "\n",
+    "embeddings = OpenAIEmbeddings()\n",
+    "\n",
+    "texts = [\n",
+    "    \"Xata is a Serverless Data platform based on PostgreSQL\",\n",
+    "    \"Xata offers a built-in vector type that can be used to store and query vectors\",\n",
+    "    \"Xata includes similarity search\"\n",
+    "]\n",
+    "\n",
+    "vector_store = XataVectorStore.from_texts(texts, embeddings, api_key=api_key, db_url=db_url, table_name=\"docs\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "After running the above command, if you go to the Xata UI, you should see the documents loaded together with their embeddings in the `docs` table."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's now create a ConversationBufferMemory to store the chat messages from both the user and the AI."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.memory import ConversationBufferMemory\n",
+    "from uuid import uuid4\n",
+    "\n",
+    "chat_memory = XataChatMessageHistory(\n",
+    "    session_id=str(uuid4()),   # needs to be unique per user session\n",
+    "    api_key=api_key,\n",
+    "    db_url=db_url,\n",
+    "    table_name=\"memory\"\n",
+    ")\n",
+    "memory = ConversationBufferMemory(memory_key=\"chat_history\", chat_memory=chat_memory, return_messages=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now it's time to create an Agent to use both the vector store and the chat memory together."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.agents import initialize_agent, AgentType\n",
+    "from langchain.agents.agent_toolkits import create_retriever_tool\n",
+    "from langchain.chat_models import ChatOpenAI\n",
+    "\n",
+    "tool = create_retriever_tool(\n",
+    "    vector_store.as_retriever(), \n",
+    "    \"search_docs\",\n",
+    "    \"Searches and returns documents from the Xata manual. Useful when you need to answer questions about Xata.\"\n",
+    ")\n",
+    "tools = [tool]\n",
+    "\n",
+    "llm = ChatOpenAI(temperature=0)\n",
+    "\n",
+    "agent = initialize_agent(\n",
+    "    tools,\n",
+    "    llm,\n",
+    "    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,\n",
+    "    verbose=True,\n",
+    "    memory=memory)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To test, let's tell the agent our name:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "agent.run(input=\"My name is bob\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now let's ask the agent some questions about Xata:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "agent.run(input=\"What is xata?\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Notice that it answers based on the data stored in the document store. And now, let's ask a follow-up question:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "agent.run(input=\"Does it support similarity search?\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "And now let's test its memory:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "agent.run(input=\"Did I tell you my name? What is it?\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.9"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
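
The notebook's `session_id` argument decides which stored messages a history object sees, which is why it needs to be unique per user session. The sketch below illustrates that scoping without a database. `InMemoryChatHistory` and `Message` are hypothetical stand-ins, a plain dict plays the role of the `memory` table, and none of this reflects how `XataChatMessageHistory` is actually implemented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Message:
    role: str  # "human" or "ai"
    content: str


class InMemoryChatHistory:
    """Hypothetical stand-in for a session-scoped chat history.

    A shared dict plays the role of the `memory` table; this is not
    how XataChatMessageHistory is implemented.
    """

    def __init__(self, session_id: str, table: Dict[str, List[Message]]):
        self.session_id = session_id
        self._table = table

    def add_user_message(self, content: str) -> None:
        self._table[self.session_id].append(Message("human", content))

    def add_ai_message(self, content: str) -> None:
        self._table[self.session_id].append(Message("ai", content))

    @property
    def messages(self) -> List[Message]:
        # Return a copy so callers cannot mutate the stored rows.
        return list(self._table[self.session_id])

    def clear(self) -> None:
        self._table[self.session_id].clear()


# One shared "table" for all sessions, keyed by session ID.
table: Dict[str, List[Message]] = defaultdict(list)

first = InMemoryChatHistory("session-1", table)
first.add_user_message("hi!")
first.add_ai_message("whats up?")

# A second object with the same session ID sees the same rows...
same_session = InMemoryChatHistory("session-1", table)
# ...while a different session ID starts empty.
other_session = InMemoryChatHistory("session-2", table)

print([m.content for m in same_session.messages])
print(other_session.messages)
```

Reusing a session ID across users would merge their conversations in the same way `same_session` picks up the messages written through `first`, which is the reason the agent example generates the ID with `uuid4()`.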