Xata as a chat message memory store (#9719)

This adds Xata as a memory store to the Python version of LangChain,
similar to the [one for
LangChain.js](https://github.com/hwchase17/langchainjs/pull/2217).
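One way to picture a chat message memory store: each message becomes a row in a single shared table (the notebook uses one named `memory` for every session). Each row is tagged with its session ID, and a session's history is the rows carrying that ID, in insertion order. The sketch below is a hypothetical in-memory stand-in. The `InMemoryChatHistory` class and its row fields (`session_id`, `role`, `content`) are illustrative assumptions, not the `XataChatMessageHistory` implementation. It only mirrors the `add_user_message` / `add_ai_message` / `messages` calls used in the notebook, so the idea can be tried without a Xata account.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InMemoryChatHistory:
    """Hypothetical stand-in for a table-backed chat history.

    NOT the XataChatMessageHistory implementation: it only mimics the
    add_user_message / add_ai_message / messages interface from the notebook.
    """

    session_id: str
    # One shared "table" (a list of row dicts) that many sessions can write to.
    table: List[Dict[str, str]] = field(default_factory=list)

    def add_user_message(self, content: str) -> None:
        self._append("human", content)

    def add_ai_message(self, content: str) -> None:
        self._append("ai", content)

    def _append(self, role: str, content: str) -> None:
        # Every row records the session it belongs to, like a shared table.
        self.table.append(
            {"session_id": self.session_id, "role": role, "content": content}
        )

    @property
    def messages(self) -> List[Dict[str, str]]:
        # Only this session's rows, in the order they were added.
        return [row for row in self.table if row["session_id"] == self.session_id]

    def clear(self) -> None:
        # Drop this session's rows in place; other sessions keep theirs.
        self.table[:] = [
            row for row in self.table if row["session_id"] != self.session_id
        ]


shared_table: List[Dict[str, str]] = []
h1 = InMemoryChatHistory("session-1", shared_table)
h1.add_user_message("hi!")
h1.add_ai_message("whats up?")
h2 = InMemoryChatHistory("session-2", shared_table)
h2.add_user_message("hello")
print([m["content"] for m in h1.messages])  # ['hi!', 'whats up?']
print([m["content"] for m in h2.messages])  # ['hello']
```

Because all sessions share one table, the session ID must be unique per user session. This is why the notebook's agent example generates it with `uuid4()`.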

I have added a Jupyter notebook with a simple example and a more complex
one that uses an agent.

To run the integration test, you need to execute something like:

```shell
XATA_API_KEY='xau_...' XATA_DB_URL="https://demo-uni3q8.eu-west-1.xata.sh/db/langchain" poetry run pytest tests/integration_tests/memory/test_xata.py
```

Here, `langchain` is the name of the database you created in Xata.
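Before running the integration test, it can help to check that both variables are set and that the URL points at the database you expect. The sketch below uses two hypothetical helpers, `read_xata_settings` and `database_name`, which are not part of LangChain or the Xata SDK. It assumes the URL has the shape shown above, `https://<host>/db/<database>`.

```python
import os
from typing import Mapping, Tuple
from urllib.parse import urlparse


def read_xata_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Hypothetical helper: read the two variables the integration test expects."""
    api_key = env.get("XATA_API_KEY")
    db_url = env.get("XATA_DB_URL")
    if not api_key or not db_url:
        raise RuntimeError("Set XATA_API_KEY and XATA_DB_URL before running the test")
    return api_key, db_url


def database_name(db_url: str) -> str:
    """Return the path segment after /db/ (assumes https://<host>/db/<database>)."""
    parts = urlparse(db_url).path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "db":
        raise ValueError(f"unexpected Xata database URL: {db_url!r}")
    return parts[1]


print(database_name("https://demo-uni3q8.eu-west-1.xata.sh/db/langchain"))  # langchain
```

Passing a plain dict as `env` makes the helper easy to exercise without touching the real environment.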
Tudor Golubenco
2023-08-25 01:37:46 +01:00
committed by GitHub
parent dff00ea91e
commit dc30edf51c
5 changed files with 503 additions and 0 deletions


@@ -0,0 +1,326 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Xata chat memory\n",
"\n",
"[Xata](https://xata.io) is a serverless data platform based on PostgreSQL and Elasticsearch. It provides a Python SDK for interacting with your database and a UI for managing your data. With the `XataChatMessageHistory` class, you can use Xata databases for longer-term persistence of chat sessions.\n",
"\n",
"This notebook covers:\n",
"\n",
"* A simple example showing what `XataChatMessageHistory` does.\n",
"* A more complex example using a ReAct agent that answers questions based on a knowledge base or documentation (stored in Xata as a vector store) and also has a long-term, searchable history of its past messages (stored in Xata as a memory store)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"### Create a database\n",
"\n",
"In the [Xata UI](https://app.xata.io), create a new database. You can name it whatever you want; in this notebook we'll use `langchain`. The LangChain integration can auto-create the table used for storing the memory, and this is what we'll use in this example. If you want to pre-create the table, ensure it has the right schema and set `create_table` to `False` when creating the class. Pre-creating the table saves one round-trip to the database during each session initialization."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's first install our dependencies:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install xata==1.0.0rc0 openai langchain"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we need the Xata API key and database URL. You can create a new API key by visiting your [account settings](https://app.xata.io/settings). To find the database URL, go to the Settings page of the database that you have created. The database URL should look something like this: `https://demo-uni3q8.eu-west-1.xata.sh/db/langchain`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"\n",
"api_key = getpass.getpass(\"Xata API key: \")\n",
"db_url = input(\"Xata database URL (copy it from your DB settings): \")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a simple memory store\n",
"\n",
"To test the memory store functionality in isolation, let's use the following code snippet:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.memory import XataChatMessageHistory\n",
"\n",
"history = XataChatMessageHistory(\n",
" session_id=\"session-1\",\n",
" api_key=api_key,\n",
" db_url=db_url,\n",
" table_name=\"memory\"\n",
")\n",
"\n",
"history.add_user_message(\"hi!\")\n",
"\n",
"history.add_ai_message(\"whats up?\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above code creates a session with the ID `session-1` and stores two messages in it. After running the above, if you visit the Xata UI, you should see a table named `memory` and the two messages added to it.\n",
"\n",
"You can retrieve the message history for a particular session with the following code:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"history.messages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conversational Q&A chain on your data with memory\n",
"\n",
"Let's now see a more complex example in which we combine OpenAI, the Xata vector store integration, and the Xata memory store integration to create a Q&A chatbot on your data, with follow-up questions and history."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're going to need to access the OpenAI API, so let's configure the API key:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To store the documents that the chatbot will search for answers, add a table named `docs` to your `langchain` database using the Xata UI, and add the following columns:\n",
"\n",
"* `content` of type \"Text\". This is used to store the `Document.page_content` values.\n",
"* `embedding` of type \"Vector\". Set its dimension to match the embedding model you plan to use. In this notebook we use OpenAI embeddings, which have 1536 dimensions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's create the vector store and add some sample docs to it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores.xata import XataVectorStore\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"\n",
"texts = [\n",
" \"Xata is a Serverless Data platform based on PostgreSQL\",\n",
" \"Xata offers a built-in vector type that can be used to store and query vectors\",\n",
" \"Xata includes similarity search\"\n",
"]\n",
"\n",
"vector_store = XataVectorStore.from_texts(texts, embeddings, api_key=api_key, db_url=db_url, table_name=\"docs\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After running the above command, if you go to the Xata UI, you should see the documents loaded together with their embeddings in the `docs` table."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's now create a `ConversationBufferMemory`, backed by `XataChatMessageHistory`, to store the chat messages from both the user and the AI."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.memory import ConversationBufferMemory\n",
"from uuid import uuid4\n",
"\n",
"chat_memory = XataChatMessageHistory(\n",
" session_id=str(uuid4()), # needs to be unique per user session\n",
" api_key=api_key,\n",
" db_url=db_url,\n",
" table_name=\"memory\"\n",
")\n",
"memory = ConversationBufferMemory(memory_key=\"chat_history\", chat_memory=chat_memory, return_messages=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now it's time to create an agent that uses both the vector store and the chat memory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import initialize_agent, AgentType\n",
"from langchain.agents.agent_toolkits import create_retriever_tool\n",
"from langchain.chat_models import ChatOpenAI\n",
"\n",
"tool = create_retriever_tool(\n",
" vector_store.as_retriever(),\n",
" \"search_docs\",\n",
" \"Searches and returns documents from the Xata manual. Useful when you need to answer questions about Xata.\"\n",
")\n",
"tools = [tool]\n",
"\n",
"llm = ChatOpenAI(temperature=0)\n",
"\n",
"agent = initialize_agent(\n",
" tools,\n",
" llm,\n",
" agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,\n",
" verbose=True,\n",
" memory=memory)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To test, let's tell the agent our name:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"agent.run(input=\"My name is bob\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's ask the agent some questions about Xata:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"agent.run(input=\"What is xata?\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that it answers based on the data stored in the document store. Now let's ask a follow-up question:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"agent.run(input=\"Does it support similarity search?\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And now let's test its memory:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"agent.run(input=\"Did I tell you my name? What is it?\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}