community[minor]: add graph store implementation for apache age (#20582)

**Description:** implemented GraphStore class for Apache Age graph db

**Dependencies:** depends on psycopg2

Unit and integration tests included. Formatting and linting have been
run.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
This commit is contained in:
shumway743
2024-04-20 14:31:04 -07:00
committed by GitHub
parent c909ae0152
commit cb6e5e56c2
4 changed files with 1920 additions and 0 deletions

View File

@@ -0,0 +1,689 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "c94240f5",
"metadata": {},
"source": [
"# Apache AGE\n",
"\n",
">[Apache AGE](https://age.apache.org/) is a PostgreSQL extension that provides graph database functionality. AGE is an acronym for A Graph Extension, and is inspired by Bitnines fork of PostgreSQL 10, AgensGraph, which is a multi-model database. The goal of the project is to create single storage that can handle both relational and graph model data so that users can use standard ANSI SQL along with openCypher, the Graph query language. The data elements `Apache AGE` stores are nodes, edges connecting them, and attributes of nodes and edges.\n",
"\n",
">This notebook shows how to use LLMs to provide a natural language interface to a graph database you can query with the `Cypher` query language.\n",
"\n",
">[Cypher](https://en.wikipedia.org/wiki/Cypher_(query_language)) is a declarative graph query language that allows for expressive and efficient data querying in a property graph.\n"
]
},
{
"cell_type": "markdown",
"id": "dbc0ee68",
"metadata": {},
"source": [
"## Settin up\n",
"\n",
"You will need to have a running `Postgre` instance with the AGE extension installed. One option for testing is to run a docker container using the official AGE docker image.\n",
"You can run a local docker container by running the executing the following script:\n",
"\n",
"```\n",
"docker run \\\n",
" --name age \\\n",
" -p 5432:5432 \\\n",
" -e POSTGRES_USER=postgresUser \\\n",
" -e POSTGRES_PASSWORD=postgresPW \\\n",
" -e POSTGRES_DB=postgresDB \\\n",
" -d \\\n",
" apache/age\n",
"```\n",
"\n",
"Additional instructions on running in docker can be found [here](https://hub.docker.com/r/apache/age)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "62812aad",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import GraphCypherQAChain\n",
"from langchain_community.graphs.age_graph import AGEGraph\n",
"from langchain_openai import ChatOpenAI"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "0928915d",
"metadata": {},
"outputs": [],
"source": [
"conf = {\n",
" \"database\": \"postgresDB\",\n",
" \"user\": \"postgresUser\",\n",
" \"password\": \"postgresPW\",\n",
" \"host\": \"localhost\",\n",
" \"port\": 5432,\n",
"}\n",
"\n",
"graph = AGEGraph(graph_name=\"age_test\", conf=conf)"
]
},
{
"cell_type": "markdown",
"id": "995ea9b9",
"metadata": {},
"source": [
"## Seeding the database\n",
"\n",
"Assuming your database is empty, you can populate it using Cypher query language. The following Cypher statement is idempotent, which means the database information will be the same if you run it one or multiple times."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "fedd26b9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"graph.query(\n",
" \"\"\"\n",
"MERGE (m:Movie {name:\"Top Gun\"})\n",
"WITH m\n",
"UNWIND [\"Tom Cruise\", \"Val Kilmer\", \"Anthony Edwards\", \"Meg Ryan\"] AS actor\n",
"MERGE (a:Actor {name:actor})\n",
"MERGE (a)-[:ACTED_IN]->(m)\n",
"\"\"\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "58c1a8ea",
"metadata": {},
"source": [
"## Refresh graph schema information\n",
"If the schema of database changes, you can refresh the schema information needed to generate Cypher statements."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "4e3de44f",
"metadata": {},
"outputs": [],
"source": [
"graph.refresh_schema()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "1fe76ccd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" Node properties are the following:\n",
" [{'properties': [{'property': 'name', 'type': 'STRING'}], 'labels': 'Actor'}, {'properties': [{'property': 'property_a', 'type': 'STRING'}], 'labels': 'LabelA'}, {'properties': [], 'labels': 'LabelB'}, {'properties': [], 'labels': 'LabelC'}, {'properties': [{'property': 'name', 'type': 'STRING'}], 'labels': 'Movie'}]\n",
" Relationship properties are the following:\n",
" [{'properties': [], 'type': 'ACTED_IN'}, {'properties': [{'property': 'rel_prop', 'type': 'STRING'}], 'type': 'REL_TYPE'}]\n",
" The relationships are the following:\n",
" ['(:`Actor`)-[:`ACTED_IN`]->(:`Movie`)', '(:`LabelA`)-[:`REL_TYPE`]->(:`LabelB`)', '(:`LabelA`)-[:`REL_TYPE`]->(:`LabelC`)']\n",
" \n"
]
}
],
"source": [
"print(graph.schema)"
]
},
{
"cell_type": "markdown",
"id": "68a3c677",
"metadata": {},
"source": [
"## Querying the graph\n",
"\n",
"We can now use the graph cypher QA chain to ask question of the graph"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "7476ce98",
"metadata": {},
"outputs": [],
"source": [
"chain = GraphCypherQAChain.from_llm(\n",
" ChatOpenAI(temperature=0), graph=graph, verbose=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "ef8ee27b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Generated Cypher:\n",
"\u001b[32;1m\u001b[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie)\n",
"WHERE m.name = 'Top Gun'\n",
"RETURN a.name\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3m[{'name': 'Tom Cruise'}, {'name': 'Val Kilmer'}, {'name': 'Anthony Edwards'}, {'name': 'Meg Ryan'}]\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'query': 'Who played in Top Gun?',\n",
" 'result': 'Tom Cruise, Val Kilmer, Anthony Edwards, Meg Ryan played in Top Gun.'}"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke(\"Who played in Top Gun?\")"
]
},
{
"cell_type": "markdown",
"id": "2d28c4df",
"metadata": {},
"source": [
"## Limit the number of results\n",
"You can limit the number of results from the Cypher QA Chain using the `top_k` parameter.\n",
"The default is 10."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "df230946",
"metadata": {},
"outputs": [],
"source": [
"chain = GraphCypherQAChain.from_llm(\n",
" ChatOpenAI(temperature=0), graph=graph, verbose=True, top_k=2\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "3f1600ee",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
"Generated Cypher:\n",
"\u001b[32;1m\u001b[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie {name: 'Top Gun'})\n",
"RETURN a.name\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3m[{'name': 'Tom Cruise'}, {'name': 'Val Kilmer'}]\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'query': 'Who played in Top Gun?',\n",
" 'result': 'Tom Cruise, Val Kilmer played in Top Gun.'}"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke(\"Who played in Top Gun?\")"
]
},
{
"cell_type": "markdown",
"id": "88c16206",
"metadata": {},
"source": [
"## Return intermediate results\n",
"You can return intermediate steps from the Cypher QA Chain using the `return_intermediate_steps` parameter"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "e412f36b",
"metadata": {},
"outputs": [],
"source": [
"chain = GraphCypherQAChain.from_llm(\n",
" ChatOpenAI(temperature=0), graph=graph, verbose=True, return_intermediate_steps=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "4f4699dc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
"Generated Cypher:\n",
"\u001b[32;1m\u001b[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie)\n",
"WHERE m.name = 'Top Gun'\n",
"RETURN a.name\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3m[{'name': 'Tom Cruise'}, {'name': 'Val Kilmer'}, {'name': 'Anthony Edwards'}, {'name': 'Meg Ryan'}]\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"Intermediate steps: [{'query': \"MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)\\nWHERE m.name = 'Top Gun'\\nRETURN a.name\"}, {'context': [{'name': 'Tom Cruise'}, {'name': 'Val Kilmer'}, {'name': 'Anthony Edwards'}, {'name': 'Meg Ryan'}]}]\n",
"Final answer: Tom Cruise, Val Kilmer, Anthony Edwards, Meg Ryan played in Top Gun.\n"
]
}
],
"source": [
"result = chain(\"Who played in Top Gun?\")\n",
"print(f\"Intermediate steps: {result['intermediate_steps']}\")\n",
"print(f\"Final answer: {result['result']}\")"
]
},
{
"cell_type": "markdown",
"id": "d6e1b054",
"metadata": {},
"source": [
"## Return direct results\n",
"You can return direct results from the Cypher QA Chain using the `return_direct` parameter"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "2d3acf10",
"metadata": {},
"outputs": [],
"source": [
"chain = GraphCypherQAChain.from_llm(\n",
" ChatOpenAI(temperature=0), graph=graph, verbose=True, return_direct=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "b0a9d143",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
"Generated Cypher:\n",
"\u001b[32;1m\u001b[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie {name: 'Top Gun'})\n",
"RETURN a.name\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'query': 'Who played in Top Gun?',\n",
" 'result': [{'name': 'Tom Cruise'},\n",
" {'name': 'Val Kilmer'},\n",
" {'name': 'Anthony Edwards'},\n",
" {'name': 'Meg Ryan'}]}"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke(\"Who played in Top Gun?\")"
]
},
{
"cell_type": "markdown",
"id": "f01dfb72-24ec-4ae7-883a-ee6646889b59",
"metadata": {},
"source": [
"## Add examples in the Cypher generation prompt\n",
"You can define the Cypher statement you want the LLM to generate for particular questions"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "59baeb88-adfa-4c26-8334-fcbff3a98efb",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.prompts.prompt import PromptTemplate\n",
"\n",
"CYPHER_GENERATION_TEMPLATE = \"\"\"Task:Generate Cypher statement to query a graph database.\n",
"Instructions:\n",
"Use only the provided relationship types and properties in the schema.\n",
"Do not use any other relationship types or properties that are not provided.\n",
"Schema:\n",
"{schema}\n",
"Note: Do not include any explanations or apologies in your responses.\n",
"Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.\n",
"Do not include any text except the generated Cypher statement.\n",
"Examples: Here are a few examples of generated Cypher statements for particular questions:\n",
"# How many people played in Top Gun?\n",
"MATCH (m:Movie {{title:\"Top Gun\"}})<-[:ACTED_IN]-()\n",
"RETURN count(*) AS numberOfActors\n",
"\n",
"The question is:\n",
"{question}\"\"\"\n",
"\n",
"CYPHER_GENERATION_PROMPT = PromptTemplate(\n",
" input_variables=[\"schema\", \"question\"], template=CYPHER_GENERATION_TEMPLATE\n",
")\n",
"\n",
"chain = GraphCypherQAChain.from_llm(\n",
" ChatOpenAI(temperature=0),\n",
" graph=graph,\n",
" verbose=True,\n",
" cypher_prompt=CYPHER_GENERATION_PROMPT,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "47c64027-cf42-493a-9c76-2d10ba753728",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Generated Cypher:\n",
"\u001b[32;1m\u001b[1;3mMATCH (:Movie {name:\"Top Gun\"})<-[:ACTED_IN]-(:Actor)\n",
"RETURN count(*) AS numberOfActors\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3m[{'numberofactors': 4}]\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'query': 'How many people played in Top Gun?',\n",
" 'result': \"I don't know the answer.\"}"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke(\"How many people played in Top Gun?\")"
]
},
{
"cell_type": "markdown",
"id": "3e721cad-aa87-4526-9231-2dfc0e365939",
"metadata": {},
"source": [
"## Use separate LLMs for Cypher and answer generation\n",
"You can use the `cypher_llm` and `qa_llm` parameters to define different llms"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "6f9becc2-f579-45bf-9b50-2ce02bde92da",
"metadata": {},
"outputs": [],
"source": [
"chain = GraphCypherQAChain.from_llm(\n",
" graph=graph,\n",
" cypher_llm=ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo\"),\n",
" qa_llm=ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-16k\"),\n",
" verbose=True,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "ff18e3e3-3402-4683-aec4-a19898f23ca1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Generated Cypher:\n",
"\u001b[32;1m\u001b[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie)\n",
"WHERE m.name = 'Top Gun'\n",
"RETURN a.name\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3m[{'name': 'Tom Cruise'}, {'name': 'Val Kilmer'}, {'name': 'Anthony Edwards'}, {'name': 'Meg Ryan'}]\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'query': 'Who played in Top Gun?',\n",
" 'result': 'Tom Cruise, Val Kilmer, Anthony Edwards, and Meg Ryan played in Top Gun.'}"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke(\"Who played in Top Gun?\")"
]
},
{
"cell_type": "markdown",
"id": "eefea16b-508f-4552-8942-9d5063ed7d37",
"metadata": {},
"source": [
"## Ignore specified node and relationship types\n",
"\n",
"You can use `include_types` or `exclude_types` to ignore parts of the graph schema when generating Cypher statements."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "a20fa21e-fb85-41c4-aac0-53fb25e34604",
"metadata": {},
"outputs": [],
"source": [
"chain = GraphCypherQAChain.from_llm(\n",
" graph=graph,\n",
" cypher_llm=ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo\"),\n",
" qa_llm=ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-16k\"),\n",
" verbose=True,\n",
" exclude_types=[\"Movie\"],\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "3ad7f6b8-543e-46e4-a3b2-40fa3e66e895",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Node properties are the following:\n",
"Actor {name: STRING},LabelA {property_a: STRING},LabelB {},LabelC {}\n",
"Relationship properties are the following:\n",
"ACTED_IN {},REL_TYPE {rel_prop: STRING}\n",
"The relationships are the following:\n",
"(:LabelA)-[:REL_TYPE]->(:LabelB),(:LabelA)-[:REL_TYPE]->(:LabelC)\n"
]
}
],
"source": [
"# Inspect graph schema\n",
"print(chain.graph_schema)"
]
},
{
"cell_type": "markdown",
"id": "f0202e88-d700-40ed-aef9-0c969c7bf951",
"metadata": {},
"source": [
"## Validate generated Cypher statements\n",
"You can use the `validate_cypher` parameter to validate and correct relationship directions in generated Cypher statements"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "53665d03-7afd-433c-bdd5-750127bfb152",
"metadata": {},
"outputs": [],
"source": [
"chain = GraphCypherQAChain.from_llm(\n",
" llm=ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo\"),\n",
" graph=graph,\n",
" verbose=True,\n",
" validate_cypher=True,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "19e1a591-9c10-4d7b-aa36-a5e1b778a97b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
"Generated Cypher:\n",
"\u001b[32;1m\u001b[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie)\n",
"WHERE m.name = 'Top Gun'\n",
"RETURN a.name\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3m[{'name': 'Tom Cruise'}, {'name': 'Val Kilmer'}, {'name': 'Anthony Edwards'}, {'name': 'Meg Ryan'}]\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'query': 'Who played in Top Gun?',\n",
" 'result': 'Tom Cruise, Val Kilmer, Anthony Edwards, Meg Ryan played in Top Gun.'}"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke(\"Who played in Top Gun?\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.19"
}
},
"nbformat": 4,
"nbformat_minor": 5
}