community[minor], langchian[minor]: Add Neptune Rdf graph and chain (#16650)

**Description**: This PR adds a chain for Amazon Neptune graph database
RDF format. It complements the existing Neptune Cypher chain. The PR
also includes a Neptune RDF graph class to connect to, introspect, and
query a Neptune RDF graph database from the chain. A sample notebook is
provided under docs that demonstrates the overall effect: invoking the
chain to make natural language queries against Neptune using an LLM.

**Issue**: This is a new feature
 
**Dependencies**: The RDF graph class depends on the AWS boto3 library
if using IAM authentication to connect to the Neptune database.

---------

Co-authored-by: Piyush Jain <piyushjain@duck.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
This commit is contained in:
mhavey
2024-02-13 00:30:20 -05:00
committed by GitHub
parent e1cfd0f3e7
commit 1bbb64d956
8 changed files with 799 additions and 1 deletions

View File

@@ -0,0 +1,337 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Neptune SPARQL QA Chain\n",
"\n",
"This notebook shows use of LLM to query RDF graph in Amazon Neptune. This code uses a `NeptuneRdfGraph` class that connects with the Neptune database and loads it's schema. The `NeptuneSparqlQAChain` is used to connect the graph and LLM to ask natural language questions.\n",
"\n",
"Requirements for running this notebook:\n",
"- Neptune 1.2.x cluster accessible from this notebook\n",
"- Kernel with Python 3.9 or higher\n",
"- For Bedrock access, ensure IAM role has this policy\n",
"\n",
"```json\n",
"{\n",
" \"Action\": [\n",
" \"bedrock:ListFoundationModels\",\n",
" \"bedrock:InvokeModel\"\n",
" ],\n",
" \"Resource\": \"*\",\n",
" \"Effect\": \"Allow\"\n",
"}\n",
"```\n",
"\n",
"- S3 bucket for staging sample data, bucket should be in same account/region as Neptune."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Seed W3C organizational data\n",
"W3C org ontology plus some instances. \n",
"\n",
"You will need an S3 bucket in the same region and account. Set STAGE_BUCKET to name of that bucket."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"STAGE_BUCKET = \"<bucket-name>\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash -s \"$STAGE_BUCKET\"\n",
"\n",
"rm -rf data\n",
"mkdir -p data\n",
"cd data\n",
"echo getting org ontology and sample org instances\n",
"wget http://www.w3.org/ns/org.ttl \n",
"wget https://raw.githubusercontent.com/aws-samples/amazon-neptune-ontology-example-blog/main/data/example_org.ttl \n",
"\n",
"echo Copying org ttl to S3\n",
"aws s3 cp org.ttl s3://$1/org.ttl\n",
"aws s3 cp example_org.ttl s3://$1/example_org.ttl\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Bulk-load the org ttl - both ontology and instances"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%load -s s3://{STAGE_BUCKET} -f turtle --store-to loadres --run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%load_status {loadres['payload']['loadId']} --errors --details"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup Chain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"EXAMPLES = \"\"\"\n",
"\n",
"<question>\n",
"Find organizations.\n",
"</question>\n",
"\n",
"<sparql>\n",
"PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> \n",
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> \n",
"PREFIX org: <http://www.w3.org/ns/org#> \n",
"\n",
"select ?org ?orgName where {{\n",
" ?org rdfs:label ?orgName .\n",
"}} \n",
"</sparql>\n",
"\n",
"<question>\n",
"Find sites of an organization\n",
"</question>\n",
"\n",
"<sparql>\n",
"PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> \n",
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> \n",
"PREFIX org: <http://www.w3.org/ns/org#> \n",
"\n",
"select ?org ?orgName ?siteName where {{\n",
" ?org rdfs:label ?orgName .\n",
" ?org org:hasSite/rdfs:label ?siteName . \n",
"}} \n",
"</sparql>\n",
"\n",
"<question>\n",
"Find suborganizations of an organization\n",
"</question>\n",
"\n",
"<sparql>\n",
"PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> \n",
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> \n",
"PREFIX org: <http://www.w3.org/ns/org#> \n",
"\n",
"select ?org ?orgName ?subName where {{\n",
" ?org rdfs:label ?orgName .\n",
" ?org org:hasSubOrganization/rdfs:label ?subName .\n",
"}} \n",
"</sparql>\n",
"\n",
"<question>\n",
"Find organizational units of an organization\n",
"</question>\n",
"\n",
"<sparql>\n",
"PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> \n",
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> \n",
"PREFIX org: <http://www.w3.org/ns/org#> \n",
"\n",
"select ?org ?orgName ?unitName where {{\n",
" ?org rdfs:label ?orgName .\n",
" ?org org:hasUnit/rdfs:label ?unitName . \n",
"}} \n",
"</sparql>\n",
"\n",
"<question>\n",
"Find members of an organization. Also find their manager, or the member they report to.\n",
"</question>\n",
"\n",
"<sparql>\n",
"PREFIX org: <http://www.w3.org/ns/org#> \n",
"PREFIX foaf: <http://xmlns.com/foaf/0.1/> \n",
"\n",
"select * where {{\n",
" ?person rdf:type foaf:Person .\n",
" ?person org:memberOf ?org .\n",
" OPTIONAL {{ ?person foaf:firstName ?firstName . }}\n",
" OPTIONAL {{ ?person foaf:family_name ?lastName . }}\n",
" OPTIONAL {{ ?person org:reportsTo ??manager }} .\n",
"}}\n",
"</sparql>\n",
"\n",
"\n",
"<question>\n",
"Find change events, such as mergers and acquisitions, of an organization\n",
"</question>\n",
"\n",
"<sparql>\n",
"PREFIX org: <http://www.w3.org/ns/org#> \n",
"\n",
"select ?event ?prop ?obj where {{\n",
" ?org rdfs:label ?orgName .\n",
" ?event rdf:type org:ChangeEvent .\n",
" ?event org:originalOrganization ?origOrg .\n",
" ?event org:resultingOrganization ?resultingOrg .\n",
"}}\n",
"</sparql>\n",
"\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import boto3\n",
"from langchain.chains.graph_qa.neptune_sparql import NeptuneSparqlQAChain\n",
"from langchain.chat_models import BedrockChat\n",
"from langchain_community.graphs import NeptuneRdfGraph\n",
"\n",
"host = \"<neptune-host>\"\n",
"port = \"<neptune-port>\"\n",
"region = \"us-east-1\" # specify region\n",
"\n",
"graph = NeptuneRdfGraph(\n",
" host=host, port=port, use_iam_auth=True, region_name=region, hide_comments=True\n",
")\n",
"\n",
"schema_elements = graph.get_schema_elements\n",
"# Optionally, you can update the schema_elements, and\n",
"# load the schema from the pruned elements.\n",
"graph.load_from_schema_elements(schema_elements)\n",
"\n",
"bedrock_client = boto3.client(\"bedrock-runtime\")\n",
"llm = BedrockChat(model_id=\"anthropic.claude-v2\", client=bedrock_client)\n",
"\n",
"chain = NeptuneSparqlQAChain.from_llm(\n",
" llm=llm,\n",
" graph=graph,\n",
" examples=EXAMPLES,\n",
" verbose=True,\n",
" top_K=10,\n",
" return_intermediate_steps=True,\n",
" return_direct=False,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ask questions\n",
"Depends on the data we ingested above"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chain.invoke(\"\"\"How many organizations are in the graph\"\"\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chain.invoke(\"\"\"Are there any mergers or acquisitions\"\"\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chain.invoke(\"\"\"Find organizations\"\"\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chain.invoke(\"\"\"Find sites of MegaSystems or MegaFinancial\"\"\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chain.invoke(\"\"\"Find a member who is manager of one or more members.\"\"\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chain.invoke(\"\"\"Find five members and who their manager is.\"\"\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chain.invoke(\n",
" \"\"\"Find org units or suborganizations of The Mega Group. What are the sites of those units?\"\"\"\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}