mirror of https://github.com/hwchase17/langchain.git
synced 2025-07-03 19:57:51 +00:00
Switch graphqa example in docs to langgraph (#28574)
Co-authored-by: Chester Curme <chester.curme@gmail.com>
This commit is contained in: parent ce3b69aa05, commit 6815981578
@ -1,459 +0,0 @@
{
"cells": [
{
"cell_type": "raw",
"id": "5e61b0f2-15b9-4241-9ab5-ff0f3f732232",
"metadata": {},
"source": [
"---\n",
"sidebar_position: 1\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "846ef4f4-ee38-4a42-a7d3-1a23826e4830",
"metadata": {},
"source": [
"# How to map values to a graph database\n",
"\n",
"In this guide we'll go over strategies to improve graph database query generation by mapping values from user inputs to the database.\n",
"When using the built-in graph chains, the LLM is aware of the graph schema, but has no information about the values of properties stored in the database.\n",
"Therefore, we can introduce a new step in the graph database QA system to accurately map values.\n",
"\n",
"## Setup\n",
"\n",
"First, get required packages and set environment variables:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "18294435-182d-48da-bcab-5b8945b6d9cf",
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain langchain-neo4j langchain-openai neo4j"
]
},
{
"cell_type": "markdown",
"id": "d86dd771-4001-4a34-8680-22e9b50e1e88",
"metadata": {},
"source": [
"We default to OpenAI models in this guide, but you can swap them out for the model provider of your choice."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "9346f8e9-78bf-4667-b3d3-72807a73b718",
"metadata": {},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n",
"\n",
"# Uncomment the below to use LangSmith. Not required.\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()\n",
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\""
]
},
{
"cell_type": "markdown",
"id": "271c8a23-e51c-4ead-a76e-cf21107db47e",
"metadata": {},
"source": [
"Next, we need to define Neo4j credentials.\n",
"Follow [these installation steps](https://neo4j.com/docs/operations-manual/current/installation/) to set up a Neo4j database."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a2a3bb65-05c7-4daf-bac2-b25ae7fe2751",
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"NEO4J_URI\"] = \"bolt://localhost:7687\"\n",
"os.environ[\"NEO4J_USERNAME\"] = \"neo4j\"\n",
"os.environ[\"NEO4J_PASSWORD\"] = \"password\""
]
},
{
"cell_type": "markdown",
"id": "50fa4510-29b7-49b6-8496-5e86f694e81f",
"metadata": {},
"source": [
"The below example will create a connection with a Neo4j database and will populate it with example data about movies and their actors."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "4ee9ef7a-eef9-4289-b9fd-8fbc31041688",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_neo4j import Neo4jGraph\n",
"\n",
"graph = Neo4jGraph()\n",
"\n",
"# Import movie information\n",
"\n",
"movies_query = \"\"\"\n",
"LOAD CSV WITH HEADERS FROM \n",
"'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'\n",
"AS row\n",
"MERGE (m:Movie {id:row.movieId})\n",
"SET m.released = date(row.released),\n",
"    m.title = row.title,\n",
"    m.imdbRating = toFloat(row.imdbRating)\n",
"FOREACH (director in split(row.director, '|') | \n",
"    MERGE (p:Person {name:trim(director)})\n",
"    MERGE (p)-[:DIRECTED]->(m))\n",
"FOREACH (actor in split(row.actors, '|') | \n",
"    MERGE (p:Person {name:trim(actor)})\n",
"    MERGE (p)-[:ACTED_IN]->(m))\n",
"FOREACH (genre in split(row.genres, '|') | \n",
"    MERGE (g:Genre {name:trim(genre)})\n",
"    MERGE (m)-[:IN_GENRE]->(g))\n",
"\"\"\"\n",
"\n",
"graph.query(movies_query)"
]
},
{
"cell_type": "markdown",
"id": "0cb0ea30-ca55-4f35-aad6-beb57453de66",
"metadata": {},
"source": [
"## Detecting entities in the user input\n",
"We have to extract the types of entities/values we want to map to a graph database. In this example, we are dealing with a movie graph, so we can map movies and people to the database."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "e1a19424-6046-40c2-81d1-f3b88193a293",
"metadata": {},
"outputs": [],
"source": [
"from typing import List, Optional\n",
"\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_openai import ChatOpenAI\n",
"from pydantic import BaseModel, Field\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
"\n",
"\n",
"class Entities(BaseModel):\n",
"    \"\"\"Identifying information about entities.\"\"\"\n",
"\n",
"    names: List[str] = Field(\n",
"        ...,\n",
"        description=\"All the people or movies appearing in the text\",\n",
"    )\n",
"\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
"    [\n",
"        (\n",
"            \"system\",\n",
"            \"You are extracting people and movies from the text.\",\n",
"        ),\n",
"        (\n",
"            \"human\",\n",
"            \"Use the given format to extract information from the following \"\n",
"            \"input: {question}\",\n",
"        ),\n",
"    ]\n",
")\n",
"\n",
"\n",
"entity_chain = prompt | llm.with_structured_output(Entities)"
]
},
{
"cell_type": "markdown",
"id": "9c14084c-37a7-4a9c-a026-74e12961c781",
"metadata": {},
"source": [
"We can test the entity extraction chain."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "bbfe0d8f-982e-46e6-88fb-8a4f0d850b07",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Entities(names=['Casino'])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"entities = entity_chain.invoke({\"question\": \"Who played in Casino movie?\"})\n",
"entities"
]
},
{
"cell_type": "markdown",
"id": "a8afbf13-05d0-4383-8050-f88b8c2f6fab",
"metadata": {},
"source": [
"We will utilize a simple `CONTAINS` clause to match entities to the database. In practice, you might want to use a fuzzy search or a fulltext index to allow for minor misspellings."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "6f92929f-74fb-4db2-b7e1-eb1e9d386a67",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Casino maps to Casino Movie in database\\n'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"match_query = \"\"\"MATCH (p:Person|Movie)\n",
"WHERE p.name CONTAINS $value OR p.title CONTAINS $value\n",
"RETURN coalesce(p.name, p.title) AS result, labels(p)[0] AS type\n",
"LIMIT 1\n",
"\"\"\"\n",
"\n",
"\n",
"def map_to_database(entities: Entities) -> Optional[str]:\n",
"    result = \"\"\n",
"    for entity in entities.names:\n",
"        response = graph.query(match_query, {\"value\": entity})\n",
"        try:\n",
"            result += f\"{entity} maps to {response[0]['result']} {response[0]['type']} in database\\n\"\n",
"        except IndexError:\n",
"            pass\n",
"    return result\n",
"\n",
"\n",
"map_to_database(entities)"
]
},
{
"cell_type": "markdown",
"id": "f66c6756-6efb-4b1e-9b5d-87ed914a5212",
"metadata": {},
"source": [
"## Custom Cypher generating chain\n",
"\n",
"We need to define a custom Cypher prompt that takes the entity mapping information along with the schema and the user question to construct a Cypher statement.\n",
"We will be using the LangChain expression language to accomplish that."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "8ef3e21d-f1c2-45e2-9511-4920d1cf6e7e",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"\n",
"# Generate Cypher statement based on natural language input\n",
"cypher_template = \"\"\"Based on the Neo4j graph schema below, write a Cypher query that would answer the user's question:\n",
"{schema}\n",
"Entities in the question map to the following database values:\n",
"{entities_list}\n",
"Question: {question}\n",
"Cypher query:\"\"\"\n",
"\n",
"cypher_prompt = ChatPromptTemplate.from_messages(\n",
"    [\n",
"        (\n",
"            \"system\",\n",
"            \"Given an input question, convert it to a Cypher query. No pre-amble.\",\n",
"        ),\n",
"        (\"human\", cypher_template),\n",
"    ]\n",
")\n",
"\n",
"cypher_response = (\n",
"    RunnablePassthrough.assign(names=entity_chain)\n",
"    | RunnablePassthrough.assign(\n",
"        entities_list=lambda x: map_to_database(x[\"names\"]),\n",
"        schema=lambda _: graph.get_schema,\n",
"    )\n",
"    | cypher_prompt\n",
"    | llm.bind(stop=[\"\\nCypherResult:\"])\n",
"    | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "1f0011e3-9660-4975-af2a-486b1bc3b954",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'MATCH (:Movie {title: \"Casino\"})<-[:ACTED_IN]-(actor)\\nRETURN actor.name'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cypher = cypher_response.invoke({\"question\": \"Who played in Casino movie?\"})\n",
"cypher"
]
},
{
"cell_type": "markdown",
"id": "38095678-611f-4847-a4de-e51ef7ef727c",
"metadata": {},
"source": [
"## Generating answers based on database results\n",
"\n",
"Now that we have a chain that generates the Cypher statement, we need to execute the Cypher statement against the database and send the database results back to an LLM to generate the final answer.\n",
"Again, we will be using LCEL."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "d1fa97c0-1c9c-41d3-9ee1-5f1905d17434",
"metadata": {},
"outputs": [],
"source": [
"from langchain_neo4j.chains.graph_qa.cypher_utils import (\n",
"    CypherQueryCorrector,\n",
"    Schema,\n",
")\n",
"\n",
"graph.refresh_schema()\n",
"# Cypher validation tool for relationship directions\n",
"corrector_schema = [\n",
"    Schema(el[\"start\"], el[\"type\"], el[\"end\"])\n",
"    for el in graph.structured_schema.get(\"relationships\")\n",
"]\n",
"cypher_validation = CypherQueryCorrector(corrector_schema)\n",
"\n",
"# Generate natural language response based on database results\n",
"response_template = \"\"\"Based on the question, Cypher query, and Cypher response, write a natural language response:\n",
"Question: {question}\n",
"Cypher query: {query}\n",
"Cypher Response: {response}\"\"\"\n",
"\n",
"response_prompt = ChatPromptTemplate.from_messages(\n",
"    [\n",
"        (\n",
"            \"system\",\n",
"            \"Given an input question and Cypher response, convert it to a natural\"\n",
"            \" language answer. No pre-amble.\",\n",
"        ),\n",
"        (\"human\", response_template),\n",
"    ]\n",
")\n",
"\n",
"chain = (\n",
"    RunnablePassthrough.assign(query=cypher_response)\n",
"    | RunnablePassthrough.assign(\n",
"        response=lambda x: graph.query(cypher_validation(x[\"query\"])),\n",
"    )\n",
"    | response_prompt\n",
"    | llm\n",
"    | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "918146e5-7918-46d2-a774-53f9547d8fcb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Robert De Niro, James Woods, Joe Pesci, and Sharon Stone played in the movie \"Casino\".'"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke({\"question\": \"Who played in Casino movie?\"})"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c7ba75cd-8399-4e54-a6f8-8a411f159f56",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
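The entity-to-database mapping step deleted above can be sketched standalone, without a running Neo4j instance. This is a minimal illustration of the same `CONTAINS`-style matching: `DATABASE` and `match_query` below are hypothetical in-memory stand-ins for the graph and the Cypher lookup, shaped like the `result`/`type` columns that query returns.

```python
from typing import List, Optional

# Hypothetical in-memory stand-in for the Neo4j match query's result set:
# each record mimics the `result` and `type` columns it RETURNs.
DATABASE = [
    {"result": "Casino", "type": "Movie"},
    {"result": "Robert De Niro", "type": "Person"},
]


def match_query(value: str) -> List[dict]:
    """Return at most one record containing `value`, like CONTAINS ... LIMIT 1."""
    return [r for r in DATABASE if value in r["result"]][:1]


def map_to_database(names: List[str]) -> Optional[str]:
    # Same shape as the notebook's function: one line per matched entity,
    # silently skipping entities with no match (IndexError on empty result).
    result = ""
    for entity in names:
        response = match_query(entity)
        try:
            result += f"{entity} maps to {response[0]['result']} {response[0]['type']} in database\n"
        except IndexError:
            pass
    return result


print(map_to_database(["Casino", "Unknown Film"]))
```

Swapping `match_query` for `graph.query(...)` against a real database recovers the notebook's behavior; only the lookup changes, not the mapping logic.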
@ -1,548 +0,0 @@
{
"cells": [
{
"cell_type": "raw",
"metadata": {},
"source": [
"---\n",
"sidebar_position: 2\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# How to best prompt for Graph-RAG\n",
"\n",
"In this guide we'll go over prompting strategies to improve graph database query generation. We'll largely focus on methods for getting relevant database-specific information in your prompt.\n",
"\n",
"## Setup\n",
"\n",
"First, get required packages and set environment variables:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install --upgrade --quiet langchain langchain-neo4j langchain-openai neo4j"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We default to OpenAI models in this guide, but you can swap them out for the model provider of your choice."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n",
"\n",
"# Uncomment the below to use LangSmith. Not required.\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()\n",
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we need to define Neo4j credentials.\n",
"Follow [these installation steps](https://neo4j.com/docs/operations-manual/current/installation/) to set up a Neo4j database."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"NEO4J_URI\"] = \"bolt://localhost:7687\"\n",
"os.environ[\"NEO4J_USERNAME\"] = \"neo4j\"\n",
"os.environ[\"NEO4J_PASSWORD\"] = \"password\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The below example will create a connection with a Neo4j database and will populate it with example data about movies and their actors."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_neo4j import Neo4jGraph\n",
"\n",
"graph = Neo4jGraph()\n",
"\n",
"# Import movie information\n",
"\n",
"movies_query = \"\"\"\n",
"LOAD CSV WITH HEADERS FROM \n",
"'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'\n",
"AS row\n",
"MERGE (m:Movie {id:row.movieId})\n",
"SET m.released = date(row.released),\n",
"    m.title = row.title,\n",
"    m.imdbRating = toFloat(row.imdbRating)\n",
"FOREACH (director in split(row.director, '|') | \n",
"    MERGE (p:Person {name:trim(director)})\n",
"    MERGE (p)-[:DIRECTED]->(m))\n",
"FOREACH (actor in split(row.actors, '|') | \n",
"    MERGE (p:Person {name:trim(actor)})\n",
"    MERGE (p)-[:ACTED_IN]->(m))\n",
"FOREACH (genre in split(row.genres, '|') | \n",
"    MERGE (g:Genre {name:trim(genre)})\n",
"    MERGE (m)-[:IN_GENRE]->(g))\n",
"\"\"\"\n",
"\n",
"graph.query(movies_query)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Filtering graph schema\n",
"\n",
"At times, you may need to focus on a specific subset of the graph schema while generating Cypher statements.\n",
"Let's say we are dealing with the following graph schema:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Node properties are the following:\n",
"Movie {imdbRating: FLOAT, id: STRING, released: DATE, title: STRING},Person {name: STRING},Genre {name: STRING}\n",
"Relationship properties are the following:\n",
"\n",
"The relationships are the following:\n",
"(:Movie)-[:IN_GENRE]->(:Genre),(:Person)-[:DIRECTED]->(:Movie),(:Person)-[:ACTED_IN]->(:Movie)\n"
]
}
],
"source": [
"graph.refresh_schema()\n",
"print(graph.schema)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's say we want to exclude the _Genre_ node from the schema representation we pass to an LLM.\n",
"We can achieve that using the `exclude_types` parameter of the `GraphCypherQAChain` chain."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"from langchain_neo4j import GraphCypherQAChain\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
"chain = GraphCypherQAChain.from_llm(\n",
"    graph=graph,\n",
"    llm=llm,\n",
"    exclude_types=[\"Genre\"],\n",
"    verbose=True,\n",
"    allow_dangerous_requests=True,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Node properties are the following:\n",
"Movie {imdbRating: FLOAT, id: STRING, released: DATE, title: STRING},Person {name: STRING}\n",
"Relationship properties are the following:\n",
"\n",
"The relationships are the following:\n",
"(:Person)-[:DIRECTED]->(:Movie),(:Person)-[:ACTED_IN]->(:Movie)\n"
]
}
],
"source": [
"print(chain.graph_schema)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Few-shot examples\n",
"\n",
"Including examples of natural language questions being converted to valid Cypher queries against our database in the prompt will often improve model performance, especially for complex queries.\n",
"\n",
"Let's say we have the following examples:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"examples = [\n",
"    {\n",
"        \"question\": \"How many artists are there?\",\n",
"        \"query\": \"MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)\",\n",
"    },\n",
"    {\n",
"        \"question\": \"Which actors played in the movie Casino?\",\n",
"        \"query\": \"MATCH (m:Movie {{title: 'Casino'}})<-[:ACTED_IN]-(a) RETURN a.name\",\n",
"    },\n",
"    {\n",
"        \"question\": \"How many movies has Tom Hanks acted in?\",\n",
"        \"query\": \"MATCH (a:Person {{name: 'Tom Hanks'}})-[:ACTED_IN]->(m:Movie) RETURN count(m)\",\n",
"    },\n",
"    {\n",
"        \"question\": \"List all the genres of the movie Schindler's List\",\n",
"        \"query\": \"MATCH (m:Movie {{title: 'Schindler\\\\'s List'}})-[:IN_GENRE]->(g:Genre) RETURN g.name\",\n",
"    },\n",
"    {\n",
"        \"question\": \"Which actors have worked in movies from both the comedy and action genres?\",\n",
"        \"query\": \"MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name\",\n",
"    },\n",
"    {\n",
"        \"question\": \"Which directors have made movies with at least three different actors named 'John'?\",\n",
"        \"query\": \"MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person) WHERE a.name STARTS WITH 'John' WITH d, COUNT(DISTINCT a) AS JohnsCount WHERE JohnsCount >= 3 RETURN d.name\",\n",
"    },\n",
"    {\n",
"        \"question\": \"Identify movies where directors also played a role in the film.\",\n",
"        \"query\": \"MATCH (p:Person)-[:DIRECTED]->(m:Movie), (p)-[:ACTED_IN]->(m) RETURN m.title, p.name\",\n",
"    },\n",
"    {\n",
"        \"question\": \"Find the actor with the highest number of movies in the database.\",\n",
"        \"query\": \"MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) RETURN a.name, COUNT(m) AS movieCount ORDER BY movieCount DESC LIMIT 1\",\n",
"    },\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can create a few-shot prompt with them like so:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate\n",
"\n",
"example_prompt = PromptTemplate.from_template(\n",
"    \"User input: {question}\\nCypher query: {query}\"\n",
")\n",
"prompt = FewShotPromptTemplate(\n",
"    examples=examples[:5],\n",
"    example_prompt=example_prompt,\n",
"    prefix=\"You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.\\n\\nHere is the schema information\\n{schema}.\\n\\nBelow are a number of examples of questions and their corresponding Cypher queries.\",\n",
"    suffix=\"User input: {question}\\nCypher query: \",\n",
"    input_variables=[\"question\", \"schema\"],\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.\n",
"\n",
"Here is the schema information\n",
"foo.\n",
"\n",
"Below are a number of examples of questions and their corresponding Cypher queries.\n",
"\n",
"User input: How many artists are there?\n",
"Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)\n",
"\n",
"User input: Which actors played in the movie Casino?\n",
"Cypher query: MATCH (m:Movie {title: 'Casino'})<-[:ACTED_IN]-(a) RETURN a.name\n",
"\n",
"User input: How many movies has Tom Hanks acted in?\n",
"Cypher query: MATCH (a:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN count(m)\n",
"\n",
"User input: List all the genres of the movie Schindler's List\n",
"Cypher query: MATCH (m:Movie {title: 'Schindler\\'s List'})-[:IN_GENRE]->(g:Genre) RETURN g.name\n",
"\n",
"User input: Which actors have worked in movies from both the comedy and action genres?\n",
"Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name\n",
"\n",
"User input: How many artists are there?\n",
"Cypher query: \n"
]
}
],
"source": [
"print(prompt.format(question=\"How many artists are there?\", schema=\"foo\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dynamic few-shot examples\n",
"\n",
"If we have enough examples, we may want to only include the most relevant ones in the prompt, either because they don't fit in the model's context window or because the long tail of examples distracts the model. And specifically, given any input we want to include the examples most relevant to that input.\n",
"\n",
"We can do just this using an ExampleSelector. In this case we'll use a [SemanticSimilarityExampleSelector](https://python.langchain.com/api_reference/core/example_selectors/langchain_core.example_selectors.semantic_similarity.SemanticSimilarityExampleSelector.html), which will store the examples in the vector database of our choosing. At runtime it will perform a similarity search between the input and our examples, and return the most semantically similar ones:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.example_selectors import SemanticSimilarityExampleSelector\n",
"from langchain_neo4j import Neo4jVector\n",
"from langchain_openai import OpenAIEmbeddings\n",
"\n",
"example_selector = SemanticSimilarityExampleSelector.from_examples(\n",
"    examples,\n",
"    OpenAIEmbeddings(),\n",
"    Neo4jVector,\n",
"    k=5,\n",
"    input_keys=[\"question\"],\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'query': 'MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)',\n",
"  'question': 'How many artists are there?'},\n",
" {'query': \"MATCH (a:Person {{name: 'Tom Hanks'}})-[:ACTED_IN]->(m:Movie) RETURN count(m)\",\n",
"  'question': 'How many movies has Tom Hanks acted in?'},\n",
" {'query': \"MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name\",\n",
"  'question': 'Which actors have worked in movies from both the comedy and action genres?'},\n",
" {'query': \"MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person) WHERE a.name STARTS WITH 'John' WITH d, COUNT(DISTINCT a) AS JohnsCount WHERE JohnsCount >= 3 RETURN d.name\",\n",
"  'question': \"Which directors have made movies with at least three different actors named 'John'?\"},\n",
" {'query': 'MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) RETURN a.name, COUNT(m) AS movieCount ORDER BY movieCount DESC LIMIT 1',\n",
"  'question': 'Find the actor with the highest number of movies in the database.'}]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"example_selector.select_examples({\"question\": \"how many artists are there?\"})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To use it, we can pass the ExampleSelector directly into our FewShotPromptTemplate:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"prompt = FewShotPromptTemplate(\n",
"    example_selector=example_selector,\n",
"    example_prompt=example_prompt,\n",
"    prefix=\"You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.\\n\\nHere is the schema information\\n{schema}.\\n\\nBelow are a number of examples of questions and their corresponding Cypher queries.\",\n",
"    suffix=\"User input: {question}\\nCypher query: \",\n",
"    input_variables=[\"question\", \"schema\"],\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.\n",
"\n",
"Here is the schema information\n",
"foo.\n",
"\n",
"Below are a number of examples of questions and their corresponding Cypher queries.\n",
"\n",
"User input: How many artists are there?\n",
"Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)\n",
"\n",
"User input: How many movies has Tom Hanks acted in?\n",
"Cypher query: MATCH (a:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN count(m)\n",
"\n",
"User input: Which actors have worked in movies from both the comedy and action genres?\n",
"Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name\n",
"\n",
"User input: Which directors have made movies with at least three different actors named 'John'?\n",
"Cypher query: MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person) WHERE a.name STARTS WITH 'John' WITH d, COUNT(DISTINCT a) AS JohnsCount WHERE JohnsCount >= 3 RETURN d.name\n",
"\n",
"User input: Find the actor with the highest number of movies in the database.\n",
"Cypher query: MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) RETURN a.name, COUNT(m) AS movieCount ORDER BY movieCount DESC LIMIT 1\n",
"\n",
"User input: how many artists are there?\n",
"Cypher query: \n"
]
}
],
"source": [
"print(prompt.format(question=\"how many artists are there?\", schema=\"foo\"))"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
"chain = GraphCypherQAChain.from_llm(\n",
"    graph=graph,\n",
"    llm=llm,\n",
"    cypher_prompt=prompt,\n",
"    verbose=True,\n",
"    allow_dangerous_requests=True,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
"Generated Cypher:\n",
"\u001b[32;1m\u001b[1;3mMATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3m[{'count(DISTINCT a)': 967}]\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'query': 'How many actors are in the graph?',\n",
|
||||
" 'result': 'There are 967 actors in the graph.'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.invoke(\"How many actors are in the graph?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
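The deleted notebook assembled its few-shot prompt from a schema-bearing prefix, selected example shots, and a question-bearing suffix. A dependency-free sketch of that assembly (the helper name `format_prompt` is illustrative, not part of the langchain API; the real notebook used `FewShotPromptTemplate` with an embedding-backed example selector):

```python
# Static stand-ins for the notebook's dynamically selected examples.
EXAMPLES = [
    {
        "question": "How many artists are there?",
        "query": "MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)",
    },
    {
        "question": "How many movies has Tom Hanks acted in?",
        "query": "MATCH (a:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN count(m)",
    },
]

PREFIX = (
    "You are a Neo4j expert. Given an input question, create a syntactically "
    "correct Cypher query to run.\n\nHere is the schema information\n{schema}.\n\n"
    "Below are a number of examples of questions and their corresponding Cypher queries."
)
SUFFIX = "User input: {question}\nCypher query: "


def format_prompt(question: str, schema: str) -> str:
    """Join prefix, example shots, and suffix into a single prompt string."""
    shots = "\n\n".join(
        f"User input: {ex['question']}\nCypher query: {ex['query']}" for ex in EXAMPLES
    )
    return "\n\n".join([PREFIX.format(schema=schema), shots, SUFFIX.format(question=question)])


print(format_prompt("how many artists are there?", "foo"))
```

With a semantic example selector, the `EXAMPLES` list would instead be the top-k examples nearest the incoming question, but the string assembly is the same.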
@@ -316,9 +316,7 @@ For a high-level tutorial, check out [this guide](/docs/tutorials/sql_qa/).
You can use an LLM to do question answering over graph databases.
For a high-level tutorial, check out [this guide](/docs/tutorials/graph/).

- [How to: map values to a database](/docs/how_to/graph_mapping)
- [How to: add a semantic layer over the database](/docs/how_to/graph_semantic)
- [How to: improve results with prompting](/docs/how_to/graph_prompting)
- [How to: construct knowledge graphs](/docs/how_to/graph_constructing)

### Summarization
File diff suppressed because one or more lines are too long
@@ -25,8 +25,6 @@ NOTEBOOKS_NO_EXECUTION = [
"docs/docs/how_to/example_selectors_langsmith.ipynb",  # TODO: add langchain-benchmarks; fix cassette issue
"docs/docs/how_to/extraction_long_text.ipynb",  # Non-determinism due to batch
"docs/docs/how_to/graph_constructing.ipynb",  # Requires local neo4j
"docs/docs/how_to/graph_mapping.ipynb",  # Requires local neo4j
"docs/docs/how_to/graph_prompting.ipynb",  # Requires local neo4j
"docs/docs/how_to/graph_semantic.ipynb",  # Requires local neo4j
"docs/docs/how_to/hybrid.ipynb",  # Requires AstraDB instance
"docs/docs/how_to/indexing.ipynb",  # Requires local Elasticsearch
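The hunk above drops two entries from the docs build's skip list. A hypothetical sketch of how such a list can gate notebook execution (the real build script's logic is not shown in this diff; the names below are illustrative):

```python
# Notebooks excluded from execution, e.g. because they need local services.
NOTEBOOKS_NO_EXECUTION = [
    "docs/docs/how_to/graph_constructing.ipynb",  # Requires local neo4j
    "docs/docs/how_to/graph_semantic.ipynb",  # Requires local neo4j
]

all_notebooks = [
    "docs/docs/how_to/graph_constructing.ipynb",
    "docs/docs/how_to/graph_semantic.ipynb",
    "docs/docs/how_to/hybrid.ipynb",
]

# Keep only the notebooks that are safe to execute in CI.
to_execute = [nb for nb in all_notebooks if nb not in NOTEBOOKS_NO_EXECUTION]
print(to_execute)  # → ['docs/docs/how_to/hybrid.ipynb']
```

Removing `graph_mapping.ipynb` and `graph_prompting.ipynb` from the list is consistent with deleting those notebooks entirely in this commit.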
BIN
docs/static/img/langgraph_text2cypher.webp
vendored
Normal file
Binary file not shown.
After Width: | Height: | Size: 10 KiB |
@@ -62,6 +62,14 @@
"source": "/docs/tutorials/local_rag",
"destination": "/docs/tutorials/rag"
},
{
"source": "/docs/how_to/graph_mapping(/?)",
"destination": "/docs/tutorials/graph#query-validation"
},
{
"source": "/docs/how_to/graph_prompting(/?)",
"destination": "/docs/tutorials/graph#few-shot-prompting"
},
{
"source": "/docs/tutorials/data_generation",
"destination": "https://python.langchain.com/v0.2/docs/tutorials/data_generation/"
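The `(/?)` suffix in the new redirect `source` patterns makes a trailing slash optional, so both forms of the old how-to URLs land on the tutorial anchors. The matching itself is done by the docs host's redirect engine; this regex check only illustrates the pattern's intent:

```python
import re

# Same pattern shape as the redirect "source" entries above.
pattern = re.compile(r"^/docs/how_to/graph_mapping(/?)$")

assert pattern.match("/docs/how_to/graph_mapping")        # no trailing slash
assert pattern.match("/docs/how_to/graph_mapping/")       # trailing slash
assert not pattern.match("/docs/how_to/graph_mapping/x")  # deeper paths excluded
print("trailing slash is optional")
```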