mirror of
https://github.com/hwchase17/langchain.git
synced 2025-05-08 16:48:49 +00:00
Signed-off-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Dan O'Donovan <dan.odonovan@gmail.com> Co-authored-by: Tom Daniel Grande <tomdgrande@gmail.com> Co-authored-by: Grande <Tom.Daniel.Grande@statsbygg.no> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: ccurme <chester.curme@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com> Co-authored-by: ZhangShenao <15201440436@163.com> Co-authored-by: Friso H. Kingma <fhkingma@gmail.com> Co-authored-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Nuno Campos <nuno@langchain.dev> Co-authored-by: Morgante Pell <morgantep@google.com>
459 lines
13 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "raw",
|
|
"id": "5e61b0f2-15b9-4241-9ab5-ff0f3f732232",
|
|
"metadata": {},
|
|
"source": [
|
|
"---\n",
|
|
"sidebar_position: 1\n",
|
|
"---"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "846ef4f4-ee38-4a42-a7d3-1a23826e4830",
|
|
"metadata": {},
|
|
"source": [
|
|
"# How to map values to a graph database\n",
|
|
"\n",
|
|
"In this guide we'll go over strategies to improve graph database query generation by mapping values from user input to values stored in the database.\n",
"When using the built-in graph chains, the LLM is aware of the graph schema but has no information about the property values stored in the database.\n",
"Therefore, we can introduce a new step in the graph database QA system to map values accurately.\n",
|
|
"\n",
|
|
"## Setup\n",
|
|
"\n",
|
|
"First, install the required packages and set environment variables:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "18294435-182d-48da-bcab-5b8945b6d9cf",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"%pip install --upgrade --quiet langchain langchain-community langchain-openai neo4j"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "d86dd771-4001-4a34-8680-22e9b50e1e88",
|
|
"metadata": {},
|
|
"source": [
|
|
"We default to OpenAI models in this guide, but you can swap them out for the model provider of your choice."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"id": "9346f8e9-78bf-4667-b3d3-72807a73b718",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdin",
|
|
"output_type": "stream",
|
|
"text": [
|
|
" ········\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"import getpass\n",
|
|
"import os\n",
|
|
"\n",
|
|
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n",
|
|
"\n",
|
|
"# Uncomment the below to use LangSmith. Not required.\n",
|
|
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()\n",
|
|
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "271c8a23-e51c-4ead-a76e-cf21107db47e",
|
|
"metadata": {},
|
|
"source": [
|
|
"Next, we need to define Neo4j credentials.\n",
|
|
"Follow [these installation steps](https://neo4j.com/docs/operations-manual/current/installation/) to set up a Neo4j database."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"id": "a2a3bb65-05c7-4daf-bac2-b25ae7fe2751",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"os.environ[\"NEO4J_URI\"] = \"bolt://localhost:7687\"\n",
|
|
"os.environ[\"NEO4J_USERNAME\"] = \"neo4j\"\n",
|
|
"os.environ[\"NEO4J_PASSWORD\"] = \"password\""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "50fa4510-29b7-49b6-8496-5e86f694e81f",
|
|
"metadata": {},
|
|
"source": [
|
|
"The example below creates a connection to a Neo4j database and populates it with example data about movies and their actors."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"id": "4ee9ef7a-eef9-4289-b9fd-8fbc31041688",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"[]"
|
|
]
|
|
},
|
|
"execution_count": 4,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"from langchain_community.graphs import Neo4jGraph\n",
|
|
"\n",
|
|
"graph = Neo4jGraph()\n",
|
|
"\n",
|
|
"# Import movie information\n",
|
|
"\n",
|
|
"movies_query = \"\"\"\n",
|
|
"LOAD CSV WITH HEADERS FROM \n",
|
|
"'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'\n",
|
|
"AS row\n",
|
|
"MERGE (m:Movie {id:row.movieId})\n",
|
|
"SET m.released = date(row.released),\n",
"    m.title = row.title,\n",
"    m.imdbRating = toFloat(row.imdbRating)\n",
"FOREACH (director in split(row.director, '|') | \n",
"    MERGE (p:Person {name:trim(director)})\n",
"    MERGE (p)-[:DIRECTED]->(m))\n",
"FOREACH (actor in split(row.actors, '|') | \n",
"    MERGE (p:Person {name:trim(actor)})\n",
"    MERGE (p)-[:ACTED_IN]->(m))\n",
"FOREACH (genre in split(row.genres, '|') | \n",
"    MERGE (g:Genre {name:trim(genre)})\n",
"    MERGE (m)-[:IN_GENRE]->(g))\n",
|
|
"\"\"\"\n",
|
|
"\n",
|
|
"graph.query(movies_query)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "0cb0ea30-ca55-4f35-aad6-beb57453de66",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Detecting entities in the user input\n",
|
|
"First, we have to extract the entities/values that we want to map to the graph database. In this example, we are dealing with a movie graph, so we can map movies and people to the database."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"id": "e1a19424-6046-40c2-81d1-f3b88193a293",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from typing import List, Optional\n",
|
|
"\n",
|
|
"from langchain_core.prompts import ChatPromptTemplate\n",
|
|
"from langchain_openai import ChatOpenAI\n",
|
|
"from pydantic import BaseModel, Field\n",
|
|
"\n",
|
|
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
|
|
"\n",
|
|
"\n",
|
|
"class Entities(BaseModel):\n",
"    \"\"\"Identifying information about entities.\"\"\"\n",
"\n",
"    names: List[str] = Field(\n",
"        ...,\n",
"        description=\"All the people or movies appearing in the text\",\n",
"    )\n",
|
|
"\n",
|
|
"\n",
|
|
"prompt = ChatPromptTemplate.from_messages(\n",
"    [\n",
"        (\n",
"            \"system\",\n",
"            \"You are extracting people and movies from the text.\",\n",
"        ),\n",
"        (\n",
"            \"human\",\n",
"            \"Use the given format to extract information from the following \"\n",
"            \"input: {question}\",\n",
"        ),\n",
"    ]\n",
")\n",
|
|
"\n",
|
|
"\n",
|
|
"entity_chain = prompt | llm.with_structured_output(Entities)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "9c14084c-37a7-4a9c-a026-74e12961c781",
|
|
"metadata": {},
|
|
"source": [
|
|
"We can test the entity extraction chain."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"id": "bbfe0d8f-982e-46e6-88fb-8a4f0d850b07",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"Entities(names=['Casino'])"
|
|
]
|
|
},
|
|
"execution_count": 6,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"entities = entity_chain.invoke({\"question\": \"Who played in Casino movie?\"})\n",
|
|
"entities"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "a8afbf13-05d0-4383-8050-f88b8c2f6fab",
|
|
"metadata": {},
|
|
"source": [
|
|
"We will use a simple `CONTAINS` clause to match entities to the database. In practice, you might want to use fuzzy search or a full-text index to allow for minor misspellings."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"id": "6f92929f-74fb-4db2-b7e1-eb1e9d386a67",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"'Casino maps to Casino Movie in database\\n'"
|
|
]
|
|
},
|
|
"execution_count": 7,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"match_query = \"\"\"MATCH (p:Person|Movie)\n",
|
|
"WHERE p.name CONTAINS $value OR p.title CONTAINS $value\n",
|
|
"RETURN coalesce(p.name, p.title) AS result, labels(p)[0] AS type\n",
|
|
"LIMIT 1\n",
|
|
"\"\"\"\n",
|
|
"\n",
|
|
"\n",
|
|
"def map_to_database(entities: Entities) -> Optional[str]:\n",
"    result = \"\"\n",
"    for entity in entities.names:\n",
"        response = graph.query(match_query, {\"value\": entity})\n",
"        try:\n",
"            result += f\"{entity} maps to {response[0]['result']} {response[0]['type']} in database\\n\"\n",
"        except IndexError:\n",
"            # No match found for this entity; skip it\n",
"            pass\n",
"    return result\n",
|
|
"\n",
|
|
"\n",
|
|
"map_to_database(entities)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "f66c6756-6efb-4b1e-9b5d-87ed914a5212",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Custom Cypher generating chain\n",
|
|
"\n",
|
|
"We need to define a custom Cypher prompt that takes the entity mapping information along with the schema and the user question to construct a Cypher statement.\n",
|
|
"We will use the LangChain Expression Language (LCEL) to accomplish that."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 8,
|
|
"id": "8ef3e21d-f1c2-45e2-9511-4920d1cf6e7e",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain_core.output_parsers import StrOutputParser\n",
|
|
"from langchain_core.runnables import RunnablePassthrough\n",
|
|
"\n",
|
|
"# Generate Cypher statement based on natural language input\n",
|
|
"cypher_template = \"\"\"Based on the Neo4j graph schema below, write a Cypher query that would answer the user's question:\n",
|
|
"{schema}\n",
|
|
"Entities in the question map to the following database values:\n",
|
|
"{entities_list}\n",
|
|
"Question: {question}\n",
|
|
"Cypher query:\"\"\"\n",
|
|
"\n",
|
|
"cypher_prompt = ChatPromptTemplate.from_messages(\n",
"    [\n",
"        (\n",
"            \"system\",\n",
"            \"Given an input question, convert it to a Cypher query. No pre-amble.\",\n",
"        ),\n",
"        (\"human\", cypher_template),\n",
"    ]\n",
")\n",
|
|
"\n",
|
|
"cypher_response = (\n",
"    RunnablePassthrough.assign(names=entity_chain)\n",
"    | RunnablePassthrough.assign(\n",
"        entities_list=lambda x: map_to_database(x[\"names\"]),\n",
"        schema=lambda _: graph.get_schema,\n",
"    )\n",
"    | cypher_prompt\n",
"    | llm.bind(stop=[\"\\nCypherResult:\"])\n",
"    | StrOutputParser()\n",
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 9,
|
|
"id": "1f0011e3-9660-4975-af2a-486b1bc3b954",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"'MATCH (:Movie {title: \"Casino\"})<-[:ACTED_IN]-(actor)\\nRETURN actor.name'"
|
|
]
|
|
},
|
|
"execution_count": 9,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"cypher = cypher_response.invoke({\"question\": \"Who played in Casino movie?\"})\n",
|
|
"cypher"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "38095678-611f-4847-a4de-e51ef7ef727c",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Generating answers based on database results\n",
|
|
"\n",
|
|
"Now that we have a chain that generates the Cypher statement, we need to execute the Cypher statement against the database and send the database results back to an LLM to generate the final answer.\n",
|
|
"Again, we will be using LCEL."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 10,
|
|
"id": "d1fa97c0-1c9c-41d3-9ee1-5f1905d17434",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain_community.chains.graph_qa.cypher_utils import (\n",
"    CypherQueryCorrector,\n",
"    Schema,\n",
")\n",
|
|
"\n",
|
|
"# Cypher validation tool for relationship directions\n",
|
|
"corrector_schema = [\n",
"    Schema(el[\"start\"], el[\"type\"], el[\"end\"])\n",
"    for el in graph.structured_schema.get(\"relationships\")\n",
"]\n",
|
|
"cypher_validation = CypherQueryCorrector(corrector_schema)\n",
|
|
"\n",
|
|
"# Generate natural language response based on database results\n",
|
|
"response_template = \"\"\"Based on the question, Cypher query, and Cypher response, write a natural language response:\n",
|
|
"Question: {question}\n",
|
|
"Cypher query: {query}\n",
|
|
"Cypher Response: {response}\"\"\"\n",
|
|
"\n",
|
|
"response_prompt = ChatPromptTemplate.from_messages(\n",
"    [\n",
"        (\n",
"            \"system\",\n",
"            \"Given an input question and Cypher response, convert it to a natural\"\n",
"            \" language answer. No pre-amble.\",\n",
"        ),\n",
"        (\"human\", response_template),\n",
"    ]\n",
")\n",
|
|
"\n",
|
|
"chain = (\n",
"    RunnablePassthrough.assign(query=cypher_response)\n",
"    | RunnablePassthrough.assign(\n",
"        response=lambda x: graph.query(cypher_validation(x[\"query\"])),\n",
"    )\n",
"    | response_prompt\n",
"    | llm\n",
"    | StrOutputParser()\n",
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 11,
|
|
"id": "918146e5-7918-46d2-a774-53f9547d8fcb",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"'Robert De Niro, James Woods, Joe Pesci, and Sharon Stone played in the movie \"Casino\".'"
|
|
]
|
|
},
|
|
"execution_count": 11,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"chain.invoke({\"question\": \"Who played in Casino movie?\"})"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "c7ba75cd-8399-4e54-a6f8-8a411f159f56",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.9.18"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|