{
"cells": [
{
"cell_type": "raw",
"id": "19cc5b11-3822-454b-afb3-7bebd7f17b5c",
"metadata": {},
"source": [
"---\n",
"sidebar_position: 1\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "2e17a273-bcfc-433f-8d42-2ba9533feeb8",
"metadata": {},
"source": [
"# How to add a semantic layer over a graph database\n",
"\n",
"You can use database queries to retrieve information from a graph database like Neo4j.\n",
"One option is to use LLMs to generate Cypher statements.\n",
"While that option provides excellent flexibility, the resulting solution can be brittle, as the LLM may not consistently generate precise Cypher statements.\n",
"Instead of generating Cypher statements, we can implement Cypher templates as tools in a semantic layer that an LLM agent can interact with.\n",
"\n",
"![graph_semantic.png](../../static/img/graph_semantic.png)\n",
"\n",
"## Setup\n",
"\n",
"First, install the required packages and set environment variables:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "ffdd48f6-bd05-4e5c-b846-d41183398a55",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install --upgrade --quiet langchain langchain-community langchain-openai neo4j"
]
},
{
"cell_type": "markdown",
"id": "4575b174-01e6-4061-aebf-f81e718de777",
"metadata": {},
"source": [
"We default to OpenAI models in this guide, but you can swap them out for the model provider of your choice."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "eb11c4a8-c00c-4c2d-9309-74a6acfff91c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n",
"\n",
"# Uncomment the below to use LangSmith. Not required.\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()\n",
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\""
]
},
{
"cell_type": "markdown",
"id": "76bb62ba-0060-41a2-a7b9-1f9c1faf571a",
"metadata": {},
"source": [
"Next, we need to define Neo4j credentials.\n",
"Follow [these installation steps](https://neo4j.com/docs/operations-manual/current/installation/) to set up a Neo4j database."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "ef59a3af-31a8-4ad8-8eb9-132aca66956e",
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"NEO4J_URI\"] = \"bolt://localhost:7687\"\n",
"os.environ[\"NEO4J_USERNAME\"] = \"neo4j\"\n",
"os.environ[\"NEO4J_PASSWORD\"] = \"password\""
]
},
{
"cell_type": "markdown",
"id": "1e8fbc2c-b8e8-4c53-8fce-243cf99d3c1c",
"metadata": {},
"source": [
"The example below creates a connection to a Neo4j database and populates it with example data about movies and their actors."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c84b1449-6fcd-4140-b591-cb45e8dce207",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_community.graphs import Neo4jGraph\n",
"\n",
"graph = Neo4jGraph()\n",
"\n",
"# Import movie information\n",
"\n",
"movies_query = \"\"\"\n",
"LOAD CSV WITH HEADERS FROM \n",
"'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'\n",
"AS row\n",
"MERGE (m:Movie {id:row.movieId})\n",
"SET m.released = date(row.released),\n",
" m.title = row.title,\n",
" m.imdbRating = toFloat(row.imdbRating)\n",
"FOREACH (director in split(row.director, '|') | \n",
" MERGE (p:Person {name:trim(director)})\n",
" MERGE (p)-[:DIRECTED]->(m))\n",
"FOREACH (actor in split(row.actors, '|') | \n",
" MERGE (p:Person {name:trim(actor)})\n",
" MERGE (p)-[:ACTED_IN]->(m))\n",
"FOREACH (genre in split(row.genres, '|') | \n",
" MERGE (g:Genre {name:trim(genre)})\n",
" MERGE (m)-[:IN_GENRE]->(g))\n",
"\"\"\"\n",
"\n",
"graph.query(movies_query)"
]
},
{
"cell_type": "markdown",
"id": "403b9acd-aa0d-4157-b9de-6ec426835c43",
"metadata": {},
"source": [
"## Custom tools with Cypher templates\n",
"\n",
"A semantic layer consists of various tools exposed to an LLM that it can use to interact with a knowledge graph.\n",
"They can vary in complexity; you can think of each tool in a semantic layer as a function.\n",
"\n",
"The function we will implement retrieves information about movies or their cast."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "d1dc1c8c-f343-4024-924b-a8a86cf5f1af",
"metadata": {},
"outputs": [],
"source": [
"description_query = \"\"\"\n",
"MATCH (m:Movie|Person)\n",
"WHERE m.title CONTAINS $candidate OR m.name CONTAINS $candidate\n",
"MATCH (m)-[r:ACTED_IN|IN_GENRE]-(t)\n",
"WITH m, type(r) as type, collect(coalesce(t.name, t.title)) as names\n",
"WITH m, type+\": \"+reduce(s=\"\", n IN names | s + n + \", \") as types\n",
"WITH m, collect(types) as contexts\n",
"WITH m, \"type:\" + labels(m)[0] + \"\\ntitle: \"+ coalesce(m.title, m.name) \n",
" + \"\\nyear: \"+coalesce(m.released,\"\") +\"\\n\" +\n",
" reduce(s=\"\", c in contexts | s + substring(c, 0, size(c)-2) +\"\\n\") as context\n",
"RETURN context LIMIT 1\n",
"\"\"\"\n",
"\n",
"\n",
"def get_information(entity: str) -> str:\n",
" try:\n",
" data = graph.query(description_query, params={\"candidate\": entity})\n",
" return data[0][\"context\"]\n",
" except IndexError:\n",
" return \"No information was found\""
]
},
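{
"cell_type": "markdown",
"id": "7c2e4f10-1a2b-4c3d-8e5f-000000000001",
"metadata": {},
"source": [
"Before wrapping the function as a tool, we can spot-check it by calling it directly. This is an optional sanity check; the exact string returned depends on the data loaded into your database."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7c2e4f10-1a2b-4c3d-8e5f-000000000002",
"metadata": {},
"outputs": [],
"source": [
"# Sanity check: call the helper directly (assumes the movie data was loaded above)\n",
"print(get_information(\"Casino\"))"
]
},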
{
"cell_type": "markdown",
"id": "bdecc24b-8065-4755-98cc-9c6d093d4897",
"metadata": {},
"source": [
"Notice that we have defined the Cypher statement used to retrieve the information ourselves.\n",
"Therefore, the LLM agent does not need to generate Cypher statements; it only populates the input parameters.\n",
"To give the agent additional information about when to use the tool and what its input parameters mean, we wrap the function as a tool."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "f4cde772-0d05-475d-a2f0-b53e1669bd13",
"metadata": {},
"outputs": [],
"source": [
"from typing import Optional, Type\n",
"\n",
"# Import things that are needed generically\n",
"from langchain.pydantic_v1 import BaseModel, Field\n",
"from langchain_core.callbacks import (\n",
" AsyncCallbackManagerForToolRun,\n",
" CallbackManagerForToolRun,\n",
")\n",
"from langchain_core.tools import BaseTool\n",
"\n",
"\n",
"class InformationInput(BaseModel):\n",
" entity: str = Field(description=\"movie or a person mentioned in the question\")\n",
"\n",
"\n",
"class InformationTool(BaseTool):\n",
" name = \"Information\"\n",
" description = (\n",
" \"useful for when you need to answer questions about various actors or movies\"\n",
" )\n",
" args_schema: Type[BaseModel] = InformationInput\n",
"\n",
" def _run(\n",
" self,\n",
" entity: str,\n",
" run_manager: Optional[CallbackManagerForToolRun] = None,\n",
" ) -> str:\n",
" \"\"\"Use the tool.\"\"\"\n",
" return get_information(entity)\n",
"\n",
" async def _arun(\n",
" self,\n",
" entity: str,\n",
" run_manager: Optional[AsyncCallbackManagerForToolRun] = None,\n",
" ) -> str:\n",
" \"\"\"Use the tool asynchronously.\"\"\"\n",
" return get_information(entity)"
]
},
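{
"cell_type": "markdown",
"id": "7c2e4f10-1a2b-4c3d-8e5f-000000000003",
"metadata": {},
"source": [
"Tools implement the Runnable interface, so we can also invoke the new tool on its own before handing it to an agent. As before, the output depends on your database contents."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7c2e4f10-1a2b-4c3d-8e5f-000000000004",
"metadata": {},
"outputs": [],
"source": [
"# Invoke the tool directly; the input dict must match InformationInput's schema\n",
"InformationTool().invoke({\"entity\": \"Casino\"})"
]
},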
{
"cell_type": "markdown",
"id": "ff4820aa-2b57-4558-901f-6d984b326738",
"metadata": {},
"source": [
"## OpenAI Agent\n",
"\n",
"The LangChain Expression Language (LCEL) makes it convenient to define an agent that interacts with a graph database through the semantic layer."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "6e959ac2-537d-4358-a43b-e3a47f68e1d6",
"metadata": {},
"outputs": [],
"source": [
"from typing import List, Tuple\n",
"\n",
"from langchain.agents import AgentExecutor\n",
"from langchain.agents.format_scratchpad import format_to_openai_function_messages\n",
"from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser\n",
"from langchain_core.messages import AIMessage, HumanMessage\n",
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"from langchain_core.utils.function_calling import convert_to_openai_function\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
"tools = [InformationTool()]\n",
"\n",
"llm_with_tools = llm.bind(functions=[convert_to_openai_function(t) for t in tools])\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"You are a helpful assistant that finds information about movies \"\n",
" \"and recommends them. If tools require follow-up questions, \"\n",
" \"make sure to ask the user for clarification. Make sure to include any \"\n",
" \"available options that need to be clarified in the follow-up questions. \"\n",
" \"Do only the things the user specifically requested.\",\n",
" ),\n",
" MessagesPlaceholder(variable_name=\"chat_history\"),\n",
" (\"user\", \"{input}\"),\n",
" MessagesPlaceholder(variable_name=\"agent_scratchpad\"),\n",
" ]\n",
")\n",
"\n",
"\n",
"def _format_chat_history(chat_history: List[Tuple[str, str]]):\n",
" buffer = []\n",
" for human, ai in chat_history:\n",
" buffer.append(HumanMessage(content=human))\n",
" buffer.append(AIMessage(content=ai))\n",
" return buffer\n",
"\n",
"\n",
"agent = (\n",
" {\n",
" \"input\": lambda x: x[\"input\"],\n",
" \"chat_history\": lambda x: _format_chat_history(x[\"chat_history\"])\n",
" if x.get(\"chat_history\")\n",
" else [],\n",
" \"agent_scratchpad\": lambda x: format_to_openai_function_messages(\n",
" x[\"intermediate_steps\"]\n",
" ),\n",
" }\n",
" | prompt\n",
" | llm_with_tools\n",
" | OpenAIFunctionsAgentOutputParser()\n",
")\n",
"\n",
"agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "b0459833-fe84-4ebc-9823-a3a3ffd929e9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\n",
"Invoking: `Information` with `{'entity': 'Casino'}`\n",
"\n",
"\n",
"\u001b[0m\u001b[36;1m\u001b[1;3mtype:Movie\n",
"title: Casino\n",
"year: 1995-11-22\n",
"ACTED_IN: Joe Pesci, Robert De Niro, Sharon Stone, James Woods\n",
"\u001b[0m\u001b[32;1m\u001b[1;3mThe movie \"Casino\" starred Joe Pesci, Robert De Niro, Sharon Stone, and James Woods.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'input': 'Who played in Casino?',\n",
" 'output': 'The movie \"Casino\" starred Joe Pesci, Robert De Niro, Sharon Stone, and James Woods.'}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_executor.invoke({\"input\": \"Who played in Casino?\"})"
]
},
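{
"cell_type": "markdown",
"id": "7c2e4f10-1a2b-4c3d-8e5f-000000000005",
"metadata": {},
"source": [
"Because the agent accepts `chat_history`, we can ask a follow-up question that refers back to the previous exchange. The history is passed as a list of `(human, ai)` tuples, which `_format_chat_history` converts into messages. A hypothetical follow-up might look like this; the actual answer depends on the LLM and the database:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7c2e4f10-1a2b-4c3d-8e5f-000000000006",
"metadata": {},
"outputs": [],
"source": [
"# Prior turns go in as (human, ai) tuples; _format_chat_history turns them into messages\n",
"agent_executor.invoke(\n",
"    {\n",
"        \"input\": \"What other movies do you know with Robert De Niro?\",\n",
"        \"chat_history\": [\n",
"            (\n",
"                \"Who played in Casino?\",\n",
"                'The movie \"Casino\" starred Joe Pesci, Robert De Niro, Sharon Stone, and James Woods.',\n",
"            )\n",
"        ],\n",
"    }\n",
")"
]
},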
{
"cell_type": "code",
"execution_count": null,
"id": "c2759973-de8a-4624-8930-c90a21d6caa3",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}