mirror of
https://github.com/hwchase17/langchain.git
synced 2025-05-14 03:27:29 +00:00
414 lines
13 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "raw",
   "id": "19cc5b11-3822-454b-afb3-7bebd7f17b5c",
   "metadata": {},
   "source": [
    "---\n",
    "sidebar_position: 1\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2e17a273-bcfc-433f-8d42-2ba9533feeb8",
   "metadata": {},
   "source": [
    "# How to add a semantic layer over a graph database\n",
    "\n",
    "You can use database queries to retrieve information from a graph database like Neo4j.\n",
    "One option is to use LLMs to generate Cypher statements.\n",
    "While that option provides excellent flexibility, it can be brittle and may not consistently generate precise Cypher statements.\n",
    "Instead of generating Cypher statements, we can implement Cypher templates as tools in a semantic layer that an LLM agent can interact with.\n",
    "\n",
    "## Setup\n",
    "\n",
    "First, install the required packages and set environment variables:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "ffdd48f6-bd05-4e5c-b846-d41183398a55",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: you may need to restart the kernel to use updated packages.\n"
     ]
    }
   ],
   "source": [
    "%pip install --upgrade --quiet langchain langchain-community langchain-openai neo4j"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4575b174-01e6-4061-aebf-f81e718de777",
   "metadata": {},
   "source": [
    "We default to OpenAI models in this guide, but you can swap them out for the model provider of your choice."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "eb11c4a8-c00c-4c2d-9309-74a6acfff91c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " ········\n"
     ]
    }
   ],
   "source": [
    "import getpass\n",
    "import os\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n",
    "\n",
    "# Uncomment the below to use LangSmith. Not required.\n",
    "# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()\n",
    "# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "76bb62ba-0060-41a2-a7b9-1f9c1faf571a",
   "metadata": {},
   "source": [
    "Next, we need to define Neo4j credentials.\n",
    "Follow [these installation steps](https://neo4j.com/docs/operations-manual/current/installation/) to set up a Neo4j database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "ef59a3af-31a8-4ad8-8eb9-132aca66956e",
   "metadata": {},
   "outputs": [],
   "source": [
    "os.environ[\"NEO4J_URI\"] = \"bolt://localhost:7687\"\n",
    "os.environ[\"NEO4J_USERNAME\"] = \"neo4j\"\n",
    "os.environ[\"NEO4J_PASSWORD\"] = \"password\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1e8fbc2c-b8e8-4c53-8fce-243cf99d3c1c",
   "metadata": {},
   "source": [
    "The example below creates a connection to a Neo4j database and populates it with example data about movies and their actors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "c84b1449-6fcd-4140-b591-cb45e8dce207",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain_community.graphs import Neo4jGraph\n",
    "\n",
    "graph = Neo4jGraph()\n",
    "\n",
    "# Import movie information\n",
    "\n",
    "movies_query = \"\"\"\n",
    "LOAD CSV WITH HEADERS FROM \n",
    "'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'\n",
    "AS row\n",
    "MERGE (m:Movie {id:row.movieId})\n",
    "SET m.released = date(row.released),\n",
    "    m.title = row.title,\n",
    "    m.imdbRating = toFloat(row.imdbRating)\n",
    "FOREACH (director in split(row.director, '|') | \n",
    "    MERGE (p:Person {name:trim(director)})\n",
    "    MERGE (p)-[:DIRECTED]->(m))\n",
    "FOREACH (actor in split(row.actors, '|') | \n",
    "    MERGE (p:Person {name:trim(actor)})\n",
    "    MERGE (p)-[:ACTED_IN]->(m))\n",
    "FOREACH (genre in split(row.genres, '|') | \n",
    "    MERGE (g:Genre {name:trim(genre)})\n",
    "    MERGE (m)-[:IN_GENRE]->(g))\n",
    "\"\"\"\n",
    "\n",
    "graph.query(movies_query)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "403b9acd-aa0d-4157-b9de-6ec426835c43",
   "metadata": {},
   "source": [
    "## Custom tools with Cypher templates\n",
    "\n",
    "A semantic layer consists of various tools exposed to an LLM that it can use to interact with a knowledge graph.\n",
    "These tools can vary in complexity. You can think of each tool in a semantic layer as a function.\n",
    "\n",
    "The function we will implement retrieves information about movies or their cast."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "d1dc1c8c-f343-4024-924b-a8a86cf5f1af",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Cypher template: the relationship types match those created by the import query\n",
    "description_query = \"\"\"\n",
    "MATCH (m:Movie|Person)\n",
    "WHERE m.title CONTAINS $candidate OR m.name CONTAINS $candidate\n",
    "MATCH (m)-[r:ACTED_IN|IN_GENRE]-(t)\n",
    "WITH m, type(r) as type, collect(coalesce(t.name, t.title)) as names\n",
    "WITH m, type+\": \"+reduce(s=\"\", n IN names | s + n + \", \") as types\n",
    "WITH m, collect(types) as contexts\n",
    "WITH m, \"type:\" + labels(m)[0] + \"\\ntitle: \"+ coalesce(m.title, m.name) \n",
    "       + \"\\nyear: \"+coalesce(m.released,\"\") +\"\\n\" +\n",
    "       reduce(s=\"\", c in contexts | s + substring(c, 0, size(c)-2) +\"\\n\") as context\n",
    "RETURN context LIMIT 1\n",
    "\"\"\"\n",
    "\n",
    "\n",
    "def get_information(entity: str) -> str:\n",
    "    # Only the $candidate parameter varies; the Cypher text itself is fixed\n",
    "    try:\n",
    "        data = graph.query(description_query, params={\"candidate\": entity})\n",
    "        return data[0][\"context\"]\n",
    "    except IndexError:\n",
    "        return \"No information was found\""
   ]
  },
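To make the shape of the returned `context` string easier to see, here is a minimal pure-Python sketch of the text that the Cypher template above assembles. The function `build_context` and its arguments are hypothetical and exist only for illustration; in the guide itself the formatting happens inside Neo4j.

```python
from typing import Dict, List


def build_context(
    label: str, title: str, released: str, relationships: Dict[str, List[str]]
) -> str:
    """Hypothetical stand-in that mimics the string built by the Cypher template."""
    # Header lines: node label, title (or name), and release date (may be empty).
    context = f"type:{label}\ntitle: {title}\nyear: {released}\n"
    # One line per relationship type, for example "ACTED_IN: name1, name2".
    for rel_type, names in relationships.items():
        context += f"{rel_type}: " + ", ".join(names) + "\n"
    return context


example = build_context(
    "Movie",
    "Casino",
    "1995-11-22",
    {"ACTED_IN": ["Joe Pesci", "Robert De Niro", "Sharon Stone", "James Woods"]},
)
print(example)
```

The `", ".join(...)` call plays the role of the `reduce` plus `substring` pair in the template, which appends `", "` after every name and then trims the trailing separator.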
  {
   "cell_type": "markdown",
   "id": "bdecc24b-8065-4755-98cc-9c6d093d4897",
   "metadata": {},
   "source": [
    "Note that the Cypher statement used to retrieve information is defined in advance.\n",
    "Therefore, we can avoid generating Cypher statements and use the LLM agent only to populate the input parameters.\n",
    "To give the LLM agent additional information about when to use the tool and what its input parameters mean, we wrap the function as a tool."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "f4cde772-0d05-475d-a2f0-b53e1669bd13",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Optional, Type\n",
    "\n",
    "# Import things that are needed generically\n",
    "from langchain.pydantic_v1 import BaseModel, Field\n",
    "from langchain_core.callbacks import (\n",
    "    AsyncCallbackManagerForToolRun,\n",
    "    CallbackManagerForToolRun,\n",
    ")\n",
    "from langchain_core.tools import BaseTool\n",
    "\n",
    "\n",
    "class InformationInput(BaseModel):\n",
    "    entity: str = Field(description=\"movie or a person mentioned in the question\")\n",
    "\n",
    "\n",
    "class InformationTool(BaseTool):\n",
    "    name: str = \"Information\"\n",
    "    description: str = (\n",
    "        \"useful for when you need to answer questions about various actors or movies\"\n",
    "    )\n",
    "    args_schema: Type[BaseModel] = InformationInput\n",
    "\n",
    "    def _run(\n",
    "        self,\n",
    "        entity: str,\n",
    "        run_manager: Optional[CallbackManagerForToolRun] = None,\n",
    "    ) -> str:\n",
    "        \"\"\"Use the tool.\"\"\"\n",
    "        return get_information(entity)\n",
    "\n",
    "    async def _arun(\n",
    "        self,\n",
    "        entity: str,\n",
    "        run_manager: Optional[AsyncCallbackManagerForToolRun] = None,\n",
    "    ) -> str:\n",
    "        \"\"\"Use the tool asynchronously.\"\"\"\n",
    "        return get_information(entity)"
   ]
  },
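The tool's name, description, and argument schema are what the model uses to decide when and how to call it. As a rough illustration, the sketch below builds the kind of JSON-schema-style function description an OpenAI-style function-calling model typically receives. The helper `to_function_spec` is hypothetical and simplified (it treats every argument as a required string); in the guide the real conversion is performed by `convert_to_openai_function`.

```python
import json
from typing import Any, Dict


def to_function_spec(name: str, description: str, args: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical, simplified helper: describe a tool as a JSON-schema-like dict."""
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            # Assumption for this sketch: every argument is a required string.
            "properties": {
                arg: {"type": "string", "description": desc}
                for arg, desc in args.items()
            },
            "required": list(args),
        },
    }


spec = to_function_spec(
    "Information",
    "useful for when you need to answer questions about various actors or movies",
    {"entity": "movie or a person mentioned in the question"},
)
print(json.dumps(spec, indent=2))
```

Because the model only sees this description, clear wording in `description` and in each `Field(description=...)` matters more than the implementation behind the tool.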
  {
   "cell_type": "markdown",
   "id": "ff4820aa-2b57-4558-901f-6d984b326738",
   "metadata": {},
   "source": [
    "## OpenAI Agent\n",
    "\n",
    "LangChain Expression Language (LCEL) makes it convenient to define an agent that interacts with a graph database through the semantic layer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "6e959ac2-537d-4358-a43b-e3a47f68e1d6",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import List, Tuple\n",
    "\n",
    "from langchain.agents import AgentExecutor\n",
    "from langchain.agents.format_scratchpad import format_to_openai_function_messages\n",
    "from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser\n",
    "from langchain_core.messages import AIMessage, HumanMessage\n",
    "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
    "from langchain_core.utils.function_calling import convert_to_openai_function\n",
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
    "tools = [InformationTool()]\n",
    "\n",
    "llm_with_tools = llm.bind(functions=[convert_to_openai_function(t) for t in tools])\n",
    "\n",
    "prompt = ChatPromptTemplate.from_messages(\n",
    "    [\n",
    "        (\n",
    "            \"system\",\n",
    "            \"You are a helpful assistant that finds information about movies \"\n",
    "            \"and recommends them. If tools require follow up questions, \"\n",
    "            \"make sure to ask the user for clarification. Make sure to include any \"\n",
    "            \"available options that need to be clarified in the follow up questions. \"\n",
    "            \"Do only the things the user specifically requested.\",\n",
    "        ),\n",
    "        MessagesPlaceholder(variable_name=\"chat_history\"),\n",
    "        (\"user\", \"{input}\"),\n",
    "        MessagesPlaceholder(variable_name=\"agent_scratchpad\"),\n",
    "    ]\n",
    ")\n",
    "\n",
    "\n",
    "def _format_chat_history(chat_history: List[Tuple[str, str]]):\n",
    "    buffer = []\n",
    "    for human, ai in chat_history:\n",
    "        buffer.append(HumanMessage(content=human))\n",
    "        buffer.append(AIMessage(content=ai))\n",
    "    return buffer\n",
    "\n",
    "\n",
    "agent = (\n",
    "    {\n",
    "        \"input\": lambda x: x[\"input\"],\n",
    "        \"chat_history\": lambda x: _format_chat_history(x[\"chat_history\"])\n",
    "        if x.get(\"chat_history\")\n",
    "        else [],\n",
    "        \"agent_scratchpad\": lambda x: format_to_openai_function_messages(\n",
    "            x[\"intermediate_steps\"]\n",
    "        ),\n",
    "    }\n",
    "    | prompt\n",
    "    | llm_with_tools\n",
    "    | OpenAIFunctionsAgentOutputParser()\n",
    ")\n",
    "\n",
    "agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)"
   ]
  },
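Conceptually, an agent executor alternates between asking the model what to do and running the tool the model requested, feeding each observation back in until the model produces a final answer. The sketch below illustrates that loop in plain Python. It is a simplified illustration under stated assumptions, not the actual `AgentExecutor` implementation: `fake_model`, `run_agent`, and the stub tool are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Tuple

# (tool name, tool input, tool output); plays the role of the agent scratchpad.
Step = Tuple[str, str, str]


def fake_model(question: str, steps: List[Step]) -> Dict[str, str]:
    """Hypothetical stand-in for an LLM: call the tool once, then answer."""
    if not steps:
        return {"tool": "Information", "tool_input": "Casino"}
    return {"output": f"Found: {steps[-1][2]}"}


def run_agent(
    question: str, tools: Dict[str, Callable[[str], str]], max_steps: int = 5
) -> str:
    steps: List[Step] = []
    for _ in range(max_steps):
        decision = fake_model(question, steps)
        if "output" in decision:  # the model produced a final answer
            return decision["output"]
        # Otherwise run the requested tool and record the observation.
        observation = tools[decision["tool"]](decision["tool_input"])
        steps.append((decision["tool"], decision["tool_input"], observation))
    return "Stopped: step limit reached"


answer = run_agent(
    "Who played in Casino?",
    {"Information": lambda entity: f"stub result for {entity}"},
)
print(answer)
```

In the guide's agent, the `agent_scratchpad` entry carries the equivalent of `steps`, and the output parser decides whether a model response is a tool call or a final answer.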
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "b0459833-fe84-4ebc-9823-a3a3ffd929e9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
      "\u001b[32;1m\u001b[1;3m\n",
      "Invoking: `Information` with `{'entity': 'Casino'}`\n",
      "\n",
      "\n",
      "\u001b[0m\u001b[36;1m\u001b[1;3mtype:Movie\n",
      "title: Casino\n",
      "year: 1995-11-22\n",
      "ACTED_IN: Joe Pesci, Robert De Niro, Sharon Stone, James Woods\n",
      "\u001b[0m\u001b[32;1m\u001b[1;3mThe movie \"Casino\" starred Joe Pesci, Robert De Niro, Sharon Stone, and James Woods.\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'input': 'Who played in Casino?',\n",
       " 'output': 'The movie \"Casino\" starred Joe Pesci, Robert De Niro, Sharon Stone, and James Woods.'}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "agent_executor.invoke({\"input\": \"Who played in Casino?\"})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c2759973-de8a-4624-8930-c90a21d6caa3",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}