diff --git a/docs/docs/use_cases/graph/constructing.ipynb b/docs/docs/use_cases/graph/constructing.ipynb new file mode 100644 index 00000000000..3a63893c0e5 --- /dev/null +++ b/docs/docs/use_cases/graph/constructing.ipynb @@ -0,0 +1,261 @@ +{ + "cells": [ + { + "cell_type": "raw", + "metadata": {}, + "source": [ + "---\n", + "sidebar_position: 4\n", + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Constructing knowledge graphs\n", + "\n", + "In this guide we'll go over the basic ways of constructing a knowledge graph based on unstructured text. The constructed graph can then be used as a knowledge base in a RAG application.\n", + "\n", + "## ⚠️ Security note ⚠️\n", + "\n", + "Constructing knowledge graphs requires write access to the database. There are inherent risks in doing this. Make sure that you verify and validate data before importing it. For more on general security best practices, [see here](/docs/security).\n", + "\n", + "\n", + "## Architecture\n", + "\n", + "At a high level, the steps of constructing a knowledge graph from text are:\n", + "\n", + "1. **Extracting structured information from text**: A model is used to extract structured graph information from text.\n", + "2. **Storing into a graph database**: Storing the extracted structured graph information into a graph database enables downstream RAG applications.\n", + "\n", + "## Setup\n", + "\n", + "First, get required packages and set environment variables.\n", + "In this example, we will be using a Neo4j graph database." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Note: you may need to restart the kernel to use updated packages.\n" + ] + } + ], + "source": [ + "%pip install --upgrade --quiet langchain langchain-community langchain-openai langchain-experimental neo4j" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We default to OpenAI models in this guide." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " ········\n" + ] + } + ], + "source": [ + "import getpass\n", + "import os\n", + "\n", + "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n", + "\n", + "# Uncomment the below to use LangSmith. Not required.\n", + "# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()\n", + "# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we need to define Neo4j credentials and connection.\n", + "Follow [these installation steps](https://neo4j.com/docs/operations-manual/current/installation/) to set up a Neo4j database." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "from langchain_community.graphs import Neo4jGraph\n", + "\n", + "os.environ[\"NEO4J_URI\"] = \"bolt://localhost:7687\"\n", + "os.environ[\"NEO4J_USERNAME\"] = \"neo4j\"\n", + "os.environ[\"NEO4J_PASSWORD\"] = \"password\"\n", + "\n", + "graph = Neo4jGraph()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## LLM Graph Transformer\n", + "\n", + "Extracting graph data from text enables the transformation of unstructured information into structured formats, facilitating deeper insights and more efficient navigation through complex relationships and patterns. 
The `LLMGraphTransformer` converts text documents into structured graph documents by leveraging an LLM to parse and categorize entities and their relationships. The choice of LLM significantly influences the output by determining the accuracy and nuance of the extracted graph data.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "from langchain_experimental.graph_transformers import LLMGraphTransformer\n", + "from langchain_openai import ChatOpenAI\n", + "\n", + "llm = ChatOpenAI(temperature=0, model_name=\"gpt-4-0125-preview\")\n", + "\n", + "llm_transformer = LLMGraphTransformer(llm=llm)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we can pass in example text and examine the results." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Nodes:[Node(id='Marie Curie', type='Person'), Node(id='Polish', type='Nationality'), Node(id='French', type='Nationality'), Node(id='Physicist', type='Occupation'), Node(id='Chemist', type='Occupation'), Node(id='Radioactivity', type='Field'), Node(id='Nobel Prize', type='Award'), Node(id='Pierre Curie', type='Person'), Node(id='University Of Paris', type='Organization')]\n", + "Relationships:[Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Polish', type='Nationality'), type='NATIONALITY'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='French', type='Nationality'), type='NATIONALITY'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Physicist', type='Occupation'), type='OCCUPATION'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Chemist', type='Occupation'), type='OCCUPATION'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Radioactivity', type='Field'), 
type='RESEARCH_FIELD'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Nobel Prize', type='Award'), type='AWARD_WINNER'), Relationship(source=Node(id='Pierre Curie', type='Person'), target=Node(id='Nobel Prize', type='Award'), type='AWARD_WINNER'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='University Of Paris', type='Organization'), type='PROFESSOR')]\n" + ] + } + ], + "source": [ + "from langchain_core.documents import Document\n", + "\n", + "text = \"\"\"\n", + "Marie Curie, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.\n", + "She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.\n", + "Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.\n", + "She was, in 1906, the first woman to become a professor at the University of Paris.\n", + "\"\"\"\n", + "documents = [Document(page_content=text)]\n", + "graph_documents = llm_transformer.convert_to_graph_documents(documents)\n", + "print(f\"Nodes:{graph_documents[0].nodes}\")\n", + "print(f\"Relationships:{graph_documents[0].relationships}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Examine the following image to better grasp the structure of the generated knowledge graph. \n", + "\n", + "![graph_construction1.png](../../../static/img/graph_construction1.png)\n", + "\n", + "Note that the graph construction process is non-deterministic since we are using an LLM. Therefore, you might get slightly different results on each execution.\n", + "\n", + "Additionally, you can restrict extraction to specific types of nodes and relationships according to your requirements." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Nodes:[Node(id='Marie Curie', type='Person'), Node(id='Polish', type='Country'), Node(id='French', type='Country'), Node(id='Pierre Curie', type='Person'), Node(id='University Of Paris', type='Organization')]\n", + "Relationships:[Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='Polish', type='Country'), type='NATIONALITY'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='French', type='Country'), type='NATIONALITY'), Relationship(source=Node(id='Pierre Curie', type='Person'), target=Node(id='Marie Curie', type='Person'), type='SPOUSE'), Relationship(source=Node(id='Marie Curie', type='Person'), target=Node(id='University Of Paris', type='Organization'), type='WORKED_AT')]\n" + ] + } + ], + "source": [ + "llm_transformer_filtered = LLMGraphTransformer(\n", + " llm=llm,\n", + " allowed_nodes=[\"Person\", \"Country\", \"Organization\"],\n", + " allowed_relationships=[\"NATIONALITY\", \"LOCATED_IN\", \"WORKED_AT\", \"SPOUSE\"],\n", + ")\n", + "graph_documents_filtered = llm_transformer_filtered.convert_to_graph_documents(\n", + " documents\n", + ")\n", + "print(f\"Nodes:{graph_documents_filtered[0].nodes}\")\n", + "print(f\"Relationships:{graph_documents_filtered[0].relationships}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For a better understanding of the generated graph, we can again visualize it.\n", + "\n", + "![graph_construction2.png](../../../static/img/graph_construction2.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Storing to graph database\n", + "\n", + "The generated graph documents can be stored to a graph database using the `add_graph_documents` method." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "graph.add_graph_documents(graph_documents_filtered)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.18" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/docs/use_cases/graph/index.ipynb b/docs/docs/use_cases/graph/index.ipynb index 06bffdbf7a5..2a8de5e0f46 100644 --- a/docs/docs/use_cases/graph/index.ipynb +++ b/docs/docs/use_cases/graph/index.ipynb @@ -41,7 +41,8 @@ "\n", "* [Prompting strategies](/docs/use_cases/graph/prompting): Advanced prompt engineering techniques.\n", "* [Mapping values](/docs/use_cases/graph/mapping): Techniques for mapping values from questions to database.\n", - "* [Semantic layer](/docs/use_cases/graph/semantic): Techniques for working implementing semantic layers." + "* [Semantic layer](/docs/use_cases/graph/semantic): Techniques for implementing semantic layers.\n", + "* [Constructing graphs](/docs/use_cases/graph/constructing): Techniques for constructing knowledge graphs." ] }, { diff --git a/docs/docs/use_cases/graph/quickstart.ipynb b/docs/docs/use_cases/graph/quickstart.ipynb index f479ed45374..eb4d8c5e9b1 100644 --- a/docs/docs/use_cases/graph/quickstart.ipynb +++ b/docs/docs/use_cases/graph/quickstart.ipynb @@ -301,7 +301,8 @@ "\n", "* [Prompting strategies](/docs/use_cases/graph/prompting): Advanced prompt engineering techniques.\n", "* [Mapping values](/docs/use_cases/graph/mapping): Techniques for mapping values from questions to database.\n", - "* [Semantic layer](/docs/use_cases/graph/semantic): Techniques for working implementing semantic layers." 
+ "* [Semantic layer](/docs/use_cases/graph/semantic): Techniques for implementing semantic layers.\n", + "* [Constructing graphs](/docs/use_cases/graph/constructing): Techniques for constructing knowledge graphs." ] }, { diff --git a/docs/static/img/graph_construction1.png b/docs/static/img/graph_construction1.png new file mode 100644 index 00000000000..9639f751e6c Binary files /dev/null and b/docs/static/img/graph_construction1.png differ diff --git a/docs/static/img/graph_construction2.png b/docs/static/img/graph_construction2.png new file mode 100644 index 00000000000..4e34adc95bb Binary files /dev/null and b/docs/static/img/graph_construction2.png differ