diff --git a/docs/extras/use_cases/more/graph/diffbot_graphtransformer.ipynb b/docs/extras/use_cases/more/graph/diffbot_graphtransformer.ipynb new file mode 100644 index 00000000000..da1c2fc020f --- /dev/null +++ b/docs/extras/use_cases/more/graph/diffbot_graphtransformer.ipynb @@ -0,0 +1,307 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "7f0b0c06-ee70-468c-8bf5-b023f9e5e0a2", + "metadata": {}, + "source": [ + "# Diffbot Graph Transformer\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/more/graph/diffbot_graphtransformer.ipynb)\n", + "\n", + "## Use case\n", + "\n", + "Text data often contains rich relationships and insights that can be useful for various analytics, recommendation engines, or knowledge management applications.\n", + "\n", + "Diffbot's NLP API allows for the extraction of entities, relationships, and semantic meaning from unstructured text data.\n", + "\n", + "By coupling Diffbot's NLP API with Neo4j, a graph database, you can create powerful, dynamic graph structures based on the information extracted from text. These graph structures are fully queryable and can be integrated into various applications.\n", + "\n", + "This combination allows for use cases such as:\n", + "\n", + "* Building knowledge graphs from textual documents, websites, or social media feeds.\n", + "* Generating recommendations based on semantic relationships in the data.\n", + "* Creating advanced search features that understand the relationships between entities.\n", + "* Building analytics dashboards that allow users to explore the hidden relationships in data.\n", + "\n", + "## Overview\n", + "\n", + "LangChain provides tools to interact with graph databases:\n", + "\n", + "1. `Construct knowledge graphs from text` using graph transformer and store integrations\n", + "2. 
`Query a graph database` using chains for query creation and execution\n", + "3. `Interact with a graph database` using agents for robust and flexible querying\n", + "\n", + "## Quickstart\n", + "\n", + "First, install the required packages and set environment variables:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "975648da-b24f-4164-a671-6772179e12df", + "metadata": {}, + "outputs": [], + "source": [ + "!pip install langchain langchain-experimental openai neo4j wikipedia" + ] + }, + { + "cell_type": "markdown", + "id": "77718977-629e-46c2-b091-f9191b9ec569", + "metadata": {}, + "source": [ + "## Diffbot NLP Service\n", + "\n", + "Diffbot's NLP service is a tool for extracting entities, relationships, and semantic context from unstructured text data.\n", + "This extracted information can be used to construct a knowledge graph.\n", + "To use the service, you'll need to obtain an API key from [Diffbot](https://www.diffbot.com/products/natural-language/)." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "2cbf97d0-3682-439b-8750-b695ff726789", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer\n", + "\n", + "diffbot_api_key = \"DIFFBOT_API_KEY\"\n", + "diffbot_nlp = DiffbotGraphTransformer(diffbot_api_key=diffbot_api_key)" + ] + }, + { + "cell_type": "markdown", + "id": "5e3b894a-e3ee-46c7-8116-f8377f8f0159", + "metadata": {}, + "source": [ + "This code fetches Wikipedia articles about \"Warren Buffett\" and then uses `DiffbotGraphTransformer` to extract entities and relationships.\n", + "The `DiffbotGraphTransformer` outputs structured data as a `GraphDocument`, which can be used to populate a graph database.\n", + "Note that text chunking is avoided due to Diffbot's [character limit per API request](https://docs.diffbot.com/reference/introduction-to-natural-language-api)."
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "53f8df86-47a1-44a1-9a0f-6725b90703bc", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.document_loaders import WikipediaLoader\n", + "\n", + "query = \"Warren Buffett\"\n", + "raw_documents = WikipediaLoader(query=query).load()\n", + "graph_documents = diffbot_nlp.convert_to_graph_documents(raw_documents)" + ] + }, + { + "cell_type": "markdown", + "id": "31bb851a-aab4-4b97-a6b7-fce397d32b47", + "metadata": {}, + "source": [ + "## Loading the data into a knowledge graph\n", + "\n", + "You will need a running Neo4j instance. One option is to create a [free Neo4j database instance in their Aura cloud service](https://neo4j.com/cloud/platform/aura-graph-database/). You can also run the database locally using the [Neo4j Desktop application](https://neo4j.com/download/), or by running a Docker container. You can start a local Docker container by executing the following script:\n", + "```\n", + "docker run \\\n", + "    --name neo4j \\\n", + "    -p 7474:7474 -p 7687:7687 \\\n", + "    -d \\\n", + "    -e NEO4J_AUTH=neo4j/pleaseletmein \\\n", + "    -e NEO4J_PLUGINS=\\[\\\"apoc\\\"\\] \\\n", + "    neo4j:latest\n", + "```\n", + "If you are using the Docker container, you need to wait a couple of seconds for the database to start." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "0b2b6641-5a5d-467c-b148-e6aad5e4baa7", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.graphs import Neo4jGraph\n", + "\n", + "url = \"bolt://localhost:7687\"\n", + "username = \"neo4j\"\n", + "password = \"pleaseletmein\"\n", + "\n", + "graph = Neo4jGraph(\n", + "    url=url,\n", + "    username=username,\n", + "    password=password\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "0b15e840-fe6f-45db-9193-1b4e2df5c12c", + "metadata": {}, + "source": [ + "The `GraphDocument` objects can be loaded into the knowledge graph using the `add_graph_documents` method."
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "1a67c4a8-955c-42a2-9c5d-de3ac0e640ec", + "metadata": {}, + "outputs": [], + "source": [ + "graph.add_graph_documents(graph_documents)" + ] + }, + { + "cell_type": "markdown", + "id": "ed411e05-2b03-460d-997e-938482774f40", + "metadata": {}, + "source": [ + "## Refresh graph schema information\n", + "If the schema of the database changes, you can refresh the schema information needed to generate Cypher statements." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "904c9ee3-787c-403f-857d-459ce5ad5a1b", + "metadata": {}, + "outputs": [], + "source": [ + "graph.refresh_schema()" + ] + }, + { + "cell_type": "markdown", + "id": "f19d1387-5899-4258-8c94-8ef5fa7db464", + "metadata": {}, + "source": [ + "## Querying the graph\n", + "We can now use the `GraphCypherQAChain` to ask questions of the graph. It is advisable to use **gpt-4** to construct Cypher queries to get the best experience." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "9393b732-67c8-45c1-9ec2-089f49c62448", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.chains import GraphCypherQAChain\n", + "from langchain.chat_models import ChatOpenAI\n", + "\n", + "chain = GraphCypherQAChain.from_llm(\n", + "    cypher_llm=ChatOpenAI(temperature=0, model_name=\"gpt-4\"),\n", + "    qa_llm=ChatOpenAI(temperature=0, model_name=\"gpt-3.5-turbo\"),\n", + "    graph=graph,\n", + "    verbose=True,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "1a9b3652-b436-404d-aa25-5fb576f23dc0", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "\n", + "\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n", + "Generated Cypher:\n", + "\u001b[32;1m\u001b[1;3mMATCH (p:Person {name: \"Warren Buffett\"})-[:EDUCATED_AT]->(o:Organization)\n", + "RETURN o.name\u001b[0m\n", + "Full Context:\n", + 
"\u001b[32;1m\u001b[1;3m[{'o.name': 'New York Institute of Finance'}, {'o.name': 'Alice Deal Junior High School'}, {'o.name': 'Woodrow Wilson High School'}, {'o.name': 'University of Nebraska'}]\u001b[0m\n", + "\n", + "\u001b[1m> Finished chain.\u001b[0m\n" + ] + }, + { + "data": { + "text/plain": [ + "'Warren Buffett attended the University of Nebraska.'" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "chain.run(\"Which university did Warren Buffett attend?\")" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "adc0ba0f-a62c-4875-89ce-da717f3ab148", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "\n", + "\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n", + "Generated Cypher:\n", + "\u001b[32;1m\u001b[1;3mMATCH (p:Person)-[r:EMPLOYEE_OR_MEMBER_OF]->(o:Organization) WHERE o.name = 'Berkshire Hathaway' RETURN p.name\u001b[0m\n", + "Full Context:\n", + "\u001b[32;1m\u001b[1;3m[{'p.name': 'Charlie Munger'}, {'p.name': 'Oliver Chace'}, {'p.name': 'Howard Buffett'}, {'p.name': 'Howard'}, {'p.name': 'Susan Buffett'}, {'p.name': 'Warren Buffett'}]\u001b[0m\n", + "\n", + "\u001b[1m> Finished chain.\u001b[0m\n" + ] + }, + { + "data": { + "text/plain": [ + "'Charlie Munger, Oliver Chace, Howard Buffett, Susan Buffett, and Warren Buffett are or were working at Berkshire Hathaway.'" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "chain.run(\"Who is or was working at Berkshire Hathaway?\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d636954b-d967-4e96-9489-92e11c74af35", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + 
"file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.4" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/libs/experimental/langchain_experimental/graph_transformers/__init__.py b/libs/experimental/langchain_experimental/graph_transformers/__init__.py new file mode 100644 index 00000000000..3f6c8a665ef --- /dev/null +++ b/libs/experimental/langchain_experimental/graph_transformers/__init__.py @@ -0,0 +1,5 @@ +from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer + +__all__ = [ + "DiffbotGraphTransformer", +] diff --git a/libs/experimental/langchain_experimental/graph_transformers/diffbot.py b/libs/experimental/langchain_experimental/graph_transformers/diffbot.py new file mode 100644 index 00000000000..000c70de4b3 --- /dev/null +++ b/libs/experimental/langchain_experimental/graph_transformers/diffbot.py @@ -0,0 +1,316 @@ +from typing import Any, Dict, List, Optional, Sequence, Tuple, Union + +import requests +from langchain.graphs.graph_document import GraphDocument, Node, Relationship +from langchain.schema import Document +from langchain.utils import get_from_env + + +def format_property_key(s: str) -> str: + words = s.split() + if not words: + return s + first_word = words[0].lower() + capitalized_words = [word.capitalize() for word in words[1:]] + return "".join([first_word] + capitalized_words) + + +class NodesList: + """ + Manages a list of nodes with associated properties. + + Attributes: + nodes (Dict[Tuple, Any]): Stores nodes as keys and their properties as values. + Each key is a tuple where the first element is the + node ID and the second is the node type. + """ + + def __init__(self) -> None: + self.nodes: Dict[Tuple[Union[str, int], str], Any] = dict() + + def add_node_property( + self, node: Tuple[Union[str, int], str], properties: Dict[str, Any] + ) -> None: + """ + Adds or updates node properties. 
+ + If the node does not exist in the list, it's added along with its properties. + If the node already exists, its properties are updated with the new values. + + Args: + node (Tuple): A tuple containing the node ID and node type. + properties (Dict): A dictionary of properties to add or update for the node. + """ + if node not in self.nodes: + self.nodes[node] = properties + else: + self.nodes[node].update(properties) + + def return_node_list(self) -> List[Node]: + """ + Returns the nodes as a list of Node objects. + + Each Node object will have its ID, type, and properties populated. + + Returns: + List[Node]: A list of Node objects. + """ + nodes = [ + Node(id=key[0], type=key[1], properties=self.nodes[key]) + for key in self.nodes + ] + return nodes + + +# Properties that should be treated as node properties instead of relationships +FACT_TO_PROPERTY_TYPE = [ + "Date", + "Number", + "Job title", + "Cause of death", + "Organization type", + "Academic title", +] + + +schema_mapping = [ + ("HEADQUARTERS", "ORGANIZATION_LOCATIONS"), + ("RESIDENCE", "PERSON_LOCATION"), + ("ALL_PERSON_LOCATIONS", "PERSON_LOCATION"), + ("CHILD", "HAS_CHILD"), + ("PARENT", "HAS_PARENT"), + ("CUSTOMERS", "HAS_CUSTOMER"), + ("SKILLED_AT", "INTERESTED_IN"), +] + + +class SimplifiedSchema: + """ + Provides functionality for working with a simplified schema mapping. + + Attributes: + schema (Dict): A dictionary containing the mapping to simplified schema types. + """ + + def __init__(self) -> None: + """Initializes the schema dictionary based on the predefined list.""" + self.schema = dict() + for row in schema_mapping: + self.schema[row[0]] = row[1] + + def get_type(self, type: str) -> str: + """ + Retrieves the simplified schema type for a given original type. + + Args: + type (str): The original schema type to find the simplified type for. + + Returns: + str: The simplified schema type if it exists; + otherwise, returns the original type. 
+ """ + try: + return self.schema[type] + except KeyError: + return type + + +class DiffbotGraphTransformer: + """Transforms documents into graph documents using Diffbot's NLP API. + + A graph document transformation system takes a sequence of Documents and returns a + sequence of Graph Documents. + + Example: + .. code-block:: python + + class DiffbotGraphTransformer(BaseGraphDocumentTransformer): + + def transform_documents( + self, documents: Sequence[Document], **kwargs: Any + ) -> Sequence[GraphDocument]: + results = [] + + for document in documents: + raw_results = self.nlp_request(document.page_content) + graph_document = self.process_response(raw_results, document) + results.append(graph_document) + return results + + async def atransform_documents( + self, documents: Sequence[Document], **kwargs: Any + ) -> Sequence[Document]: + raise NotImplementedError + """ + + def __init__( + self, + diffbot_api_key: Optional[str] = None, + fact_confidence_threshold: float = 0.7, + include_qualifiers: bool = True, + include_evidence: bool = True, + simplified_schema: bool = True, + ) -> None: + """ + Initialize the graph transformer with various options. + + Args: + diffbot_api_key (str): + The API key for Diffbot's NLP services. + + fact_confidence_threshold (float): + Minimum confidence level for facts to be included. + include_qualifiers (bool): + Whether to include qualifiers in the relationships. + include_evidence (bool): + Whether to include evidence for the relationships. + simplified_schema (bool): + Whether to use a simplified schema for relationships. 
+ """ + self.diffbot_api_key = diffbot_api_key or get_from_env( + "diffbot_api_key", "DIFFBOT_API_KEY" + ) + self.fact_threshold_confidence = fact_confidence_threshold + self.include_qualifiers = include_qualifiers + self.include_evidence = include_evidence + self.simplified_schema = None + if simplified_schema: + self.simplified_schema = SimplifiedSchema() + + def nlp_request(self, text: str) -> Dict[str, Any]: + """ + Make an API request to the Diffbot NLP endpoint. + + Args: + text (str): The text to be processed. + + Returns: + Dict[str, Any]: The JSON response from the API. + """ + + # Relationship extraction only works for English + payload = { + "content": text, + "lang": "en", + } + + FIELDS = "facts" + HOST = "nl.diffbot.com" + url = ( + f"https://{HOST}/v1/?fields={FIELDS}&" + f"token={self.diffbot_api_key}&language=en" + ) + result = requests.post(url, data=payload) + return result.json() + + def process_response( + self, payload: Dict[str, Any], document: Document + ) -> GraphDocument: + """ + Transform the Diffbot NLP response into a GraphDocument. + + Args: + payload (Dict[str, Any]): The JSON response from Diffbot's NLP API. + document (Document): The original document. + + Returns: + GraphDocument: The transformed document as a graph. 
+ """ + + # Return empty result if there are no facts + if "facts" not in payload or not payload["facts"]: + return GraphDocument(nodes=[], relationships=[], source=document) + + # Nodes are a custom class because we need to deduplicate + nodes_list = NodesList() + # Relationships are a list because we don't deduplicate nor anything else + relationships = list() + for record in payload["facts"]: + # Skip if the fact is below the threshold confidence + if record["confidence"] < self.fact_threshold_confidence: + continue + + # TODO: It should probably be treated as a node property + if not record["value"]["allTypes"]: + continue + + # Define source node + source_id = ( + record["entity"]["allUris"][0] + if record["entity"]["allUris"] + else record["entity"]["name"] + ) + source_label = record["entity"]["allTypes"][0]["name"].capitalize() + source_name = record["entity"]["name"] + source_node = Node(id=source_id, type=source_label) + nodes_list.add_node_property( + (source_id, source_label), {"name": source_name} + ) + + # Define target node + target_id = ( + record["value"]["allUris"][0] + if record["value"]["allUris"] + else record["value"]["name"] + ) + target_label = record["value"]["allTypes"][0]["name"].capitalize() + target_name = record["value"]["name"] + # Some facts are better suited as node properties + if target_label in FACT_TO_PROPERTY_TYPE: + nodes_list.add_node_property( + (source_id, source_label), + {format_property_key(record["property"]["name"]): target_name}, + ) + else: # Define relationship + # Define target node object + target_node = Node(id=target_id, type=target_label) + nodes_list.add_node_property( + (target_id, target_label), {"name": target_name} + ) + # Define relationship type + rel_type = record["property"]["name"].replace(" ", "_").upper() + if self.simplified_schema: + rel_type = self.simplified_schema.get_type(rel_type) + + # Relationship qualifiers/properties + rel_properties = dict() + relationship_evidence = [el["passage"] for 
el in record["evidence"]][0] + if self.include_evidence: + rel_properties.update({"evidence": relationship_evidence}) + if self.include_qualifiers and record.get("qualifiers"): + for property in record["qualifiers"]: + prop_key = format_property_key(property["property"]["name"]) + rel_properties[prop_key] = property["value"]["name"] + + relationship = Relationship( + source=source_node, + target=target_node, + type=rel_type, + properties=rel_properties, + ) + relationships.append(relationship) + + return GraphDocument( + nodes=nodes_list.return_node_list(), + relationships=relationships, + source=document, + ) + + def convert_to_graph_documents( + self, documents: Sequence[Document] + ) -> List[GraphDocument]: + """Convert a sequence of documents into graph documents. + + Args: + documents (Sequence[Document]): The original documents. + **kwargs: Additional keyword arguments. + + Returns: + Sequence[GraphDocument]: The transformed documents as graphs. + """ + results = [] + for document in documents: + raw_results = self.nlp_request(document.page_content) + graph_document = self.process_response(raw_results, document) + results.append(graph_document) + return results diff --git a/libs/experimental/poetry.lock b/libs/experimental/poetry.lock index b0d5b9139af..620da0f99ae 100644 --- a/libs/experimental/poetry.lock +++ b/libs/experimental/poetry.lock @@ -3752,6 +3752,31 @@ files = [ {file = "types_PyYAML-6.0.12.11-py3-none-any.whl", hash = "sha256:a461508f3096d1d5810ec5ab95d7eeecb651f3a15b71959999988942063bf01d"}, ] +[[package]] +name = "types-requests" +version = "2.31.0.2" +description = "Typing stubs for requests" +optional = false +python-versions = "*" +files = [ + {file = "types-requests-2.31.0.2.tar.gz", hash = "sha256:6aa3f7faf0ea52d728bb18c0a0d1522d9bfd8c72d26ff6f61bfc3d06a411cf40"}, + {file = "types_requests-2.31.0.2-py3-none-any.whl", hash = "sha256:56d181c85b5925cbc59f4489a57e72a8b2166f18273fd8ba7b6fe0c0b986f12a"}, +] + +[package.dependencies] 
+types-urllib3 = "*" + +[[package]] +name = "types-urllib3" +version = "1.26.25.14" +description = "Typing stubs for urllib3" +optional = false +python-versions = "*" +files = [ + {file = "types-urllib3-1.26.25.14.tar.gz", hash = "sha256:229b7f577c951b8c1b92c1bc2b2fdb0b49847bd2af6d1cc2a2e3dd340f3bda8f"}, + {file = "types_urllib3-1.26.25.14-py3-none-any.whl", hash = "sha256:9683bbb7fb72e32bfe9d2be6e04875fbe1b3eeec3cbb4ea231435aa7fd6b4f0e"}, +] + [[package]] name = "typing-extensions" version = "4.7.1" @@ -3995,4 +4020,4 @@ extended-testing = ["faker", "presidio-analyzer", "presidio-anonymizer"] [metadata] lock-version = "2.0" python-versions = ">=3.8.1,<4.0" -content-hash = "66ac482bd05eb74414210ac28fc1e8dae1a9928a4a1314e1326fada3551aa8ad" +content-hash = "443e88f690572715cf58671e4480a006574c7141a1258dff0a0818b954184901" diff --git a/libs/experimental/pyproject.toml b/libs/experimental/pyproject.toml index 8e876c392c1..5ec66559a27 100644 --- a/libs/experimental/pyproject.toml +++ b/libs/experimental/pyproject.toml @@ -23,6 +23,7 @@ black = "^23.1.0" [tool.poetry.group.typing.dependencies] mypy = "^0.991" types-pyyaml = "^6.0.12.2" +types-requests = "^2.28.11.5" [tool.poetry.group.dev.dependencies] jupyter = "^1.0.0" diff --git a/libs/langchain/langchain/graphs/graph_document.py b/libs/langchain/langchain/graphs/graph_document.py new file mode 100644 index 00000000000..9f72a3ad8e0 --- /dev/null +++ b/libs/langchain/langchain/graphs/graph_document.py @@ -0,0 +1,51 @@ +from __future__ import annotations + +from typing import List, Union + +from langchain.load.serializable import Serializable +from langchain.pydantic_v1 import Field +from langchain.schema import Document + + +class Node(Serializable): + """Represents a node in a graph with associated properties. + + Attributes: + id (Union[str, int]): A unique identifier for the node. + type (str): The type or label of the node, default is "Node". 
+ properties (dict): Additional properties and metadata associated with the node. + """ + + id: Union[str, int] + type: str = "Node" + properties: dict = Field(default_factory=dict) + + + class Relationship(Serializable): + """Represents a directed relationship between two nodes in a graph. + + Attributes: + source (Node): The source node of the relationship. + target (Node): The target node of the relationship. + type (str): The type of the relationship. + properties (dict): Additional properties associated with the relationship. + """ + + source: Node + target: Node + type: str + properties: dict = Field(default_factory=dict) + + + class GraphDocument(Serializable): + """Represents a graph document consisting of nodes and relationships. + + Attributes: + nodes (List[Node]): A list of nodes in the graph. + relationships (List[Relationship]): A list of relationships in the graph. + source (Document): The document from which the graph information is derived. + """ + + nodes: List[Node] + relationships: List[Relationship] + source: Document diff --git a/libs/langchain/langchain/graphs/neo4j_graph.py b/libs/langchain/langchain/graphs/neo4j_graph.py index 02572b2d1a1..256df9d26bd 100644 --- a/libs/langchain/langchain/graphs/neo4j_graph.py +++ b/libs/langchain/langchain/graphs/neo4j_graph.py @@ -1,5 +1,7 @@ from typing import Any, Dict, List +from langchain.graphs.graph_document import GraphDocument + node_properties_query = """ CALL apoc.meta.data() YIELD label, other, elementType, type, property @@ -99,3 +101,56 @@ class Neo4jGraph: The relationships are the following: {[el['output'] for el in relationships]} """ + + def add_graph_documents( + self, graph_documents: List[GraphDocument], include_source: bool = False + ) -> None: + """ + Take GraphDocument as input and use it to construct a graph.
+ """ + for document in graph_documents: + include_docs_query = ( + "CREATE (d:Document) " + "SET d.text = $document.page_content " + "SET d += $document.metadata " + "WITH d " + ) + # Import nodes + self.query( + ( + f"{include_docs_query if include_source else ''}" + "UNWIND $data AS row " + "CALL apoc.merge.node([row.type], {id: row.id}, " + "row.properties, {}) YIELD node " + f"{'MERGE (d)-[:MENTIONS]->(node) ' if include_source else ''}" + "RETURN distinct 'done' AS result" + ), + { + "data": [el.__dict__ for el in document.nodes], + "document": document.source.__dict__, + }, + ) + # Import relationships + self.query( + "UNWIND $data AS row " + "CALL apoc.merge.node([row.source_label], {id: row.source}," + "{}, {}) YIELD node as source " + "CALL apoc.merge.node([row.target_label], {id: row.target}," + "{}, {}) YIELD node as target " + "CALL apoc.merge.relationship(source, row.type, " + "{}, row.properties, target) YIELD rel " + "RETURN distinct 'done'", + { + "data": [ + { + "source": el.source.id, + "source_label": el.source.type, + "target": el.target.id, + "target_label": el.target.type, + "type": el.type.replace(" ", "_").upper(), + "properties": el.properties, + } + for el in document.relationships + ] + }, + )
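Two small pieces of logic in `diffbot.py` above carry most of the deduplication behavior: `format_property_key` camelCases spaced fact labels, and `NodesList` keys nodes by `(id, type)` so repeated mentions of one entity merge into a single node's properties. The sketch below exercises both in isolation; `TinyNodesList` and the `Q42` identifier are simplified, hypothetical stand-ins for illustration, not the library classes:

```python
from typing import Any, Dict, Tuple, Union


def format_property_key(s: str) -> str:
    """Convert a spaced label such as 'Cause of death' to camelCase."""
    words = s.split()
    if not words:
        return s
    return "".join([words[0].lower()] + [w.capitalize() for w in words[1:]])


class TinyNodesList:
    """Minimal stand-in for NodesList: nodes are keyed by (id, type) so
    repeated mentions of the same entity merge their properties."""

    def __init__(self) -> None:
        self.nodes: Dict[Tuple[Union[str, int], str], Dict[str, Any]] = {}

    def add_node_property(
        self, node: Tuple[Union[str, int], str], properties: Dict[str, Any]
    ) -> None:
        # Create the node on first sight, otherwise update its properties.
        self.nodes.setdefault(node, {}).update(properties)


nodes = TinyNodesList()
# Two facts about the same entity arrive as separate records...
nodes.add_node_property(("Q42", "Person"), {"name": "Warren Buffett"})
nodes.add_node_property(("Q42", "Person"), {format_property_key("Job title"): "CEO"})
# ...but land on a single deduplicated node.
print(nodes.nodes[("Q42", "Person")])
# {'name': 'Warren Buffett', 'jobTitle': 'CEO'}
```

This mirrors why facts whose value type appears in `FACT_TO_PROPERTY_TYPE` (e.g. "Job title", "Date") are folded into the source node's properties rather than emitted as relationships.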