add neo4j query constructor for self query (#25288)

- [x] **PR title - community: add neo4j query constructor for self query** - [x] **PR message** - **Description:** adding a Neo4jTranslator so that the Neo4j vector database can use SelfQueryRetriever - **Issue:** this issue had been raised before in #19748 - **Dependencies:** none. - **Twitter handle:** @moyi_dang - p.s. I have not added the query constructor in BUILTIN_TRANSLATORS in this PR, I want to make changes to only one package at a time. - [x] **Add tests and docs**: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-08-31 18:38:48 +00:00 · 2024-08-30 07:54:33 -07:00
parent b5d670498f
commit 6377185291
4 changed files with 556 additions and 0 deletions
--- a/docs/docs/integrations/retrievers/self_query/neo4j_self_query.ipynb
+++ b/docs/docs/integrations/retrievers/self_query/neo4j_self_query.ipynb
@@ -0,0 +1,403 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Neo4j\n",
+    "\n",
+    ">[Neo4j](https://neo4j.com/docs/) is a graph database that stores nodes and relationships, that also supports native vector search.\n",
+    "\n",
+    "In the notebook, we'll demo the `SelfQueryRetriever` wrapped around a `Neo4j` vector store. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Creating a Neo4j vector store\n",
+    "First we'll want to create a Neo4j vector store and seed it with some data. We've created a small demo set of documents that contain summaries of movies."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Requirement already satisfied: neo4j in /Users/moyi/git/langchain/env/lib/python3.11/site-packages (5.24.0)\n",
+      "Requirement already satisfied: pytz in /Users/moyi/git/langchain/env/lib/python3.11/site-packages (from neo4j) (2024.1)\n",
+      "Note: you may need to restart the kernel to use updated packages.\n"
+     ]
+    }
+   ],
+   "source": [
+    "%pip install --upgrade neo4j"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdin",
+     "output_type": "stream",
+     "text": [
+      "OpenAI API Key: ········\n"
+     ]
+    }
+   ],
+   "source": [
+    "import getpass\n",
+    "import os\n",
+    "\n",
+    "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdin",
+     "output_type": "stream",
+     "text": [
+      "Neo4j URL: ········\n",
+      "Neo4j User Name: ········\n",
+      "Neo4j Password: ········\n"
+     ]
+    }
+   ],
+   "source": [
+    "# To run this notebook, you can set up a free neo4j account on neo4j.com and input the following information.\n",
+    "# (If you are having trouble connecting to the database, try using neo4j+ssc: instead of neo4j+s)\n",
+    "\n",
+    "os.environ[\"NEO4J_URI\"] = getpass.getpass(\"Neo4j URL:\")\n",
+    "os.environ[\"NEO4J_USERNAME\"] = getpass.getpass(\"Neo4j User Name:\")\n",
+    "os.environ[\"NEO4J_PASSWORD\"] = getpass.getpass(\"Neo4j Password:\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_community.vectorstores import Neo4jVector\n",
+    "from langchain_core.documents import Document\n",
+    "from langchain_openai import OpenAIEmbeddings\n",
+    "\n",
+    "embeddings = OpenAIEmbeddings()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: CALL subquery without a variable scope clause is now deprecated. Use CALL (row) { ... }} {position: line: 1, column: 21, offset: 20} for query: \"UNWIND $data AS row CALL { WITH row MERGE (c:`Chunk` {id: row.id}) WITH c, row CALL db.create.setNodeVectorProperty(c, 'embedding', row.embedding) SET c.`text` = row.text SET c += row.metadata } IN TRANSACTIONS OF 1000 ROWS \"\n"
+     ]
+    }
+   ],
+   "source": [
+    "docs = [\n",
+    "    Document(\n",
+    "        page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\",\n",
+    "        metadata={\"year\": 1993, \"rating\": 7.7, \"genre\": \"science fiction\"},\n",
+    "    ),\n",
+    "    Document(\n",
+    "        page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\",\n",
+    "        metadata={\"year\": 2010, \"director\": \"Christopher Nolan\", \"rating\": 8.2},\n",
+    "    ),\n",
+    "    Document(\n",
+    "        page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\",\n",
+    "        metadata={\"year\": 2006, \"director\": \"Satoshi Kon\", \"rating\": 8.6},\n",
+    "    ),\n",
+    "    Document(\n",
+    "        page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\",\n",
+    "        metadata={\"year\": 2019, \"director\": \"Greta Gerwig\", \"rating\": 8.3},\n",
+    "    ),\n",
+    "    Document(\n",
+    "        page_content=\"Toys come alive and have a blast doing so\",\n",
+    "        metadata={\"year\": 1995, \"genre\": \"animated\"},\n",
+    "    ),\n",
+    "    Document(\n",
+    "        page_content=\"Three men walk into the Zone, three men walk out of the Zone\",\n",
+    "        metadata={\n",
+    "            \"year\": 1979,\n",
+    "            \"director\": \"Andrei Tarkovsky\",\n",
+    "            \"genre\": \"science fiction\",\n",
+    "            \"rating\": 9.9,\n",
+    "        },\n",
+    "    ),\n",
+    "]\n",
+    "vectorstore = Neo4jVector.from_documents(docs, embeddings)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Creating our self-querying retriever\n",
+    "Now we can instantiate our retriever. To do this we'll need to provide some information upfront about the metadata fields that our documents support and a short description of the document contents."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.chains.query_constructor.base import AttributeInfo\n",
+    "from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
+    "from langchain_openai import OpenAI\n",
+    "\n",
+    "metadata_field_info = [\n",
+    "    AttributeInfo(\n",
+    "        name=\"genre\",\n",
+    "        description=\"The genre of the movie\",\n",
+    "        type=\"string or list[string]\",\n",
+    "    ),\n",
+    "    AttributeInfo(\n",
+    "        name=\"year\",\n",
+    "        description=\"The year the movie was released\",\n",
+    "        type=\"integer\",\n",
+    "    ),\n",
+    "    AttributeInfo(\n",
+    "        name=\"director\",\n",
+    "        description=\"The name of the movie director\",\n",
+    "        type=\"string\",\n",
+    "    ),\n",
+    "    AttributeInfo(\n",
+    "        name=\"rating\", description=\"A 1-10 rating for the movie\", type=\"float\"\n",
+    "    ),\n",
+    "]\n",
+    "document_content_description = \"Brief summary of a movie\"\n",
+    "llm = OpenAI(temperature=0)\n",
+    "retriever = SelfQueryRetriever.from_llm(\n",
+    "    llm, vectorstore, document_content_description, metadata_field_info, verbose=True\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Testing it out\n",
+    "And now we can try actually using our retriever!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[Document(metadata={'genre': 'science fiction', 'year': 1993, 'rating': 7.7}, page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose'),\n",
+       " Document(metadata={'genre': 'animated', 'year': 1995}, page_content='Toys come alive and have a blast doing so'),\n",
+       " Document(metadata={'genre': 'science fiction', 'year': 1979, 'rating': 9.9, 'director': 'Andrei Tarkovsky'}, page_content='Three men walk into the Zone, three men walk out of the Zone'),\n",
+       " Document(metadata={'year': 2006, 'rating': 8.6, 'director': 'Satoshi Kon'}, page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea')]"
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# This example only specifies a relevant query\n",
+    "retriever.invoke(\"What are some movies about dinosaurs\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[Document(metadata={'genre': 'science fiction', 'year': 1979, 'rating': 9.9, 'director': 'Andrei Tarkovsky'}, page_content='Three men walk into the Zone, three men walk out of the Zone'),\n",
+       " Document(metadata={'year': 2006, 'rating': 8.6, 'director': 'Satoshi Kon'}, page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea')]"
+      ]
+     },
+     "execution_count": 8,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# This example only specifies a filter\n",
+    "retriever.invoke(\"I want to watch a movie rated higher than 8.5\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[Document(metadata={'year': 2019, 'rating': 8.3, 'director': 'Greta Gerwig'}, page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them')]"
+      ]
+     },
+     "execution_count": 9,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# This example specifies a query and a filter\n",
+    "retriever.invoke(\"Has Greta Gerwig directed any movies about women\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[Document(metadata={'year': 2006, 'rating': 8.6, 'director': 'Satoshi Kon'}, page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea'),\n",
+       " Document(metadata={'genre': 'science fiction', 'year': 1979, 'rating': 9.9, 'director': 'Andrei Tarkovsky'}, page_content='Three men walk into the Zone, three men walk out of the Zone')]"
+      ]
+     },
+     "execution_count": 10,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# This example specifies a composite filter\n",
+    "retriever.invoke(\"What's a highly rated (above 8.5) science fiction film?\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[Document(metadata={'genre': 'animated', 'year': 1995}, page_content='Toys come alive and have a blast doing so')]"
+      ]
+     },
+     "execution_count": 11,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# This example specifies a query and composite filter\n",
+    "retriever.invoke(\n",
+    "    \"What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Filter k\n",
+    "\n",
+    "We can also use the self query retriever to specify `k`: the number of documents to fetch.\n",
+    "\n",
+    "We can do this by passing `enable_limit=True` to the constructor."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "retriever = SelfQueryRetriever.from_llm(\n",
+    "    llm,\n",
+    "    vectorstore,\n",
+    "    document_content_description,\n",
+    "    metadata_field_info,\n",
+    "    enable_limit=True,\n",
+    "    verbose=True,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[Document(metadata={'genre': 'science fiction', 'year': 1993, 'rating': 7.7}, page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose'),\n",
+       " Document(metadata={'genre': 'animated', 'year': 1995}, page_content='Toys come alive and have a blast doing so')]"
+      ]
+     },
+     "execution_count": 13,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# This example only specifies a relevant query\n",
+    "retriever.invoke(\"what are two movies about dinosaurs\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
--- a/libs/community/langchain_community/query_constructors/neo4j.py
+++ b/libs/community/langchain_community/query_constructors/neo4j.py
@@ -0,0 +1,60 @@
+from typing import Dict, Tuple, Union
+
+from langchain_core.structured_query import (
+    Comparator,
+    Comparison,
+    Operation,
+    Operator,
+    StructuredQuery,
+    Visitor,
+)
+
+
+class Neo4jTranslator(Visitor):
+    """Translate `Neo4j` internal query language elements to valid filters."""
+
+    allowed_operators = [Operator.AND, Operator.OR]
+    """Subset of allowed logical operators."""
+
+    allowed_comparators = [
+        Comparator.EQ,
+        Comparator.NE,
+        Comparator.GTE,
+        Comparator.LTE,
+        Comparator.LT,
+        Comparator.GT,
+    ]
+
+    def _format_func(self, func: Union[Operator, Comparator]) -> str:
+        self._validate_func(func)
+        map_dict = {
+            Operator.AND: "$and",
+            Operator.OR: "$or",
+            Comparator.EQ: "$eq",
+            Comparator.NE: "$ne",
+            Comparator.GTE: "$gte",
+            Comparator.LTE: "$lte",
+            Comparator.LT: "$lt",
+            Comparator.GT: "$gt",
+        }
+        return map_dict[func]
+
+    def visit_operation(self, operation: Operation) -> Dict:
+        args = [arg.accept(self) for arg in operation.arguments]
+        return {self._format_func(operation.operator): args}
+
+    def visit_comparison(self, comparison: Comparison) -> Dict:
+        return {
+            comparison.attribute: {
+                self._format_func(comparison.comparator): comparison.value
+            }
+        }
+
+    def visit_structured_query(
+        self, structured_query: StructuredQuery
+    ) -> Tuple[str, dict]:
+        if structured_query.filter is None:
+            kwargs = {}
+        else:
+            kwargs = {"filter": structured_query.filter.accept(self)}
+        return structured_query.query, kwargs
--- a/libs/community/tests/unit_tests/query_constructors/test_neo4j.py
+++ b/libs/community/tests/unit_tests/query_constructors/test_neo4j.py
@@ -0,0 +1,90 @@
+from typing import Dict, Tuple
+
+from langchain_core.structured_query import (
+    Comparator,
+    Comparison,
+    Operation,
+    Operator,
+    StructuredQuery,
+)
+
+from langchain_community.query_constructors.neo4j import Neo4jTranslator
+
+DEFAULT_TRANSLATOR = Neo4jTranslator()
+
+
+def test_visit_comparison() -> None:
+    comp = Comparison(comparator=Comparator.LT, attribute="foo", value=["1", "2"])
+    expected = {"foo": {"$lt": ["1", "2"]}}
+    actual = DEFAULT_TRANSLATOR.visit_comparison(comp)
+    assert expected == actual
+
+
+def test_visit_operation() -> None:
+    op = Operation(
+        operator=Operator.AND,
+        arguments=[
+            Comparison(comparator=Comparator.LT, attribute="foo", value=2),
+            Comparison(comparator=Comparator.EQ, attribute="bar", value="baz"),
+            Comparison(comparator=Comparator.LT, attribute="abc", value=["1", "2"]),
+        ],
+    )
+    expected = {
+        "$and": [
+            {"foo": {"$lt": 2}},
+            {"bar": {"$eq": "baz"}},
+            {"abc": {"$lt": ["1", "2"]}},
+        ]
+    }
+    actual = DEFAULT_TRANSLATOR.visit_operation(op)
+    assert expected == actual
+
+
+def test_visit_structured_query() -> None:
+    query = "What is the capital of France?"
+    structured_query = StructuredQuery(
+        query=query,
+        filter=None,
+    )
+    expected: Tuple[str, Dict] = (query, {})
+    actual = DEFAULT_TRANSLATOR.visit_structured_query(structured_query)
+    assert expected == actual
+
+    comp = Comparison(comparator=Comparator.LT, attribute="foo", value=["1", "2"])
+    expected = (
+        query,
+        {"filter": {"foo": {"$lt": ["1", "2"]}}},
+    )
+    structured_query = StructuredQuery(
+        query=query,
+        filter=comp,
+    )
+    actual = DEFAULT_TRANSLATOR.visit_structured_query(structured_query)
+    assert expected == actual
+
+    op = Operation(
+        operator=Operator.AND,
+        arguments=[
+            Comparison(comparator=Comparator.LT, attribute="foo", value=2),
+            Comparison(comparator=Comparator.EQ, attribute="bar", value="baz"),
+            Comparison(comparator=Comparator.LT, attribute="abc", value=["1", "2"]),
+        ],
+    )
+    structured_query = StructuredQuery(
+        query=query,
+        filter=op,
+    )
+    expected = (
+        query,
+        {
+            "filter": {
+                "$and": [
+                    {"foo": {"$lt": 2}},
+                    {"bar": {"$eq": "baz"}},
+                    {"abc": {"$lt": ["1", "2"]}},
+                ]
+            }
+        },
+    )
+    actual = DEFAULT_TRANSLATOR.visit_structured_query(structured_query)
+    assert expected == actual
--- a/libs/langchain/langchain/retrievers/self_query/base.py
+++ b/libs/langchain/langchain/retrievers/self_query/base.py
@@ -48,6 +48,7 @@ def _get_builtin_translator(vectorstore: VectorStore) -> Visitor:
        MongoDBAtlasTranslator,
    )
    from langchain_community.query_constructors.myscale import MyScaleTranslator
+    from langchain_community.query_constructors.neo4j import Neo4jTranslator
    from langchain_community.query_constructors.opensearch import OpenSearchTranslator
    from langchain_community.query_constructors.pgvector import PGVectorTranslator
    from langchain_community.query_constructors.pinecone import PineconeTranslator
@@ -70,6 +71,7 @@ def _get_builtin_translator(vectorstore: VectorStore) -> Visitor:
        Dingo,
        Milvus,
        MyScale,
+        Neo4jVector,
        OpenSearchVectorSearch,
        PGVector,
        Qdrant,
@@ -111,6 +113,7 @@ def _get_builtin_translator(vectorstore: VectorStore) -> Visitor:
        TimescaleVector: TimescaleVectorTranslator,
        OpenSearchVectorSearch: OpenSearchTranslator,
        CommunityMongoDBAtlasVectorSearch: MongoDBAtlasTranslator,
+        Neo4jVector: Neo4jTranslator,
    }
    if isinstance(vectorstore, DatabricksVectorSearch):
        return DatabricksVectorSearchTranslator()