docs: contributed Graph RAG Retriever integration (#29744)

**Description:** This adds the `Graph RAG` Retriever integration documentation, per https://python.langchain.com/docs/contributing/how_to/integrations/. * The integration exists in this public repository: https://github.com/datastax/graph-rag * We've implemented the standard langchain tests for retrievers: https://github.com/datastax/graph-rag/blob/main/packages/langchain-graph-retriever/tests/test_langchain.py * Our integration is published to PyPi: https://pypi.org/project/langchain-graph-retriever/
2025-08-09 21:08:59 +00:00 · 2025-02-13 03:25:48 +01:00 · 2025-02-13 03:25:48 +01:00 · 716fd89d8e
commit 716fd89d8e
parent f42dafa809
3 changed files with 407 additions and 0 deletions
--- a/docs/docs/integrations/providers/graph_rag.mdx
+++ b/docs/docs/integrations/providers/graph_rag.mdx
@ -0,0 +1,22 @@
 # Graph RAG
 ## Overview
 [Graph RAG](https://datastax.github.io/graph-rag/) provides a retriever interface
 that combines **unstructured** similarity search on vectors with **structured**
 traversal of metadata properties. This enables graph-based retrieval over **existing**
 vector stores.
 ## Installation and setup
 ```bash
 pip install langchain-graph-retriever
 ```
 ## Retrievers
 ```python
 from langchain_graph_retriever import GraphRetriever
 ```
 For more information, see the [Graph RAG Integration Guide](/docs/integrations/retrievers/graph_rag).
--- a/docs/docs/integrations/retrievers/graph_rag.mdx
+++ b/docs/docs/integrations/retrievers/graph_rag.mdx
@ -0,0 +1,379 @@
 ---
 sidebar_label: Graph RAG
 description: Graph traversal over any Vector Store using document metadata.
 ---
 import ChatModelTabs from "@theme/ChatModelTabs";
 import EmbeddingTabs from "@theme/EmbeddingTabs";
 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 # Graph RAG
 This guide provides an introduction to Graph RAG. For detailed documentation of all
 supported features and configurations, refer to the
 [Graph RAG Project Page](https://datastax.github.io/graph-rag/).
 ## Overview
 The `GraphRetriever` from the `langchain-graph-retriever` package provides a LangChain
 [retriever](/docs/concepts/retrievers/) that combines **unstructured** similarity search
 on vectors with **structured** traversal of metadata properties. This enables graph-based
 retrieval over an **existing** vector store.
 ### Integration details
 | Retriever | Source | PyPI Package | Latest | Project Page |
 | :--- | :--- | :---: | :---: | :---: |
 | GraphRetriever | [github.com/datastax/graph-rag](https://github.com/datastax/graph-rag/tree/main/packages/langchain-graph-retriever) | [langchain-graph-retriever](https://pypi.org/project/langchain-graph-retriever/) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-graph-retriever?style=flat-square&label=%20&color=orange) | [Graph RAG](https://datastax.github.io/graph-rag/) |
 ## Benefits
 * [**Link based on existing metadata:**](https://datastax.github.io/graph-rag/get-started/)
  Use existing metadata fields without additional processing. Retrieve more from an
  existing vector store!
 * [**Change links on demand:**](https://datastax.github.io/graph-rag/get-started/edges/)
  Edges can be specified on-the-fly, allowing different relationships to be traversed
  based on the question.
 * [**Pluggable Traversal Strategies:**](https://datastax.github.io/graph-rag/get-started/strategies/)
  Use built-in traversal strategies like Eager or MMR, or define custom logic to select
  which nodes to explore.
 * [**Broad compatibility:**](https://datastax.github.io/graph-rag/get-started/adapters/)
  Adapters are available for a variety of vector stores with support for additional
  stores easily added.
 ## Setup
 ### Installation
 This retriever lives in the `langchain-graph-retriever` package.
 ```bash
 pip install -qU langchain-graph-retriever
 ```
 ## Instantiation
 The following examples will show how to perform graph traversal over some sample
 Documents about animals.
 ### Prerequisites
 <details>
  <summary>Toggle for Details</summary>
  <div>
    1. Ensure you have Python 3.10+ installed
    1. Install the following package that provides sample data.
        ```bash
        pip install -qU graph_rag_example_helpers
        ```
    1. Download the test documents:
        ```python
        from graph_rag_example_helpers.datasets.animals import fetch_documents
        animals = fetch_documents()
        ```
    1. <EmbeddingTabs/>
  </div>
 </details>
 ### Populating the Vector store
 This section shows how to populate a variety of vector stores with the sample data.
 For help on choosing one of the vector stores below, or to add support for your
 vector store, consult the documentation about
 [Adapters and Supported Stores](https://datastax.github.io/graph-rag/guide/adapters/).
 <Tabs groupId="vector-store" queryString>
  <TabItem value="astra-db" label="AstraDB" default>
    <div style={{ paddingLeft: '30px' }}>
      Install the `langchain-graph-retriever` package with the `astra` extra:
      ```bash
      pip install "langchain-graph-retriever[astra]"
      ```
      Then create a vector store and load the test documents:
      ```python
      from langchain_astradb import AstraDBVectorStore
      vector_store = AstraDBVectorStore.from_documents(
          documents=animals,
          embedding=embeddings,
          collection_name="animals",
          api_endpoint=ASTRA_DB_API_ENDPOINT,
          token=ASTRA_DB_APPLICATION_TOKEN,
      )
      ```
      For the `ASTRA_DB_API_ENDPOINT` and `ASTRA_DB_APPLICATION_TOKEN` credentials,
      consult the [AstraDB Vector Store Guide](/docs/integrations/vectorstores/astradb).
      :::note
      For faster initial testing, consider using the **InMemory** Vector Store.
      :::
    </div>
  </TabItem>
  <TabItem value="cassandra" label="Apache Cassandra">
    <div style={{ paddingLeft: '30px' }}>
      Install the `langchain-graph-retriever` package with the `cassandra` extra:
      ```bash
      pip install "langchain-graph-retriever[cassandra]"
      ```
      Then create a vector store and load the test documents:
      ```python
      from langchain_community.vectorstores.cassandra import Cassandra
      from langchain_graph_retriever.transformers import ShreddingTransformer
      vector_store = Cassandra.from_documents(
          documents=list(ShreddingTransformer().transform_documents(animals)),
          embedding=embeddings,
          table_name="animals",
      )
      ```
      For help creating a Cassandra connection, consult the
      [Apache Cassandra Vector Store Guide](/docs/integrations/vectorstores/cassandra#connection-parameters)
      :::note
      Apache Cassandra doesn't support searching in nested metadata. Because of this
      it is necessary to use the [`ShreddingTransformer`](https://datastax.github.io/graph-rag/reference/langchain_graph_retriever/transformers/#langchain_graph_retriever.transformers.shredding.ShreddingTransformer)
      when inserting documents.
      :::
    </div>
  </TabItem>
  <TabItem value="opensearch" label="OpenSearch">
    <div style={{ paddingLeft: '30px' }}>
      Install the `langchain-graph-retriever` package with the `opensearch` extra:
      ```bash
      pip install "langchain-graph-retriever[opensearch]"
      ```
      Then create a vector store and load the test documents:
      ```python
      from langchain_community.vectorstores import OpenSearchVectorSearch
      vector_store = OpenSearchVectorSearch.from_documents(
          documents=animals,
          embedding=embeddings,
          engine="faiss",
          index_name="animals",
          opensearch_url=OPEN_SEARCH_URL,
          bulk_size=500,
      )
      ```
      For help creating an OpenSearch connection, consult the
      [OpenSearch Vector Store Guide](/docs/integrations/vectorstores/opensearch).
    </div>
  </TabItem>
  <TabItem value="chroma" label="Chroma">
    <div style={{ paddingLeft: '30px' }}>
      Install the `langchain-graph-retriever` package with the `chroma` extra:
      ```bash
      pip install "langchain-graph-retriever[chroma]"
      ```
      Then create a vector store and load the test documents:
      ```python
      from langchain_chroma.vectorstores import Chroma
      from langchain_graph_retriever.transformers import ShreddingTransformer
      vector_store = Chroma.from_documents(
          documents=list(ShreddingTransformer().transform_documents(animals)),
          embedding=embeddings,
          collection_name="animals",
      )
      ```
      For help creating an Chroma connection, consult the
      [Chroma Vector Store Guide](/docs/integrations/vectorstores/chroma).
      :::note
      Chroma doesn't support searching in nested metadata. Because of this
      it is necessary to use the [`ShreddingTransformer`](https://datastax.github.io/graph-rag/reference/langchain_graph_retriever/transformers/#langchain_graph_retriever.transformers.shredding.ShreddingTransformer)
      when inserting documents.
      :::
    </div>
  </TabItem>
  <TabItem value="in-memory" label="InMemory" default>
    <div style={{ paddingLeft: '30px' }}>
      Install the `langchain-graph-retriever` package:
      ```bash
      pip install "langchain-graph-retriever"
      ```
      Then create a vector store and load the test documents:
      ```python
      from langchain_core.vectorstores import InMemoryVectorStore
      vector_store = InMemoryVectorStore.from_documents(
          documents=animals,
          embedding=embeddings,
      )
      ```
      :::tip
      Using the `InMemoryVectorStore` is the fastest way to get started with Graph RAG
      but it isn't recommended for production use. Instead it is recommended to use
      **AstraDB** or **OpenSearch**.
      :::
    </div>
  </TabItem>
 </Tabs>
 ### Graph Traversal
 This graph retriever starts with a single animal that best matches the query, then
 traverses to other animals sharing the same `habitat` and/or `origin`.
  ```python
  from graph_retriever.strategies import Eager
  from langchain_graph_retriever import GraphRetriever
  traversal_retriever = GraphRetriever(
      store = vector_store,
      edges = [("habitat", "habitat"), ("origin", "origin")],
      strategy = Eager(k=5, start_k=1, max_depth=2),
  )
  ```
 The above creates a graph traversing retriever that starts with the nearest
 animal (`start_k=1`), retrieves 5 documents (`k=5`) and limits the search to documents
 that are at most 2 steps away from the first animal (`max_depth=2`).
 The `edges` define how metadata values can be used for traversal. In this case, every
 animal is connected to other animals with the same `habitat` and/or `origin`.
 ```python
 results = traversal_retriever.invoke("what animals could be found near a capybara?")
 for doc in results:
    print(f"{doc.id}: {doc.page_content}")
 ```
 ```output
 capybara: capybaras are the largest rodents in the world and are highly social animals.
 heron: herons are wading birds known for their long legs and necks, often seen near water.
 crocodile: crocodiles are large reptiles with powerful jaws and a long lifespan, often living over 70 years.
 frog: frogs are amphibians known for their jumping ability and croaking sounds.
 duck: ducks are waterfowl birds known for their webbed feet and quacking sounds.
 ```
 Graph traversal improves retrieval quality by leveraging structured relationships in
 the data. Unlike standard similarity search (see below), it provides a clear,
 explainable rationale for why documents are selected.
 In this case, the documents `capybara`, `heron`, `frog`, `crocodile`, and `newt` all
 share the same `habitat=wetlands`, as defined by their metadata. This should increase
 Document Relevance and the quality of the answer from the LLM.
 ### Comparison to Standard Retrieval
 When `max_depth=0`, the graph traversing retriever behaves like a standard retriever:
 ```python
 standard_retriever = GraphRetriever(
    store = vector_store,
    edges = [("habitat", "habitat"), ("origin", "origin")],
    strategy = Eager(k=5, start_k=5, max_depth=0),
 )
 ```
 This creates a retriever that starts with the nearest 5 animals (`start_k=5`),
 and returns them without any traversal (`max_depth=0`). The edge definitions
 are ignored in this case.
 This is essentially the same as:
 ```python
 standard_retriever = vector_store.as_retriever(search_kwargs={"k":5})
 ```
 For either case, invoking the retriever returns:
 ```python
 results = standard_retriever.invoke("what animals could be found near a capybara?")
 for doc in results:
    print(f"{doc.id}: {doc.page_content}")
 ```
 ```output
 capybara: capybaras are the largest rodents in the world and are highly social animals.
 iguana: iguanas are large herbivorous lizards often found basking in trees and near water.
 guinea pig: guinea pigs are small rodents often kept as pets due to their gentle and social nature.
 hippopotamus: hippopotamuses are large semi-aquatic mammals known for their massive size and territorial behavior.
 boar: boars are wild relatives of pigs, known for their tough hides and tusks.
 ```
 These documents are joined based on similarity alone. Any structural data that existed
 in the store is ignored. As compared to graph retrieval, this can decrease Document
 Relevance because the returned results have a lower chance of being helpful to answer
 the query.
 ## Usage
 Following the examples above, `.invoke` is used to initiate retrieval on a query.
 ## Use within a chain
 Like other retrievers, `GraphRetriever` can be incorporated into LLM applications
 via [chains](/docs/how_to/sequence/).
 <ChatModelTabs customVarName="llm" />
 ```python
 from langchain_core.output_parsers import StrOutputParser
 from langchain_core.prompts import ChatPromptTemplate
 from langchain_core.runnables import RunnablePassthrough
 prompt = ChatPromptTemplate.from_template(
 """Answer the question based only on the context provided.
 Context: {context}
 Question: {question}"""
 )
 def format_docs(docs):
    return "\n\n".join(f"text: {doc.page_content} metadata: {doc.metadata}" for doc in docs)
 chain = (
    {"context": traversal_retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
 )
 ```
 ```python
 chain.invoke("what animals could be found near a capybara?")
 ```
 ```output
 Animals that could be found near a capybara include herons, crocodiles, frogs,
 and ducks, as they all inhabit wetlands.
 ```
 ## API reference
 To explore all available parameters and advanced configurations, refer to the
 [Graph RAG API reference](https://datastax.github.io/graph-rag/reference/).
--- a/libs/packages.yml
+++ b/libs/packages.yml
@ -394,3 +394,9 @@ packages:
  repo: lunary-ai/langchain-abso
  path: .
  downloads: 0
 - name: langchain-graph-retriever
  name_title: 'Graph RAG'
  repo: datastax/graph-rag
  path: packages/langchain-graph-retriever
  downloads: 0
  provider_page: graph_rag