mirror of
https://github.com/hwchase17/langchain.git
synced 2025-06-19 21:33:51 +00:00
docs: contributed Graph RAG Retriever integration (#29744)

**Description:** This adds the `Graph RAG` Retriever integration documentation, per https://python.langchain.com/docs/contributing/how_to/integrations/.

* The integration exists in this public repository: https://github.com/datastax/graph-rag
* We've implemented the standard langchain tests for retrievers: https://github.com/datastax/graph-rag/blob/main/packages/langchain-graph-retriever/tests/test_langchain.py
* Our integration is published to PyPI: https://pypi.org/project/langchain-graph-retriever/
parent
f42dafa809
commit
716fd89d8e
22
docs/docs/integrations/providers/graph_rag.mdx
Normal file
@@ -0,0 +1,22 @@
|
||||
# Graph RAG
|
||||
|
||||
## Overview
|
||||
|
||||
[Graph RAG](https://datastax.github.io/graph-rag/) provides a retriever interface
|
||||
that combines **unstructured** similarity search on vectors with **structured**
|
||||
traversal of metadata properties. This enables graph-based retrieval over **existing**
|
||||
vector stores.
|
||||
|
||||
## Installation and setup
|
||||
|
||||
```bash
|
||||
pip install langchain-graph-retriever
|
||||
```
|
||||
|
||||
## Retrievers
|
||||
|
||||
```python
|
||||
from langchain_graph_retriever import GraphRetriever
|
||||
```
|
||||
|
||||
For more information, see the [Graph RAG Integration Guide](/docs/integrations/retrievers/graph_rag).
|
379
docs/docs/integrations/retrievers/graph_rag.mdx
Normal file
@@ -0,0 +1,379 @@
|
||||
---
|
||||
sidebar_label: Graph RAG
|
||||
description: Graph traversal over any Vector Store using document metadata.
|
||||
---
|
||||
|
||||
import ChatModelTabs from "@theme/ChatModelTabs";
|
||||
import EmbeddingTabs from "@theme/EmbeddingTabs";
|
||||
import Tabs from '@theme/Tabs';
|
||||
import TabItem from '@theme/TabItem';
|
||||
|
||||
|
||||
# Graph RAG
|
||||
|
||||
This guide provides an introduction to Graph RAG. For detailed documentation of all
|
||||
supported features and configurations, refer to the
|
||||
[Graph RAG Project Page](https://datastax.github.io/graph-rag/).
|
||||
|
||||
## Overview
|
||||
|
||||
The `GraphRetriever` from the `langchain-graph-retriever` package provides a LangChain
|
||||
[retriever](/docs/concepts/retrievers/) that combines **unstructured** similarity search
|
||||
on vectors with **structured** traversal of metadata properties. This enables graph-based
|
||||
retrieval over an **existing** vector store.
|
||||
|
||||
### Integration details
|
||||
|
||||
| Retriever | Source | PyPI Package | Latest | Project Page |
| :--- | :--- | :---: | :---: | :---: |
| GraphRetriever | [github.com/datastax/graph-rag](https://github.com/datastax/graph-rag/tree/main/packages/langchain-graph-retriever) | [langchain-graph-retriever](https://pypi.org/project/langchain-graph-retriever/) |  | [Graph RAG](https://datastax.github.io/graph-rag/) |
|
||||
|
||||
|
||||
## Benefits
|
||||
|
||||
* [**Link based on existing metadata:**](https://datastax.github.io/graph-rag/get-started/)
|
||||
Use existing metadata fields without additional processing. Retrieve more from an
|
||||
existing vector store!
|
||||
|
||||
* [**Change links on demand:**](https://datastax.github.io/graph-rag/get-started/edges/)
|
||||
Edges can be specified on-the-fly, allowing different relationships to be traversed
|
||||
based on the question.
|
||||
|
||||
|
||||
* [**Pluggable Traversal Strategies:**](https://datastax.github.io/graph-rag/get-started/strategies/)
|
||||
Use built-in traversal strategies like Eager or MMR, or define custom logic to select
|
||||
which nodes to explore.
|
||||
|
||||
* [**Broad compatibility:**](https://datastax.github.io/graph-rag/get-started/adapters/)
|
||||
Adapters are available for a variety of vector stores with support for additional
|
||||
stores easily added.
|
||||
|
||||
## Setup
|
||||
|
||||
### Installation
|
||||
|
||||
This retriever lives in the `langchain-graph-retriever` package.
|
||||
|
||||
```bash
|
||||
pip install -qU langchain-graph-retriever
|
||||
```
|
||||
## Instantiation
|
||||
|
||||
The following examples will show how to perform graph traversal over some sample
|
||||
Documents about animals.
|
||||
|
||||
### Prerequisites
|
||||
|
||||
<details>
|
||||
<summary>Toggle for Details</summary>
|
||||
<div>
|
||||
1. Ensure you have Python 3.10+ installed
|
||||
|
||||
1. Install the following package that provides sample data.
|
||||
```bash
|
||||
pip install -qU graph_rag_example_helpers
|
||||
```
|
||||
|
||||
1. Download the test documents:
|
||||
```python
|
||||
from graph_rag_example_helpers.datasets.animals import fetch_documents
|
||||
animals = fetch_documents()
|
||||
```
|
||||
|
||||
1. <EmbeddingTabs/>
|
||||
</div>
|
||||
</details>
|
||||
|
||||
### Populating the Vector store
|
||||
|
||||
This section shows how to populate a variety of vector stores with the sample data.
|
||||
|
||||
For help on choosing one of the vector stores below, or to add support for your
|
||||
vector store, consult the documentation about
|
||||
[Adapters and Supported Stores](https://datastax.github.io/graph-rag/guide/adapters/).
|
||||
|
||||
<Tabs groupId="vector-store" queryString>
|
||||
<TabItem value="astra-db" label="AstraDB" default>
|
||||
<div style={{ paddingLeft: '30px' }}>
|
||||
Install the `langchain-graph-retriever` package with the `astra` extra:
|
||||
|
||||
```bash
|
||||
pip install "langchain-graph-retriever[astra]"
|
||||
```
|
||||
|
||||
Then create a vector store and load the test documents:
|
||||
|
||||
```python
|
||||
from langchain_astradb import AstraDBVectorStore
|
||||
|
||||
vector_store = AstraDBVectorStore.from_documents(
    documents=animals,
    embedding=embeddings,
    collection_name="animals",
    api_endpoint=ASTRA_DB_API_ENDPOINT,
    token=ASTRA_DB_APPLICATION_TOKEN,
)
|
||||
```
|
||||
For the `ASTRA_DB_API_ENDPOINT` and `ASTRA_DB_APPLICATION_TOKEN` credentials,
|
||||
consult the [AstraDB Vector Store Guide](/docs/integrations/vectorstores/astradb).
|
||||
|
||||
:::note
|
||||
For faster initial testing, consider using the **InMemory** Vector Store.
|
||||
:::
|
||||
</div>
|
||||
</TabItem>
|
||||
<TabItem value="cassandra" label="Apache Cassandra">
|
||||
<div style={{ paddingLeft: '30px' }}>
|
||||
Install the `langchain-graph-retriever` package with the `cassandra` extra:
|
||||
|
||||
```bash
|
||||
pip install "langchain-graph-retriever[cassandra]"
|
||||
```
|
||||
|
||||
Then create a vector store and load the test documents:
|
||||
|
||||
```python
|
||||
from langchain_community.vectorstores.cassandra import Cassandra
|
||||
from langchain_graph_retriever.transformers import ShreddingTransformer
|
||||
|
||||
vector_store = Cassandra.from_documents(
    documents=list(ShreddingTransformer().transform_documents(animals)),
    embedding=embeddings,
    table_name="animals",
)
|
||||
```
|
||||
|
||||
For help creating a Cassandra connection, consult the
|
||||
[Apache Cassandra Vector Store Guide](/docs/integrations/vectorstores/cassandra#connection-parameters).
|
||||
|
||||
:::note
|
||||
Apache Cassandra doesn't support searching in nested metadata. Because of this
|
||||
it is necessary to use the [`ShreddingTransformer`](https://datastax.github.io/graph-rag/reference/langchain_graph_retriever/transformers/#langchain_graph_retriever.transformers.shredding.ShreddingTransformer)
|
||||
when inserting documents.
|
||||
:::
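
The idea behind shredding can be sketched in plain Python. This is a minimal illustration, assuming "nested metadata" refers to list-valued fields; the `shred_metadata` helper, the `.` separator, and the `keywords` field are hypothetical and do not reflect the actual encoding used by `ShreddingTransformer`.

```python
def shred_metadata(metadata, sep="."):
    """Flatten list-valued metadata into top-level keys (illustrative format only)."""
    flat = {}
    for key, value in metadata.items():
        if isinstance(value, (list, tuple)):
            # One flat key per list item, so a store that only filters on
            # top-level scalar fields can still match individual items.
            for item in value:
                flat[f"{key}{sep}{item}"] = True
        else:
            flat[key] = value
    return flat


nested = {"habitat": "wetlands", "keywords": ["rodent", "social"]}
print(shred_metadata(nested))
# {'habitat': 'wetlands', 'keywords.rodent': True, 'keywords.social': True}
```

In practice, use the transformer shipped with the package rather than a hand-rolled helper, so that documents are stored in the format the retriever's adapter expects.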
|
||||
</div>
|
||||
</TabItem>
|
||||
<TabItem value="opensearch" label="OpenSearch">
|
||||
<div style={{ paddingLeft: '30px' }}>
|
||||
Install the `langchain-graph-retriever` package with the `opensearch` extra:
|
||||
|
||||
```bash
|
||||
pip install "langchain-graph-retriever[opensearch]"
|
||||
```
|
||||
|
||||
Then create a vector store and load the test documents:
|
||||
|
||||
```python
|
||||
from langchain_community.vectorstores import OpenSearchVectorSearch
|
||||
|
||||
vector_store = OpenSearchVectorSearch.from_documents(
    documents=animals,
    embedding=embeddings,
    engine="faiss",
    index_name="animals",
    opensearch_url=OPEN_SEARCH_URL,
    bulk_size=500,
)
|
||||
```
|
||||
|
||||
For help creating an OpenSearch connection, consult the
|
||||
[OpenSearch Vector Store Guide](/docs/integrations/vectorstores/opensearch).
|
||||
</div>
|
||||
</TabItem>
|
||||
<TabItem value="chroma" label="Chroma">
|
||||
<div style={{ paddingLeft: '30px' }}>
|
||||
Install the `langchain-graph-retriever` package with the `chroma` extra:
|
||||
|
||||
```bash
|
||||
pip install "langchain-graph-retriever[chroma]"
|
||||
```
|
||||
|
||||
Then create a vector store and load the test documents:
|
||||
|
||||
```python
|
||||
from langchain_chroma.vectorstores import Chroma
|
||||
from langchain_graph_retriever.transformers import ShreddingTransformer
|
||||
|
||||
vector_store = Chroma.from_documents(
    documents=list(ShreddingTransformer().transform_documents(animals)),
    embedding=embeddings,
    collection_name="animals",
)
|
||||
```
|
||||
|
||||
For help creating a Chroma connection, consult the
|
||||
[Chroma Vector Store Guide](/docs/integrations/vectorstores/chroma).
|
||||
|
||||
:::note
|
||||
Chroma doesn't support searching in nested metadata. Because of this
|
||||
it is necessary to use the [`ShreddingTransformer`](https://datastax.github.io/graph-rag/reference/langchain_graph_retriever/transformers/#langchain_graph_retriever.transformers.shredding.ShreddingTransformer)
|
||||
when inserting documents.
|
||||
:::
|
||||
</div>
|
||||
</TabItem>
|
||||
<TabItem value="in-memory" label="InMemory">
|
||||
<div style={{ paddingLeft: '30px' }}>
|
||||
Install the `langchain-graph-retriever` package:
|
||||
|
||||
```bash
|
||||
pip install "langchain-graph-retriever"
|
||||
```
|
||||
|
||||
Then create a vector store and load the test documents:
|
||||
|
||||
```python
|
||||
from langchain_core.vectorstores import InMemoryVectorStore
|
||||
|
||||
vector_store = InMemoryVectorStore.from_documents(
    documents=animals,
    embedding=embeddings,
)
|
||||
```
|
||||
|
||||
:::tip
|
||||
Using the `InMemoryVectorStore` is the fastest way to get started with Graph RAG,
but it isn't recommended for production use. For production, use
**AstraDB** or **OpenSearch** instead.
|
||||
:::
|
||||
</div>
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
|
||||
### Graph Traversal
|
||||
|
||||
This graph retriever starts with a single animal that best matches the query, then
|
||||
traverses to other animals sharing the same `habitat` and/or `origin`.
|
||||
|
||||
```python
|
||||
from graph_retriever.strategies import Eager
|
||||
from langchain_graph_retriever import GraphRetriever
|
||||
|
||||
traversal_retriever = GraphRetriever(
    store=vector_store,
    edges=[("habitat", "habitat"), ("origin", "origin")],
    strategy=Eager(k=5, start_k=1, max_depth=2),
)
|
||||
```
|
||||
|
||||
The above creates a graph traversing retriever that starts with the nearest
|
||||
animal (`start_k=1`), retrieves 5 documents (`k=5`) and limits the search to documents
|
||||
that are at most 2 steps away from the first animal (`max_depth=2`).
|
||||
|
||||
The `edges` define how metadata values can be used for traversal. In this case, every
|
||||
animal is connected to other animals with the same `habitat` and/or `origin`.
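
To make the roles of `edges`, `k`, and `max_depth` concrete, here is a conceptual sketch in plain Python. It is not the library's implementation: the toy documents, their metadata values, and the `neighbors` and `traverse` helpers are all assumptions made for illustration, and the starting documents are passed in directly instead of coming from a similarity search.

```python
from collections import deque

# Toy documents: id -> metadata. The values are invented for this sketch.
DOCS = {
    "capybara": {"habitat": "wetlands", "origin": "south america"},
    "heron": {"habitat": "wetlands", "origin": "north america"},
    "jaguar": {"habitat": "rainforest", "origin": "south america"},
    "toucan": {"habitat": "rainforest", "origin": "central america"},
    "camel": {"habitat": "desert", "origin": "africa"},
}

# Each edge pairs a field on the current document with a field on the target.
EDGES = [("habitat", "habitat"), ("origin", "origin")]


def neighbors(doc_id, docs, edges):
    """Return ids of documents linked to `doc_id` by at least one edge."""
    source = docs[doc_id]
    return [
        other_id
        for other_id, target in docs.items()
        if other_id != doc_id
        and any(s in source and source[s] == target.get(t) for s, t in edges)
    ]


def traverse(start_ids, docs, edges, k, max_depth):
    """Breadth-first walk: at most `max_depth` hops, at most `k` results."""
    selected = []
    seen = set(start_ids)
    queue = deque((doc_id, 0) for doc_id in start_ids)
    while queue and len(selected) < k:
        doc_id, depth = queue.popleft()
        selected.append(doc_id)
        if depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for other_id in neighbors(doc_id, docs, edges):
            if other_id not in seen:
                seen.add(other_id)
                queue.append((other_id, depth + 1))
    return selected


print(traverse(["capybara"], DOCS, EDGES, k=5, max_depth=2))
# ['capybara', 'heron', 'jaguar', 'toucan']
```

In this toy data, `toucan` is reached in two hops (via `jaguar`, which shares an `origin` with `capybara`), while `camel` shares no metadata value with any visited document and is never selected.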
|
||||
|
||||
```python
|
||||
results = traversal_retriever.invoke("what animals could be found near a capybara?")
|
||||
|
||||
for doc in results:
    print(f"{doc.id}: {doc.page_content}")
|
||||
```
|
||||
|
||||
```output
|
||||
capybara: capybaras are the largest rodents in the world and are highly social animals.
|
||||
heron: herons are wading birds known for their long legs and necks, often seen near water.
|
||||
crocodile: crocodiles are large reptiles with powerful jaws and a long lifespan, often living over 70 years.
|
||||
frog: frogs are amphibians known for their jumping ability and croaking sounds.
|
||||
duck: ducks are waterfowl birds known for their webbed feet and quacking sounds.
|
||||
```
|
||||
|
||||
Graph traversal improves retrieval quality by leveraging structured relationships in
|
||||
the data. Unlike standard similarity search (see below), it provides a clear,
|
||||
explainable rationale for why documents are selected.
|
||||
|
||||
In this case, the documents `capybara`, `heron`, `crocodile`, `frog`, and `duck` all
share the same `habitat=wetlands`, as defined by their metadata. This should increase
document relevance and the quality of the answer from the LLM.
|
||||
|
||||
### Comparison to Standard Retrieval
|
||||
|
||||
When `max_depth=0`, the graph traversing retriever behaves like a standard retriever:
|
||||
|
||||
```python
|
||||
standard_retriever = GraphRetriever(
    store=vector_store,
    edges=[("habitat", "habitat"), ("origin", "origin")],
    strategy=Eager(k=5, start_k=5, max_depth=0),
)
|
||||
```
|
||||
|
||||
This creates a retriever that starts with the nearest 5 animals (`start_k=5`),
|
||||
and returns them without any traversal (`max_depth=0`). The edge definitions
|
||||
are ignored in this case.
|
||||
|
||||
This is essentially the same as:
|
||||
|
||||
```python
|
||||
standard_retriever = vector_store.as_retriever(search_kwargs={"k": 5})
|
||||
```
|
||||
|
||||
For either case, invoking the retriever returns:
|
||||
|
||||
```python
|
||||
results = standard_retriever.invoke("what animals could be found near a capybara?")
|
||||
|
||||
for doc in results:
    print(f"{doc.id}: {doc.page_content}")
|
||||
```
|
||||
|
||||
```output
|
||||
capybara: capybaras are the largest rodents in the world and are highly social animals.
|
||||
iguana: iguanas are large herbivorous lizards often found basking in trees and near water.
|
||||
guinea pig: guinea pigs are small rodents often kept as pets due to their gentle and social nature.
|
||||
hippopotamus: hippopotamuses are large semi-aquatic mammals known for their massive size and territorial behavior.
|
||||
boar: boars are wild relatives of pigs, known for their tough hides and tusks.
|
||||
```
|
||||
|
||||
These documents are selected based on similarity alone. Any structural data in the
store is ignored. Compared to graph retrieval, this can decrease document relevance
because the returned results are less likely to help answer the query.
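
The difference between the two result sets can be checked directly. The sketch below hard-codes the document ids from the two outputs shown in this guide; in a live session you would collect them from `doc.id` on each retriever's results instead.

```python
# Document ids copied from the two outputs shown in this guide.
traversal_ids = ["capybara", "heron", "crocodile", "frog", "duck"]
standard_ids = ["capybara", "iguana", "guinea pig", "hippopotamus", "boar"]

# Preserve result order while splitting into shared and retriever-specific ids.
shared = [doc_id for doc_id in traversal_ids if doc_id in standard_ids]
only_traversal = [doc_id for doc_id in traversal_ids if doc_id not in standard_ids]
only_standard = [doc_id for doc_id in standard_ids if doc_id not in traversal_ids]

print("shared:", shared)
print("graph traversal only:", only_traversal)
print("similarity only:", only_standard)
```

Only the starting document, `capybara`, appears in both lists; the other four results differ entirely between the two retrievers.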
|
||||
|
||||
## Usage
|
||||
|
||||
Following the examples above, `.invoke` is used to initiate retrieval on a query.
|
||||
|
||||
## Use within a chain
|
||||
|
||||
Like other retrievers, `GraphRetriever` can be incorporated into LLM applications
|
||||
via [chains](/docs/how_to/sequence/).
|
||||
|
||||
<ChatModelTabs customVarName="llm" />
|
||||
|
||||
```python
|
||||
from langchain_core.output_parsers import StrOutputParser
|
||||
from langchain_core.prompts import ChatPromptTemplate
|
||||
from langchain_core.runnables import RunnablePassthrough
|
||||
|
||||
prompt = ChatPromptTemplate.from_template(
    """Answer the question based only on the context provided.

Context: {context}

Question: {question}"""
)

def format_docs(docs):
    return "\n\n".join(f"text: {doc.page_content} metadata: {doc.metadata}" for doc in docs)

chain = (
    {"context": traversal_retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
|
||||
```
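
To see what the `format_docs` helper above passes to the prompt as context, it can be run on its own. This is a small sketch: `FakeDoc` is a hypothetical stand-in that exposes only the two attributes the helper reads, and the second document's text is shortened for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a retrieved document."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    # Same logic as the helper used in the chain above.
    return "\n\n".join(
        f"text: {doc.page_content} metadata: {doc.metadata}" for doc in docs
    )


docs = [
    FakeDoc(
        "capybaras are the largest rodents in the world and are highly social animals.",
        {"habitat": "wetlands"},
    ),
    FakeDoc("herons are wading birds.", {"habitat": "wetlands"}),
]

context = format_docs(docs)
print(context)
```

Including the metadata in the context is what lets the model mention the shared habitat in its answer.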
|
||||
|
||||
```python
|
||||
chain.invoke("what animals could be found near a capybara?")
|
||||
```
|
||||
|
||||
```output
|
||||
Animals that could be found near a capybara include herons, crocodiles, frogs,
|
||||
and ducks, as they all inhabit wetlands.
|
||||
```
|
||||
|
||||
## API reference
|
||||
|
||||
To explore all available parameters and advanced configurations, refer to the
|
||||
[Graph RAG API reference](https://datastax.github.io/graph-rag/reference/).
|
@@ -394,3 +394,9 @@ packages:
|
||||
  repo: lunary-ai/langchain-abso
  path: .
  downloads: 0
- name: langchain-graph-retriever
  name_title: 'Graph RAG'
  repo: datastax/graph-rag
  path: packages/langchain-graph-retriever
  downloads: 0
  provider_page: graph_rag
|
||||
|