mirror of
https://github.com/hwchase17/langchain.git
synced 2025-07-06 13:18:12 +00:00
templates: Add NVIDIA Canonical RAG example chain (#15758)
- **Description:** Adds a RAG template that uses NVIDIA AI Playground chat and embedding models, along with the Milvus vector store.
- **Dependencies:** This template depends on the AI Playground service in NVIDIA NGC. API keys with significant trial compute are available (10k queries at the time of writing). This template also depends on the Milvus vector store, which is publicly available. Note: [a quick link to get a key](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/codellama-13b/api) once you have an NGC account; use the Generate Key button at the top right of the code window.

Co-authored-by: Sagar B Manjunath <sbogadimanju@nvidia.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
parent: 38523d7c57
commit: e6240fecab
templates/nvidia-rag-canonical/LICENSE (new file, 21 lines)

MIT License

Copyright (c) 2023 LangChain, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
templates/nvidia-rag-canonical/README.md (new file, 121 lines)
# nvidia-rag-canonical

This template performs RAG using the Milvus Vector Store and NVIDIA models (embedding and chat).

## Environment Setup

You should export your NVIDIA API key as an environment variable.
If you do not have an NVIDIA API key, you can create one by following these steps:

1. Create a free account with the [NVIDIA GPU Cloud](https://catalog.ngc.nvidia.com/) service, which hosts AI solution catalogs, containers, models, etc.
2. Navigate to `Catalog > AI Foundation Models > (Model with API endpoint)`.
3. Select the `API` option and click `Generate Key`.
4. Save the generated key as `NVIDIA_API_KEY`. From there, you should have access to the endpoints.

```shell
export NVIDIA_API_KEY=...
```
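Both `ingest.py` and `chain.py` in this template verify that the key starts with `nvapi-` before using it. That check can be sketched on its own in plain Python (the helper name `looks_like_nvapi_key` is ours, not part of the template):

```python
def looks_like_nvapi_key(key: str) -> bool:
    """Mirror the template's sanity check: NGC API keys start with 'nvapi-'."""
    return key.startswith("nvapi-")


# A quick self-check before exporting NVIDIA_API_KEY:
print(looks_like_nvapi_key("nvapi-abc123"))  # True
print(looks_like_nvapi_key("sk-abc123"))     # False
```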
For instructions on hosting the Milvus Vector Store, refer to the section at the bottom.

## Usage

To use this package, you should first have the LangChain CLI installed:

```shell
pip install -U langchain-cli
```

To use the NVIDIA models, install the LangChain NVIDIA AI Endpoints package:

```shell
pip install -U langchain-nvidia-aiplay
```

To create a new LangChain project and install this as the only package, you can do:

```shell
langchain app new my-app --package nvidia-rag-canonical
```

If you want to add this to an existing project, you can just run:

```shell
langchain app add nvidia-rag-canonical
```

And add the following code to your `server.py` file:

```python
from nvidia_rag_canonical import chain as rag_nvidia_chain

add_routes(app, rag_nvidia_chain, path="/nvidia-rag")
```

If you want to set up an ingestion pipeline, you can add the following code to your `server.py` file:

```python
from nvidia_rag_canonical import ingest as rag_nvidia_ingest

add_routes(app, rag_nvidia_ingest, path="/nvidia-rag-ingest")
```

Note that for files ingested by the ingestion API, the server will need to be restarted for the newly ingested files to be accessible by the retriever.
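Once a route is mounted, LangServe exposes the runnable over HTTP: a `POST` to `<path>/invoke` with a JSON body whose `input` field carries the runnable's input. A stdlib-only sketch of building such a request for the ingest route (the base URL and PDF link are placeholders, and we assume LangServe's default invoke schema):

```python
import json
import urllib.request


def build_ingest_request(base_url: str, pdf_url: str) -> urllib.request.Request:
    """Build (but do not send) a LangServe /invoke request for the ingest route."""
    body = json.dumps({"input": pdf_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/nvidia-rag-ingest/invoke",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_ingest_request("http://localhost:8000", "https://example.com/doc.pdf")
print(req.full_url)  # http://localhost:8000/nvidia-rag-ingest/invoke
```

Sending the request with `urllib.request.urlopen(req)` (or any HTTP client) only works once the server is up; remember the restart caveat above.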
(Optional) Let's now configure LangSmith.
LangSmith will help us trace, monitor and debug LangChain applications.
LangSmith is currently in private beta; you can sign up [here](https://smith.langchain.com/).
If you don't have access, you can skip this section.

```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=<your-api-key>
export LANGCHAIN_PROJECT=<your-project>  # if not specified, defaults to "default"
```

If you DO NOT already have a Milvus Vector Store you want to connect to, see the `Milvus Setup` section below before proceeding.

If you DO have a Milvus Vector Store you want to connect to, edit the connection details in `nvidia_rag_canonical/chain.py`.

If you are inside this directory, then you can spin up a LangServe instance directly by running:

```shell
langchain serve
```

This will start the FastAPI app with a server running locally at
[http://localhost:8000](http://localhost:8000)

We can see all templates at [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)
We can access the playground at [http://127.0.0.1:8000/nvidia-rag/playground](http://127.0.0.1:8000/nvidia-rag/playground)

We can access the template from code with:

```python
from langserve.client import RemoteRunnable

runnable = RemoteRunnable("http://localhost:8000/nvidia-rag")
```
## Milvus Setup

Use this step if you need to create a Milvus Vector Store and ingest data.
We will first follow the standard Milvus setup instructions [here](https://milvus.io/docs/install_standalone-docker.md).

1. Download the Docker Compose YAML file:
   ```shell
   wget https://github.com/milvus-io/milvus/releases/download/v2.3.3/milvus-standalone-docker-compose.yml -O docker-compose.yml
   ```
2. Start the Milvus Vector Store container:
   ```shell
   sudo docker compose up -d
   ```
3. Install the PyMilvus package to interact with the Milvus container:
   ```shell
   pip install pymilvus
   ```
4. Let's now ingest some data! We can do that by moving into this directory and running the code in `ingest.py`, e.g.:
   ```shell
   python ingest.py
   ```

Note that you can (and should!) change this to ingest data of your choice.
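Before running `ingest.py`, it can save a confusing stack trace to first confirm that Milvus is actually listening on its default port (19530). A small stdlib-only check, offered as our own convenience helper rather than part of the template:

```python
import socket


def milvus_reachable(host: str = "127.0.0.1", port: int = 19530, timeout: float = 2.0) -> bool:
    """Return True if something is accepting TCP connections on the Milvus port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# True only once `docker compose up -d` has brought Milvus up:
print(milvus_reachable())
```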
templates/nvidia-rag-canonical/ingest.py (new file, 39 lines)
```python
import getpass
import os

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores.milvus import Milvus
from langchain_nvidia_aiplay import NVIDIAEmbeddings

if os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    print("Valid NVIDIA_API_KEY already in environment. Delete to reset")
else:
    nvapi_key = getpass.getpass("NVAPI Key (starts with nvapi-): ")
    assert nvapi_key.startswith("nvapi-"), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key

# Note: if you change this, you should also change it in `nvidia_rag_canonical/chain.py`
EMBEDDING_MODEL = "nvolveqa_40k"
HOST = "127.0.0.1"
PORT = "19530"
COLLECTION_NAME = "test"

embeddings = NVIDIAEmbeddings(model=EMBEDDING_MODEL)

if __name__ == "__main__":
    # Load docs
    loader = PyPDFLoader("https://www.ssa.gov/news/press/factsheets/basicfact-alt.pdf")
    data = loader.load()

    # Split docs
    text_splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=100)
    docs = text_splitter.split_documents(data)

    # Insert the documents in Milvus Vector Store
    vector_db = Milvus.from_documents(
        docs,
        embeddings,
        collection_name=COLLECTION_NAME,
        connection_args={"host": HOST, "port": PORT},
    )
```
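The `CharacterTextSplitter` call above packs the loaded PDF into roughly 300-character chunks with 100 characters of overlap between neighbors. The core sliding-window idea can be sketched in plain Python (a simplified model of the behavior, not LangChain's actual implementation, which splits on separators rather than fixed offsets):

```python
def sliding_chunks(text: str, chunk_size: int = 300, overlap: int = 100) -> list:
    """Naive fixed-offset chunking: each chunk starts (chunk_size - overlap) after the last."""
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]


chunks = sliding_chunks("x" * 1000)
print(len(chunks))     # 5
print(len(chunks[0]))  # 300
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side, at the cost of some duplicated storage.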
templates/nvidia-rag-canonical/nvidia_rag_canonical/__init__.py (new file, 3 lines)

```python
from nvidia_rag_canonical.chain import chain, ingest

__all__ = ["chain", "ingest"]
```
templates/nvidia-rag-canonical/nvidia_rag_canonical/chain.py (new file, 91 lines)
```python
import getpass
import os

from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import Milvus
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel
from langchain_core.runnables import (
    RunnableLambda,
    RunnableParallel,
    RunnablePassthrough,
)
from langchain_nvidia_aiplay import ChatNVIDIA, NVIDIAEmbeddings

EMBEDDING_MODEL = "nvolveqa_40k"
CHAT_MODEL = "llama2_13b"
HOST = "127.0.0.1"
PORT = "19530"
COLLECTION_NAME = "test"
INGESTION_CHUNK_SIZE = 500
INGESTION_CHUNK_OVERLAP = 0

if os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    print("Valid NVIDIA_API_KEY already in environment. Delete to reset")
else:
    nvapi_key = getpass.getpass("NVAPI Key (starts with nvapi-): ")
    assert nvapi_key.startswith("nvapi-"), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key

# Read from Milvus Vector Store
embeddings = NVIDIAEmbeddings(model=EMBEDDING_MODEL)
vectorstore = Milvus(
    connection_args={"host": HOST, "port": PORT},
    collection_name=COLLECTION_NAME,
    embedding_function=embeddings,
)
retriever = vectorstore.as_retriever()

# RAG prompt
template = """<s>[INST] <<SYS>>
Use the following context to answer the user's question. If you don't know the answer,
just say that you don't know, don't try to make up an answer.
<</SYS>>
<s>[INST] Context: {context} Question: {question} Only return the helpful
answer below and nothing else. Helpful answer:[/INST]
"""
prompt = ChatPromptTemplate.from_template(template)

# RAG
model = ChatNVIDIA(model=CHAT_MODEL)
chain = (
    RunnableParallel({"context": retriever, "question": RunnablePassthrough()})
    | prompt
    | model
    | StrOutputParser()
)


# Add typing for input
class Question(BaseModel):
    __root__: str


chain = chain.with_types(input_type=Question)


def _ingest(url: str) -> dict:
    """Load and ingest the PDF file from the URL"""
    loader = PyPDFLoader(url)
    data = loader.load()

    # Split docs
    text_splitter = CharacterTextSplitter(
        chunk_size=INGESTION_CHUNK_SIZE, chunk_overlap=INGESTION_CHUNK_OVERLAP
    )
    docs = text_splitter.split_documents(data)

    # Insert the documents in Milvus Vector Store
    _ = Milvus.from_documents(
        documents=docs,
        embedding=embeddings,
        collection_name=COLLECTION_NAME,
        connection_args={"host": HOST, "port": PORT},
    )
    return {}


ingest = RunnableLambda(_ingest)
```
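The Llama-2-style prompt in `chain.py` uses `{context}` and `{question}` placeholders that `ChatPromptTemplate` fills at run time. The substitution itself is plain string formatting, which can be shown without any LangChain dependency (the context and question strings below are made up for illustration):

```python
# Same placeholder structure as the template in chain.py, filled by hand.
template = (
    "<s>[INST] <<SYS>>\n"
    "Use the following context to answer the user's question. If you don't know the answer,\n"
    "just say that you don't know, don't try to make up an answer.\n"
    "<</SYS>>\n"
    "<s>[INST] Context: {context} Question: {question} Only return the helpful\n"
    "answer below and nothing else. Helpful answer:[/INST]\n"
)

filled = template.format(
    context="Social Security benefits are adjusted yearly for inflation.",
    question="How often are benefits adjusted?",
)
print("[/INST]" in filled)  # True
```

At run time, `{context}` is populated by the retriever's output and `{question}` by the raw input, courtesy of the `RunnableParallel` at the head of the chain.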
templates/nvidia-rag-canonical/poetry.lock (generated, 2258 lines; diff suppressed because it is too large)
templates/nvidia-rag-canonical/pyproject.toml (new file, 29 lines)
```toml
[tool.poetry]
name = "nvidia-rag-canonical"
version = "0.1.0"
description = "RAG with NVIDIA"
authors = ["Sagar Bogadi Manjunath <sbogadimanju@nvidia.com>"]
readme = "README.md"

[tool.poetry.dependencies]
python = ">=3.8.1,<4.0"
langchain = "^0.1"
pymilvus = ">=2.3.0"
langchain-nvidia-aiplay = "^0.0.2"

[tool.poetry.group.dev.dependencies]
langchain-cli = ">=0.0.20"

[tool.langserve]
export_module = "nvidia_rag_canonical"
export_attr = "chain"

[tool.templates-hub]
use-case = "rag"
author = "LangChain"
integrations = ["Milvus", "NVIDIA"]
tags = ["vectordbs"]

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```
templates/nvidia-rag-canonical/tests/__init__.py (new empty file)