templates[minor]: Add rag google sensitive data protection template (#13921)

This is a template demonstrating how to utilize Google Sensitive Data Protection in conjunction with ChatVertexAI(). Tagging you @efriis as you reviewed my last template. :) Thanks!

Proof of successful execution: ![image](https://github.com/langchain-ai/langchain/assets/82172964/e4d678aa-85c8-482b-b09d-81fe7e912dd4)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-09-26 13:59:49 +00:00 · 2023-11-29 00:15:58 +01:00
parent 8b9dc5e6d3
commit 6137894008
9 changed files with 2487 additions and 2 deletions
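
For orientation before the diff, here is a minimal sketch of the request that `chain.py` assembles for the DLP `deidentify_content` call. The helper name `build_deidentify_request` and the project ID `my-project` are assumptions made for illustration and are not part of the template; the dictionary keys mirror the ones used in the `chain.py` hunk. The sketch only builds the request and does not contact Google Cloud.

```python
from typing import Any, Dict, List


def build_deidentify_request(
    project: str, text: str, info_types: List[str]
) -> Dict[str, Any]:
    """Hypothetical helper: assemble the request dict for deidentify_content."""
    return {
        # Full resource name of the parent project, as in chain.py.
        "parent": f"projects/{project}/locations/global",
        # Replace every finding with the name of its info type.
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {"primitive_transformation": {"replace_with_info_type_config": {}}}
                ]
            }
        },
        # Which info types the inspection step should look for.
        "inspect_config": {"info_types": [{"name": name} for name in info_types]},
        # The text to de-identify.
        "item": {"value": text},
    }


request = build_deidentify_request(
    project="my-project",  # placeholder project ID (assumption)
    text="My phone number is 555-555-5555.",
    info_types=["PHONE_NUMBER", "EMAIL_ADDRESS"],
)
print(request["parent"])
```

In the template, a dictionary of this shape is passed as `request=` to `dlp_v2.DlpServiceClient().deidentify_content(...)`, which requires Google Cloud credentials and the DLP API to be enabled.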
--- a/templates/rag-google-cloud-sensitive-data-protection/LICENSE
+++ b/templates/rag-google-cloud-sensitive-data-protection/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2023 LangChain, Inc.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- a/templates/rag-google-cloud-sensitive-data-protection/README.md
+++ b/templates/rag-google-cloud-sensitive-data-protection/README.md
@@ -0,0 +1,81 @@
+# rag-google-cloud-sensitive-data-protection
+
+This template is an application that utilizes Google Sensitive Data Protection, a service for detecting and redacting
+sensitive data in text, and PaLM 2 for Chat (chat-bison), although you can use any model.
+
+For more context on using Sensitive Data Protection,
+check [here](https://cloud.google.com/dlp/docs/sensitive-data-protection-overview).
+
+## Environment Setup
+
+Before using this template, please ensure that you enable the DLP API and the Vertex AI API in your Google Cloud
+project.
+
+Set the following environment variables:
+
+* `GOOGLE_CLOUD_PROJECT_ID` - Your Google Cloud project ID.
+* `MODEL_TYPE` - The model type for Vertex AI (e.g. `chat-bison`).
+
+## Usage
+
+To use this package, you should first have the LangChain CLI installed:
+
+```shell
+pip install -U langchain-cli
+```
+
+To create a new LangChain project and install this as the only package, you can do:
+
+```shell
+langchain app new my-app --package rag-google-cloud-sensitive-data-protection
+```
+
+If you want to add this to an existing project, you can just run:
+
+```shell
+langchain app add rag-google-cloud-sensitive-data-protection
+```
+
+And add the following code to your `server.py` file:
+
+```python
+from rag_google_cloud_sensitive_data_protection.chain import chain as rag_google_cloud_sensitive_data_protection_chain
+
+add_routes(app, rag_google_cloud_sensitive_data_protection_chain, path="/rag-google-cloud-sensitive-data-protection")
+```
+
+(Optional) Let's now configure LangSmith.
+LangSmith will help us trace, monitor and debug LangChain applications.
+LangSmith is currently in private beta; you can sign up [here](https://smith.langchain.com/).
+If you don't have access, you can skip this section.
+
+```shell
+export LANGCHAIN_TRACING_V2=true
+export LANGCHAIN_API_KEY=<your-api-key>
+export LANGCHAIN_PROJECT=<your-project>  # if not specified, defaults to "default"
+```
+
+If you are inside this directory, then you can spin up a LangServe instance directly by:
+
+```shell
+langchain serve
+```
+
+This will start the FastAPI app with a server running locally at
+[http://localhost:8000](http://localhost:8000)
+
+We can see all templates at [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs).
+We can access the playground
+at [http://127.0.0.1:8000/rag-google-cloud-sensitive-data-protection/playground](http://127.0.0.1:8000/rag-google-cloud-sensitive-data-protection/playground).
+
+We can access the template from code with:
+
+```python
+from langserve.client import RemoteRunnable
+
+runnable = RemoteRunnable("http://localhost:8000/rag-google-cloud-sensitive-data-protection")
+```
--- a/templates/rag-google-cloud-sensitive-data-protection/main.py
+++ b/templates/rag-google-cloud-sensitive-data-protection/main.py
@@ -0,0 +1,9 @@
+from rag_google_cloud_sensitive_data_protection.chain import chain
+
+if __name__ == "__main__":
+    query = {
+        "question": "Good morning. My name is Captain Blackbeard. My phone number "
+        "is 555-555-5555. And my email is lovely.pirate@gmail.com. Have a nice day.",
+        "chat_history": [],
+    }
+    print(chain.invoke(query))
--- a/templates/rag-google-cloud-sensitive-data-protection/poetry.lock
+++ b/templates/rag-google-cloud-sensitive-data-protection/poetry.lock
--- a/templates/rag-google-cloud-sensitive-data-protection/pyproject.toml
+++ b/templates/rag-google-cloud-sensitive-data-protection/pyproject.toml
@@ -0,0 +1,34 @@
+[tool.poetry]
+name = "rag-google-cloud-sensitive-data-protection"
+version = "0.0.1"
+description = "RAG using sensitive data protection"
+authors = ["Juan Calvo <juan.calvo@datatonic.com>"]
+readme = "README.md"
+
+[tool.poetry.dependencies]
+python = ">=3.8.1,<4.0"
+langchain = ">=0.0.333"
+google-cloud-aiplatform = ">=1.35.0"
+google-cloud-dlp = "^3.13.0"
+
+
+[tool.poetry.group.dev.dependencies]
+langchain-cli = ">=0.0.15"
+fastapi = "^0.104.0"
+sse-starlette = "^1.6.5"
+
+[tool.langserve]
+export_module = "rag_google_cloud_sensitive_data_protection"
+export_attr = "chain"
+
+[tool.templates-hub]
+use-case = "rag"
+author = "Datatonic"
+integrations = ["Google Cloud"]
+tags = ["data"]
+
+[build-system]
+requires = [
+    "poetry-core",
+]
+build-backend = "poetry.core.masonry.api"
--- a/templates/rag-google-cloud-sensitive-data-protection/rag_google_cloud_sensitive_data_protection/__init__.py
+++ b/templates/rag-google-cloud-sensitive-data-protection/rag_google_cloud_sensitive_data_protection/__init__.py
@@ -0,0 +1,3 @@
+from rag_google_cloud_sensitive_data_protection.chain import chain
+
+__all__ = ["chain"]
--- a/templates/rag-google-cloud-sensitive-data-protection/rag_google_cloud_sensitive_data_protection/chain.py
+++ b/templates/rag-google-cloud-sensitive-data-protection/rag_google_cloud_sensitive_data_protection/chain.py
@@ -0,0 +1,117 @@
+import os
+from typing import List, Tuple
+
+from google.cloud import dlp_v2
+from langchain.chat_models import ChatVertexAI
+from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
+from langchain.pydantic_v1 import BaseModel, Field
+from langchain.schema.messages import AIMessage, HumanMessage
+from langchain.schema.output_parser import StrOutputParser
+from langchain.schema.runnable import RunnableLambda, RunnableMap
+
+
+# Formatting for chat history
+def _format_chat_history(chat_history: List[Tuple[str, str]]):
+    buffer = []
+    for human, ai in chat_history:
+        buffer.append(HumanMessage(content=human))
+        buffer.append(AIMessage(content=ai))
+    return buffer
+
+
+def _deidentify_with_replace(
+    input_str: str,
+    info_types: List[str],
+    project: str,
+) -> str:
+    """Uses the Data Loss Prevention API to deidentify sensitive data in a
+    string by replacing matched input values with the info type.
+    Args:
+        project: The Google Cloud project id to use as a parent resource.
+        input_str: The string to deidentify (will be treated as text).
+        info_types: A list of strings representing info types to look for.
+    Returns:
+        str: The input string after it has been deidentified.
+    """
+
+    # Instantiate a client
+    dlp = dlp_v2.DlpServiceClient()
+
+    # Convert the project id into a full resource id.
+    parent = f"projects/{project}/locations/global"
+
+    if info_types is None:
+        info_types = ["PHONE_NUMBER", "EMAIL_ADDRESS", "CREDIT_CARD_NUMBER"]
+    # Construct inspect configuration dictionary
+    inspect_config = {"info_types": [{"name": info_type} for info_type in info_types]}
+
+    # Construct deidentify configuration dictionary
+    deidentify_config = {
+        "info_type_transformations": {
+            "transformations": [
+                {"primitive_transformation": {"replace_with_info_type_config": {}}}
+            ]
+        }
+    }
+
+    # Construct item
+    item = {"value": input_str}
+
+    # Call the API
+    response = dlp.deidentify_content(
+        request={
+            "parent": parent,
+            "deidentify_config": deidentify_config,
+            "inspect_config": inspect_config,
+            "item": item,
+        }
+    )
+
+    # Return the de-identified text.
+    return response.item.value
+
+
+# Prompt we will use
+prompt = ChatPromptTemplate.from_messages(
+    [
+        (
+            "system",
+            "You are a helpful assistant who translates to pirate",
+        ),
+        MessagesPlaceholder(variable_name="chat_history"),
+        ("user", "{question}"),
+    ]
+)
+
+# Get project ID and model type from env variables
+project_id = os.environ.get("GOOGLE_CLOUD_PROJECT_ID")
+model_type = os.environ.get("MODEL_TYPE")
+
+# Set LLM
+model = ChatVertexAI(model_name=model_type, temperature=0.0)
+
+
+class ChatHistory(BaseModel):
+    question: str
+    chat_history: List[Tuple[str, str]] = Field(..., extra={"widget": {"type": "chat"}})
+
+
+_inputs = RunnableMap(
+    {
+        "question": RunnableLambda(
+            lambda x: _deidentify_with_replace(
+                input_str=x["question"],
+                info_types=["PERSON_NAME", "PHONE_NUMBER", "EMAIL_ADDRESS"],
+                project=project_id,
+            )
+        ).with_config(run_name="<lambda> _deidentify_with_replace"),
+        "chat_history": RunnableLambda(
+            lambda x: _format_chat_history(x["chat_history"])
+        ).with_config(run_name="<lambda> _format_chat_history"),
+    }
+)
+
+# De-identify the question, format the chat history, then prompt the model
+chain = _inputs | prompt | model | StrOutputParser()
+
+chain = chain.with_types(input_type=ChatHistory).with_config(run_name="Inputs")
--- a/templates/rag-google-cloud-sensitive-data-protection/tests/__init__.py
+++ b/templates/rag-google-cloud-sensitive-data-protection/tests/__init__.py
--- a/templates/rag-google-cloud-vertexai-search/rag_google_cloud_vertexai_search/chain.py
+++ b/templates/rag-google-cloud-vertexai-search/rag_google_cloud_vertexai_search/chain.py
@@ -7,7 +7,7 @@ from langchain.retrievers import GoogleVertexAISearchRetriever
 from langchain.schema.output_parser import StrOutputParser
 from langchain.schema.runnable import RunnableParallel, RunnablePassthrough

-# Get region and profile from env
+# Get project, data store, and model type from env variables
 project_id = os.environ.get("GOOGLE_CLOUD_PROJECT_ID")
 data_store_id = os.environ.get("DATA_STORE_ID")
 model_type = os.environ.get("MODEL_TYPE")
@@ -21,7 +21,7 @@ if not data_store_id:
 # Set LLM and embeddings
 model = ChatVertexAI(model_name=model_type, temperature=0.0)

-# Create Kendra retriever
+# Create Vertex AI retriever
 retriever = GoogleVertexAISearchRetriever(
    project_id=project_id, search_engine_id=data_store_id
 )