mirror of
https://github.com/hwchase17/langchain.git
synced 2025-09-08 14:31:55 +00:00
community[minor]: [Pebblo] Enhance PebbloSafeLoader to take anonymize flag (#26812)
- **Description:** The flag is named `anonymize_snippets`. When set to true, the Pebblo server will anonymize snippets by redacting all personally identifiable information (PII) from the snippets going into VectorDB and the generated reports - **Issue:** NA - **Dependencies:** NA - **docs**: Updated
This commit is contained in:
@@ -124,6 +124,39 @@
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Anonymize the snippets to redact all PII details\n",
|
||||
"\n",
|
||||
"Set `anonymize_snippets` to `True` to anonymize all personally identifiable information (PII) from the snippets going into VectorDB and the generated reports.\n",
|
||||
"\n",
|
||||
"> Note: The _Pebblo Entity Classifier_ effectively identifies personally identifiable information (PII) and is continuously evolving. While its recall is not yet 100%, it is steadily improving.\n",
|
||||
"> For more details, please refer to the [_Pebblo Entity Classifier docs_](https://daxa-ai.github.io/pebblo/entityclassifier/)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders import CSVLoader, PebbloSafeLoader\n",
|
||||
"\n",
|
||||
"loader = PebbloSafeLoader(\n",
|
||||
" CSVLoader(\"data/corp_sens_data.csv\"),\n",
|
||||
" name=\"acme-corp-rag-1\", # App name (Mandatory)\n",
|
||||
" owner=\"Joe Smith\", # Owner (Optional)\n",
|
||||
" description=\"Support productivity RAG application\", # Description (Optional)\n",
|
||||
" anonymize_snippets=True, # Whether to anonymize entities in the PDF Report (Optional, default=False)\n",
|
||||
")\n",
|
||||
"documents = loader.load()\n",
|
||||
"print(documents[0].metadata)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
|
Reference in New Issue
Block a user