langchain/docs/extras/guides
maks-operlejn-ds 274c3dc3a8
Multilingual anonymization (#10327)
### Description

Add multi-language support to the anonymizer.

PII detection in Microsoft Presidio relies on several components. In
addition to the usual pattern matching (e.g. using regex), the analyzer
uses a Named Entity Recognition (NER) model to extract entities such
as:
- `PERSON`
- `LOCATION`
- `DATE_TIME`
- `NRP`
- `ORGANIZATION`


[[Source]](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/predefined_recognizers/spacy_recognizer.py)

To handle NER in a specific language, we use language-specific models
from the `spaCy` library, which offers a wide selection of models across
languages and sizes. The approach is not limited to spaCy, though:
alternative frameworks such as
[Stanza](https://microsoft.github.io/presidio/analyzer/nlp_engines/spacy_stanza/)
or
[transformers](https://microsoft.github.io/presidio/analyzer/nlp_engines/transformers/)
can be integrated when necessary.
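
As a rough illustration of the per-language model setup described above, the sketch below builds a spaCy engine configuration in the shape accepted by Presidio's `NlpEngineProvider` and wires it into an `AnalyzerEngine`. The helper names `build_nlp_config` and `make_analyzer` are hypothetical, and the model names are only examples; this is not necessarily how the PR itself constructs the analyzer.

```python
def build_nlp_config(models_by_lang):
    """Return a spaCy NLP engine configuration with one model per language code.

    Hypothetical helper: it only assembles the dictionary that Presidio's
    NlpEngineProvider expects.
    """
    return {
        "nlp_engine_name": "spacy",
        "models": [
            {"lang_code": lang, "model_name": model}
            for lang, model in models_by_lang.items()
        ],
    }


def make_analyzer(nlp_config):
    """Create a Presidio AnalyzerEngine for every language in the config.

    Hypothetical helper. Presidio is imported lazily so that the config
    helper above can be used without it; the listed spaCy models must
    already be downloaded before this function is called.
    """
    from presidio_analyzer import AnalyzerEngine
    from presidio_analyzer.nlp_engine import NlpEngineProvider

    nlp_engine = NlpEngineProvider(nlp_configuration=nlp_config).create_engine()
    languages = [m["lang_code"] for m in nlp_config["models"]]
    return AnalyzerEngine(nlp_engine=nlp_engine, supported_languages=languages)


# Example models (assumed choices): English and Spanish, medium size.
nlp_config = build_nlp_config({"en": "en_core_web_md", "es": "es_core_news_md"})
```

With both models installed, `make_analyzer(nlp_config).analyze(text="Me llamo Sofía", language="es")` would run the Spanish NER model together with the pattern-based recognizers registered for that language.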

### Future work

- **automatic language detection** - instead of passing the language as
a parameter to `anonymizer.anonymize`, we could detect the language(s)
beforehand and then use the corresponding NER model. We have discussed
this internally, and @mateusz-wosinski-ds will look into a standalone
language detection tool/chain for LangChain 😄
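
The detect-then-dispatch idea in the item above could look roughly like the sketch below. The detector is a deliberately naive stopword counter that stands in for a real language detection tool; `detect_language`, the stopword lists, and the set of supported languages are all assumptions made for illustration, not part of this PR.

```python
import re

# Tiny illustrative stopword lists (assumption); a real detector would
# rely on a trained language identification model instead.
STOPWORDS = {
    "en": {"the", "and", "is", "my", "name", "of", "to", "in"},
    "es": {"el", "la", "y", "es", "mi", "me", "llamo", "de", "en"},
    "pl": {"i", "jest", "nazywam", "się", "w", "na", "nie", "mam"},
}


def detect_language(text, supported=("en", "es", "pl"), default="en"):
    """Pick the supported language whose stopwords occur most often in text.

    Hypothetical helper: falls back to `default` when no stopword matches,
    so the caller always gets a language code with a configured NER model.
    """
    tokens = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(token in STOPWORDS[lang] for token in tokens)
        for lang in supported
    }
    best = max(supported, key=lambda lang: scores[lang])
    return best if scores[best] > 0 else default
```

The detected code would then replace the manually supplied parameter, for example `anonymizer.anonymize(text, language=detect_language(text))`. Restricting the result to the supported languages matters because a code without a configured NER model could not be analyzed.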

### Twitter handle
@deepsense_ai / @MaksOpp

### Tag maintainer
@baskaryan @hwchase17 @hinthornw
2023-09-07 14:42:24 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| adapters | adapter doc nit (#9234) | 2023-08-14 18:26:37 -07:00 |
| deployments | Made some Grammatical error fixes (#10156) | 2023-09-03 20:21:46 -07:00 |
| evaluation | Delete Old Evals Examples (#8252) | 2023-07-26 18:46:54 -07:00 |
| langsmith | update notebook (#7852) | 2023-07-17 14:46:42 -07:00 |
| privacy | Multilingual anonymization (#10327) | 2023-09-07 14:42:24 -07:00 |
| safety | Update amazon_comprehend_chain.ipynb (#10246) | 2023-09-06 15:38:37 -07:00 |
| debugging.md | Fixed some grammatical typos in doc files (#10191) | 2023-09-04 10:48:08 -07:00 |
| fallbacks.ipynb | Fixing spelling mistakes in fallbacks.ipynb (#9376) | 2023-08-18 10:33:47 -04:00 |
| local_llms.ipynb | typo in locall llms fixed (#9755) | 2023-09-03 20:29:41 -07:00 |
| model_laboratory.ipynb | mv popular and additional chains to use cases (#8242) | 2023-07-27 12:55:13 -07:00 |
| pydantic_compatibility.md | guides docs nits (#10005) | 2023-08-30 11:07:42 -07:00 |