docs: add LangFair as a provider (#29390)

**Description:**
- Add `docs/docs/providers/langfair.mdx`
- Register langfair in `libs/packages.yml`

**Twitter handle:** @LangFair

**Tests and docs**
1. Integration tests not needed as this PR only adds a .mdx file to
docs.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
Co-authored-by: Dylan Bouchard <dylan.bouchard@cvshealth.com>
Co-authored-by: Dylan Bouchard <109233938+dylanbouchard@users.noreply.github.com>
Co-authored-by: Erick Friis <erickfriis@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>

`docs/docs/providers/langfair.mdx` (new file)

@@ -0,0 +1,129 @@
# LangFair: Use-Case Level LLM Bias and Fairness Assessments
LangFair is a comprehensive Python library designed for conducting bias and fairness assessments of large language model (LLM) use cases. The LangFair [repository](https://github.com/cvs-health/langfair) includes a framework for [choosing bias and fairness metrics](https://github.com/cvs-health/langfair/tree/main#-choosing-bias-and-fairness-metrics-for-an-llm-use-case), along with [demo notebooks](https://github.com/cvs-health/langfair/tree/main/examples) and a [technical playbook](https://arxiv.org/abs/2407.10853) that discusses LLM bias and fairness risks, evaluation metrics, and best practices.
Explore our [documentation site](https://cvs-health.github.io/langfair/) for detailed instructions on using LangFair.
## ⚡ Quickstart Guide
### (Optional) Create a virtual environment for using LangFair
We recommend creating a new virtual environment using venv before installing LangFair. To do so, please follow the instructions [here](https://docs.python.org/3/library/venv.html).
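For example, a new environment can be created and activated like this (a minimal sketch for macOS/Linux; the activation command differs on Windows):
```bash
python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
```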
### Installing LangFair
The latest version can be installed from PyPI:
```bash
pip install langfair
```
### Usage Examples
Below are code samples illustrating how to use LangFair to assess bias and fairness risks in text generation and summarization use cases. The examples below assume the user has already defined a list of prompts from their use case, `prompts`.
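For illustration, a hypothetical `prompts` list might look like the following (these example strings are placeholders, not part of LangFair; substitute the actual prompts from your use case):
```python
# Hypothetical placeholder prompts for illustration only; replace these with
# the prompts from your own use case.
prompts = [
    "Summarize the following customer call transcript: ...",
    "Draft a reply to the following member inquiry: ...",
]
```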
##### Generate LLM responses
To generate responses, we can use LangFair's `ResponseGenerator` class. First, we must create a `langchain` LLM object. Below we use `ChatVertexAI`, but **any of [LangChain's LLM classes](https://js.langchain.com/docs/integrations/chat/) may be used instead**. Note that `InMemoryRateLimiter` is used to avoid rate limit errors.
```python
from langchain_google_vertexai import ChatVertexAI
from langchain_core.rate_limiters import InMemoryRateLimiter
rate_limiter = InMemoryRateLimiter(
    requests_per_second=4.5, check_every_n_seconds=0.5, max_bucket_size=280,
)

llm = ChatVertexAI(
    model_name="gemini-pro", temperature=0.3, rate_limiter=rate_limiter
)
```
We can use `ResponseGenerator.generate_responses` to generate 25 responses for each prompt, as is convention for toxicity evaluation.
```python
from langfair.generator import ResponseGenerator
rg = ResponseGenerator(langchain_llm=llm)
generations = await rg.generate_responses(prompts=prompts, count=25)
responses = generations["data"]["response"]
duplicated_prompts = generations["data"]["prompt"] # so prompts correspond to responses
```
##### Compute toxicity metrics
Toxicity metrics can be computed with `ToxicityMetrics`. Note that use of `torch.device` is optional and should be used if a GPU is available to speed up toxicity computation.
```python
# import torch # uncomment if GPU is available
# device = torch.device("cuda") # uncomment if GPU is available
from langfair.metrics.toxicity import ToxicityMetrics
tm = ToxicityMetrics(
    # device=device,  # uncomment if GPU is available
)
tox_result = tm.evaluate(
    prompts=duplicated_prompts,
    responses=responses,
    return_data=True
)
tox_result['metrics']
# Output:
# {'Toxic Fraction': 0.0004,
# 'Expected Maximum Toxicity': 0.013845130120171235,
# 'Toxicity Probability': 0.01}
```
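Because `return_data=True` is passed above, the evaluation result is assumed to also include per-response records alongside the aggregate metrics; a minimal sketch for inspecting them (this assumes the records are returned under `tox_result['data']`):
```python
import pandas as pd

# Assumption: with return_data=True, per-response toxicity records are
# available under the 'data' key of the result dictionary.
toxicity_df = pd.DataFrame(tox_result["data"])
print(toxicity_df.head())
```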
##### Compute stereotype metrics
Stereotype metrics can be computed with `StereotypeMetrics`.
```python
from langfair.metrics.stereotype import StereotypeMetrics
sm = StereotypeMetrics()
stereo_result = sm.evaluate(responses=responses, categories=["gender"])
stereo_result['metrics']
# Output:
# {'Stereotype Association': 0.3172750176745329,
# 'Cooccurrence Bias': 0.44766333654278373,
# 'Stereotype Fraction - gender': 0.08}
```
##### Generate counterfactual responses and compute metrics
We can generate counterfactual responses with `CounterfactualGenerator`.
```python
from langfair.generator.counterfactual import CounterfactualGenerator
cg = CounterfactualGenerator(langchain_llm=llm)
cf_generations = await cg.generate_responses(
    prompts=prompts, attribute='gender', count=25
)
male_responses = cf_generations['data']['male_response']
female_responses = cf_generations['data']['female_response']
```
Counterfactual metrics can be easily computed with `CounterfactualMetrics`.
```python
from langfair.metrics.counterfactual import CounterfactualMetrics
cm = CounterfactualMetrics()
cf_result = cm.evaluate(
    texts1=male_responses,
    texts2=female_responses,
    attribute='gender'
)
cf_result['metrics']
# Output:
# {'Cosine Similarity': 0.8318708,
# 'RougeL Similarity': 0.5195852482361165,
# 'Bleu Similarity': 0.3278433712872481,
# 'Sentiment Bias': 0.0009947145187601957}
```
##### Alternative approach: Semi-automated evaluation with `AutoEval`
To streamline assessments for text generation and summarization use cases, the `AutoEval` class runs a multi-step pipeline that completes all of the steps above with two lines of code.
```python
from langfair.auto import AutoEval
auto_object = AutoEval(
    prompts=prompts,
    langchain_llm=llm,
    # toxicity_device=device  # uncomment if GPU is available
)
results = await auto_object.evaluate()
results['metrics']
# Output:
# {'Toxicity': {'Toxic Fraction': 0.0004,
# 'Expected Maximum Toxicity': 0.013845130120171235,
# 'Toxicity Probability': 0.01},
# 'Stereotype': {'Stereotype Association': 0.3172750176745329,
# 'Cooccurrence Bias': 0.44766333654278373,
# 'Stereotype Fraction - gender': 0.08,
# 'Expected Maximum Stereotype - gender': 0.60355167388916,
# 'Stereotype Probability - gender': 0.27036},
# 'Counterfactual': {'male-female': {'Cosine Similarity': 0.8318708,
# 'RougeL Similarity': 0.5195852482361165,
# 'Bleu Similarity': 0.3278433712872481,
# 'Sentiment Bias': 0.0009947145187601957}}}
```
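If the installed version of LangFair provides it, the computed metric values can also be written to a file; a hedged sketch (the `export_results` method and its `file_name` parameter are assumptions, not confirmed by this page):
```python
# Assumption: AutoEval exposes an export_results helper; if your installed
# version does not, write the `results` dictionary to disk manually instead.
auto_object.export_results(file_name="metric_values.txt")
```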

`libs/packages.yml`

@@ -386,3 +386,7 @@ packages:
repo: Nimbleway/langchain-nimble
path: .
downloads: 0
- name: langfair
repo: cvs-health/langfair
path: .
downloads: 0