docs: update Confident callback integration and examples (#32458)

**Description:**
Updates the Confident AI integration documentation to use modern
patterns and improve code quality. This change:
- Replaces deprecated `DeepEvalCallbackHandler` with the new
`CallbackHandler` from `deepeval.integrations.langchain`
- Updates installation and authentication instructions to match current
best practices
- Adds modern integration examples using LangChain's latest patterns
- Removes deprecated metrics and outdated code examples
- Updates code samples to follow current best practices

The changes make the documentation more maintainable and ensure users
follow the recommended integration patterns.

**Issue:** Fixes #32444

**Dependencies:**
- deepeval
- langchain
- langchain-openai

**Twitter handle:** @Muwinuddin

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This commit is contained in:
Amit Biswas
2025-09-11 08:43:31 +06:00
committed by GitHub
parent eb77da7de5
commit 653b0908af
2 changed files with 69 additions and 200 deletions

View File

@@ -7,10 +7,7 @@
"source": [
"# Confident\n",
"\n",
">[DeepEval](https://confident-ai.com) package for unit testing LLMs.\n",
"> Using Confident, everyone can build robust language models through faster iterations\n",
"> using both unit testing and integration testing. We provide support for each step in the iteration\n",
"> from synthetic data creation to testing.\n"
">[DeepEval](https://confident-ai.com) package for unit testing LLMs."
]
},
{
@@ -42,7 +39,7 @@
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain langchain-openai langchain-community deepeval langchain-chroma"
"!pip install deepeval langchain langchain-openai"
]
},
{
@@ -64,11 +61,29 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">🎉🥳 Congratulations! You've successfully logged in! 🙌 \n",
"</pre>\n"
],
"text/plain": [
"🎉🥳 Congratulations! You've successfully logged in! 🙌 \n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"!deepeval login"
"import os\n",
"import deepeval\n",
"\n",
"api_key = os.getenv(\"DEEPEVAL_API_KEY\")\n",
"deepeval.login(api_key)"
]
},
{
@@ -76,12 +91,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setup DeepEval\n",
"### Setup Confident AI Callback (Modern)\n",
"\n",
"You can, by default, use the `DeepEvalCallbackHandler` to set up the metrics you want to track. However, this has limited support for metrics at the moment (more to be added soon). It currently supports:\n",
"- [Answer Relevancy](https://docs.confident-ai.com/docs/measuring_llm_performance/answer_relevancy)\n",
"- [Bias](https://docs.confident-ai.com/docs/measuring_llm_performance/debias)\n",
"- [Toxicness](https://docs.confident-ai.com/docs/measuring_llm_performance/non_toxic)"
"The previous DeepEvalCallbackHandler and metric tracking are deprecated. Please use the new integration below."
]
},
{
@@ -90,10 +102,15 @@
"metadata": {},
"outputs": [],
"source": [
"from deepeval.metrics.answer_relevancy import AnswerRelevancy\n",
"from deepeval.integrations.langchain import CallbackHandler\n",
"\n",
"# Here we want to make sure the answer is minimally relevant\n",
"answer_relevancy_metric = AnswerRelevancy(minimum_score=0.5)"
"handler = CallbackHandler(\n",
" name=\"My Trace\",\n",
" tags=[\"production\", \"v1\"],\n",
" metadata={\"experiment\": \"A/B\"},\n",
" thread_id=\"thread-123\",\n",
" user_id=\"user-456\",\n",
")"
]
},
{
@@ -103,186 +120,11 @@
"source": [
"## Get Started"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"To use the `DeepEvalCallbackHandler`, we need the `implementation_name`. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.callbacks.confident_callback import DeepEvalCallbackHandler\n",
"\n",
"deepeval_callback = DeepEvalCallbackHandler(\n",
" implementation_name=\"langchainQuickstart\", metrics=[answer_relevancy_metric]\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scenario 1: Feeding into LLM\n",
"\n",
"You can then feed it into your LLM with OpenAI."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LLMResult(generations=[[Generation(text='\\n\\nQ: What did the fish say when he hit the wall? \\nA: Dam.', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\\n\\nThe Moon \\n\\nThe moon is high in the midnight sky,\\nSparkling like a star above.\\nThe night so peaceful, so serene,\\nFilling up the air with love.\\n\\nEver changing and renewing,\\nA never-ending light of grace.\\nThe moon remains a constant view,\\nA reminder of lifes gentle pace.\\n\\nThrough time and space it guides us on,\\nA never-fading beacon of hope.\\nThe moon shines down on us all,\\nAs it continues to rise and elope.', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\\n\\nQ. What did one magnet say to the other magnet?\\nA. \"I find you very attractive!\"', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text=\"\\n\\nThe world is charged with the grandeur of God.\\nIt will flame out, like shining from shook foil;\\nIt gathers to a greatness, like the ooze of oil\\nCrushed. Why do men then now not reck his rod?\\n\\nGenerations have trod, have trod, have trod;\\nAnd all is seared with trade; bleared, smeared with toil;\\nAnd wears man's smudge and shares man's smell: the soil\\nIs bare now, nor can foot feel, being shod.\\n\\nAnd for all this, nature is never spent;\\nThere lives the dearest freshness deep down things;\\nAnd though the last lights off the black West went\\nOh, morning, at the brown brink eastward, springs —\\n\\nBecause the Holy Ghost over the bent\\nWorld broods with warm breast and with ah! bright wings.\\n\\n~Gerard Manley Hopkins\", generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\\n\\nQ: What did one ocean say to the other ocean?\\nA: Nothing, they just waved.', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text=\"\\n\\nA poem for you\\n\\nOn a field of green\\n\\nThe sky so blue\\n\\nA gentle breeze, the sun above\\n\\nA beautiful world, for us to love\\n\\nLife is a journey, full of surprise\\n\\nFull of joy and full of surprise\\n\\nBe brave and take small steps\\n\\nThe future will be revealed with depth\\n\\nIn the morning, when dawn arrives\\n\\nA fresh start, no reason to hide\\n\\nSomewhere down the road, there's a heart that beats\\n\\nBelieve in yourself, you'll always succeed.\", generation_info={'finish_reason': 'stop', 'logprobs': None})]], llm_output={'token_usage': {'completion_tokens': 504, 'total_tokens': 528, 'prompt_tokens': 24}, 'model_name': 'text-davinci-003'})"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_openai import OpenAI\n",
"\n",
"llm = OpenAI(\n",
" temperature=0,\n",
" callbacks=[deepeval_callback],\n",
" verbose=True,\n",
" openai_api_key=\"<YOUR_API_KEY>\",\n",
")\n",
"output = llm.generate(\n",
" [\n",
" \"What is the best evaluation tool out there? (no bias at all)\",\n",
" ]\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"You can then check the metric if it was successful by calling the `is_successful()` method."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"answer_relevancy_metric.is_successful()\n",
"# returns True/False"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Once you have ran that, you should be able to see our dashboard below. \n",
"\n",
"![Dashboard](https://docs.confident-ai.com/assets/images/dashboard-screenshot-b02db73008213a211b1158ff052d969e.png)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scenario 2: Tracking an LLM in a chain without callbacks\n",
"\n",
"To track an LLM in a chain without callbacks, you can plug into it at the end.\n",
"\n",
"We can start by defining a simple chain as shown below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"from langchain.chains import RetrievalQA\n",
"from langchain_chroma import Chroma\n",
"from langchain_community.document_loaders import TextLoader\n",
"from langchain_openai import OpenAI, OpenAIEmbeddings\n",
"from langchain_text_splitters import CharacterTextSplitter\n",
"\n",
"text_file_url = \"https://raw.githubusercontent.com/hwchase17/chat-your-data/master/state_of_the_union.txt\"\n",
"\n",
"openai_api_key = \"sk-XXX\"\n",
"\n",
"with open(\"state_of_the_union.txt\", \"w\") as f:\n",
" response = requests.get(text_file_url)\n",
" f.write(response.text)\n",
"\n",
"loader = TextLoader(\"state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)\n",
"docsearch = Chroma.from_documents(texts, embeddings)\n",
"\n",
"qa = RetrievalQA.from_chain_type(\n",
" llm=OpenAI(openai_api_key=openai_api_key),\n",
" chain_type=\"stuff\",\n",
" retriever=docsearch.as_retriever(),\n",
")\n",
"\n",
"# Providing a new question-answering pipeline\n",
"query = \"Who is the president?\"\n",
"result = qa.run(query)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"After defining a chain, you can then manually check for answer similarity."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"answer_relevancy_metric.measure(result, query)\n",
"answer_relevancy_metric.is_successful()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### What's next?\n",
"\n",
"You can create your own custom metrics [here](https://docs.confident-ai.com/docs/quickstart/custom-metrics). \n",
"\n",
"DeepEval also offers other features such as being able to [automatically create unit tests](https://docs.confident-ai.com/docs/quickstart/synthetic-data-creation), [tests for hallucination](https://docs.confident-ai.com/docs/measuring_llm_performance/factual_consistency).\n",
"\n",
"If you are interested, check out our Github repository here [https://github.com/confident-ai/deepeval](https://github.com/confident-ai/deepeval). We welcome any PRs and discussions on how to improve LLM performance."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "langchain",
"language": "python",
"name": "python3"
},
@@ -296,12 +138,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"vscode": {
"interpreter": {
"hash": "a53ebf4a859167383b364e7e7521d0add3c2dbbdecce4edf676e8c4634ff3fbb"
}
"version": "3.12.11"
}
},
"nbformat": 4,

View File

@@ -21,6 +21,38 @@ pip install deepeval
See an [example](/docs/integrations/callbacks/confident).
```python
from langchain.callbacks.confident_callback import DeepEvalCallbackHandler
## Modern Integration Example
Install the required packages:
```bash
pip install deepeval langchain langchain-openai
```
Authenticate with your API key:
```python
import os
import deepeval
# Load API key from environment variable for security
api_key = os.environ.get("DEEPEVAL_API_KEY")
deepeval.login(api_key)
```
Use the new callback handler:
```python
from deepeval.integrations.langchain import CallbackHandler
handler = CallbackHandler(
name="My Trace",
tags=["production", "v1"],
metadata={"experiment": "A/B"},
thread_id="thread-123",
user_id="user-456"
)
```
See the [full example](/docs/integrations/callbacks/confident).