Mirror of https://github.com/hwchase17/langchain.git, synced 2025-09-05 21:12:48 +00:00
Adding an in-context QA evaluation chain + chain of thought reasoning chain for improved accuracy (#2444)
Right now, eval chains require a ground-truth answer for every question. Collecting this ground truth is cumbersome, so this change works around it in two ways:

* Adding a `context` param in `ContextQAEvalChain` and simply evaluating whether the question is answered accurately from the context
* Adding chain-of-thought explanation prompting to improve the accuracy of this evaluation without ground truth (GT)

This also reaches feature parity with openai/evals, which has the same contextual eval without GT.

TODO in follow-up:

* Better prompt inheritance. There should be no need for a separate prompt for CoT reasoning. How can we merge them together?

---------

Co-authored-by: Vashisht Madhavan <vashishtmadhavan@Vashs-MacBook-Pro.local>
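The contextual grading described in this commit message judges a prediction against retrieved context instead of a reference answer, optionally asking the grader to reason before giving a verdict. It can be sketched without LangChain as below. The template wording, the `GRADE:` marker, and the helpers `build_grading_prompt` and `parse_grade` are hypothetical illustrations, not the prompts shipped with `ContextQAEvalChain` or its CoT variant, and the call to the grading LLM itself is left out.

```python
import re
from typing import Optional

# Hypothetical grading template (not LangChain's): the grader sees the
# retrieved context in place of a ground-truth answer.
CONTEXT_GRADE_TEMPLATE = (
    "You are grading an answer to a question using only the given context.\n"
    "Context: {context}\n"
    "Question: {question}\n"
    "Student answer: {prediction}\n"
    "{reasoning_instruction}"
    "End with a line of the form 'GRADE: CORRECT' or 'GRADE: INCORRECT'.\n"
)

# Chain-of-thought variant: ask for an explanation before the verdict.
COT_INSTRUCTION = "Explain step by step whether the context supports the answer.\n"


def build_grading_prompt(context: str, question: str, prediction: str,
                         chain_of_thought: bool = False) -> str:
    """Fill the hypothetical template; CoT only adds a reasoning instruction."""
    return CONTEXT_GRADE_TEMPLATE.format(
        context=context,
        question=question,
        prediction=prediction.strip(),
        reasoning_instruction=COT_INSTRUCTION if chain_of_thought else "",
    )


def parse_grade(grader_output: str) -> Optional[str]:
    """Return 'CORRECT' or 'INCORRECT' from the last GRADE: line, else None."""
    matches = re.findall(r"GRADE:\s*(CORRECT|INCORRECT)", grader_output.upper())
    return matches[-1] if matches else None


if __name__ == "__main__":
    prompt = build_grading_prompt(
        context="I am 30 years old. I live in New York and take the train to work everyday.",
        question="How old am I?",
        prediction=" You are 30 years old.",
        chain_of_thought=True,
    )
    print(prompt)
    # A reply from the grading LLM (not called here) would then be parsed:
    print(parse_grade("The context says the speaker is 30.\nGRADE: CORRECT"))  # CORRECT
```

Keeping the verdict on a dedicated marker line lets the grader write free-form reasoning first without that reasoning being mistaken for the grade.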
Committed by GitHub
Parent: e131156805
Commit: aa439ac2ff
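In the notebook diff that follows, `eval_chain.evaluate(...)` returns one dict per example whose `"text"` value is the grader's raw reply, such as `' CORRECT'` with a leading space. A small helper like the hypothetical `grade_accuracy` below (not part of LangChain) can normalize those replies and summarize them as an accuracy score; it assumes the grader answers with CORRECT or INCORRECT, as in the notebook output.

```python
from typing import Dict, List


def grade_accuracy(graded_outputs: List[Dict[str, str]], key: str = "text") -> float:
    """Fraction of graded outputs whose normalized grade is exactly CORRECT.

    Hypothetical helper: strips whitespace and upper-cases each reply so that
    ' CORRECT' and 'correct' count, while 'INCORRECT' does not.
    """
    if not graded_outputs:
        raise ValueError("graded_outputs is empty")
    grades = [item[key].strip().upper() for item in graded_outputs]
    return sum(grade == "CORRECT" for grade in grades) / len(grades)


# Same shape as the notebook's graded_outputs cell.
print(grade_accuracy([{"text": " CORRECT"}, {"text": " CORRECT"}]))  # 1.0
```

Comparing for exact equality after normalization avoids a substring check, which would wrongly count `INCORRECT` as a match for `CORRECT`.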
@@ -234,6 +234,93 @@
"evalchain.evaluate(examples, predictions, question_key=\"question\", answer_key=\"answer\", prediction_key=\"text\")"
]
},
{
"cell_type": "markdown",
"id": "cb1cf335",
"metadata": {},
"source": [
"## Evaluation without Ground Truth\n",
"It's possible to evaluate question-answering systems without ground truth. You need a `\"context\"` input that reflects the information the LLM uses to answer the question. This context can be obtained from any retrieval system. Here's an example of how it works:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6c59293f",
"metadata": {},
"outputs": [],
"source": [
"context_examples = [\n",
" {\n",
" \"question\": \"How old am I?\",\n",
" \"context\": \"I am 30 years old. I live in New York and take the train to work everyday.\",\n",
" },\n",
" {\n",
" \"question\": \"Who won the NFC championship game in 2023?\",\n",
" \"context\": \"NFC Championship Game 2023: Philadelphia Eagles 31, San Francisco 49ers 7\"\n",
" }\n",
"]\n",
"QA_PROMPT = \"Answer the question based on the context\\nContext:{context}\\nQuestion:{question}\\nAnswer:\"\n",
"template = PromptTemplate(input_variables=[\"context\", \"question\"], template=QA_PROMPT)\n",
"qa_chain = LLMChain(llm=llm, prompt=template)\n",
"predictions = qa_chain.apply(context_examples)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "e500d0cc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'text': 'You are 30 years old.'},\n",
" {'text': ' The Philadelphia Eagles won the NFC championship game in 2023.'}]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predictions"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "6d8cbc1d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.evaluation.qa import ContextQAEvalChain\n",
"eval_chain = ContextQAEvalChain.from_llm(llm)\n",
"graded_outputs = eval_chain.evaluate(context_examples, predictions, question_key=\"question\", prediction_key=\"text\")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "6c5262d0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'text': ' CORRECT'}, {'text': ' CORRECT'}]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"graded_outputs"
]
},
{
"cell_type": "markdown",
"id": "aaa61f0c",
@@ -329,7 +416,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.9.16"
},
"vscode": {
"interpreter": {