# Data Augmented Question Answering

This notebook uses generic prompts and language models to evaluate a question answering system that draws on data sources beyond what is in the model itself. For example, it can be used to evaluate a question answering system over your proprietary data.

## Setup
Let's set up an example using our favorite document: the State of the Union address.

In [1]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.faiss import FAISS
from langchain.text_splitter import CharacterTextSplitter
from langchain import OpenAI, VectorDBQA

In [2]:
with open('../state_of_the_union.txt') as f:
    state_of_the_union = f.read()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(state_of_the_union)

embeddings = OpenAIEmbeddings()
docsearch = FAISS.from_texts(texts, embeddings)
qa = VectorDBQA.from_llm(llm=OpenAI(), vectorstore=docsearch)

## Examples
Now we need some examples to evaluate. We can do this in two ways:

1. Hard-code some examples ourselves
2. Generate examples automatically, using a language model

In [3]:
# Hard-coded examples
examples = [
    {
        "query": "What did the president say about Ketanji Brown Jackson",
        "answer": "He praised her legal ability and said he nominated her for the supreme court."
    },
    {
        "query": "What did the president say about Michael Jackson",
        "answer": "Nothing"
    }
]

In [4]:
# Generated examples
from langchain.evaluation.qa import QAGenerateChain
example_gen_chain = QAGenerateChain.from_llm(OpenAI())

In [5]:
new_examples = example_gen_chain.apply_and_parse([{"doc": t} for t in texts[:5]])

In [6]:
new_examples

[{'query': 'What did Vladimir Putin seek to do according to the document?',
 'answer': 'Vladimir Putin sought to shake the foundations of the free world and make it bend to his menacing ways.'},
 {'query': 'What did President Zelenskyy say in his speech to the European Parliament?',
 'answer': 'President Zelenskyy said "Light will win over darkness."'},
 {'query': "How many countries joined the European Union in opposing Putin's attack on Ukraine?",
 'answer': '27'},
 {'query': 'What is the U.S. Department of Justice assembling in response to the Russian oligarchs?',
 'answer': 'A dedicated task force.'},
 {'query': 'How much direct assistance is the US providing to Ukraine?',
 'answer': 'The US is providing more than $1 Billion in direct assistance to Ukraine.'}]

In [7]:
# Combine examples
examples += new_examples
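
Because the generated examples come from a language model, it can be worth checking that every entry has the shape the evaluator expects before running it. The following is a minimal sketch, assuming each example should be a dict with non-empty string `query` and `answer` fields, as in the examples above. `find_malformed_examples` is a hypothetical helper, not part of LangChain, and the second sample entry is deliberately broken for illustration.

```python
def find_malformed_examples(examples):
    """Return (index, reason) pairs for examples lacking a usable query or answer."""
    problems = []
    for i, example in enumerate(examples):
        if not isinstance(example, dict):
            problems.append((i, "not a dict"))
            continue
        for key in ("query", "answer"):
            value = example.get(key)
            # A field counts as usable only if it is a non-blank string.
            if not isinstance(value, str) or not value.strip():
                problems.append((i, f"missing or empty '{key}'"))
    return problems


# Illustrative input: one well-formed example and one made-up broken one.
sample_examples = [
    {"query": "What did the president say about Michael Jackson", "answer": "Nothing"},
    {"query": "A placeholder question with no answer", "answer": ""},
]
problems = find_malformed_examples(sample_examples)
print(problems)
```

An empty list means every example has both fields, so the combined `examples` list can be passed on to the evaluation step as is.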

## Evaluate
Now that we have examples, we can use the question answering evaluator to evaluate our question answering chain.

In [8]:
from langchain.evaluation.qa import QAEvalChain

In [9]:
predictions = qa.apply(examples)

In [10]:
llm = OpenAI(temperature=0)
eval_chain = QAEvalChain.from_llm(llm)

In [11]:
graded_outputs = eval_chain.evaluate(examples, predictions)

In [12]:
graded_outputs

[{'text': ' CORRECT'},
 {'text': ' CORRECT'},
 {'text': ' INCORRECT'},
 {'text': ' CORRECT'},
 {'text': ' CORRECT'},
 {'text': ' CORRECT'},
 {'text': ' CORRECT'}]
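
The raw list of grades is easier to read once it is summarized. The sketch below assumes only the output shape shown above (a list of dicts whose `text` field holds the grade with leading whitespace). `summarize_grades` is a hypothetical helper, not a LangChain function, and the grades are copied in literally so the snippet runs on its own.

```python
from collections import Counter


def summarize_grades(graded_outputs):
    """Return grade counts, accuracy, and the indices of non-CORRECT items."""
    labels = [g["text"].strip() for g in graded_outputs]
    counts = Counter(labels)
    accuracy = counts["CORRECT"] / len(labels) if labels else 0.0
    failed = [i for i, label in enumerate(labels) if label != "CORRECT"]
    return counts, accuracy, failed


# Grades copied from the output above.
graded_outputs = [
    {"text": " CORRECT"},
    {"text": " CORRECT"},
    {"text": " INCORRECT"},
    {"text": " CORRECT"},
    {"text": " CORRECT"},
    {"text": " CORRECT"},
    {"text": " CORRECT"},
]

counts, accuracy, failed = summarize_grades(graded_outputs)
print(dict(counts))                 # {'CORRECT': 6, 'INCORRECT': 1}
print(f"accuracy: {accuracy:.2f}")  # accuracy: 0.86
print(failed)                       # [2]
```

Since `examples` holds the two hard-coded entries followed by the generated ones, index 2 corresponds to the first generated example, so `examples[2]` and `predictions[2]` are the pair to inspect by hand.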