# Retrieval QA using OpenAI functions

OpenAI functions let you structure the model's response output. This is often useful in question answering when you want not only the final answer but also supporting evidence, citations, etc.

In this notebook we show how to use an LLM chain that uses OpenAI functions as part of an overall retrieval pipeline.

In [1]:
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

In [2]:
loader = TextLoader("../../state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
for i, text in enumerate(texts):
    text.metadata['source'] = f"{i}-pl"
embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings)

Using embedded DuckDB without persistence: data will be transient


In [3]:
from langchain.chat_models import ChatOpenAI
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.prompts import PromptTemplate
from langchain.chains import create_qa_with_sources_chain

In [4]:
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")

In [5]:
qa_chain = create_qa_with_sources_chain(llm)

In [6]:
doc_prompt = PromptTemplate(
    template="Content: {page_content}\nSource: {source}",
    input_variables=["page_content", "source"],
)

In [7]:
final_qa_chain = StuffDocumentsChain(
    llm_chain=qa_chain, 
    document_variable_name='context',
    document_prompt=doc_prompt,
)
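To make the stuffing step concrete, the sketch below mimics in plain Python what the document prompt and the stuff chain do with retrieved chunks: each chunk is rendered through the template, and the rendered strings are joined into a single `context` value. It does not call LangChain. The sample chunks, the `stuff_documents` helper, and the `"\n\n"` separator are assumptions made for illustration.

```python
# Plain-Python illustration of document stuffing; no LangChain calls are made.
DOC_TEMPLATE = "Content: {page_content}\nSource: {source}"

# Hypothetical retrieved chunks, standing in for LangChain Document objects.
sample_docs = [
    {"page_content": "First chunk of the speech.", "source": "0-pl"},
    {"page_content": "Second chunk of the speech.", "source": "1-pl"},
]


def stuff_documents(docs, template=DOC_TEMPLATE, separator="\n\n"):
    """Render each document with the template and join the results.

    The separator is an assumption; the real chain may use a different one.
    """
    return separator.join(template.format(**doc) for doc in docs)


# This joined string plays the role of the 'context' variable in the QA prompt.
context = stuff_documents(sample_docs)
print(context)
```

Because the source label is written next to each chunk, the model can refer to those labels when it fills in the `sources` field of its answer.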

In [8]:
retrieval_qa = RetrievalQA(
    retriever=docsearch.as_retriever(),
    combine_documents_chain=final_qa_chain
)

In [9]:
query = "What did the president say about russia"

In [10]:
retrieval_qa.run(query)

'{\n  "answer": "The President expressed strong condemnation of Russia\'s actions in Ukraine and announced measures to isolate Russia and provide support to Ukraine. He stated that Russia\'s invasion of Ukraine will have long-term consequences for Russia and emphasized the commitment of the United States and its allies to defend NATO countries. The President also mentioned the imposition of sanctions on Russia and the release of oil reserves to help mitigate gas prices. Overall, the President\'s message conveyed a firm stance against Russia\'s aggression and a commitment to supporting Ukraine and protecting American interests.",\n  "sources": ["0-pl", "4-pl", "5-pl", "6-pl"]\n}'
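The return value above is a JSON-formatted string, not a dictionary, so it needs to be parsed before the answer and sources can be used separately. A minimal sketch with the standard library follows; `raw_output` is a shortened, hypothetical stand-in for the string returned by the chain.

```python
import json

# Shortened, hypothetical stand-in for the string returned by retrieval_qa.run(query).
raw_output = '{\n  "answer": "The President condemned Russia\'s actions in Ukraine.",\n  "sources": ["0-pl", "4-pl"]\n}'

# json.loads turns the JSON string into a regular Python dict.
parsed = json.loads(raw_output)
answer = parsed["answer"]
sources = parsed["sources"]

print(answer)
print(sources)
```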

## Using Pydantic

If we want to, we can have the chain return a Pydantic object instead of a JSON string. Note that downstream components that consume this chain's output, including memory, will generally expect a string, so you should only use the Pydantic output when this is the final chain.

In [11]:
qa_chain_pydantic = create_qa_with_sources_chain(llm, output_parser="pydantic")

In [12]:
final_qa_chain_pydantic = StuffDocumentsChain(
    llm_chain=qa_chain_pydantic, 
    document_variable_name='context',
    document_prompt=doc_prompt,
)

In [13]:
retrieval_qa_pydantic = RetrievalQA(
    retriever=docsearch.as_retriever(),
    combine_documents_chain=final_qa_chain_pydantic
)

In [14]:
retrieval_qa_pydantic.run(query)

AnswerWithSources(answer="The President expressed strong condemnation of Russia's actions in Ukraine and announced measures to isolate Russia and provide support to Ukraine. He stated that Russia's invasion of Ukraine will have long-term consequences for Russia and emphasized the commitment of the United States and its allies to defend NATO countries. The President also mentioned the imposition of sanctions on Russia and the release of oil reserves to help mitigate gas prices. Overall, the President's message conveyed a firm stance against Russia's aggression and support for Ukraine.", sources=['0-pl', '4-pl', '5-pl', '6-pl'])

## Using in ConversationalRetrievalChain

We can also use this in the ConversationalRetrievalChain. Because this chain involves memory, which stores each answer as a string, we will NOT use the Pydantic return type.

In [15]:
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.chains import LLMChain
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language. \
Make sure to avoid using any unclear pronouns.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)
condense_question_chain = LLMChain(
    llm=llm,
    prompt=CONDENSE_QUESTION_PROMPT,
)
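To see what the question-condensing step sends to the model, the sketch below fills the same template text with plain `str.format`. It does not use LangChain. The history lines and the `Human:`/`Assistant:` labels are hypothetical; the real chain formats the chat history itself and may label turns differently.

```python
# Illustration only: fills the condense-question template with plain str.format.
TEMPLATE = (
    "Given the following conversation and a follow up question, rephrase the "
    "follow up question to be a standalone question, in its original language. "
    "Make sure to avoid using any unclear pronouns.\n\n"
    "Chat History:\n{chat_history}\n"
    "Follow Up Input: {question}\n"
    "Standalone question:"
)

# Hypothetical prior turns, formatted as simple "Speaker: text" lines.
history = [
    ("Human", "What did the president say about Ketanji Brown Jackson"),
    ("Assistant", "He praised her as one of the nation's top legal minds."),
]
chat_history = "\n".join(f"{speaker}: {text}" for speaker, text in history)

prompt_text = TEMPLATE.format(
    chat_history=chat_history,
    question="what did he say about her predecessor?",
)
print(prompt_text)
```

The model's completion of this prompt is the standalone question, which is what gets passed to the retriever in place of the ambiguous follow-up.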

In [16]:
qa = ConversationalRetrievalChain(
    question_generator=condense_question_chain, 
    retriever=docsearch.as_retriever(),
    memory=memory, 
    combine_docs_chain=final_qa_chain
)

In [17]:
query = "What did the president say about Ketanji Brown Jackson"
result = qa({"question": query})

In [18]:
result

{'question': 'What did the president say about Ketanji Brown Jackson',
 'chat_history': [HumanMessage(content='What did the president say about Ketanji Brown Jackson', additional_kwargs={}, example=False),
  AIMessage(content='{\n  "answer": "The President nominated Ketanji Brown Jackson as a Circuit Court of Appeals Judge and praised her as one of the nation\'s top legal minds who will continue Justice Breyer\'s legacy of excellence.",\n  "sources": ["31-pl"]\n}', additional_kwargs={}, example=False)],
 'answer': '{\n  "answer": "The President nominated Ketanji Brown Jackson as a Circuit Court of Appeals Judge and praised her as one of the nation\'s top legal minds who will continue Justice Breyer\'s legacy of excellence.",\n  "sources": ["31-pl"]\n}'}

In [19]:
query = "what did he say about her predecessor?"
result = qa({"question": query})

In [20]:
result

{'question': 'what did he say about her predecessor?',
 'chat_history': [HumanMessage(content='What did the president say about Ketanji Brown Jackson', additional_kwargs={}, example=False),
  AIMessage(content='{\n  "answer": "The President nominated Ketanji Brown Jackson as a Circuit Court of Appeals Judge and praised her as one of the nation\'s top legal minds who will continue Justice Breyer\'s legacy of excellence.",\n  "sources": ["31-pl"]\n}', additional_kwargs={}, example=False),
  HumanMessage(content='what did he say about her predecessor?', additional_kwargs={}, example=False),
  AIMessage(content='{\n  "answer": "The President honored Justice Stephen Breyer for his service as an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court.",\n  "sources": ["31-pl"]\n}', additional_kwargs={}, example=False)],
 'answer': '{\n  "answer": "The President honored Justice Stephen Breyer for his service as an Army veteran, Constitutional scholar, and

## Using your own output schema

We can change the outputs of our chain by passing in our own schema. The field names and descriptions of this schema inform the function we pass to the OpenAI API, meaning the schema won't just affect how we parse outputs but will also change the OpenAI output itself. For example, we can add a `countries_referenced` parameter to our schema and describe what we want this parameter to mean, and that will cause the OpenAI output to include a list of the countries referenced in the sources.
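As a rough picture of why the schema changes the model's output, the sketch below writes out by hand the kind of function definition that a schema with `answer`, `countries_referenced`, and `sources` fields could translate to. This is an approximation for illustration only; the exact payload that LangChain builds and sends may differ, and the name `CustomResponseSchema` is used here simply as a label.

```python
import json

# Hand-written approximation of a function definition for a schema with three
# fields. The structure follows JSON Schema conventions; the actual payload
# generated by LangChain may differ in detail.
function_definition = {
    "name": "CustomResponseSchema",
    "description": "An answer to the question being asked, with sources.",
    "parameters": {
        "type": "object",
        "properties": {
            "answer": {
                "type": "string",
                "description": "Answer to the question that was asked",
            },
            "countries_referenced": {
                "type": "array",
                "items": {"type": "string"},
                "description": "All of the countries mentioned in the sources",
            },
            "sources": {
                "type": "array",
                "items": {"type": "string"},
                "description": "List of sources used to answer the question",
            },
        },
        # Listing every field as required asks the model to always fill it in.
        "required": ["answer", "countries_referenced", "sources"],
    },
}

print(json.dumps(function_definition, indent=2))
```

The field descriptions act as instructions to the model, which is why wording them carefully matters as much as choosing the field types.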

In [21]:
from typing import List

from pydantic import BaseModel, Field

from langchain.chains.openai_functions import create_qa_with_structure_chain

In [24]:
class CustomResponseSchema(BaseModel):
    """An answer to the question being asked, with sources."""

    answer: str = Field(..., description="Answer to the question that was asked")
    countries_referenced: List[str] = Field(..., description="All of the countries mentioned in the sources")
    sources: List[str] = Field(
        ..., description="List of sources used to answer the question"
    )

qa_chain_pydantic = create_qa_with_structure_chain(llm, CustomResponseSchema, output_parser="pydantic")
final_qa_chain_pydantic = StuffDocumentsChain(
    llm_chain=qa_chain_pydantic, 
    document_variable_name='context',
    document_prompt=doc_prompt,
)
retrieval_qa_pydantic = RetrievalQA(
    retriever=docsearch.as_retriever(),
    combine_documents_chain=final_qa_chain_pydantic
)
query = "What did he say about russia"
retrieval_qa_pydantic.run(query)

CustomResponseSchema(answer="He announced that American airspace will be closed off to all Russian flights, further isolating Russia and adding economic pressure. The Ruble has lost 30% of its value and the Russian stock market has lost 40% of its value. He also mentioned providing support to Ukraine in terms of military, economic, and humanitarian assistance. The US is giving more than $1 billion in direct assistance to Ukraine. He clarified that US forces are not engaged in conflict with Russian forces in Ukraine but are deployed to defend NATO allies. He emphasized that Putin's actions have consequences and that the free world is holding him accountable through economic sanctions and targeting Russian oligarchs.", countries_referenced=['Russia', 'Ukraine'], sources=['4-pl', '5-pl', '2-pl', '3-pl'])