{
"cells": [
{
"cell_type": "markdown",
"id": "5151afed",
"metadata": {},
"source": [
"# Question Answering\n",
"\n",
"[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/question_answering/qa.ipynb)\n",
"\n",
"## Use case\n",
"Suppose you have some text documents (PDF, blog, Notion pages, etc.) and want to ask questions related to the contents of those documents. \n",
"\n",
"LLMs, given their proficiency in understanding text, are a great tool for this.\n",
"\n",
"In this walkthrough we'll go over how to build a question-answering over documents application using LLMs. \n",
"\n",
"Two very related use cases which we cover elsewhere are:\n",
"- [QA over structured data](/docs/use_cases/qa_structured/sql) (e.g., SQL)\n",
"- [QA over code](/docs/use_cases/code_understanding) (e.g., Python)\n",
"\n",
"![intro.png](/img/qa_intro.png)\n",
"\n",
"## Overview\n",
"The pipeline for converting raw unstructured data into a QA chain looks like this:\n",
"1. `Loading`: First we need to load our data. Use the [LangChain integration hub](https://integrations.langchain.com/) to browse the full set of loaders. \n",
"2. `Splitting`: [Text splitters](/docs/modules/data_connection/document_transformers/) break `Documents` into splits of specified size\n",
"3. `Storage`: Storage (e.g., often a [vectorstore](/docs/modules/data_connection/vectorstores/)) will house [and often embed](https://www.pinecone.io/learn/vector-embeddings/) the splits\n",
"4. `Retrieval`: The app retrieves splits from storage (e.g., often [with similar embeddings](https://www.pinecone.io/learn/k-nearest-neighbor/) to the input question)\n",
"5. `Generation`: An [LLM](/docs/modules/model_io/models/llms/) produces an answer using a prompt that includes the question and the retrieved data\n",
"\n",
"![flow.jpeg](/img/qa_flow.jpeg)\n",
"\n",
"## Quickstart\n",
"\n",
"Suppose we want a QA app over this [blog post](https://lilianweng.github.io/posts/2023-06-23-agent/). \n",
"\n",
"We can create this in a few lines of code. \n",
"\n",
"First set environment variables and install packages:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e14b744b",
"metadata": {},
"outputs": [],
"source": [
"pip install langchain openai chromadb langchainhub\n",
"\n",
"# Set env var OPENAI_API_KEY or load from a .env file\n",
"# import dotenv\n",
"\n",
"# dotenv.load_dotenv()"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "820244ae-74b4-4593-b392-822979dd91b8",
"metadata": {},
"outputs": [],
"source": [
"# Load documents\n",
"\n",
"from langchain.document_loaders import WebBaseLoader\n",
"loader = WebBaseLoader(\"https://lilianweng.github.io/posts/2023-06-23-agent/\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c89a0aa7-1e7e-4557-90e5-a7ea87db00e7",
"metadata": {},
"outputs": [],
"source": [
"# Split documents\n",
"\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size = 500, chunk_overlap = 0)\n",
"splits = text_splitter.split_documents(loader.load())"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "000e46f6-dafc-4a43-8417-463d0614fd30",
"metadata": {},
"outputs": [],
"source": [
"# Embed and store splits\n",
"\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"vectorstore = Chroma.from_documents(documents=splits,embedding=OpenAIEmbeddings())\n",
"retriever = vectorstore.as_retriever()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "dacbde0b-7d45-4a2c-931d-81bb094aec94",
"metadata": {},
"outputs": [],
"source": [
"# Prompt \n",
"# https://smith.langchain.com/hub/rlm/rag-prompt\n",
"\n",
"from langchain import hub\n",
"rag_prompt = hub.pull(\"rlm/rag-prompt\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "79b9fdae-c2bf-4cf6-884f-c19aa07dd975",
"metadata": {},
"outputs": [],
"source": [
"# LLM\n",
"\n",
"from langchain.chat_models import ChatOpenAI\n",
"llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\", temperature=0)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "92c0f3ae-6ab2-4d04-9b22-1963b96b9db5",
"metadata": {},
"outputs": [],
"source": [
"# RAG chain \n",
"\n",
"from langchain.schema.runnable import RunnablePassthrough\n",
"rag_chain = (\n",
" {\"context\": retriever, \"question\": RunnablePassthrough()} \n",
" | rag_prompt \n",
" | llm \n",
")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "0d3b0f36-7b56-49c0-8e40-a1aa9ebcbf24",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Task decomposition is the process of breaking down a task into smaller subgoals or steps. It can be done using simple prompting, task-specific instructions, or human inputs.')"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"rag_chain.invoke(\"What is Task Decomposition?\")"
]
},
{
"cell_type": "markdown",
"id": "639dc31a-7f16-40f6-ba2a-20e7c2ecfe60",
"metadata": {},
"source": [
"[Here](https://smith.langchain.com/public/2270a675-74de-47ac-b111-b232d8340a64/r) is the LangSmith trace for this chain.\n",
"\n",
"Below we will explain each step in more detail."
]
},
{
"cell_type": "markdown",
"id": "ba5daed6",
"metadata": {},
"source": [
"## Step 1. Load\n",
"\n",
"Specify a `DocumentLoader` to load in your unstructured data as `Documents`. \n",
"\n",
"A `Document` is a dict with text (`page_content`) and `metadata`."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "cf4d5c72",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import WebBaseLoader\n",
"\n",
"loader = WebBaseLoader(\"https://lilianweng.github.io/posts/2023-06-23-agent/\")\n",
"data = loader.load()"
]
},
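{
"cell_type": "code",
"execution_count": null,
"id": "7c1a2b3d",
"metadata": {},
"outputs": [],
"source": [
"# A quick optional sanity check on what the loader returned: each Document\n",
"# holds the raw text in `page_content` and source info in `metadata`.\n",
"print(len(data))\n",
"print(data[0].metadata)\n",
"print(data[0].page_content[:200])"
]
},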
{
"cell_type": "markdown",
"id": "fd2cc9a7",
"metadata": {},
"source": [
"### Go deeper\n",
"- Browse the > 160 data loader integrations [here](https://integrations.langchain.com/).\n",
"- See further documentation on loaders [here](/docs/modules/data_connection/document_loaders/).\n",
"\n",
"## Step 2. Split\n",
"\n",
"Split the `Document` into chunks for embedding and vector storage."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4b11c01d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size = 500, chunk_overlap = 0)\n",
"all_splits = text_splitter.split_documents(data)"
]
},
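{
"cell_type": "code",
"execution_count": null,
"id": "8d2b3c4e",
"metadata": {},
"outputs": [],
"source": [
"# Optionally inspect the splits: each split is itself a Document of at most\n",
"# ~500 characters that keeps the metadata of the page it came from.\n",
"print(len(all_splits))\n",
"print(all_splits[0].metadata)\n",
"print(len(all_splits[0].page_content))"
]
},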
{
"cell_type": "markdown",
"id": "0a33bd4d",
"metadata": {},
"source": [
"### Go deeper\n",
"\n",
"- `DocumentSplitters` are just one type of the more generic `DocumentTransformers`.\n",
"- See further documentation on transformers [here](/docs/modules/data_connection/document_transformers/).\n",
"- `Context-aware splitters` keep the location (\"context\") of each split in the original `Document`:\n",
" - [Markdown files](/docs/use_cases/question_answering/how_to/document-context-aware-QA)\n",
" - [Code (py or js)](docs/integrations/document_loaders/source_code)\n",
" - [Documents](/docs/integrations/document_loaders/grobid)\n",
"\n",
"## Step 3. Store\n",
"\n",
"To be able to look up our document splits, we first need to store them where we can later look them up.\n",
"\n",
"The most common way to do this is to embed the contents of each document split.\n",
"\n",
"We store the embedding and splits in a vectorstore."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "e9c302c8",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.vectorstores import Chroma\n",
"\n",
"vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())"
]
},
{
"cell_type": "markdown",
"id": "dc6f22b0",
"metadata": {},
"source": [
"### Go deeper\n",
"- Browse the > 40 vectorstores integrations [here](https://integrations.langchain.com/).\n",
"- See further documentation on vectorstores [here](/docs/modules/data_connection/vectorstores/).\n",
"- Browse the > 30 text embedding integrations [here](https://integrations.langchain.com/).\n",
"- See further documentation on embedding models [here](/docs/modules/data_connection/text_embedding/).\n",
"\n",
" Here are Steps 1-3:\n",
"\n",
"![lc.png](/img/qa_data_load.png)\n",
"\n",
"## Step 4. Retrieve\n",
"\n",
"Retrieve relevant splits for any question using [similarity search](https://www.pinecone.io/learn/what-is-similarity-search/).\n",
"\n",
"This is simply \"top K\" retrieval where we select documents based on embedding similarity to the query."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "e2c26b7d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"question = \"What are the approaches to Task Decomposition?\"\n",
"docs = vectorstore.similarity_search(question)\n",
"len(docs)"
]
},
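{
"cell_type": "code",
"execution_count": null,
"id": "9e3c4d5f",
"metadata": {},
"outputs": [],
"source": [
"# Optionally look at the top-ranked split to see what the similarity search surfaced\n",
"print(docs[0].page_content)"
]
},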
{
"cell_type": "markdown",
"id": "5d5a113b",
"metadata": {},
"source": [
"### Go deeper\n",
"\n",
"Vectorstores are commonly used for retrieval, but they are not the only option. For example, SVMs (see thread [here](https://twitter.com/karpathy/status/1647025230546886658?s=20)) can also be used.\n",
"\n",
"LangChain [has many retrievers](/docs/modules/data_connection/retrievers/) including, but not limited to, vectorstores. \n",
"\n",
"All retrievers implement a common method `get_relevant_documents()` (and its asynchronous variant `aget_relevant_documents()`)."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "c901eaee",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.retrievers import SVMRetriever\n",
"\n",
"svm_retriever = SVMRetriever.from_documents(all_splits,OpenAIEmbeddings())\n",
"docs_svm=svm_retriever.get_relevant_documents(question)\n",
"len(docs_svm)"
]
},
{
"cell_type": "markdown",
"id": "69de3d54",
"metadata": {},
"source": [
"Some common ways to improve on vector similarity search include:\n",
"- `MultiQueryRetriever` [generates variants of the input question](/docs/modules/data_connection/retrievers/MultiQueryRetriever) to improve retrieval.\n",
"- `Max marginal relevance` selects for [relevance and diversity](https://www.cs.cmu.edu/~jgc/publication/The_Use_MMR_Diversity_Based_LTMIR_1998.pdf) among the retrieved documents.\n",
"- Documents can be filtered during retrieval using [`metadata` filters](/docs/use_cases/question_answering/how_to/document-context-aware-QA)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9cfe3270-4e89-4c60-a2e5-9026b021bf76",
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.retrievers.multi_query import MultiQueryRetriever\n",
"\n",
"logging.basicConfig()\n",
"logging.getLogger('langchain.retrievers.multi_query').setLevel(logging.INFO)\n",
"\n",
"retriever_from_llm = MultiQueryRetriever.from_llm(retriever=vectorstore.as_retriever(),\n",
" llm=ChatOpenAI(temperature=0))\n",
"unique_docs = retriever_from_llm.get_relevant_documents(query=question)\n",
"len(unique_docs)"
]
},
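{
"cell_type": "markdown",
"id": "a1f2b3c4",
"metadata": {},
"source": [
"Below is a minimal sketch of max marginal relevance (MMR) retrieval, reusing the `vectorstore` built above via the standard `as_retriever(search_type=\"mmr\")` option."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b2c3d4e5",
"metadata": {},
"outputs": [],
"source": [
"# MMR retrieval: trade off similarity to the query against diversity among results\n",
"mmr_retriever = vectorstore.as_retriever(search_type=\"mmr\", search_kwargs={\"k\": 4})\n",
"docs_mmr = mmr_retriever.get_relevant_documents(question)\n",
"len(docs_mmr)"
]
},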
{
"cell_type": "markdown",
"id": "ee8420e6-73a6-411b-a84d-74b096bddad7",
"metadata": {},
"source": [
"In addition, a useful concept for improving retrieval is decoupling the documents from the embedded search key.\n",
"\n",
"For example, we can embed a document summary or question that are likely to lead to the document being retrieved.\n",
"\n",
"See details in [here](docs/modules/data_connection/retrievers/multi_vector) on the multi-vector retriever for this purpose.\n",
"\n",
"![mv.png](/img/multi_vector.png)"
]
},
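{
"cell_type": "markdown",
"id": "c3d4e5f6",
"metadata": {},
"source": [
"Below is a minimal, hypothetical sketch of this idea using `MultiVectorRetriever`: rather than an LLM-generated summary, it embeds a truncated copy of each split as the search key, while the full split is what gets returned. The `doc_id` key and the `surrogates` collection name are arbitrary choices for the example."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d4e5f607",
"metadata": {},
"outputs": [],
"source": [
"import uuid\n",
"\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
"from langchain.schema import Document\n",
"from langchain.storage import InMemoryStore\n",
"from langchain.vectorstores import Chroma\n",
"\n",
"id_key = \"doc_id\"\n",
"doc_ids = [str(uuid.uuid4()) for _ in all_splits]\n",
"\n",
"# Surrogate documents: these are what actually get embedded and searched.\n",
"# A summary or a hypothetical question generated by an LLM would work the same way.\n",
"surrogates = [\n",
"    Document(page_content=d.page_content[:200], metadata={id_key: doc_ids[i]})\n",
"    for i, d in enumerate(all_splits)\n",
"]\n",
"\n",
"mv_retriever = MultiVectorRetriever(\n",
"    vectorstore=Chroma(collection_name=\"surrogates\", embedding_function=OpenAIEmbeddings()),\n",
"    docstore=InMemoryStore(),\n",
"    id_key=id_key,\n",
")\n",
"mv_retriever.vectorstore.add_documents(surrogates)\n",
"mv_retriever.docstore.mset(list(zip(doc_ids, all_splits)))\n",
"\n",
"# Search runs against the surrogates, but the full splits come back\n",
"len(mv_retriever.get_relevant_documents(question))"
]
},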
{
"cell_type": "markdown",
"id": "415d6824",
"metadata": {},
"source": [
"## Step 5. Generate\n",
"\n",
"Distill the retrieved documents into an answer using an LLM/Chat model (e.g., `gpt-3.5-turbo`).\n",
"\n",
"We use the [Runnable](https://python.langchain.com/docs/expression_language/interface) protocol to define the chain.\n",
"\n",
"Runnable protocol pipes together components in a transparent way.\n",
"\n",
"We used a prompt for RAG that is checked into the LangChain prompt hub ([here](https://smith.langchain.com/hub/rlm/rag-prompt))."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "99fa1aec",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Task decomposition is the process of breaking down a task into smaller subgoals or steps. It can be done using simple prompting, task-specific instructions, or human inputs.')"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\", temperature=0)\n",
"\n",
"from langchain.schema.runnable import RunnablePassthrough\n",
"rag_chain = (\n",
" {\"context\": retriever, \"question\": RunnablePassthrough()} \n",
" | rag_prompt \n",
" | llm \n",
")\n",
"\n",
"rag_chain.invoke(\"What is Task Decomposition?\")"
]
},
{
"cell_type": "markdown",
"id": "f7d52c84",
"metadata": {},
"source": [
"### Go deeper\n",
"\n",
"#### Choosing LLMs\n",
"- Browse the > 90 LLM and chat model integrations [here](https://integrations.langchain.com/).\n",
"- See further documentation on LLMs and chat models [here](/docs/modules/model_io/models/).\n",
"- See a guide on local LLMS [here](/docs/modules/use_cases/question_answering/how_to/local_retrieval_qa)."
]
},
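{
"cell_type": "markdown",
"id": "e5f60718",
"metadata": {},
"source": [
"As a sketch of the local-model option, the same chain can run against a model served by Ollama. This assumes Ollama is installed and running locally and that a model such as `llama2` has been pulled; it reuses `retriever`, `rag_prompt`, and `RunnablePassthrough` from above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f6071829",
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import Ollama\n",
"\n",
"# Assumes a local Ollama server with the `llama2` model pulled\n",
"local_llm = Ollama(model=\"llama2\")\n",
"\n",
"local_rag_chain = (\n",
"    {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
"    | rag_prompt\n",
"    | local_llm\n",
")\n",
"local_rag_chain.invoke(\"What is Task Decomposition?\")"
]
},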
{
"cell_type": "markdown",
"id": "fa82f437",
"metadata": {},
"source": [
"#### Customizing the prompt\n",
"\n",
"As shown above, we can load prompts (e.g., [this RAG prompt](https://smith.langchain.com/hub/rlm/rag-prompt)) from the prompt hub.\n",
"\n",
"The prompt can also be easily customized, as shown below."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "e4fee704",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Task decomposition is the process of breaking down a complicated task into smaller, more manageable subtasks or steps. It can be done using prompts, task-specific instructions, or human inputs. Thanks for asking!')"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.prompts import PromptTemplate\n",
"\n",
"template = \"\"\"Use the following pieces of context to answer the question at the end. \n",
"If you don't know the answer, just say that you don't know, don't try to make up an answer. \n",
"Use three sentences maximum and keep the answer as concise as possible. \n",
"Always say \"thanks for asking!\" at the end of the answer. \n",
"{context}\n",
"Question: {question}\n",
"Helpful Answer:\"\"\"\n",
"rag_prompt_custom = PromptTemplate.from_template(template)\n",
"\n",
"rag_chain = (\n",
" {\"context\": retriever, \"question\": RunnablePassthrough()} \n",
" | rag_prompt_custom \n",
" | llm \n",
")\n",
"\n",
"rag_chain.invoke(\"What is Task Decomposition?\")"
]
},
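{
"cell_type": "markdown",
"id": "07182a3b",
"metadata": {},
"source": [
"If you want a plain string instead of an `AIMessage`, you can append an output parser to the chain; a small sketch using `StrOutputParser`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "182a3b4c",
"metadata": {},
"outputs": [],
"source": [
"from langchain.schema.output_parser import StrOutputParser\n",
"\n",
"rag_chain_str = (\n",
"    {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
"    | rag_prompt_custom\n",
"    | llm\n",
"    | StrOutputParser()\n",
")\n",
"rag_chain_str.invoke(\"What is Task Decomposition?\")"
]
},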
{
"cell_type": "markdown",
"id": "5f5b6297-715a-444e-b3ef-a6d27382b435",
"metadata": {},
"source": [
"We can use [LangSmith](https://smith.langchain.com/public/129cac54-44d5-453a-9807-3bd4835e5f96/r) to see the trace."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}