RAG guide

388 docs/docs/guides/productionization/evaluation/examples/rag.ipynb Normal file
@@ -0,0 +1,388 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "2e7db2b1-8f9c-46bd-9c50-b6cfb0a38a22",
"metadata": {},
"source": [
"# RAG Evaluation\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/guides/evaluation/examples/rag.ipynb)\n",
"\n",
"RAG (Retrieval Augmented Generation) is one of the most popular LLM applications.\n",
"\n",
"For an in-depth review, see our RAG series of notebooks and videos [here](https://github.com/langchain-ai/rag-from-scratch).\n",
"\n",
"## Types of RAG eval\n",
"\n",
"There are at least 4 types of RAG eval that users are typically interested in:\n",
"\n",
"![Types of RAG eval](/img/langsmith_rag_eval.png)\n",
"\n",
"We will discuss each below.\n",
"\n",
"### Reference Answer\n",
"\n",
"First, let's consider the case in which we want to compare our RAG chain answer to a reference answer.\n",
"\n",
"This is shown on the far right (blue) above.\n",
"\n",
"#### RAG Chain\n",
"\n",
"To start, we build a RAG chain. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d809e9a0-44bc-4e9f-8eee-732ef077538c",
"metadata": {},
"outputs": [],
"source": [
"! pip install langchain langchain-community langchain-openai chromadb tiktoken langsmith openai beautifulsoup4"
]
},
{
"cell_type": "markdown",
"id": "760cab79-2d5e-4324-ba4a-54b6f4094cb0",
"metadata": {},
"source": [
"We build an `index` using a set of LangChain docs."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "6f7c0017-f4dd-4071-aa48-40957ffb4e9d",
"metadata": {},
"outputs": [],
"source": [
"### INDEX\n",
"\n",
"from bs4 import BeautifulSoup as Soup\n",
"from langchain_community.vectorstores import Chroma\n",
"from langchain_openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"from langchain_community.document_loaders.recursive_url_loader import RecursiveUrlLoader\n",
"\n",
"# Load\n",
"url = \"https://python.langchain.com/docs/expression_language/\"\n",
"loader = RecursiveUrlLoader(url=url, max_depth=20, extractor=lambda x: Soup(x, \"html.parser\").text)\n",
"docs = loader.load()\n",
"\n",
"# Split\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\n",
"splits = text_splitter.split_documents(docs)\n",
"\n",
"# Embed\n",
"vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())\n",
"\n",
"# Index\n",
"retriever = vectorstore.as_retriever()"
]
},
{
"cell_type": "markdown",
"id": "c365fb82-78a6-40b6-bd59-daaa1e79d6c8",
"metadata": {},
"source": [
"Next, we build a `RAG chain` that returns an `answer` and the retrieved documents as `contexts`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "68e249d7-bc6c-4631-b099-6daaeeddf38a",
"metadata": {},
"outputs": [],
"source": [
"### RAG\n",
"\n",
"import openai\n",
"from langsmith import traceable\n",
"from langsmith.wrappers import wrap_openai\n",
"\n",
"\n",
"class RagBot:\n",
"    def __init__(self, retriever, model: str = \"gpt-4-turbo-preview\"):\n",
"        self._retriever = retriever\n",
"        # Wrapping the client instruments the LLM\n",
"        self._client = wrap_openai(openai.Client())\n",
"        self._model = model\n",
"\n",
"    @traceable\n",
"    def get_answer(self, question: str):\n",
"        similar = self._retriever.invoke(question)\n",
"        response = self._client.chat.completions.create(\n",
"            model=self._model,\n",
"            messages=[\n",
"                {\n",
"                    \"role\": \"system\",\n",
"                    \"content\": \"You are a helpful AI assistant.\"\n",
"                    \" Use the following docs to help answer the user's question.\\n\\n\"\n",
"                    f\"## Docs\\n\\n{similar}\",\n",
"                },\n",
"                {\"role\": \"user\", \"content\": question},\n",
"            ],\n",
"        )\n",
"\n",
"        # Evaluators will expect \"answer\" and \"contexts\"\n",
"        return {\n",
"            \"answer\": response.choices[0].message.content,\n",
"            \"contexts\": [str(doc) for doc in similar],\n",
"        }\n",
"\n",
"\n",
"rag_bot = RagBot(retriever)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "6101d155-a1ab-460c-8c3e-f1f44e09a8b7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'LangChain Expression Language (LCEL) is a declarative language that simplifies the composition of chains for working with language models and related '"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"response = rag_bot.get_answer(\"What is LCEL?\")\n",
"response[\"answer\"][:150]"
]
},
{
"cell_type": "markdown",
"id": "432e8ec7-a085-4224-ad38-0087e1d553f1",
"metadata": {},
"source": [
"#### RAG Dataset\n",
"\n",
"Next, we build a dataset of QA pairs based upon the [documentation](https://python.langchain.com/docs/expression_language/) that we indexed."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "0f29304f-d79b-40e9-988a-343732102af9",
"metadata": {},
"outputs": [],
"source": [
"from langsmith import Client\n",
"\n",
"# QA\n",
"inputs = [\n",
"    \"How can I directly pass a string to a runnable and use it to construct the input needed for my prompt?\",\n",
"    \"How can I make the output of my LCEL chain a string?\",\n",
"    \"How can I apply a custom function to one of the inputs of an LCEL chain?\",\n",
"]\n",
"\n",
"outputs = [\n",
"    \"Use RunnablePassthrough. from langchain_core.runnables import RunnableParallel, RunnablePassthrough; from langchain_core.prompts import ChatPromptTemplate; from langchain_openai import ChatOpenAI; prompt = ChatPromptTemplate.from_template('Tell a joke about: {input}'); model = ChatOpenAI(); runnable = ({'input' : RunnablePassthrough()} | prompt | model); runnable.invoke('flowers')\",\n",
"    \"Use StrOutputParser. from langchain_openai import ChatOpenAI; from langchain_core.prompts import ChatPromptTemplate; from langchain_core.output_parsers import StrOutputParser; prompt = ChatPromptTemplate.from_template('Tell me a short joke about {topic}'); model = ChatOpenAI(model='gpt-3.5-turbo') #gpt-4 or other LLMs can be used here; output_parser = StrOutputParser(); chain = prompt | model | output_parser\",\n",
"    \"Use RunnableLambda with itemgetter to extract the relevant key. from operator import itemgetter; from langchain_core.prompts import ChatPromptTemplate; from langchain_core.runnables import RunnableLambda; from langchain_openai import ChatOpenAI; def length_function(text): return len(text); chain = ({'prompt_input': itemgetter('foo') | RunnableLambda(length_function),} | prompt | model); chain.invoke({'foo':'hello world'})\",\n",
"]\n",
"\n",
"qa_pairs = [{\"question\": q, \"answer\": a} for q, a in zip(inputs, outputs)]\n",
"\n",
"# Create dataset\n",
"client = Client()\n",
"dataset_name = \"RAG_test_LCEL\"\n",
"dataset = client.create_dataset(\n",
"    dataset_name=dataset_name,\n",
"    description=\"QA pairs about LCEL.\",\n",
")\n",
"client.create_examples(\n",
"    inputs=[{\"question\": q} for q in inputs],\n",
"    outputs=[{\"answer\": a} for a in outputs],\n",
"    dataset_id=dataset.id,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "92cf3a0f-621f-468d-818d-a6f2d4b53823",
"metadata": {},
"source": [
"#### Eval flow\n",
"\n",
"There are [several different evaluators](https://docs.smith.langchain.com/evaluation/faq/evaluator-implementations) that can be used to compare our RAG chain answer to a reference answer.\n",
"\n",
"< `TODO:` Update table to link to the eval prompts. >\n",
"\n",
"Here, we will use `CoT_QA` as an LLM-as-judge evaluator.\n",
"\n",
"[Here](https://github.com/langchain-ai/langchain/blob/22da9f5f3f9fef24c5c75072b678b8a2f654b173/libs/langchain/langchain/evaluation/qa/eval_prompt.py#L43) is the prompt used by `CoT_QA`.\n",
"\n",
"Our evaluator will connect our dataset and RAG chain outputs to the evaluator prompt inputs:\n",
"\n",
"1. `question` from the dataset -> `question` in the prompt, the RAG chain input\n",
"2. `answer` from the dataset -> `context` in the prompt, the ground truth answer\n",
"3. `answer` from the LLM using the `predict_rag_answer` function below -> `result` in the prompt, the RAG chain result\n",
"\n",
"This wiring is applied by default for `cot_qa`; an explicit version is sketched after the eval run below.\n",
"\n",
"![RAG eval flow](/img/langsmith_rag_flow.png)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "1cbe0b4a-2a30-4f40-b3aa-5cc67c6a7802",
"metadata": {},
"outputs": [],
"source": [
"# RAG chain\n",
"def predict_rag_answer(example: dict):\n",
"    \"\"\"Use this for answer evaluation\"\"\"\n",
"    response = rag_bot.get_answer(example[\"question\"])\n",
"    return {\"answer\": response[\"answer\"]}\n",
"\n",
"\n",
"def predict_rag_answer_with_context(example: dict):\n",
"    \"\"\"Use this for evaluation of retrieved documents and hallucinations\"\"\"\n",
"    response = rag_bot.get_answer(example[\"question\"])\n",
"    return {\"answer\": response[\"answer\"], \"contexts\": response[\"contexts\"]}"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "a7a3827d-a92f-4a7a-a572-5123fbd9c334",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"View the evaluation results for experiment: 'rag-qa-oai-e8604ab3' at:\n",
"https://smith.langchain.com/o/1fa8b1f4-fcb9-4072-9aa9-983e35ad61b8/datasets/368734fb-7c14-4e1f-b91a-50d52cb58a07/compare?selectedSessions=a176a91c-a5f0-42ab-b2f4-fedaa1cbf17d\n",
"\n",
"\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "e459fbab745f4ce4bb399609910a807f",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"0it [00:00, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from langsmith.evaluation import LangChainStringEvaluator, evaluate\n",
"\n",
"# Evaluator\n",
"qa_evaluator = [LangChainStringEvaluator(\"cot_qa\")]\n",
"dataset_name = \"RAG_test_LCEL\"\n",
"experiment_results = evaluate(\n",
"    predict_rag_answer,\n",
"    data=dataset_name,\n",
"    evaluators=qa_evaluator,\n",
"    experiment_prefix=\"rag-qa-oai\",\n",
"    metadata={\"variant\": \"LCEL context, gpt-4-turbo-preview\"},\n",
")"
]
},
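{
"cell_type": "markdown",
"id": "f3b5b2a1-note-prepare-data",
"metadata": {},
"source": [
"The `cot_qa` evaluator applies the question/reference/prediction mapping above by default. To make the wiring explicit (for example, if your dataset keys differ), `LangChainStringEvaluator` accepts a `prepare_data` function. A minimal sketch, assuming the `question`/`answer` keys used in this guide:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1b2c3d4-prepare-data-sketch",
"metadata": {},
"outputs": [],
"source": [
"from langsmith.evaluation import LangChainStringEvaluator\n",
"\n",
"# Sketch: explicitly map dataset fields and chain outputs to the cot_qa prompt.\n",
"cot_qa_evaluator = LangChainStringEvaluator(\n",
"    \"cot_qa\",\n",
"    prepare_data=lambda run, example: {\n",
"        \"input\": example.inputs[\"question\"],  # RAG chain input -> question\n",
"        \"reference\": example.outputs[\"answer\"],  # ground truth answer -> context\n",
"        \"prediction\": run.outputs[\"answer\"],  # RAG chain answer -> result\n",
"    },\n",
")"
]
},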
{
"cell_type": "markdown",
"id": "60ba4123-c691-4aa0-ba76-e567e8aaf09f",
"metadata": {},
"source": [
"### Answer Hallucination\n",
"\n",
"Next, let's consider the case in which we want to compare our RAG chain answer to the retrieved documents."
]
},
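{
"cell_type": "markdown",
"id": "7f0872a5-hallucination-note",
"metadata": {},
"source": [
"One way to do this (a minimal sketch, not the only option): run `predict_rag_answer_with_context` so each run records `contexts`, then grade the answer against those retrieved docs with the off-the-shelf `labeled_score_string` evaluator, passing the contexts in as the reference via `prepare_data`. The criteria wording and experiment prefix below are illustrative assumptions."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7f0872a5-hallucination-sketch",
"metadata": {},
"outputs": [],
"source": [
"from langsmith.evaluation import LangChainStringEvaluator, evaluate\n",
"\n",
"# Grade the answer against the retrieved docs rather than a reference answer.\n",
"answer_hallucination_evaluator = LangChainStringEvaluator(\n",
"    \"labeled_score_string\",\n",
"    config={\n",
"        \"criteria\": {\n",
"            \"accuracy\": \"Is the Assistant's Answer grounded in the Ground Truth docs? A score of [[1]] means it is not grounded at all; [[10]] means it is fully grounded.\"\n",
"        },\n",
"        # Map the 1-10 score onto 0-1\n",
"        \"normalize_by\": 10,\n",
"    },\n",
"    prepare_data=lambda run, example: {\n",
"        \"prediction\": run.outputs[\"answer\"],\n",
"        \"reference\": run.outputs[\"contexts\"],  # retrieved docs as ground truth\n",
"        \"input\": example.inputs[\"question\"],\n",
"    },\n",
")\n",
"\n",
"experiment_results = evaluate(\n",
"    predict_rag_answer_with_context,\n",
"    data=dataset_name,\n",
"    evaluators=[answer_hallucination_evaluator],\n",
"    experiment_prefix=\"rag-qa-oai-hallucination\",\n",
")"
]
},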
{
"cell_type": "markdown",
"id": "480a27cb-1a31-4194-b160-8cdcfbf24eea",
"metadata": {},
"source": [
"### Retrieval\n",
"\n",
"Finally, let's consider the case in which we want to compare our retrieved documents to the question."
]
},
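{
"cell_type": "markdown",
"id": "df247034-retrieval-note",
"metadata": {},
"source": [
"One way to do this (again a minimal sketch): grade the retrieved `contexts` directly against the `question` with the reference-free `score_string` evaluator. The criteria wording and experiment prefix are illustrative assumptions."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "df247034-retrieval-sketch",
"metadata": {},
"outputs": [],
"source": [
"from langsmith.evaluation import LangChainStringEvaluator, evaluate\n",
"\n",
"# Grade the retrieved docs against the question alone; no reference answer needed.\n",
"docs_relevance_evaluator = LangChainStringEvaluator(\n",
"    \"score_string\",\n",
"    config={\n",
"        \"criteria\": {\n",
"            \"document_relevance\": \"Are these retrieved docs relevant to the question? A score of [[1]] means none are relevant; [[10]] means all are relevant.\"\n",
"        },\n",
"        \"normalize_by\": 10,\n",
"    },\n",
"    prepare_data=lambda run, example: {\n",
"        \"prediction\": run.outputs[\"contexts\"],\n",
"        \"input\": example.inputs[\"question\"],\n",
"    },\n",
")\n",
"\n",
"experiment_results = evaluate(\n",
"    predict_rag_answer_with_context,\n",
"    data=dataset_name,\n",
"    evaluators=[docs_relevance_evaluator],\n",
"    experiment_prefix=\"rag-qa-oai-doc-relevance\",\n",
")"
]
}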
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
BIN docs/static/img/langsmith_rag_eval.png Normal file (binary not shown; Size: 183 KiB)
BIN docs/static/img/langsmith_rag_flow.png Normal file (binary not shown; Size: 148 KiB)