| {
 | ||
|  "cells": [
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "10f50955-be55-422f-8c62-3a32f8cf02ed",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "# RAG application running locally on an Intel Xeon CPU using LangChain and open-source models"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "48113be6-44bb-4aac-aed3-76a1365b9561",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "Author - Pratool Bharti (pratool.bharti@intel.com)"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "8b10b54b-1572-4ea1-9c1e-1d29fcc3dcd9",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "In this cookbook, we use LangChain tools and open-source models that run locally on CPU. This notebook has been validated on an Intel Xeon 8480+ CPU. We implement a RAG pipeline around the Llama 2 model to answer questions about Intel's Q1 2024 earnings release."
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "acadbcec-3468-4926-8ce5-03b678041c0a",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**Create a conda or virtualenv environment with Python >=3.10 and install the following libraries**\n",
 | ||
|     "<br>\n",
 | ||
|     "\n",
 | ||
|     "`pip install --upgrade langchain langchain-community langchainhub langchain-chroma bs4 gpt4all pypdf pysqlite3-binary` <br>\n",
 | ||
|     "`pip install llama-cpp-python   --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu`"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "84c392c8-700a-42ec-8e94-806597f22e43",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**Register pysqlite3 in `sys.modules` under the name `sqlite3`, since ChromaDB depends on sqlite3 and the system version may be too old for it.**"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 1,
 | ||
|    "id": "145cd491-b388-4ea7-bdc8-2f4995cac6fd",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [],
 | ||
|    "source": [
 | ||
|     "__import__(\"pysqlite3\")\n",
 | ||
|     "import sys\n",
 | ||
|     "\n",
 | ||
|     "sys.modules[\"sqlite3\"] = sys.modules.pop(\"pysqlite3\")"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "14dde7e2-b236-49b9-b3a0-08c06410418c",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**Import essential components from langchain to load and split data**"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 3,
 | ||
|    "id": "887643ba-249e-48d6-9aa7-d25087e8dfbf",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [],
 | ||
|    "source": [
 | ||
|     "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
 | ||
|     "from langchain_community.document_loaders import PyPDFLoader"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "922c0eba-8736-4de5-bd2f-3d0f00b16e43",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**Download Intel Q1 2024 earnings release**"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 4,
 | ||
|    "id": "2d6a2419-5338-4188-8615-a40a65ff8019",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [
 | ||
|     {
 | ||
|      "name": "stdout",
 | ||
|      "output_type": "stream",
 | ||
|      "text": [
 | ||
|       "--2024-07-15 15:04:43--  https://d1io3yog0oux5.cloudfront.net/_11d435a500963f99155ee058df09f574/intel/db/887/9014/earnings_release/Q1+24_EarningsRelease_FINAL.pdf\n",
 | ||
|       "Resolving proxy-dmz.intel.com (proxy-dmz.intel.com)... 10.7.211.16\n",
 | ||
|       "Connecting to proxy-dmz.intel.com (proxy-dmz.intel.com)|10.7.211.16|:912... connected.\n",
 | ||
|       "Proxy request sent, awaiting response... 200 OK\n",
 | ||
|       "Length: 133510 (130K) [application/pdf]\n",
 | ||
|       "Saving to: ‘intel_q1_2024_earnings.pdf’\n",
 | ||
|       "\n",
 | ||
|       "intel_q1_2024_earni 100%[===================>] 130.38K  --.-KB/s    in 0.005s  \n",
 | ||
|       "\n",
 | ||
|       "2024-07-15 15:04:44 (24.6 MB/s) - ‘intel_q1_2024_earnings.pdf’ saved [133510/133510]\n",
 | ||
|       "\n"
 | ||
|      ]
 | ||
|     }
 | ||
|    ],
 | ||
|    "source": [
 | ||
|     "!wget  'https://d1io3yog0oux5.cloudfront.net/_11d435a500963f99155ee058df09f574/intel/db/887/9014/earnings_release/Q1+24_EarningsRelease_FINAL.pdf' -O intel_q1_2024_earnings.pdf"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "e3612627-e105-453d-8a50-bbd6e39dedb5",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**Load the earnings release PDF document with PyPDFLoader**"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 5,
 | ||
|    "id": "cac6278e-ebad-4224-a062-bf6daca24cb0",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [],
 | ||
|    "source": [
 | ||
|     "loader = PyPDFLoader(\"intel_q1_2024_earnings.pdf\")\n",
 | ||
|     "data = loader.load()"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "a7dca43b-1c62-41df-90c7-6ed2904f823d",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**Split the entire document into chunks of at most 500 characters each (`chunk_size` counts characters by default, not tokens)**"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 6,
 | ||
|    "id": "4486adbe-0d0e-4685-8c08-c1774ed6e993",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [],
 | ||
|    "source": [
 | ||
|     "text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
 | ||
|     "all_splits = text_splitter.split_documents(data)"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "af142346-e793-4a52-9a56-63e3be416b3d",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**Look at the first split of the document**"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 7,
 | ||
|    "id": "e4240fd1-898e-4bfc-a377-02c9bc25b56e",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [
 | ||
|     {
 | ||
|      "data": {
 | ||
|       "text/plain": [
 | ||
|        "Document(metadata={'source': 'intel_q1_2024_earnings.pdf', 'page': 0}, page_content='Intel Corporation\\n2200 Mission College Blvd.\\nSanta Clara, CA 95054-1549\\n                                                         \\nNews Release\\n Intel Reports First -Quarter 2024  Financial Results\\nNEWS SUMMARY\\n▪First-quarter revenue of $12.7 billion , up 9%  year over year (YoY).\\n▪First-quarter GAAP earnings (loss) per share (EPS) attributable to Intel was $(0.09) ; non-GAAP EPS \\nattributable to Intel was $0.18 .')"
 | ||
|       ]
 | ||
|      },
 | ||
|      "execution_count": 7,
 | ||
|      "metadata": {},
 | ||
|      "output_type": "execute_result"
 | ||
|     }
 | ||
|    ],
 | ||
|    "source": [
 | ||
|     "all_splits[0]"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "b88d2632-7c1b-49ef-a691-c0eb67d23e6a",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**A major step in RAG is to convert each document split into an embedding and store it in a vector database, so that searching for relevant documents is efficient.** <br>\n",
 | ||
|     "**For that, we import the Chroma vector database from LangChain, along with the open-source GPT4All wrapper for embedding models**"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 8,
 | ||
|    "id": "9ff99dd7-9d47-4239-ba0a-d775792334ba",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [],
 | ||
|    "source": [
 | ||
|     "from langchain_chroma import Chroma\n",
 | ||
|     "from langchain_community.embeddings import GPT4AllEmbeddings"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "b5d1f4dd-dd8d-4a20-95d1-2dbdd204375a",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**In the next step, we download one of the most popular embedding models, \"all-MiniLM-L6-v2\". More details about the model are available at https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2**"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 10,
 | ||
|    "id": "05db3494-5d8e-4a13-9941-26330a86f5e5",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [],
 | ||
|    "source": [
 | ||
|     "model_name = \"all-MiniLM-L6-v2.gguf2.f16.gguf\"\n",
 | ||
|     "gpt4all_kwargs = {\"allow_download\": True}  # boolean flag rather than the string \"True\"\n",
 | ||
|     "embeddings = GPT4AllEmbeddings(model_name=model_name, gpt4all_kwargs=gpt4all_kwargs)"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "4e53999e-1983-46ac-8039-2783e194c3ae",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**Store all the embeddings in the Chroma database**"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 11,
 | ||
|    "id": "0922951a-9ddf-4761-973d-8e9a86f61284",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [],
 | ||
|    "source": [
 | ||
|     "vectorstore = Chroma.from_documents(documents=all_splits, embedding=embeddings)"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "29f94fa0-6c75-4a65-a1a3-debc75422479",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**Now, let's find relevant splits from the documents related to the question**"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 12,
 | ||
|    "id": "88c8152d-ec7a-4f0b-9d86-877789407537",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [
 | ||
|     {
 | ||
|      "name": "stdout",
 | ||
|      "output_type": "stream",
 | ||
|      "text": [
 | ||
|       "4\n"
 | ||
|      ]
 | ||
|     }
 | ||
|    ],
 | ||
|    "source": [
 | ||
|     "question = \"What is Intel CCG revenue in Q1 2024\"\n",
 | ||
|     "docs = vectorstore.similarity_search(question)\n",
 | ||
|     "print(len(docs))"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "53330c6b-cb0f-43f9-b379-2e57ac1e5335",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**Look at the first retrieved document from the vector database**"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 13,
 | ||
|    "id": "43a6d94f-b5c4-47b0-a353-2db4c3d24d9c",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [
 | ||
|     {
 | ||
|      "data": {
 | ||
|       "text/plain": [
 | ||
|        "Document(metadata={'page': 1, 'source': 'intel_q1_2024_earnings.pdf'}, page_content='Client Computing Group (CCG) $7.5 billion up31%\\nData Center and AI (DCAI) $3.0 billion up5%\\nNetwork and Edge (NEX) $1.4 billion down 8%\\nTotal Intel Products revenue $11.9 billion up17%\\nIntel Foundry $4.4 billion down 10%\\nAll other:\\nAltera $342 million down 58%\\nMobileye $239 million down 48%\\nOther $194 million up17%\\nTotal all other revenue $775 million down 46%\\nIntersegment eliminations $(4.4) billion\\nTotal net revenue $12.7 billion up9%\\nIntel Products Highlights')"
 | ||
|       ]
 | ||
|      },
 | ||
|      "execution_count": 13,
 | ||
|      "metadata": {},
 | ||
|      "output_type": "execute_result"
 | ||
|     }
 | ||
|    ],
 | ||
|    "source": [
 | ||
|     "docs[0]"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "64ba074f-4b36-442e-b7e2-b26d6e2815c3",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**Download the Llama-2 model from Hugging Face and store it locally** <br>\n",
 | ||
|     "**You can download different quantization variants of the Llama-2 model from the link below. We use the Q8 version here (7.16GB).** <br>\n",
 | ||
|     "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": null,
 | ||
|    "id": "c8dd0811-6f43-4bc6-b854-2ab377639c9a",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [],
 | ||
|    "source": [
 | ||
|     "!huggingface-cli download TheBloke/Llama-2-7b-Chat-GGUF llama-2-7b-chat.Q8_0.gguf --local-dir . --local-dir-use-symlinks False"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "3895b1f5-f51d-4539-abf0-af33d7ca48ea",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**Import the LangChain components required to load the downloaded LLM**"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 14,
 | ||
|    "id": "fb087088-aa62-44c0-8356-061e9b9f1186",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [],
 | ||
|    "source": [
 | ||
|     "from langchain.callbacks.manager import CallbackManager\n",
 | ||
|     "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
 | ||
|     "from langchain_community.llms import LlamaCpp"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "5a8a111e-2614-4b70-b034-85cd3e7304cb",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**Load the local Llama-2 model using the llama-cpp library**"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 16,
 | ||
|    "id": "fb917da2-c0d7-4995-b56d-26254276e0da",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [
 | ||
|     {
 | ||
|      "name": "stderr",
 | ||
|      "output_type": "stream",
 | ||
|      "text": [
 | ||
|       "llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from llama-2-7b-chat.Q8_0.gguf (version GGUF V2)\n",
 | ||
|       "llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.\n",
 | ||
|       "llama_model_loader: - kv   0:                       general.architecture str              = llama\n",
 | ||
|       "llama_model_loader: - kv   1:                               general.name str              = LLaMA v2\n",
 | ||
|       "llama_model_loader: - kv   2:                       llama.context_length u32              = 4096\n",
 | ||
|       "llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096\n",
 | ||
|       "llama_model_loader: - kv   4:                          llama.block_count u32              = 32\n",
 | ||
|       "llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008\n",
 | ||
|       "llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128\n",
 | ||
|       "llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32\n",
 | ||
|       "llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 32\n",
 | ||
|       "llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000001\n",
 | ||
|       "llama_model_loader: - kv  10:                          general.file_type u32              = 7\n",
 | ||
|       "llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama\n",
 | ||
|       "llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,32000]   = [\"<unk>\", \"<s>\", \"</s>\", \"<0x00>\", \"<...\n",
 | ||
|       "llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...\n",
 | ||
|       "llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...\n",
 | ||
|       "llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32              = 1\n",
 | ||
|       "llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32              = 2\n",
 | ||
|       "llama_model_loader: - kv  17:            tokenizer.ggml.unknown_token_id u32              = 0\n",
 | ||
|       "llama_model_loader: - kv  18:               general.quantization_version u32              = 2\n",
 | ||
|       "llama_model_loader: - type  f32:   65 tensors\n",
 | ||
|       "llama_model_loader: - type q8_0:  226 tensors\n",
 | ||
|       "llm_load_vocab: special tokens cache size = 259\n",
 | ||
|       "llm_load_vocab: token to piece cache size = 0.1684 MB\n",
 | ||
|       "llm_load_print_meta: format           = GGUF V2\n",
 | ||
|       "llm_load_print_meta: arch             = llama\n",
 | ||
|       "llm_load_print_meta: vocab type       = SPM\n",
 | ||
|       "llm_load_print_meta: n_vocab          = 32000\n",
 | ||
|       "llm_load_print_meta: n_merges         = 0\n",
 | ||
|       "llm_load_print_meta: vocab_only       = 0\n",
 | ||
|       "llm_load_print_meta: n_ctx_train      = 4096\n",
 | ||
|       "llm_load_print_meta: n_embd           = 4096\n",
 | ||
|       "llm_load_print_meta: n_layer          = 32\n",
 | ||
|       "llm_load_print_meta: n_head           = 32\n",
 | ||
|       "llm_load_print_meta: n_head_kv        = 32\n",
 | ||
|       "llm_load_print_meta: n_rot            = 128\n",
 | ||
|       "llm_load_print_meta: n_swa            = 0\n",
 | ||
|       "llm_load_print_meta: n_embd_head_k    = 128\n",
 | ||
|       "llm_load_print_meta: n_embd_head_v    = 128\n",
 | ||
|       "llm_load_print_meta: n_gqa            = 1\n",
 | ||
|       "llm_load_print_meta: n_embd_k_gqa     = 4096\n",
 | ||
|       "llm_load_print_meta: n_embd_v_gqa     = 4096\n",
 | ||
|       "llm_load_print_meta: f_norm_eps       = 0.0e+00\n",
 | ||
|       "llm_load_print_meta: f_norm_rms_eps   = 1.0e-06\n",
 | ||
|       "llm_load_print_meta: f_clamp_kqv      = 0.0e+00\n",
 | ||
|       "llm_load_print_meta: f_max_alibi_bias = 0.0e+00\n",
 | ||
|       "llm_load_print_meta: f_logit_scale    = 0.0e+00\n",
 | ||
|       "llm_load_print_meta: n_ff             = 11008\n",
 | ||
|       "llm_load_print_meta: n_expert         = 0\n",
 | ||
|       "llm_load_print_meta: n_expert_used    = 0\n",
 | ||
|       "llm_load_print_meta: causal attn      = 1\n",
 | ||
|       "llm_load_print_meta: pooling type     = 0\n",
 | ||
|       "llm_load_print_meta: rope type        = 0\n",
 | ||
|       "llm_load_print_meta: rope scaling     = linear\n",
 | ||
|       "llm_load_print_meta: freq_base_train  = 10000.0\n",
 | ||
|       "llm_load_print_meta: freq_scale_train = 1\n",
 | ||
|       "llm_load_print_meta: n_ctx_orig_yarn  = 4096\n",
 | ||
|       "llm_load_print_meta: rope_finetuned   = unknown\n",
 | ||
|       "llm_load_print_meta: ssm_d_conv       = 0\n",
 | ||
|       "llm_load_print_meta: ssm_d_inner      = 0\n",
 | ||
|       "llm_load_print_meta: ssm_d_state      = 0\n",
 | ||
|       "llm_load_print_meta: ssm_dt_rank      = 0\n",
 | ||
|       "llm_load_print_meta: model type       = 7B\n",
 | ||
|       "llm_load_print_meta: model ftype      = Q8_0\n",
 | ||
|       "llm_load_print_meta: model params     = 6.74 B\n",
 | ||
|       "llm_load_print_meta: model size       = 6.67 GiB (8.50 BPW) \n",
 | ||
|       "llm_load_print_meta: general.name     = LLaMA v2\n",
 | ||
|       "llm_load_print_meta: BOS token        = 1 '<s>'\n",
 | ||
|       "llm_load_print_meta: EOS token        = 2 '</s>'\n",
 | ||
|       "llm_load_print_meta: UNK token        = 0 '<unk>'\n",
 | ||
|       "llm_load_print_meta: LF token         = 13 '<0x0A>'\n",
 | ||
|       "llm_load_print_meta: max token length = 48\n",
 | ||
|       "llm_load_tensors: ggml ctx size =    0.14 MiB\n",
 | ||
|       "llm_load_tensors:        CPU buffer size =  6828.64 MiB\n",
 | ||
|       "...................................................................................................\n",
 | ||
|       "llama_new_context_with_model: n_ctx      = 2048\n",
 | ||
|       "llama_new_context_with_model: n_batch    = 512\n",
 | ||
|       "llama_new_context_with_model: n_ubatch   = 512\n",
 | ||
|       "llama_new_context_with_model: flash_attn = 0\n",
 | ||
|       "llama_new_context_with_model: freq_base  = 10000.0\n",
 | ||
|       "llama_new_context_with_model: freq_scale = 1\n",
 | ||
|       "llama_kv_cache_init:        CPU KV buffer size =  1024.00 MiB\n",
 | ||
|       "llama_new_context_with_model: KV self size  = 1024.00 MiB, K (f16):  512.00 MiB, V (f16):  512.00 MiB\n",
 | ||
|       "llama_new_context_with_model:        CPU  output buffer size =     0.12 MiB\n",
 | ||
|       "llama_new_context_with_model:        CPU compute buffer size =   164.01 MiB\n",
 | ||
|       "llama_new_context_with_model: graph nodes  = 1030\n",
 | ||
|       "llama_new_context_with_model: graph splits = 1\n",
 | ||
|       "AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 0 | \n",
 | ||
|       "Model metadata: {'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.eos_token_id': '2', 'general.architecture': 'llama', 'llama.context_length': '4096', 'general.name': 'LLaMA v2', 'llama.embedding_length': '4096', 'llama.feed_forward_length': '11008', 'llama.attention.layer_norm_rms_epsilon': '0.000001', 'llama.rope.dimension_count': '128', 'llama.attention.head_count': '32', 'tokenizer.ggml.bos_token_id': '1', 'llama.block_count': '32', 'llama.attention.head_count_kv': '32', 'general.quantization_version': '2', 'tokenizer.ggml.model': 'llama', 'general.file_type': '7'}\n",
 | ||
|       "Using fallback chat format: llama-2\n"
 | ||
|      ]
 | ||
|     }
 | ||
|    ],
 | ||
|    "source": [
 | ||
|     "llm = LlamaCpp(\n",
 | ||
|     "    model_path=\"llama-2-7b-chat.Q8_0.gguf\",\n",
 | ||
|     "    n_gpu_layers=-1,\n",
 | ||
|     "    n_batch=512,\n",
 | ||
|     "    n_ctx=2048,\n",
 | ||
|     "    f16_kv=True,  # MUST be set to True, otherwise you will run into problems after a couple of calls\n",
 | ||
|     "    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),\n",
 | ||
|     "    verbose=True,\n",
 | ||
|     ")"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "43e06f56-ef97-451b-87d9-8465ea442aed",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**Now let's ask the Llama model the same question without showing it the earnings release.**"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 17,
 | ||
|    "id": "1033dd82-5532-437d-a548-27695e109589",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [
 | ||
|     {
 | ||
|      "name": "stdout",
 | ||
|      "output_type": "stream",
 | ||
|      "text": [
 | ||
|       "?\n",
 | ||
|       "(NASDAQ:INTC)\n",
 | ||
|       "Intel's CCG (Client Computing Group) revenue for Q1 2024 was $9.6 billion, a decrease of 35% from the previous quarter and a decrease of 42% from the same period last year."
 | ||
|      ]
 | ||
|     },
 | ||
|     {
 | ||
|      "name": "stderr",
 | ||
|      "output_type": "stream",
 | ||
|      "text": [
 | ||
|       "\n",
 | ||
|       "llama_print_timings:        load time =     131.20 ms\n",
 | ||
|       "llama_print_timings:      sample time =      16.05 ms /    68 runs   (    0.24 ms per token,  4236.76 tokens per second)\n",
 | ||
|       "llama_print_timings: prompt eval time =     131.14 ms /    16 tokens (    8.20 ms per token,   122.01 tokens per second)\n",
 | ||
|       "llama_print_timings:        eval time =    3225.00 ms /    67 runs   (   48.13 ms per token,    20.78 tokens per second)\n",
 | ||
|       "llama_print_timings:       total time =    3466.40 ms /    83 tokens\n"
 | ||
|      ]
 | ||
|     },
 | ||
|     {
 | ||
|      "data": {
 | ||
|       "text/plain": [
 | ||
|        "\"?\\n(NASDAQ:INTC)\\nIntel's CCG (Client Computing Group) revenue for Q1 2024 was $9.6 billion, a decrease of 35% from the previous quarter and a decrease of 42% from the same period last year.\""
 | ||
|       ]
 | ||
|      },
 | ||
|      "execution_count": 17,
 | ||
|      "metadata": {},
 | ||
|      "output_type": "execute_result"
 | ||
|     }
 | ||
|    ],
 | ||
|    "source": [
 | ||
|     "llm.invoke(question)"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "75f5cb10-746f-4e37-9386-b85a4d2b84ef",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**As you can see, the model gives wrong information. The correct answer is that CCG revenue in Q1 2024 was $7.5B. Now let's apply RAG using the earnings release document**"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "0f4150ec-5692-4756-b11a-22feb7ab88ff",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**In RAG, we modify the input prompt by adding relevant documents alongside the question. Here, we use one of the popular RAG prompts**"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 18,
 | ||
|    "id": "226c14b0-f43e-4a1f-a1e4-04731d467ec4",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [
 | ||
|     {
 | ||
|      "data": {
 | ||
|       "text/plain": [
 | ||
|        "[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template=\"You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\\nQuestion: {question} \\nContext: {context} \\nAnswer:\"))]"
 | ||
|       ]
 | ||
|      },
 | ||
|      "execution_count": 18,
 | ||
|      "metadata": {},
 | ||
|      "output_type": "execute_result"
 | ||
|     }
 | ||
|    ],
 | ||
|    "source": [
 | ||
|     "from langchain import hub\n",
 | ||
|     "\n",
 | ||
|     "rag_prompt = hub.pull(\"rlm/rag-prompt\")\n",
 | ||
|     "rag_prompt.messages"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "77deb6a0-0950-450a-916a-f2a029676c20",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**Join the content of all retrieved documents into a single string**"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 19,
 | ||
|    "id": "2dbc3327-6ef3-4c1f-8797-0c71964b0921",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [],
 | ||
|    "source": [
 | ||
|     "def format_docs(docs):\n",
 | ||
|     "    return \"\\n\\n\".join(doc.page_content for doc in docs)"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "2e2d9f18-49d0-43a3-bea8-78746ffa86b7",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**The last step is to build a chain with LangChain that forms an end-to-end pipeline. It takes the question and the context as input.**"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 20,
 | ||
|    "id": "427379c2-51ff-4e0f-8278-a45221363299",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [],
 | ||
|    "source": [
 | ||
|     "from langchain_core.output_parsers import StrOutputParser\n",
 | ||
|     "from langchain_core.runnables import RunnablePassthrough, RunnablePick\n",
 | ||
|     "\n",
 | ||
|     "# Chain\n",
 | ||
|     "chain = (\n",
 | ||
|     "    RunnablePassthrough.assign(context=RunnablePick(\"context\") | format_docs)\n",
 | ||
|     "    | rag_prompt\n",
 | ||
|     "    | llm\n",
 | ||
|     "    | StrOutputParser()\n",
 | ||
|     ")"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 21,
 | ||
|    "id": "095d6280-c949-4d00-8e32-8895a82d245f",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [
 | ||
|     {
 | ||
|      "name": "stderr",
 | ||
|      "output_type": "stream",
 | ||
|      "text": [
 | ||
|       "Llama.generate: prefix-match hit\n"
 | ||
|      ]
 | ||
|     },
 | ||
|     {
 | ||
|      "name": "stdout",
 | ||
|      "output_type": "stream",
 | ||
|      "text": [
 | ||
|       " Based on the provided context, Intel CCG revenue in Q1 2024 was $7.5 billion up 31%."
 | ||
|      ]
 | ||
|     },
 | ||
|     {
 | ||
|      "name": "stderr",
 | ||
|      "output_type": "stream",
 | ||
|      "text": [
 | ||
|       "\n",
 | ||
|       "llama_print_timings:        load time =     131.20 ms\n",
 | ||
|       "llama_print_timings:      sample time =       7.74 ms /    31 runs   (    0.25 ms per token,  4004.13 tokens per second)\n",
 | ||
|       "llama_print_timings: prompt eval time =    2529.41 ms /   674 tokens (    3.75 ms per token,   266.46 tokens per second)\n",
 | ||
|       "llama_print_timings:        eval time =    1542.94 ms /    30 runs   (   51.43 ms per token,    19.44 tokens per second)\n",
 | ||
|       "llama_print_timings:       total time =    4123.68 ms /   704 tokens\n"
 | ||
|      ]
 | ||
|     },
 | ||
|     {
 | ||
|      "data": {
 | ||
|       "text/plain": [
 | ||
|        "' Based on the provided context, Intel CCG revenue in Q1 2024 was $7.5 billion up 31%.'"
 | ||
|       ]
 | ||
|      },
 | ||
|      "execution_count": 21,
 | ||
|      "metadata": {},
 | ||
|      "output_type": "execute_result"
 | ||
|     }
 | ||
|    ],
 | ||
|    "source": [
 | ||
|     "chain.invoke({\"context\": docs, \"question\": question})"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "638364b2-6bd2-4471-9961-d3a1d1b9d4ee",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**Now the result is correct and matches what the earnings release states.** <br>\n",
 | ||
|     "**To automate further, we create a chain that takes only the question as input and calls a retriever itself, so that we don't need to retrieve documents separately**"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 22,
 | ||
|    "id": "4654e5b7-635f-4767-8b31-4c430164cdd5",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [],
 | ||
|    "source": [
 | ||
|     "retriever = vectorstore.as_retriever()\n",
 | ||
|     "qa_chain = (\n",
 | ||
|     "    {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
 | ||
|     "    | rag_prompt\n",
 | ||
|     "    | llm\n",
 | ||
|     "    | StrOutputParser()\n",
 | ||
|     ")"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "0979f393-fd0a-4e82-b844-68371c6ad68f",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "**Now we only need to pass the question to the chain and it will fetch the contexts directly from the vector database to generate the answer**\n",
 | ||
|     "<br>\n",
 | ||
|     "**Let's try with another question**"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 26,
 | ||
|    "id": "3ea07b82-e6ec-4084-85f4-191373530172",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [
 | ||
|     {
 | ||
|      "name": "stderr",
 | ||
|      "output_type": "stream",
 | ||
|      "text": [
 | ||
|       "Llama.generate: prefix-match hit\n"
 | ||
|      ]
 | ||
|     },
 | ||
|     {
 | ||
|      "name": "stdout",
 | ||
|      "output_type": "stream",
 | ||
|      "text": [
 | ||
|       " According to the provided context, Intel DCAI revenue in Q1 2024 was $3.0 billion up 5%."
 | ||
|      ]
 | ||
|     },
 | ||
|     {
 | ||
|      "name": "stderr",
 | ||
|      "output_type": "stream",
 | ||
|      "text": [
 | ||
|       "\n",
 | ||
|       "llama_print_timings:        load time =     131.20 ms\n",
 | ||
|       "llama_print_timings:      sample time =       6.28 ms /    31 runs   (    0.20 ms per token,  4937.88 tokens per second)\n",
 | ||
|       "llama_print_timings: prompt eval time =    2681.93 ms /   730 tokens (    3.67 ms per token,   272.19 tokens per second)\n",
 | ||
|       "llama_print_timings:        eval time =    1471.07 ms /    30 runs   (   49.04 ms per token,    20.39 tokens per second)\n",
 | ||
|       "llama_print_timings:       total time =    4206.77 ms /   760 tokens\n"
 | ||
|      ]
 | ||
|     },
 | ||
|     {
 | ||
|      "data": {
 | ||
|       "text/plain": [
 | ||
|        "' According to the provided context, Intel DCAI revenue in Q1 2024 was $3.0 billion up 5%.'"
 | ||
|       ]
 | ||
|      },
 | ||
|      "execution_count": 26,
 | ||
|      "metadata": {},
 | ||
|      "output_type": "execute_result"
 | ||
|     }
 | ||
|    ],
 | ||
|    "source": [
 | ||
|     "qa_chain.invoke(\"what is Intel DCAI revenue in Q1 2024?\")"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": null,
 | ||
|    "id": "9407f2a0-4a35-4315-8e96-02fcb80f210c",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [],
 | ||
|    "source": []
 | ||
|   }
 | ||
|  ],
 | ||
|  "metadata": {
 | ||
|   "kernelspec": {
 | ||
|    "display_name": "Python 3.11.1 64-bit",
 | ||
|    "language": "python",
 | ||
|    "name": "python3"
 | ||
|   },
 | ||
|   "language_info": {
 | ||
|    "codemirror_mode": {
 | ||
|     "name": "ipython",
 | ||
|     "version": 3
 | ||
|    },
 | ||
|    "file_extension": ".py",
 | ||
|    "mimetype": "text/x-python",
 | ||
|    "name": "python",
 | ||
|    "nbconvert_exporter": "python",
 | ||
|    "pygments_lexer": "ipython3",
 | ||
|    "version": "3.11.1"
 | ||
|   },
 | ||
|   "vscode": {
 | ||
|    "interpreter": {
 | ||
|     "hash": "1a1af0ee75eeea9e2e1ee996c87e7a2b11a0bebd85af04bb136d915cefc0abce"
 | ||
|    }
 | ||
|   }
 | ||
|  },
 | ||
|  "nbformat": 4,
 | ||
|  "nbformat_minor": 5
 | ||
| }
 |