Building RAG agents locally using open source LLMs on Intel CPU (#28302)

**Description:** Added a cookbook that showcases how to build a RAG agent pipeline locally using open-source LLM and embedding models on an Intel Xeon CPU. It uses the Llama 3.1:8B model from Ollama as the LLM and nomic-embed-text-v1.5 from NomicEmbeddings for embeddings. The whole experiment was developed and tested on a 4th Gen Intel Xeon Scalable CPU.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
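
The routing behaviour this cookbook describes (answer from the vector database when it holds relevant documents, otherwise fall back to web search) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the notebook's code: `Document`, `overlap_score`, `route_question`, the `web_search` callable, and the `threshold` value are all hypothetical names, and a toy word-overlap score stands in for the embedding retrieval and relevance check that the real pipeline would run with nomic-embed-text-v1.5 and Llama 3.1.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Document:
    text: str
    source: str  # "vectorstore" or "web_search"


def overlap_score(question: str, text: str) -> float:
    """Toy relevance score: fraction of question words that appear in the text.

    Assumption: this replaces embedding similarity purely for illustration.
    """
    q_words = set(question.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


def route_question(
    question: str,
    corpus: List[str],
    web_search: Callable[[str], List[str]],
    threshold: float = 0.5,  # hypothetical cut-off for "relevant enough"
) -> List[Document]:
    """Return local documents if any look relevant, else web search results."""
    relevant = [t for t in corpus if overlap_score(question, t) >= threshold]
    if relevant:
        return [Document(t, "vectorstore") for t in relevant]
    # Nothing relevant locally, so take the web search path instead.
    return [Document(t, "web_search") for t in web_search(question)]


def fake_web_search(query: str) -> List[str]:
    """Stub standing in for a real web search tool."""
    return [f"stub result for: {query}"]


corpus = ["intel xeon cpu runs llama locally"]
local_hit = route_question("xeon cpu llama", corpus, fake_web_search)
web_hit = route_question("weather in paris today", corpus, fake_web_search)
print(local_hit[0].source, web_hit[0].source)
```

In a real pipeline the retrieved documents would then be passed to the LLM to generate the answer; that step is omitted here because it needs a running local model.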
2025-08-25 04:23:05 +00:00 · 2024-11-27 07:40:09 -08:00 · c09000f20e
commit c09000f20e
parent 607c60a594
2 changed files with 657 additions and 1 deletions
--- a/cookbook/README.md
+++ b/cookbook/README.md
@@ -63,4 +63,5 @@ Notebook | Description
 [oracleai_demo.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/oracleai_demo.ipynb) | This guide outlines how to utilize Oracle AI Vector Search alongside Langchain for an end-to-end RAG pipeline, providing step-by-step examples. The process includes loading documents from various sources using OracleDocLoader, summarizing them either within or outside the database with OracleSummary, and generating embeddings similarly through OracleEmbeddings. It also covers chunking documents according to specific requirements using Advanced Oracle Capabilities from OracleTextSplitter, and finally, storing and indexing these documents in a Vector Store for querying with OracleVS.
 [rag-locally-on-intel-cpu.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/rag-locally-on-intel-cpu.ipynb) | Perform Retrieval-Augmented-Generation (RAG) on locally downloaded open-source models using langchain and open source tools and execute it on Intel Xeon CPU. We showed an example of how to apply RAG on Llama 2 model and enable it to answer the queries related to Intel Q1 2024 earnings release.
 [visual_RAG_vdms.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/visual_RAG_vdms.ipynb) | Performs Visual Retrieval-Augmented-Generation (RAG) using videos and scene descriptions generated by open source models.
-[contextual_rag.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/contextual_rag.ipynb) | Performs contextual retrieval-augmented generation (RAG) prepending chunk-specific explanatory context to each chunk before embedding.
+[contextual_rag.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/contextual_rag.ipynb) | Performs contextual retrieval-augmented generation (RAG) prepending chunk-specific explanatory context to each chunk before embedding.
+[rag-agents-locally-on-intel-cpu.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/local_rag_agents_intel_cpu.ipynb) | Build a RAG agent locally with open source models that routes questions through one of two paths to find answers. The agent generates answers based on documents retrieved from either the vector database or retrieved from web search. If the vector database lacks relevant information, the agent opts for web search. Open-source models for LLM and embeddings are used locally on an Intel Xeon CPU to execute this pipeline.
--- a/cookbook/local_rag_agents_intel_cpu.ipynb
+++ b/cookbook/local_rag_agents_intel_cpu.ipynb