mirror of
				https://github.com/hwchase17/langchain.git
				synced 2025-10-26 05:10:22 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			674 lines
		
	
	
		
			27 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
		
	
	
	
	
	
| {
 | |
|  "cells": [
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "# Use LangChain, GPT, and Deep Lake to work with a code base\n",
 | |
|     "In this tutorial, we use LangChain and Deep Lake with GPT to analyze the code base of LangChain itself."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "## Design"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "1. Prepare data:\n",
 | |
|     "   1. Load all Python project files using `langchain.document_loaders.TextLoader`. We will call these files the **documents**.\n",
 | |
|     "   2. Split all documents into chunks using `langchain.text_splitter.CharacterTextSplitter`.\n",
 | |
|     "   3. Embed the chunks and upload them to Deep Lake using `langchain.embeddings.openai.OpenAIEmbeddings` and `langchain.vectorstores.DeepLake`.\n",
 | |
|     "2. Question-Answering:\n",
 | |
|     "   1. Build a chain from `langchain.chat_models.ChatOpenAI` and `langchain.chains.ConversationalRetrievalChain`\n",
 | |
|     "   2. Prepare questions.\n",
 | |
|     "   3. Get answers by running the chain.\n"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "## Implementation"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {
 | |
|     "tags": []
 | |
|    },
 | |
|    "source": [
 | |
|     "### Integration preparations"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "We need to set up keys for external services and install the necessary Python libraries."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 3,
 | |
|    "metadata": {
 | |
|     "tags": []
 | |
|    },
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "#!python3 -m pip install --upgrade langchain deeplake openai"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "Set up OpenAI embeddings and the Deep Lake multi-modal vector store API, then authenticate.\n",
 | |
|     "\n",
 | |
|     "For the full Deep Lake documentation, see https://docs.activeloop.ai/ and for the API reference, see https://docs.deeplake.ai/en/latest/"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 5,
 | |
|    "metadata": {
 | |
|     "tags": []
 | |
|    },
 | |
|    "outputs": [
 | |
|     {
 | |
|      "name": "stdin",
 | |
|      "output_type": "stream",
 | |
|      "text": [
 | |
|       " ········\n"
 | |
|      ]
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "import os\n",
 | |
|     "from getpass import getpass\n",
 | |
|     "\n",
 | |
|     "os.environ['OPENAI_API_KEY'] = getpass()\n",
 | |
|     "# Please manually enter OpenAI Key"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "Authenticate with Deep Lake if you want to create your own dataset and publish it. You can get an API key from the platform at https://app.activeloop.ai"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 6,
 | |
|    "metadata": {
 | |
|     "tags": []
 | |
|    },
 | |
|    "outputs": [
 | |
|     {
 | |
|      "name": "stdin",
 | |
|      "output_type": "stream",
 | |
|      "text": [
 | |
|       " ········\n"
 | |
|      ]
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "DEEPLAKE_ACCOUNT_NAME = getpass()"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 7,
 | |
|    "metadata": {
 | |
|     "tags": []
 | |
|    },
 | |
|    "outputs": [
 | |
|     {
 | |
|      "name": "stdin",
 | |
|      "output_type": "stream",
 | |
|      "text": [
 | |
|       " ········\n"
 | |
|      ]
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "os.environ['DEEPLAKE_KEY'] = getpass()"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 22,
 | |
|    "metadata": {
 | |
|     "tags": []
 | |
|    },
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "!activeloop login -t $DEEPLAKE_KEY"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "### Prepare data "
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "Load all repository files. Here we assume this notebook was downloaded as part of a LangChain fork, and we work with the Python files of the `langchain` repo.\n",
 | |
|     "\n",
 | |
|     "If you want to use files from a different repo, change `root_dir` to the root directory of that repo."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 8,
 | |
|    "metadata": {
 | |
|     "tags": []
 | |
|    },
 | |
|    "outputs": [
 | |
|     {
 | |
|      "name": "stdout",
 | |
|      "output_type": "stream",
 | |
|      "text": [
 | |
|       "1147\n"
 | |
|      ]
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "from langchain.document_loaders import TextLoader\n",
 | |
|     "\n",
 | |
|     "root_dir = '../../../..'\n",
 | |
|     "\n",
 | |
|     "docs = []\n",
 | |
|     "for dirpath, dirnames, filenames in os.walk(root_dir):\n",
 | |
|     "    for file in filenames:\n",
 | |
|     "        if file.endswith('.py') and '/.venv/' not in dirpath:\n",
 | |
|     "            try: \n",
 | |
|     "                loader = TextLoader(os.path.join(dirpath, file), encoding='utf-8')\n",
 | |
|     "                docs.extend(loader.load_and_split())\n",
 | |
|     "            except Exception:  # skip files that fail to load or decode\n",
 | |
|     "                pass\n",
 | |
|     "print(f'{len(docs)}')"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "Then, split the documents into chunks."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 13,
 | |
|    "metadata": {
 | |
|     "tags": []
 | |
|    },
 | |
|    "outputs": [
 | |
|     {
 | |
|      "name": "stderr",
 | |
|      "output_type": "stream",
 | |
|      "text": [
 | |
|       "Created a chunk of size 1620, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1213, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1263, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1448, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1120, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1148, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1826, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1260, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1195, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 2147, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1410, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1269, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1030, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1046, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1024, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1026, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1285, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1370, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1031, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1999, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1029, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1120, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1033, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1143, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1416, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 2482, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1890, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1418, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1848, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1069, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 2369, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1045, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1501, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1208, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1950, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1283, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1414, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1304, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1224, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1060, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 2461, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1099, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1178, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1449, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1345, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 3359, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 2248, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1589, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 2104, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1505, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1387, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1215, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1240, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1635, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1075, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 2180, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1791, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1555, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1082, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1225, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1287, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1085, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1117, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1966, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1150, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1285, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1150, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1585, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1208, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1267, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1542, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1183, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 2424, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1017, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1304, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1379, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1324, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1205, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1056, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1195, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 3608, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1058, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1075, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1217, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1109, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1440, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1046, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1220, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1403, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1241, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1427, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1049, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1580, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1565, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1131, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1425, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1054, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1027, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 2559, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1028, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1382, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1888, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1475, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1652, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1891, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1899, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1021, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1085, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1854, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1672, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 2537, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1251, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1734, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1642, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1376, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1253, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1642, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1419, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1438, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1427, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1684, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1760, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1157, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 2504, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1082, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 2268, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1784, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1311, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 2972, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1144, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1825, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1508, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 2901, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1715, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1062, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1206, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1102, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1184, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1002, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1065, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1871, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1754, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 2413, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1771, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 2054, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 2000, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 2061, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1066, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1419, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1368, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1008, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1227, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1745, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 2296, which is longer than the specified 1000\n",
 | |
|       "Created a chunk of size 1083, which is longer than the specified 1000\n"
 | |
|      ]
 | |
|     },
 | |
|     {
 | |
|      "name": "stdout",
 | |
|      "output_type": "stream",
 | |
|      "text": [
 | |
|       "3477\n"
 | |
|      ]
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "from langchain.text_splitter import CharacterTextSplitter\n",
 | |
|     "\n",
 | |
|     "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
 | |
|     "texts = text_splitter.split_documents(docs)\n",
 | |
|     "print(f\"{len(texts)}\")"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "Then embed the chunks and upload them to Deep Lake.\n",
 | |
|     "\n",
 | |
|     "This can take several minutes. "
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 14,
 | |
|    "metadata": {
 | |
|     "tags": []
 | |
|    },
 | |
|    "outputs": [
 | |
|     {
 | |
|      "data": {
 | |
|       "text/plain": [
 | |
|        "OpenAIEmbeddings(client=<class 'openai.api_resources.embedding.Embedding'>, model='text-embedding-ada-002', document_model_name='text-embedding-ada-002', query_model_name='text-embedding-ada-002', embedding_ctx_length=8191, openai_api_key=None, openai_organization=None, allowed_special=set(), disallowed_special='all', chunk_size=1000, max_retries=6)"
 | |
|       ]
 | |
|      },
 | |
|      "execution_count": 14,
 | |
|      "metadata": {},
 | |
|      "output_type": "execute_result"
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "from langchain.embeddings.openai import OpenAIEmbeddings\n",
 | |
|     "\n",
 | |
|     "embeddings = OpenAIEmbeddings()\n",
 | |
|     "embeddings"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": null,
 | |
|    "metadata": {
 | |
|     "tags": []
 | |
|    },
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "from langchain.vectorstores import DeepLake\n",
 | |
|     "\n",
 | |
|     "db = DeepLake.from_documents(texts, embeddings, dataset_path=f\"hub://{DEEPLAKE_ACCOUNT_NAME}/langchain-code\")\n",
 | |
|     "db"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "### Question Answering\n",
 | |
|     "First load the dataset, then construct the retriever, and finally construct the conversational chain."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 16,
 | |
|    "metadata": {
 | |
|     "tags": []
 | |
|    },
 | |
|    "outputs": [
 | |
|     {
 | |
|      "name": "stderr",
 | |
|      "output_type": "stream",
 | |
|      "text": [
 | |
|       "-"
 | |
|      ]
 | |
|     },
 | |
|     {
 | |
|      "name": "stdout",
 | |
|      "output_type": "stream",
 | |
|      "text": [
 | |
|       "This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/user_name/langchain-code\n",
 | |
|       "\n"
 | |
|      ]
 | |
|     },
 | |
|     {
 | |
|      "name": "stderr",
 | |
|      "output_type": "stream",
 | |
|      "text": [
 | |
|       "/"
 | |
|      ]
 | |
|     },
 | |
|     {
 | |
|      "name": "stdout",
 | |
|      "output_type": "stream",
 | |
|      "text": [
 | |
|       "hub://user_name/langchain-code loaded successfully.\n",
 | |
|       "\n"
 | |
|      ]
 | |
|     },
 | |
|     {
 | |
|      "name": "stderr",
 | |
|      "output_type": "stream",
 | |
|      "text": [
 | |
|       "Deep Lake Dataset in hub://user_name/langchain-code already exists, loading from the storage\n"
 | |
|      ]
 | |
|     },
 | |
|     {
 | |
|      "name": "stdout",
 | |
|      "output_type": "stream",
 | |
|      "text": [
 | |
|       "Dataset(path='hub://user_name/langchain-code', read_only=True, tensors=['embedding', 'ids', 'metadata', 'text'])\n",
 | |
|       "\n",
 | |
|       "  tensor     htype      shape       dtype  compression\n",
 | |
|       "  -------   -------    -------     -------  ------- \n",
 | |
|       " embedding  generic  (3477, 1536)  float32   None   \n",
 | |
|       "    ids      text     (3477, 1)      str     None   \n",
 | |
|       " metadata    json     (3477, 1)      str     None   \n",
 | |
|       "   text      text     (3477, 1)      str     None   \n"
 | |
|      ]
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "db = DeepLake(dataset_path=f\"hub://{DEEPLAKE_ACCOUNT_NAME}/langchain-code\", read_only=True, embedding_function=embeddings)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 17,
 | |
|    "metadata": {
 | |
|     "tags": []
 | |
|    },
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "retriever = db.as_retriever()\n",
 | |
|     "retriever.search_kwargs['distance_metric'] = 'cos'\n",
 | |
|     "retriever.search_kwargs['fetch_k'] = 20\n",
 | |
|     "retriever.search_kwargs['maximal_marginal_relevance'] = True\n",
 | |
|     "retriever.search_kwargs['k'] = 20"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "You can also specify user-defined functions using [Deep Lake filters](https://docs.deeplake.ai/en/latest/deeplake.core.dataset.html#deeplake.core.dataset.Dataset.filter)."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 18,
 | |
|    "metadata": {
 | |
|     "tags": []
 | |
|    },
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "def filter(x):\n",
 | |
|     "    # filter based on source code\n",
 | |
|     "    if 'something' in x['text'].data()['value']:\n",
 | |
|     "        return False\n",
 | |
|     "    \n",
 | |
|     "    # filter based on path e.g. extension\n",
 | |
|     "    metadata =  x['metadata'].data()['value']\n",
 | |
|     "    return 'only_this' in metadata['source'] or 'also_that' in metadata['source']\n",
 | |
|     "\n",
 | |
|     "### turn on below for custom filtering\n",
 | |
|     "# retriever.search_kwargs['filter'] = filter"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 19,
 | |
|    "metadata": {
 | |
|     "tags": []
 | |
|    },
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "from langchain.chat_models import ChatOpenAI\n",
 | |
|     "from langchain.chains import ConversationalRetrievalChain\n",
 | |
|     "\n",
 | |
|     "model = ChatOpenAI(model='gpt-3.5-turbo') # 'ada' 'gpt-3.5-turbo' 'gpt-4',\n",
 | |
|     "qa = ConversationalRetrievalChain.from_llm(model,retriever=retriever)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": null,
 | |
|    "metadata": {
 | |
|     "tags": []
 | |
|    },
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "questions = [\n",
 | |
|     "    \"What is the class hierarchy?\",\n",
 | |
|     "    # \"What classes are derived from the Chain class?\",\n",
 | |
|     "    # \"What classes and functions in the ./langchain/utilities/ folder are not covered by unit tests?\",\n",
 | |
|     "    # \"What one improvement do you propose in code in relation to the class hierarchy for the Chain class?\",\n",
 | |
|     "] \n",
 | |
|     "chat_history = []\n",
 | |
|     "\n",
 | |
|     "for question in questions:  \n",
 | |
|     "    result = qa({\"question\": question, \"chat_history\": chat_history})\n",
 | |
|     "    chat_history.append((question, result['answer']))\n",
 | |
|     "    print(f\"-> **Question**: {question} \\n\")\n",
 | |
|     "    print(f\"**Answer**: {result['answer']} \\n\")\n"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {
 | |
|     "tags": []
 | |
|    },
 | |
|    "source": [
 | |
|     "-> **Question**: What is the class hierarchy? \n",
 | |
|     "\n",
 | |
|     "**Answer**: There are several class hierarchies in the provided code, so I'll list a few:\n",
 | |
|     "\n",
 | |
|     "1. `BaseModel` -> `ConstitutionalPrinciple`: `ConstitutionalPrinciple` is a subclass of `BaseModel`.\n",
 | |
|     "2. `BasePromptTemplate` -> `StringPromptTemplate`, `AIMessagePromptTemplate`, `BaseChatPromptTemplate`, `ChatMessagePromptTemplate`, `ChatPromptTemplate`, `HumanMessagePromptTemplate`, `MessagesPlaceholder`, `SystemMessagePromptTemplate`, `FewShotPromptTemplate`, `FewShotPromptWithTemplates`, `Prompt`, `PromptTemplate`: All of these classes are subclasses of `BasePromptTemplate`.\n",
 | |
|     "3. `APIChain`, `Chain`, `MapReduceDocumentsChain`, `MapRerankDocumentsChain`, `RefineDocumentsChain`, `StuffDocumentsChain`, `HypotheticalDocumentEmbedder`, `LLMChain`, `LLMBashChain`, `LLMCheckerChain`, `LLMMathChain`, `LLMRequestsChain`, `PALChain`, `QAWithSourcesChain`, `VectorDBQAWithSourcesChain`, `VectorDBQA`, `SQLDatabaseChain`: All of these classes are subclasses of `Chain`.\n",
 | |
|     "4. `BaseLoader`: `BaseLoader` is a subclass of `ABC`.\n",
 | |
|     "5. `BaseTracer` -> `ChainRun`, `LLMRun`, `SharedTracer`, `ToolRun`, `Tracer`, `TracerException`, `TracerSession`: All of these classes are subclasses of `BaseTracer`.\n",
 | |
|     "6. `OpenAIEmbeddings`, `HuggingFaceEmbeddings`, `CohereEmbeddings`, `JinaEmbeddings`, `LlamaCppEmbeddings`, `HuggingFaceHubEmbeddings`, `TensorflowHubEmbeddings`, `SagemakerEndpointEmbeddings`, `HuggingFaceInstructEmbeddings`, `SelfHostedEmbeddings`, `SelfHostedHuggingFaceEmbeddings`, `SelfHostedHuggingFaceInstructEmbeddings`, `FakeEmbeddings`, `AlephAlphaAsymmetricSemanticEmbedding`, `AlephAlphaSymmetricSemanticEmbedding`: All of these classes are subclasses of `BaseLLM`. \n",
 | |
|     "\n",
 | |
|     "\n",
 | |
|     "-> **Question**: What classes are derived from the Chain class? \n",
 | |
|     "\n",
 | |
|     "**Answer**: There are multiple classes that are derived from the Chain class. Some of them are:\n",
 | |
|     "- APIChain\n",
 | |
|     "- AnalyzeDocumentChain\n",
 | |
|     "- ChatVectorDBChain\n",
 | |
|     "- CombineDocumentsChain\n",
 | |
|     "- ConstitutionalChain\n",
 | |
|     "- ConversationChain\n",
 | |
|     "- GraphQAChain\n",
 | |
|     "- HypotheticalDocumentEmbedder\n",
 | |
|     "- LLMChain\n",
 | |
|     "- LLMCheckerChain\n",
 | |
|     "- LLMRequestsChain\n",
 | |
|     "- LLMSummarizationCheckerChain\n",
 | |
|     "- MapReduceChain\n",
 | |
|     "- OpenAPIEndpointChain\n",
 | |
|     "- PALChain\n",
 | |
|     "- QAWithSourcesChain\n",
 | |
|     "- RetrievalQA\n",
 | |
|     "- RetrievalQAWithSourcesChain\n",
 | |
|     "- SequentialChain\n",
 | |
|     "- SQLDatabaseChain\n",
 | |
|     "- TransformChain\n",
 | |
|     "- VectorDBQA\n",
 | |
|     "- VectorDBQAWithSourcesChain\n",
 | |
|     "\n",
 | |
|     "There might be more classes that are derived from the Chain class as it is possible to create custom classes that extend the Chain class.\n",
 | |
|     "\n",
 | |
|     "\n",
 | |
|     "-> **Question**: What classes and functions in the ./langchain/utilities/ folder are not covered by unit tests? \n",
 | |
|     "\n",
 | |
|     "**Answer**: All classes and functions in the `./langchain/utilities/` folder seem to have unit tests written for them. \n"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": []
 | |
|   }
 | |
|  ],
 | |
|  "metadata": {
 | |
|   "kernelspec": {
 | |
|    "display_name": "Python 3 (ipykernel)",
 | |
|    "language": "python",
 | |
|    "name": "python3"
 | |
|   },
 | |
|   "language_info": {
 | |
|    "codemirror_mode": {
 | |
|     "name": "ipython",
 | |
|     "version": 3
 | |
|    },
 | |
|    "file_extension": ".py",
 | |
|    "mimetype": "text/x-python",
 | |
|    "name": "python",
 | |
|    "nbconvert_exporter": "python",
 | |
|    "pygments_lexer": "ipython3",
 | |
|    "version": "3.10.6"
 | |
|   }
 | |
|  },
 | |
|  "nbformat": 4,
 | |
|  "nbformat_minor": 4
 | |
| }
 |