{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Use LangChain, GPT and Activeloop's Deep Lake to work with a code base\n", "In this tutorial, we use LangChain and Activeloop's Deep Lake with GPT to analyze the code base of LangChain itself." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Design" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "1. Prepare data:\n", "    1. Load all Python project files using `langchain_community.document_loaders.TextLoader`. We will call these files the **documents**.\n", "    2. Split all documents into chunks using `langchain_text_splitters.CharacterTextSplitter`.\n", "    3. Embed the chunks and upload them to Deep Lake using `langchain.embeddings.openai.OpenAIEmbeddings` and `langchain_community.vectorstores.DeepLake`.\n", "2. Question answering:\n", "    1. Build a chain from `langchain.chat_models.ChatOpenAI` and `langchain.chains.ConversationalRetrievalChain`.\n", "    2. Prepare questions.\n", "    3. Get answers by running the chain.\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Implementation" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Integration preparations" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We need to set up keys for external services and install the necessary Python libraries." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "#!python3 -m pip install --upgrade langchain langchain-community langchain-deeplake openai" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Set up OpenAI embeddings and the Deep Lake multi-modal vector store API, then authenticate. 
\n", "\n", "For the full Deep Lake documentation, see https://docs.activeloop.ai/; for the API reference, see https://docs.deeplake.ai/en/latest/" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [] }, "outputs": [], "source": [ "import os\n", "from getpass import getpass\n", "\n", "# Prompt for the OpenAI API key only if it is not already set in the environment.\n", "if \"OPENAI_API_KEY\" not in os.environ:\n", "    os.environ[\"OPENAI_API_KEY\"] = getpass(\"OpenAI API Key:\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Authenticate with Deep Lake if you want to create your own dataset and publish it. You can get an API token from the platform at [app.activeloop.ai](https://app.activeloop.ai)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [] }, "outputs": [], "source": [ "activeloop_token = getpass(\"Activeloop Token:\")\n", "os.environ[\"ACTIVELOOP_TOKEN\"] = activeloop_token" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Prepare data" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Load all repository files. Here we assume this notebook was downloaded as part of a fork of the `langchain` repo and that we work with the Python files of that repo.\n", "\n", "If you want to use files from a different repo, change `root_dir` to the root directory of your repo."
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CITATION.cff MIGRATE.md README.md libs\t poetry.toml\n", "LICENSE Makefile\t docs\t poetry.lock pyproject.toml\n" ] } ], "source": [ "!ls \"../../../../../../libs\"" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2554\n" ] } ], "source": [ "from langchain_community.document_loaders import TextLoader\n", "\n", "root_dir = \"../../../../../../libs\"\n", "\n", "# Walk the repo and load every Python file, skipping virtual environments.\n", "docs = []\n", "for dirpath, dirnames, filenames in os.walk(root_dir):\n", "    for file in filenames:\n", "        # Plain substring test: `in` does not expand glob patterns such as \"*venv/\".\n", "        if file.endswith(\".py\") and \"venv\" not in dirpath:\n", "            try:\n", "                loader = TextLoader(os.path.join(dirpath, file), encoding=\"utf-8\")\n", "                docs.extend(loader.load_and_split())\n", "            except Exception:\n", "                pass  # skip files that cannot be read or decoded\n", "print(f\"{len(docs)}\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Then, split the documents into chunks." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Created a chunk of size 1010, which is longer than the specified 1000\n", "Created a chunk of size 3466, which is longer than the specified 1000\n", "Created a chunk of size 1375, which is longer than the specified 1000\n", "Created a chunk of size 1928, which is longer than the specified 1000\n", "Created a chunk of size 1075, which is longer than the specified 1000\n", "Created a chunk of size 1063, which is longer than the specified 1000\n", "Created a chunk of size 1083, which is longer than the specified 1000\n", "Created a chunk of size 1074, which is longer than the specified 1000\n", "Created a chunk of size 1591, which is longer than the specified 1000\n", "Created a chunk of size 2300, which is longer than the specified 1000\n", "Created a chunk of size 1040, which is 
longer than the specified 1000\n", "Created a chunk of size 1018, which is longer than the specified 1000\n", "Created a chunk of size 2787, which is longer than the specified 1000\n", "Created a chunk of size 1018, which is longer than the specified 1000\n", "Created a chunk of size 2311, which is longer than the specified 1000\n", "Created a chunk of size 2811, which is longer than the specified 1000\n", "Created a chunk of size 1186, which is longer than the specified 1000\n", "Created a chunk of size 1497, which is longer than the specified 1000\n", "Created a chunk of size 1043, which is longer than the specified 1000\n", "Created a chunk of size 1020, which is longer than the specified 1000\n", "Created a chunk of size 1232, which is longer than the specified 1000\n", "Created a chunk of size 1334, which is longer than the specified 1000\n", "Created a chunk of size 1221, which is longer than the specified 1000\n", "Created a chunk of size 2229, which is longer than the specified 1000\n", "Created a chunk of size 1027, which is longer than the specified 1000\n", "Created a chunk of size 1361, which is longer than the specified 1000\n", "Created a chunk of size 1057, which is longer than the specified 1000\n", "Created a chunk of size 1204, which is longer than the specified 1000\n", "Created a chunk of size 1420, which is longer than the specified 1000\n", "Created a chunk of size 1298, which is longer than the specified 1000\n", "Created a chunk of size 1062, which is longer than the specified 1000\n", "Created a chunk of size 1008, which is longer than the specified 1000\n", "Created a chunk of size 1025, which is longer than the specified 1000\n", "Created a chunk of size 1206, which is longer than the specified 1000\n", "Created a chunk of size 1202, which is longer than the specified 1000\n", "Created a chunk of size 1206, which is longer than the specified 1000\n", "Created a chunk of size 1272, which is longer than the specified 1000\n", "Created a 
chunk of size 1092, which is longer than the specified 1000\n", "Created a chunk of size 1303, which is longer than the specified 1000\n", "Created a chunk of size 1029, which is longer than the specified 1000\n", "Created a chunk of size 1117, which is longer than the specified 1000\n", "Created a chunk of size 1438, which is longer than the specified 1000\n", "Created a chunk of size 3055, which is longer than the specified 1000\n", "Created a chunk of size 1628, which is longer than the specified 1000\n", "Created a chunk of size 1566, which is longer than the specified 1000\n", "Created a chunk of size 1179, which is longer than the specified 1000\n", "Created a chunk of size 1006, which is longer than the specified 1000\n", "Created a chunk of size 1213, which is longer than the specified 1000\n", "Created a chunk of size 2461, which is longer than the specified 1000\n", "Created a chunk of size 1849, which is longer than the specified 1000\n", "Created a chunk of size 1398, which is longer than the specified 1000\n", "Created a chunk of size 1469, which is longer than the specified 1000\n", "Created a chunk of size 1220, which is longer than the specified 1000\n", "Created a chunk of size 1048, which is longer than the specified 1000\n", "Created a chunk of size 1040, which is longer than the specified 1000\n", "Created a chunk of size 1052, which is longer than the specified 1000\n", "Created a chunk of size 1052, which is longer than the specified 1000\n", "Created a chunk of size 1304, which is longer than the specified 1000\n", "Created a chunk of size 1147, which is longer than the specified 1000\n", "Created a chunk of size 1236, which is longer than the specified 1000\n", "Created a chunk of size 1411, which is longer than the specified 1000\n", "Created a chunk of size 1181, which is longer than the specified 1000\n", "Created a chunk of size 1357, which is longer than the specified 1000\n", "Created a chunk of size 1706, which is longer than the 
specified 1000\n", "Created a chunk of size 1099, which is longer than the specified 1000\n", "Created a chunk of size 1221, which is longer than the specified 1000\n", "Created a chunk of size 1066, which is longer than the specified 1000\n", "Created a chunk of size 1223, which is longer than the specified 1000\n", "Created a chunk of size 1202, which is longer than the specified 1000\n", "Created a chunk of size 2806, which is longer than the specified 1000\n", "Created a chunk of size 1180, which is longer than the specified 1000\n", "Created a chunk of size 1338, which is longer than the specified 1000\n", "Created a chunk of size 1074, which is longer than the specified 1000\n", "Created a chunk of size 1025, which is longer than the specified 1000\n", "Created a chunk of size 1017, which is longer than the specified 1000\n", "Created a chunk of size 1497, which is longer than the specified 1000\n", "Created a chunk of size 1151, which is longer than the specified 1000\n", "Created a chunk of size 1287, which is longer than the specified 1000\n", "Created a chunk of size 1359, which is longer than the specified 1000\n", "Created a chunk of size 1075, which is longer than the specified 1000\n", "Created a chunk of size 1037, which is longer than the specified 1000\n", "Created a chunk of size 1080, which is longer than the specified 1000\n", "Created a chunk of size 1354, which is longer than the specified 1000\n", "Created a chunk of size 1033, which is longer than the specified 1000\n", "Created a chunk of size 1473, which is longer than the specified 1000\n", "Created a chunk of size 1074, which is longer than the specified 1000\n", "Created a chunk of size 2091, which is longer than the specified 1000\n", "Created a chunk of size 1388, which is longer than the specified 1000\n", "Created a chunk of size 1040, which is longer than the specified 1000\n", "Created a chunk of size 1040, which is longer than the specified 1000\n", "Created a chunk of size 1158, 
which is longer than the specified 1000\n", "Created a chunk of size 1683, which is longer than the specified 1000\n", "Created a chunk of size 2424, which is longer than the specified 1000\n", "Created a chunk of size 1877, which is longer than the specified 1000\n", "Created a chunk of size 1002, which is longer than the specified 1000\n", "Created a chunk of size 2175, which is longer than the specified 1000\n", "Created a chunk of size 1011, which is longer than the specified 1000\n", "Created a chunk of size 1915, which is longer than the specified 1000\n", "Created a chunk of size 1587, which is longer than the specified 1000\n", "Created a chunk of size 1969, which is longer than the specified 1000\n", "Created a chunk of size 1687, which is longer than the specified 1000\n", "Created a chunk of size 1732, which is longer than the specified 1000\n", "Created a chunk of size 1322, which is longer than the specified 1000\n", "Created a chunk of size 1339, which is longer than the specified 1000\n", "Created a chunk of size 3083, which is longer than the specified 1000\n", "Created a chunk of size 2148, which is longer than the specified 1000\n", "Created a chunk of size 1647, which is longer than the specified 1000\n", "Created a chunk of size 1698, which is longer than the specified 1000\n", "Created a chunk of size 1012, which is longer than the specified 1000\n", "Created a chunk of size 1919, which is longer than the specified 1000\n", "Created a chunk of size 1676, which is longer than the specified 1000\n", "Created a chunk of size 1581, which is longer than the specified 1000\n", "Created a chunk of size 2559, which is longer than the specified 1000\n", "Created a chunk of size 1247, which is longer than the specified 1000\n", "Created a chunk of size 1220, which is longer than the specified 1000\n", "Created a chunk of size 1768, which is longer than the specified 1000\n", "Created a chunk of size 1287, which is longer than the specified 1000\n", 
"Created a chunk of size 1300, which is longer than the specified 1000\n", "Created a chunk of size 1390, which is longer than the specified 1000\n", "Created a chunk of size 1423, which is longer than the specified 1000\n", "Created a chunk of size 1018, which is longer than the specified 1000\n", "Created a chunk of size 1185, which is longer than the specified 1000\n", "Created a chunk of size 2858, which is longer than the specified 1000\n", "Created a chunk of size 1149, which is longer than the specified 1000\n", "Created a chunk of size 1730, which is longer than the specified 1000\n", "Created a chunk of size 1026, which is longer than the specified 1000\n", "Created a chunk of size 1913, which is longer than the specified 1000\n", "Created a chunk of size 1362, which is longer than the specified 1000\n", "Created a chunk of size 1324, which is longer than the specified 1000\n", "Created a chunk of size 1073, which is longer than the specified 1000\n", "Created a chunk of size 1455, which is longer than the specified 1000\n", "Created a chunk of size 1621, which is longer than the specified 1000\n", "Created a chunk of size 1516, which is longer than the specified 1000\n", "Created a chunk of size 1633, which is longer than the specified 1000\n", "Created a chunk of size 1620, which is longer than the specified 1000\n", "Created a chunk of size 1856, which is longer than the specified 1000\n", "Created a chunk of size 1562, which is longer than the specified 1000\n", "Created a chunk of size 1729, which is longer than the specified 1000\n", "Created a chunk of size 1203, which is longer than the specified 1000\n", "Created a chunk of size 1307, which is longer than the specified 1000\n", "Created a chunk of size 1331, which is longer than the specified 1000\n", "Created a chunk of size 1295, which is longer than the specified 1000\n", "Created a chunk of size 1101, which is longer than the specified 1000\n", "Created a chunk of size 1090, which is longer 
than the specified 1000\n", "Created a chunk of size 1241, which is longer than the specified 1000\n", "Created a chunk of size 1138, which is longer than the specified 1000\n", "Created a chunk of size 1076, which is longer than the specified 1000\n", "Created a chunk of size 1210, which is longer than the specified 1000\n", "Created a chunk of size 1183, which is longer than the specified 1000\n", "Created a chunk of size 1353, which is longer than the specified 1000\n", "Created a chunk of size 1271, which is longer than the specified 1000\n", "Created a chunk of size 1778, which is longer than the specified 1000\n", "Created a chunk of size 1141, which is longer than the specified 1000\n", "Created a chunk of size 1099, which is longer than the specified 1000\n", "Created a chunk of size 2090, which is longer than the specified 1000\n", "Created a chunk of size 1056, which is longer than the specified 1000\n", "Created a chunk of size 1120, which is longer than the specified 1000\n", "Created a chunk of size 1048, which is longer than the specified 1000\n", "Created a chunk of size 1072, which is longer than the specified 1000\n", "Created a chunk of size 1367, which is longer than the specified 1000\n", "Created a chunk of size 1246, which is longer than the specified 1000\n", "Created a chunk of size 1766, which is longer than the specified 1000\n", "Created a chunk of size 1105, which is longer than the specified 1000\n", "Created a chunk of size 1400, which is longer than the specified 1000\n", "Created a chunk of size 1488, which is longer than the specified 1000\n", "Created a chunk of size 1672, which is longer than the specified 1000\n", "Created a chunk of size 1137, which is longer than the specified 1000\n", "Created a chunk of size 1500, which is longer than the specified 1000\n", "Created a chunk of size 1224, which is longer than the specified 1000\n", "Created a chunk of size 1414, which is longer than the specified 1000\n", "Created a chunk of 
size 1242, which is longer than the specified 1000\n", "Created a chunk of size 1551, which is longer than the specified 1000\n", "Created a chunk of size 1268, which is longer than the specified 1000\n", "Created a chunk of size 1130, which is longer than the specified 1000\n", "Created a chunk of size 2023, which is longer than the specified 1000\n", "Created a chunk of size 1878, which is longer than the specified 1000\n", "Created a chunk of size 1364, which is longer than the specified 1000\n", "Created a chunk of size 1212, which is longer than the specified 1000\n", "Created a chunk of size 1792, which is longer than the specified 1000\n", "Created a chunk of size 1055, which is longer than the specified 1000\n", "Created a chunk of size 1496, which is longer than the specified 1000\n", "Created a chunk of size 1045, which is longer than the specified 1000\n", "Created a chunk of size 1501, which is longer than the specified 1000\n", "Created a chunk of size 1208, which is longer than the specified 1000\n", "Created a chunk of size 1356, which is longer than the specified 1000\n", "Created a chunk of size 1351, which is longer than the specified 1000\n", "Created a chunk of size 1130, which is longer than the specified 1000\n", "Created a chunk of size 1133, which is longer than the specified 1000\n", "Created a chunk of size 1381, which is longer than the specified 1000\n", "Created a chunk of size 1120, which is longer than the specified 1000\n", "Created a chunk of size 1200, which is longer than the specified 1000\n", "Created a chunk of size 1202, which is longer than the specified 1000\n", "Created a chunk of size 1149, which is longer than the specified 1000\n", "Created a chunk of size 1196, which is longer than the specified 1000\n", "Created a chunk of size 3173, which is longer than the specified 1000\n", "Created a chunk of size 1106, which is longer than the specified 1000\n", "Created a chunk of size 1211, which is longer than the specified 
1000\n", "Created a chunk of size 1530, which is longer than the specified 1000\n", "Created a chunk of size 1471, which is longer than the specified 1000\n", "Created a chunk of size 1353, which is longer than the specified 1000\n", "Created a chunk of size 1279, which is longer than the specified 1000\n", "Created a chunk of size 1101, which is longer than the specified 1000\n", "Created a chunk of size 1123, which is longer than the specified 1000\n", "Created a chunk of size 1848, which is longer than the specified 1000\n", "Created a chunk of size 1197, which is longer than the specified 1000\n", "Created a chunk of size 1235, which is longer than the specified 1000\n", "Created a chunk of size 1314, which is longer than the specified 1000\n", "Created a chunk of size 1043, which is longer than the specified 1000\n", "Created a chunk of size 1183, which is longer than the specified 1000\n", "Created a chunk of size 1182, which is longer than the specified 1000\n", "Created a chunk of size 1269, which is longer than the specified 1000\n", "Created a chunk of size 1416, which is longer than the specified 1000\n", "Created a chunk of size 1462, which is longer than the specified 1000\n", "Created a chunk of size 1120, which is longer than the specified 1000\n", "Created a chunk of size 1033, which is longer than the specified 1000\n", "Created a chunk of size 1143, which is longer than the specified 1000\n", "Created a chunk of size 1537, which is longer than the specified 1000\n", "Created a chunk of size 1381, which is longer than the specified 1000\n", "Created a chunk of size 2286, which is longer than the specified 1000\n", "Created a chunk of size 1175, which is longer than the specified 1000\n", "Created a chunk of size 1187, which is longer than the specified 1000\n", "Created a chunk of size 1494, which is longer than the specified 1000\n", "Created a chunk of size 1597, which is longer than the specified 1000\n", "Created a chunk of size 1203, which is 
longer than the specified 1000\n", "Created a chunk of size 1058, which is longer than the specified 1000\n", "Created a chunk of size 1261, which is longer than the specified 1000\n", "Created a chunk of size 1189, which is longer than the specified 1000\n", "Created a chunk of size 1388, which is longer than the specified 1000\n", "Created a chunk of size 1224, which is longer than the specified 1000\n", "Created a chunk of size 1226, which is longer than the specified 1000\n", "Created a chunk of size 1289, which is longer than the specified 1000\n", "Created a chunk of size 1157, which is longer than the specified 1000\n", "Created a chunk of size 1095, which is longer than the specified 1000\n", "Created a chunk of size 2196, which is longer than the specified 1000\n", "Created a chunk of size 1029, which is longer than the specified 1000\n", "Created a chunk of size 1077, which is longer than the specified 1000\n", "Created a chunk of size 1848, which is longer than the specified 1000\n", "Created a chunk of size 1095, which is longer than the specified 1000\n", "Created a chunk of size 1418, which is longer than the specified 1000\n", "Created a chunk of size 1069, which is longer than the specified 1000\n", "Created a chunk of size 2573, which is longer than the specified 1000\n", "Created a chunk of size 1512, which is longer than the specified 1000\n", "Created a chunk of size 1046, which is longer than the specified 1000\n", "Created a chunk of size 1792, which is longer than the specified 1000\n", "Created a chunk of size 1042, which is longer than the specified 1000\n", "Created a chunk of size 1125, which is longer than the specified 1000\n", "Created a chunk of size 1165, which is longer than the specified 1000\n", "Created a chunk of size 1030, which is longer than the specified 1000\n", "Created a chunk of size 1484, which is longer than the specified 1000\n", "Created a chunk of size 2796, which is longer than the specified 1000\n", "Created a 
chunk of size 1026, which is longer than the specified 1000\n", "Created a chunk of size 1726, which is longer than the specified 1000\n", "Created a chunk of size 1628, which is longer than the specified 1000\n", "Created a chunk of size 1881, which is longer than the specified 1000\n", "Created a chunk of size 1441, which is longer than the specified 1000\n", "Created a chunk of size 1175, which is longer than the specified 1000\n", "Created a chunk of size 1360, which is longer than the specified 1000\n", "Created a chunk of size 1210, which is longer than the specified 1000\n", "Created a chunk of size 1425, which is longer than the specified 1000\n", "Created a chunk of size 1560, which is longer than the specified 1000\n", "Created a chunk of size 1131, which is longer than the specified 1000\n", "Created a chunk of size 1276, which is longer than the specified 1000\n", "Created a chunk of size 1068, which is longer than the specified 1000\n", "Created a chunk of size 1494, which is longer than the specified 1000\n", "Created a chunk of size 1246, which is longer than the specified 1000\n", "Created a chunk of size 2621, which is longer than the specified 1000\n", "Created a chunk of size 1264, which is longer than the specified 1000\n", "Created a chunk of size 1166, which is longer than the specified 1000\n", "Created a chunk of size 1332, which is longer than the specified 1000\n", "Created a chunk of size 3499, which is longer than the specified 1000\n", "Created a chunk of size 1651, which is longer than the specified 1000\n", "Created a chunk of size 1794, which is longer than the specified 1000\n", "Created a chunk of size 2162, which is longer than the specified 1000\n", "Created a chunk of size 1061, which is longer than the specified 1000\n", "Created a chunk of size 1083, which is longer than the specified 1000\n", "Created a chunk of size 1018, which is longer than the specified 1000\n", "Created a chunk of size 1751, which is longer than the 
specified 1000\n", "Created a chunk of size 1301, which is longer than the specified 1000\n", "Created a chunk of size 1025, which is longer than the specified 1000\n", "Created a chunk of size 1489, which is longer than the specified 1000\n", "Created a chunk of size 1481, which is longer than the specified 1000\n", "Created a chunk of size 1505, which is longer than the specified 1000\n", "Created a chunk of size 1497, which is longer than the specified 1000\n", "Created a chunk of size 1505, which is longer than the specified 1000\n", "Created a chunk of size 1282, which is longer than the specified 1000\n", "Created a chunk of size 1224, which is longer than the specified 1000\n", "Created a chunk of size 1261, which is longer than the specified 1000\n", "Created a chunk of size 1123, which is longer than the specified 1000\n", "Created a chunk of size 1137, which is longer than the specified 1000\n", "Created a chunk of size 2183, which is longer than the specified 1000\n", "Created a chunk of size 1039, which is longer than the specified 1000\n", "Created a chunk of size 1135, which is longer than the specified 1000\n", "Created a chunk of size 1254, which is longer than the specified 1000\n", "Created a chunk of size 1234, which is longer than the specified 1000\n", "Created a chunk of size 1111, which is longer than the specified 1000\n", "Created a chunk of size 1135, which is longer than the specified 1000\n", "Created a chunk of size 2023, which is longer than the specified 1000\n", "Created a chunk of size 1216, which is longer than the specified 1000\n", "Created a chunk of size 1013, which is longer than the specified 1000\n", "Created a chunk of size 1152, which is longer than the specified 1000\n", "Created a chunk of size 1087, which is longer than the specified 1000\n", "Created a chunk of size 1040, which is longer than the specified 1000\n", "Created a chunk of size 1330, which is longer than the specified 1000\n", "Created a chunk of size 2342, 
which is longer than the specified 1000\n", "Created a chunk of size 1940, which is longer than the specified 1000\n", "Created a chunk of size 1621, which is longer than the specified 1000\n", "Created a chunk of size 2169, which is longer than the specified 1000\n", "Created a chunk of size 1824, which is longer than the specified 1000\n", "Created a chunk of size 1554, which is longer than the specified 1000\n", "Created a chunk of size 1457, which is longer than the specified 1000\n", "Created a chunk of size 1486, which is longer than the specified 1000\n", "Created a chunk of size 1556, which is longer than the specified 1000\n", "Created a chunk of size 1012, which is longer than the specified 1000\n", "Created a chunk of size 1484, which is longer than the specified 1000\n", "Created a chunk of size 1039, which is longer than the specified 1000\n", "Created a chunk of size 1335, which is longer than the specified 1000\n", "Created a chunk of size 1684, which is longer than the specified 1000\n", "Created a chunk of size 1537, which is longer than the specified 1000\n", "Created a chunk of size 1136, which is longer than the specified 1000\n", "Created a chunk of size 1219, which is longer than the specified 1000\n", "Created a chunk of size 1011, which is longer than the specified 1000\n", "Created a chunk of size 1055, which is longer than the specified 1000\n", "Created a chunk of size 1433, which is longer than the specified 1000\n", "Created a chunk of size 1263, which is longer than the specified 1000\n", "Created a chunk of size 1014, which is longer than the specified 1000\n", "Created a chunk of size 1107, which is longer than the specified 1000\n", "Created a chunk of size 2702, which is longer than the specified 1000\n", "Created a chunk of size 1237, which is longer than the specified 1000\n", "Created a chunk of size 1172, which is longer than the specified 1000\n", "Created a chunk of size 1517, which is longer than the specified 1000\n", 
"Created a chunk of size 1589, which is longer than the specified 1000\n", "Created a chunk of size 1681, which is longer than the specified 1000\n", "Created a chunk of size 2244, which is longer than the specified 1000\n", "Created a chunk of size 1505, which is longer than the specified 1000\n", "Created a chunk of size 1228, which is longer than the specified 1000\n", "Created a chunk of size 1801, which is longer than the specified 1000\n", "Created a chunk of size 1856, which is longer than the specified 1000\n", "Created a chunk of size 2171, which is longer than the specified 1000\n", "Created a chunk of size 2450, which is longer than the specified 1000\n", "Created a chunk of size 1110, which is longer than the specified 1000\n", "Created a chunk of size 1148, which is longer than the specified 1000\n", "Created a chunk of size 1050, which is longer than the specified 1000\n", "Created a chunk of size 1014, which is longer than the specified 1000\n", "Created a chunk of size 1458, which is longer than the specified 1000\n", "Created a chunk of size 1270, which is longer than the specified 1000\n", "Created a chunk of size 1287, which is longer than the specified 1000\n", "Created a chunk of size 1127, which is longer than the specified 1000\n", "Created a chunk of size 1576, which is longer than the specified 1000\n", "Created a chunk of size 1350, which is longer than the specified 1000\n", "Created a chunk of size 2283, which is longer than the specified 1000\n", "Created a chunk of size 2211, which is longer than the specified 1000\n", "Created a chunk of size 1167, which is longer than the specified 1000\n", "Created a chunk of size 1038, which is longer than the specified 1000\n", "Created a chunk of size 1117, which is longer than the specified 1000\n", "Created a chunk of size 1160, which is longer than the specified 1000\n", "Created a chunk of size 1163, which is longer than the specified 1000\n", "Created a chunk of size 1013, which is longer 
than the specified 1000\n", "Created a chunk of size 1226, which is longer than the specified 1000\n", "Created a chunk of size 1336, which is longer than the specified 1000\n", "Created a chunk of size 1012, which is longer than the specified 1000\n", "Created a chunk of size 2833, which is longer than the specified 1000\n", "Created a chunk of size 1201, which is longer than the specified 1000\n", "Created a chunk of size 1172, which is longer than the specified 1000\n", "Created a chunk of size 1438, which is longer than the specified 1000\n", "Created a chunk of size 1259, which is longer than the specified 1000\n", "Created a chunk of size 1452, which is longer than the specified 1000\n", "Created a chunk of size 1377, which is longer than the specified 1000\n", "Created a chunk of size 1001, which is longer than the specified 1000\n", "Created a chunk of size 1240, which is longer than the specified 1000\n", "Created a chunk of size 1142, which is longer than the specified 1000\n", "Created a chunk of size 1338, which is longer than the specified 1000\n", "Created a chunk of size 1057, which is longer than the specified 1000\n", "Created a chunk of size 1040, which is longer than the specified 1000\n", "Created a chunk of size 1579, which is longer than the specified 1000\n", "Created a chunk of size 1176, which is longer than the specified 1000\n", "Created a chunk of size 1081, which is longer than the specified 1000\n", "Created a chunk of size 1751, which is longer than the specified 1000\n", "Created a chunk of size 1064, which is longer than the specified 1000\n", "Created a chunk of size 1029, which is longer than the specified 1000\n", "Created a chunk of size 1937, which is longer than the specified 1000\n", "Created a chunk of size 1972, which is longer than the specified 1000\n", "Created a chunk of size 1417, which is longer than the specified 1000\n", "Created a chunk of size 1203, which is longer than the specified 1000\n", "Created a chunk of 
size 1314, which is longer than the specified 1000\n", "Created a chunk of size 1088, which is longer than the specified 1000\n", "Created a chunk of size 1455, which is longer than the specified 1000\n", "Created a chunk of size 1467, which is longer than the specified 1000\n", "Created a chunk of size 1476, which is longer than the specified 1000\n", "Created a chunk of size 1354, which is longer than the specified 1000\n", "Created a chunk of size 1403, which is longer than the specified 1000\n", "Created a chunk of size 1366, which is longer than the specified 1000\n", "Created a chunk of size 1112, which is longer than the specified 1000\n", "Created a chunk of size 1512, which is longer than the specified 1000\n", "Created a chunk of size 1262, which is longer than the specified 1000\n", "Created a chunk of size 1405, which is longer than the specified 1000\n", "Created a chunk of size 2221, which is longer than the specified 1000\n", "Created a chunk of size 1128, which is longer than the specified 1000\n", "Created a chunk of size 1021, which is longer than the specified 1000\n", "Created a chunk of size 1532, which is longer than the specified 1000\n", "Created a chunk of size 1535, which is longer than the specified 1000\n", "Created a chunk of size 1230, which is longer than the specified 1000\n", "Created a chunk of size 2456, which is longer than the specified 1000\n", "Created a chunk of size 1047, which is longer than the specified 1000\n", "Created a chunk of size 1320, which is longer than the specified 1000\n", "Created a chunk of size 1144, which is longer than the specified 1000\n", "Created a chunk of size 1509, which is longer than the specified 1000\n", "Created a chunk of size 1003, which is longer than the specified 1000\n", "Created a chunk of size 1025, which is longer than the specified 1000\n", "Created a chunk of size 1197, which is longer than the specified 1000\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "8244\n" ] 
} ], "source": [ "from langchain_text_splitters import CharacterTextSplitter\n", "\n", "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", "texts = text_splitter.split_documents(docs)\n", "print(f\"{len(texts)}\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Then embed the chunks and upload them to Deep Lake.\n", "\n", "This can take several minutes." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "OpenAIEmbeddings(client=, model='text-embedding-ada-002', deployment='text-embedding-ada-002', openai_api_version='', openai_api_base='', openai_api_type='', openai_proxy='', embedding_ctx_length=8191, openai_api_key='', openai_organization='', allowed_special=set(), disallowed_special='all', chunk_size=1000, max_retries=6, request_timeout=None, headers=None, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={})" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from langchain_openai import OpenAIEmbeddings\n", "\n", "embeddings = OpenAIEmbeddings()\n", "embeddings" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain_deeplake.vectorstores import DeeplakeVectorStore\n", "\n", "username = \"\" # set this to your Activeloop username or organization\n", "\n", "\n", "db = DeeplakeVectorStore.from_documents(\n", " documents=texts,\n", " embedding=embeddings,\n", " dataset_path=f\"hub://{username}/langchain-code\",\n", " overwrite=True,\n", ")\n", "db" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Question Answering\n", "First load the dataset, then construct the retriever, and finally construct the conversational retrieval chain." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "db = DeeplakeVectorStore(\n", " dataset_path=f\"hub://{username}/langchain-code\",\n", " read_only=True,\n", " 
embedding_function=embeddings,\n", ")" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "tags": [] }, "outputs": [], "source": [ "retriever = db.as_retriever()\n", "retriever.search_kwargs[\"distance_metric\"] = \"cos\"\n", "retriever.search_kwargs[\"fetch_k\"] = 20\n", "retriever.search_kwargs[\"maximal_marginal_relevance\"] = True\n", "retriever.search_kwargs[\"k\"] = 20" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain.chains import ConversationalRetrievalChain\n", "from langchain_openai import ChatOpenAI\n", "\n", "model = ChatOpenAI(model=\"gpt-3.5-turbo-0613\") # 'ada' 'gpt-3.5-turbo-0613' 'gpt-4',\n", "qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-> **Question**: What is the class hierarchy? \n", "\n", "**Answer**: The class hierarchy for Memory is as follows:\n", "\n", " BaseMemory --> BaseChatMemory --> Memory # Examples: ZepMemory, MotorheadMemory\n", "\n", "The class hierarchy for ChatMessageHistory is as follows:\n", "\n", " BaseChatMessageHistory --> ChatMessageHistory # Example: ZepChatMessageHistory\n", "\n", "The class hierarchy for Prompt is as follows:\n", "\n", " BasePromptTemplate --> PipelinePromptTemplate\n", " StringPromptTemplate --> PromptTemplate\n", " FewShotPromptTemplate\n", " FewShotPromptWithTemplates\n", " BaseChatPromptTemplate --> AutoGPTPrompt\n", " ChatPromptTemplate --> AgentScratchPadChatPromptTemplate\n", " \n", "\n", "-> **Question**: What classes are derived from the Chain class? 
\n", "\n", "**Answer**: The classes derived from the Chain class are:\n", "\n", "- APIChain\n", "- OpenAPIEndpointChain\n", "- AnalyzeDocumentChain\n", "- MapReduceDocumentsChain\n", "- MapRerankDocumentsChain\n", "- ReduceDocumentsChain\n", "- RefineDocumentsChain\n", "- StuffDocumentsChain\n", "- ConstitutionalChain\n", "- ConversationChain\n", "- ChatVectorDBChain\n", "- ConversationalRetrievalChain\n", "- FalkorDBQAChain\n", "- FlareChain\n", "- ArangoGraphQAChain\n", "- GraphQAChain\n", "- GraphCypherQAChain\n", "- HugeGraphQAChain\n", "- KuzuQAChain\n", "- NebulaGraphQAChain\n", "- NeptuneOpenCypherQAChain\n", "- GraphSparqlQAChain\n", "- HypotheticalDocumentEmbedder\n", "- LLMChain\n", "- LLMBashChain\n", "- LLMCheckerChain\n", "- LLMMathChain\n", "- LLMRequestsChain\n", "- LLMSummarizationCheckerChain\n", "- MapReduceChain\n", "- OpenAIModerationChain\n", "- NatBotChain\n", "- QAGenerationChain\n", "- QAWithSourcesChain\n", "- RetrievalQAWithSourcesChain\n", "- VectorDBQAWithSourcesChain\n", "- RetrievalQA\n", "- VectorDBQA\n", "- LLMRouterChain\n", "- MultiPromptChain\n", "- MultiRetrievalQAChain\n", "- MultiRouteChain\n", "- RouterChain\n", "- SequentialChain\n", "- SimpleSequentialChain\n", "- TransformChain\n", "- TaskPlaningChain\n", "- QueryChain\n", "- CPALChain\n", " \n", "\n", "-> **Question**: What kind of retrievers does LangChain have? 
\n", "\n", "**Answer**: The LangChain class includes various types of retrievers such as:\n", "\n", "- ArxivRetriever\n", "- AzureAISearchRetriever\n", "- BM25Retriever\n", "- ChaindeskRetriever\n", "- ChatGPTPluginRetriever\n", "- ContextualCompressionRetriever\n", "- DocArrayRetriever\n", "- ElasticSearchBM25Retriever\n", "- EnsembleRetriever\n", "- GoogleVertexAISearchRetriever\n", "- AmazonKendraRetriever\n", "- KNNRetriever\n", "- LlamaIndexGraphRetriever and LlamaIndexRetriever\n", "- MergerRetriever\n", "- MetalRetriever\n", "- MilvusRetriever\n", "- MultiQueryRetriever\n", "- ParentDocumentRetriever\n", "- PineconeHybridSearchRetriever\n", "- PubMedRetriever\n", "- RePhraseQueryRetriever\n", "- RemoteLangChainRetriever\n", "- SelfQueryRetriever\n", "- SVMRetriever\n", "- TFIDFRetriever\n", "- TimeWeightedVectorStoreRetriever\n", "- VespaRetriever\n", "- WeaviateHybridSearchRetriever\n", "- WebResearchRetriever\n", "- WikipediaRetriever\n", "- ZepRetriever\n", "- ZillizRetriever \n", "\n" ] } ], "source": [ "questions = [\n", " \"What is the class hierarchy?\",\n", " \"What classes are derived from the Chain class?\",\n", " \"What kind of retrievers does LangChain have?\",\n", "]\n", "chat_history = []\n", "qa_dict = {}\n", "\n", "for question in questions:\n", " result = qa({\"question\": question, \"chat_history\": chat_history})\n", " chat_history.append((question, result[\"answer\"]))\n", " qa_dict[question] = result[\"answer\"]\n", " print(f\"-> **Question**: {question} \\n\")\n", " print(f\"**Answer**: {result['answer']} \\n\")" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'question': 'LangChain possesses a variety of retrievers including:\\n\\n1. ArxivRetriever\\n2. AzureAISearchRetriever\\n3. BM25Retriever\\n4. ChaindeskRetriever\\n5. ChatGPTPluginRetriever\\n6. ContextualCompressionRetriever\\n7. DocArrayRetriever\\n8. ElasticSearchBM25Retriever\\n9. EnsembleRetriever\\n10. 
GoogleVertexAISearchRetriever\\n11. AmazonKendraRetriever\\n12. KNNRetriever\\n13. LlamaIndexGraphRetriever\\n14. LlamaIndexRetriever\\n15. MergerRetriever\\n16. MetalRetriever\\n17. MilvusRetriever\\n18. MultiQueryRetriever\\n19. ParentDocumentRetriever\\n20. PineconeHybridSearchRetriever\\n21. PubMedRetriever\\n22. RePhraseQueryRetriever\\n23. RemoteLangChainRetriever\\n24. SelfQueryRetriever\\n25. SVMRetriever\\n26. TFIDFRetriever\\n27. TimeWeightedVectorStoreRetriever\\n28. VespaRetriever\\n29. WeaviateHybridSearchRetriever\\n30. WebResearchRetriever\\n31. WikipediaRetriever\\n32. ZepRetriever\\n33. ZillizRetriever\\n\\nIt also includes self query translators like:\\n\\n1. ChromaTranslator\\n2. DeepLakeTranslator\\n3. MyScaleTranslator\\n4. PineconeTranslator\\n5. QdrantTranslator\\n6. WeaviateTranslator\\n\\nAnd remote retrievers like:\\n\\n1. RemoteLangChainRetriever'}" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qa_dict" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The class hierarchy for Memory is as follows:\n", "\n", " BaseMemory --> BaseChatMemory --> Memory # Examples: ZepMemory, MotorheadMemory\n", "\n", "The class hierarchy for ChatMessageHistory is as follows:\n", "\n", " BaseChatMessageHistory --> ChatMessageHistory # Example: ZepChatMessageHistory\n", "\n", "The class hierarchy for Prompt is as follows:\n", "\n", " BasePromptTemplate --> PipelinePromptTemplate\n", " StringPromptTemplate --> PromptTemplate\n", " FewShotPromptTemplate\n", " FewShotPromptWithTemplates\n", " BaseChatPromptTemplate --> AutoGPTPrompt\n", " ChatPromptTemplate --> AgentScratchPadChatPromptTemplate\n", "\n" ] } ], "source": [ "print(qa_dict[\"What is the class hierarchy?\"])" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The classes derived from 
the Chain class are:\n", "\n", "- APIChain\n", "- OpenAPIEndpointChain\n", "- AnalyzeDocumentChain\n", "- MapReduceDocumentsChain\n", "- MapRerankDocumentsChain\n", "- ReduceDocumentsChain\n", "- RefineDocumentsChain\n", "- StuffDocumentsChain\n", "- ConstitutionalChain\n", "- ConversationChain\n", "- ChatVectorDBChain\n", "- ConversationalRetrievalChain\n", "- FlareChain\n", "- ArangoGraphQAChain\n", "- GraphQAChain\n", "- GraphCypherQAChain\n", "- HugeGraphQAChain\n", "- KuzuQAChain\n", "- NebulaGraphQAChain\n", "- NeptuneOpenCypherQAChain\n", "- GraphSparqlQAChain\n", "- HypotheticalDocumentEmbedder\n", "- LLMChain\n", "- LLMBashChain\n", "- LLMCheckerChain\n", "- LLMMathChain\n", "- LLMRequestsChain\n", "- LLMSummarizationCheckerChain\n", "- MapReduceChain\n", "- OpenAIModerationChain\n", "- NatBotChain\n", "- QAGenerationChain\n", "- QAWithSourcesChain\n", "- RetrievalQAWithSourcesChain\n", "- VectorDBQAWithSourcesChain\n", "- RetrievalQA\n", "- VectorDBQA\n", "- LLMRouterChain\n", "- MultiPromptChain\n", "- MultiRetrievalQAChain\n", "- MultiRouteChain\n", "- RouterChain\n", "- SequentialChain\n", "- SimpleSequentialChain\n", "- TransformChain\n", "- TaskPlaningChain\n", "- QueryChain\n", "- CPALChain\n", "\n" ] } ], "source": [ "print(qa_dict[\"What classes are derived from the Chain class?\"])" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The LangChain class includes various types of retrievers such as:\n", "\n", "- ArxivRetriever\n", "- AzureAISearchRetriever\n", "- BM25Retriever\n", "- ChaindeskRetriever\n", "- ChatGPTPluginRetriever\n", "- ContextualCompressionRetriever\n", "- DocArrayRetriever\n", "- ElasticSearchBM25Retriever\n", "- EnsembleRetriever\n", "- GoogleVertexAISearchRetriever\n", "- AmazonKendraRetriever\n", "- KNNRetriever\n", "- LlamaIndexGraphRetriever and LlamaIndexRetriever\n", "- MergerRetriever\n", "- MetalRetriever\n", "- MilvusRetriever\n", "- 
MultiQueryRetriever\n", "- ParentDocumentRetriever\n", "- PineconeHybridSearchRetriever\n", "- PubMedRetriever\n", "- RePhraseQueryRetriever\n", "- RemoteLangChainRetriever\n", "- SelfQueryRetriever\n", "- SVMRetriever\n", "- TFIDFRetriever\n", "- TimeWeightedVectorStoreRetriever\n", "- VespaRetriever\n", "- WeaviateHybridSearchRetriever\n", "- WebResearchRetriever\n", "- WikipediaRetriever\n", "- ZepRetriever\n", "- ZillizRetriever\n" ] } ], "source": [ "print(qa_dict[\"What kind of retrievers does LangChain have?\"])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 4 }