This commit is contained in:
Harrison Chase
2023-04-15 21:06:39 -07:00
86 changed files with 5145 additions and 767 deletions

View File

@@ -11,3 +11,7 @@ pre {
max-width: 2560px !important;
}
}
#my-component-root *, #headlessui-portal-root * {
z-index: 1000000000000;
}

58
docs/_static/js/mendablesearch.js vendored Normal file
View File

@@ -0,0 +1,58 @@
document.addEventListener('DOMContentLoaded', () => {
// Load the external dependencies
function loadScript(src, onLoadCallback) {
const script = document.createElement('script');
script.src = src;
script.onload = onLoadCallback;
document.head.appendChild(script);
}
function createRootElement() {
const rootElement = document.createElement('div');
rootElement.id = 'my-component-root';
document.body.appendChild(rootElement);
return rootElement;
}
function initializeMendable() {
const rootElement = createRootElement();
const { MendableFloatingButton } = Mendable;
const iconSpan1 = React.createElement('span', {
}, '🦜');
const iconSpan2 = React.createElement('span', {
}, '🔗');
const icon = React.createElement('p', {
style: { color: '#ffffff', fontSize: '22px',width: '48px', height: '48px', margin: '0px', padding: '0px', display: 'flex', alignItems: 'center', justifyContent: 'center', textAlign: 'center' },
}, [iconSpan1, iconSpan2]);
const mendableFloatingButton = React.createElement(
MendableFloatingButton,
{
style: { darkMode: false, accentColor: '#010810' },
floatingButtonStyle: { color: '#ffffff', backgroundColor: '#010810' },
anon_key: '82842b36-3ea6-49b2-9fb8-52cfc4bde6bf', // Mendable Search Public ANON key, ok to be public
messageSettings: {
openSourcesInNewTab: false,
},
icon: icon,
}
);
ReactDOM.render(mendableFloatingButton, rootElement);
}
loadScript('https://unpkg.com/react@17/umd/react.production.min.js', () => {
loadScript('https://unpkg.com/react-dom@17/umd/react-dom.production.min.js', () => {
loadScript('https://unpkg.com/@mendable/search@0.0.83/dist/umd/mendable.min.js', initializeMendable);
});
});
});

View File

@@ -103,5 +103,10 @@ html_static_path = ["_static"]
html_css_files = [
"css/custom.css",
]
html_js_files = [
"js/mendablesearch.js",
]
nb_execution_mode = "off"
myst_enable_extensions = ["colon_fence"]

View File

@@ -33,12 +33,17 @@ It implements a Question Answering app and contains instructions for deploying t
A minimal example on how to run LangChain on Vercel using Flask.
## [Digitalocean App Platform](https://github.com/homanp/digitalocean-langchain)
A minimal example on how to deploy LangChain to DigitalOcean App Platform.
## [SteamShip](https://github.com/steamship-core/steamship-langchain/)
This repository contains LangChain adapters for Steamship, enabling LangChain developers to rapidly deploy their apps on Steamship.
This includes: production ready endpoints, horizontal scaling across dependencies, persistant storage of app state, multi-tenancy support, etc.
## [Langchain-serve](https://github.com/jina-ai/langchain-serve)
This repository allows users to serve local chains and agents as RESTful, gRPC, or Websocket APIs thanks to [Jina](https://docs.jina.ai/). Deploy your chains & agents with ease and enjoy independent scaling, serverless and autoscaling APIs, as well as a Streamlit playground on Jina AI Cloud.
## [BentoML](https://github.com/ssheng/BentoChain)

View File

@@ -1,7 +1,6 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -9,7 +8,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -17,7 +15,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -31,7 +28,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -39,7 +35,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -52,14 +47,13 @@
"metadata": {},
"outputs": [],
"source": [
"!pip install comet_ml\n",
"!pip install langchain\n",
"!pip install openai\n",
"!pip install google-search-results"
"%pip install comet_ml langchain openai google-search-results spacy textstat pandas\n",
"\n",
"import sys\n",
"!{sys.executable} -m spacy download en_core_web_sm"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -67,7 +61,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -86,7 +79,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -94,7 +86,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -109,12 +100,12 @@
"source": [
"import os\n",
"\n",
"%env OPENAI_API_KEY=\"...\"\n",
"%env SERPAPI_API_KEY=\"...\""
"os.environ[\"OPENAI_API_KEY\"] = \"...\"\n",
"#os.environ[\"OPENAI_ORGANIZATION\"] = \"...\"\n",
"os.environ[\"SERPAPI_API_KEY\"] = \"...\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -149,7 +140,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -185,12 +175,11 @@
"synopsis_chain = LLMChain(llm=llm, prompt=prompt_template, callback_manager=manager)\n",
"\n",
"test_prompts = [{\"title\": \"Documentary about Bigfoot in Paris\"}]\n",
"synopsis_chain.apply(test_prompts)\n",
"print(synopsis_chain.apply(test_prompts))\n",
"comet_callback.flush_tracker(synopsis_chain, finish=True)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -232,7 +221,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -240,7 +228,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -256,7 +243,7 @@
"metadata": {},
"outputs": [],
"source": [
"!pip install rouge-score"
"%pip install rouge-score"
]
},
{
@@ -336,16 +323,29 @@
" \"\"\"\n",
" }\n",
"]\n",
"synopsis_chain.apply(test_prompts)\n",
"print(synopsis_chain.apply(test_prompts))\n",
"comet_callback.flush_tracker(synopsis_chain, finish=True)"
]
}
],
"metadata": {
"language_info": {
"name": "python"
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"orig_nbformat": 4
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.15"
}
},
"nbformat": 4,
"nbformat_minor": 2

View File

@@ -36,7 +36,7 @@ from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
model = GPT4All(model="./models/gpt4all-model.bin", n_ctx=512, n_threads=8, callback_handler=callback_handler, verbose=True)
# Generate text. Tokens are streamed throught the callback manager.
# Generate text. Tokens are streamed through the callback manager.
model("Once upon a time, ")
```
@@ -44,4 +44,4 @@ model("Once upon a time, ")
You can find links to model file downloads in the [pyllamacpp](https://github.com/nomic-ai/pyllamacpp) repository.
For a more detailed walkthrough of this, see [this notebook](../modules/models/llms/integrations/gpt4all.ipynb)
For a more detailed walkthrough of this, see [this notebook](../modules/models/llms/integrations/gpt4all.ipynb)

View File

@@ -1,5 +1,5 @@
LangChain Gallery
=============
=================
Lots of people have built some pretty awesome stuff with LangChain.
This is a collection of our favorites.
@@ -223,7 +223,7 @@ Open Source
Answer questions about the documentation of any project
Misc. Colab Notebooks
~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~
.. panels::
:body: text-center

View File

@@ -9,9 +9,9 @@
}
},
"source": [
"# SQLite example\n",
"# SQL Chain example\n",
"\n",
"This example showcases hooking up an LLM to answer questions over a database."
"This example demonstrates the use of the `SQLDatabaseChain` for answering questions over a database."
]
},
{
@@ -23,8 +23,10 @@
}
},
"source": [
"This uses the example Chinook database.\n",
"To set it up follow the instructions on https://database.guide/2-sample-databases-sqlite/, placing the `.db` file in a notebooks folder at the root of this repository."
"Under the hood, LangChain uses SQLAlchemy to connect to SQL databases. The `SQLDatabaseChain` can therefore be used with any SQL dialect supported by SQLAlchemy, such as MS SQL, MySQL, MariaDB, PostgreSQL, Oracle SQL, and SQLite. Please refer to the SQLAlchemy documentation for more information about requirements for connecting to your database. For example, a connection to MySQL requires an appropriate connector such as PyMySQL. A URI for a MySQL connection might look like: `mysql+pymysql://user:pass@some_mysql_db_address/db_name`\n",
"\n",
"This demonstration uses SQLite and the example Chinook database.\n",
"To set it up, follow the instructions on https://database.guide/2-sample-databases-sqlite/, placing the `.db` file in a notebooks folder at the root of this repository."
]
},
{
@@ -679,7 +681,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.10"
}
},
"nbformat": 4,

View File

@@ -106,7 +106,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Specify a column to be used identify the document source\n",
"## Specify a column to be used identify the document source\n",
"\n",
"Use the `source_column` argument to specify a column to be set as the source for the document created from each row. Otherwise `file_path` will be used as the source for all documents created from the csv file.\n",
"\n",

Submodule docs/modules/indexes/document_loaders/examples/example_data/test_repo1 added at 7e525a3b91

View File

@@ -0,0 +1,192 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Git\n",
"\n",
"This notebook shows how to load text files from Git repository."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load existing repository from disk"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from git import Repo\n",
"\n",
"repo = Repo.clone_from(\n",
" \"https://github.com/hwchase17/langchain\", to_path=\"./example_data/test_repo1\"\n",
")\n",
"branch = repo.head.reference"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import GitLoader"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"loader = GitLoader(repo_path=\"./example_data/test_repo1/\", branch=branch)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"len(data)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='.venv\\n.github\\n.git\\n.mypy_cache\\n.pytest_cache\\nDockerfile' metadata={'file_path': '.dockerignore', 'file_name': '.dockerignore', 'file_type': ''}\n"
]
}
],
"source": [
"print(data[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Clone repository from url"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import GitLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"loader = GitLoader(\n",
" clone_url=\"https://github.com/hwchase17/langchain\",\n",
" repo_path=\"./example_data/test_repo2/\",\n",
" branch=\"master\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1074"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Filtering files to load"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import GitLoader\n",
"\n",
"# eg. loading only python files\n",
"loader = GitLoader(repo_path=\"./example_data/test_repo1/\", file_filter=lambda file_path: file_path.endswith(\".py\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,81 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "1dc7df1d",
"metadata": {},
"source": [
"# Slack (Local Exported Zipfile)\n",
"\n",
"This notebook covers how to load documents from a Zipfile generated from a Slack export.\n",
"\n",
"In order to get this Slack export, follow these instructions:\n",
"\n",
"## 🧑 Instructions for ingesting your own dataset\n",
"\n",
"Export your Slack data. You can do this by going to your Workspace Management page and clicking the Import/Export option ({your_slack_domain}.slack.com/services/export). Then, choose the right date range and click `Start export`. Slack will send you an email and a DM when the export is ready.\n",
"\n",
"The download will produce a `.zip` file in your Downloads folder (or wherever your downloads can be found, depending on your OS configuration).\n",
"\n",
"Copy the path to the `.zip` file, and assign it as `LOCAL_ZIPFILE` below."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "007c5cbf",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import SlackDirectoryLoader "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1caec59",
"metadata": {},
"outputs": [],
"source": [
"# Optionally set your Slack URL. This will give you proper URLs in the docs sources.\n",
"SLACK_WORKSPACE_URL = \"https://xxx.slack.com\"\n",
"LOCAL_ZIPFILE = \"\" # Paste the local paty to your Slack zip file here.\n",
"\n",
"loader = SlackDirectoryLoader(LOCAL_ZIPFILE, SLACK_WORKSPACE_URL)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1c30ff7",
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()\n",
"docs"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -112,6 +112,79 @@
"source": [
"data = loader.load()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "a2c1c79f",
"metadata": {},
"source": [
"# Playwright URL Loader\n",
"\n",
"This covers how to load HTML documents from a list of URLs using the `PlaywrightURLLoader`.\n",
"\n",
"As in the Selenium case, Playwright allows us to load pages that need JavaScript to render.\n",
"\n",
"## Setup\n",
"\n",
"To use the `PlaywrightURLLoader`, you will need to install `playwright` and `unstructured`. Additionally, you will need to install the Playwright Chromium browser:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "53158417",
"metadata": {},
"outputs": [],
"source": [
"# Install playwright\n",
"!pip install \"playwright\"\n",
"!pip install \"unstructured\"\n",
"!playwright install"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0ab4e115",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import PlaywrightURLLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ce5a9a0a",
"metadata": {},
"outputs": [],
"source": [
"urls = [\n",
" \"https://www.youtube.com/watch?v=dQw4w9WgXcQ\",\n",
" \"https://goo.gl/maps/NDSHwePEyaHMFGwh8\"\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2dc3e0bc",
"metadata": {},
"outputs": [],
"source": [
"loader = PlaywrightURLLoader(urls=urls, remove_selectors=[\"header\", \"footer\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "10b79f80",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
}
],
"metadata": {
@@ -130,7 +203,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
"version": "3.10.6"
}
},
"nbformat": 4,

View File

@@ -89,7 +89,7 @@
"id": "150988e6",
"metadata": {},
"source": [
"# Loading multiple webpages\n",
"## Loading multiple webpages\n",
"\n",
"You can also load multiple webpages at once by passing in a list of urls to the loader. This will return a list of documents in the same order as the urls passed in."
]
@@ -123,7 +123,7 @@
"id": "641be294",
"metadata": {},
"source": [
"## Load multiple urls concurrently\n",
"### Load multiple urls concurrently\n",
"\n",
"You can speed up the scraping process by scraping and parsing multiple urls concurrently.\n",
"\n",

View File

@@ -99,7 +99,7 @@
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../state_of_the_union.txt')"
"loader = TextLoader('../state_of_the_union.txt', encoding='utf8')"
]
},
{

View File

@@ -0,0 +1,128 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ab66dd43",
"metadata": {},
"source": [
"# SVM Retriever\n",
"\n",
"This notebook goes over how to use a retriever that under the hood uses an SVM using scikit-learn.\n",
"\n",
"Largely based on https://github.com/karpathy/randomfun/blob/master/knn_vs_svm.ipynb"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "393ac030",
"metadata": {},
"outputs": [],
"source": [
"from langchain.retrievers import SVMRetriever\n",
"from langchain.embeddings import OpenAIEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a801b57c",
"metadata": {},
"outputs": [],
"source": [
"# !pip install scikit-learn"
]
},
{
"cell_type": "markdown",
"id": "aaf80e7f",
"metadata": {},
"source": [
"## Create New Retriever with Texts"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "98b1c017",
"metadata": {},
"outputs": [],
"source": [
"retriever = SVMRetriever.from_texts([\"foo\", \"bar\", \"world\", \"hello\", \"foo bar\"], OpenAIEmbeddings())"
]
},
{
"cell_type": "markdown",
"id": "08437fa2",
"metadata": {},
"source": [
"## Use Retriever\n",
"\n",
"We can now use the retriever!"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c0455218",
"metadata": {},
"outputs": [],
"source": [
"result = retriever.get_relevant_documents(\"foo\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "7dfa5c29",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='foo', metadata={}),\n",
" Document(page_content='foo bar', metadata={}),\n",
" Document(page_content='hello', metadata={}),\n",
" Document(page_content='world', metadata={})]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "74bd9256",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -69,7 +69,7 @@
{
"data": {
"text/plain": [
"['73f8f585-9536-4240-aef7-cea205b336bd']"
"['f6303531-d3a5-44af-b7c8-e3cf76916ce5']"
]
},
"execution_count": 3,
@@ -79,7 +79,7 @@
],
"source": [
"retriever.add_documents([Document(page_content=\"hello world\")])\n",
"time.sleep(1)\n",
"time.sleep(20)\n",
"retriever.add_documents([Document(page_content=\"hello foo\")])"
]
},
@@ -90,19 +90,24 @@
"metadata": {},
"outputs": [
{
"ename": "ValueError",
"evalue": "normalize_score_fn must be provided to FAISS constructor to normalize scores",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[4], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mretriever\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget_relevant_documents\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mhello world\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n",
"File \u001b[0;32m~/workplace/langchain/langchain/retrievers/time_weighted_retriever.py:94\u001b[0m, in \u001b[0;36mTimeWeightedVectorStoreRetriever.get_relevant_documents\u001b[0;34m(self, query)\u001b[0m\n\u001b[1;32m 89\u001b[0m docs_and_scores \u001b[38;5;241m=\u001b[39m {\n\u001b[1;32m 90\u001b[0m doc\u001b[38;5;241m.\u001b[39mmetadata[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mbuffer_idx\u001b[39m\u001b[38;5;124m\"\u001b[39m]: (doc, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mdefault_salience)\n\u001b[1;32m 91\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m doc \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmemory_stream[\u001b[38;5;241m-\u001b[39m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mk :]\n\u001b[1;32m 92\u001b[0m }\n\u001b[1;32m 93\u001b[0m \u001b[38;5;66;03m# If a doc is considered salient, update the salience score\u001b[39;00m\n\u001b[0;32m---> 94\u001b[0m docs_and_scores\u001b[38;5;241m.\u001b[39mupdate(\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget_salient_docs\u001b[49m\u001b[43m(\u001b[49m\u001b[43mquery\u001b[49m\u001b[43m)\u001b[49m)\n\u001b[1;32m 95\u001b[0m rescored_docs \u001b[38;5;241m=\u001b[39m [\n\u001b[1;32m 96\u001b[0m (doc, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_get_combined_score(doc, salience, current_time))\n\u001b[1;32m 97\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m doc, salience \u001b[38;5;129;01min\u001b[39;00m docs_and_scores\u001b[38;5;241m.\u001b[39mvalues()\n\u001b[1;32m 98\u001b[0m ]\n\u001b[1;32m 99\u001b[0m rescored_docs\u001b[38;5;241m.\u001b[39msort(key\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mlambda\u001b[39;00m x: x[\u001b[38;5;241m1\u001b[39m], reverse\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m)\n",
"File \u001b[0;32m~/workplace/langchain/langchain/retrievers/time_weighted_retriever.py:75\u001b[0m, in \u001b[0;36mTimeWeightedVectorStoreRetriever.get_salient_docs\u001b[0;34m(self, query)\u001b[0m\n\u001b[1;32m 72\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"Return documents that are salient to the query.\"\"\"\u001b[39;00m\n\u001b[1;32m 73\u001b[0m docs_and_scores: List[Tuple[Document, \u001b[38;5;28mfloat\u001b[39m]]\n\u001b[1;32m 74\u001b[0m docs_and_scores \u001b[38;5;241m=\u001b[39m (\n\u001b[0;32m---> 75\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mvectorstore\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msimilarity_search_with_normalized_similarities\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 76\u001b[0m \u001b[43m \u001b[49m\u001b[43mquery\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msearch_kwargs\u001b[49m\n\u001b[1;32m 77\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 78\u001b[0m )\n\u001b[1;32m 79\u001b[0m results \u001b[38;5;241m=\u001b[39m {}\n\u001b[1;32m 80\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m fetched_doc, cosine_distance \u001b[38;5;129;01min\u001b[39;00m docs_and_scores:\n",
"File \u001b[0;32m~/workplace/langchain/langchain/vectorstores/base.py:94\u001b[0m, in \u001b[0;36mVectorStore.similarity_search_with_normalized_similarities\u001b[0;34m(self, query, k, **kwargs)\u001b[0m\n\u001b[1;32m 84\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21msimilarity_search_with_normalized_similarities\u001b[39m(\n\u001b[1;32m 85\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 86\u001b[0m query: \u001b[38;5;28mstr\u001b[39m,\n\u001b[1;32m 87\u001b[0m k: \u001b[38;5;28mint\u001b[39m \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m4\u001b[39m,\n\u001b[1;32m 88\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs: Any,\n\u001b[1;32m 89\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m List[Tuple[Document, \u001b[38;5;28mfloat\u001b[39m]]:\n\u001b[1;32m 90\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"Return docs and similarity scores, normalized on a scale from 0 to 1.\u001b[39;00m\n\u001b[1;32m 91\u001b[0m \n\u001b[1;32m 92\u001b[0m \u001b[38;5;124;03m 0 is dissimilar, 1 is most similar.\u001b[39;00m\n\u001b[1;32m 93\u001b[0m \u001b[38;5;124;03m \"\"\"\u001b[39;00m\n\u001b[0;32m---> 94\u001b[0m docs_and_similarities \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_similarity_search_with_normalized_similarities\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 95\u001b[0m \u001b[43m \u001b[49m\u001b[43mquery\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mk\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mk\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\n\u001b[1;32m 96\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 97\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28many\u001b[39m(\n\u001b[1;32m 98\u001b[0m similarity \u001b[38;5;241m<\u001b[39m \u001b[38;5;241m0.0\u001b[39m \u001b[38;5;129;01mor\u001b[39;00m similarity \u001b[38;5;241m>\u001b[39m \u001b[38;5;241m1.0\u001b[39m\n\u001b[1;32m 99\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m _, similarity \u001b[38;5;129;01min\u001b[39;00m docs_and_similarities\n\u001b[1;32m 100\u001b[0m ):\n\u001b[1;32m 101\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\n\u001b[1;32m 102\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mNormalized similarity scores must be between\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 103\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m 0 and 1, got \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mdocs_and_similarities\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 104\u001b[0m )\n",
"File \u001b[0;32m~/workplace/langchain/langchain/vectorstores/faiss.py:435\u001b[0m, in \u001b[0;36mFAISS._similarity_search_with_normalized_similarities\u001b[0;34m(self, query, k, **kwargs)\u001b[0m\n\u001b[1;32m 433\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"Return docs and their similarity scores on a scale from 0 to 1.\"\"\"\u001b[39;00m\n\u001b[1;32m 434\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mnormalize_score_fn \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m--> 435\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\n\u001b[1;32m 436\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mnormalize_score_fn must be provided to\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 437\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m FAISS constructor to normalize scores\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 438\u001b[0m )\n\u001b[1;32m 439\u001b[0m docs_and_scores \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39msimilarity_search_with_score(query, k\u001b[38;5;241m=\u001b[39mk)\n\u001b[1;32m 440\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m [(doc, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mnormalize_score_fn(score)) \u001b[38;5;28;01mfor\u001b[39;00m doc, score \u001b[38;5;129;01min\u001b[39;00m docs_and_scores]\n",
"\u001b[0;31mValueError\u001b[0m: normalize_score_fn must be provided to FAISS constructor to normalize scores"
"name": "stdout",
"output_type": "stream",
"text": [
"0.9999994359334345\n",
"1.8408203353689756\n",
"0.9999428041917008\n",
"1.9999408025741263\n"
]
},
{
"data": {
"text/plain": [
"[Document(page_content='hello world', metadata={'last_accessed_at': datetime.datetime(2023, 4, 15, 21, 4, 41, 457055), 'created_at': datetime.datetime(2023, 4, 15, 21, 4, 20, 437090), 'buffer_idx': 0})]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
@@ -120,7 +125,7 @@
},
{
"cell_type": "code",
"execution_count": 30,
"execution_count": 5,
"id": "dc37669b",
"metadata": {},
"outputs": [],
@@ -131,35 +136,35 @@
"embedding_size = 1536\n",
"index = faiss.IndexFlatL2(embedding_size)\n",
"vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})\n",
"retriever = TimeWeightedVectorStoreRetriever(vectorstore=vectorstore, decay_factor=.00000001, k=1) "
"retriever = TimeWeightedVectorStoreRetriever(vectorstore=vectorstore, decay_factor=.0000000000000000000000001, k=1) "
]
},
{
"cell_type": "code",
"execution_count": 31,
"execution_count": 6,
"id": "fa284384",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['3c156c9a-ad6b-47dc-befc-87437a603973']"
"['f063e5b2-c2eb-42fc-8894-79c7a7a9038e']"
]
},
"execution_count": 31,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever.add_documents([Document(page_content=\"hello world\")])\n",
"time.sleep(1)\n",
"time.sleep(20)\n",
"retriever.add_documents([Document(page_content=\"hello foo\")])"
]
},
{
"cell_type": "code",
"execution_count": 32,
"execution_count": 7,
"id": "7558f94d",
"metadata": {},
"outputs": [
@@ -167,19 +172,19 @@
"name": "stdout",
"output_type": "stream",
"text": [
"0.8416762384156714\n",
"1.6166791948060522\n",
"0.5781175231321124\n",
"1.577325318264064\n"
"0.9978079943966991\n",
"1.8412067636745604\n",
"0.7235696005754303\n",
"1.7230094271411442\n"
]
},
{
"data": {
"text/plain": [
"[Document(page_content='hello foo', metadata={'last_accessed_at': datetime.datetime(2023, 4, 13, 23, 35, 54, 905605), 'created_at': datetime.datetime(2023, 4, 13, 23, 35, 54, 242988), 'buffer_idx': 1})]"
"[Document(page_content='hello foo', metadata={'last_accessed_at': datetime.datetime(2023, 4, 15, 21, 5, 2, 243331), 'created_at': datetime.datetime(2023, 4, 15, 21, 5, 1, 579028), 'buffer_idx': 1})]"
]
},
"execution_count": 32,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}

View File

@@ -17,34 +17,36 @@
"metadata": {},
"outputs": [],
"source": [
"!python3 -m pip install openai deeplake"
"!python3 -m pip install openai deeplake tiktoken"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import DeepLake\n",
"from langchain.document_loaders import TextLoader"
"from langchain.vectorstores import DeepLake"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"os.environ['OPENAI_API_KEY'] = 'sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'"
"import getpass\n",
"\n",
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
@@ -60,9 +62,38 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 6,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"mem://langchain loaded successfully.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Evaluating ingest: 100%|██████████| 1/1 [00:04<00:00\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='mem://langchain', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding generic (4, 1536) float32 None \n",
" ids text (4, 1) str None \n",
" metadata json (4, 1) str None \n",
" text text (4, 1) str None \n"
]
}
],
"source": [
"db = DeepLake.from_documents(docs, embeddings)\n",
"\n",
@@ -72,9 +103,23 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 7,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"source": [
"print(docs[0].page_content)"
]
@@ -89,9 +134,18 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 9,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/media/sdb/davit/.local/lib/python3.10/site-packages/langchain/llms/openai.py:624: UserWarning: You are trying to use a chat model. This way of initializing it is no longer supported. Instead, please use: `from langchain.chat_models import ChatOpenAI`\n",
" warnings.warn(\n"
]
}
],
"source": [
"from langchain.chains import RetrievalQA\n",
"from langchain.llms import OpenAIChat\n",
@@ -101,9 +155,20 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 10,
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"'The president nominated Circuit Court of Appeals Judge Ketanji Brown Jackson for the United States Supreme Court and praised her qualifications and broad support from both Democrats and Republicans.'"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = 'What did the president say about Ketanji Brown Jackson'\n",
"qa.run(query)"
@@ -119,9 +184,43 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 11,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"mem://langchain loaded successfully.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Evaluating ingest: 100%|██████████| 1/1 [00:04<00:00\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='mem://langchain', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding generic (42, 1536) float32 None \n",
" ids text (42, 1) str None \n",
" metadata json (42, 1) str None \n",
" text text (42, 1) str None \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": []
}
],
"source": [
"import random\n",
"\n",
@@ -133,9 +232,30 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 12,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 42/42 [00:00<00:00, 3456.17it/s]\n"
]
},
{
"data": {
"text/plain": [
"[Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013}),\n",
" Document(page_content='And for our LGBTQ+ Americans, lets finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \\n\\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \\n\\nWhile it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \\n\\nAnd soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \\n\\nSo tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. \\n\\nFirst, beat the opioid epidemic.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013}),\n",
" Document(page_content='Vice President Harris and I ran for office with a new economic vision for America. \\n\\nInvest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up \\nand the middle out, not from the top down. \\n\\nBecause we know that when the middle class grows, the poor have a ladder up and the wealthy do very well. \\n\\nAmerica used to have the best roads, bridges, and airports on Earth. \\n\\nNow our infrastructure is ranked 13th in the world. \\n\\nWe wont be able to compete for the jobs of the 21st Century if we dont fix that. \\n\\nThats why it was so important to pass the Bipartisan Infrastructure Law—the most sweeping investment to rebuild America in history. \\n\\nThis was a bipartisan effort, and I want to thank the members of both parties who worked to make it happen. \\n\\nWere done talking about infrastructure weeks. \\n\\nWere going to have an infrastructure decade.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013}),\n",
" Document(page_content='It is going to transform America and put us on a path to win the economic competition of the 21st Century that we face with the rest of the world—particularly with China. \\n\\nAs Ive told Xi Jinping, it is never a good bet to bet against the American people. \\n\\nWell create good jobs for millions of Americans, modernizing roads, airports, ports, and waterways all across America. \\n\\nAnd well do it all to withstand the devastating effects of the climate crisis and promote environmental justice. \\n\\nWell build a national network of 500,000 electric vehicle charging stations, begin to replace poisonous lead pipes—so every child—and every American—has clean water to drink at home and at school, provide affordable high-speed internet for every American—urban, suburban, rural, and tribal communities. \\n\\n4,000 projects have already been announced. \\n\\nAnd tonight, Im announcing that this year we will start fixing over 65,000 miles of highway and 1,500 bridges in disrepair.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013})]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db.similarity_search('What did the president say about Ketanji Brown Jackson', filter={'year': 2013})"
]
@@ -151,9 +271,23 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 13,
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2012}),\n",
" Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013}),\n",
" Document(page_content='And for our LGBTQ+ Americans, lets finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \\n\\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \\n\\nWhile it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \\n\\nAnd soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \\n\\nSo tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. \\n\\nFirst, beat the opioid epidemic.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013}),\n",
" Document(page_content='Tonight, Im announcing a crackdown on these companies overcharging American businesses and consumers. \\n\\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \\n\\nThat ends on my watch. \\n\\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \\n\\nWell also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \\n\\nLets pass the Paycheck Fairness Act and paid leave. \\n\\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \\n\\nLets increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls Americas best-kept secret: community colleges.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2014})]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db.similarity_search('What did the president say about Ketanji Brown Jackson?', distance_metric='cos')"
]
@@ -169,9 +303,23 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 14,
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2012}),\n",
" Document(page_content='One was stationed at bases and breathing in toxic smoke from “burn pits” that incinerated wastes of war—medical and hazard material, jet fuel, and more. \\n\\nWhen they came home, many of the worlds fittest and best trained warriors were never the same. \\n\\nHeadaches. Numbness. Dizziness. \\n\\nA cancer that would put them in a flag-draped coffin. \\n\\nI know. \\n\\nOne of those soldiers was my son Major Beau Biden. \\n\\nWe dont know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops. \\n\\nBut Im committed to finding out everything we can. \\n\\nCommitted to military families like Danielle Robinson from Ohio. \\n\\nThe widow of Sergeant First Class Heath Robinson. \\n\\nHe was born a soldier. Army National Guard. Combat medic in Kosovo and Iraq. \\n\\nStationed near Baghdad, just yards from burn pits the size of football fields. \\n\\nHeaths widow Danielle is here with us tonight. They loved going to Ohio State football games. He loved building Legos with their daughter.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2014}),\n",
" Document(page_content='As Ohio Senator Sherrod Brown says, “Its time to bury the label “Rust Belt.” \\n\\nIts time. \\n\\nBut with all the bright spots in our economy, record job growth and higher wages, too many families are struggling to keep up with the bills. \\n\\nInflation is robbing them of the gains they might otherwise feel. \\n\\nI get it. Thats why my top priority is getting prices under control. \\n\\nLook, our economy roared back faster than most predicted, but the pandemic meant that businesses had a hard time hiring enough workers to keep up production in their factories. \\n\\nThe pandemic also disrupted global supply chains. \\n\\nWhen factories close, it takes longer to make goods and get them from the warehouse to the store, and prices go up. \\n\\nLook at cars. \\n\\nLast year, there werent enough semiconductors to make all the cars that people wanted to buy. \\n\\nAnd guess what, prices of automobiles went up. \\n\\nSo—we have a choice. \\n\\nOne way to fight inflation is to drive down wages and make Americans poorer.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2012}),\n",
" Document(page_content='We cant change how divided weve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \\n\\nI recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \\n\\nThey were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \\n\\nOfficer Mora was 27 years old. \\n\\nOfficer Rivera was 22. \\n\\nBoth Dominican Americans whod grown up on the same streets they later chose to patrol as police officers. \\n\\nI spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \\n\\nIve worked on these issues a long time. \\n\\nI know what works: Investing in crime preventionand community police officers wholl walk the beat, wholl know the neighborhood, and who can restore trust and safety.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2012})]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db.max_marginal_relevance_search('What did the president say about Ketanji Brown Jackson?')"
]
@@ -187,21 +335,87 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"!activeloop login -t <token>"
"os.environ['ACTIVELOOP_TOKEN'] = getpass.getpass('Activeloop Token:')"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 17,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Your Deep Lake dataset has been successfully created!\n",
"The dataset is private so make sure you are logged in!\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\\"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/davitbun/linkedin\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"hub://davitbun/linkedin loaded successfully.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Evaluating ingest: 100%|██████████| 1/1 [00:23<00:00\n",
"/"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='hub://davitbun/linkedin', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding generic (42, 1536) float32 None \n",
" ids text (42, 1) str None \n",
" metadata json (42, 1) str None \n",
" text text (42, 1) str None \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
}
],
"source": [
"# Embed and store the texts\n",
"dataset_path = \"hub://{username}/{dataset_name}\" # could be also ./local/path (much faster locally), s3://bucket/path/to/dataset, gcs://path/to/dataset, etc.\n",
"dataset_path = f\"hub://{USERNAME}/{DATASET_NAME}\" # could be also ./local/path (much faster locally), s3://bucket/path/to/dataset, gcs://path/to/dataset, etc.\n",
"\n",
"embedding = OpenAIEmbeddings()\n",
"vectordb = DeepLake.from_documents(documents=docs, embedding=embedding, dataset_path=dataset_path)"
@@ -209,9 +423,23 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 18,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = db.similarity_search(query)\n",
@@ -220,11 +448,35 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='hub://davitbun/linkedin', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding generic (42, 1536) float32 None \n",
" ids text (42, 1) str None \n",
" metadata json (42, 1) str None \n",
" text text (42, 1) str None \n"
]
}
],
"source": [
"vectordb.ds.summary()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"vectordb.ds.summary()"
"embeddings = vectordb.ds.embedding.numpy()"
]
},
{
@@ -232,9 +484,7 @@
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"embeddings = vectordb.ds.embedding.numpy()"
]
"source": []
}
],
"metadata": {

View File

@@ -16,7 +16,7 @@
"In order to add a memory with an external message store to an agent we are going to do the following steps:\n",
"\n",
"1. We are going to create a `RedisChatMessageHistory` to connect to an external database to store the messages in.\n",
"2. We are going to create an `LLMChain` useing that chat history as memory.\n",
"2. We are going to create an `LLMChain` using that chat history as memory.\n",
"3. We are going to use that `LLMChain` to create a custom Agent.\n",
"\n",
"For the purposes of this exercise, we are going to create a simple custom Agent that has access to a search tool and utilizes the `ConversationBufferMemory` class."

View File

@@ -7,9 +7,11 @@
"source": [
"# VectorStore-Backed Memory\n",
"\n",
"`VectorStoreRetrieverMemory` stores interactions in a VectorDB and queries the top-K most \"salient\" interactions every type it is called.\n",
"`VectorStoreRetrieverMemory` stores memories in a VectorDB and queries the top-K most \"salient\" docs every time it is called.\n",
"\n",
"This differs from most of the other Memory classes in that "
"This differs from most of the other Memory classes in that it doesn't explicitly track the order of interactions.\n",
"\n",
"In this case, the \"docs\" are previous conversation snippets. This can be useful to refer to relevant pieces of information that the AI was told earlier in the conversation."
]
},
{
@@ -25,7 +27,8 @@
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.llms import OpenAI\n",
"from langchain.memory import VectorStoreRetrieverMemory\n",
"from langchain.chains import ConversationChain"
"from langchain.chains import ConversationChain\n",
"from langchain.prompts import PromptTemplate"
]
},
{
@@ -40,7 +43,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 29,
"id": "eef56f65",
"metadata": {
"tags": []
@@ -66,12 +69,12 @@
"source": [
"### Create your the VectorStoreRetrieverMemory\n",
"\n",
"The memory object is instantiated from "
"The memory object is instantiated from any VectorStoreRetriever."
]
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 30,
"id": "e00d4938",
"metadata": {
"tags": []
@@ -84,14 +87,14 @@
"memory = VectorStoreRetrieverMemory(retriever=retriever)\n",
"\n",
"# When added to an agent, the memory object can save pertinent information from conversations or used tools\n",
"memory.save_context({\"input\": \"check the latest scores of the Warriors game\"}, {\"output\": \"the Warriors are up against the Astros 88 to 84\"})\n",
"memory.save_context({\"input\": \"I need help doing my taxes - what's the standard deduction this year?\"}, {\"output\": \"...\"})\n",
"memory.save_context({\"input\": \"What's the the time?\"}, {\"output\": f\"It's {datetime.now()}\"}) # "
"memory.save_context({\"input\": \"My favorite food is pizza\"}, {\"output\": \"thats good to know\"})\n",
"memory.save_context({\"input\": \"My favorite sport is soccer\"}, {\"output\": \"...\"})\n",
"memory.save_context({\"input\": \"I don't the Celtics\"}, {\"output\": \"ok\"}) # "
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 31,
"id": "2fe28a28",
"metadata": {
"tags": []
@@ -101,7 +104,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"input: I need help doing my taxes - what's the standard deduction this year?\n",
"input: My favorite sport is soccer\n",
"output: ...\n"
]
}
@@ -109,7 +112,7 @@
"source": [
"# Notice the first result returned is the memory pertaining to tax help, which the language model deems more semantically relevant\n",
"# to a 1099 than the other documents, despite them both containing numbers.\n",
"print(memory.load_memory_variables({\"prompt\": \"What's a 1099?\"})[\"history\"])"
"print(memory.load_memory_variables({\"prompt\": \"what sport should i watch?\"})[\"history\"])"
]
},
{
@@ -123,7 +126,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 32,
"id": "ebd68c10",
"metadata": {
"tags": []
@@ -139,9 +142,13 @@
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Relevant pieces of previous conversation:\n",
"input: My favorite food is pizza\n",
"output: thats good to know\n",
"\n",
"(You do not need to use these pieces of information if not relevant)\n",
"\n",
"Current conversation:\n",
"input: I need help doing my taxes - what's the standard deduction this year?\n",
"output: ...\n",
"Human: Hi, my name is Perry, what's up?\n",
"AI:\u001b[0m\n",
"\n",
@@ -151,18 +158,32 @@
{
"data": {
"text/plain": [
"\" Hi Perry, my name is AI. I'm doing great, how about you? I understand you need help with your taxes. What specifically do you need help with?\""
"\" Hi Perry, I'm doing well. How about you?\""
]
},
"execution_count": 5,
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm = OpenAI(temperature=0) # Can be any valid LLM\n",
"_DEFAULT_TEMPLATE = \"\"\"The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Relevant pieces of previous conversation:\n",
"{history}\n",
"\n",
"(You do not need to use these pieces of information if not relevant)\n",
"\n",
"Current conversation:\n",
"Human: {input}\n",
"AI:\"\"\"\n",
"PROMPT = PromptTemplate(\n",
" input_variables=[\"history\", \"input\"], template=_DEFAULT_TEMPLATE\n",
")\n",
"conversation_with_summary = ConversationChain(\n",
" llm=llm, \n",
" prompt=PROMPT,\n",
" # We set a very low max_token_limit for the purposes of testing.\n",
" memory=memory,\n",
" verbose=True\n",
@@ -172,7 +193,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 33,
"id": "86207a61",
"metadata": {
"tags": []
@@ -188,10 +209,14 @@
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Relevant pieces of previous conversation:\n",
"input: My favorite sport is soccer\n",
"output: ...\n",
"\n",
"(You do not need to use these pieces of information if not relevant)\n",
"\n",
"Current conversation:\n",
"input: check the latest scores of the Warriors game\n",
"output: the Warriors are up against the Astros 88 to 84\n",
"Human: If the Cavaliers were to face off against the Warriers or the Astros, who would they most stand a chance to beat?\n",
"Human: what's my favorite sport?\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
@@ -200,22 +225,22 @@
{
"data": {
"text/plain": [
"\" It's hard to say without knowing the current form of the teams. However, based on the current scores, it looks like the Cavaliers would have a better chance of beating the Astros than the Warriors.\""
"' You told me earlier that your favorite sport is soccer.'"
]
},
"execution_count": 6,
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Here, the basketball related content is surfaced\n",
"conversation_with_summary.predict(input=\"If the Cavaliers were to face off against the Warriers or the Astros, who would they most stand a chance to beat?\")"
"conversation_with_summary.predict(input=\"what's my favorite sport?\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 34,
"id": "8c669db1",
"metadata": {
"tags": []
@@ -231,10 +256,14 @@
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Relevant pieces of previous conversation:\n",
"input: My favorite food is pizza\n",
"output: thats good to know\n",
"\n",
"(You do not need to use these pieces of information if not relevant)\n",
"\n",
"Current conversation:\n",
"input: What's the the time?\n",
"output: It's 2023-04-13 09:18:55.623736\n",
"Human: What day is it tomorrow?\n",
"Human: Whats my favorite food\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
@@ -243,10 +272,10 @@
{
"data": {
"text/plain": [
"' Tomorrow is 2023-04-14.'"
"' You said your favorite food is pizza.'"
]
},
"execution_count": 7,
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
@@ -254,12 +283,12 @@
"source": [
"# Even though the language model is stateless, since relavent memory is fetched, it can \"reason\" about the time.\n",
"# Timestamping memories and data is useful in general to let the agent determine temporal relevance\n",
"conversation_with_summary.predict(input=\"What day is it tomorrow?\")"
"conversation_with_summary.predict(input=\"Whats my favorite food\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 35,
"id": "8c09a239",
"metadata": {
"tags": []
@@ -275,10 +304,14 @@
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Current conversation:\n",
"Relevant pieces of previous conversation:\n",
"input: Hi, my name is Perry, what's up?\n",
"response: Hi Perry, my name is AI. I'm doing great, how about you? I understand you need help with your taxes. What specifically do you need help with?\n",
"Human: What's your name?\n",
"response: Hi Perry, I'm doing well. How about you?\n",
"\n",
"(You do not need to use these pieces of information if not relevant)\n",
"\n",
"Current conversation:\n",
"Human: What's my name?\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
@@ -287,10 +320,10 @@
{
"data": {
"text/plain": [
"\" My name is AI. It's nice to meet you, Perry.\""
"' Your name is Perry.'"
]
},
"execution_count": 8,
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
@@ -299,8 +332,16 @@
"# The memories from the conversation are automatically stored,\n",
"# since this query best matches the introduction chat above,\n",
"# the agent is able to 'remember' the user's name.\n",
"conversation_with_summary.predict(input=\"What's your name?\")"
"conversation_with_summary.predict(input=\"What's my name?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "df27c7dc",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -319,7 +360,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
"version": "3.9.1"
}
},
"nbformat": 4,

View File

@@ -115,7 +115,7 @@
"id": "a2d76826",
"metadata": {},
"source": [
"**The above request should now appear on your [PromptLayer dashboard](https://ww.promptlayer.com).**"
"**The above request should now appear on your [PromptLayer dashboard](https://www.promptlayer.com).**"
]
},
{

View File

@@ -43,22 +43,18 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Total Tokens: 39\n",
"Prompt Tokens: 4\n",
"Completion Tokens: 35\n",
"Tokens Used: 42\n",
"\tPrompt Tokens: 4\n",
"\tCompletion Tokens: 38\n",
"Successful Requests: 1\n",
"Total Cost (USD): $0.0007800000000000001\n"
"Total Cost (USD): $0.00084\n"
]
}
],
"source": [
"with get_openai_callback() as cb:\n",
" result = llm(\"Tell me a joke\")\n",
" print(f\"Total Tokens: {cb.total_tokens}\")\n",
" print(f\"Prompt Tokens: {cb.prompt_tokens}\")\n",
" print(f\"Completion Tokens: {cb.completion_tokens}\")\n",
" print(f\"Successful Requests: {cb.successful_requests}\")\n",
" print(f\"Total Cost (USD): ${cb.total_cost}\")"
" print(cb)"
]
},
{

View File

@@ -186,7 +186,7 @@
"source": [
"**Number of Tokens:** You can also estimate how many tokens a piece of text will be in that model. This is useful because models have a context length (and cost more for more tokens), which means you need to be aware of how long the text you are passing in is.\n",
"\n",
"Notice that by default the tokens are estimated using [tiktoken](https://github.com/openai/tiktoken) (except for legacy version <3.8, where a HuggingFace tokenizer is used)"
"Notice that by default the tokens are estimated using [tiktoken](https://github.com/openai/tiktoken) (except for legacy version <3.8, where a Hugging Face tokenizer is used)"
]
},
{

View File

@@ -38,7 +38,7 @@
"from langchain.llms import BaseLLM\n",
"from langchain.vectorstores.base import VectorStore\n",
"from pydantic import BaseModel, Field\n",
"from langchain.chains.base import Chain\n"
"from langchain.chains.base import Chain"
]
},
{
@@ -73,6 +73,7 @@
"embeddings_model = OpenAIEmbeddings()\n",
"# Initialize the vectorstore as empty\n",
"import faiss\n",
"\n",
"embedding_size = 1536\n",
"index = faiss.IndexFlatL2(embedding_size)\n",
"vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})"
@@ -116,7 +117,12 @@
" )\n",
" prompt = PromptTemplate(\n",
" template=task_creation_template,\n",
" input_variables=[\"result\", \"task_description\", \"incomplete_tasks\", \"objective\"],\n",
" input_variables=[\n",
" \"result\",\n",
" \"task_description\",\n",
" \"incomplete_tasks\",\n",
" \"objective\",\n",
" ],\n",
" )\n",
" return cls(prompt=prompt, llm=llm, verbose=verbose)"
]
@@ -147,7 +153,7 @@
" template=task_prioritization_template,\n",
" input_variables=[\"task_names\", \"next_task_id\", \"objective\"],\n",
" )\n",
" return cls(prompt=prompt, llm=llm, verbose=verbose)\n"
" return cls(prompt=prompt, llm=llm, verbose=verbose)"
]
},
{
@@ -173,7 +179,7 @@
" template=execution_template,\n",
" input_variables=[\"objective\", \"context\", \"task\"],\n",
" )\n",
" return cls(prompt=prompt, llm=llm, verbose=verbose)\n"
" return cls(prompt=prompt, llm=llm, verbose=verbose)"
]
},
{
@@ -193,11 +199,22 @@
"metadata": {},
"outputs": [],
"source": [
"def get_next_task(task_creation_chain: LLMChain, result: Dict, task_description: str, task_list: List[str], objective: str) -> List[Dict]:\n",
"def get_next_task(\n",
" task_creation_chain: LLMChain,\n",
" result: Dict,\n",
" task_description: str,\n",
" task_list: List[str],\n",
" objective: str,\n",
") -> List[Dict]:\n",
" \"\"\"Get the next task.\"\"\"\n",
" incomplete_tasks = \", \".join(task_list)\n",
" response = task_creation_chain.run(result=result, task_description=task_description, incomplete_tasks=incomplete_tasks, objective=objective)\n",
" new_tasks = response.split('\\n')\n",
" response = task_creation_chain.run(\n",
" result=result,\n",
" task_description=task_description,\n",
" incomplete_tasks=incomplete_tasks,\n",
" objective=objective,\n",
" )\n",
" new_tasks = response.split(\"\\n\")\n",
" return [{\"task_name\": task_name} for task_name in new_tasks if task_name.strip()]"
]
},
@@ -208,12 +225,19 @@
"metadata": {},
"outputs": [],
"source": [
"def prioritize_tasks(task_prioritization_chain: LLMChain, this_task_id: int, task_list: List[Dict], objective: str) -> List[Dict]:\n",
"def prioritize_tasks(\n",
" task_prioritization_chain: LLMChain,\n",
" this_task_id: int,\n",
" task_list: List[Dict],\n",
" objective: str,\n",
") -> List[Dict]:\n",
" \"\"\"Prioritize tasks.\"\"\"\n",
" task_names = [t[\"task_name\"] for t in task_list]\n",
" next_task_id = int(this_task_id) + 1\n",
" response = task_prioritization_chain.run(task_names=task_names, next_task_id=next_task_id, objective=objective)\n",
" new_tasks = response.split('\\n')\n",
" response = task_prioritization_chain.run(\n",
" task_names=task_names, next_task_id=next_task_id, objective=objective\n",
" )\n",
" new_tasks = response.split(\"\\n\")\n",
" prioritized_task_list = []\n",
" for task_string in new_tasks:\n",
" if not task_string.strip():\n",
@@ -239,9 +263,12 @@
" if not results:\n",
" return []\n",
" sorted_results, _ = zip(*sorted(results, key=lambda x: x[1], reverse=True))\n",
" return [str(item.metadata['task']) for item in sorted_results]\n",
" return [str(item.metadata[\"task\"]) for item in sorted_results]\n",
"\n",
"def execute_task(vectorstore, execution_chain: LLMChain, objective: str, task: str, k: int = 5) -> str:\n",
"\n",
"def execute_task(\n",
" vectorstore, execution_chain: LLMChain, objective: str, task: str, k: int = 5\n",
") -> str:\n",
" \"\"\"Execute a task.\"\"\"\n",
" context = _get_top_tasks(vectorstore, query=objective, k=k)\n",
" return execution_chain.run(objective=objective, context=context, task=task)"
@@ -254,7 +281,6 @@
"metadata": {},
"outputs": [],
"source": [
"\n",
"class BabyAGI(Chain, BaseModel):\n",
" \"\"\"Controller model for the BabyAGI agent.\"\"\"\n",
"\n",
@@ -265,9 +291,10 @@
" task_id_counter: int = Field(1)\n",
" vectorstore: VectorStore = Field(init=False)\n",
" max_iterations: Optional[int] = None\n",
" \n",
"\n",
" class Config:\n",
" \"\"\"Configuration for this pydantic object.\"\"\"\n",
"\n",
" arbitrary_types_allowed = True\n",
"\n",
" def add_task(self, task: Dict):\n",
@@ -285,18 +312,18 @@
" def print_task_result(self, result: str):\n",
" print(\"\\033[93m\\033[1m\" + \"\\n*****TASK RESULT*****\\n\" + \"\\033[0m\\033[0m\")\n",
" print(result)\n",
" \n",
"\n",
" @property\n",
" def input_keys(self) -> List[str]:\n",
" return [\"objective\"]\n",
" \n",
"\n",
" @property\n",
" def output_keys(self) -> List[str]:\n",
" return []\n",
"\n",
" def _call(self, inputs: Dict[str, Any]) -> Dict[str, Any]:\n",
" \"\"\"Run the agent.\"\"\"\n",
" objective = inputs['objective']\n",
" objective = inputs[\"objective\"]\n",
" first_task = inputs.get(\"first_task\", \"Make a todo list\")\n",
" self.add_task({\"task_id\": 1, \"task_name\": first_task})\n",
" num_iters = 0\n",
@@ -325,7 +352,11 @@
"\n",
" # Step 4: Create new tasks and reprioritize task list\n",
" new_tasks = get_next_task(\n",
" self.task_creation_chain, result, task[\"task_name\"], [t[\"task_name\"] for t in self.task_list], objective\n",
" self.task_creation_chain,\n",
" result,\n",
" task[\"task_name\"],\n",
" [t[\"task_name\"] for t in self.task_list],\n",
" objective,\n",
" )\n",
" for new_task in new_tasks:\n",
" self.task_id_counter += 1\n",
@@ -333,27 +364,26 @@
" self.add_task(new_task)\n",
" self.task_list = deque(\n",
" prioritize_tasks(\n",
" self.task_prioritization_chain, this_task_id, list(self.task_list), objective\n",
" self.task_prioritization_chain,\n",
" this_task_id,\n",
" list(self.task_list),\n",
" objective,\n",
" )\n",
" )\n",
" num_iters += 1\n",
" if self.max_iterations is not None and num_iters == self.max_iterations:\n",
" print(\"\\033[91m\\033[1m\" + \"\\n*****TASK ENDING*****\\n\" + \"\\033[0m\\033[0m\")\n",
" print(\n",
" \"\\033[91m\\033[1m\" + \"\\n*****TASK ENDING*****\\n\" + \"\\033[0m\\033[0m\"\n",
" )\n",
" break\n",
" return {}\n",
"\n",
" @classmethod\n",
" def from_llm(\n",
" cls,\n",
" llm: BaseLLM,\n",
" vectorstore: VectorStore,\n",
" verbose: bool = False,\n",
" **kwargs\n",
" cls, llm: BaseLLM, vectorstore: VectorStore, verbose: bool = False, **kwargs\n",
" ) -> \"BabyAGI\":\n",
" \"\"\"Initialize the BabyAGI Controller.\"\"\"\n",
" task_creation_chain = TaskCreationChain.from_llm(\n",
" llm, verbose=verbose\n",
" )\n",
" task_creation_chain = TaskCreationChain.from_llm(llm, verbose=verbose)\n",
" task_prioritization_chain = TaskPrioritizationChain.from_llm(\n",
" llm, verbose=verbose\n",
" )\n",
@@ -363,7 +393,7 @@
" task_prioritization_chain=task_prioritization_chain,\n",
" execution_chain=execution_chain,\n",
" vectorstore=vectorstore,\n",
" **kwargs\n",
" **kwargs,\n",
" )"
]
},
@@ -405,14 +435,11 @@
"outputs": [],
"source": [
"# Logging of LLMChains\n",
"verbose=False\n",
"verbose = False\n",
"# If None, will keep on going forever\n",
"max_iterations: Optional[int] = 3\n",
"baby_agi = BabyAGI.from_llm(\n",
" llm=llm,\n",
" vectorstore=vectorstore,\n",
" verbose=verbose,\n",
" max_iterations=max_iterations\n",
" llm=llm, vectorstore=vectorstore, verbose=verbose, max_iterations=max_iterations\n",
")"
]
},

View File

@@ -34,7 +34,7 @@
"from langchain.llms import BaseLLM\n",
"from langchain.vectorstores.base import VectorStore\n",
"from pydantic import BaseModel, Field\n",
"from langchain.chains.base import Chain\n"
"from langchain.chains.base import Chain"
]
},
{
@@ -54,7 +54,9 @@
"metadata": {},
"outputs": [],
"source": [
"%pip install faiss-cpu > /dev/null%pip install google-search-results > /dev/nullfrom langchain.vectorstores import FAISS\n",
"%pip install faiss-cpu > /dev/null\n",
"%pip install google-search-results > /dev/null\n",
"from langchain.vectorstores import FAISS\n",
"from langchain.docstore import InMemoryDocstore"
]
},
@@ -69,6 +71,7 @@
"embeddings_model = OpenAIEmbeddings()\n",
"# Initialize the vectorstore as empty\n",
"import faiss\n",
"\n",
"embedding_size = 1536\n",
"index = faiss.IndexFlatL2(embedding_size)\n",
"vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})"
@@ -115,7 +118,12 @@
" )\n",
" prompt = PromptTemplate(\n",
" template=task_creation_template,\n",
" input_variables=[\"result\", \"task_description\", \"incomplete_tasks\", \"objective\"],\n",
" input_variables=[\n",
" \"result\",\n",
" \"task_description\",\n",
" \"incomplete_tasks\",\n",
" \"objective\",\n",
" ],\n",
" )\n",
" return cls(prompt=prompt, llm=llm, verbose=verbose)"
]
@@ -146,7 +154,7 @@
" template=task_prioritization_template,\n",
" input_variables=[\"task_names\", \"next_task_id\", \"objective\"],\n",
" )\n",
" return cls(prompt=prompt, llm=llm, verbose=verbose)\n"
" return cls(prompt=prompt, llm=llm, verbose=verbose)"
]
},
{
@@ -158,20 +166,23 @@
"source": [
"from langchain.agents import ZeroShotAgent, Tool, AgentExecutor\n",
"from langchain import OpenAI, SerpAPIWrapper, LLMChain\n",
"todo_prompt = PromptTemplate.from_template(\"You are a planner who is an expert at coming up with a todo list for a given objective. Come up with a todo list for this objective: {objective}\")\n",
"\n",
"todo_prompt = PromptTemplate.from_template(\n",
" \"You are a planner who is an expert at coming up with a todo list for a given objective. Come up with a todo list for this objective: {objective}\"\n",
")\n",
"todo_chain = LLMChain(llm=OpenAI(temperature=0), prompt=todo_prompt)\n",
"search = SerpAPIWrapper()\n",
"tools = [\n",
" Tool(\n",
" name = \"Search\",\n",
" name=\"Search\",\n",
" func=search.run,\n",
" description=\"useful for when you need to answer questions about current events\"\n",
" description=\"useful for when you need to answer questions about current events\",\n",
" ),\n",
" Tool(\n",
" name = \"TODO\",\n",
" name=\"TODO\",\n",
" func=todo_chain.run,\n",
" description=\"useful for when you need to come up with todo lists. Input: an objective to create a todo list for. Output: a todo list for that objective. Please be very clear what the objective is!\"\n",
" )\n",
" description=\"useful for when you need to come up with todo lists. Input: an objective to create a todo list for. Output: a todo list for that objective. Please be very clear what the objective is!\",\n",
" ),\n",
"]\n",
"\n",
"\n",
@@ -179,10 +190,10 @@
"suffix = \"\"\"Question: {task}\n",
"{agent_scratchpad}\"\"\"\n",
"prompt = ZeroShotAgent.create_prompt(\n",
" tools, \n",
" prefix=prefix, \n",
" suffix=suffix, \n",
" input_variables=[\"objective\", \"task\", \"context\",\"agent_scratchpad\"]\n",
" tools,\n",
" prefix=prefix,\n",
" suffix=suffix,\n",
" input_variables=[\"objective\", \"task\", \"context\", \"agent_scratchpad\"],\n",
")"
]
},
@@ -203,11 +214,22 @@
"metadata": {},
"outputs": [],
"source": [
"def get_next_task(task_creation_chain: LLMChain, result: Dict, task_description: str, task_list: List[str], objective: str) -> List[Dict]:\n",
"def get_next_task(\n",
" task_creation_chain: LLMChain,\n",
" result: Dict,\n",
" task_description: str,\n",
" task_list: List[str],\n",
" objective: str,\n",
") -> List[Dict]:\n",
" \"\"\"Get the next task.\"\"\"\n",
" incomplete_tasks = \", \".join(task_list)\n",
" response = task_creation_chain.run(result=result, task_description=task_description, incomplete_tasks=incomplete_tasks, objective=objective)\n",
" new_tasks = response.split('\\n')\n",
" response = task_creation_chain.run(\n",
" result=result,\n",
" task_description=task_description,\n",
" incomplete_tasks=incomplete_tasks,\n",
" objective=objective,\n",
" )\n",
" new_tasks = response.split(\"\\n\")\n",
" return [{\"task_name\": task_name} for task_name in new_tasks if task_name.strip()]"
]
},
@@ -218,12 +240,19 @@
"metadata": {},
"outputs": [],
"source": [
"def prioritize_tasks(task_prioritization_chain: LLMChain, this_task_id: int, task_list: List[Dict], objective: str) -> List[Dict]:\n",
"def prioritize_tasks(\n",
" task_prioritization_chain: LLMChain,\n",
" this_task_id: int,\n",
" task_list: List[Dict],\n",
" objective: str,\n",
") -> List[Dict]:\n",
" \"\"\"Prioritize tasks.\"\"\"\n",
" task_names = [t[\"task_name\"] for t in task_list]\n",
" next_task_id = int(this_task_id) + 1\n",
" response = task_prioritization_chain.run(task_names=task_names, next_task_id=next_task_id, objective=objective)\n",
" new_tasks = response.split('\\n')\n",
" response = task_prioritization_chain.run(\n",
" task_names=task_names, next_task_id=next_task_id, objective=objective\n",
" )\n",
" new_tasks = response.split(\"\\n\")\n",
" prioritized_task_list = []\n",
" for task_string in new_tasks:\n",
" if not task_string.strip():\n",
@@ -249,9 +278,12 @@
" if not results:\n",
" return []\n",
" sorted_results, _ = zip(*sorted(results, key=lambda x: x[1], reverse=True))\n",
" return [str(item.metadata['task']) for item in sorted_results]\n",
" return [str(item.metadata[\"task\"]) for item in sorted_results]\n",
"\n",
"def execute_task(vectorstore, execution_chain: LLMChain, objective: str, task: str, k: int = 5) -> str:\n",
"\n",
"def execute_task(\n",
" vectorstore, execution_chain: LLMChain, objective: str, task: str, k: int = 5\n",
") -> str:\n",
" \"\"\"Execute a task.\"\"\"\n",
" context = _get_top_tasks(vectorstore, query=objective, k=k)\n",
" return execution_chain.run(objective=objective, context=context, task=task)"
@@ -264,7 +296,6 @@
"metadata": {},
"outputs": [],
"source": [
"\n",
"class BabyAGI(Chain, BaseModel):\n",
" \"\"\"Controller model for the BabyAGI agent.\"\"\"\n",
"\n",
@@ -275,9 +306,10 @@
" task_id_counter: int = Field(1)\n",
" vectorstore: VectorStore = Field(init=False)\n",
" max_iterations: Optional[int] = None\n",
" \n",
"\n",
" class Config:\n",
" \"\"\"Configuration for this pydantic object.\"\"\"\n",
"\n",
" arbitrary_types_allowed = True\n",
"\n",
" def add_task(self, task: Dict):\n",
@@ -295,18 +327,18 @@
" def print_task_result(self, result: str):\n",
" print(\"\\033[93m\\033[1m\" + \"\\n*****TASK RESULT*****\\n\" + \"\\033[0m\\033[0m\")\n",
" print(result)\n",
" \n",
"\n",
" @property\n",
" def input_keys(self) -> List[str]:\n",
" return [\"objective\"]\n",
" \n",
"\n",
" @property\n",
" def output_keys(self) -> List[str]:\n",
" return []\n",
"\n",
" def _call(self, inputs: Dict[str, Any]) -> Dict[str, Any]:\n",
" \"\"\"Run the agent.\"\"\"\n",
" objective = inputs['objective']\n",
" objective = inputs[\"objective\"]\n",
" first_task = inputs.get(\"first_task\", \"Make a todo list\")\n",
" self.add_task({\"task_id\": 1, \"task_name\": first_task})\n",
" num_iters = 0\n",
@@ -335,7 +367,11 @@
"\n",
" # Step 4: Create new tasks and reprioritize task list\n",
" new_tasks = get_next_task(\n",
" self.task_creation_chain, result, task[\"task_name\"], [t[\"task_name\"] for t in self.task_list], objective\n",
" self.task_creation_chain,\n",
" result,\n",
" task[\"task_name\"],\n",
" [t[\"task_name\"] for t in self.task_list],\n",
" objective,\n",
" )\n",
" for new_task in new_tasks:\n",
" self.task_id_counter += 1\n",
@@ -343,40 +379,41 @@
" self.add_task(new_task)\n",
" self.task_list = deque(\n",
" prioritize_tasks(\n",
" self.task_prioritization_chain, this_task_id, list(self.task_list), objective\n",
" self.task_prioritization_chain,\n",
" this_task_id,\n",
" list(self.task_list),\n",
" objective,\n",
" )\n",
" )\n",
" num_iters += 1\n",
" if self.max_iterations is not None and num_iters == self.max_iterations:\n",
" print(\"\\033[91m\\033[1m\" + \"\\n*****TASK ENDING*****\\n\" + \"\\033[0m\\033[0m\")\n",
" print(\n",
" \"\\033[91m\\033[1m\" + \"\\n*****TASK ENDING*****\\n\" + \"\\033[0m\\033[0m\"\n",
" )\n",
" break\n",
" return {}\n",
"\n",
" @classmethod\n",
" def from_llm(\n",
" cls,\n",
" llm: BaseLLM,\n",
" vectorstore: VectorStore,\n",
" verbose: bool = False,\n",
" **kwargs\n",
" cls, llm: BaseLLM, vectorstore: VectorStore, verbose: bool = False, **kwargs\n",
" ) -> \"BabyAGI\":\n",
" \"\"\"Initialize the BabyAGI Controller.\"\"\"\n",
" task_creation_chain = TaskCreationChain.from_llm(\n",
" llm, verbose=verbose\n",
" )\n",
" task_creation_chain = TaskCreationChain.from_llm(llm, verbose=verbose)\n",
" task_prioritization_chain = TaskPrioritizationChain.from_llm(\n",
" llm, verbose=verbose\n",
" )\n",
" llm_chain = LLMChain(llm=llm, prompt=prompt)\n",
" tool_names = [tool.name for tool in tools]\n",
" agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names)\n",
" agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)\n",
" agent_executor = AgentExecutor.from_agent_and_tools(\n",
" agent=agent, tools=tools, verbose=True\n",
" )\n",
" return cls(\n",
" task_creation_chain=task_creation_chain,\n",
" task_prioritization_chain=task_prioritization_chain,\n",
" execution_chain=agent_executor,\n",
" vectorstore=vectorstore,\n",
" **kwargs\n",
" **kwargs,\n",
" )"
]
},
@@ -418,14 +455,11 @@
"outputs": [],
"source": [
"# Logging of LLMChains\n",
"verbose=False\n",
"verbose = False\n",
"# If None, will keep on going forever\n",
"max_iterations: Optional[int] = 3\n",
"baby_agi = BabyAGI.from_llm(\n",
" llm=llm,\n",
" vectorstore=vectorstore,\n",
" verbose=verbose,\n",
" max_iterations=max_iterations\n",
" llm=llm, vectorstore=vectorstore, verbose=verbose, max_iterations=max_iterations\n",
")"
]
},

View File

@@ -23,3 +23,4 @@ Query Understanding: GPT-4 processes user queries, grasping the context and extr
The full tutorial is available below.
- [Twitter the-algorithm codebase analysis with Deep Lake](code/twitter-the-algorithm-analysis-deeplake.ipynb): A notebook walking through how to parse github source code and run queries conversation.
- [LangChain codebase analysis with Deep Lake](code/code-analysis-deeplake.ipynb): A notebook walking through how to analyze and do question answering over THIS code base.

View File

@@ -0,0 +1,644 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Use LangChain, GPT and Deep Lake to work with code base\n",
"In this tutorial, we are going to use Langchain + Deep Lake with GPT to analyze the code base of the LangChain itself. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Design"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Prepare data:\n",
" 1. Upload all python project files using the `langchain.document_loaders.TextLoader`. We will call these files the **documents**.\n",
" 2. Split all documents to chunks using the `langchain.text_splitter.CharacterTextSplitter`.\n",
" 3. Embed chunks and upload them into the DeepLake using `langchain.embeddings.openai.OpenAIEmbeddings` and `langchain.vectorstores.DeepLake`\n",
"2. Question-Answering:\n",
" 1. Build a chain from `langchain.chat_models.ChatOpenAI` and `langchain.chains.ConversationalRetrievalChain`\n",
" 2. Prepare questions.\n",
" 3. Get answers running the chain.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Implementation"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"### Integration preparations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We need to set up keys for external services and install necessary python libraries."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#!python3 -m pip install --upgrade langchain deeplake openai"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set up OpenAI embeddings, Deep Lake multi-modal vector store api and authenticate. \n",
"\n",
"For full documentation of Deep Lake please follow https://docs.activeloop.ai/ and API reference https://docs.deeplake.ai/en/latest/"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"import os\n",
"from getpass import getpass\n",
"\n",
"os.environ['OPENAI_API_KEY'] = getpass()\n",
"# Please manually enter OpenAI Key"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Authenticate into Deep Lake if you want to create your own dataset and publish it. You can get an API key from the platform at [app.activeloop.ai](https://app.activeloop.ai)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"os.environ['ACTIVELOOP_TOKEN'] = getpass.getpass('Activeloop Token:')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prepare data "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load all repository files. Here we assume this notebook is downloaded as the part of the langchain fork and we work with the python files of the `langchain` repo.\n",
"\n",
"If you want to use files from different repo, change `root_dir` to the root dir of your repo."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1147\n"
]
}
],
"source": [
"from langchain.document_loaders import TextLoader\n",
"\n",
"root_dir = '../../../..'\n",
"\n",
"docs = []\n",
"for dirpath, dirnames, filenames in os.walk(root_dir):\n",
" for file in filenames:\n",
" if file.endswith('.py') and '/.venv/' not in dirpath:\n",
" try: \n",
" loader = TextLoader(os.path.join(dirpath, file), encoding='utf-8')\n",
" docs.extend(loader.load_and_split())\n",
" except Exception as e: \n",
" pass\n",
"print(f'{len(docs)}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, chunk the files"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Created a chunk of size 1620, which is longer than the specified 1000\n",
"Created a chunk of size 1213, which is longer than the specified 1000\n",
"Created a chunk of size 1263, which is longer than the specified 1000\n",
"Created a chunk of size 1448, which is longer than the specified 1000\n",
"Created a chunk of size 1120, which is longer than the specified 1000\n",
"Created a chunk of size 1148, which is longer than the specified 1000\n",
"Created a chunk of size 1826, which is longer than the specified 1000\n",
"Created a chunk of size 1260, which is longer than the specified 1000\n",
"Created a chunk of size 1195, which is longer than the specified 1000\n",
"Created a chunk of size 2147, which is longer than the specified 1000\n",
"Created a chunk of size 1410, which is longer than the specified 1000\n",
"Created a chunk of size 1269, which is longer than the specified 1000\n",
"Created a chunk of size 1030, which is longer than the specified 1000\n",
"Created a chunk of size 1046, which is longer than the specified 1000\n",
"Created a chunk of size 1024, which is longer than the specified 1000\n",
"Created a chunk of size 1026, which is longer than the specified 1000\n",
"Created a chunk of size 1285, which is longer than the specified 1000\n",
"Created a chunk of size 1370, which is longer than the specified 1000\n",
"Created a chunk of size 1031, which is longer than the specified 1000\n",
"Created a chunk of size 1999, which is longer than the specified 1000\n",
"Created a chunk of size 1029, which is longer than the specified 1000\n",
"Created a chunk of size 1120, which is longer than the specified 1000\n",
"Created a chunk of size 1033, which is longer than the specified 1000\n",
"Created a chunk of size 1143, which is longer than the specified 1000\n",
"Created a chunk of size 1416, which is longer than the specified 1000\n",
"Created a chunk of size 2482, which is longer than the specified 1000\n",
"Created a chunk of size 1890, which is longer than the specified 1000\n",
"Created a chunk of size 1418, which is longer than the specified 1000\n",
"Created a chunk of size 1848, which is longer than the specified 1000\n",
"Created a chunk of size 1069, which is longer than the specified 1000\n",
"Created a chunk of size 2369, which is longer than the specified 1000\n",
"Created a chunk of size 1045, which is longer than the specified 1000\n",
"Created a chunk of size 1501, which is longer than the specified 1000\n",
"Created a chunk of size 1208, which is longer than the specified 1000\n",
"Created a chunk of size 1950, which is longer than the specified 1000\n",
"Created a chunk of size 1283, which is longer than the specified 1000\n",
"Created a chunk of size 1414, which is longer than the specified 1000\n",
"Created a chunk of size 1304, which is longer than the specified 1000\n",
"Created a chunk of size 1224, which is longer than the specified 1000\n",
"Created a chunk of size 1060, which is longer than the specified 1000\n",
"Created a chunk of size 2461, which is longer than the specified 1000\n",
"Created a chunk of size 1099, which is longer than the specified 1000\n",
"Created a chunk of size 1178, which is longer than the specified 1000\n",
"Created a chunk of size 1449, which is longer than the specified 1000\n",
"Created a chunk of size 1345, which is longer than the specified 1000\n",
"Created a chunk of size 3359, which is longer than the specified 1000\n",
"Created a chunk of size 2248, which is longer than the specified 1000\n",
"Created a chunk of size 1589, which is longer than the specified 1000\n",
"Created a chunk of size 2104, which is longer than the specified 1000\n",
"Created a chunk of size 1505, which is longer than the specified 1000\n",
"Created a chunk of size 1387, which is longer than the specified 1000\n",
"Created a chunk of size 1215, which is longer than the specified 1000\n",
"Created a chunk of size 1240, which is longer than the specified 1000\n",
"Created a chunk of size 1635, which is longer than the specified 1000\n",
"Created a chunk of size 1075, which is longer than the specified 1000\n",
"Created a chunk of size 2180, which is longer than the specified 1000\n",
"Created a chunk of size 1791, which is longer than the specified 1000\n",
"Created a chunk of size 1555, which is longer than the specified 1000\n",
"Created a chunk of size 1082, which is longer than the specified 1000\n",
"Created a chunk of size 1225, which is longer than the specified 1000\n",
"Created a chunk of size 1287, which is longer than the specified 1000\n",
"Created a chunk of size 1085, which is longer than the specified 1000\n",
"Created a chunk of size 1117, which is longer than the specified 1000\n",
"Created a chunk of size 1966, which is longer than the specified 1000\n",
"Created a chunk of size 1150, which is longer than the specified 1000\n",
"Created a chunk of size 1285, which is longer than the specified 1000\n",
"Created a chunk of size 1150, which is longer than the specified 1000\n",
"Created a chunk of size 1585, which is longer than the specified 1000\n",
"Created a chunk of size 1208, which is longer than the specified 1000\n",
"Created a chunk of size 1267, which is longer than the specified 1000\n",
"Created a chunk of size 1542, which is longer than the specified 1000\n",
"Created a chunk of size 1183, which is longer than the specified 1000\n",
"Created a chunk of size 2424, which is longer than the specified 1000\n",
"Created a chunk of size 1017, which is longer than the specified 1000\n",
"Created a chunk of size 1304, which is longer than the specified 1000\n",
"Created a chunk of size 1379, which is longer than the specified 1000\n",
"Created a chunk of size 1324, which is longer than the specified 1000\n",
"Created a chunk of size 1205, which is longer than the specified 1000\n",
"Created a chunk of size 1056, which is longer than the specified 1000\n",
"Created a chunk of size 1195, which is longer than the specified 1000\n",
"Created a chunk of size 3608, which is longer than the specified 1000\n",
"Created a chunk of size 1058, which is longer than the specified 1000\n",
"Created a chunk of size 1075, which is longer than the specified 1000\n",
"Created a chunk of size 1217, which is longer than the specified 1000\n",
"Created a chunk of size 1109, which is longer than the specified 1000\n",
"Created a chunk of size 1440, which is longer than the specified 1000\n",
"Created a chunk of size 1046, which is longer than the specified 1000\n",
"Created a chunk of size 1220, which is longer than the specified 1000\n",
"Created a chunk of size 1403, which is longer than the specified 1000\n",
"Created a chunk of size 1241, which is longer than the specified 1000\n",
"Created a chunk of size 1427, which is longer than the specified 1000\n",
"Created a chunk of size 1049, which is longer than the specified 1000\n",
"Created a chunk of size 1580, which is longer than the specified 1000\n",
"Created a chunk of size 1565, which is longer than the specified 1000\n",
"Created a chunk of size 1131, which is longer than the specified 1000\n",
"Created a chunk of size 1425, which is longer than the specified 1000\n",
"Created a chunk of size 1054, which is longer than the specified 1000\n",
"Created a chunk of size 1027, which is longer than the specified 1000\n",
"Created a chunk of size 2559, which is longer than the specified 1000\n",
"Created a chunk of size 1028, which is longer than the specified 1000\n",
"Created a chunk of size 1382, which is longer than the specified 1000\n",
"Created a chunk of size 1888, which is longer than the specified 1000\n",
"Created a chunk of size 1475, which is longer than the specified 1000\n",
"Created a chunk of size 1652, which is longer than the specified 1000\n",
"Created a chunk of size 1891, which is longer than the specified 1000\n",
"Created a chunk of size 1899, which is longer than the specified 1000\n",
"Created a chunk of size 1021, which is longer than the specified 1000\n",
"Created a chunk of size 1085, which is longer than the specified 1000\n",
"Created a chunk of size 1854, which is longer than the specified 1000\n",
"Created a chunk of size 1672, which is longer than the specified 1000\n",
"Created a chunk of size 2537, which is longer than the specified 1000\n",
"Created a chunk of size 1251, which is longer than the specified 1000\n",
"Created a chunk of size 1734, which is longer than the specified 1000\n",
"Created a chunk of size 1642, which is longer than the specified 1000\n",
"Created a chunk of size 1376, which is longer than the specified 1000\n",
"Created a chunk of size 1253, which is longer than the specified 1000\n",
"Created a chunk of size 1642, which is longer than the specified 1000\n",
"Created a chunk of size 1419, which is longer than the specified 1000\n",
"Created a chunk of size 1438, which is longer than the specified 1000\n",
"Created a chunk of size 1427, which is longer than the specified 1000\n",
"Created a chunk of size 1684, which is longer than the specified 1000\n",
"Created a chunk of size 1760, which is longer than the specified 1000\n",
"Created a chunk of size 1157, which is longer than the specified 1000\n",
"Created a chunk of size 2504, which is longer than the specified 1000\n",
"Created a chunk of size 1082, which is longer than the specified 1000\n",
"Created a chunk of size 2268, which is longer than the specified 1000\n",
"Created a chunk of size 1784, which is longer than the specified 1000\n",
"Created a chunk of size 1311, which is longer than the specified 1000\n",
"Created a chunk of size 2972, which is longer than the specified 1000\n",
"Created a chunk of size 1144, which is longer than the specified 1000\n",
"Created a chunk of size 1825, which is longer than the specified 1000\n",
"Created a chunk of size 1508, which is longer than the specified 1000\n",
"Created a chunk of size 2901, which is longer than the specified 1000\n",
"Created a chunk of size 1715, which is longer than the specified 1000\n",
"Created a chunk of size 1062, which is longer than the specified 1000\n",
"Created a chunk of size 1206, which is longer than the specified 1000\n",
"Created a chunk of size 1102, which is longer than the specified 1000\n",
"Created a chunk of size 1184, which is longer than the specified 1000\n",
"Created a chunk of size 1002, which is longer than the specified 1000\n",
"Created a chunk of size 1065, which is longer than the specified 1000\n",
"Created a chunk of size 1871, which is longer than the specified 1000\n",
"Created a chunk of size 1754, which is longer than the specified 1000\n",
"Created a chunk of size 2413, which is longer than the specified 1000\n",
"Created a chunk of size 1771, which is longer than the specified 1000\n",
"Created a chunk of size 2054, which is longer than the specified 1000\n",
"Created a chunk of size 2000, which is longer than the specified 1000\n",
"Created a chunk of size 2061, which is longer than the specified 1000\n",
"Created a chunk of size 1066, which is longer than the specified 1000\n",
"Created a chunk of size 1419, which is longer than the specified 1000\n",
"Created a chunk of size 1368, which is longer than the specified 1000\n",
"Created a chunk of size 1008, which is longer than the specified 1000\n",
"Created a chunk of size 1227, which is longer than the specified 1000\n",
"Created a chunk of size 1745, which is longer than the specified 1000\n",
"Created a chunk of size 2296, which is longer than the specified 1000\n",
"Created a chunk of size 1083, which is longer than the specified 1000\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"3477\n"
]
}
],
"source": [
"from langchain.text_splitter import CharacterTextSplitter\n",
"\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_documents(docs)\n",
"print(f\"{len(texts)}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then embed chunks and upload them to the DeepLake.\n",
"\n",
"This can take several minutes. "
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"OpenAIEmbeddings(client=<class 'openai.api_resources.embedding.Embedding'>, model='text-embedding-ada-002', document_model_name='text-embedding-ada-002', query_model_name='text-embedding-ada-002', embedding_ctx_length=8191, openai_api_key=None, openai_organization=None, allowed_special=set(), disallowed_special='all', chunk_size=1000, max_retries=6)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"embeddings"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.vectorstores import DeepLake\n",
"\n",
"db = DeepLake.from_documents(texts, embeddings, dataset_path=f\"hub://{DEEPLAKE_ACCOUNT_NAME}/langchain-code\")\n",
"db"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Question Answering\n",
"First load the dataset, construct the retriever, then construct the Conversational Chain"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"-"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/user_name/langchain-code\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"hub://user_name/langchain-code loaded successfully.\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Deep Lake Dataset in hub://user_name/langchain-code already exists, loading from the storage\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='hub://user_name/langchain-code', read_only=True, tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding generic (3477, 1536) float32 None \n",
" ids text (3477, 1) str None \n",
" metadata json (3477, 1) str None \n",
" text text (3477, 1) str None \n"
]
}
],
"source": [
"db = DeepLake(dataset_path=f\"hub://{DEEPLAKE_ACCOUNT_NAME}/langchain-code\", read_only=True, embedding_function=embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"retriever = db.as_retriever()\n",
"retriever.search_kwargs['distance_metric'] = 'cos'\n",
"retriever.search_kwargs['fetch_k'] = 20\n",
"retriever.search_kwargs['maximal_marginal_relevance'] = True\n",
"retriever.search_kwargs['k'] = 20"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also specify user defined functions using [Deep Lake filters](https://docs.deeplake.ai/en/latest/deeplake.core.dataset.html#deeplake.core.dataset.Dataset.filter)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def filter(x):\n",
" # filter based on source code\n",
" if 'something' in x['text'].data()['value']:\n",
" return False\n",
" \n",
" # filter based on path e.g. extension\n",
" metadata = x['metadata'].data()['value']\n",
" return 'only_this' in metadata['source'] or 'also_that' in metadata['source']\n",
"\n",
"### turn on below for custom filtering\n",
"# retriever.search_kwargs['filter'] = filter"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.chains import ConversationalRetrievalChain\n",
"\n",
"model = ChatOpenAI(model='gpt-3.5-turbo') # 'ada' 'gpt-3.5-turbo' 'gpt-4',\n",
"qa = ConversationalRetrievalChain.from_llm(model,retriever=retriever)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"questions = [\n",
" \"What is the class hierarchy?\",\n",
" # \"What classes are derived from the Chain class?\",\n",
" # \"What classes and functions in the ./langchain/utilities/ forlder are not covered by unit tests?\",\n",
" # \"What one improvement do you propose in code in relation to the class herarchy for the Chain class?\",\n",
"] \n",
"chat_history = []\n",
"\n",
"for question in questions: \n",
" result = qa({\"question\": question, \"chat_history\": chat_history})\n",
" chat_history.append((question, result['answer']))\n",
" print(f\"-> **Question**: {question} \\n\")\n",
" print(f\"**Answer**: {result['answer']} \\n\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"-> **Question**: What is the class hierarchy? \n",
"\n",
"**Answer**: There are several class hierarchies in the provided code, so I'll list a few:\n",
"\n",
"1. `BaseModel` -> `ConstitutionalPrinciple`: `ConstitutionalPrinciple` is a subclass of `BaseModel`.\n",
"2. `BasePromptTemplate` -> `StringPromptTemplate`, `AIMessagePromptTemplate`, `BaseChatPromptTemplate`, `ChatMessagePromptTemplate`, `ChatPromptTemplate`, `HumanMessagePromptTemplate`, `MessagesPlaceholder`, `SystemMessagePromptTemplate`, `FewShotPromptTemplate`, `FewShotPromptWithTemplates`, `Prompt`, `PromptTemplate`: All of these classes are subclasses of `BasePromptTemplate`.\n",
"3. `APIChain`, `Chain`, `MapReduceDocumentsChain`, `MapRerankDocumentsChain`, `RefineDocumentsChain`, `StuffDocumentsChain`, `HypotheticalDocumentEmbedder`, `LLMChain`, `LLMBashChain`, `LLMCheckerChain`, `LLMMathChain`, `LLMRequestsChain`, `PALChain`, `QAWithSourcesChain`, `VectorDBQAWithSourcesChain`, `VectorDBQA`, `SQLDatabaseChain`: All of these classes are subclasses of `Chain`.\n",
"4. `BaseLoader`: `BaseLoader` is a subclass of `ABC`.\n",
"5. `BaseTracer` -> `ChainRun`, `LLMRun`, `SharedTracer`, `ToolRun`, `Tracer`, `TracerException`, `TracerSession`: All of these classes are subclasses of `BaseTracer`.\n",
"6. `OpenAIEmbeddings`, `HuggingFaceEmbeddings`, `CohereEmbeddings`, `JinaEmbeddings`, `LlamaCppEmbeddings`, `HuggingFaceHubEmbeddings`, `TensorflowHubEmbeddings`, `SagemakerEndpointEmbeddings`, `HuggingFaceInstructEmbeddings`, `SelfHostedEmbeddings`, `SelfHostedHuggingFaceEmbeddings`, `SelfHostedHuggingFaceInstructEmbeddings`, `FakeEmbeddings`, `AlephAlphaAsymmetricSemanticEmbedding`, `AlephAlphaSymmetricSemanticEmbedding`: All of these classes are subclasses of `BaseLLM`. \n",
"\n",
"\n",
"-> **Question**: What classes are derived from the Chain class? \n",
"\n",
"**Answer**: There are multiple classes that are derived from the Chain class. Some of them are:\n",
"- APIChain\n",
"- AnalyzeDocumentChain\n",
"- ChatVectorDBChain\n",
"- CombineDocumentsChain\n",
"- ConstitutionalChain\n",
"- ConversationChain\n",
"- GraphQAChain\n",
"- HypotheticalDocumentEmbedder\n",
"- LLMChain\n",
"- LLMCheckerChain\n",
"- LLMRequestsChain\n",
"- LLMSummarizationCheckerChain\n",
"- MapReduceChain\n",
"- OpenAPIEndpointChain\n",
"- PALChain\n",
"- QAWithSourcesChain\n",
"- RetrievalQA\n",
"- RetrievalQAWithSourcesChain\n",
"- SequentialChain\n",
"- SQLDatabaseChain\n",
"- TransformChain\n",
"- VectorDBQA\n",
"- VectorDBQAWithSourcesChain\n",
"\n",
"There might be more classes that are derived from the Chain class as it is possible to create custom classes that extend the Chain class.\n",
"\n",
"\n",
"-> **Question**: What classes and functions in the ./langchain/utilities/ forlder are not covered by unit tests? \n",
"\n",
"**Answer**: All classes and functions in the `./langchain/utilities/` folder seem to have unit tests written for them. \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -18,31 +18,13 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Define OpenAI embeddings, Deep Lake multi-modal vector store api and authenticate. For full documentation of Deep Lake please follow https://docs.activeloop.ai/ and API reference https://docs.deeplake.ai/en/latest/"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import DeepLake\n",
"Define OpenAI embeddings, Deep Lake multi-modal vector store api and authenticate. For full documentation of Deep Lake please follow [docs](https://docs.activeloop.ai/) and [API reference](https://docs.deeplake.ai/en/latest/).\n",
"\n",
"os.environ['OPENAI_API_KEY']='sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Authenticate into Deep Lake if you want to create your own dataset and publish it. You can get an API key from the platform at https://app.activeloop.ai"
"Authenticate into Deep Lake if you want to create your own dataset and publish it. You can get an API key from the [platform](https://app.activeloop.ai)"
]
},
{
@@ -51,7 +33,15 @@
"metadata": {},
"outputs": [],
"source": [
"!activeloop login -t <TOKEN>"
"import os\n",
"import getpass\n",
"\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import DeepLake\n",
"\n",
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')\n",
"os.environ['ACTIVELOOP_TOKEN'] = getpass.getpass('Activeloop Token:')\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
@@ -143,15 +133,35 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"-"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/davitbun/twitter-algorithm\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"-"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"hub://davitbun/twitter-algorithm loaded successfully.\n",
"\n"
]
@@ -184,7 +194,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -205,7 +215,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -224,7 +234,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -267,9 +277,14 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"-> **Question**: What does favCountParams do? \n",
"\n",
"**Answer**: `favCountParams` is an optional ThriftLinearFeatureRankingParams instance that represents the parameters related to the \"favorite count\" feature in the ranking process. It is used to control the weight of the favorite count feature while ranking tweets. The favorite count is the number of times a tweet has been marked as a favorite by users, and it is considered an important signal in the ranking of tweets. By using `favCountParams`, the system can adjust the importance of the favorite count while calculating the final ranking score of a tweet. \n",
"\n",
"-> **Question**: is it Likes + Bookmarks, or not clear from the code?\n",
"\n",
"**Answer**: From the provided code, it is not clear if the favorite count metric is determined by the sum of likes and bookmarks. The favorite count is mentioned in the code, but there is no explicit reference to how it is calculated in terms of likes and bookmarks. \n",
@@ -423,7 +438,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.0"
}
},
"nbformat": 4,

View File

@@ -1,5 +1,5 @@
Evaluation
==============
==========
.. note::
`Conceptual Guide <https://docs.langchain.com/docs/use-cases/evaluation>`_
@@ -83,7 +83,7 @@ The existing examples we have are:
Other Examples
------------
--------------
In addition, we also have some more generic resources for evaluation.