[minor] Deep Lake auth improvements in docs, kwargs pass, faster tests (#2927)

Minor cosmetic changes 
- Activeloop environment cred authentication in notebooks with
`getpass.getpass` (instead of CLI which not always works)
- much faster tests with Deep Lake pytest mode on 
- Deep Lake kwargs pass

Notes
- I put pytest environment creds inside `vectorstores/conftest.py`, but
feel free to suggest a better location. For context, if I put in
`test_deeplake.py`, `ruff` doesn't let me to set them before import
deeplake

---------

Co-authored-by: Davit Buniatyan <d@activeloop.ai>
This commit is contained in:
Davit Buniatyan
2023-04-15 10:49:16 -07:00
committed by GitHub
parent c4ae8c1d24
commit b3a5b51728
5 changed files with 348 additions and 101 deletions

View File

@@ -80,7 +80,7 @@
},
"outputs": [
{
"name": "stdin",
"name": "stdout",
"output_type": "stream",
"text": [
" ········\n"
@@ -96,10 +96,11 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Authenticate into Deep Lake if you want to create your own dataset and publish it. You can get an API key from the platform at https://app.activeloop.ai"
"Authenticate into Deep Lake if you want to create your own dataset and publish it. You can get an API key from the platform at [app.activeloop.ai](https://app.activeloop.ai)"
]
},
{
@@ -110,7 +111,7 @@
},
"outputs": [
{
"name": "stdin",
"name": "stdout",
"output_type": "stream",
"text": [
" ········\n"
@@ -118,37 +119,7 @@
}
],
"source": [
"DEEPLAKE_ACCOUNT_NAME = getpass()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"os.environ['DEEPLAKE_KEY'] = getpass()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!activeloop login -t $DEEPLAKE_KEY"
"os.environ['ACTIVELOOP_TOKEN'] = getpass.getpass('Activeloop Token:')"
]
},
{

View File

@@ -18,31 +18,13 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Define OpenAI embeddings, Deep Lake multi-modal vector store api and authenticate. For full documentation of Deep Lake please follow https://docs.activeloop.ai/ and API reference https://docs.deeplake.ai/en/latest/"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import DeepLake\n",
"Define OpenAI embeddings, Deep Lake multi-modal vector store api and authenticate. For full documentation of Deep Lake please follow [docs](https://docs.activeloop.ai/) and [API reference](https://docs.deeplake.ai/en/latest/).\n",
"\n",
"os.environ['OPENAI_API_KEY']='sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Authenticate into Deep Lake if you want to create your own dataset and publish it. You can get an API key from the platform at https://app.activeloop.ai"
"Authenticate into Deep Lake if you want to create your own dataset and publish it. You can get an API key from the [platform](https://app.activeloop.ai)"
]
},
{
@@ -51,7 +33,15 @@
"metadata": {},
"outputs": [],
"source": [
"!activeloop login -t <TOKEN>"
"import os\n",
"import getpass\n",
"\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import DeepLake\n",
"\n",
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')\n",
"os.environ['ACTIVELOOP_TOKEN'] = getpass.getpass('Activeloop Token:')\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
@@ -143,15 +133,35 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"-"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/davitbun/twitter-algorithm\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"-"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"hub://davitbun/twitter-algorithm loaded successfully.\n",
"\n"
]
@@ -184,7 +194,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -205,7 +215,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -224,7 +234,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -267,9 +277,14 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"-> **Question**: What does favCountParams do? \n",
"\n",
"**Answer**: `favCountParams` is an optional ThriftLinearFeatureRankingParams instance that represents the parameters related to the \"favorite count\" feature in the ranking process. It is used to control the weight of the favorite count feature while ranking tweets. The favorite count is the number of times a tweet has been marked as a favorite by users, and it is considered an important signal in the ranking of tweets. By using `favCountParams`, the system can adjust the importance of the favorite count while calculating the final ranking score of a tweet. \n",
"\n",
"-> **Question**: is it Likes + Bookmarks, or not clear from the code?\n",
"\n",
"**Answer**: From the provided code, it is not clear if the favorite count metric is determined by the sum of likes and bookmarks. The favorite count is mentioned in the code, but there is no explicit reference to how it is calculated in terms of likes and bookmarks. \n",
@@ -423,7 +438,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.0"
}
},
"nbformat": 4,