Improvements to the Clarifai integration (#9290)

- Improved docs
- Improved performance in multiple ways through batching, threading, etc.
- Fixed error message
- Added support for metadata filtering during similarity search.

@baskaryan PTAL
2025-09-04 12:39:32 +00:00 · 2023-08-21 15:53:36 -04:00
parent 66a47d9a61
commit 949b2cf177
6 changed files with 261 additions and 100 deletions
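
The metadata filtering mentioned in the commit message is exercised in the notebook diff via `similarity_search(..., filter={...})`. The snippet below is only a local, plain-Python sketch of the matching behaviour the notebook comments describe: a scalar filter value must equal the stored value, and a list filter value selects items whose metadata list contains the listed entries. The helper names `matches_filter` and `filter_documents` are hypothetical, and treating a multi-item list filter as "all items must be present" is an assumption. None of this is Clarifai's or LangChain's actual implementation, which runs the filter server-side.

```python
from typing import Any, Dict, List


def matches_filter(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies every key in `flt`.

    Assumed semantics (inferred from the notebook comments, not from library code):
    - scalar filter value: the metadata value must equal it;
    - list filter value: the metadata value must be a list containing
      every item named in the filter.
    """
    for key, wanted in flt.items():
        if key not in metadata:
            return False
        actual = metadata[key]
        if isinstance(wanted, list):
            if not isinstance(actual, list):
                return False
            if not all(item in actual for item in wanted):
                return False
        elif actual != wanted:
            return False
    return True


def filter_documents(
    metadatas: List[Dict[str, Any]], flt: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Keep only the metadata records that satisfy the filter."""
    return [m for m in metadatas if matches_filter(m, flt)]


# Illustrative data shaped like the notebook's `metadatas` list.
texts = ["I really enjoy spending time with you", "I went to the movies yesterday"]
metadatas = [
    {"id": 0, "text": texts[0], "source": "book 1", "category": ["books", "modern"]},
    {"id": 1, "text": texts[1], "source": "book 2", "category": ["films"]},
]

by_source = filter_documents(metadatas, {"source": "book 1"})
by_category = filter_documents(metadatas, {"category": ["books"]})
print([m["id"] for m in by_source])
print([m["id"] for m in by_category])
```

In the notebook itself the same two filter shapes are passed straight to `clarifai_vector_db.similarity_search`, so the sketch is useful mainly for reasoning about which records a given filter would be expected to keep.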
--- a/docs/extras/integrations/providers/clarifai.mdx
+++ b/docs/extras/integrations/providers/clarifai.mdx
@@ -37,7 +37,7 @@ There is a Clarifai Embedding model in LangChain, which you can access with:
 from langchain.embeddings import ClarifaiEmbeddings
 embeddings = ClarifaiEmbeddings(pat=CLARIFAI_PAT, user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID)
 ```
-For more details, the docs on the Clarifai Embeddings wrapper provide a [detailed walthrough](/docs/integrations/text_embedding/clarifai.html).
+For more details, the docs on the Clarifai Embeddings wrapper provide a [detailed walkthrough](/docs/integrations/text_embedding/clarifai.html).

 ## Vectorstore

@@ -49,4 +49,4 @@ You an also add data directly from LangChain as well, and the auto-indexing will
 from langchain.vectorstores import Clarifai
 clarifai_vector_db = Clarifai.from_texts(user_id=USER_ID, app_id=APP_ID, texts=texts, pat=CLARIFAI_PAT, number_of_docs=NUMBER_OF_DOCS, metadatas = metadatas)
 ```
-For more details, the docs on the Clarifai vector store provide a [detailed walthrough](/docs/integrations/text_embedding/clarifai.html).
+For more details, the docs on the Clarifai vector store provide a [detailed walkthrough](/docs/integrations/text_embedding/clarifai.html).
--- a/docs/extras/integrations/text_embedding/clarifai.ipynb
+++ b/docs/extras/integrations/text_embedding/clarifai.ipynb
@@ -130,9 +130,9 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "USER_ID = \"openai\"\n",
-    "APP_ID = \"embed\"\n",
-    "MODEL_ID = \"text-embedding-ada\"\n",
+    "USER_ID = \"salesforce\"\n",
+    "APP_ID = \"blip\"\n",
+    "MODEL_ID = \"multimodal-embedder-blip-2\"\n",
    "\n",
    "# You can provide a specific model version as the model_version_id arg.\n",
    "# MODEL_VERSION_ID = \"MODEL_VERSION_ID\""
--- a/docs/extras/integrations/vectorstores/clarifai.ipynb
+++ b/docs/extras/integrations/vectorstores/clarifai.ipynb
@@ -53,7 +53,15 @@
   "execution_count": 1,
   "id": "c1e38361-c1fe-4ac6-86e9-c90ebaf7ae87",
   "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stdin",
+     "output_type": "stream",
+     "text": [
+      " ········\n"
+     ]
+    }
+   ],
   "source": [
    "# Please login and get your API key from  https://clarifai.com/settings/security\n",
    "from getpass import getpass\n",
@@ -61,18 +69,9 @@
    "CLARIFAI_PAT = getpass()"
   ]
  },
-  {
-   "attachments": {},
-   "cell_type": "markdown",
-   "id": "320af802-9271-46ee-948f-d2453933d44b",
-   "metadata": {},
-   "source": [
-    "We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
-   ]
-  },
  {
   "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 6,
   "id": "aac9563e",
   "metadata": {
    "tags": []
@@ -99,7 +98,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 2,
   "id": "4d853395",
   "metadata": {},
   "outputs": [],
@@ -134,7 +133,7 @@
    "    \"I love playing soccer with my friends\",\n",
    "]\n",
    "\n",
-    "metadatas = [{\"id\": i, \"text\": text} for i, text in enumerate(texts)]"
+    "metadatas = [{\"id\": i, \"text\": text, \"source\": \"book 1\", \"category\": [\"books\", \"modern\"]} for i, text in enumerate(texts)]"
   ]
  },
  {
@@ -156,21 +155,17 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": null,
   "id": "e755cdce",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
-       "[Document(page_content='I really enjoy spending time with you', metadata={'text': 'I really enjoy spending time with you', 'id': 0.0}),\n",
-       " Document(page_content='I went to the movies yesterday', metadata={'text': 'I went to the movies yesterday', 'id': 3.0}),\n",
-       " Document(page_content='zab', metadata={'page': '2'}),\n",
-       " Document(page_content='zab', metadata={'page': '2'})]"
+       "[Document(page_content='I really enjoy spending time with you', metadata={'text': 'I really enjoy spending time with you', 'id': 0.0, 'source': 'book 1', 'category': ['books', 'modern']}),\n",
+       " Document(page_content='I went to the movies yesterday', metadata={'text': 'I went to the movies yesterday', 'id': 3.0, 'source': 'book 1', 'category': ['books', 'modern']})]"
      ]
     },
-     "execution_count": 7,
-     "metadata": {},
     "output_type": "execute_result"
    }
   ],
@@ -179,6 +174,21 @@
    "docs"
   ]
  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "140103ec-0936-454a-9f4a-7d5beefc138f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# There is a lot of powerful filtering you can do within an app by leveraging metadata filters. \n",
+    "# This one will limit the similarity query to only the texts that have key of \"source\" matching value of \"book 1\"\n",
+    "book1_similar_docs = clarifai_vector_db.similarity_search(\"I would love to see you\", filter={\"source\": \"book 1\"})\n",
+    "\n",
+    "# you can also use lists in the input's metadata and then select things that match an item in the list. This is useful for categories like below:\n",
+    "book_category_similar_docs = clarifai_vector_db.similarity_search(\"I would love to see you\", filter={\"category\": [\"books\"]})"
+   ]
+  },
  {
   "attachments": {},
   "cell_type": "markdown",
@@ -249,7 +259,7 @@
    "    user_id=USER_ID,\n",
    "    app_id=APP_ID,\n",
    "    documents=docs,\n",
-    "    pat=CLARIFAI_PAT_KEY,\n",
+    "    pat=CLARIFAI_PAT,\n",
    "    number_of_docs=NUMBER_OF_DOCS,\n",
    ")"
   ]
@@ -278,6 +288,55 @@
    "docs = clarifai_vector_db.similarity_search(\"Texts related to criminals and violence\")\n",
    "docs"
   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "7b332ca4-416b-4ea6-99da-b6949f399d72",
+   "metadata": {},
+   "source": [
+    "## From existing App\n",
+    "Within Clarifai we have great tools for adding data to applications (essentially projects) via API or UI. Most users will already have done that before interacting with LangChain so this example will use the data in an existing app to perform searches. Check out our [API docs](https://docs.clarifai.com/api-guide/data/create-get-update-delete) and [UI docs](https://docs.clarifai.com/portal-guide/data). The Clarifai Application can then be used for semantic search to find relevant documents."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "807c1141-591b-436d-abaa-f2c325e66d39",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "USER_ID = \"USERNAME_ID\"\n",
+    "APP_ID = \"APPLICATION_ID\"\n",
+    "NUMBER_OF_DOCS = 4"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "762d74ef-f7df-43d6-b121-4980c4059fc0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "clarifai_vector_db = Clarifai(\n",
+    "    user_id=USER_ID,\n",
+    "    app_id=APP_ID,\n",
+    "    documents=docs,\n",
+    "    pat=CLARIFAI_PAT,\n",
+    "    number_of_docs=NUMBER_OF_DOCS,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f7636b0f-68ab-4b8f-ba0f-3c27061e3631",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "docs = clarifai_vector_db.similarity_search(\"Texts related to criminals and violence\")\n",
+    "docs"
+   ]
  }
 ],
 "metadata": {