Separate deeplake vector store (#29902)

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-07-05 20:58:25 +00:00 · 2025-02-20 21:37:19 +04:00
commit ec403c442a
parent 3acf842e35
5 changed files with 282 additions and 709 deletions
--- a/cookbook/code-analysis-deeplake.ipynb
+++ b/cookbook/code-analysis-deeplake.ipynb
@@ -66,7 +66,7 @@
   },
   "outputs": [],
   "source": [
-    "#!python3 -m pip install --upgrade langchain deeplake openai"
+    "#!python3 -m pip install --upgrade langchain langchain-deeplake openai"
   ]
  },
  {
@@ -666,89 +666,26 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 15,
+   "execution_count": null,
   "metadata": {
    "tags": []
   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Your Deep Lake dataset has been successfully created!\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      " \r"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Dataset(path='hub://adilkhan/langchain-code', tensors=['embedding', 'id', 'metadata', 'text'])\n",
-      "\n",
-      "  tensor      htype       shape       dtype  compression\n",
-      "  -------    -------     -------     -------  ------- \n",
-      " embedding  embedding  (8244, 1536)  float32   None   \n",
-      "    id        text      (8244, 1)      str     None   \n",
-      " metadata     json      (8244, 1)      str     None   \n",
-      "   text       text      (8244, 1)      str     None   \n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": []
-    },
-    {
-     "data": {
-      "text/plain": [
-       "<langchain_community.vectorstores.deeplake.DeepLake at 0x7fe1b67d7a30>"
-      ]
-     },
-     "execution_count": 15,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "outputs": [],
   "source": [
-    "from langchain_community.vectorstores import DeepLake\n",
+    "from langchain_deeplake.vectorstores import DeeplakeVectorStore\n",
    "\n",
    "username = \"<USERNAME_OR_ORG>\"\n",
    "\n",
    "\n",
-    "db = DeepLake.from_documents(\n",
-    "    texts, embeddings, dataset_path=f\"hub://{username}/langchain-code\", overwrite=True\n",
+    "db = DeeplakeVectorStore.from_documents(\n",
+    "    documents=texts,\n",
+    "    embedding=embeddings,\n",
+    "    dataset_path=f\"hub://{username}/langchain-code\",\n",
+    "    overwrite=True,\n",
    ")\n",
    "db"
   ]
  },
-  {
-   "attachments": {},
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "`Optional`: You can also use Deep Lake's Managed Tensor Database as a hosting service and run queries there. In order to do so, it is necessary to specify the runtime parameter as {'tensor_db': True} during the creation of the vector store. This configuration enables the execution of queries on the Managed Tensor Database, rather than on the client side. It should be noted that this functionality is not applicable to datasets stored locally or in-memory. In the event that a vector store has already been created outside of the Managed Tensor Database, it is possible to transfer it to the Managed Tensor Database by following the prescribed steps."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 16,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# from langchain_community.vectorstores import DeepLake\n",
-    "\n",
-    "# db = DeepLake.from_documents(\n",
-    "#     texts, embeddings, dataset_path=f\"hub://{<org_id>}/langchain-code\", runtime={\"tensor_db\": True}\n",
-    "# )\n",
-    "# db"
-   ]
-  },
  {
   "attachments": {},
   "cell_type": "markdown",
@@ -760,24 +697,16 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 17,
+   "execution_count": null,
   "metadata": {
    "tags": []
   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Deep Lake Dataset in hub://adilkhan/langchain-code already exists, loading from the storage\n"
-     ]
-    }
-   ],
+   "outputs": [],
   "source": [
-    "db = DeepLake(\n",
+    "db = DeeplakeVectorStore(\n",
    "    dataset_path=f\"hub://{username}/langchain-code\",\n",
    "    read_only=True,\n",
-    "    embedding=embeddings,\n",
+    "    embedding_function=embeddings,\n",
    ")"
   ]
  },
@@ -796,36 +725,6 @@
    "retriever.search_kwargs[\"k\"] = 20"
   ]
  },
-  {
-   "attachments": {},
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "You can also specify user defined functions using [Deep Lake filters](https://docs.deeplake.ai/en/latest/deeplake.core.dataset.html#deeplake.core.dataset.Dataset.filter)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 19,
-   "metadata": {
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "def filter(x):\n",
-    "    # filter based on source code\n",
-    "    if \"something\" in x[\"text\"].data()[\"value\"]:\n",
-    "        return False\n",
-    "\n",
-    "    # filter based on path e.g. extension\n",
-    "    metadata = x[\"metadata\"].data()[\"value\"]\n",
-    "    return \"only_this\" in metadata[\"source\"] or \"also_that\" in metadata[\"source\"]\n",
-    "\n",
-    "\n",
-    "### turn on below for custom filtering\n",
-    "# retriever.search_kwargs['filter'] = filter"
-   ]
-  },
  {
   "cell_type": "code",
   "execution_count": 20,
@@ -837,10 +736,8 @@
    "from langchain.chains import ConversationalRetrievalChain\n",
    "from langchain_openai import ChatOpenAI\n",
    "\n",
-    "model = ChatOpenAI(\n",
-    "    model_name=\"gpt-3.5-turbo-0613\"\n",
-    ")  # 'ada' 'gpt-3.5-turbo-0613' 'gpt-4',\n",
-    "qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)"
+    "model = ChatOpenAI(model=\"gpt-3.5-turbo-0613\")  # 'ada' 'gpt-3.5-turbo-0613' 'gpt-4',\n",
+    "qa = RetrievalQA.from_llm(model, retriever=retriever)"
   ]
  },
  {
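
The notebook hunks above change the constructor keyword from `embedding=` (old `DeepLake`) to `embedding_function=` (new `DeeplakeVectorStore`). For call sites that still assemble the old keyword arguments, a small renaming shim may help during migration. This is only a sketch: `migrate_constructor_kwargs` is a hypothetical helper, not part of `langchain-deeplake`, and it assumes every other keyword is unchanged, which this diff does not confirm.

```python
from typing import Any, Dict

# Rename visible in the diff: DeepLake(embedding=...) became
# DeeplakeVectorStore(embedding_function=...). All other keywords are
# ASSUMED unchanged; check the langchain-deeplake docs before relying on it.
_RENAMED_KWARGS = {"embedding": "embedding_function"}


def migrate_constructor_kwargs(old_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of old_kwargs with legacy keyword names replaced."""
    new_kwargs: Dict[str, Any] = {}
    for name, value in old_kwargs.items():
        new_name = _RENAMED_KWARGS.get(name, name)
        if new_name in new_kwargs:
            # Both the old and the new spelling were supplied; refuse to guess.
            raise ValueError(f"duplicate value for {new_name!r}")
        new_kwargs[new_name] = value
    return new_kwargs
```

With `langchain-deeplake` installed, the result would presumably be passed on as `DeeplakeVectorStore(**migrate_constructor_kwargs(legacy_kwargs))`.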
--- a/docs/docs/integrations/providers/deeplake.mdx
+++ b/docs/docs/integrations/providers/deeplake.mdx
@@ -0,0 +1,16 @@
+# Deeplake
+
+[Deeplake](https://www.deeplake.ai/) is a database optimized for AI and deep learning
+applications.
+
+
+## Installation and Setup
+
+```bash
+pip install langchain-deeplake
+```
+
+## Vector stores
+
+See details on the available vector stores
+[here](/docs/integrations/vectorstores/activeloop_deeplake).
--- a/docs/docs/integrations/vectorstores/activeloop_deeplake.ipynb
+++ b/docs/docs/integrations/vectorstores/activeloop_deeplake.ipynb
--- a/libs/community/langchain_community/vectorstores/deeplake.py
+++ b/libs/community/langchain_community/vectorstores/deeplake.py
@@ -15,6 +15,7 @@ try:
 except ImportError:
    _DEEPLAKE_INSTALLED = False

+from langchain_core._api import deprecated
 from langchain_core.documents import Document
 from langchain_core.embeddings import Embeddings
 from langchain_core.vectorstores import VectorStore
@@ -24,6 +25,18 @@ from langchain_community.vectorstores.utils import maximal_marginal_relevance
 logger = logging.getLogger(__name__)


+@deprecated(
+    since="0.3.3",
+    removal="1.0",
+    message=(
+        "This class is deprecated and will be removed in a future version. "
+        "You can swap to using the `DeeplakeVectorStore`"
+        " implementation in `langchain-deeplake`. "
+        "Please do not submit further PRs to this class. "
+        "See <https://github.com/activeloopai/langchain-deeplake>"
+    ),
+    alternative_import="langchain_deeplake.DeeplakeVectorStore",
+)
 class DeepLake(VectorStore):
    """`Activeloop Deep Lake` vector store.

--- a/libs/packages.yml
+++ b/libs/packages.yml
@@ -445,6 +445,9 @@ packages:
  repo: Shikenso-Analytics/langchain-discord
  downloads: 1
  downloads_updated_at: '2025-02-15T16:00:00.000000+00:00'
+- name: langchain-deeplake
+  path: .
+  repo: activeloopai/langchain-deeplake
 - name: langchain-cognee
  repo: topoteretes/langchain-cognee
  path: .
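
The `deeplake.py` hunk above marks `DeepLake` with `@deprecated(since="0.3.3", removal="1.0", alternative_import=...)`. To illustrate the general idea of a class-level deprecation warning, here is a stdlib-only sketch. It is not the `langchain_core._api.deprecated` implementation, whose behaviour this diff does not show; `deprecated_class` and `LegacyStore` are hypothetical names used only for illustration.

```python
import functools
import warnings
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def deprecated_class(
    since: str, removal: str, alternative_import: str
) -> Callable[[Type[T]], Type[T]]:
    """Hypothetical stand-in: warn whenever the decorated class is instantiated."""

    def decorate(cls: Type[T]) -> Type[T]:
        original_init = cls.__init__

        @functools.wraps(original_init)
        def warning_init(self, *args, **kwargs):
            # stacklevel=2 attributes the warning to the caller's line.
            warnings.warn(
                f"{cls.__name__} is deprecated since {since} and scheduled for "
                f"removal in {removal}; use {alternative_import} instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            original_init(self, *args, **kwargs)

        cls.__init__ = warning_init
        return cls

    return decorate


@deprecated_class(
    since="0.3.3",
    removal="1.0",
    alternative_import="langchain_deeplake.DeeplakeVectorStore",
)
class LegacyStore:
    """Placeholder standing in for the deprecated DeepLake class."""

    def __init__(self, dataset_path: str) -> None:
        self.dataset_path = dataset_path
```

Instantiating `LegacyStore(...)` emits a `DeprecationWarning` that names the replacement import, and the object is still constructed as before.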