Separate deepale vector store (#29902)

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"

- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
This commit is contained in:
Levon Ghukasyan 2025-02-20 21:37:19 +04:00 committed by GitHub
parent 3acf842e35
commit ec403c442a
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
5 changed files with 282 additions and 709 deletions

View File

@ -66,7 +66,7 @@
},
"outputs": [],
"source": [
"#!python3 -m pip install --upgrade langchain deeplake openai"
"#!python3 -m pip install --upgrade langchain langchain-deeplake openai"
]
},
{
@ -666,89 +666,26 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Your Deep Lake dataset has been successfully created!\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='hub://adilkhan/langchain-code', tensors=['embedding', 'id', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding embedding (8244, 1536) float32 None \n",
" id text (8244, 1) str None \n",
" metadata json (8244, 1) str None \n",
" text text (8244, 1) str None \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": []
},
{
"data": {
"text/plain": [
"<langchain_community.vectorstores.deeplake.DeepLake at 0x7fe1b67d7a30>"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"from langchain_community.vectorstores import DeepLake\n",
"from langchain_deeplake.vectorstores import DeeplakeVectorStore\n",
"\n",
"username = \"<USERNAME_OR_ORG>\"\n",
"\n",
"\n",
"db = DeepLake.from_documents(\n",
" texts, embeddings, dataset_path=f\"hub://{username}/langchain-code\", overwrite=True\n",
"db = DeeplakeVectorStore.from_documents(\n",
" documents=texts,\n",
" embedding=embeddings,\n",
" dataset_path=f\"hub://{username}/langchain-code\",\n",
" overwrite=True,\n",
")\n",
"db"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"`Optional`: You can also use Deep Lake's Managed Tensor Database as a hosting service and run queries there. In order to do so, it is necessary to specify the runtime parameter as {'tensor_db': True} during the creation of the vector store. This configuration enables the execution of queries on the Managed Tensor Database, rather than on the client side. It should be noted that this functionality is not applicable to datasets stored locally or in-memory. In the event that a vector store has already been created outside of the Managed Tensor Database, it is possible to transfer it to the Managed Tensor Database by following the prescribed steps."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"# from langchain_community.vectorstores import DeepLake\n",
"\n",
"# db = DeepLake.from_documents(\n",
"# texts, embeddings, dataset_path=f\"hub://{<org_id>}/langchain-code\", runtime={\"tensor_db\": True}\n",
"# )\n",
"# db"
]
},
{
"attachments": {},
"cell_type": "markdown",
@ -760,24 +697,16 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Deep Lake Dataset in hub://adilkhan/langchain-code already exists, loading from the storage\n"
]
}
],
"outputs": [],
"source": [
"db = DeepLake(\n",
"db = DeeplakeVectorStore(\n",
" dataset_path=f\"hub://{username}/langchain-code\",\n",
" read_only=True,\n",
" embedding=embeddings,\n",
" embedding_function=embeddings,\n",
")"
]
},
@ -796,36 +725,6 @@
"retriever.search_kwargs[\"k\"] = 20"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also specify user defined functions using [Deep Lake filters](https://docs.deeplake.ai/en/latest/deeplake.core.dataset.html#deeplake.core.dataset.Dataset.filter)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def filter(x):\n",
" # filter based on source code\n",
" if \"something\" in x[\"text\"].data()[\"value\"]:\n",
" return False\n",
"\n",
" # filter based on path e.g. extension\n",
" metadata = x[\"metadata\"].data()[\"value\"]\n",
" return \"only_this\" in metadata[\"source\"] or \"also_that\" in metadata[\"source\"]\n",
"\n",
"\n",
"### turn on below for custom filtering\n",
"# retriever.search_kwargs['filter'] = filter"
]
},
{
"cell_type": "code",
"execution_count": 20,
@ -837,10 +736,8 @@
"from langchain.chains import ConversationalRetrievalChain\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"model = ChatOpenAI(\n",
" model_name=\"gpt-3.5-turbo-0613\"\n",
") # 'ada' 'gpt-3.5-turbo-0613' 'gpt-4',\n",
"qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)"
"model = ChatOpenAI(model=\"gpt-3.5-turbo-0613\") # 'ada' 'gpt-3.5-turbo-0613' 'gpt-4',\n",
"qa = RetrievalQA.from_llm(model, retriever=retriever)"
]
},
{

View File

@ -0,0 +1,16 @@
# Deeplake
[Deeplake](https://www.deeplake.ai/) is a database optimized for AI and deep learning
applications.
## Installation and Setup
```bash
pip install langchain-deeplake
```
## Vector stores
See detail on available vector stores
[here](/docs/integrations/vectorstores/activeloop_deeplake).

File diff suppressed because it is too large Load Diff

View File

@ -15,6 +15,7 @@ try:
except ImportError:
_DEEPLAKE_INSTALLED = False
from langchain_core._api import deprecated
from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
from langchain_core.vectorstores import VectorStore
@ -24,6 +25,18 @@ from langchain_community.vectorstores.utils import maximal_marginal_relevance
logger = logging.getLogger(__name__)
@deprecated(
since="0.3.3",
removal="1.0",
message=(
"This class is deprecated and will be removed in a future version. "
"You can swap to using the `DeeplakeVectorStore`"
" implementation in `langchain-deeplake`. "
"Please do not submit further PRs to this class."
"See <https://github.com/activeloopai/langchain-deeplake>"
),
alternative_import="langchain_deeplake.DeeplakeVectorStore",
)
class DeepLake(VectorStore):
"""`Activeloop Deep Lake` vector store.

View File

@ -445,6 +445,9 @@ packages:
repo: Shikenso-Analytics/langchain-discord
downloads: 1
downloads_updated_at: '2025-02-15T16:00:00.000000+00:00'
- name: langchain-deeplake
path: .
repo: activeloopai/langchain-deeplake
- name: langchain-cognee
repo: topoteretes/langchain-cognee
path: .