astradb: bootstrapping Astra DB as Partner Package (#16875)

**Description:** This PR introduces a new "Astra DB" Partner Package.

So far, only the vector store class is _duplicated_ there; all other classes
will follow once this approach is validated and established.

Along with the move to a separate package, the class name changes:
`AstraDB` => `AstraDBVectorStore`.
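
For illustration, the import-level change amounts to something like this (a sketch, assuming the new `langchain-astradb` package is installed):

```python
# Before: community implementation (deprecated since langchain-community 0.1.23)
from langchain_community.vectorstores import AstraDB

# After: partner package, with the renamed class
from langchain_astradb import AstraDBVectorStore
```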

The strategy has been to duplicate the module, with removal from
community planned for LangChain 0.2. Until then, the code will be kept in
sync with minimal, known differences (there is a Makefile target to
automate drift control; to keep this check convenient, the community
package aliases `AstraDBVectorStore` to `AstraDB` at the end of the
module).
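
Concretely, the community-side alias amounts to a single line at the end of the module (a sketch):

```python
# End of the community module (sketch): bind the new name to the existing
# class so the drift-control check can diff the two packages mechanically.
AstraDBVectorStore = AstraDB
```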

This PR also brings several bugfixes and improvements to the vector store,
along with a reshuffling of the doc pages/notebooks (Astra and
Cassandra) to align with the move to a separate package.

**Dependencies:** A brand-new `pyproject.toml` in the new package; no
changes otherwise.

**Twitter handle:** `@rsprrs`

---------

Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Author: Stefano Lottini (committed by GitHub, 2024-02-16 00:50:59 +01:00)
Parent: f6f0ca1bae
Commit: 5240ecab99
33 changed files with 4622 additions and 448 deletions
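
For a quick end-to-end picture of the new package, here is a minimal sketch assembled from the README and notebooks added below (the endpoint and token values are placeholders):

```python
from langchain_astradb import AstraDBVectorStore
from langchain_openai import OpenAIEmbeddings

vstore = AstraDBVectorStore(
    embedding=OpenAIEmbeddings(),
    collection_name="astra_vector_demo",
    api_endpoint="https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com",
    token="AstraCS:...",
    namespace=None,  # optional keyspace; None means the database default
)

vstore.add_texts(
    texts=["I think, therefore I am."],
    metadatas=[{"author": "descartes"}],
)
for doc in vstore.similarity_search("What is thinking?", k=1):
    print(doc.page_content, doc.metadata)
```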


@@ -67,6 +67,9 @@ jobs:
           WATSONX_PROJECT_ID: ${{ secrets.WATSONX_PROJECT_ID }}
           PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
           PINECONE_ENVIRONMENT: ${{ secrets.PINECONE_ENVIRONMENT }}
+          ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
+          ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
+          ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
         run: |
           make integration_tests


@@ -187,6 +187,9 @@ jobs:
           WATSONX_PROJECT_ID: ${{ secrets.WATSONX_PROJECT_ID }}
           PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
           PINECONE_ENVIRONMENT: ${{ secrets.PINECONE_ENVIRONMENT }}
+          ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
+          ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
+          ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
         run: make integration_tests
         working-directory: ${{ inputs.working-directory }}


@@ -72,57 +72,72 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [ "source": [
"### Init from a cassandra driver Session\n", "### Init from a cassandra driver Session\n",
"\n", "\n",
"You need to create a `cassandra.cluster.Session` object, as described in the [Cassandra driver documentation](https://docs.datastax.com/en/developer/python-driver/latest/api/cassandra/cluster/#module-cassandra.cluster). The details vary (e.g. with network settings and authentication), but this might be something like:" "You need to create a `cassandra.cluster.Session` object, as described in the [Cassandra driver documentation](https://docs.datastax.com/en/developer/python-driver/latest/api/cassandra/cluster/#module-cassandra.cluster). The details vary (e.g. with network settings and authentication), but this might be something like:"
], ]
"metadata": {
"collapsed": false
}
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [], "outputs": [],
"source": [ "source": [
"from cassandra.cluster import Cluster\n", "from cassandra.cluster import Cluster\n",
"\n", "\n",
"cluster = Cluster()\n", "cluster = Cluster()\n",
"session = cluster.connect()" "session = cluster.connect()"
], ]
"metadata": {
"collapsed": false
},
"execution_count": null
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [ "source": [
"You need to provide the name of an existing keyspace of the Cassandra instance:" "You need to provide the name of an existing keyspace of the Cassandra instance:"
], ]
"metadata": {
"collapsed": false
}
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [], "outputs": [],
"source": [ "source": [
"CASSANDRA_KEYSPACE = input(\"CASSANDRA_KEYSPACE = \")" "CASSANDRA_KEYSPACE = input(\"CASSANDRA_KEYSPACE = \")"
], ]
"metadata": {
"collapsed": false
},
"execution_count": null
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [ "source": [
"Creating the document loader:" "Creating the document loader:"
], ]
"metadata": {
"collapsed": false
}
}, },
{ {
"cell_type": "code", "cell_type": "code",
@@ -144,18 +159,21 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"outputs": [], "execution_count": 17,
"source": [
"docs = loader.load()"
],
"metadata": { "metadata": {
"collapsed": false,
"ExecuteTime": { "ExecuteTime": {
"end_time": "2024-01-19T15:47:26.399472Z", "end_time": "2024-01-19T15:47:26.399472Z",
"start_time": "2024-01-19T15:47:26.389145Z" "start_time": "2024-01-19T15:47:26.389145Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
} }
}, },
"execution_count": 17 "outputs": [],
"source": [
"docs = loader.load()"
]
}, },
{ {
"cell_type": "code", "cell_type": "code",
@@ -169,7 +187,9 @@
"outputs": [ "outputs": [
{ {
"data": { "data": {
"text/plain": "Document(page_content='Row(_id=\\'659bdffa16cbc4586b11a423\\', title=\\'Dangerous Men\\', reviewtext=\\'\"Dangerous Men,\" the picture\\\\\\'s production notes inform, took 26 years to reach the big screen. After having seen it, I wonder: What was the rush?\\')', metadata={'table': 'movie_reviews', 'keyspace': 'default_keyspace'})" "text/plain": [
"Document(page_content='Row(_id=\\'659bdffa16cbc4586b11a423\\', title=\\'Dangerous Men\\', reviewtext=\\'\"Dangerous Men,\" the picture\\\\\\'s production notes inform, took 26 years to reach the big screen. After having seen it, I wonder: What was the rush?\\')', metadata={'table': 'movie_reviews', 'keyspace': 'default_keyspace'})"
]
}, },
"execution_count": 19, "execution_count": 19,
"metadata": {}, "metadata": {},
@@ -182,17 +202,27 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [ "source": [
"### Init from cassio\n", "### Init from cassio\n",
"\n", "\n",
"It's also possible to use cassio to configure the session and keyspace." "It's also possible to use cassio to configure the session and keyspace."
], ]
"metadata": {
"collapsed": false
}
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [], "outputs": [],
"source": [ "source": [
"import cassio\n", "import cassio\n",
@@ -204,11 +234,16 @@
")\n", ")\n",
"\n", "\n",
"docs = loader.load()" "docs = loader.load()"
], ]
"metadata": {
"collapsed": false
}, },
"execution_count": null {
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Attribution statement\n",
"\n",
"> Apache Cassandra, Cassandra and Apache are either registered trademarks or trademarks of the [Apache Software Foundation](http://www.apache.org/) in the United States and/or other countries."
]
} }
], ],
"metadata": { "metadata": {
@@ -233,7 +268,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.9.18" "version": "3.9.17"
} }
}, },
"nbformat": 4, "nbformat": 4,


@@ -1131,6 +1131,16 @@
     "print(llm(\"How come we always see one face of the moon?\"))"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "55dc84b3-37cb-4f19-b175-40e18e06f83f",
+   "metadata": {},
+   "source": [
+    "#### Attribution statement\n",
+    "\n",
+    ">Apache Cassandra, Cassandra and Apache are either registered trademarks or trademarks of the [Apache Software Foundation](http://www.apache.org/) in the United States and/or other countries."
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "8712f8fc-bb89-4164-beb9-c672778bbd91",
@@ -1588,7 +1598,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.1"
+   "version": "3.9.17"
   }
  },
 "nbformat": 4,


@@ -32,7 +32,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "%pip install --upgrade --quiet \"astrapy>=0.6.2\""
+    "%pip install --upgrade --quiet \"astrapy>=0.7.1\""
    ]
   },
   {


@@ -145,6 +145,24 @@
    "source": [
     "message_history.messages"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "59902d0f-e9ba-4e3d-a7e0-ce202b9d3c43",
+   "metadata": {},
+   "source": [
+    "#### Attribution statement\n",
+    "\n",
+    "> Apache Cassandra, Cassandra and Apache are either registered trademarks or trademarks of the [Apache Software Foundation](http://www.apache.org/) in the United States and/or other countries."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7efaa51c-e9ee-4dce-80a4-eb9280a0dbe5",
+   "metadata": {},
+   "outputs": [],
+   "source": []
   }
  ],
  "metadata": {
@@ -163,7 +181,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-  "version": "3.10.12"
+  "version": "3.9.17"
  }
 },
 "nbformat": 4,


@@ -1,21 +1,17 @@
 # Astra DB
 
-This page lists the integrations available with [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) and [Apache Cassandra®](https://cassandra.apache.org/).
+> DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Apache Cassandra® and made conveniently available
+> through an easy-to-use JSON API.
 
 ### Setup
 
 Install the following Python package:
 
 ```bash
-pip install "astrapy>=0.5.3"
+pip install "astrapy>=0.7.1"
 ```
 
-## Astra DB
-
-> DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Cassandra and made conveniently available
-> through an easy-to-use JSON API.
-
-### Vector Store
+## Vector Store
 
 ```python
 from langchain_community.vectorstores import AstraDB
@@ -29,11 +25,22 @@ vector_store = AstraDB(
 
 Learn more in the [example notebook](/docs/integrations/vectorstores/astradb).
 
-### LLM Cache
+## Chat message history
+
+```python
+from langchain_community.chat_message_histories import AstraDBChatMessageHistory
+message_history = AstraDBChatMessageHistory(
+    session_id="test-session",
+    api_endpoint="...",
+    token="...",
+)
+```
+
+## LLM Cache
 
 ```python
 from langchain.globals import set_llm_cache
-from langchain.cache import AstraDBCache
+from langchain_community.cache import AstraDBCache
 
 set_llm_cache(AstraDBCache(
     api_endpoint="...",
     token="...",
@@ -43,11 +50,11 @@ set_llm_cache(AstraDBCache(
 
 Learn more in the [example notebook](/docs/integrations/llms/llm_caching#astra-db-caches) (scroll to the Astra DB section).
 
-### Semantic LLM Cache
+## Semantic LLM Cache
 
 ```python
 from langchain.globals import set_llm_cache
-from langchain.cache import AstraDBSemanticCache
+from langchain_community.cache import AstraDBSemanticCache
 
 set_llm_cache(AstraDBSemanticCache(
     embedding=my_embedding,
     api_endpoint="...",
@@ -57,20 +64,9 @@ set_llm_cache(AstraDBSemanticCache(
 
 Learn more in the [example notebook](/docs/integrations/llms/llm_caching#astra-db-caches) (scroll to the appropriate section).
 
-### Chat message history
-
-```python
-from langchain.memory import AstraDBChatMessageHistory
-message_history = AstraDBChatMessageHistory(
-    session_id="test-session",
-    api_endpoint="...",
-    token="...",
-)
-```
-
 Learn more in the [example notebook](/docs/integrations/memory/astradb_chat_message_history).
 
-### Document loader
+## Document loader
 
 ```python
 from langchain_community.document_loaders import AstraDBLoader
@@ -83,7 +79,7 @@ loader = AstraDBLoader(
 
 Learn more in the [example notebook](/docs/integrations/document_loaders/astradb).
 
-### Self-querying retriever
+## Self-querying retriever
 
 ```python
 from langchain_community.vectorstores import AstraDB
@@ -106,7 +102,7 @@ retriever = SelfQueryRetriever.from_llm(
 
 Learn more in the [example notebook](/docs/integrations/retrievers/self_query/astradb).
 
-### Store
+## Store
 
 ```python
 from langchain_community.storage import AstraDBStore
@@ -119,7 +115,7 @@ store = AstraDBStore(
 
 Learn more in the [example notebook](/docs/integrations/stores/astradb#astradbstore).
 
-### Byte Store
+## Byte Store
 
 ```python
 from langchain_community.storage import AstraDBByteStore
@@ -131,57 +127,3 @@ store = AstraDBByteStore(
 ```
 
 Learn more in the [example notebook](/docs/integrations/stores/astradb#astradbbytestore).
-
-## Apache Cassandra and Astra DB through CQL
-
-> [Cassandra](https://cassandra.apache.org/) is a NoSQL, row-oriented, highly scalable and highly available database.
-> Starting with version 5.0, the database ships with [vector search capabilities](https://cassandra.apache.org/doc/trunk/cassandra/vector-search/overview.html).
-> DataStax [Astra DB through CQL](https://docs.datastax.com/en/astra-serverless/docs/vector-search/quickstart.html) is a managed serverless database built on Cassandra, offering the same interface and strengths.
-
-These databases use the CQL protocol (Cassandra Query Language).
-Hence, a different set of connectors, outlined below, shall be used.
-
-### Vector Store
-
-```python
-from langchain_community.vectorstores import Cassandra
-vector_store = Cassandra(
-    embedding=my_embedding,
-    table_name="my_store",
-)
-```
-
-Learn more in the [example notebook](/docs/integrations/vectorstores/astradb#apache-cassandra-and-astra-db-through-cql) (scroll down to the CQL-specific section).
-
-### Memory
-
-```python
-from langchain.memory import CassandraChatMessageHistory
-message_history = CassandraChatMessageHistory(session_id="my-session")
-```
-
-Learn more in the [example notebook](/docs/integrations/memory/cassandra_chat_message_history).
-
-### LLM Cache
-
-```python
-from langchain.cache import CassandraCache
-langchain.llm_cache = CassandraCache()
-```
-
-Learn more in the [example notebook](/docs/integrations/llms/llm_caching#cassandra-caches) (scroll to the Cassandra section).
-
-### Semantic LLM Cache
-
-```python
-from langchain.cache import CassandraSemanticCache
-cassSemanticCache = CassandraSemanticCache(
-    embedding=my_embedding,
-    table_name="my_store",
-)
-```
-
-Learn more in the [example notebook](/docs/integrations/llms/llm_caching#cassandra-caches) (scroll to the appropriate section).


@@ -0,0 +1,76 @@
# Apache Cassandra
> [Apache Cassandra®](https://cassandra.apache.org/) is a NoSQL, row-oriented, highly scalable and highly available database.
> Starting with version 5.0, the database ships with [vector search capabilities](https://cassandra.apache.org/doc/trunk/cassandra/vector-search/overview.html).
The integrations outlined in this page can be used with Cassandra as well as other CQL-compatible databases, i.e. those using the Cassandra Query Language protocol.
### Setup
Install the following Python package:
```bash
pip install "cassio>=0.1.4"
```
## Vector Store
```python
from langchain_community.vectorstores import Cassandra
vector_store = Cassandra(
    embedding=my_embedding,
    table_name="my_store",
)
```
Learn more in the [example notebook](/docs/integrations/vectorstores/cassandra).
## Chat message history
```python
from langchain_community.chat_message_histories import CassandraChatMessageHistory
message_history = CassandraChatMessageHistory(session_id="my-session")
```
Learn more in the [example notebook](/docs/integrations/memory/cassandra_chat_message_history).
## LLM Cache
```python
from langchain.globals import set_llm_cache
from langchain_community.cache import CassandraCache
set_llm_cache(CassandraCache())
```
Learn more in the [example notebook](/docs/integrations/llms/llm_caching#cassandra-caches) (scroll to the Cassandra section).
## Semantic LLM Cache
```python
from langchain.globals import set_llm_cache
from langchain_community.cache import CassandraSemanticCache
set_llm_cache(CassandraSemanticCache(
    embedding=my_embedding,
    table_name="my_store",
))
```
Learn more in the [example notebook](/docs/integrations/llms/llm_caching#cassandra-caches) (scroll to the appropriate section).
## Document loader
```python
from langchain_community.document_loaders import CassandraLoader
loader = CassandraLoader(table="my_table")
docs = loader.load()
```
Learn more in the [example notebook](/docs/integrations/document_loaders/cassandra).
#### Attribution statement
> Apache Cassandra, Cassandra and Apache are either registered trademarks or trademarks of the [Apache Software Foundation](http://www.apache.org/) in the United States and/or other countries.


@@ -1,14 +1,28 @@
{ {
"cells": [ "cells": [
{
"cell_type": "markdown",
"id": "66d0270a-b74f-4110-901e-7960b00297af",
"metadata": {},
"source": [
"# Astra DB\n",
"\n",
"This page provides a quickstart for using [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) as a Vector Store."
]
},
{
"cell_type": "markdown",
"id": "ab8cd64f-3bb2-4f16-a0a9-12d7b1789bf6",
"metadata": {},
"source": [
"> DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Apache Cassandra® and made conveniently available through an easy-to-use JSON API."
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "d2d6ca14-fb7e-4172-9aa0-a3119a064b96", "id": "d2d6ca14-fb7e-4172-9aa0-a3119a064b96",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Astra DB\n",
"\n",
"This page provides a quickstart for using [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) and [Apache Cassandra®](https://cassandra.apache.org/) as a Vector Store.\n",
"\n",
"_Note: in addition to access to the database, an OpenAI API Key is required to run the full example._" "_Note: in addition to access to the database, an OpenAI API Key is required to run the full example._"
] ]
}, },
@@ -35,7 +49,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"%pip install --upgrade --quiet \"astrapy>=0.5.3\"" "%pip install --upgrade --quiet \"astrapy>=0.7.1\""
] ]
}, },
{ {
@@ -44,7 +58,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"_Note: depending on your LangChain setup, you may need to install/upgrade other dependencies needed for this demo_\n", "_Note: depending on your LangChain setup, you may need to install/upgrade other dependencies needed for this demo_\n",
"_(specifically, recent versions of `datasets`, `openai`, `pypdf` and `tiktoken` are required)._" "_(specifically, recent versions of `datasets`, `langchain-openai` and `pypdf` are required, along with `langchain-community`)._"
] ]
}, },
{ {
@@ -89,28 +103,12 @@
"embe = OpenAIEmbeddings()" "embe = OpenAIEmbeddings()"
] ]
}, },
{
"cell_type": "markdown",
"id": "dd8caa76-bc41-429e-a93b-989ba13aff01",
"metadata": {},
"source": [
"_Keep reading to connect with Astra DB. For usage with Apache Cassandra and Astra DB through CQL, scroll to the section below._"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "22866f09-e10d-4f05-a24b-b9420129462e", "id": "22866f09-e10d-4f05-a24b-b9420129462e",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Astra DB" "## Import the Vector Store"
]
},
{
"cell_type": "markdown",
"id": "5fba47cc-3533-42fc-84b7-9dc14cd68b2b",
"metadata": {},
"source": [
"DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Cassandra and made conveniently available through an easy-to-use JSON API."
] ]
}, },
{ {
@@ -128,10 +126,13 @@
"id": "68f61b01-3e09-47c1-9d67-5d6915c86626", "id": "68f61b01-3e09-47c1-9d67-5d6915c86626",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Astra DB connection parameters\n", "## Connection parameters\n",
"\n",
"These are found on your Astra DB dashboard:\n",
"\n", "\n",
"- the API Endpoint looks like `https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com`\n", "- the API Endpoint looks like `https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com`\n",
"- the Token looks like `AstraCS:6gBhNmsk135....`" "- the Token looks like `AstraCS:6gBhNmsk135....`\n",
"- you may optionally provide a _Namespace_ such as `my_namespace`"
] ]
}, },
{ {
@@ -142,7 +143,21 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"ASTRA_DB_API_ENDPOINT = input(\"ASTRA_DB_API_ENDPOINT = \")\n", "ASTRA_DB_API_ENDPOINT = input(\"ASTRA_DB_API_ENDPOINT = \")\n",
"ASTRA_DB_APPLICATION_TOKEN = getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")" "ASTRA_DB_APPLICATION_TOKEN = getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")\n",
"\n",
"desired_namespace = input(\"(optional) Namespace = \")\n",
"if desired_namespace:\n",
" ASTRA_DB_KEYSPACE = desired_namespace\n",
"else:\n",
" ASTRA_DB_KEYSPACE = None"
]
},
{
"cell_type": "markdown",
"id": "196268bd-a950-41c3-bede-f5b55f6a0804",
"metadata": {},
"source": [
"Now you can create the vector store:"
] ]
}, },
{ {
@@ -157,6 +172,7 @@
" collection_name=\"astra_vector_demo\",\n", " collection_name=\"astra_vector_demo\",\n",
" api_endpoint=ASTRA_DB_API_ENDPOINT,\n", " api_endpoint=ASTRA_DB_API_ENDPOINT,\n",
" token=ASTRA_DB_APPLICATION_TOKEN,\n", " token=ASTRA_DB_APPLICATION_TOKEN,\n",
" namespace=ASTRA_DB_KEYSPACE,\n",
")" ")"
] ]
}, },
@@ -165,7 +181,7 @@
"id": "9a348678-b2f6-46ca-9a0d-2eb4cc6b66b1", "id": "9a348678-b2f6-46ca-9a0d-2eb4cc6b66b1",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Load a dataset" "## Load a dataset"
] ]
}, },
{ {
@@ -243,7 +259,7 @@
"id": "c031760a-1fc5-4855-adf2-02ed52fe2181", "id": "c031760a-1fc5-4855-adf2-02ed52fe2181",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Run simple searches" "## Run searches"
] ]
}, },
{ {
@@ -318,12 +334,22 @@
" print(f\"* {res.page_content} [{res.metadata}]\")" " print(f\"* {res.page_content} [{res.metadata}]\")"
] ]
}, },
{
"cell_type": "markdown",
"id": "60fda5df-14e4-4fb0-bd17-65a393fab8a9",
"metadata": {},
"source": [
"### Async\n",
"\n",
"Note that the Astra DB vector store natively supports all async methods (`asimilarity_search`, `afrom_texts`, `adelete` and so on), i.e. without thread wrapping involved."
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "1cc86edd-692b-4495-906c-ccfd13b03c23", "id": "1cc86edd-692b-4495-906c-ccfd13b03c23",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Deleting stored documents" "## Deleting stored documents"
] ]
}, },
{ {
@@ -353,7 +379,7 @@
"id": "847181ba-77d1-4a17-b7f9-9e2c3d8efd13", "id": "847181ba-77d1-4a17-b7f9-9e2c3d8efd13",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### A minimal RAG chain" "## A minimal RAG chain"
] ]
}, },
{ {
@@ -452,7 +478,7 @@
"id": "177610c7-50d0-4b7b-8634-b03338054c8e", "id": "177610c7-50d0-4b7b-8634-b03338054c8e",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Cleanup" "## Cleanup"
] ]
}, },
{ {
@@ -474,290 +500,6 @@
"source": [ "source": [
"vstore.delete_collection()" "vstore.delete_collection()"
] ]
},
{
"cell_type": "markdown",
"id": "94ebaab1-7cbf-4144-a147-7b0e32c43069",
"metadata": {},
"source": [
"## Apache Cassandra and Astra DB through CQL"
]
},
{
"cell_type": "markdown",
"id": "bc3931b4-211d-4f84-bcc0-51c127e3027c",
"metadata": {},
"source": [
"[Cassandra](https://cassandra.apache.org/) is a NoSQL, row-oriented, highly scalable and highly available database.Starting with version 5.0, the database ships with [vector search capabilities](https://cassandra.apache.org/doc/trunk/cassandra/vector-search/overview.html).\n",
"\n",
"DataStax [Astra DB through CQL](https://docs.datastax.com/en/astra-serverless/docs/vector-search/quickstart.html) is a managed serverless database built on Cassandra, offering the same interface and strengths."
]
},
{
"cell_type": "markdown",
"id": "a0055fbf-448d-4e46-9c40-28d43df25ca3",
"metadata": {},
"source": [
"#### What sets this case apart from \"Astra DB\" above?\n",
"\n",
"Thanks to LangChain having a standardized `VectorStore` interface, most of the \"Astra DB\" section above applies to this case as well. However, this time the database uses the CQL protocol, which means you'll use a _different_ class this time and instantiate it in another way.\n",
"\n",
"The cells below show how you should get your `vstore` object in this case and how you can clean up the database resources at the end: for the rest, i.e. the actual usage of the vector store, you will be able to run the very code that was shown above.\n",
"\n",
"In other words, running this demo in full with Cassandra or Astra DB through CQL means:\n",
"\n",
"- **initialization as shown below**\n",
"- \"Load a dataset\", _see above section_\n",
"- \"Run simple searches\", _see above section_\n",
"- \"MMR search\", _see above section_\n",
"- \"Deleting stored documents\", _see above section_\n",
"- \"A minimal RAG chain\", _see above section_\n",
"- **cleanup as shown below**"
]
},
{
"cell_type": "markdown",
"id": "23d12be2-745f-4e72-a82c-334a887bc7cd",
"metadata": {},
"source": [
"### Initialization"
]
},
{
"cell_type": "markdown",
"id": "e3212542-79be-423e-8e1f-b8d725e3cda8",
"metadata": {},
"source": [
"The class to use is the following:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "941af73e-a090-4fba-b23c-595757d470eb",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.vectorstores import Cassandra"
]
},
{
"cell_type": "markdown",
"id": "414d1e72-f7c9-4b6d-bf6f-16075712c7e3",
"metadata": {},
"source": [
"Now, depending on whether you connect to a Cassandra cluster or to Astra DB through CQL, you will provide different parameters when creating the vector store object."
]
},
{
"cell_type": "markdown",
"id": "48ecca56-71a4-4a91-b198-29384c44ce27",
"metadata": {},
"source": [
"#### Initialization (Cassandra cluster)"
]
},
{
"cell_type": "markdown",
"id": "55ebe958-5654-43e0-9aed-d607ffd3fa48",
"metadata": {},
"source": [
"In this case, you first need to create a `cassandra.cluster.Session` object, as described in the [Cassandra driver documentation](https://docs.datastax.com/en/developer/python-driver/latest/api/cassandra/cluster/#module-cassandra.cluster). The details vary (e.g. with network settings and authentication), but this might be something like:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4642dafb-a065-4063-b58c-3d276f5ad07e",
"metadata": {},
"outputs": [],
"source": [
"from cassandra.cluster import Cluster\n",
"\n",
"cluster = Cluster([\"127.0.0.1\"])\n",
"session = cluster.connect()"
]
},
{
"cell_type": "markdown",
"id": "624c93bf-fb46-4350-bcfa-09ca09dc068f",
"metadata": {},
"source": [
"You can now set the session, along with your desired keyspace name, as a global CassIO parameter:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "92a4ab28-1c4f-4dad-9671-d47e0b1dde7b",
"metadata": {},
"outputs": [],
"source": [
"import cassio\n",
"\n",
"CASSANDRA_KEYSPACE = input(\"CASSANDRA_KEYSPACE = \")\n",
"\n",
"cassio.init(session=session, keyspace=CASSANDRA_KEYSPACE)"
]
},
{
"cell_type": "markdown",
"id": "3b87a824-36f1-45b4-b54c-efec2a2de216",
"metadata": {},
"source": [
"Now you can create the vector store:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "853a2a88-a565-4e24-8789-d78c213954a6",
"metadata": {},
"outputs": [],
"source": [
"vstore = Cassandra(\n",
" embedding=embe,\n",
" table_name=\"cassandra_vector_demo\",\n",
" # session=None, keyspace=None # Uncomment on older versions of LangChain\n",
")"
]
},
{
"cell_type": "markdown",
"id": "768ddf7a-0c3e-4134-ad38-25ac53c3da7a",
"metadata": {},
"source": [
"#### Initialization (Astra DB through CQL)"
]
},
{
"cell_type": "markdown",
"id": "4ed4269a-b7e7-4503-9e66-5a11335c7681",
"metadata": {},
"source": [
"In this case you initialize CassIO with the following connection parameters:\n",
"\n",
"- the Database ID, e.g. `01234567-89ab-cdef-0123-456789abcdef`\n",
"- the Token, e.g. `AstraCS:6gBhNmsk135....` (it must be a \"Database Administrator\" token)\n",
"- Optionally a Keyspace name (if omitted, the default one for the database will be used)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5fa6bd74-d4b2-45c5-9757-96dddc6242fb",
"metadata": {},
"outputs": [],
"source": [
"ASTRA_DB_ID = input(\"ASTRA_DB_ID = \")\n",
"ASTRA_DB_APPLICATION_TOKEN = getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")\n",
"\n",
"desired_keyspace = input(\"ASTRA_DB_KEYSPACE (optional, can be left empty) = \")\n",
"if desired_keyspace:\n",
" ASTRA_DB_KEYSPACE = desired_keyspace\n",
"else:\n",
" ASTRA_DB_KEYSPACE = None"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "add6e585-17ff-452e-8ef6-7e485ead0b06",
"metadata": {},
"outputs": [],
"source": [
"import cassio\n",
"\n",
"cassio.init(\n",
" database_id=ASTRA_DB_ID,\n",
" token=ASTRA_DB_APPLICATION_TOKEN,\n",
" keyspace=ASTRA_DB_KEYSPACE,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "b305823c-bc98-4f3d-aabb-d7eb663ea421",
"metadata": {},
"source": [
"Now you can create the vector store:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f45f3038-9d59-41cc-8b43-774c6aa80295",
"metadata": {},
"outputs": [],
"source": [
"vstore = Cassandra(\n",
" embedding=embe,\n",
" table_name=\"cassandra_vector_demo\",\n",
" # session=None, keyspace=None # Uncomment on older versions of LangChain\n",
")"
]
},
{
"cell_type": "markdown",
"id": "39284918-cf8a-49bb-a2d3-aef285bb2ffa",
"metadata": {},
"source": [
"### Usage of the vector store"
]
},
{
"cell_type": "markdown",
"id": "3cc1aead-d6ec-48a3-affe-1d0cffa955a9",
"metadata": {},
"source": [
"_See the sections \"Load a dataset\" through \"A minimal RAG chain\" above._\n",
"\n",
"Speaking of the latter, you can check out a full RAG template for Astra DB through CQL [here](https://github.com/langchain-ai/langchain/tree/master/templates/cassandra-entomology-rag)."
]
},
{
"cell_type": "markdown",
"id": "096397d8-6622-4685-9f9d-7e238beca467",
"metadata": {},
"source": [
"### Cleanup"
]
},
{
"cell_type": "markdown",
"id": "cc1e74f9-5500-41aa-836f-235b1ed5f20c",
"metadata": {},
"source": [
"the following essentially retrieves the `Session` object from CassIO and runs a CQL `DROP TABLE` statement with it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b5b82c33-0e77-4a37-852c-8d50edbdd991",
"metadata": {},
"outputs": [],
"source": [
"cassio.config.resolve_session().execute(\n",
" f\"DROP TABLE {cassio.config.resolve_keyspace()}.cassandra_vector_demo;\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "c10ece4d-ae06-42ab-baf4-4d0ac2051743",
"metadata": {},
"source": [
"### Learn more"
]
},
{
"cell_type": "markdown",
"id": "51ea8b69-7e15-458f-85aa-9fa199f95f9c",
"metadata": {},
"source": [
"For more information, extended quickstarts and additional usage examples, please visit the [CassIO documentation](https://cassio.org/frameworks/langchain/about/) for more on using the LangChain `Cassandra` vector store."
]
} }
], ],
"metadata": { "metadata": {
@@ -776,7 +518,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.10.12" "version": "3.9.18"
} }
}, },
"nbformat": 4, "nbformat": 4,


@@ -0,0 +1,651 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d2d6ca14-fb7e-4172-9aa0-a3119a064b96",
"metadata": {},
"source": [
"# Apache Cassandra\n",
"\n",
"This page provides a quickstart for using [Apache Cassandra®](https://cassandra.apache.org/) as a Vector Store."
]
},
{
"cell_type": "markdown",
"id": "6a1a562e-3d1a-4693-b55d-08bf90943a9a",
"metadata": {},
"source": [
"> [Cassandra](https://cassandra.apache.org/) is a NoSQL, row-oriented, highly scalable and highly available database. Starting with version 5.0, the database ships with [vector search capabilities](https://cassandra.apache.org/doc/trunk/cassandra/vector-search/overview.html)."
]
},
{
"cell_type": "markdown",
"id": "9cf37d7f-c18e-4e63-adea-138e5e981475",
"metadata": {},
"source": [
"_Note: in addition to access to the database, an OpenAI API Key is required to run the full example._"
]
},
{
"cell_type": "markdown",
"id": "bb9be7ce-8c70-4d46-9f11-71c42a36e928",
"metadata": {},
"source": [
"### Setup and general dependencies"
]
},
{
"cell_type": "markdown",
"id": "dbe7c156-0413-47e3-9237-4769c4248869",
"metadata": {},
"source": [
"Use of the integration requires the following Python package."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8d00fcf4-9798-4289-9214-d9734690adfc",
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet \"cassio>=0.1.4\""
]
},
{
"cell_type": "markdown",
"id": "2453d83a-bc8f-41e1-a692-befe4dd90156",
"metadata": {},
"source": [
"_Note: depending on your LangChain setup, you may need to install/upgrade other dependencies needed for this demo_\n",
"_(specifically, recent versions of `datasets`, `langchain-openai` and `pypdf` are required, along with `langchain-community`)._"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b06619af-fea2-4863-8149-7f239a8c9c82",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from getpass import getpass\n",
"\n",
"from datasets import (\n",
" load_dataset,\n",
")\n",
"from langchain.schema import Document\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"from langchain_community.document_loaders import PyPDFLoader\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1983f1da-0ae7-4a9b-bf4c-4ade328f7a3a",
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"OPENAI_API_KEY\"] = getpass(\"OPENAI_API_KEY = \")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c656df06-e938-4bc5-b570-440b8b7a0189",
"metadata": {},
"outputs": [],
"source": [
"embe = OpenAIEmbeddings()"
]
},
{
"cell_type": "markdown",
"id": "22866f09-e10d-4f05-a24b-b9420129462e",
"metadata": {},
"source": [
"## Import the Vector Store"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0b32730d-176e-414c-9d91-fd3644c54211",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.vectorstores import Cassandra"
]
},
{
"cell_type": "markdown",
"id": "68f61b01-3e09-47c1-9d67-5d6915c86626",
"metadata": {},
"source": [
"## Connection parameters\n",
"\n",
"The Vector Store integration shown in this page can be used with Cassandra as well as other derived databases, such as Astra DB, which use the CQL (Cassandra Query Language) protocol.\n",
"\n",
"> DataStax [Astra DB](https://docs.datastax.com/en/astra-serverless/docs/vector-search/quickstart.html) is a managed serverless database built on Cassandra, offering the same interface and strengths.\n",
"\n",
"Depending on whether you connect to a Cassandra cluster or to Astra DB through CQL, you will provide different parameters when creating the vector store object."
]
},
{
"cell_type": "markdown",
"id": "36bbb3d9-4d07-4f63-b23d-c52be03f8938",
"metadata": {},
"source": [
"### Connecting to a Cassandra cluster\n",
"\n",
"You first need to create a `cassandra.cluster.Session` object, as described in the [Cassandra driver documentation](https://docs.datastax.com/en/developer/python-driver/latest/api/cassandra/cluster/#module-cassandra.cluster). The details vary (e.g. with network settings and authentication), but this might be something like:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d95bb1d4-d8a6-4e66-89bc-776f9c6f962b",
"metadata": {},
"outputs": [],
"source": [
"from cassandra.cluster import Cluster\n",
"\n",
"cluster = Cluster([\"127.0.0.1\"])\n",
"session = cluster.connect()"
]
},
{
"cell_type": "markdown",
"id": "8279aa78-96d6-43ad-aa21-79fd798d895d",
"metadata": {},
"source": [
"You can now set the session, along with your desired keyspace name, as a global CassIO parameter:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "29ececc4-e50b-4428-967f-4b6bbde12a14",
"metadata": {},
"outputs": [],
"source": [
"import cassio\n",
"\n",
"CASSANDRA_KEYSPACE = input(\"CASSANDRA_KEYSPACE = \")\n",
"\n",
"cassio.init(session=session, keyspace=CASSANDRA_KEYSPACE)"
]
},
{
"cell_type": "markdown",
"id": "0bd035a2-f0af-418f-94e5-0fbb4d51ac3c",
"metadata": {},
"source": [
"Now you can create the vector store:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eeb62cde-89fc-44d7-ba76-91e19cbc5898",
"metadata": {},
"outputs": [],
"source": [
"vstore = Cassandra(\n",
" embedding=embe,\n",
" table_name=\"cassandra_vector_demo\",\n",
" # session=None, keyspace=None # Uncomment on older versions of LangChain\n",
")"
]
},
{
"cell_type": "markdown",
"id": "ce240555-e5fc-431d-ac0f-bcf2f6e6a5fb",
"metadata": {},
"source": [
"_Note: you can also pass your session and keyspace directly as parameters when creating the vector store. Using the global `cassio.init` setting, however, comes in handy if your application uses Cassandra in several ways (for instance, for vector store, chat memory and LLM response caching), as it lets you centralize credential and DB connection management in one place._"
]
},
{
"cell_type": "markdown",
"id": "b598e5fa-eb62-4939-9734-091628e84db4",
"metadata": {},
"source": [
"### Connecting to Astra DB through CQL"
]
},
{
"cell_type": "markdown",
"id": "2feec7c3-7092-4252-9a3f-05eda4babe74",
"metadata": {},
"source": [
"In this case you initialize CassIO with the following connection parameters:\n",
"\n",
"- the Database ID, e.g. `01234567-89ab-cdef-0123-456789abcdef`\n",
"- the Token, e.g. `AstraCS:6gBhNmsk135....` (it must be a \"Database Administrator\" token)\n",
"- Optionally a Keyspace name (if omitted, the default one for the database will be used)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2f96147d-6d76-4101-bbb0-4a7f215c3d2d",
"metadata": {},
"outputs": [],
"source": [
"ASTRA_DB_ID = input(\"ASTRA_DB_ID = \")\n",
"ASTRA_DB_APPLICATION_TOKEN = getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")\n",
"\n",
"desired_keyspace = input(\"ASTRA_DB_KEYSPACE (optional, can be left empty) = \")\n",
"if desired_keyspace:\n",
" ASTRA_DB_KEYSPACE = desired_keyspace\n",
"else:\n",
" ASTRA_DB_KEYSPACE = None"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d653df1d-9dad-4980-ba52-76a47b4c5c1a",
"metadata": {},
"outputs": [],
"source": [
"import cassio\n",
"\n",
"cassio.init(\n",
" database_id=ASTRA_DB_ID,\n",
" token=ASTRA_DB_APPLICATION_TOKEN,\n",
" keyspace=ASTRA_DB_KEYSPACE,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "e606b58b-d390-4fed-a2fc-65036c44860f",
"metadata": {},
"source": [
"Now you can create the vector store:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9cb552d1-e888-4550-a350-6df06b1f5aae",
"metadata": {},
"outputs": [],
"source": [
"vstore = Cassandra(\n",
" embedding=embe,\n",
" table_name=\"cassandra_vector_demo\",\n",
" # session=None, keyspace=None # Uncomment on older versions of LangChain\n",
")"
]
},
{
"cell_type": "markdown",
"id": "9a348678-b2f6-46ca-9a0d-2eb4cc6b66b1",
"metadata": {},
"source": [
"## Load a dataset"
]
},
{
"cell_type": "markdown",
"id": "552e56b0-301a-4b06-99c7-57ba6faa966f",
"metadata": {},
"source": [
"Convert each entry in the source dataset into a `Document`, then write them into the vector store:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3a1f532f-ad63-4256-9730-a183841bd8e9",
"metadata": {},
"outputs": [],
"source": [
"philo_dataset = load_dataset(\"datastax/philosopher-quotes\")[\"train\"]\n",
"\n",
"docs = []\n",
"for entry in philo_dataset:\n",
" metadata = {\"author\": entry[\"author\"]}\n",
" doc = Document(page_content=entry[\"quote\"], metadata=metadata)\n",
" docs.append(doc)\n",
"\n",
"inserted_ids = vstore.add_documents(docs)\n",
"print(f\"\\nInserted {len(inserted_ids)} documents.\")"
]
},
{
"cell_type": "markdown",
"id": "79d4f436-ef04-4288-8f79-97c9abb983ed",
"metadata": {},
"source": [
"In the above, `metadata` dictionaries are created from the source data and are part of the `Document`."
]
},
{
"cell_type": "markdown",
"id": "084d8802-ab39-4262-9a87-42eafb746f92",
"metadata": {},
"source": [
"Add some more entries, this time with `add_texts`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b6b157f5-eb31-4907-a78e-2e2b06893936",
"metadata": {},
"outputs": [],
"source": [
"texts = [\"I think, therefore I am.\", \"To the things themselves!\"]\n",
"metadatas = [{\"author\": \"descartes\"}, {\"author\": \"husserl\"}]\n",
"ids = [\"desc_01\", \"huss_xy\"]\n",
"\n",
"inserted_ids_2 = vstore.add_texts(texts=texts, metadatas=metadatas, ids=ids)\n",
"print(f\"\\nInserted {len(inserted_ids_2)} documents.\")"
]
},
{
"cell_type": "markdown",
"id": "63840eb3-8b29-4017-bc2f-301bf5001f28",
"metadata": {},
"source": [
"_Note: you may want to speed up the execution of `add_texts` and `add_documents` by increasing the concurrency level for_\n",
"_these bulk operations - check out the methods' `batch_size` parameter_\n",
"_for more details. Depending on the network and the client machine specifications, your best-performing choice of parameters may vary._"
]
},
{
"cell_type": "markdown",
"id": "c031760a-1fc5-4855-adf2-02ed52fe2181",
"metadata": {},
"source": [
"## Run searches"
]
},
{
"cell_type": "markdown",
"id": "02a77d8e-1aae-4054-8805-01c77947c49f",
"metadata": {},
"source": [
"This section demonstrates metadata filtering and getting the similarity scores back:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1761806a-1afd-4491-867c-25a80d92b9fe",
"metadata": {},
"outputs": [],
"source": [
"results = vstore.similarity_search(\"Our life is what we make of it\", k=3)\n",
"for res in results:\n",
" print(f\"* {res.page_content} [{res.metadata}]\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eebc4f7c-f61a-438e-b3c8-17e6888d8a0b",
"metadata": {},
"outputs": [],
"source": [
"results_filtered = vstore.similarity_search(\n",
" \"Our life is what we make of it\",\n",
" k=3,\n",
" filter={\"author\": \"plato\"},\n",
")\n",
"for res in results_filtered:\n",
" print(f\"* {res.page_content} [{res.metadata}]\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "11bbfe64-c0cd-40c6-866a-a5786538450e",
"metadata": {},
"outputs": [],
"source": [
"results = vstore.similarity_search_with_score(\"Our life is what we make of it\", k=3)\n",
"for res, score in results:\n",
" print(f\"* [SIM={score:3f}] {res.page_content} [{res.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "b14ea558-bfbe-41ce-807e-d70670060ada",
"metadata": {},
"source": [
"### MMR (Maximal-marginal-relevance) search"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "76381ce8-780a-4e3b-97b1-056d6782d7d5",
"metadata": {},
"outputs": [],
"source": [
"results = vstore.max_marginal_relevance_search(\n",
" \"Our life is what we make of it\",\n",
" k=3,\n",
" filter={\"author\": \"aristotle\"},\n",
")\n",
"for res in results:\n",
" print(f\"* {res.page_content} [{res.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "1cc86edd-692b-4495-906c-ccfd13b03c23",
"metadata": {},
"source": [
"## Deleting stored documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "38a70ec4-b522-4d32-9ead-c642864fca37",
"metadata": {},
"outputs": [],
"source": [
"delete_1 = vstore.delete(inserted_ids[:3])\n",
"print(f\"all_succeed={delete_1}\") # True, all documents deleted"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d4cf49ed-9d29-4ed9-bdab-51a308c41b8e",
"metadata": {},
"outputs": [],
"source": [
"delete_2 = vstore.delete(inserted_ids[2:5])\n",
"print(f\"some_succeeds={delete_2}\") # True, though some IDs were gone already"
]
},
{
"cell_type": "markdown",
"id": "847181ba-77d1-4a17-b7f9-9e2c3d8efd13",
"metadata": {},
"source": [
"## A minimal RAG chain"
]
},
{
"cell_type": "markdown",
"id": "cd64b844-846f-43c5-a7dd-c26b9ed417d0",
"metadata": {},
"source": [
"The next cells will implement a simple RAG pipeline:\n",
"- download a sample PDF file and load it onto the store;\n",
"- create a RAG chain with LCEL (LangChain Expression Language), with the vector store at its heart;\n",
"- run the question-answering chain."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5cbc4dba-0d5e-4038-8fc5-de6cadd1c2a9",
"metadata": {},
"outputs": [],
"source": [
"!curl -L \\\n",
" \"https://github.com/awesome-astra/datasets/blob/main/demo-resources/what-is-philosophy/what-is-philosophy.pdf?raw=true\" \\\n",
" -o \"what-is-philosophy.pdf\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "459385be-5e9c-47ff-ba53-2b7ae6166b09",
"metadata": {},
"outputs": [],
"source": [
"pdf_loader = PyPDFLoader(\"what-is-philosophy.pdf\")\n",
"splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)\n",
"docs_from_pdf = pdf_loader.load_and_split(text_splitter=splitter)\n",
"\n",
"print(f\"Documents from PDF: {len(docs_from_pdf)}.\")\n",
"inserted_ids_from_pdf = vstore.add_documents(docs_from_pdf)\n",
"print(f\"Inserted {len(inserted_ids_from_pdf)} documents.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5010a66c-4298-4e32-82b5-2da0d36a5c70",
"metadata": {},
"outputs": [],
"source": [
"retriever = vstore.as_retriever(search_kwargs={\"k\": 3})\n",
"\n",
"philo_template = \"\"\"\n",
"You are a philosopher that draws inspiration from great thinkers of the past\n",
"to craft well-thought answers to user questions. Use the provided context as the basis\n",
"for your answers and do not make up new reasoning paths - just mix-and-match what you are given.\n",
"Your answers must be concise and to the point, and refrain from answering about other topics than philosophy.\n",
"\n",
"CONTEXT:\n",
"{context}\n",
"\n",
"QUESTION: {question}\n",
"\n",
"YOUR ANSWER:\"\"\"\n",
"\n",
"philo_prompt = ChatPromptTemplate.from_template(philo_template)\n",
"\n",
"llm = ChatOpenAI()\n",
"\n",
"chain = (\n",
" {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
" | philo_prompt\n",
" | llm\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fcbc1296-6c7c-478b-b55b-533ba4e54ddb",
"metadata": {},
"outputs": [],
"source": [
"chain.invoke(\"How does Russel elaborate on Peirce's idea of the security blanket?\")"
]
},
{
"cell_type": "markdown",
"id": "869ab448-a029-4692-aefc-26b85513314d",
"metadata": {},
"source": [
"For more, check out a complete RAG template using Astra DB through CQL [here](https://github.com/langchain-ai/langchain/tree/master/templates/cassandra-entomology-rag)."
]
},
{
"cell_type": "markdown",
"id": "177610c7-50d0-4b7b-8634-b03338054c8e",
"metadata": {},
"source": [
"## Cleanup"
]
},
{
"cell_type": "markdown",
"id": "0da4d19f-9878-4d3d-82c9-09cafca20322",
"metadata": {},
"source": [
"The following essentially retrieves the `Session` object from CassIO and runs a CQL `DROP TABLE` statement with it:\n",
"\n",
"_(You will lose the data you stored in it.)_"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fd405a13-6f71-46fa-87e6-167238e9c25e",
"metadata": {},
"outputs": [],
"source": [
"cassio.config.resolve_session().execute(\n",
" f\"DROP TABLE {cassio.config.resolve_keyspace()}.cassandra_vector_demo;\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "c10ece4d-ae06-42ab-baf4-4d0ac2051743",
"metadata": {},
"source": [
"### Learn more"
]
},
{
"cell_type": "markdown",
"id": "51ea8b69-7e15-458f-85aa-9fa199f95f9c",
"metadata": {},
"source": [
"For more information, extended quickstarts and additional usage examples, please visit the [CassIO documentation](https://cassio.org/frameworks/langchain/about/) for more on using the LangChain `Cassandra` vector store."
]
},
{
"cell_type": "markdown",
"id": "3b8ee30c-2c84-42f3-9cff-e80dbc590490",
"metadata": {},
"source": [
"#### Attribution statement\n",
"\n",
"> Apache Cassandra, Cassandra and Apache are either registered trademarks or trademarks of the [Apache Software Foundation](http://www.apache.org/) in the United States and/or other countries.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.17"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -594,11 +594,7 @@
   },
   {
     "source": "/docs/integrations/cassandra",
-    "destination": "/docs/integrations/providers/astradb"
-  },
-  {
-    "source": "/docs/integrations/providers/cassandra",
-    "destination": "/docs/integrations/providers/astradb"
+    "destination": "/docs/integrations/providers/cassandra"
   },
   {
     "source": "/docs/integrations/providers/providers/semadb",
@@ -608,10 +604,6 @@
     "source": "/docs/integrations/vectorstores/vectorstores/semadb",
     "destination": "/docs/integrations/vectorstores/semadb"
   },
-  {
-    "source": "/docs/integrations/vectorstores/cassandra",
-    "destination": "/docs/integrations/vectorstores/astradb"
-  },
   {
     "source": "/docs/integrations/vectorstores/async_faiss",
     "destination": "/docs/integrations/vectorstores/faiss_async"


@@ -20,6 +20,7 @@ from typing import (
 )
 
 import numpy as np
+from langchain_core._api.deprecation import deprecated
 from langchain_core.documents import Document
 from langchain_core.embeddings import Embeddings
 from langchain_core.runnables import run_in_executor
@@ -61,6 +62,11 @@ def _unique_list(lst: List[T], key: Callable[[T], U]) -> List[T]:
     return new_lst
 
 
+@deprecated(
+    since="0.1.23",
+    removal="0.2.0",
+    alternative_import="langchain_astradb.AstraDBVectorStore",
+)
 class AstraDB(VectorStore):
     """Wrapper around DataStax Astra DB for vector-store workloads.

libs/partners/astradb/.gitignore (vendored, new file, 5 lines)

@@ -0,0 +1,5 @@
__pycache__
*.env
.mypy_cache
.ruff_cache
.pytest_cache


@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2023 LangChain, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@@ -0,0 +1,66 @@
SHELL := /bin/bash

.PHONY: all format lint test tests integration_test integration_tests spell_check help

# Default target executed when no arguments are given to make.
all: help

# Define a variable for the test file path.
TEST_FILE ?= tests/unit_tests/
INTEGRATION_TEST_FILE ?= tests/integration_tests/

test:
	poetry run pytest $(TEST_FILE)

tests:
	poetry run pytest $(TEST_FILE)

integration_test:
	poetry run pytest $(INTEGRATION_TEST_FILE)

integration_tests:
	poetry run pytest $(INTEGRATION_TEST_FILE)

######################
# LINTING AND FORMATTING
######################

# Define a variable for Python and notebook files.
PYTHON_FILES=.
MYPY_CACHE=.mypy_cache
lint format: PYTHON_FILES=.
lint_diff format_diff: PYTHON_FILES=$(shell git diff --relative=libs/partners/astradb --name-only --diff-filter=d master | grep -E '\.py$$|\.ipynb$$')
lint_package: PYTHON_FILES=langchain_astradb
lint_tests: PYTHON_FILES=tests
lint_tests: MYPY_CACHE=.mypy_cache_test

lint lint_diff lint_package lint_tests:
	poetry run ruff .
	poetry run ruff format $(PYTHON_FILES) --diff
	poetry run ruff --select I $(PYTHON_FILES)
	mkdir -p $(MYPY_CACHE); poetry run mypy $(PYTHON_FILES) --cache-dir $(MYPY_CACHE)

format format_diff:
	poetry run ruff format $(PYTHON_FILES)
	poetry run ruff --select I --fix $(PYTHON_FILES)

spell_check:
	poetry run codespell --toml pyproject.toml

spell_fix:
	poetry run codespell --toml pyproject.toml -w

check_imports: $(shell find langchain_astradb -name '*.py')
	poetry run python ./scripts/check_imports.py $^

######################
# HELP
######################

help:
	@echo '----'
	@echo 'check_imports - check imports'
	@echo 'format - run code formatters'
	@echo 'lint - run linters'
	@echo 'test - run unit tests'
	@echo 'tests - run unit tests'
	@echo 'test TEST_FILE=<test_file> - run all tests in file'


@@ -0,0 +1,35 @@
# langchain-astradb
This package contains the LangChain integrations for using DataStax Astra DB.
> DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Apache Cassandra® and made conveniently available
> through an easy-to-use JSON API.
_**Note.** For a short transitional period, only some of the Astra DB integration classes are contained in this package (the remaining ones are still in `langchain-community`). In a short while, and surely by LangChain 0.2, all of the Astra DB support will have moved out of `langchain-community` and into this package._
## Installation and Setup
Installation of this partner package:
```bash
pip install langchain-astradb
```
## Integrations overview
### Vector Store
```python
from langchain_astradb.vectorstores import AstraDBVectorStore
my_store = AstraDBVectorStore(
    embedding=my_embeddings,
    collection_name="my_store",
    api_endpoint="https://...",
    token="AstraCS:...",
)
```
## Reference
See the [LangChain docs page](https://python.langchain.com/docs/integrations/providers/astradb) for a more detailed listing.


@ -0,0 +1,5 @@
from langchain_astradb.vectorstores import AstraDBVectorStore
__all__ = [
"AstraDBVectorStore",
]

View File

@ -0,0 +1,87 @@
"""
Tools for Maximal Marginal Relevance (MMR) reranking.
Duplicated from langchain_community to avoid cross-dependencies.
Functions "maximal_marginal_relevance" and "cosine_similarity"
are duplicated in this utility respectively from modules:
- "libs/community/langchain_community/vectorstores/utils.py"
- "libs/community/langchain_community/utils/math.py"
"""
import logging
from typing import List, Union
import numpy as np
logger = logging.getLogger(__name__)
Matrix = Union[List[List[float]], List[np.ndarray], np.ndarray]
def cosine_similarity(X: Matrix, Y: Matrix) -> np.ndarray:
"""Row-wise cosine similarity between two equal-width matrices."""
if len(X) == 0 or len(Y) == 0:
return np.array([])
X = np.array(X)
Y = np.array(Y)
if X.shape[1] != Y.shape[1]:
raise ValueError(
f"Number of columns in X and Y must be the same. X has shape {X.shape} "
f"and Y has shape {Y.shape}."
)
try:
import simsimd as simd # type: ignore
X = np.array(X, dtype=np.float32)
Y = np.array(Y, dtype=np.float32)
Z = 1 - simd.cdist(X, Y, metric="cosine")
if isinstance(Z, float):
return np.array([Z])
return Z
except ImportError:
logger.info(
"Unable to import simsimd, defaulting to NumPy implementation. If you want "
"to use simsimd please install with `pip install simsimd`."
)
X_norm = np.linalg.norm(X, axis=1)
Y_norm = np.linalg.norm(Y, axis=1)
# Ignore divide-by-zero and invalid-value runtime warnings, as those are handled below.
with np.errstate(divide="ignore", invalid="ignore"):
similarity = np.dot(X, Y.T) / np.outer(X_norm, Y_norm)
similarity[np.isnan(similarity) | np.isinf(similarity)] = 0.0
return similarity
def maximal_marginal_relevance(
query_embedding: np.ndarray,
embedding_list: list,
lambda_mult: float = 0.5,
k: int = 4,
) -> List[int]:
"""Calculate maximal marginal relevance."""
if min(k, len(embedding_list)) <= 0:
return []
if query_embedding.ndim == 1:
query_embedding = np.expand_dims(query_embedding, axis=0)
similarity_to_query = cosine_similarity(query_embedding, embedding_list)[0]
most_similar = int(np.argmax(similarity_to_query))
idxs = [most_similar]
selected = np.array([embedding_list[most_similar]])
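    # Greedy selection: repeatedly add the candidate that maximizes
    # lambda_mult * (similarity to query) - (1 - lambda_mult) * (max similarity to already-selected items).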
while len(idxs) < min(k, len(embedding_list)):
best_score = -np.inf
idx_to_add = -1
similarity_to_selected = cosine_similarity(embedding_list, selected)
for i, query_score in enumerate(similarity_to_query):
if i in idxs:
continue
redundant_score = max(similarity_to_selected[i])
equation_score = (
lambda_mult * query_score - (1 - lambda_mult) * redundant_score
)
if equation_score > best_score:
best_score = equation_score
idx_to_add = i
idxs.append(idx_to_add)
selected = np.append(selected, [embedding_list[idx_to_add]], axis=0)
return idxs
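A quick sanity check of the greedy selection (illustrative only, not part of the package; assumes the function above is in scope, with vectors and `lambda_mult` chosen ad hoc):
```python
import numpy as np

query = np.array([1.0, 0.0])
# Two near-duplicate candidates and one orthogonal candidate:
candidates = [[1.0, 0.0], [0.99, 0.14], [0.0, 1.0]]
# A low lambda_mult favours diversity, so the near-duplicate at index 1 is skipped:
print(maximal_marginal_relevance(query, candidates, lambda_mult=0.25, k=2))  # -> [0, 2]
```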

View File

@ -0,0 +1,5 @@
from langchain_astradb.vectorstores.astradb import AstraDBVectorStore
__all__ = [
"AstraDBVectorStore",
]

File diff suppressed because it is too large

libs/partners/astradb/poetry.lock (generated new file, 1074 lines)

File diff suppressed because it is too large

View File

@ -0,0 +1,90 @@
[tool.poetry]
name = "langchain-astradb"
version = "0.0.1"
description = "An integration package connecting Astra DB and LangChain"
authors = []
readme = "README.md"
[tool.poetry.dependencies]
python = ">=3.8.1,<4.0"
langchain-core = ">=0.1"
astrapy = "^0.7.5"
numpy = "^1"
[tool.poetry.group.test]
optional = true
[tool.poetry.group.test.dependencies]
pytest = "^7.3.0"
freezegun = "^1.2.2"
pytest-mock = "^3.10.0"
syrupy = "^4.0.2"
pytest-watcher = "^0.3.4"
pytest-asyncio = "^0.21.1"
langchain-core = {path = "../../core", develop = true}
[tool.poetry.group.codespell]
optional = true
[tool.poetry.group.codespell.dependencies]
codespell = "^2.2.0"
[tool.poetry.group.test_integration]
optional = true
[tool.poetry.group.test_integration.dependencies]
[tool.poetry.group.lint]
optional = true
[tool.poetry.group.lint.dependencies]
ruff = "^0.1.5"
[tool.poetry.group.typing.dependencies]
mypy = "^0.991"
langchain-core = {path = "../../core", develop = true}
[tool.poetry.group.dev]
optional = true
[tool.poetry.group.dev.dependencies]
langchain-core = {path = "../../core", develop = true}
[tool.ruff]
select = [
"E", # pycodestyle
"F", # pyflakes
"I", # isort
]
[tool.mypy]
disallow_untyped_defs = "True"
[tool.coverage.run]
omit = [
"tests/*",
]
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
[tool.pytest.ini_options]
# --strict-markers will raise errors on unknown marks.
# https://docs.pytest.org/en/7.1.x/how-to/mark.html#raising-errors-on-unknown-marks
#
# https://docs.pytest.org/en/7.1.x/reference/reference.html
# --strict-config any warnings encountered while parsing the `pytest`
# section of the configuration file raise errors.
#
# https://github.com/tophat/syrupy
# --snapshot-warn-unused Prints a warning on unused snapshots rather than fail the test suite.
addopts = "--snapshot-warn-unused --strict-markers --strict-config --durations=5"
# Registering custom markers.
# https://docs.pytest.org/en/7.1.x/example/markers.html#registering-markers
markers = [
"requires: mark tests as requiring a specific library",
"asyncio: mark tests as requiring asyncio",
"compile: mark placeholder test used to compile integration tests without running them",
]
asyncio_mode = "auto"

View File

@ -0,0 +1,17 @@
import sys
import traceback
from importlib.machinery import SourceFileLoader
if __name__ == "__main__":
files = sys.argv[1:]
has_failure = False
for file in files:
try:
SourceFileLoader("x", file).load_module()
except Exception:
has_failure = True
print(file)
traceback.print_exc()
print()
sys.exit(1 if has_failure else 0)

View File

@ -0,0 +1,27 @@
#!/bin/bash
#
# This script searches for lines starting with "import pydantic" or "from pydantic"
# in tracked files within a Git repository.
#
# Usage: ./scripts/check_pydantic.sh /path/to/repository
# Check if a path argument is provided
if [ $# -ne 1 ]; then
echo "Usage: $0 /path/to/repository"
exit 1
fi
repository_path="$1"
# Search for lines matching the pattern within the specified repository
result=$(git -C "$repository_path" grep -E '^import pydantic|^from pydantic')
# Check if any matching lines were found
if [ -n "$result" ]; then
echo "ERROR: The following lines need to be updated:"
echo "$result"
echo "Please replace the code with an import from langchain_core.pydantic_v1."
echo "For example, replace 'from pydantic import BaseModel'"
echo "with 'from langchain_core.pydantic_v1 import BaseModel'"
exit 1
fi

View File

@ -0,0 +1,17 @@
#!/bin/bash
set -eu
# Initialize a variable to keep track of errors
errors=0
# make sure not importing from langchain or langchain_experimental
git --no-pager grep '^from langchain\.' . && errors=$((errors+1))
git --no-pager grep '^from langchain_experimental\.' . && errors=$((errors+1))
# Decide on an exit status based on the errors
if [ "$errors" -gt 0 ]; then
exit 1
else
exit 0
fi

View File

View File

@ -0,0 +1,7 @@
import pytest
@pytest.mark.compile
def test_placeholder() -> None:
"""Used for compiling integration tests without running any real tests."""
pass

View File

@ -0,0 +1,869 @@
"""
Test of Astra DB vector store class `AstraDBVectorStore`
Required to run this test:
- a recent `astrapy` Python package available
- an Astra DB instance;
- the two environment variables set:
export ASTRA_DB_API_ENDPOINT="https://<DB-ID>-us-east1.apps.astra.datastax.com"
export ASTRA_DB_APPLICATION_TOKEN="AstraCS:........."
- optionally this as well (otherwise defaults are used):
export ASTRA_DB_KEYSPACE="my_keyspace"
- optionally:
export ASTRA_DB_SKIP_COLLECTION_DELETIONS="1" ("0" = full tests with collection deletions, default)
"""
import json
import math
import os
from typing import Iterable, List, Optional, TypedDict
import pytest
from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
from langchain_astradb.vectorstores import AstraDBVectorStore
# Faster testing (no actual collection deletions). Off by default (=full tests)
SKIP_COLLECTION_DELETE = (
int(os.environ.get("ASTRA_DB_SKIP_COLLECTION_DELETIONS", "0")) != 0
)
COLLECTION_NAME_DIM2 = "lc_test_d2"
COLLECTION_NAME_DIM2_EUCLIDEAN = "lc_test_d2_eucl"
MATCH_EPSILON = 0.0001
# Ad-hoc embedding classes:
class AstraDBCredentials(TypedDict):
token: str
api_endpoint: str
namespace: Optional[str]
class SomeEmbeddings(Embeddings):
"""
Turn a sentence into an embedding vector in some way.
How is not important; that it is deterministic is all that counts.
"""
def __init__(self, dimension: int) -> None:
self.dimension = dimension
def embed_documents(self, texts: List[str]) -> List[List[float]]:
return [self.embed_query(txt) for txt in texts]
async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
return self.embed_documents(texts)
def embed_query(self, text: str) -> List[float]:
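        # Character codes of the first `dimension` chars, padded with a 1 and
        # zeros, truncated to `dimension`, then L2-normalized: deterministic
        # by construction.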
unnormed0 = [ord(c) for c in text[: self.dimension]]
unnormed = (unnormed0 + [1] + [0] * (self.dimension - 1 - len(unnormed0)))[
: self.dimension
]
norm = sum(x * x for x in unnormed) ** 0.5
normed = [x / norm for x in unnormed]
return normed
async def aembed_query(self, text: str) -> List[float]:
return self.embed_query(text)
class ParserEmbeddings(Embeddings):
"""
Parse input texts: if they are valid JSON encoding a List[float], use that as the vector.
Otherwise, return all zeros and call it a day.
"""
def __init__(self, dimension: int) -> None:
self.dimension = dimension
def embed_documents(self, texts: List[str]) -> List[List[float]]:
return [self.embed_query(txt) for txt in texts]
async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
return self.embed_documents(texts)
def embed_query(self, text: str) -> List[float]:
try:
vals = json.loads(text)
assert len(vals) == self.dimension
return vals
except Exception:
print(f'[ParserEmbeddings] Returning a moot vector for "{text}"')
return [0.0] * self.dimension
async def aembed_query(self, text: str) -> List[float]:
return self.embed_query(text)
def _has_env_vars() -> bool:
return all(
[
"ASTRA_DB_APPLICATION_TOKEN" in os.environ,
"ASTRA_DB_API_ENDPOINT" in os.environ,
]
)
@pytest.fixture(scope="session")
def astradb_credentials() -> Iterable[AstraDBCredentials]:
yield {
"token": os.environ["ASTRA_DB_APPLICATION_TOKEN"],
"api_endpoint": os.environ["ASTRA_DB_API_ENDPOINT"],
"namespace": os.environ.get("ASTRA_DB_KEYSPACE"),
}
@pytest.fixture(scope="function")
def store_someemb(
astradb_credentials: AstraDBCredentials,
) -> Iterable[AstraDBVectorStore]:
emb = SomeEmbeddings(dimension=2)
v_store = AstraDBVectorStore(
embedding=emb,
collection_name=COLLECTION_NAME_DIM2,
**astradb_credentials,
)
v_store.clear()
yield v_store
if not SKIP_COLLECTION_DELETE:
v_store.delete_collection()
else:
v_store.clear()
@pytest.fixture(scope="function")
def store_parseremb(
astradb_credentials: AstraDBCredentials,
) -> Iterable[AstraDBVectorStore]:
emb = ParserEmbeddings(dimension=2)
v_store = AstraDBVectorStore(
embedding=emb,
collection_name=COLLECTION_NAME_DIM2,
**astradb_credentials,
)
v_store.clear()
yield v_store
if not SKIP_COLLECTION_DELETE:
v_store.delete_collection()
else:
v_store.clear()
@pytest.mark.requires("astrapy")
@pytest.mark.skipif(not _has_env_vars(), reason="Missing Astra DB env. vars")
class TestAstraDBVectorStore:
def test_astradb_vectorstore_create_delete(
self, astradb_credentials: AstraDBCredentials
) -> None:
"""Create and delete."""
from astrapy.db import AstraDB as LibAstraDB
emb = SomeEmbeddings(dimension=2)
# creation by passing the connection secrets
v_store = AstraDBVectorStore(
embedding=emb,
collection_name=COLLECTION_NAME_DIM2,
**astradb_credentials,
)
v_store.add_texts("Sample 1")
if not SKIP_COLLECTION_DELETE:
v_store.delete_collection()
else:
v_store.clear()
# Creation by passing a ready-made astrapy client:
astra_db_client = LibAstraDB(
**astradb_credentials,
)
v_store_2 = AstraDBVectorStore(
embedding=emb,
collection_name=COLLECTION_NAME_DIM2,
astra_db_client=astra_db_client,
)
v_store_2.add_texts(["Sample 2"])
if not SKIP_COLLECTION_DELETE:
v_store_2.delete_collection()
else:
v_store_2.clear()
async def test_astradb_vectorstore_create_delete_async(
self, astradb_credentials: AstraDBCredentials
) -> None:
"""Create and delete."""
emb = SomeEmbeddings(dimension=2)
# creation by passing the connection secrets
v_store = AstraDBVectorStore(
embedding=emb,
collection_name=COLLECTION_NAME_DIM2,
**astradb_credentials,
)
await v_store.adelete_collection()
# Creation by passing a ready-made astrapy client:
from astrapy.db import AsyncAstraDB
astra_db_client = AsyncAstraDB(
**astradb_credentials,
)
v_store_2 = AstraDBVectorStore(
embedding=emb,
collection_name="lc_test_2_async",
async_astra_db_client=astra_db_client,
)
if not SKIP_COLLECTION_DELETE:
await v_store_2.adelete_collection()
else:
await v_store_2.aclear()
@pytest.mark.skipif(
SKIP_COLLECTION_DELETE,
reason="Collection-deletion tests are suppressed",
)
def test_astradb_vectorstore_pre_delete_collection(
self, astradb_credentials: AstraDBCredentials
) -> None:
"""Use of the pre_delete_collection flag."""
emb = SomeEmbeddings(dimension=2)
v_store = AstraDBVectorStore(
embedding=emb,
collection_name=COLLECTION_NAME_DIM2,
**astradb_credentials,
)
v_store.clear()
try:
v_store.add_texts(
texts=["aa"],
metadatas=[
{"k": "a", "ord": 0},
],
ids=["a"],
)
res1 = v_store.similarity_search("aa", k=5)
assert len(res1) == 1
v_store = AstraDBVectorStore(
embedding=emb,
pre_delete_collection=True,
collection_name=COLLECTION_NAME_DIM2,
**astradb_credentials,
)
res1 = v_store.similarity_search("aa", k=5)
assert len(res1) == 0
finally:
v_store.delete_collection()
@pytest.mark.skipif(
SKIP_COLLECTION_DELETE,
reason="Collection-deletion tests are suppressed",
)
async def test_astradb_vectorstore_pre_delete_collection_async(
self, astradb_credentials: AstraDBCredentials
) -> None:
"""Use of the pre_delete_collection flag."""
emb = SomeEmbeddings(dimension=2)
# creation by passing the connection secrets
v_store = AstraDBVectorStore(
embedding=emb,
collection_name=COLLECTION_NAME_DIM2,
**astradb_credentials,
)
try:
await v_store.aadd_texts(
texts=["aa"],
metadatas=[
{"k": "a", "ord": 0},
],
ids=["a"],
)
res1 = await v_store.asimilarity_search("aa", k=5)
assert len(res1) == 1
v_store = AstraDBVectorStore(
embedding=emb,
pre_delete_collection=True,
collection_name=COLLECTION_NAME_DIM2,
**astradb_credentials,
)
res1 = await v_store.asimilarity_search("aa", k=5)
assert len(res1) == 0
finally:
await v_store.adelete_collection()
def test_astradb_vectorstore_from_x(
self, astradb_credentials: AstraDBCredentials
) -> None:
"""from_texts and from_documents methods."""
emb = SomeEmbeddings(dimension=2)
# prepare empty collection
AstraDBVectorStore(
embedding=emb,
collection_name=COLLECTION_NAME_DIM2,
**astradb_credentials,
).clear()
# from_texts
v_store = AstraDBVectorStore.from_texts(
texts=["Hi", "Ho"],
embedding=emb,
collection_name=COLLECTION_NAME_DIM2,
**astradb_credentials,
)
try:
assert v_store.similarity_search("Ho", k=1)[0].page_content == "Ho"
finally:
if not SKIP_COLLECTION_DELETE:
v_store.delete_collection()
else:
v_store.clear()
# from_documents
v_store_2 = AstraDBVectorStore.from_documents(
[
Document(page_content="Hee"),
Document(page_content="Hoi"),
],
embedding=emb,
collection_name=COLLECTION_NAME_DIM2,
**astradb_credentials,
)
try:
assert v_store_2.similarity_search("Hoi", k=1)[0].page_content == "Hoi"
finally:
if not SKIP_COLLECTION_DELETE:
v_store_2.delete_collection()
else:
v_store_2.clear()
async def test_astradb_vectorstore_from_x_async(
self, astradb_credentials: AstraDBCredentials
) -> None:
"""from_texts and from_documents methods."""
emb = SomeEmbeddings(dimension=2)
# prepare empty collection
await AstraDBVectorStore(
embedding=emb,
collection_name=COLLECTION_NAME_DIM2,
**astradb_credentials,
).aclear()
# from_texts
v_store = await AstraDBVectorStore.afrom_texts(
texts=["Hi", "Ho"],
embedding=emb,
collection_name=COLLECTION_NAME_DIM2,
**astradb_credentials,
)
try:
assert (await v_store.asimilarity_search("Ho", k=1))[0].page_content == "Ho"
finally:
if not SKIP_COLLECTION_DELETE:
await v_store.adelete_collection()
else:
await v_store.aclear()
# from_documents
v_store_2 = await AstraDBVectorStore.afrom_documents(
[
Document(page_content="Hee"),
Document(page_content="Hoi"),
],
embedding=emb,
collection_name=COLLECTION_NAME_DIM2,
**astradb_credentials,
)
try:
assert (await v_store_2.asimilarity_search("Hoi", k=1))[
0
].page_content == "Hoi"
finally:
if not SKIP_COLLECTION_DELETE:
await v_store_2.adelete_collection()
else:
await v_store_2.aclear()
def test_astradb_vectorstore_crud(self, store_someemb: AstraDBVectorStore) -> None:
"""Basic add/delete/update behaviour."""
res0 = store_someemb.similarity_search("Abc", k=2)
assert res0 == []
# write and check again
store_someemb.add_texts(
texts=["aa", "bb", "cc"],
metadatas=[
{"k": "a", "ord": 0},
{"k": "b", "ord": 1},
{"k": "c", "ord": 2},
],
ids=["a", "b", "c"],
)
res1 = store_someemb.similarity_search("Abc", k=5)
assert {doc.page_content for doc in res1} == {"aa", "bb", "cc"}
# partial overwrite and count total entries
store_someemb.add_texts(
texts=["cc", "dd"],
metadatas=[
{"k": "c_new", "ord": 102},
{"k": "d_new", "ord": 103},
],
ids=["c", "d"],
)
res2 = store_someemb.similarity_search("Abc", k=10)
assert len(res2) == 4
# pick one that was just updated and check its metadata
res3 = store_someemb.similarity_search_with_score_id(
query="cc", k=1, filter={"k": "c_new"}
)
print(str(res3))
doc3, score3, id3 = res3[0]
assert doc3.page_content == "cc"
assert doc3.metadata == {"k": "c_new", "ord": 102}
assert score3 > 0.999 # leaving some leeway for approximations...
assert id3 == "c"
# delete and count again
del1_res = store_someemb.delete(["b"])
assert del1_res is True
del2_res = store_someemb.delete(["a", "c", "Z!"])
assert del2_res is True # a non-existing ID was supplied
assert len(store_someemb.similarity_search("xy", k=10)) == 1
# clear store
store_someemb.clear()
assert store_someemb.similarity_search("Abc", k=2) == []
# add_documents with "ids" arg passthrough
store_someemb.add_documents(
[
Document(page_content="vv", metadata={"k": "v", "ord": 204}),
Document(page_content="ww", metadata={"k": "w", "ord": 205}),
],
ids=["v", "w"],
)
assert len(store_someemb.similarity_search("xy", k=10)) == 2
res4 = store_someemb.similarity_search("ww", k=1, filter={"k": "w"})
assert res4[0].metadata["ord"] == 205
async def test_astradb_vectorstore_crud_async(
self, store_someemb: AstraDBVectorStore
) -> None:
"""Basic add/delete/update behaviour."""
res0 = await store_someemb.asimilarity_search("Abc", k=2)
assert res0 == []
# write and check again
await store_someemb.aadd_texts(
texts=["aa", "bb", "cc"],
metadatas=[
{"k": "a", "ord": 0},
{"k": "b", "ord": 1},
{"k": "c", "ord": 2},
],
ids=["a", "b", "c"],
)
res1 = await store_someemb.asimilarity_search("Abc", k=5)
assert {doc.page_content for doc in res1} == {"aa", "bb", "cc"}
# partial overwrite and count total entries
await store_someemb.aadd_texts(
texts=["cc", "dd"],
metadatas=[
{"k": "c_new", "ord": 102},
{"k": "d_new", "ord": 103},
],
ids=["c", "d"],
)
res2 = await store_someemb.asimilarity_search("Abc", k=10)
assert len(res2) == 4
# pick one that was just updated and check its metadata
res3 = await store_someemb.asimilarity_search_with_score_id(
query="cc", k=1, filter={"k": "c_new"}
)
print(str(res3))
doc3, score3, id3 = res3[0]
assert doc3.page_content == "cc"
assert doc3.metadata == {"k": "c_new", "ord": 102}
assert score3 > 0.999 # leaving some leeway for approximations...
assert id3 == "c"
# delete and count again
del1_res = await store_someemb.adelete(["b"])
assert del1_res is True
del2_res = await store_someemb.adelete(["a", "c", "Z!"])
assert del2_res is False # a non-existing ID was supplied
assert len(await store_someemb.asimilarity_search("xy", k=10)) == 1
# clear store
await store_someemb.aclear()
assert await store_someemb.asimilarity_search("Abc", k=2) == []
# add_documents with "ids" arg passthrough
await store_someemb.aadd_documents(
[
Document(page_content="vv", metadata={"k": "v", "ord": 204}),
Document(page_content="ww", metadata={"k": "w", "ord": 205}),
],
ids=["v", "w"],
)
assert len(await store_someemb.asimilarity_search("xy", k=10)) == 2
res4 = await store_someemb.asimilarity_search("ww", k=1, filter={"k": "w"})
assert res4[0].metadata["ord"] == 205
def test_astradb_vectorstore_mmr(self, store_parseremb: AstraDBVectorStore) -> None:
"""
MMR testing. We work on the unit circle with angle multiples
of 2*pi/20 and prepare a store with known vectors for a controlled
MMR outcome.
"""
def _v_from_i(i: int, N: int) -> str:
angle = 2 * math.pi * i / N
vector = [math.cos(angle), math.sin(angle)]
return json.dumps(vector)
i_vals = [0, 4, 5, 13]
N_val = 20
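        # Query is at angle 3 (in units of 2*pi/20); fetch_k=3 retrieves angles
        # {0, 4, 5}. MMR with k=2 first picks 4 (most similar), then 0, since
        # angle 5 is too redundant with 4 -- hence the expected result {0, 4}.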
store_parseremb.add_texts(
[_v_from_i(i, N_val) for i in i_vals], metadatas=[{"i": i} for i in i_vals]
)
res1 = store_parseremb.max_marginal_relevance_search(
_v_from_i(3, N_val),
k=2,
fetch_k=3,
)
res_i_vals = {doc.metadata["i"] for doc in res1}
assert res_i_vals == {0, 4}
async def test_astradb_vectorstore_mmr_async(
self, store_parseremb: AstraDBVectorStore
) -> None:
"""
MMR testing. We work on the unit circle with angle multiples
of 2*pi/20 and prepare a store with known vectors for a controlled
MMR outcome.
"""
def _v_from_i(i: int, N: int) -> str:
angle = 2 * math.pi * i / N
vector = [math.cos(angle), math.sin(angle)]
return json.dumps(vector)
i_vals = [0, 4, 5, 13]
N_val = 20
await store_parseremb.aadd_texts(
[_v_from_i(i, N_val) for i in i_vals],
metadatas=[{"i": i} for i in i_vals],
)
res1 = await store_parseremb.amax_marginal_relevance_search(
_v_from_i(3, N_val),
k=2,
fetch_k=3,
)
res_i_vals = {doc.metadata["i"] for doc in res1}
assert res_i_vals == {0, 4}
def test_astradb_vectorstore_metadata(
self, store_someemb: AstraDBVectorStore
) -> None:
"""Metadata filtering."""
store_someemb.add_documents(
[
Document(
page_content="q",
metadata={"ord": ord("q"), "group": "consonant"},
),
Document(
page_content="w",
metadata={"ord": ord("w"), "group": "consonant"},
),
Document(
page_content="r",
metadata={"ord": ord("r"), "group": "consonant"},
),
Document(
page_content="e",
metadata={"ord": ord("e"), "group": "vowel"},
),
Document(
page_content="i",
metadata={"ord": ord("i"), "group": "vowel"},
),
Document(
page_content="o",
metadata={"ord": ord("o"), "group": "vowel"},
),
]
)
# no filters
res0 = store_someemb.similarity_search("x", k=10)
assert {doc.page_content for doc in res0} == set("qwreio")
# single filter
res1 = store_someemb.similarity_search(
"x",
k=10,
filter={"group": "vowel"},
)
assert {doc.page_content for doc in res1} == set("eio")
# multiple filters
res2 = store_someemb.similarity_search(
"x",
k=10,
filter={"group": "consonant", "ord": ord("q")},
)
assert {doc.page_content for doc in res2} == set("q")
# excessive filters
res3 = store_someemb.similarity_search(
"x",
k=10,
filter={"group": "consonant", "ord": ord("q"), "case": "upper"},
)
assert res3 == []
# filter with logical operator
res4 = store_someemb.similarity_search(
"x",
k=10,
filter={"$or": [{"ord": ord("q")}, {"ord": ord("r")}]},
)
assert {doc.page_content for doc in res4} == {"q", "r"}
def test_astradb_vectorstore_similarity_scale(
self, store_parseremb: AstraDBVectorStore
) -> None:
"""Scale of the similarity scores."""
store_parseremb.add_texts(
texts=[
json.dumps([1, 1]),
json.dumps([-1, -1]),
],
ids=["near", "far"],
)
res1 = store_parseremb.similarity_search_with_score(
json.dumps([0.5, 0.5]),
k=2,
)
scores = [sco for _, sco in res1]
sco_near, sco_far = scores
assert abs(1 - sco_near) < MATCH_EPSILON and abs(sco_far) < MATCH_EPSILON
async def test_astradb_vectorstore_similarity_scale_async(
self, store_parseremb: AstraDBVectorStore
) -> None:
"""Scale of the similarity scores."""
await store_parseremb.aadd_texts(
texts=[
json.dumps([1, 1]),
json.dumps([-1, -1]),
],
ids=["near", "far"],
)
res1 = await store_parseremb.asimilarity_search_with_score(
json.dumps([0.5, 0.5]),
k=2,
)
scores = [sco for _, sco in res1]
sco_near, sco_far = scores
assert abs(1 - sco_near) < MATCH_EPSILON and abs(sco_far) < MATCH_EPSILON
def test_astradb_vectorstore_massive_delete(
self, store_someemb: AstraDBVectorStore
) -> None:
"""Larger-scale bulk deletes."""
M = 50
texts = [str(i + 1 / 7.0) for i in range(2 * M)]
ids0 = ["doc_%i" % i for i in range(M)]
ids1 = ["doc_%i" % (i + M) for i in range(M)]
ids = ids0 + ids1
store_someemb.add_texts(texts=texts, ids=ids)
# deleting a bunch of these
del_res0 = store_someemb.delete(ids0)
assert del_res0 is True
# deleting the rest plus a fake one
del_res1 = store_someemb.delete(ids1 + ["ghost!"])
assert del_res1 is True # ensure no error
# nothing left
assert store_someemb.similarity_search("x", k=2 * M) == []
@pytest.mark.skipif(
SKIP_COLLECTION_DELETE,
reason="Collection-deletion tests are suppressed",
)
def test_astradb_vectorstore_delete_collection(
self, astradb_credentials: AstraDBCredentials
) -> None:
"""behaviour of 'delete_collection'."""
collection_name = COLLECTION_NAME_DIM2
emb = SomeEmbeddings(dimension=2)
v_store = AstraDBVectorStore(
embedding=emb,
collection_name=collection_name,
**astradb_credentials,
)
v_store.add_texts(["huh"])
assert len(v_store.similarity_search("hah", k=10)) == 1
# another instance pointing to the same collection on DB
v_store_kenny = AstraDBVectorStore(
embedding=emb,
collection_name=collection_name,
**astradb_credentials,
)
v_store_kenny.delete_collection()
# dropped on DB, but 'v_store' should have no clue:
with pytest.raises(ValueError):
_ = v_store.similarity_search("hah", k=10)
def test_astradb_vectorstore_custom_params(
self, astradb_credentials: AstraDBCredentials
) -> None:
"""Custom batch size and concurrency params."""
emb = SomeEmbeddings(dimension=2)
# prepare empty collection
AstraDBVectorStore(
embedding=emb,
collection_name=COLLECTION_NAME_DIM2,
**astradb_credentials,
).clear()
v_store = AstraDBVectorStore(
embedding=emb,
collection_name=COLLECTION_NAME_DIM2,
**astradb_credentials,
batch_size=17,
bulk_insert_batch_concurrency=13,
bulk_insert_overwrite_concurrency=7,
bulk_delete_concurrency=19,
)
try:
# add_texts
N = 50
texts = [str(i + 1 / 7.0) for i in range(N)]
ids = ["doc_%i" % i for i in range(N)]
v_store.add_texts(texts=texts, ids=ids)
v_store.add_texts(
texts=texts,
ids=ids,
batch_size=19,
batch_concurrency=7,
overwrite_concurrency=13,
)
#
_ = v_store.delete(ids[: N // 2])
_ = v_store.delete(ids[N // 2 :], concurrency=23)
#
finally:
if not SKIP_COLLECTION_DELETE:
v_store.delete_collection()
else:
v_store.clear()
async def test_astradb_vectorstore_custom_params_async(
self, astradb_credentials: AstraDBCredentials
) -> None:
"""Custom batch size and concurrency params."""
emb = SomeEmbeddings(dimension=2)
v_store = AstraDBVectorStore(
embedding=emb,
collection_name="lc_test_c_async",
batch_size=17,
bulk_insert_batch_concurrency=13,
bulk_insert_overwrite_concurrency=7,
bulk_delete_concurrency=19,
**astradb_credentials,
)
try:
# add_texts
N = 50
texts = [str(i + 1 / 7.0) for i in range(N)]
ids = ["doc_%i" % i for i in range(N)]
await v_store.aadd_texts(texts=texts, ids=ids)
await v_store.aadd_texts(
texts=texts,
ids=ids,
batch_size=19,
batch_concurrency=7,
overwrite_concurrency=13,
)
#
await v_store.adelete(ids[: N // 2])
await v_store.adelete(ids[N // 2 :], concurrency=23)
#
finally:
if not SKIP_COLLECTION_DELETE:
await v_store.adelete_collection()
else:
await v_store.aclear()
def test_astradb_vectorstore_metrics(
self, astradb_credentials: AstraDBCredentials
) -> None:
"""
Different choices of similarity metric.
Both stores (with "cosine" and "euclidea" metrics) contain these two:
- a vector slightly rotated w.r.t query vector
- a vector which is a long multiple of query vector
so, which one is "the closest one" depends on the metric.
"""
emb = ParserEmbeddings(dimension=2)
isq2 = 0.5**0.5
isa = 0.7
isb = (1.0 - isa * isa) ** 0.5
texts = [
json.dumps([isa, isb]),
json.dumps([10 * isq2, 10 * isq2]),
]
ids = [
"rotated",
"scaled",
]
query_text = json.dumps([isq2, isq2])
# prepare empty collections
AstraDBVectorStore(
embedding=emb,
collection_name=COLLECTION_NAME_DIM2,
**astradb_credentials,
).clear()
AstraDBVectorStore(
embedding=emb,
collection_name=COLLECTION_NAME_DIM2_EUCLIDEAN,
metric="euclidean",
**astradb_credentials,
).clear()
# creation, population, query - cosine
vstore_cos = AstraDBVectorStore(
embedding=emb,
collection_name=COLLECTION_NAME_DIM2,
metric="cosine",
**astradb_credentials,
)
try:
vstore_cos.add_texts(
texts=texts,
ids=ids,
)
_, _, id_from_cos = vstore_cos.similarity_search_with_score_id(
query_text,
k=1,
)[0]
assert id_from_cos == "scaled"
finally:
if not SKIP_COLLECTION_DELETE:
vstore_cos.delete_collection()
else:
vstore_cos.clear()
# creation, population, query - euclidean
vstore_euc = AstraDBVectorStore(
embedding=emb,
collection_name=COLLECTION_NAME_DIM2_EUCLIDEAN,
metric="euclidean",
**astradb_credentials,
)
try:
vstore_euc.add_texts(
texts=texts,
ids=ids,
)
_, _, id_from_euc = vstore_euc.similarity_search_with_score_id(
query_text,
k=1,
)[0]
assert id_from_euc == "rotated"
finally:
if not SKIP_COLLECTION_DELETE:
vstore_euc.delete_collection()
else:
vstore_euc.clear()

View File

@ -0,0 +1,9 @@
from langchain_astradb import __all__
EXPECTED_ALL = [
"AstraDBVectorStore",
]
def test_all_imports() -> None:
assert sorted(EXPECTED_ALL) == sorted(__all__)

View File

@ -0,0 +1,45 @@
from typing import List
from unittest.mock import Mock
from langchain_core.embeddings import Embeddings
from langchain_astradb.vectorstores import AstraDBVectorStore
class SomeEmbeddings(Embeddings):
"""
Turn a sentence into an embedding vector in some way.
How is not important; that it is deterministic is all that counts.
"""
def __init__(self, dimension: int) -> None:
self.dimension = dimension
def embed_documents(self, texts: List[str]) -> List[List[float]]:
return [self.embed_query(txt) for txt in texts]
async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
return self.embed_documents(texts)
def embed_query(self, text: str) -> List[float]:
unnormed0 = [ord(c) for c in text[: self.dimension]]
unnormed = (unnormed0 + [1] + [0] * (self.dimension - 1 - len(unnormed0)))[
: self.dimension
]
norm = sum(x * x for x in unnormed) ** 0.5
normed = [x / norm for x in unnormed]
return normed
async def aembed_query(self, text: str) -> List[float]:
return self.embed_query(text)
def test_initialization() -> None:
"""Test integration vectorstore initialization."""
mock_astra_db = Mock()
embedding = SomeEmbeddings(dimension=2)
AstraDBVectorStore(
embedding=embedding,
collection_name="mock_coll_name",
astra_db_client=mock_astra_db,
)