docs: Astra DB, replace leftover links to "community" legacy package + modernize doc loader signature (#30969)

This PR brings much-needed updates to several of the shorter Astra DB
example notebooks:

- ensuring imports are from the partner package instead of the
(deprecated) community legacy package
- improving the wording in a few related places
- updating the loader example to the constructor signature introduced
with the latest partner package's AstraDBLoader
- marking the community package counterpart of the LLM caches as
deprecated in the summary table at the end of the page.
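
As a rough illustration of the loader signature change (the diff replaces `find_options={"limit": 10}` with `limit=10`), here is a minimal sketch of how existing keyword arguments could be translated. The helper name `migrate_loader_kwargs` is hypothetical and not part of `langchain-astradb`; it handles only the `limit` mapping shown in this commit and rejects anything else rather than guessing.

```python
from typing import Any


def migrate_loader_kwargs(legacy: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: translate legacy-style loader kwargs to the new style.

    Only the `limit` entry of `find_options` is handled, because that is the
    only mapping visible in this commit's diff.
    """
    # Copy everything except the legacy `find_options` dictionary.
    migrated = {k: v for k, v in legacy.items() if k != "find_options"}
    find_options = dict(legacy.get("find_options") or {})
    if "limit" in find_options:
        # `find_options={"limit": n}` becomes the top-level `limit=n`.
        migrated["limit"] = find_options.pop("limit")
    if find_options:
        # Assumption: no mapping is known for other options, so fail loudly.
        raise ValueError(
            f"No known mapping for find_options keys: {sorted(find_options)}"
        )
    return migrated


legacy_kwargs = {
    "collection_name": "movie_reviews",
    "projection": {"title": 1, "reviewtext": 1},
    "find_options": {"limit": 10},
}
new_kwargs = migrate_loader_kwargs(legacy_kwargs)
print(new_kwargs)
```

The resulting dictionary could then be unpacked into the partner package's `AstraDBLoader(...)` call together with `api_endpoint` and `token`, as in the updated notebook cell.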
Stefano Lottini 2025-04-25 21:45:24 +02:00 committed by GitHub
parent a60fd06784
commit a82d987f09
4 changed files with 233 additions and 92 deletions

@ -34,33 +34,46 @@
"id": "juAmbgoWD17u" "id": "juAmbgoWD17u"
}, },
"source": [ "source": [
"The AstraDB Document Loader returns a list of Langchain Documents from an AstraDB database.\n", "The Astra DB Document Loader returns a list of Langchain `Document` objects read from an Astra DB collection.\n",
"\n", "\n",
"The Loader takes the following parameters:\n", "The loader takes the following parameters:\n",
"\n", "\n",
"* `api_endpoint`: AstraDB API endpoint. Looks like `https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com`\n", "* `api_endpoint`: Astra DB API endpoint. Looks like `https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com`\n",
"* `token`: AstraDB token. Looks like `AstraCS:6gBhNmsk135....`\n", "* `token`: Astra DB token. Looks like `AstraCS:aBcD0123...`\n",
"* `collection_name` : AstraDB collection name\n", "* `collection_name` : AstraDB collection name\n",
"* `namespace`: (Optional) AstraDB namespace\n", "* `namespace`: (Optional) AstraDB namespace (called _keyspace_ in Astra DB)\n",
"* `filter_criteria`: (Optional) Filter used in the find query\n", "* `filter_criteria`: (Optional) Filter used in the find query\n",
"* `projection`: (Optional) Projection used in the find query\n", "* `projection`: (Optional) Projection used in the find query\n",
"* `find_options`: (Optional) Options used in the find query\n", "* `limit`: (Optional) Maximum number of documents to retrieve\n",
"* `nb_prefetched`: (Optional) Number of documents pre-fetched by the loader\n",
"* `extraction_function`: (Optional) A function to convert the AstraDB document to the LangChain `page_content` string. Defaults to `json.dumps`\n", "* `extraction_function`: (Optional) A function to convert the AstraDB document to the LangChain `page_content` string. Defaults to `json.dumps`\n",
"\n", "\n",
"The following metadata is set to the LangChain Documents metadata output:\n", "The loader sets the following metadata for the documents it reads:\n",
"\n", "\n",
"```python\n", "```python\n",
"{\n", "metadata={\n",
" metadata : {\n",
" \"namespace\": \"...\", \n", " \"namespace\": \"...\", \n",
" \"api_endpoint\": \"...\", \n", " \"api_endpoint\": \"...\", \n",
" \"collection\": \"...\"\n", " \"collection\": \"...\"\n",
" }\n",
"}\n", "}\n",
"```" "```"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"!pip install \"langchain-astradb>=0.6,<0.7\""
]
},
{ {
"attachments": {}, "attachments": {},
"cell_type": "markdown", "cell_type": "markdown",
@ -71,24 +84,43 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": 2,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from langchain_community.document_loaders import AstraDBLoader" "from langchain_astradb import AstraDBLoader"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[**API Reference:** `AstraDBLoader`](https://python.langchain.com/api_reference/astradb/document_loaders/langchain_astradb.document_loaders.AstraDBLoader.html#langchain_astradb.document_loaders.AstraDBLoader)"
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 4, "execution_count": 3,
"metadata": { "metadata": {
"ExecuteTime": { "ExecuteTime": {
"end_time": "2024-01-08T12:41:22.643335Z", "end_time": "2024-01-08T12:41:22.643335Z",
"start_time": "2024-01-08T12:40:57.759116Z" "start_time": "2024-01-08T12:40:57.759116Z"
}, },
"collapsed": false "collapsed": false,
"jupyter": {
"outputs_hidden": false
}
}, },
"outputs": [], "outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
"ASTRA_DB_API_ENDPOINT = https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com\n",
"ASTRA_DB_APPLICATION_TOKEN = ········\n"
]
}
],
"source": [ "source": [
"from getpass import getpass\n", "from getpass import getpass\n",
"\n", "\n",
@ -98,7 +130,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 6, "execution_count": 4,
"metadata": { "metadata": {
"ExecuteTime": { "ExecuteTime": {
"end_time": "2024-01-08T12:42:25.395162Z", "end_time": "2024-01-08T12:42:25.395162Z",
@ -112,19 +144,22 @@
" token=ASTRA_DB_APPLICATION_TOKEN,\n", " token=ASTRA_DB_APPLICATION_TOKEN,\n",
" collection_name=\"movie_reviews\",\n", " collection_name=\"movie_reviews\",\n",
" projection={\"title\": 1, \"reviewtext\": 1},\n", " projection={\"title\": 1, \"reviewtext\": 1},\n",
" find_options={\"limit\": 10},\n", " limit=10,\n",
")" ")"
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 7, "execution_count": 5,
"metadata": { "metadata": {
"ExecuteTime": { "ExecuteTime": {
"end_time": "2024-01-08T12:42:30.236489Z", "end_time": "2024-01-08T12:42:30.236489Z",
"start_time": "2024-01-08T12:42:29.612133Z" "start_time": "2024-01-08T12:42:29.612133Z"
}, },
"collapsed": false "collapsed": false,
"jupyter": {
"outputs_hidden": false
}
}, },
"outputs": [], "outputs": [],
"source": [ "source": [
@ -133,7 +168,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 8, "execution_count": 6,
"metadata": { "metadata": {
"ExecuteTime": { "ExecuteTime": {
"end_time": "2024-01-08T12:42:31.369394Z", "end_time": "2024-01-08T12:42:31.369394Z",
@ -144,10 +179,10 @@
{ {
"data": { "data": {
"text/plain": [ "text/plain": [
"Document(page_content='{\"_id\": \"659bdffa16cbc4586b11a423\", \"title\": \"Dangerous Men\", \"reviewtext\": \"\\\\\"Dangerous Men,\\\\\" the picture\\'s production notes inform, took 26 years to reach the big screen. After having seen it, I wonder: What was the rush?\"}', metadata={'namespace': 'default_keyspace', 'api_endpoint': 'https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com', 'collection': 'movie_reviews'})" "Document(metadata={'namespace': 'default_keyspace', 'api_endpoint': 'https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com', 'collection': 'movie_reviews'}, page_content='{\"_id\": \"659bdffa16cbc4586b11a423\", \"title\": \"Dangerous Men\", \"reviewtext\": \"\\\\\"Dangerous Men,\\\\\" the picture\\'s production notes inform, took 26 years to reach the big screen. After having seen it, I wonder: What was the rush?\"}')"
] ]
}, },
"execution_count": 8, "execution_count": 7,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
@ -179,7 +214,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.9.18" "version": "3.12.8"
} }
}, },
"nbformat": 4, "nbformat": 4,

@ -3112,8 +3112,8 @@
"|------------|---------|\n", "|------------|---------|\n",
"| langchain_astradb.cache | [AstraDBCache](https://python.langchain.com/api_reference/astradb/cache/langchain_astradb.cache.AstraDBCache.html) |\n", "| langchain_astradb.cache | [AstraDBCache](https://python.langchain.com/api_reference/astradb/cache/langchain_astradb.cache.AstraDBCache.html) |\n",
"| langchain_astradb.cache | [AstraDBSemanticCache](https://python.langchain.com/api_reference/astradb/cache/langchain_astradb.cache.AstraDBSemanticCache.html) |\n", "| langchain_astradb.cache | [AstraDBSemanticCache](https://python.langchain.com/api_reference/astradb/cache/langchain_astradb.cache.AstraDBSemanticCache.html) |\n",
"| langchain_community.cache | [AstraDBCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.AstraDBCache.html) |\n", "| langchain_community.cache | [AstraDBCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.AstraDBCache.html) (deprecated since `langchain-community==0.0.28`) |\n",
"| langchain_community.cache | [AstraDBSemanticCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.AstraDBSemanticCache.html) |\n", "| langchain_community.cache | [AstraDBSemanticCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.AstraDBSemanticCache.html) (deprecated since `langchain-community==0.0.28`) |\n",
"| langchain_community.cache | [AzureCosmosDBSemanticCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.AzureCosmosDBSemanticCache.html) |\n", "| langchain_community.cache | [AzureCosmosDBSemanticCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.AzureCosmosDBSemanticCache.html) |\n",
"| langchain_community.cache | [CassandraCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.CassandraCache.html) |\n", "| langchain_community.cache | [CassandraCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.CassandraCache.html) |\n",
"| langchain_community.cache | [CassandraSemanticCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.CassandraSemanticCache.html) |\n", "| langchain_community.cache | [CassandraSemanticCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.CassandraSemanticCache.html) |\n",

@ -17,22 +17,22 @@
"id": "f507f58b-bf22-4a48-8daf-68d869bcd1ba", "id": "f507f58b-bf22-4a48-8daf-68d869bcd1ba",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Setting up\n", "## Setup\n",
"\n", "\n",
"To run this notebook you need a running Astra DB. Get the connection secrets on your Astra dashboard:\n", "To run this notebook you need a running Astra DB. Get the connection secrets on your Astra dashboard:\n",
"\n", "\n",
"- the API Endpoint looks like `https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com`;\n", "- the API Endpoint looks like `https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com`;\n",
"- the Token looks like `AstraCS:6gBhNmsk135...`." "- the Database Token looks like `AstraCS:aBcD0123...`."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": 1,
"id": "d7092199", "id": "d7092199",
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"%pip install --upgrade --quiet \"astrapy>=0.7.1 langchain-community\" " "!pip install \"langchain-astradb>=0.6,<0.7\""
] ]
}, },
{ {
@ -45,12 +45,12 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 1, "execution_count": 2,
"id": "163d97f0", "id": "163d97f0",
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
{ {
"name": "stdout", "name": "stdin",
"output_type": "stream", "output_type": "stream",
"text": [ "text": [
"ASTRA_DB_API_ENDPOINT = https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com\n", "ASTRA_DB_API_ENDPOINT = https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com\n",
@ -65,14 +65,6 @@
"ASTRA_DB_APPLICATION_TOKEN = getpass.getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")" "ASTRA_DB_APPLICATION_TOKEN = getpass.getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")"
] ]
}, },
{
"cell_type": "markdown",
"id": "55860b2d",
"metadata": {},
"source": [
"Depending on whether local or cloud-based Astra DB, create the corresponding database connection \"Session\" object."
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "36c163e8", "id": "36c163e8",
@ -83,12 +75,12 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 2, "execution_count": 3,
"id": "d15e3302", "id": "d15e3302",
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from langchain_community.chat_message_histories import AstraDBChatMessageHistory\n", "from langchain_astradb import AstraDBChatMessageHistory\n",
"\n", "\n",
"message_history = AstraDBChatMessageHistory(\n", "message_history = AstraDBChatMessageHistory(\n",
" session_id=\"test-session\",\n", " session_id=\"test-session\",\n",
@ -98,22 +90,31 @@
"\n", "\n",
"message_history.add_user_message(\"hi!\")\n", "message_history.add_user_message(\"hi!\")\n",
"\n", "\n",
"message_history.add_ai_message(\"whats up?\")" "message_history.add_ai_message(\"hello, how are you?\")"
]
},
{
"cell_type": "markdown",
"id": "53acb4a8-d536-4a58-9fee-7d70033d9c81",
"metadata": {},
"source": [
"[**API Reference:** `AstraDBChatMessageHistory`](https://python.langchain.com/api_reference/astradb/chat_message_histories/langchain_astradb.chat_message_histories.AstraDBChatMessageHistory.html#langchain_astradb.chat_message_histories.AstraDBChatMessageHistory)"
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 3, "execution_count": 4,
"id": "64fc465e", "id": "64fc465e",
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
{ {
"data": { "data": {
"text/plain": [ "text/plain": [
"[HumanMessage(content='hi!'), AIMessage(content='whats up?')]" "[HumanMessage(content='hi!', additional_kwargs={}, response_metadata={}),\n",
" AIMessage(content='hello, how are you?', additional_kwargs={}, response_metadata={})]"
] ]
}, },
"execution_count": 3, "execution_count": 4,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
@ -139,7 +140,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.10.12" "version": "3.12.8"
} }
}, },
"nbformat": 4, "nbformat": 4,

@ -4,7 +4,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Astra DB (Cassandra)\n", "# Astra DB\n",
"\n", "\n",
">[DataStax Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on `Cassandra` and made conveniently available through an easy-to-use JSON API.\n", ">[DataStax Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on `Cassandra` and made conveniently available through an easy-to-use JSON API.\n",
"\n", "\n",
@ -16,32 +16,46 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Creating an Astra DB vector store\n", "## Creating an Astra DB vector store\n",
"First we'll want to create an Astra DB VectorStore and seed it with some data. We've created a small demo set of documents that contain summaries of movies.\n", "First, create an Astra DB vector store and seed it with some data.\n",
"\n", "\n",
"NOTE: The self-query retriever requires you to have `lark` installed (`pip install lark`). We also need the `astrapy` package." "We've created a small demo set of documents containing movie summaries.\n",
"\n",
"NOTE: The self-query retriever requires the `lark` package installed (`pip install lark`)."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 5, "execution_count": null,
"metadata": {}, "metadata": {
"scrolled": true
},
"outputs": [], "outputs": [],
"source": [ "source": [
"%pip install --upgrade --quiet lark astrapy langchain-openai" "!pip install \"langchain-astradb>=0.6,<0.7\" \\\n",
" \"langchain_openai>=0.3,<0.4\" \\\n",
" \"lark>=1.2,<2.0\""
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key." "In this example, you'll use the `OpenAIEmbeddings`. Please enter an OpenAI API Key."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": 1,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
"OpenAI API Key: ········\n"
]
}
],
"source": [ "source": [
"import os\n", "import os\n",
"from getpass import getpass\n", "from getpass import getpass\n",
@ -69,14 +83,23 @@
"Create the Astra DB VectorStore:\n", "Create the Astra DB VectorStore:\n",
"\n", "\n",
"- the API Endpoint looks like `https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com`\n", "- the API Endpoint looks like `https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com`\n",
"- the Token looks like `AstraCS:6gBhNmsk135....`" "- the Token looks like `AstraCS:aBcD0123...`"
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": 2,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
"ASTRA_DB_API_ENDPOINT = https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com\n",
"ASTRA_DB_APPLICATION_TOKEN = ········\n"
]
}
],
"source": [ "source": [
"ASTRA_DB_API_ENDPOINT = input(\"ASTRA_DB_API_ENDPOINT = \")\n", "ASTRA_DB_API_ENDPOINT = input(\"ASTRA_DB_API_ENDPOINT = \")\n",
"ASTRA_DB_APPLICATION_TOKEN = getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")" "ASTRA_DB_APPLICATION_TOKEN = getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")"
@ -84,11 +107,11 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": 3,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from langchain_community.vectorstores import AstraDB\n", "from langchain_astradb import AstraDBVectorStore\n",
"from langchain_core.documents import Document\n", "from langchain_core.documents import Document\n",
"\n", "\n",
"docs = [\n", "docs = [\n",
@ -101,11 +124,13 @@
" metadata={\"year\": 2010, \"director\": \"Christopher Nolan\", \"rating\": 8.2},\n", " metadata={\"year\": 2010, \"director\": \"Christopher Nolan\", \"rating\": 8.2},\n",
" ),\n", " ),\n",
" Document(\n", " Document(\n",
" page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\",\n", " page_content=\"A psychologist / detective gets lost in a series of dreams within dreams \"\n",
" \"within dreams and Inception reused the idea\",\n",
" metadata={\"year\": 2006, \"director\": \"Satoshi Kon\", \"rating\": 8.6},\n", " metadata={\"year\": 2006, \"director\": \"Satoshi Kon\", \"rating\": 8.6},\n",
" ),\n", " ),\n",
" Document(\n", " Document(\n",
" page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\",\n", " page_content=\"A bunch of normal-sized women are supremely wholesome and some men \"\n",
" \"pine after them\",\n",
" metadata={\"year\": 2019, \"director\": \"Greta Gerwig\", \"rating\": 8.3},\n", " metadata={\"year\": 2019, \"director\": \"Greta Gerwig\", \"rating\": 8.3},\n",
" ),\n", " ),\n",
" Document(\n", " Document(\n",
@ -123,7 +148,7 @@
" ),\n", " ),\n",
"]\n", "]\n",
"\n", "\n",
"vectorstore = AstraDB.from_documents(\n", "vectorstore = AstraDBVectorStore.from_documents(\n",
" docs,\n", " docs,\n",
" embeddings,\n", " embeddings,\n",
" collection_name=\"astra_self_query_demo\",\n", " collection_name=\"astra_self_query_demo\",\n",
@ -136,13 +161,16 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Creating our self-querying retriever\n", "## Creating a self-querying retriever\n",
"Now we can instantiate our retriever. To do this we'll need to provide some information upfront about the metadata fields that our documents support and a short description of the document contents." "\n",
"Now you can instantiate the retriever.\n",
"\n",
"To do this, you need to provide some information upfront about the metadata fields that the documents support, along with a short description of the documents' contents."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": 4,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
@ -174,7 +202,11 @@
"llm = OpenAI(temperature=0)\n", "llm = OpenAI(temperature=0)\n",
"\n", "\n",
"retriever = SelfQueryRetriever.from_llm(\n", "retriever = SelfQueryRetriever.from_llm(\n",
" llm, vectorstore, document_content_description, metadata_field_info, verbose=True\n", " llm,\n",
" vectorstore,\n",
" document_content_description,\n",
" metadata_field_info,\n",
" verbose=True,\n",
")" ")"
] ]
}, },
@ -183,14 +215,29 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Testing it out\n", "## Testing it out\n",
"And now we can try actually using our retriever!" "\n",
"Now you can try actually using our retriever:"
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": 5,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [
{
"data": {
"text/plain": [
"[Document(id='d7b9ec1edafa467caab524455e8c1f5d', metadata={'year': 1993, 'rating': 7.7, 'genre': 'science fiction'}, page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose'),\n",
" Document(id='8ad04ef2a73d4f74897a51e49be1a8d2', metadata={'year': 1995, 'genre': 'animated'}, page_content='Toys come alive and have a blast doing so'),\n",
" Document(id='5b07e600d3494506952b60e0a45a0546', metadata={'year': 1979, 'director': 'Andrei Tarkovsky', 'genre': 'science fiction', 'rating': 9.9}, page_content='Three men walk into the Zone, three men walk out of the Zone'),\n",
" Document(id='a0cef19e27c341929098ac4793602829', metadata={'year': 2006, 'director': 'Satoshi Kon', 'rating': 8.6}, page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea')]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [ "source": [
"# This example only specifies a relevant query\n", "# This example only specifies a relevant query\n",
"retriever.invoke(\"What are some movies about dinosaurs?\")" "retriever.invoke(\"What are some movies about dinosaurs?\")"
@ -198,9 +245,21 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": 6,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [
{
"data": {
"text/plain": [
"[Document(id='5b07e600d3494506952b60e0a45a0546', metadata={'year': 1979, 'director': 'Andrei Tarkovsky', 'genre': 'science fiction', 'rating': 9.9}, page_content='Three men walk into the Zone, three men walk out of the Zone'),\n",
" Document(id='a0cef19e27c341929098ac4793602829', metadata={'year': 2006, 'director': 'Satoshi Kon', 'rating': 8.6}, page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea')]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [ "source": [
"# This example specifies a filter\n", "# This example specifies a filter\n",
"retriever.invoke(\"I want to watch a movie rated higher than 8.5\")" "retriever.invoke(\"I want to watch a movie rated higher than 8.5\")"
@ -208,9 +267,20 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": 7,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [
{
"data": {
"text/plain": [
"[Document(id='0539843fd203484c9be486c2a0e2454c', metadata={'year': 2019, 'director': 'Greta Gerwig', 'rating': 8.3}, page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them')]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [ "source": [
"# This example only specifies a query and a filter\n", "# This example only specifies a query and a filter\n",
"retriever.invoke(\"Has Greta Gerwig directed any movies about women\")" "retriever.invoke(\"Has Greta Gerwig directed any movies about women\")"
@ -218,9 +288,21 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": 8,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [
{
"data": {
"text/plain": [
"[Document(id='a0cef19e27c341929098ac4793602829', metadata={'year': 2006, 'director': 'Satoshi Kon', 'rating': 8.6}, page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea'),\n",
" Document(id='5b07e600d3494506952b60e0a45a0546', metadata={'year': 1979, 'director': 'Andrei Tarkovsky', 'genre': 'science fiction', 'rating': 9.9}, page_content='Three men walk into the Zone, three men walk out of the Zone')]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [ "source": [
"# This example specifies a composite filter\n", "# This example specifies a composite filter\n",
"retriever.invoke(\"What's a highly rated (above 8.5), science fiction movie ?\")" "retriever.invoke(\"What's a highly rated (above 8.5), science fiction movie ?\")"
@ -228,9 +310,20 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": 9,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [
{
"data": {
"text/plain": [
"[Document(id='8ad04ef2a73d4f74897a51e49be1a8d2', metadata={'year': 1995, 'genre': 'animated'}, page_content='Toys come alive and have a blast doing so')]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [ "source": [
"# This example specifies a query and composite filter\n", "# This example specifies a query and composite filter\n",
"retriever.invoke(\n", "retriever.invoke(\n",
@ -242,20 +335,20 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Filter k\n", "## Set a limit ('k')\n",
"\n", "\n",
"We can also use the self query retriever to specify `k`: the number of documents to fetch.\n", "you can also use the self-query retriever to specify `k`, the number of documents to fetch.\n",
"\n", "\n",
"We can do this by passing `enable_limit=True` to the constructor." "You achieve this by passing `enable_limit=True` to the constructor."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": 10,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"retriever = SelfQueryRetriever.from_llm(\n", "retriever_k = SelfQueryRetriever.from_llm(\n",
" llm,\n", " llm,\n",
" vectorstore,\n", " vectorstore,\n",
" document_content_description,\n", " document_content_description,\n",
@ -267,12 +360,24 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": 11,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [
{
"data": {
"text/plain": [
"[Document(id='d7b9ec1edafa467caab524455e8c1f5d', metadata={'year': 1993, 'rating': 7.7, 'genre': 'science fiction'}, page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose'),\n",
" Document(id='8ad04ef2a73d4f74897a51e49be1a8d2', metadata={'year': 1995, 'genre': 'animated'}, page_content='Toys come alive and have a blast doing so')]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [ "source": [
"# This example only specifies a relevant query\n", "# This example only specifies a relevant query\n",
"retriever.invoke(\"What are two movies about dinosaurs?\")" "retriever_k.invoke(\"What are two movies about dinosaurs?\")"
] ]
}, },
{ {
@ -293,7 +398,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": 12,
"metadata": { "metadata": {
"collapsed": false, "collapsed": false,
"jupyter": { "jupyter": {
@ -322,7 +427,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.10.12" "version": "3.12.8"
} }
}, },
"nbformat": 4, "nbformat": 4,