mirror of
https://github.com/hwchase17/langchain.git
synced 2025-06-06 15:13:56 +00:00
**Description:** Fixing typos and rendering issues for Google Cloud database integrations. **Issue:** NA **Dependencies:** NA
643 lines
22 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Google Cloud SQL for MySQL\n",
    "\n",
    "> [Cloud SQL](https://cloud.google.com/sql) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. It offers [MySQL](https://cloud.google.com/sql/mysql), [PostgreSQL](https://cloud.google.com/sql/postgres), and [SQL Server](https://cloud.google.com/sql/sqlserver) database engines. Extend your database application to build AI-powered experiences leveraging Cloud SQL's LangChain integrations.\n",
    "\n",
    "This notebook goes over how to use [Cloud SQL for MySQL](https://cloud.google.com/sql/mysql) to [save, load and delete LangChain documents](https://python.langchain.com/docs/modules/data_connection/document_loaders/) with `MySQLLoader` and `MySQLDocumentSaver`.\n",
    "\n",
    "[Open In Colab](https://colab.research.google.com/github/googleapis/langchain-google-cloud-sql-mysql-python/blob/main/docs/document_loader.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Before You Begin\n",
    "\n",
    "To run this notebook, you will need to do the following:\n",
    "\n",
    "* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
    "* [Create a Cloud SQL for MySQL instance](https://cloud.google.com/sql/docs/mysql/create-instance)\n",
    "* [Create a Cloud SQL database](https://cloud.google.com/sql/docs/mysql/create-manage-databases)\n",
    "* [Add an IAM database user to the database](https://cloud.google.com/sql/docs/mysql/add-manage-iam-users#creating-a-database-user) (Optional)\n",
    "\n",
    "After confirming access to the database in the runtime environment of this notebook, fill in the following values and run the cell before running the example scripts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# @markdown Please fill in both the Google Cloud region and the name of your Cloud SQL instance.\n",
    "REGION = \"us-central1\"  # @param {type:\"string\"}\n",
    "INSTANCE = \"test-instance\"  # @param {type:\"string\"}\n",
    "\n",
    "# @markdown Please specify a database and a table for demo purposes.\n",
    "DATABASE = \"test\"  # @param {type:\"string\"}\n",
    "TABLE_NAME = \"test-default\"  # @param {type:\"string\"}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 🦜🔗 Library Installation\n",
    "\n",
    "The integration lives in its own `langchain-google-cloud-sql-mysql` package, so we need to install it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%pip install --upgrade --quiet langchain-google-cloud-sql-mysql"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# # Automatically restart kernel after installs so that your environment can access the new packages\n",
    "# import IPython\n",
    "\n",
    "# app = IPython.Application.instance()\n",
    "# app.kernel.do_shutdown(True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ☁ Set Your Google Cloud Project\n",
    "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
    "\n",
    "If you don't know your project ID, try the following:\n",
    "\n",
    "* Run `gcloud config list`.\n",
    "* Run `gcloud projects list`.\n",
    "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
    "\n",
    "PROJECT_ID = \"my-project-id\"  # @param {type:\"string\"}\n",
    "\n",
    "# Set the project id\n",
    "!gcloud config set project {PROJECT_ID}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 🔐 Authentication\n",
    "\n",
    "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
    "\n",
    "- If you are using Colab to run this notebook, use the cell below and continue.\n",
    "- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.colab import auth\n",
    "\n",
    "auth.authenticate_user()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### API Enablement\n",
    "The `langchain-google-cloud-sql-mysql` package requires that you [enable the Cloud SQL Admin API](https://console.cloud.google.com/flows/enableapi?apiid=sqladmin.googleapis.com) in your Google Cloud Project."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# enable Cloud SQL Admin API\n",
    "!gcloud services enable sqladmin.googleapis.com"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Basic Usage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### MySQLEngine Connection Pool\n",
    "\n",
    "Before saving or loading documents from a MySQL table, we first need to configure a connection pool to the Cloud SQL database. The `MySQLEngine` configures a connection pool to your Cloud SQL database, enabling successful connections from your application and following industry best practices.\n",
    "\n",
    "To create a `MySQLEngine` using `MySQLEngine.from_instance()` you need to provide only 4 things:\n",
    "\n",
    "1. `project_id` : Project ID of the Google Cloud Project where the Cloud SQL instance is located.\n",
    "2. `region` : Region where the Cloud SQL instance is located.\n",
    "3. `instance` : The name of the Cloud SQL instance.\n",
    "4. `database` : The name of the database to connect to on the Cloud SQL instance.\n",
    "\n",
    "By default, [IAM database authentication](https://cloud.google.com/sql/docs/mysql/iam-authentication#iam-db-auth) will be used as the method of database authentication. This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the environment.\n",
    "\n",
    "For more information on IAM database authentication please see:\n",
    "\n",
    "* [Configure an instance for IAM database authentication](https://cloud.google.com/sql/docs/mysql/create-edit-iam-instances)\n",
    "* [Manage users with IAM database authentication](https://cloud.google.com/sql/docs/mysql/add-manage-iam-users)\n",
    "\n",
    "Optionally, [built-in database authentication](https://cloud.google.com/sql/docs/mysql/built-in-authentication) using a username and password to access the Cloud SQL database can also be used. Just provide the optional `user` and `password` arguments to `MySQLEngine.from_instance()`:\n",
    "\n",
    "* `user` : Database user to use for built-in database authentication and login.\n",
    "* `password` : Database password to use for built-in database authentication and login."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_google_cloud_sql_mysql import MySQLEngine\n",
    "\n",
    "engine = MySQLEngine.from_instance(\n",
    "    project_id=PROJECT_ID, region=REGION, instance=INSTANCE, database=DATABASE\n",
    ")"
   ]
  },
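  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of the built-in authentication option described above, the cell below passes the optional `user` and `password` arguments to `MySQLEngine.from_instance()`. `DB_USER`, `DB_PASSWORD` and `builtin_engine` are hypothetical names introduced here for illustration only; replace the placeholder values with real credentials before connecting. The rest of this notebook keeps using the IAM-authenticated `engine`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical placeholders (assumptions, not part of the original notebook).\n",
    "DB_USER = \"my-db-user\"  # @param {type:\"string\"}\n",
    "DB_PASSWORD = \"my-db-password\"  # @param {type:\"string\"}\n",
    "\n",
    "# Built-in database authentication: pass the user and password explicitly.\n",
    "builtin_engine = MySQLEngine.from_instance(\n",
    "    project_id=PROJECT_ID,\n",
    "    region=REGION,\n",
    "    instance=INSTANCE,\n",
    "    database=DATABASE,\n",
    "    user=DB_USER,\n",
    "    password=DB_PASSWORD,\n",
    ")"
   ]
  },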
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Initialize a table\n",
    "\n",
    "Initialize a table with the default schema via `MySQLEngine.init_document_table(<table_name>)`. Table columns:\n",
    "\n",
    "- page_content (type: text)\n",
    "- langchain_metadata (type: JSON)\n",
    "\n",
    "The `overwrite_existing=True` flag means the newly initialized table will replace any existing table of the same name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "engine.init_document_table(TABLE_NAME, overwrite_existing=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Save documents\n",
    "\n",
    "Save LangChain documents with `MySQLDocumentSaver.add_documents(<documents>)`. To initialize the `MySQLDocumentSaver` class, you need to provide 2 things:\n",
    "\n",
    "1. `engine` - An instance of a `MySQLEngine` engine.\n",
    "2. `table_name` - The name of the table within the Cloud SQL database to store LangChain documents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain_core.documents import Document\n",
    "from langchain_google_cloud_sql_mysql import MySQLDocumentSaver\n",
    "\n",
    "test_docs = [\n",
    "    Document(\n",
    "        page_content=\"Apple Granny Smith 150 0.99 1\",\n",
    "        metadata={\"fruit_id\": 1},\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"Banana Cavendish 200 0.59 0\",\n",
    "        metadata={\"fruit_id\": 2},\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"Orange Navel 80 1.29 1\",\n",
    "        metadata={\"fruit_id\": 3},\n",
    "    ),\n",
    "]\n",
    "saver = MySQLDocumentSaver(engine=engine, table_name=TABLE_NAME)\n",
    "saver.add_documents(test_docs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load documents"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Load LangChain documents with `MySQLLoader.load()` or `MySQLLoader.lazy_load()`. `lazy_load` returns a generator that only queries the database during iteration. To initialize the `MySQLLoader` class, you need to provide:\n",
    "\n",
    "1. `engine` - An instance of a `MySQLEngine` engine.\n",
    "2. `table_name` - The name of the table within the Cloud SQL database that stores the LangChain documents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_google_cloud_sql_mysql import MySQLLoader\n",
    "\n",
    "loader = MySQLLoader(engine=engine, table_name=TABLE_NAME)\n",
    "docs = loader.lazy_load()\n",
    "for doc in docs:\n",
    "    print(\"Loaded documents:\", doc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load documents via query"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Besides loading documents from a table, we can also load documents from a view generated by a SQL query. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_google_cloud_sql_mysql import MySQLLoader\n",
    "\n",
    "loader = MySQLLoader(\n",
    "    engine=engine,\n",
    "    query=f\"select * from `{TABLE_NAME}` where JSON_EXTRACT(langchain_metadata, '$.fruit_id') = 1;\",\n",
    ")\n",
    "onedoc = loader.load()\n",
    "onedoc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The view generated from a SQL query can have a different schema than the default table. In such cases, the behavior of `MySQLLoader` is the same as loading from a table with a non-default schema. Please refer to the section [Load documents with customized document page content & metadata](#Load-documents-with-customized-document-page-content-&-metadata)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Delete documents"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Delete a list of LangChain documents from a MySQL table with `MySQLDocumentSaver.delete(<documents>)`.\n",
    "\n",
    "For a table with the default schema (page_content, langchain_metadata), the deletion criteria are:\n",
    "\n",
    "A `row` should be deleted if there exists a `document` in the list, such that\n",
    "\n",
    "- `document.page_content` equals `row[page_content]`\n",
    "- `document.metadata` equals `row[langchain_metadata]`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_google_cloud_sql_mysql import MySQLLoader\n",
    "\n",
    "loader = MySQLLoader(engine=engine, table_name=TABLE_NAME)\n",
    "docs = loader.load()\n",
    "print(\"Documents before delete:\", docs)\n",
    "saver.delete(onedoc)\n",
    "print(\"Documents after delete:\", loader.load())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Advanced Usage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load documents with customized document page content & metadata"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First we prepare an example table with a non-default schema and populate it with some arbitrary data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sqlalchemy\n",
    "\n",
    "with engine.connect() as conn:\n",
    "    conn.execute(sqlalchemy.text(f\"DROP TABLE IF EXISTS `{TABLE_NAME}`\"))\n",
    "    conn.commit()\n",
    "    conn.execute(\n",
    "        sqlalchemy.text(\n",
    "            f\"\"\"\n",
    "            CREATE TABLE IF NOT EXISTS `{TABLE_NAME}`(\n",
    "                fruit_id INT AUTO_INCREMENT PRIMARY KEY,\n",
    "                fruit_name VARCHAR(100) NOT NULL,\n",
    "                variety VARCHAR(50),\n",
    "                quantity_in_stock INT NOT NULL,\n",
    "                price_per_unit DECIMAL(6,2) NOT NULL,\n",
    "                organic TINYINT(1) NOT NULL\n",
    "            )\n",
    "            \"\"\"\n",
    "        )\n",
    "    )\n",
    "    conn.execute(\n",
    "        sqlalchemy.text(\n",
    "            f\"\"\"\n",
    "            INSERT INTO `{TABLE_NAME}` (fruit_name, variety, quantity_in_stock, price_per_unit, organic)\n",
    "            VALUES\n",
    "                ('Apple', 'Granny Smith', 150, 0.99, 1),\n",
    "                ('Banana', 'Cavendish', 200, 0.59, 0),\n",
    "                ('Orange', 'Navel', 80, 1.29, 1);\n",
    "            \"\"\"\n",
    "        )\n",
    "    )\n",
    "    conn.commit()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we still load LangChain documents from this example table with the default parameters of `MySQLLoader`, the `page_content` of the loaded documents will be the first column of the table, and `metadata` will consist of key-value pairs of all the other columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = MySQLLoader(\n",
    "    engine=engine,\n",
    "    table_name=TABLE_NAME,\n",
    ")\n",
    "loader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can specify the content and metadata we want to load by setting the `content_columns` and `metadata_columns` when initializing the `MySQLLoader`.\n",
    "\n",
    "1. `content_columns`: The columns to write into the `page_content` of the document.\n",
    "2. `metadata_columns`: The columns to write into the `metadata` of the document.\n",
    "\n",
    "In this example, the values of the columns in `content_columns` are joined into a space-separated string that becomes the `page_content` of each loaded document, and the `metadata` of each loaded document contains only the key-value pairs of the columns specified in `metadata_columns`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = MySQLLoader(\n",
    "    engine=engine,\n",
    "    table_name=TABLE_NAME,\n",
    "    content_columns=[\n",
    "        \"variety\",\n",
    "        \"quantity_in_stock\",\n",
    "        \"price_per_unit\",\n",
    "        \"organic\",\n",
    "    ],\n",
    "    metadata_columns=[\"fruit_id\", \"fruit_name\"],\n",
    ")\n",
    "loader.load()"
   ]
  },
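  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The joining rule described above can also be illustrated without a database. The cell below is only a sketch of the described behavior, not the library's implementation: `row_to_document_parts` and `example_row` are hypothetical names, and a plain dictionary stands in for one table row."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper (an assumption for illustration, not part of MySQLLoader).\n",
    "def row_to_document_parts(row, content_columns, metadata_columns):\n",
    "    # Join the selected content values into one space-separated string.\n",
    "    page_content = \" \".join(str(row[col]) for col in content_columns)\n",
    "    # Keep only the requested metadata columns as key-value pairs.\n",
    "    metadata = {col: row[col] for col in metadata_columns}\n",
    "    return page_content, metadata\n",
    "\n",
    "\n",
    "# A dictionary standing in for the first row of the example table.\n",
    "example_row = {\n",
    "    \"fruit_id\": 1,\n",
    "    \"fruit_name\": \"Apple\",\n",
    "    \"variety\": \"Granny Smith\",\n",
    "    \"quantity_in_stock\": 150,\n",
    "    \"price_per_unit\": 0.99,\n",
    "    \"organic\": 1,\n",
    "}\n",
    "row_to_document_parts(\n",
    "    example_row,\n",
    "    content_columns=[\"variety\", \"quantity_in_stock\", \"price_per_unit\", \"organic\"],\n",
    "    metadata_columns=[\"fruit_id\", \"fruit_name\"],\n",
    ")"
   ]
  },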
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Save document with customized page content & metadata"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To save LangChain documents into a table with customized metadata fields, we first need to create such a table via `MySQLEngine.init_document_table()` and specify the list of `metadata_columns` we want it to have. In this example, the created table will have the following columns:\n",
    "\n",
    "- description (type: text): for storing fruit description.\n",
    "- fruit_name (type: text): for storing fruit name.\n",
    "- organic (type: tinyint(1)): to tell if the fruit is organic.\n",
    "- other_metadata (type: JSON): for storing other metadata information of the fruit.\n",
    "\n",
    "We can use the following parameters with `MySQLEngine.init_document_table()` to create the table:\n",
    "\n",
    "1. `table_name`: The name of the table within the Cloud SQL database to store LangChain documents.\n",
    "2. `metadata_columns`: A list of `sqlalchemy.Column` indicating the list of metadata columns we need.\n",
    "3. `content_column`: The name of the column to store the `page_content` of the LangChain document. Default: `page_content`.\n",
    "4. `metadata_json_column`: The name of the JSON column to store extra `metadata` of the LangChain document. Default: `langchain_metadata`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "engine.init_document_table(\n",
    "    TABLE_NAME,\n",
    "    metadata_columns=[\n",
    "        sqlalchemy.Column(\n",
    "            \"fruit_name\",\n",
    "            sqlalchemy.UnicodeText,\n",
    "            primary_key=False,\n",
    "            nullable=True,\n",
    "        ),\n",
    "        sqlalchemy.Column(\n",
    "            \"organic\",\n",
    "            sqlalchemy.Boolean,\n",
    "            primary_key=False,\n",
    "            nullable=True,\n",
    "        ),\n",
    "    ],\n",
    "    content_column=\"description\",\n",
    "    metadata_json_column=\"other_metadata\",\n",
    "    overwrite_existing=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Save documents with `MySQLDocumentSaver.add_documents(<documents>)`. As you can see in this example,\n",
    "\n",
    "- `document.page_content` will be saved into the `description` column.\n",
    "- `document.metadata.fruit_name` will be saved into the `fruit_name` column.\n",
    "- `document.metadata.organic` will be saved into the `organic` column.\n",
    "- `document.metadata.fruit_id` will be saved into the `other_metadata` column in JSON format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_docs = [\n",
    "    Document(\n",
    "        page_content=\"Granny Smith 150 0.99\",\n",
    "        metadata={\"fruit_id\": 1, \"fruit_name\": \"Apple\", \"organic\": 1},\n",
    "    ),\n",
    "]\n",
    "saver = MySQLDocumentSaver(\n",
    "    engine=engine,\n",
    "    table_name=TABLE_NAME,\n",
    "    content_column=\"description\",\n",
    "    metadata_json_column=\"other_metadata\",\n",
    ")\n",
    "saver.add_documents(test_docs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with engine.connect() as conn:\n",
    "    result = conn.execute(sqlalchemy.text(f\"select * from `{TABLE_NAME}`;\"))\n",
    "    print(result.keys())\n",
    "    print(result.fetchall())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Delete documents with customized page content & metadata"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also delete documents from a table with customized metadata columns via `MySQLDocumentSaver.delete(<documents>)`. The deletion criteria are:\n",
    "\n",
    "A `row` should be deleted if there exists a `document` in the list, such that\n",
    "\n",
    "- `document.page_content` equals `row[page_content]`\n",
    "- For every metadata field `k` in `document.metadata`\n",
    "    - `document.metadata[k]` equals `row[k]` or `document.metadata[k]` equals `row[langchain_metadata][k]`\n",
    "- There is no extra metadata field present in `row` that is not in `document.metadata`."
   ]
  },
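  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The criteria above can be sketched in plain Python. This is only an illustration of the stated rule, not the library's implementation: `row_matches_document` and `example_row` are hypothetical names, the row is modeled as a dictionary, and treating NULL (`None`) columns as absent is an assumption made for this sketch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper restating the criteria above (not the library's code).\n",
    "def row_matches_document(\n",
    "    row,\n",
    "    page_content,\n",
    "    metadata,\n",
    "    content_column=\"page_content\",\n",
    "    metadata_json_column=\"langchain_metadata\",\n",
    "):\n",
    "    # The content column must equal the document's page_content.\n",
    "    if row[content_column] != page_content:\n",
    "        return False\n",
    "    # Collect metadata from the dedicated columns (None treated as absent, an assumption).\n",
    "    row_metadata = {\n",
    "        key: value\n",
    "        for key, value in row.items()\n",
    "        if key not in (content_column, metadata_json_column) and value is not None\n",
    "    }\n",
    "    # Add the fields stored in the JSON metadata column.\n",
    "    row_metadata.update(row.get(metadata_json_column) or {})\n",
    "    # Equality means every document field matches and the row has no extra field.\n",
    "    return row_metadata == metadata\n",
    "\n",
    "\n",
    "# A dictionary standing in for the row saved earlier in this section.\n",
    "example_row = {\n",
    "    \"description\": \"Granny Smith 150 0.99\",\n",
    "    \"fruit_name\": \"Apple\",\n",
    "    \"organic\": 1,\n",
    "    \"other_metadata\": {\"fruit_id\": 1},\n",
    "}\n",
    "row_matches_document(\n",
    "    example_row,\n",
    "    page_content=\"Granny Smith 150 0.99\",\n",
    "    metadata={\"fruit_id\": 1, \"fruit_name\": \"Apple\", \"organic\": 1},\n",
    "    content_column=\"description\",\n",
    "    metadata_json_column=\"other_metadata\",\n",
    ")"
   ]
  },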
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = MySQLLoader(engine=engine, table_name=TABLE_NAME)\n",
    "docs = loader.load()\n",
    "print(\"Documents before delete:\", docs)\n",
    "saver.delete(docs)\n",
    "print(\"Documents after delete:\", loader.load())"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}