Jacob Lee aff771923a Jacob/new docs (#20570)
2024-04-18 11:10:55 -07:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# TiDB\n",
"\n",
"> [TiDB Cloud](https://tidbcloud.com/) is a comprehensive Database-as-a-Service (DBaaS) solution that provides dedicated and serverless options. TiDB Serverless is now integrating built-in vector search into the MySQL landscape. With this enhancement, you can develop AI applications using TiDB Serverless without a new database or additional technical stacks. Be among the first to experience it by joining the waitlist for the private beta at https://tidb.cloud/ai.\n",
"\n",
"This notebook shows how to use `TiDBLoader` to load data from TiDB in LangChain."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"Before using `TiDBLoader`, install the following dependencies:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain langchain-community pymysql"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, configure the connection to TiDB. This notebook follows the standard connection method provided by TiDB Cloud to establish a secure and efficient database connection."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"\n",
"# Copy the connection string from the TiDB Cloud console and replace it with your own\n",
"tidb_connection_string_template = \"mysql+pymysql://<USER>:<PASSWORD>@<HOST>:4000/<DB>?ssl_ca=/etc/ssl/cert.pem&ssl_verify_cert=true&ssl_verify_identity=true\"\n",
"tidb_password = getpass.getpass(\"Input your TiDB password:\")\n",
"tidb_connection_string = tidb_connection_string_template.replace(\n",
"    \"<PASSWORD>\", tidb_password\n",
")"
]
},
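{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the password contains characters that are reserved in URLs (such as `@`, `/`, or `:`), substituting it into the template unchanged can produce a connection string that does not parse as intended. The cell below is a minimal sketch of one way to guard against this, assuming percent-encoding with the standard-library `urllib.parse.quote` is acceptable. `build_connection_string` is a hypothetical helper name used for illustration, not part of LangChain or TiDB, and the template and password are placeholder values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from urllib.parse import quote\n",
"\n",
"\n",
"def build_connection_string(template: str, password: str) -> str:\n",
"    # Hypothetical helper: percent-encode the password before substitution.\n",
"    # safe=\"\" also escapes \"/\", so no reserved character is left unencoded.\n",
"    return template.replace(\"<PASSWORD>\", quote(password, safe=\"\"))\n",
"\n",
"\n",
"# Illustrative values only; use your own template and password.\n",
"example = build_connection_string(\n",
"    \"mysql+pymysql://<USER>:<PASSWORD>@<HOST>:4000/<DB>\", \"p@ss/word\"\n",
")\n",
"print(example)"
]
},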
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Data from TiDB\n",
"\n",
"Here's a breakdown of some key arguments you can use to customize the behavior of the `TiDBLoader`:\n",
"\n",
"- `query` (str): This is the SQL query to be executed against the TiDB database. The query should select the data you want to load into your `Document` objects. \n",
" For instance, you might use a query like `\"SELECT * FROM my_table\"` to fetch all data from `my_table`.\n",
"\n",
"- `page_content_columns` (Optional[List[str]]): Specifies the list of column names whose values should be included in the `page_content` of each `Document` object. \n",
" If set to `None` (the default), all columns returned by the query are included in `page_content`. This allows you to tailor the content of each document based on specific columns of your data.\n",
"\n",
"- `metadata_columns` (Optional[List[str]]): Specifies the list of column names whose values should be included in the `metadata` of each `Document` object. \n",
" By default, this list is empty, meaning no metadata will be included unless explicitly specified. This is useful for including additional information about each document that doesn't form part of the main content but is still valuable for processing or analysis."
]
},
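{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the roles of `page_content_columns` and `metadata_columns` concrete, the cell below is a rough, database-free sketch of how a single result row could be split between the two. `row_to_parts` is a hypothetical helper written for illustration and is not the loader's actual implementation. The `column: value` line format is assumed from the output shown later in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from typing import Any, Dict, List, Optional, Tuple\n",
"\n",
"\n",
"def row_to_parts(\n",
"    row: Dict[str, Any],\n",
"    page_content_columns: Optional[List[str]] = None,\n",
"    metadata_columns: Optional[List[str]] = None,\n",
") -> Tuple[str, Dict[str, Any]]:\n",
"    # None means every column in the row contributes to page_content.\n",
"    if page_content_columns is None:\n",
"        page_content_columns = list(row)\n",
"    page_content = \"\\n\".join(f\"{col}: {row[col]}\" for col in page_content_columns)\n",
"    # No metadata is included unless columns are named explicitly.\n",
"    metadata = {col: row[col] for col in (metadata_columns or [])}\n",
"    return page_content, metadata\n",
"\n",
"\n",
"# Illustrative row; a real row would come from the SQL query.\n",
"row = {\"id\": 1, \"name\": \"Item 1\", \"description\": \"Description of Item 1\"}\n",
"content, meta = row_to_parts(row, [\"name\", \"description\"], [\"id\"])\n",
"print(content)\n",
"print(meta)"
]
},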
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine\n",
"\n",
"# Connect to the database\n",
"engine = create_engine(tidb_connection_string)\n",
"metadata = MetaData()\n",
"table_name = \"test_tidb_loader\"\n",
"\n",
"# Create a table\n",
"test_table = Table(\n",
"    table_name,\n",
"    metadata,\n",
"    Column(\"id\", Integer, primary_key=True),\n",
"    Column(\"name\", String(255)),\n",
"    Column(\"description\", String(255)),\n",
")\n",
"metadata.create_all(engine)\n",
"\n",
"\n",
"# Insert sample rows inside an explicit transaction\n",
"with engine.connect() as connection:\n",
"    transaction = connection.begin()\n",
"    try:\n",
"        connection.execute(\n",
"            test_table.insert(),\n",
"            [\n",
"                {\"name\": \"Item 1\", \"description\": \"Description of Item 1\"},\n",
"                {\"name\": \"Item 2\", \"description\": \"Description of Item 2\"},\n",
"                {\"name\": \"Item 3\", \"description\": \"Description of Item 3\"},\n",
"            ],\n",
"        )\n",
"        transaction.commit()\n",
"    except Exception:\n",
"        # Undo the partial insert, then re-raise so the failure is visible\n",
"        transaction.rollback()\n",
"        raise"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"------------------------------\n",
"content: name: Item 1\n",
"description: Description of Item 1\n",
"metadata: {'id': 1}\n",
"------------------------------\n",
"content: name: Item 2\n",
"description: Description of Item 2\n",
"metadata: {'id': 2}\n",
"------------------------------\n",
"content: name: Item 3\n",
"description: Description of Item 3\n",
"metadata: {'id': 3}\n"
]
}
],
"source": [
"from langchain_community.document_loaders import TiDBLoader\n",
"\n",
"# Set up TiDBLoader to retrieve data\n",
"loader = TiDBLoader(\n",
"    connection_string=tidb_connection_string,\n",
"    query=f\"SELECT * FROM {table_name};\",\n",
"    page_content_columns=[\"name\", \"description\"],\n",
"    metadata_columns=[\"id\"],\n",
")\n",
"\n",
"# Load data\n",
"documents = loader.load()\n",
"\n",
"# Display the loaded documents\n",
"for doc in documents:\n",
"    print(\"-\" * 30)\n",
"    print(f\"content: {doc.page_content}\\nmetadata: {doc.metadata}\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"test_table.drop(bind=engine)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "langchain",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}