docs: add dbgpt_hub usage documents (#955)

This commit is contained in:
magic.chen
2023-12-20 10:18:22 +08:00
committed by GitHub
parent ba8fa8774d
commit aec124a5f1
13 changed files with 339 additions and 67 deletions

View File

@@ -1,4 +1,4 @@
# DB-GPT Website
# DB-GPT documentation
## Quick Start
@@ -17,9 +17,3 @@ yarn start
The default service starts on port `3000`, visit `localhost:3000`
## Docker development
```commandline
docker build -t dbgptweb .
docker run --restart=unless-stopped -d -p 3000:3000 dbgptweb
```

View File

@@ -0,0 +1,118 @@
# Fine-Tuning with dbgpt_hub
The DB-GPT-Hub project publishes a pip package to lower the barrier to Text2SQL training. In addition to fine-tuning through the scripts provided in the repository, you can also use the Python package we provide
for fine-tuning.
## Install
```commandline
pip install dbgpt_hub
```
## Show Baseline
```python
from dbgpt_hub.baseline import show_scores
show_scores()
```
<p align="left">
<img src={'/img/ft/baseline.png'} width="720px" />
</p>
## Fine-tuning
```python
from dbgpt_hub.data_process import preprocess_sft_data
from dbgpt_hub.train import start_sft
from dbgpt_hub.predict import start_predict
from dbgpt_hub.eval import start_evaluate
```
Preprocess the data into the fine-tuning data format.
```python
data_folder = "dbgpt_hub/data"
data_info = [
{
"data_source": "spider",
"train_file": ["train_spider.json", "train_others.json"],
"dev_file": ["dev.json"],
"tables_file": "tables.json",
"db_id_name": "db_id",
"is_multiple_turn": False,
"train_output": "spider_train.json",
"dev_output": "spider_dev.json",
}
]
preprocess_sft_data(
data_folder = data_folder,
data_info = data_info
)
```
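The exact output schema is defined by `preprocess_sft_data`; conceptually, each generated record pairs a natural-language question (with schema context) with its gold SQL. The sketch below is a hypothetical example of such a record with a minimal structural check; the field names are assumptions based on common Text2SQL SFT formats, not the package's guaranteed schema.

```python
import json

# Hypothetical single fine-tuning record; the real schema is produced by
# preprocess_sft_data and may differ in field names.
record = {
    "db_id": "concert_singer",
    "instruction": (
        "I want you to act as a SQL terminal in front of an example database. "
        "Write a response that appropriately completes the request."
    ),
    "input": "###Input:\nHow many singers do we have?",
    "output": "SELECT count(*) FROM singer",
    "history": [],
}

# A quick structural check like the one a data loader might perform.
required = {"db_id", "input", "output"}
missing = required - record.keys()
print(json.dumps(record, ensure_ascii=False)[:60])
print("missing fields:", sorted(missing))
```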
Fine-tune the base model and generate the model weights.
```python
train_args = {
"model_name_or_path": "codellama/CodeLlama-13b-Instruct-hf",
"do_train": True,
"dataset": "example_text2sql_train",
"max_source_length": 2048,
"max_target_length": 512,
"finetuning_type": "lora",
"lora_target": "q_proj,v_proj",
"template": "llama2",
"lora_rank": 64,
"lora_alpha": 32,
"output_dir": "dbgpt_hub/output/adapter/CodeLlama-13b-sql-lora",
"overwrite_cache": True,
"overwrite_output_dir": True,
"per_device_train_batch_size": 1,
"gradient_accumulation_steps": 16,
"lr_scheduler_type": "cosine_with_restarts",
"logging_steps": 50,
"save_steps": 2000,
"learning_rate": 2e-4,
"num_train_epochs": 8,
"plot_loss": True,
"bf16": True,
}
start_sft(train_args)
```
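With `finetuning_type: "lora"`, only low-rank adapter matrices on the targeted projections (`q_proj,v_proj`) are trained, which is why a 13B model fits the per-device batch size above. A back-of-envelope sketch of the trainable parameter count, assuming LLaMA-13B-family dimensions (hidden size 5120, 40 decoder layers; verify against the actual model config):

```python
# Back-of-envelope LoRA parameter count. The dimensions are assumptions for
# a 13B LLaMA-family model, not values read from the CodeLlama config.
hidden_size = 5120
num_layers = 40
rank = 64               # lora_rank from train_args
targets_per_layer = 2   # q_proj and v_proj

# Each adapted d x d matrix gains two low-rank factors: A (d x r) and B (r x d).
lora_params = num_layers * targets_per_layer * rank * (hidden_size + hidden_size)
full_qv_params = num_layers * targets_per_layer * hidden_size * hidden_size

print(f"LoRA trainable params: {lora_params / 1e6:.1f}M")
print(f"Full q/v params:       {full_qv_params / 1e9:.2f}B")
print(f"Ratio:                 {lora_params / full_qv_params:.2%}")
```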
Generate predictions with the fine-tuned model.
```python
predict_args = {
"model_name_or_path": "codellama/CodeLlama-13b-Instruct-hf",
"template": "llama2",
"finetuning_type": "lora",
"checkpoint_dir": "dbgpt_hub/output/adapter/CodeLlama-13b-sql-lora",
"predict_file_path": "dbgpt_hub/data/eval_data/dev_sql.json",
"predict_out_dir": "dbgpt_hub/output/",
"predicted_out_filename": "pred_sql.sql",
}
start_predict(predict_args)
```
Evaluate the accuracy of the predicted results on the test dataset.
```python
evaluate_args = {
"input": "./dbgpt_hub/output/pred/pred_sql_dev_skeleton.sql",
"gold": "./dbgpt_hub/data/eval_data/gold.txt",
"gold_natsql": "./dbgpt_hub/data/eval_data/gold_natsql2sql.txt",
"db": "./dbgpt_hub/data/spider/database",
"table": "./dbgpt_hub/data/eval_data/tables.json",
"table_natsql": "./dbgpt_hub/data/eval_data/tables_for_natsql2sql.json",
"etype": "exec",
"plug_value": True,
    "keep_distinct": False,
"progress_bar_for_each_datapoint": False,
"natsql": False,
}
start_evaluate(evaluate_args)
```
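With `"etype": "exec"`, a prediction counts as correct when it returns the same result set as the gold query against the target database. Below is a minimal, self-contained sketch of that comparison using an in-memory SQLite database; the real Spider evaluator additionally handles value plugging, row ordering rules, and NatSQL, so treat this only as an illustration of the idea.

```python
import sqlite3

def exec_match(db: sqlite3.Connection, pred_sql: str, gold_sql: str) -> bool:
    """Return True when both queries yield the same multiset of rows."""
    try:
        pred_rows = db.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # an unexecutable prediction counts as wrong
    gold_rows = db.execute(gold_sql).fetchall()
    return sorted(map(tuple, pred_rows)) == sorted(map(tuple, gold_rows))

# Toy database standing in for a Spider database directory.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE singer (id INTEGER, name TEXT)")
db.executemany("INSERT INTO singer VALUES (?, ?)", [(1, "Ann"), (2, "Bo")])

print(exec_match(db, "SELECT count(*) FROM singer", "SELECT COUNT(id) FROM singer"))  # True
print(exec_match(db, "SELECT name FROM singer", "SELECT count(*) FROM singer"))       # False
```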

View File

@@ -1,55 +0,0 @@
ChatData & ChatDB
==================================
ChatData generates SQL from natural language and executes it. ChatDB involves conversing with metadata from the
database, including metadata about databases, tables, and fields.

![db plugins demonstration](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/d8bfeee9-e982-465e-a2b8-1164b673847e)
### 1.Choose Datasource
If you are using DB-GPT for the first time, you need to add a data source and set the relevant connection information
for the data source.
```{tip}
There is some example data in DB-GPT-NEW/DB-GPT/docker/examples;
you can execute the SQL scripts to generate the data.
```
#### 1.1 Datasource management
![db plugins demonstration](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/7678f07e-9eee-40a9-b980-5b3978a0ed52)
#### 1.2 Connection management
![db plugins demonstration](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/25b8f5a9-d322-459e-a8b2-bfe8cb42bdd6)
#### 1.3 Add Datasource
![db plugins demonstration](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/19ce31a7-4061-4da8-a9cb-efca396cc085)
```{note}
DB-GPT currently supports the following datasource types:
* MySQL
* SQLite
* DuckDB
* ClickHouse
* MSSQL
```
### 2.ChatData
##### Preview Mode
After successfully setting up the data source, you can start conversing with the database. You can ask it to generate
SQL for you or inquire about relevant information on the database's metadata.
![db plugins demonstration](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/8acf6a42-e511-48ff-aabf-3d9037485c1c)
##### Editor Mode
In Editor Mode, you can edit your SQL and execute it.
![db plugins demonstration](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/1a896dc1-7c0e-4354-8629-30357ffd8d7f)
### 3.ChatDB
![db plugins demonstration](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/e04bc1b1-2c58-4b33-af62-97e89098ace7)

View File

@@ -82,4 +82,10 @@ Connect various data sources
Observing & monitoring
- [Evaluation](/docs/modules/eval)
Evaluate framework performance and accuracy
## Community
If you encounter any problems during the process, you can submit an [issue](https://github.com/eosphoros-ai/DB-GPT/issues) and communicate with us.
We welcome [discussions](https://github.com/orgs/eosphoros-ai/discussions) in the community.

View File

@@ -161,6 +161,10 @@ const sidebars = {
type: 'doc',
id: 'application/fine_tuning_manual/text_to_sql',
},
{
type: 'doc',
id: 'application/fine_tuning_manual/dbgpt_hub',
},
],
},
],
@@ -224,10 +228,6 @@ const sidebars = {
type: 'doc',
id: 'faq/kbqa',
}
,{
type: 'doc',
id: 'faq/chatdata',
},
],
},

BIN
docs/static/img/ft/baseline.png vendored Normal file

Binary file not shown.


Width:  |  Height:  |  Size: 316 KiB