feat: Command-line tool design and multi-model integration

This commit is contained in:
FangYin Cheng
2023-08-31 17:21:38 +08:00
parent 05712d39b9
commit e4dd6060da
15 changed files with 887 additions and 229 deletions


@@ -0,0 +1,219 @@
Cluster deployment
==================================
## Model cluster deployment
**Installing the Command-Line Tool**
All operations below are performed with the `dbgpt` command. To make the `dbgpt` command available, install the DB-GPT project with `pip install -e .`. Alternatively, you can run `python pilot/scripts/cli_scripts.py` as a substitute for the `dbgpt` command.
### Launch Model Controller
```bash
dbgpt start controller
```
By default, the Model Controller starts on port 8000.
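After starting the controller, you can confirm it is listening on the default port with a plain TCP check. A minimal sketch (the helper name is ours; it is not part of the `dbgpt` tooling):

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# After `dbgpt start controller`, this should report True on the controller host:
# port_is_open("127.0.0.1", 8000)
```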
### Launch Model Worker
If you are starting `chatglm2-6b`:
```bash
dbgpt start worker --model_name chatglm2-6b \
--model_path /app/models/chatglm2-6b \
--port 8001 \
--controller_addr http://127.0.0.1:8000
```
If you are starting `vicuna-13b-v1.5`:
```bash
dbgpt start worker --model_name vicuna-13b-v1.5 \
--model_path /app/models/vicuna-13b-v1.5 \
--port 8002 \
--controller_addr http://127.0.0.1:8000
```
Note: Be sure to use your own model name and model path.
Check your model:
```bash
dbgpt model list
```
You should see output similar to the following (host addresses and heartbeat timestamps will differ):
```
+-----------------+------------+------------+------+---------+---------+-----------------+----------------------------+
| Model Name | Model Type | Host | Port | Healthy | Enabled | Prompt Template | Last Heartbeat |
+-----------------+------------+------------+------+---------+---------+-----------------+----------------------------+
| chatglm2-6b | llm | 172.17.0.6 | 8001 | True | True | None | 2023-08-31T04:48:45.252939 |
| vicuna-13b-v1.5 | llm | 172.17.0.6 | 8002 | True | True | None | 2023-08-31T04:48:55.136676 |
+-----------------+------------+------------+------+---------+---------+-----------------+----------------------------+
```
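The columns of this table map naturally onto a small record type. An illustrative sketch of how a client could model the rows above (field names are inferred from the table headers, not taken from the DB-GPT source):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelInstance:
    """One row of `dbgpt model list` output."""
    model_name: str
    model_type: str
    host: str
    port: int
    healthy: bool
    enabled: bool
    prompt_template: Optional[str]
    last_heartbeat: str

instances = [
    ModelInstance("chatglm2-6b", "llm", "172.17.0.6", 8001, True, True, None,
                  "2023-08-31T04:48:45.252939"),
    ModelInstance("vicuna-13b-v1.5", "llm", "172.17.0.6", 8002, True, True, None,
                  "2023-08-31T04:48:55.136676"),
]

# Only instances that are both healthy and enabled should receive requests:
available = [i for i in instances if i.healthy and i.enabled]
```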
### Connect to the model service in the webserver (dbgpt_server)
**First, modify the `.env` file to change the model name and the Model Controller connection address.**
```bash
LLM_MODEL=vicuna-13b-v1.5
# The current default MODEL_SERVER address is the address of the Model Controller
MODEL_SERVER=http://127.0.0.1:8000
```
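The webserver reads these settings from the environment. A minimal sketch of that lookup, using the defaults shown above (this is not DB-GPT's actual configuration code):

```python
import os

def get_model_config() -> dict:
    """Read model settings, falling back to the documented defaults when unset."""
    return {
        "llm_model": os.environ.get("LLM_MODEL", "vicuna-13b-v1.5"),
        "model_server": os.environ.get("MODEL_SERVER", "http://127.0.0.1:8000"),
    }
```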
#### Start the webserver
```bash
python pilot/server/dbgpt_server.py --light
```
The `--light` flag tells the webserver not to start an embedded model service.
Alternatively, you can prefix the command with `LLM_MODEL=chatglm2-6b` to select the model at startup:
```bash
LLM_MODEL=chatglm2-6b python pilot/server/dbgpt_server.py --light
```
### More Command-Line Usages
You can explore more command-line usage via the `--help` option.
**View the `dbgpt` help**
```bash
dbgpt --help
```
You will see the basic command parameters and usage:
```
Usage: dbgpt [OPTIONS] COMMAND [ARGS]...
Options:
--log-level TEXT Log level
--version Show the version and exit.
--help Show this message and exit.
Commands:
model Clients that manage model serving
start Start specific server.
  stop   Stop specific server.
```
**View the `dbgpt start` help**
```bash
dbgpt start --help
```
Here you can see the related commands and usage for start:
```
Usage: dbgpt start [OPTIONS] COMMAND [ARGS]...
Start specific server.
Options:
--help Show this message and exit.
Commands:
apiserver Start apiserver(TODO)
controller Start model controller
webserver Start webserver(dbgpt_server.py)
worker Start model worker
```
**View the `dbgpt start worker` help**
```bash
dbgpt start worker --help
```
Here you can see the parameters for starting a Model Worker:
```
Usage: dbgpt start worker [OPTIONS]
Start model worker
Options:
--model_name TEXT Model name [required]
--model_path TEXT Model path [required]
--worker_type TEXT Worker type
--worker_class TEXT Model worker class, pilot.model.worker.defau
lt_worker.DefaultModelWorker
--host TEXT Model worker deploy host [default: 0.0.0.0]
--port INTEGER Model worker deploy port [default: 8000]
--limit_model_concurrency INTEGER
Model concurrency limit [default: 5]
--standalone Standalone mode. If True, embedded Run
ModelController
--register Register current worker to model controller
[default: True]
--worker_register_host TEXT The ip address of current worker to register
to ModelController. If None, the address is
automatically determined
--controller_addr TEXT The Model controller address to register
--send_heartbeat Send heartbeat to model controller
[default: True]
--heartbeat_interval INTEGER The interval for sending heartbeats
(seconds) [default: 20]
--device TEXT Device to run model. If None, the device is
automatically determined
--model_type TEXT Model type, huggingface or llama.cpp
[default: huggingface]
--prompt_template TEXT Prompt template. If None, the prompt
template is automatically determined from
model path, supported template: zero_shot,vi
cuna_v1.1,llama-2,alpaca,baichuan-chat
--max_context_size INTEGER Maximum context size [default: 4096]
--num_gpus INTEGER The number of gpus you expect to use, if it
is empty, use all of them as much as
possible
--max_gpu_memory TEXT The maximum memory limit of each GPU, only
valid in multi-GPU configuration
--cpu_offloading CPU offloading
--load_8bit 8-bit quantization
--load_4bit 4-bit quantization
--quant_type TEXT Quantization datatypes, `fp4` (four bit
float) and `nf4` (normal four bit float),
only valid when load_4bit=True [default:
nf4]
--use_double_quant Nested quantization, only valid when
load_4bit=True [default: True]
--compute_dtype TEXT Model compute type
--trust_remote_code Trust remote code [default: True]
--verbose Show verbose output.
--help Show this message and exit.
```
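The `--send_heartbeat` and `--heartbeat_interval` options describe a simple periodic loop: the worker reports to the controller every N seconds. An illustrative sketch of that pattern (the send callback and the `max_beats` cutoff are our assumptions, not the worker's real implementation, which runs until shutdown):

```python
import time
from typing import Callable

def heartbeat_loop(send: Callable[[], None], interval: float, max_beats: int) -> int:
    """Call `send` every `interval` seconds, up to `max_beats` times.

    `send` would, for example, POST the worker's host/port to the controller;
    `max_beats` only exists to keep this sketch finite.
    """
    beats = 0
    while beats < max_beats:
        send()
        beats += 1
        time.sleep(interval)
    return beats
```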
**View the `dbgpt model` help**
```bash
dbgpt model --help
```
The `dbgpt model` command connects to the Model Controller at the given address and manages the remote model instances registered there:
```
Usage: dbgpt model [OPTIONS] COMMAND [ARGS]...
Clients that manage model serving
Options:
--address TEXT Address of the Model Controller to connect to. Just support
light deploy model [default: http://127.0.0.1:8000]
--help Show this message and exit.
Commands:
list List model instances
restart Restart model instances
start Start model instances
stop Stop model instances
```
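Each `dbgpt model` subcommand talks to the controller over HTTP using the `--address` value. A sketch of how such a client might compose request URLs (the `/api/models/{action}` path is a placeholder of ours, not the controller's documented API):

```python
def build_request_url(address: str, action: str) -> str:
    """Join the controller address with a management action path.

    `address` corresponds to the --address option (default
    http://127.0.0.1:8000); the path scheme here is hypothetical.
    """
    return f"{address.rstrip('/')}/api/models/{action}"
```

For instance, `dbgpt model list --address http://127.0.0.1:8000` would, under this assumed scheme, query the controller for its registered instances.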


@@ -19,6 +19,7 @@ Multi LLMs Support, Supports multiple large language models, currently supportin
- llama_cpp
- quantization
- cluster deployment
.. toctree::
:maxdepth: 2
@@ -28,3 +29,4 @@ Multi LLMs Support, Supports multiple large language models, currently supportin
./llama/llama_cpp.md
./quantization/quantization.md
./cluster/model_cluster.md