feat: Command-line tool design and multi-model integration

This commit is contained in:
FangYin Cheng
2023-08-31 17:21:38 +08:00
parent 05712d39b9
commit e4dd6060da
15 changed files with 887 additions and 229 deletions


@@ -0,0 +1,219 @@
Cluster deployment
==================================
## Model cluster deployment
**Installing the Command-Line Tool**
All operations below are performed with the `dbgpt` command. To make the `dbgpt` command available, install the DB-GPT project with `pip install -e .`. Alternatively, you can run `python pilot/scripts/cli_scripts.py` as a substitute for the `dbgpt` command.
### Launch Model Controller
```bash
dbgpt start controller
```
By default, the Model Controller starts on port 8000.
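After starting the controller, you can confirm it is listening on the default port with a plain TCP check. A minimal sketch (the helper name is ours; it is not part of the `dbgpt` tooling):

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# After `dbgpt start controller`, this should report True on the controller host:
# port_is_open("127.0.0.1", 8000)
```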
### Launch Model Worker
If you are starting `chatglm2-6b`:
```bash
dbgpt start worker --model_name chatglm2-6b \
--model_path /app/models/chatglm2-6b \
--port 8001 \
--controller_addr http://127.0.0.1:8000
```
If you are starting `vicuna-13b-v1.5`:
```bash
dbgpt start worker --model_name vicuna-13b-v1.5 \
--model_path /app/models/vicuna-13b-v1.5 \
--port 8002 \
--controller_addr http://127.0.0.1:8000
```
Note: Be sure to use your own model name and model path.
Check your model:
```bash
dbgpt model list
```
You should see output similar to the following (host addresses and heartbeat timestamps will differ):
```
+-----------------+------------+------------+------+---------+---------+-----------------+----------------------------+
| Model Name | Model Type | Host | Port | Healthy | Enabled | Prompt Template | Last Heartbeat |
+-----------------+------------+------------+------+---------+---------+-----------------+----------------------------+
| chatglm2-6b | llm | 172.17.0.6 | 8001 | True | True | None | 2023-08-31T04:48:45.252939 |
| vicuna-13b-v1.5 | llm | 172.17.0.6 | 8002 | True | True | None | 2023-08-31T04:48:55.136676 |
+-----------------+------------+------------+------+---------+---------+-----------------+----------------------------+
```
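The columns of this table map naturally onto a small record type. An illustrative sketch of how a client could model the rows above (field names are inferred from the table headers, not taken from the DB-GPT source):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelInstance:
    """One row of `dbgpt model list` output."""
    model_name: str
    model_type: str
    host: str
    port: int
    healthy: bool
    enabled: bool
    prompt_template: Optional[str]
    last_heartbeat: str

instances = [
    ModelInstance("chatglm2-6b", "llm", "172.17.0.6", 8001, True, True, None,
                  "2023-08-31T04:48:45.252939"),
    ModelInstance("vicuna-13b-v1.5", "llm", "172.17.0.6", 8002, True, True, None,
                  "2023-08-31T04:48:55.136676"),
]

# Only instances that are both healthy and enabled should receive requests:
available = [i for i in instances if i.healthy and i.enabled]
```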
### Connect to the model service in the webserver (dbgpt_server)
**First, modify the `.env` file to change the model name and the Model Controller connection address.**
```bash
LLM_MODEL=vicuna-13b-v1.5
# The current default MODEL_SERVER address is the address of the Model Controller
MODEL_SERVER=http://127.0.0.1:8000
```
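The webserver reads these settings from the environment. A minimal sketch of that lookup, using the defaults shown above (this is not DB-GPT's actual configuration code):

```python
import os

def get_model_config() -> dict:
    """Read model settings, falling back to the documented defaults when unset."""
    return {
        "llm_model": os.environ.get("LLM_MODEL", "vicuna-13b-v1.5"),
        "model_server": os.environ.get("MODEL_SERVER", "http://127.0.0.1:8000"),
    }
```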
#### Start the webserver
```bash
python pilot/server/dbgpt_server.py --light
```
The `--light` flag tells the webserver not to start an embedded model service.
Alternatively, you can prefix the command with `LLM_MODEL=chatglm2-6b` to select the model at startup:
```bash
LLM_MODEL=chatglm2-6b python pilot/server/dbgpt_server.py --light
```
### More Command-Line Usages
You can explore more command-line usage via the `--help` option.
**View the `dbgpt` help**
```bash
dbgpt --help
```
You will see the basic command parameters and usage:
```
Usage: dbgpt [OPTIONS] COMMAND [ARGS]...
Options:
--log-level TEXT Log level
--version Show the version and exit.
--help Show this message and exit.
Commands:
model Clients that manage model serving
start Start specific server.
  stop   Stop specific server.
```
**View the `dbgpt start` help**
```bash
dbgpt start --help
```
Here you can see the related commands and usage for start:
```
Usage: dbgpt start [OPTIONS] COMMAND [ARGS]...
Start specific server.
Options:
--help Show this message and exit.
Commands:
apiserver Start apiserver(TODO)
controller Start model controller
webserver Start webserver(dbgpt_server.py)
worker Start model worker
```
**View the `dbgpt start worker` help**
```bash
dbgpt start worker --help
```
Here you can see the parameters for starting a Model Worker:
```
Usage: dbgpt start worker [OPTIONS]
Start model worker
Options:
--model_name TEXT Model name [required]
--model_path TEXT Model path [required]
--worker_type TEXT Worker type
--worker_class TEXT Model worker class, pilot.model.worker.defau
lt_worker.DefaultModelWorker
--host TEXT Model worker deploy host [default: 0.0.0.0]
--port INTEGER Model worker deploy port [default: 8000]
--limit_model_concurrency INTEGER
Model concurrency limit [default: 5]
--standalone Standalone mode. If True, embedded Run
ModelController
--register Register current worker to model controller
[default: True]
--worker_register_host TEXT The ip address of current worker to register
to ModelController. If None, the address is
automatically determined
--controller_addr TEXT The Model controller address to register
--send_heartbeat Send heartbeat to model controller
[default: True]
--heartbeat_interval INTEGER The interval for sending heartbeats
(seconds) [default: 20]
--device TEXT Device to run model. If None, the device is
automatically determined
--model_type TEXT Model type, huggingface or llama.cpp
[default: huggingface]
--prompt_template TEXT Prompt template. If None, the prompt
template is automatically determined from
model path, supported template: zero_shot,vi
cuna_v1.1,llama-2,alpaca,baichuan-chat
--max_context_size INTEGER Maximum context size [default: 4096]
--num_gpus INTEGER The number of gpus you expect to use, if it
is empty, use all of them as much as
possible
--max_gpu_memory TEXT The maximum memory limit of each GPU, only
valid in multi-GPU configuration
--cpu_offloading CPU offloading
--load_8bit 8-bit quantization
--load_4bit 4-bit quantization
--quant_type TEXT Quantization datatypes, `fp4` (four bit
float) and `nf4` (normal four bit float),
only valid when load_4bit=True [default:
nf4]
--use_double_quant Nested quantization, only valid when
load_4bit=True [default: True]
--compute_dtype TEXT Model compute type
--trust_remote_code Trust remote code [default: True]
--verbose Show verbose output.
--help Show this message and exit.
```
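The `--send_heartbeat` and `--heartbeat_interval` options describe a simple periodic loop: the worker reports to the controller every N seconds. An illustrative sketch of that pattern (the send callback and the `max_beats` cutoff are our assumptions, not the worker's real implementation, which runs until shutdown):

```python
import time
from typing import Callable

def heartbeat_loop(send: Callable[[], None], interval: float, max_beats: int) -> int:
    """Call `send` every `interval` seconds, up to `max_beats` times.

    `send` would, for example, POST the worker's host/port to the controller;
    `max_beats` only exists to keep this sketch finite.
    """
    beats = 0
    while beats < max_beats:
        send()
        beats += 1
        time.sleep(interval)
    return beats
```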
**View the `dbgpt model` help**
```bash
dbgpt model --help
```
The `dbgpt model` command connects to the Model Controller at the given address and manages the remote model instances registered there:
```
Usage: dbgpt model [OPTIONS] COMMAND [ARGS]...
Clients that manage model serving
Options:
--address TEXT Address of the Model Controller to connect to. Just support
light deploy model [default: http://127.0.0.1:8000]
--help Show this message and exit.
Commands:
list List model instances
restart Restart model instances
start Start model instances
stop Stop model instances
```
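Each `dbgpt model` subcommand talks to the controller over HTTP using the `--address` value. A sketch of how such a client might compose request URLs (the `/api/models/{action}` path is a placeholder of ours, not the controller's documented API):

```python
def build_request_url(address: str, action: str) -> str:
    """Join the controller address with a management action path.

    `address` corresponds to the --address option (default
    http://127.0.0.1:8000); the path scheme here is hypothetical.
    """
    return f"{address.rstrip('/')}/api/models/{action}"
```

For instance, `dbgpt model list --address http://127.0.0.1:8000` would, under this assumed scheme, query the controller for its registered instances.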


@@ -19,6 +19,7 @@ Multi LLMs Support, Supports multiple large language models, currently supportin
- llama_cpp
- quantization
- cluster deployment
.. toctree::
:maxdepth: 2
@@ -28,3 +29,4 @@ Multi LLMs Support, Supports multiple large language models, currently supportin
./llama/llama_cpp.md
./quantization/quantization.md
./cluster/model_cluster.md