feat(model): Support llama.cpp server deploy (#2263)
docs/docs/installation/advanced_usage/Llamacpp_server.md (new file, 40 lines)

@@ -0,0 +1,40 @@
# Llama.cpp Server

DB-GPT supports the native [llama.cpp server](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md),
which supports concurrent requests and continuous batching inference.

## Install dependencies

```bash
pip install -e ".[llama_cpp_server]"
```

If you have a GPU and want to accelerate inference, install with CUDA support enabled:

```bash
CMAKE_ARGS="-DGGML_CUDA=ON" pip install -e ".[llama_cpp_server]"
```

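If you build with CUDA, it is worth first confirming that a GPU and the CUDA toolchain are actually visible; the commands below are only a quick sanity check and are not part of the original guide.

```bash
# Check that the NVIDIA driver sees at least one GPU
nvidia-smi

# Check that the CUDA compiler the build will use is on the PATH
nvcc --version
```
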
## Download the model

Here, we use the `qwen2.5-0.5b-instruct` model as an example. You can download the model from [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF).

```bash
wget https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/qwen2.5-0.5b-instruct-q4_k_m.gguf?download=true -O /tmp/qwen2.5-0.5b-instruct-q4_k_m.gguf
```

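Optionally, you can verify the download before pointing DB-GPT at the file; this check is a suggestion rather than part of the original guide.

```bash
# The q4_k_m quantization of Qwen2.5-0.5B-Instruct is roughly a few hundred MB;
# a much smaller file usually means the download was interrupted.
ls -lh /tmp/qwen2.5-0.5b-instruct-q4_k_m.gguf

# GGUF files start with the magic bytes "GGUF"
head -c 4 /tmp/qwen2.5-0.5b-instruct-q4_k_m.gguf && echo
```
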
## Modify configuration file

In the `.env` configuration file, set the model's inference type so that `llama.cpp` is used for inference.

```bash
LLM_MODEL=qwen2.5-0.5b-instruct
LLM_MODEL_PATH=/tmp/qwen2.5-0.5b-instruct-q4_k_m.gguf
MODEL_TYPE=llama_cpp_server
```

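One way to apply these settings, assuming your checkout contains the usual `.env.template` at the repository root (an assumption, check your clone), is a sketch like the following; if the keys already exist in your `.env`, edit them in place instead of appending.

```bash
# Create .env from the template if it does not exist yet (-n: do not overwrite)
cp -n .env.template .env

# Append the llama.cpp server settings; remove any duplicate keys afterwards
cat >> .env <<'EOF'
LLM_MODEL=qwen2.5-0.5b-instruct
LLM_MODEL_PATH=/tmp/qwen2.5-0.5b-instruct-q4_k_m.gguf
MODEL_TYPE=llama_cpp_server
EOF
```
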
## Start the DB-GPT server

```bash
python dbgpt/app/dbgpt_server.py
```

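Once startup finishes, you can probe the web server from another shell. The port below assumes DB-GPT's default web server port of 5670; adjust it if your configuration differs.

```bash
# Prints the HTTP status code once the web server is answering requests
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:5670/
```
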
@@ -271,6 +271,10 @@ const sidebars = {
        type: 'doc',
        id: 'installation/advanced_usage/vLLM_inference',
      },
      {
        type: 'doc',
        id: 'installation/advanced_usage/Llamacpp_server',
      },
      {
        type: 'doc',
        id: 'installation/advanced_usage/OpenAI_SDK_call',