# Installation From Source

This tutorial gives you a quick walkthrough on using DB-GPT with your own environment and data.

## Installation

To get started, install DB-GPT with the following steps.

### 1. Hardware Requirements

DB-GPT can be deployed on servers with low or high hardware requirements, depending on how the LLM is served.

##### Low hardware requirements

The low hardware requirements mode is suitable for integrating with third-party LLM service APIs, such as OpenAI, Tongyi, Wenxin, or Llama.cpp.

DB-GPT provides a proxy mode to work with these LLM APIs.
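
As a sketch, a proxy configuration in `.env` might look like the following; the `chatgpt_proxyllm` model name and the `PROXY_API_KEY`/`PROXY_SERVER_URL` variables follow the project's proxy LLM documentation and may differ between versions:

```bash
# .env -- illustrative proxy settings (names assumed from the proxy LLM docs)
LLM_MODEL=chatgpt_proxyllm
PROXY_API_KEY=sk-...                                          # your API key
PROXY_SERVER_URL=https://api.openai.com/v1/chat/completions   # provider endpoint
```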

##### High hardware requirements

The high hardware requirements mode is suitable for independently deploying LLM services, such as the Llama series, Baichuan, ChatGLM, Vicuna, and other private LLM services.

As our project can achieve over 85% of ChatGPT's performance, there are certain hardware requirements. Overall, however, the project can be deployed and used on consumer-grade graphics cards. The specific hardware requirements for deployment are as follows:

| GPU      | VRAM Size | Performance                                         |
|----------|-----------|-----------------------------------------------------|
| RTX 4090 | 24 GB     | Smooth conversation inference                       |
| RTX 3090 | 24 GB     | Smooth conversation inference, better than V100     |
| V100     | 16 GB     | Conversation inference possible, noticeable stutter |
| T4       | 16 GB     | Conversation inference possible, noticeable stutter |

If your VRAM size is not enough, DB-GPT supports 8-bit and 4-bit quantization.

Here is the VRAM usage of the models we tested in some common scenarios.

| Model           | Quantize | VRAM Size |
|-----------------|----------|-----------|
| vicuna-7b-v1.5  | 4-bit    | 8 GB      |
| vicuna-7b-v1.5  | 8-bit    | 12 GB     |
| vicuna-13b-v1.5 | 4-bit    | 12 GB     |
| vicuna-13b-v1.5 | 8-bit    | 20 GB     |
| llama-2-7b      | 4-bit    | 8 GB      |
| llama-2-7b      | 8-bit    | 12 GB     |
| llama-2-13b     | 4-bit    | 12 GB     |
| llama-2-13b     | 8-bit    | 20 GB     |
| llama-2-70b     | 4-bit    | 48 GB     |
| llama-2-70b     | 8-bit    | 80 GB     |
| baichuan-7b     | 4-bit    | 8 GB      |
| baichuan-7b     | 8-bit    | 12 GB     |
| baichuan-13b    | 4-bit    | 12 GB     |
| baichuan-13b    | 8-bit    | 20 GB     |

### 2. Install

```bash
git clone https://github.com/eosphoros-ai/DB-GPT.git
```

We use SQLite as the default database, so no database installation is needed. If you choose to connect to other databases, you can follow our tutorial for installation and configuration.

For the entire installation process of DB-GPT, we use a miniconda3 virtual environment. Create a virtual environment and install the Python dependencies:

[How to install Miniconda](https://docs.conda.io/en/latest/miniconda.html)

```bash
# DB-GPT requires Python >= 3.10
conda create -n dbgpt_env python=3.10
conda activate dbgpt_env

# This will take a few minutes
pip install -e ".[default]"
```

Once the environment is installed, create a new folder named "models" in the DB-GPT project root; all models downloaded from Hugging Face go in this directory.

```{tip}
Make sure you have git-lfs installed:

- CentOS: yum install git-lfs
- Ubuntu: apt-get install git-lfs
- macOS: brew install git-lfs
```
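
After installing the package, Git LFS typically needs a one-time activation so that large model files download correctly when cloning:

```bash
# One-time setup: register the Git LFS filters with git
git lfs install
```
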
##### Download LLM Model and Embedding Model

If you use the OpenAI LLM service, see [How to Use LLM REST API](https://db-gpt.readthedocs.io/en/latest/getting_started/install/llm/proxyllm/proxyllm.html).

```{tip}
If you use an OpenAI, Azure, or Tongyi LLM API service, you don't need to download an LLM model.
```

```bash
cd DB-GPT
mkdir models && cd models

# Embedding model
git clone https://huggingface.co/GanymedeNil/text2vec-large-chinese
# or
git clone https://huggingface.co/moka-ai/m3e-large

# LLM model (skip this if you use an OpenAI, Azure, or Tongyi LLM API service)
git clone https://huggingface.co/lmsys/vicuna-13b-v1.5
# or
git clone https://huggingface.co/THUDM/chatglm2-6b
```

The model files are large and will take a long time to download. While they download, configure the .env file, which is created by copying .env.template:

```{tip}
cp .env.template .env
```

You can configure basic parameters in the .env file, for example, setting LLM_MODEL to the model to be used.

([Vicuna-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5), based on llama-2, has been released; we recommend setting `LLM_MODEL=vicuna-13b-v1.5` to try this model.)
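
With that recommendation, the relevant line in your .env would be:

```bash
# .env -- model to serve
LLM_MODEL=vicuna-13b-v1.5
```
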
### 3. Run
**(Optional) load examples into SQLite**

```bash
bash ./scripts/examples/load_examples.sh
```

On the Windows platform:

```PowerShell
.\scripts\examples\load_examples.bat
```

Run the DB-GPT server:

```bash
python pilot/server/dbgpt_server.py
```

Open http://localhost:5000 with your browser to see the product.
### Multiple GPUs

DB-GPT uses all available GPUs by default. You can set `CUDA_VISIBLE_DEVICES=0,1` in the `.env` file to use specific GPU IDs.

Optionally, you can also specify the GPU IDs before the start command, as shown below:

```shell
# Specify 1 GPU
CUDA_VISIBLE_DEVICES=0 python3 pilot/server/dbgpt_server.py

# Specify 4 GPUs
CUDA_VISIBLE_DEVICES=3,4,5,6 python3 pilot/server/dbgpt_server.py
```

You can set `MAX_GPU_MEMORY=xxGib` in the `.env` file to configure the maximum memory used by each GPU.
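
For example, to cap each GPU at 16 GiB (an illustrative value; choose one that fits your hardware):

```shell
# in .env -- per-GPU memory cap
MAX_GPU_MEMORY=16Gib
```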
### Not Enough Memory

DB-GPT supports 8-bit and 4-bit quantization.

You can set `QUANTIZE_8bit=True` or `QUANTIZE_4bit=True` in the `.env` file to enable quantization (8-bit quantization is enabled by default).
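
For example, to switch from the default 8-bit to 4-bit quantization (a sketch; it assumes the 8-bit flag should be disabled when 4-bit is enabled):

```shell
# in .env -- use 4-bit instead of the default 8-bit quantization
QUANTIZE_8bit=False
QUANTIZE_4bit=True
```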

Llama-2-70b can run with 80 GB of VRAM under 8-bit quantization, or with 48 GB of VRAM under 4-bit quantization.