[example] add zero1, zero2 example in GPT examples (#2146)

* [example] add zero1 and zero2 for GPT

* update readme in gpt example

* polish code

* change init value

* update readme
Author: HELSON
Date: 2022-12-20 14:30:27 +08:00
Committed by: GitHub
Parent: 1cce6e36ca
Commit: a7d95b7024
5 changed files with 40 additions and 27 deletions

README.md

@@ -19,10 +19,10 @@ conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit
 pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
 ```
-### Install [Colossal-AI v0.1.11rc5](https://colossalai.org/download/) From Official Website
+### Install [Colossal-AI v0.1.12](https://colossalai.org/download/) From Official Website
 ```bash
-pip install colossalai==0.1.11rc5+torch1.12cu11.3 -f https://release.colossalai.org
+pip install colossalai==0.1.12+torch1.12cu11.3 -f https://release.colossalai.org
 ```
 ### Install transformers
@@ -31,7 +31,8 @@ pip install colossalai==0.1.11rc5+torch1.12cu11.3 -f https://release.colossalai.
 pip install transformers
 ```
-This is just an example that we download PyTorch=1.12.0, CUDA=11.6 and colossalai=0.1.11rc5+torch1.12cu11.3. You can download another version of PyTorch and its corresponding ColossalAI version. Just make sure that the version of ColossalAI is at least 0.1.10, PyTorch is at least 1.8.1 and transformers is at least 4.231.
+This is just an example in which we install PyTorch=1.12.0, CUDA=11.6 and colossalai=0.1.12+torch1.12cu11.3. You can install another version of PyTorch and its corresponding ColossalAI version. Just make sure that the version of ColossalAI is at least 0.1.10, PyTorch is at least 1.8.1 and transformers is at least 4.23.1.
+If you want to test ZeRO1 and ZeRO2 in Colossal-AI, you need to ensure Colossal-AI>=0.1.12.
 ## Dataset
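
As a quick sanity check of the version floors above (a snippet added here for illustration, not part of this commit), the installed versions can be printed directly; all three packages expose a `__version__` attribute:

```python
# Sanity check for the version requirements in the README:
# ColossalAI >= 0.1.10 (>= 0.1.12 for ZeRO1/ZeRO2), PyTorch >= 1.8.1,
# transformers >= 4.23.1. Assumes all three packages are installed.
import torch
import transformers
import colossalai

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("colossalai:", colossalai.__version__)
```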
@@ -48,5 +49,7 @@ bash run.sh
 The `train_gpt_demo.py` provides three distributed plans, you can choose the plan you want in `run.sh`. The Colossal-AI leverages Tensor Parallel and Gemini + ZeRO DDP.
 - Colossal-AI
-- PyTorch DDP
-- ZeRO
+- ZeRO1 (Colossal-AI)
+- ZeRO2 (Colossal-AI)
+- PyTorch DDP
+- PyTorch ZeRO
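
For context on what these plans shard: ZeRO1 partitions optimizer states across data-parallel ranks, and ZeRO2 additionally partitions gradients. The sketch below is not taken from `train_gpt_demo.py`; it is a minimal illustration, assuming the PyTorch ZeRO plan corresponds to PyTorch's built-in `ZeroRedundancyOptimizer`, of stage-1-style optimizer-state sharding in stock PyTorch:

```python
# Minimal sketch of ZeRO stage-1 style optimizer-state sharding with
# stock PyTorch (an illustration, not the example's actual training code).
import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = DDP(torch.nn.Linear(1024, 1024).cuda())
# Each rank keeps only its shard of the Adam states instead of a full
# copy, cutting optimizer memory roughly by the number of ranks.
optimizer = ZeroRedundancyOptimizer(
    model.parameters(), optimizer_class=torch.optim.Adam, lr=1e-3
)

loss = model(torch.randn(8, 1024).cuda()).sum()
loss.backward()   # DDP all-reduces gradients as usual
optimizer.step()  # each rank updates its shard, then parameters sync
```

Launched with `torchrun --nproc_per_node=<N> <script>.py`, each rank holds only `1/N` of the optimizer states.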