# Train ViT on CIFAR-10 from scratch
## 🚀 Quick Start
This example provides a training script that trains ViT on the CIFAR-10 dataset from scratch.
### Training Arguments

- `-p`, `--plugin`: Plugin to use. Choices: `torch_ddp`, `torch_ddp_fp16`, `low_level_zero`. Defaults to `torch_ddp`.
- `-r`, `--resume`: Checkpoint to resume from. Defaults to `-1`, which means no resuming.
- `-c`, `--checkpoint`: The folder to save checkpoints to. Defaults to `./checkpoint`.
- `-i`, `--interval`: Epoch interval between checkpoint saves. Defaults to `5`. If set to `0`, no checkpoints are saved.
- `--target_acc`: Target accuracy. An exception is raised if it is not reached. Defaults to `None`.

A minimal sketch of how these flags could be declared is shown below.
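The sketch assumes a plain `argparse` CLI. The option names and defaults mirror the list above, but the structure (including the type of `--resume`, which the `-1` default suggests is an epoch number) is an assumption, not the repository's actual parser.

```python
# Hypothetical sketch of the CLI, assuming argparse; mirrors the flags
# documented above but is not copied from the repository's train.py.
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Train ViT on CIFAR-10 from scratch")
    parser.add_argument("-p", "--plugin", type=str, default="torch_ddp",
                        choices=["torch_ddp", "torch_ddp_fp16", "low_level_zero"],
                        help="booster plugin to use")
    # Assumption: resume is an epoch index, since the default is -1.
    parser.add_argument("-r", "--resume", type=int, default=-1,
                        help="checkpoint epoch to resume from; -1 means no resuming")
    parser.add_argument("-c", "--checkpoint", type=str, default="./checkpoint",
                        help="folder to save checkpoints to")
    parser.add_argument("-i", "--interval", type=int, default=5,
                        help="epoch interval between checkpoint saves; 0 disables saving")
    parser.add_argument("--target_acc", type=float, default=None,
                        help="raise an exception if this accuracy is not reached")
    return parser.parse_args()
```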
### Install requirements
```bash
pip install -r requirements.txt
```
### Train
```bash
# train with torch DDP with fp32
colossalai run --nproc_per_node 4 train.py -c ./ckpt-fp32

# train with torch DDP with mixed-precision (fp16) training
colossalai run --nproc_per_node 4 train.py -c ./ckpt-fp16 -p torch_ddp_fp16

# train with low level zero
colossalai run --nproc_per_node 4 train.py -c ./ckpt-low_level_zero -p low_level_zero
```
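Inside `train.py`, the `--plugin` choice selects a ColossalAI booster plugin. The sketch below shows one plausible mapping using the public `Booster` API; `build_booster` is an illustrative helper name, and the real script may configure the plugins differently.

```python
# Hypothetical mapping from the -p/--plugin flag to a ColossalAI Booster.
# Must be run under a distributed launcher such as `colossalai run`.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin, TorchDDPPlugin

def build_booster(plugin_name: str) -> Booster:
    if plugin_name == "torch_ddp":
        # plain PyTorch DDP with fp32 weights and gradients
        return Booster(plugin=TorchDDPPlugin())
    if plugin_name == "torch_ddp_fp16":
        # DDP with automatic mixed precision handled by the booster
        return Booster(plugin=TorchDDPPlugin(), mixed_precision="fp16")
    if plugin_name == "low_level_zero":
        # ZeRO-style sharding of optimizer states and gradients
        return Booster(plugin=LowLevelZeroPlugin())
    raise ValueError(f"unknown plugin: {plugin_name}")

colossalai.launch_from_torch()  # distributed env vars are set by `colossalai run`
booster = build_booster("torch_ddp_fp16")
# The model, optimizer, criterion, and dataloader from your training setup
# are then wrapped by the booster, e.g.:
# model, optimizer, criterion, dataloader, _ = booster.boost(
#     model, optimizer, criterion, dataloader)
```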
The expected accuracy for each setup is:
| Model | Single-GPU Baseline FP32 | Booster DDP with FP32 | Booster DDP with FP16 | Booster Low Level Zero |
|---|---|---|---|---|
| ViT | 83.00% | 84.03% | 84.00% | 84.43% |