[tutorial] added synthetic data for hybrid parallel (#1921)

* [tutorial] added synthetic data for hybrid parallel

* polish code
Frank Lee
2022-11-12 18:18:55 +08:00
committed by GitHub
parent 3c42fdbedc
commit ff16773ded
3 changed files with 154 additions and 124 deletions


@@ -1,16 +1,17 @@
# Handson 1: Multi-dimensional Parallelism with Colossal-AI
## Install Colossal-AI and other dependencies
## Install Titans Model Zoo
```bash
sh install.sh
pip install titans
```
## Prepare Dataset
We use the CIFAR10 dataset in this example. The dataset will be downloaded to `../data` by default.
We use the CIFAR10 dataset in this example. You should invoke `donwload_cifar10.py` in the tutorial root directory or directly run `auto_parallel_with_resnet.py`.
The dataset will be downloaded to `colossalai/examples/tutorials/data` by default.
If you wish to use a customized directory for the dataset, you can set the environment variable `DATA` via the following command.
```bash
@@ -23,5 +24,9 @@ export DATA=/path/to/data
The current configuration in `config.py` sets tensor parallelism (TP) to 2 and pipeline parallelism (PP) to 2.
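As a rough sanity check on these settings, the number of launched processes should be a multiple of TP × PP, and whatever factor remains is the data parallel degree. The sketch below is plain Python rather than Colossal-AI API, and the helper name `data_parallel_size` is hypothetical, introduced only for illustration.

```python
def data_parallel_size(world_size: int, tp: int, pp: int) -> int:
    """Return the data parallel degree implied by world_size, TP and PP.

    Hypothetical helper for illustration only; not part of Colossal-AI.
    """
    model_parallel = tp * pp  # processes needed to hold one model replica
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by TP*PP={model_parallel}"
        )
    return world_size // model_parallel


if __name__ == "__main__":
    # With 4 processes, TP=2 and PP=2 leave room for a single model replica.
    print(data_parallel_size(world_size=4, tp=2, pp=2))
```

With 4 processes and TP=2, PP=2, all processes are used by one model replica, so no data parallelism remains.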
```bash
# train with cifar10
colossalai run --nproc_per_node 4 train.py --config config.py

# train with synthetic data
colossalai run --nproc_per_node 4 train.py --config config.py
```