Mirror of https://github.com/hpcaitech/ColossalAI.git, synced 2025-07-04 11:06:25 +00:00
[shardformer] update shardformer readme (#4617)

Parent: 86d22581e4
Commit: ec0866804c
@@ -429,12 +429,13 @@ As shown in the figures above, when the sequence length is around 1000 or greater

 ### Convergence

-To validate that training the model using shardformers does not impact its convergence. We [fine-tuned the BERT model](./examples/convergence_benchmark.py) using both shardformer and non-shardformer approaches. We compared the accuracy, loss, F1 score of the training results.
+To validate that training the model using shardformers does not impact its convergence, we [fine-tuned the BERT model](../../examples/language/bert/finetune.py) using both shardformer and non-shardformer approaches. The shardformer example runs Pipeline Parallelism and Data Parallelism (ZeRO-1) simultaneously. We then compared the accuracy, loss, and F1 score of the training results.

-| accuracy | f1      | loss    | GPU number | model shard |
-| :------: | :-----: | :-----: | :--------: | :---------: |
-| 0.82594  | 0.87441 | 0.09913 | 4          | True        |
-| 0.81884  | 0.87299 | 0.10120 | 2          | True        |
-| 0.81855  | 0.87124 | 0.10357 | 1          | False       |
+| accuracy | f1      | loss    | GPU number | model sharded |
+| :------: | :-----: | :-----: | :--------: | :-----------: |
+| 0.84589  | 0.88613 | 0.43414 | 4          | True          |
+| 0.83594  | 0.88064 | 0.43298 | 1          | False         |

 Overall, the results demonstrate that using shardformers during model training does not affect the convergence.
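To make the hybrid setup described in the added paragraph more concrete, here is a minimal sketch of boosting a BERT model with Colossal-AI's `HybridParallelPlugin` (pipeline parallelism plus ZeRO-1 data parallelism). The parallel sizes, micro-batch setting, and optimizer hyperparameters are illustrative assumptions, not values taken from the example script.

```python
# Minimal sketch (assumed values, not the actual finetune.py): boost a BERT model
# with Shardformer-backed hybrid parallelism -- pipeline parallel + ZeRO-1.
import torch
from transformers import BertForSequenceClassification

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

# Initialize the distributed environment (older releases expect config={},
# newer ones take no argument).
colossalai.launch_from_torch(config={})

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=2.4e-5)

# pp_size=2 enables pipeline parallelism; zero_stage=1 shards optimizer states
# across the data-parallel group; tp_size=1 leaves tensor parallelism off.
plugin = HybridParallelPlugin(tp_size=1, pp_size=2, zero_stage=1, microbatch_size=1)
booster = Booster(plugin=plugin)

# booster.boost wraps the model and optimizer; with pp_size > 1 the training
# loop would then go through booster.execute_pipeline rather than a plain
# forward/backward pass.
model, optimizer, *_ = booster.boost(model, optimizer)
```

Launched with `torchrun` across 4 processes, this sketch would give a 2-way pipeline combined with 2-way ZeRO-1 data parallelism, roughly in the spirit of the 4-GPU row of the convergence table above.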
@@ -7,13 +7,15 @@ This directory includes two parts: Using the Booster API finetune Huggingface Bert

 bash test_ci.sh
 ```

-### Results on 2-GPU
+### Bert-Finetune Results

-| Plugin         | Accuracy | F1-score |
-| -------------- | -------- | -------- |
-| torch_ddp      | 84.4%    | 88.6%    |
-| torch_ddp_fp16 | 84.7%    | 88.8%    |
-| gemini         | 84.0%    | 88.4%    |
+| Plugin          | Accuracy | F1-score | GPU number |
+| --------------- | -------- | -------- | ---------- |
+| torch_ddp       | 84.4%    | 88.6%    | 2          |
+| torch_ddp_fp16  | 84.7%    | 88.8%    | 2          |
+| gemini          | 84.0%    | 88.4%    | 2          |
+| hybrid_parallel | 84.5%    | 88.6%    | 4          |

 ## Benchmark
 ```
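The plugin names in the table above correspond to different Booster plugins. As a rough, hypothetical sketch of how such a choice could be wired up (the `build_booster` helper and the constructor arguments shown are illustrative assumptions, not the example script's actual code):

```python
# Hypothetical helper mapping the benchmark's plugin names onto Colossal-AI
# Booster plugins; constructor arguments are illustrative assumptions.
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin, HybridParallelPlugin, TorchDDPPlugin


def build_booster(name: str) -> Booster:
    if name == "torch_ddp":
        return Booster(plugin=TorchDDPPlugin())
    if name == "torch_ddp_fp16":
        # Same DDP wrapper, with fp16 mixed precision handled by the Booster.
        return Booster(mixed_precision="fp16", plugin=TorchDDPPlugin())
    if name == "gemini":
        # Gemini manages parameters and optimizer states on heterogeneous memory.
        return Booster(plugin=GeminiPlugin(precision="fp16", initial_scale=2**5))
    if name == "hybrid_parallel":
        # Pipeline parallelism + ZeRO-1 data parallelism, as in the convergence test.
        return Booster(plugin=HybridParallelPlugin(tp_size=1, pp_size=2,
                                                   zero_stage=1, microbatch_size=1))
    raise ValueError(f"unknown plugin: {name!r}")


booster = build_booster("torch_ddp_fp16")
```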