Mirror of https://github.com/hpcaitech/ColossalAI.git, synced 2025-07-04 11:06:25 +00:00
[shardformer] update shardformer readme (#4617)

Parent: 86d22581e4
Commit: ec0866804c
@@ -429,12 +429,13 @@ As shown in the figures above, when the sequence length is around 1000 or greater

 ### Convergence

-To validate that training the model using shardformers does not impact its convergence. We [fine-tuned the BERT model](./examples/convergence_benchmark.py) using both shardformer and non-shardformer approaches. We compared the accuracy, loss, F1 score of the training results.
+To validate that training the model using shardformers does not impact its convergence, we [fine-tuned the BERT model](../../examples/language/bert/finetune.py) using both shardformer and non-shardformer approaches. The shardformer example runs Pipeline Parallelism and Data Parallelism (ZeRO-1) simultaneously. We then compared the accuracy, loss, and F1 score of the training results.

-| accuracy | f1      | loss    | GPU number | model shard |
-| :------: | :-----: | :-----: | :--------: | :---------: |
-| 0.82594  | 0.87441 | 0.09913 | 4          | True        |
-| 0.81884  | 0.87299 | 0.10120 | 2          | True        |
-| 0.81855  | 0.87124 | 0.10357 | 1          | False       |
+| accuracy | f1      | loss    | GPU number | model sharded |
+| :------: | :-----: | :-----: | :--------: | :-----------: |
+| 0.84589  | 0.88613 | 0.43414 | 4          | True          |
+| 0.83594  | 0.88064 | 0.43298 | 1          | False         |

 Overall, the results demonstrate that using shardformers during model training does not affect the convergence.
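To make the hybrid setup described in the added paragraph more concrete, here is a minimal sketch of boosting a BERT model with Colossal-AI's `HybridParallelPlugin` (pipeline parallelism plus ZeRO-1 data parallelism). The parallel sizes, micro-batch setting, and optimizer hyperparameters are illustrative assumptions, not values taken from the example script.

```python
# Minimal sketch (assumed values, not the actual finetune.py): boost a BERT model
# with Shardformer-backed hybrid parallelism -- pipeline parallel + ZeRO-1.
import torch
from transformers import BertForSequenceClassification

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

# Initialize the distributed environment (older releases expect config={},
# newer ones take no argument).
colossalai.launch_from_torch(config={})

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=2.4e-5)

# pp_size=2 enables pipeline parallelism; zero_stage=1 shards optimizer states
# across the data-parallel group; tp_size=1 leaves tensor parallelism off.
plugin = HybridParallelPlugin(tp_size=1, pp_size=2, zero_stage=1, microbatch_size=1)
booster = Booster(plugin=plugin)

# booster.boost wraps the model and optimizer; with pp_size > 1 the training
# loop would then go through booster.execute_pipeline rather than a plain
# forward/backward pass.
model, optimizer, *_ = booster.boost(model, optimizer)
```

Launched with `torchrun` across 4 processes, this sketch would give a 2-way pipeline combined with 2-way ZeRO-1 data parallelism, roughly in the spirit of the 4-GPU row of the convergence table above.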
@@ -7,13 +7,15 @@ This directory includes two parts: Using the Booster API finetune Huggingface Bert

 bash test_ci.sh
 ```

-### Results on 2-GPU
+### Bert-Finetune Results

-| Plugin         | Accuracy | F1-score |
-| -------------- | -------- | -------- |
-| torch_ddp      | 84.4%    | 88.6%    |
-| torch_ddp_fp16 | 84.7%    | 88.8%    |
-| gemini         | 84.0%    | 88.4%    |
+| Plugin          | Accuracy | F1-score | GPU number |
+| --------------- | -------- | -------- | ---------- |
+| torch_ddp       | 84.4%    | 88.6%    | 2          |
+| torch_ddp_fp16  | 84.7%    | 88.8%    | 2          |
+| gemini          | 84.0%    | 88.4%    | 2          |
+| hybrid_parallel | 84.5%    | 88.6%    | 4          |

 ## Benchmark
 ```
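The plugin names in the table above correspond to different Booster plugins. As a rough, hypothetical sketch of how such a choice could be wired up (the `build_booster` helper and the constructor arguments shown are illustrative assumptions, not the example script's actual code):

```python
# Hypothetical helper mapping the benchmark's plugin names onto Colossal-AI
# Booster plugins; constructor arguments are illustrative assumptions.
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin, HybridParallelPlugin, TorchDDPPlugin


def build_booster(name: str) -> Booster:
    if name == "torch_ddp":
        return Booster(plugin=TorchDDPPlugin())
    if name == "torch_ddp_fp16":
        # Same DDP wrapper, with fp16 mixed precision handled by the Booster.
        return Booster(mixed_precision="fp16", plugin=TorchDDPPlugin())
    if name == "gemini":
        # Gemini manages parameters and optimizer states on heterogeneous memory.
        return Booster(plugin=GeminiPlugin(precision="fp16", initial_scale=2**5))
    if name == "hybrid_parallel":
        # Pipeline parallelism + ZeRO-1 data parallelism, as in the convergence test.
        return Booster(plugin=HybridParallelPlugin(tp_size=1, pp_size=2,
                                                   zero_stage=1, microbatch_size=1))
    raise ValueError(f"unknown plugin: {name!r}")


booster = build_booster("torch_ddp_fp16")
```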