[example] simplify opt example (#2344)

2026-01-26 13:24:33 +00:00 · 2023-01-06 10:08:41 +08:00
parent 7080a8edb0
commit 35e22be2f6
10 changed files with 234 additions and 684 deletions
--- a/examples/language/opt/README.md
+++ b/examples/language/opt/README.md
@@ -29,24 +29,5 @@ We adapt the OPT training code to ColossalAI by leveraging Gemini and ZeRO DDP.
 You can launch training by using the following bash script

 ```bash
-bash ./run_clm.sh <batch-size-per-gpu> <mem-cap> <model> <gpu-num>
+bash ./run_gemini.sh
 ```
-
- batch-size-per-gpu: number of samples fed to each GPU, default is 16
- mem-cap: limit memory usage within a value in GB, default is 0 (no limit)
- model: the size of the OPT model, default is `6.7b`. Acceptable values include `125m`, `350m`, `1.3b`, `2.7b`, `6.7`, `13b`, `30b`, `66b`. For `175b`, you can request
-the pretrained weights from [OPT weight downloading page](https://github.com/facebookresearch/metaseq/tree/main/projects/OPT).
- gpu-num: the number of GPUs to use, default is 1.
-
-## Remarkable Performance
-On a single GPU, Colossal-AI’s automatic strategy provides remarkable performance gains from the ZeRO Offloading strategy by Microsoft DeepSpeed.
-Users can experience up to a 40% speedup, at a variety of model scales. However, when using a traditional deep learning training framework like PyTorch, a single GPU can no longer support the training of models at such a scale.
-
-<p align="center">
-<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/OPT.png" width=1000/>
-</p>
-
-Adopting the distributed training strategy with 8 GPUs is as simple as adding a `-nprocs 8` to the training command of Colossal-AI!
-
-More details about behind the scenes can be found on the corresponding [blog](https://medium.com/@yangyou_berkeley/colossal-ai-seamlessly-accelerates-large-models-at-low-costs-with-hugging-face-4d1a887e500d),
-and a detailed tutorial will be added in [Documentation](https://www.colossalai.org/docs/get_started/installation) very soon.