## Overview

This directory contains two parts: finetuning Hugging Face Bert and AlBert models with the Booster API, and benchmarking Bert and AlBert models with different Booster plugins.
## Finetune

```bash
bash test_ci.sh
```
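Under the hood, the finetune script follows the standard Booster pattern: build the model and optimizer, then let a plugin wrap them. Below is a minimal sketch of that pattern; the model name, learning rate, and launch call are illustrative assumptions, not the script's exact settings (the `launch_from_torch` signature also varies across ColossalAI versions).

```python
# Minimal sketch of the Booster finetuning pattern (assumptions noted above).
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin
from transformers import BertForSequenceClassification

colossalai.launch_from_torch(config={})  # run under torchrun / colossalai run

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=2.4e-5)  # illustrative lr
criterion = torch.nn.CrossEntropyLoss()

# The plugin determines the parallelism strategy; see the table below.
booster = Booster(plugin=TorchDDPPlugin())
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)

# Per batch: loss = criterion(model(**batch).logits, labels)
# booster.backward(loss, optimizer); optimizer.step(); optimizer.zero_grad()
```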
### Bert-Finetune Results
| Plugin | Accuracy | F1-score | GPU number |
|---|---|---|---|
| torch_ddp | 84.4% | 88.6% | 2 |
| torch_ddp_fp16 | 84.7% | 88.8% | 2 |
| gemini | 84.0% | 88.4% | 2 |
| hybrid_parallel | 84.5% | 88.6% | 4 |
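For reference, the plugin names in the table map to Booster plugins roughly as sketched below. The constructor arguments are assumptions for illustration, not the exact settings used to produce these numbers.

```python
# Hedged sketch: one plausible mapping from table rows to Booster plugins.
from colossalai.booster import Booster
from colossalai.booster.plugin import (
    GeminiPlugin,
    HybridParallelPlugin,
    TorchDDPPlugin,
)

plugins = {
    "torch_ddp": TorchDDPPlugin(),
    "gemini": GeminiPlugin(),
    # Assumed split: 2 pipeline stages over 4 GPUs; the real config may differ.
    "hybrid_parallel": HybridParallelPlugin(tp_size=1, pp_size=2),
}

# torch_ddp_fp16 pairs TorchDDPPlugin with fp16 mixed precision on the Booster:
booster = Booster(plugin=TorchDDPPlugin(), mixed_precision="fp16")
```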
## Benchmark

```bash
bash benchmark.sh
```

The benchmark currently reports three metrics: peak CUDA memory usage, throughput, and the number of model parameters. If you need custom metrics, you can add them to `benchmark_util`.
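The three built-in metrics can be approximated with plain PyTorch as sketched below; `benchmark_util` may implement them differently, and `run_one_epoch` / `num_samples` are hypothetical placeholders.

```python
# Hedged sketch: approximating the three reported metrics with plain PyTorch.
import time
import torch

def count_params(model: torch.nn.Module) -> int:
    # "params" column: total number of model parameters.
    return sum(p.numel() for p in model.parameters())

def measure(run_one_epoch, num_samples: int):
    torch.cuda.reset_peak_memory_stats()
    start = time.time()
    run_one_epoch()  # caller-supplied training loop
    elapsed = time.time() - start
    max_mem_gb = torch.cuda.max_memory_allocated() / 1024**3  # "max CUDA mem"
    throughput = num_samples / elapsed  # "throughput (sample/s)"
    return max_mem_gb, throughput
```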
### Results

#### Bert
| Plugin | max CUDA mem | throughput (sample/s) | params |
|---|---|---|---|
| ddp | 21.44 GB | 3.0 | 82M |
| ddp_fp16 | 16.26 GB | 11.3 | 82M |
| gemini | 11.0 GB | 12.9 | 82M |
| low_level_zero | 11.29 GB | 14.7 | 82M |
#### AlBert

| Plugin | max CUDA mem | throughput (sample/s) | params |
|---|---|---|---|
| ddp | OOM | - | - |
| ddp_fp16 | OOM | - | - |
| gemini | 69.39 GB | 1.3 | 208M |
| low_level_zero | 56.89 GB | 1.4 | 208M |