Commit Graph

2556 Commits

Jianghai
15b34e0618
[pipeline] add bert_for_pretraining and bert_lmhead forward and policy (#4172)
* add pipeline policy and bert forward to be done

* add bertmodel pipeline forward and make tests

* add Bert_Policy and test for policy

* update formatting

* update formatting

* update the code

* fix bugs

* fix name conflict

* add bloom model and policy, revise the base class of policy

* revise

* revision

* add bert_for_pretraining

* add bert_for_pretraining forward and policy

* fix typos

* cancel warning

* change the intermediate output to a default dict

* change the default output of get_shared_params
2023-07-06 14:49:10 +08:00
Jianghai
12e6d5df6d
Merge pull request #4176 from ver217/feature/pipeline-policy
[pipeline] fit shardformer policy
2023-07-05 18:09:14 +08:00
ver217
d4b96abe5c [shardformer] fix type hint 2023-07-05 15:20:59 +08:00
ver217
1a87dd737d [shardformer] rename policy file name 2023-07-05 15:13:00 +08:00
ver217
0cbe423009 [test] add shard util tests 2023-07-05 14:49:05 +08:00
ver217
914355604d [test] update shardformer tests 2023-07-05 14:30:17 +08:00
ver217
8b6679dec4 [pipeline] update shardformer docstring 2023-07-05 14:19:12 +08:00
ver217
15a4e82cb3 [pipeline] update shardformer policy 2023-07-05 14:16:55 +08:00
Jianghai
386d34ebbc
[pipeline] build bloom model and policy, revise the base class of policy (#4161)
* add pipeline policy and bert forward to be done

* add bertmodel pipeline forward and make tests

* add Bert_Policy and test for policy

* update formatting

* update formatting

* update the code

* fix bugs

* fix name conflict

* add bloom model and policy, revise the base class of policy

* revise

* revision

* add bert_for_pretraining
2023-07-05 10:52:53 +08:00
Frank Lee
ef1f9726a9
Merge pull request #4166 from ver217/sync/main
[sync] update from main
2023-07-04 18:09:23 +08:00
Jianghai
836a3a290c [pipeline]add pipeline policy and bert forward (#4130)
* add pipeline policy and bert forward to be done

* add bertmodel pipeline forward and make tests

* add Bert_Policy and test for policy

* update formatting

* update formatting

* update the code

* fix bugs

* fix name conflict
2023-07-04 18:03:15 +08:00
Hongxin Liu
9526f44b9d [pipeline] refactor 1f1b schedule (#4115)
* [api] update optimizer wrapper to fit pipeline

* [pipeline] add base schedule

* [pipeline] add 1f1b schedule

* [test] add pipeline schedule utils test

* [pipeline] fix import
2023-07-04 18:03:15 +08:00
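
For context, the 1F1B ("one forward, one backward") schedule interleaves forward and backward passes over micro-batches so that each stage keeps only a bounded number of activations in flight. Below is a minimal, hypothetical sketch of the ordering such a schedule produces; `steps_1f1b` is an illustrative name, not ColossalAI's actual API.

```python
def steps_1f1b(stage: int, num_stages: int, num_microbatches: int) -> list[str]:
    """Return the op sequence ('F<i>' / 'B<i>') executed by one pipeline stage."""
    # Warm-up: stages closer to the front run extra forwards before any backward.
    num_warmup = min(num_stages - stage - 1, num_microbatches)
    steps = [f"F{i}" for i in range(num_warmup)]
    # Steady state: alternate one forward with one backward (hence "1F1B").
    fwd, bwd = num_warmup, 0
    while fwd < num_microbatches:
        steps.append(f"F{fwd}")
        steps.append(f"B{bwd}")
        fwd += 1
        bwd += 1
    # Cool-down: drain the remaining backwards.
    steps += [f"B{i}" for i in range(bwd, num_microbatches)]
    return steps


if __name__ == "__main__":
    # Print warm-up, steady-state, and cool-down phases for each of 4 stages.
    for s in range(4):
        print(f"stage {s}:", " ".join(steps_1f1b(s, num_stages=4, num_microbatches=8)))
```
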
Hongxin Liu
5a467e9f33 [pipeline] implement p2p communication (#4100)
* [pipeline] add p2p communication

* [test] add p2p communication test

* [test] add rerun decorator

* [test] rename to avoid conflict
2023-07-04 18:03:15 +08:00
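
A sketch of what stage-to-stage P2P boils down to, using plain torch.distributed primitives and assuming rank order equals stage order; the real layer in this PR also exchanges shape/dtype metadata and arbitrary objects, which is omitted here.

```python
import torch
import torch.distributed as dist


def send_forward(tensor: torch.Tensor) -> None:
    # Blocking send of activations to the next pipeline stage.
    dist.send(tensor.contiguous(), dst=dist.get_rank() + 1)


def recv_forward(shape, dtype=torch.float32, device="cpu") -> torch.Tensor:
    # The receiver must pre-allocate a buffer of the agreed-upon shape/dtype,
    # which is why a real implementation first exchanges metadata.
    buf = torch.empty(shape, dtype=dtype, device=device)
    dist.recv(buf, src=dist.get_rank() - 1)
    return buf
```
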
Hongxin Liu
18c7539009 [pipeline] add stage manager (#4093)
* [pipeline] add stage manager

* [test] add pipeline stage manager test

* [pipeline] add docstring for stage manager
2023-07-04 18:03:15 +08:00
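
The stage manager's core job is mapping a rank to its pipeline stage and neighbours. A minimal, hypothetical sketch follows; class and method names are illustrative, not ColossalAI's actual stage manager.

```python
from dataclasses import dataclass


@dataclass
class TinyStageManager:
    rank: int
    num_stages: int

    @property
    def stage(self) -> int:
        # Assumes pure pipeline parallelism: one rank per stage.
        return self.rank % self.num_stages

    def is_first_stage(self) -> bool:
        return self.stage == 0

    def is_last_stage(self) -> bool:
        return self.stage == self.num_stages - 1

    def next_rank(self) -> int:
        return self.rank + 1  # only meaningful when not the last stage
```
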
Hongxin Liu
3be0c35803 [cluster] add process group mesh (#4039)
* [cluster] add process group mesh

* [test] add process group mesh test

* force sync
2023-07-04 18:03:15 +08:00
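
A process group mesh lays the world's ranks out on an N-D grid so that each flavour of parallelism (pipeline, tensor, data) is a slice along one axis. A self-contained sketch of the indexing arithmetic, with illustrative names:

```python
def rank_to_coord(rank: int, mesh_shape: tuple) -> tuple:
    # Row-major mapping from a flat rank to its coordinate on the grid.
    coord = []
    for dim in reversed(mesh_shape):
        coord.append(rank % dim)
        rank //= dim
    return tuple(reversed(coord))


def ranks_along_axis(coord: tuple, axis: int, mesh_shape: tuple) -> list:
    # All ranks sharing every coordinate except `axis`: exactly the members of
    # one parallel group, which would then be materialized via
    # torch.distributed.new_group(ranks).
    ranks = []
    for i in range(mesh_shape[axis]):
        c = list(coord)
        c[axis] = i
        flat = 0
        for dim, x in zip(mesh_shape, c):
            flat = flat * dim + x
        ranks.append(flat)
    return ranks


if __name__ == "__main__":
    shape = (2, 4)  # e.g. 2 pipeline stages x 4 tensor-parallel ranks
    print(rank_to_coord(5, shape))             # (1, 1)
    print(ranks_along_axis((1, 1), 1, shape))  # [4, 5, 6, 7]
```
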
Hongxin Liu
1908caad38
[cli] hotfix launch command for multiple nodes (#4165) 2023-07-04 17:54:40 +08:00
digger yu
2ac24040eb
fix some typos in colossalai/shardformer (#4160) 2023-07-04 17:53:39 +08:00
github-actions[bot]
c77b3b19be
[format] applied code formatting on changed files in pull request 4152 (#4157)
Co-authored-by: github-actions <github-actions@github.com>
2023-07-04 16:07:47 +08:00
Frank Lee
f447ca1811 [chat] removed cache file (#4155) 2023-07-04 16:05:01 +08:00
Frank Lee
89f45eda5a [shardformer] added development protocol for standardization (#4149) 2023-07-04 16:05:01 +08:00
Frank Lee
1fb0d95df0 [shardformer] made tensor parallelism configurable (#4144)
* [shardformer] made tensor parallelism configurable

* polish code
2023-07-04 16:05:01 +08:00
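
Making tensor parallelism configurable amounts to a flag in the shard config that policies consult before substituting parallel layers. A hedged sketch, with made-up names (`MyShardConfig`, `pick_linear_cls`, `ColumnParallelLinearSketch`) rather than ColossalAI's exact fields:

```python
from dataclasses import dataclass

import torch.nn as nn


class ColumnParallelLinearSketch(nn.Linear):
    """Placeholder standing in for a real column-parallel linear layer."""


@dataclass
class MyShardConfig:
    tensor_parallel_size: int = 1
    enable_tensor_parallelism: bool = True
    enable_fused_normalization: bool = False


def pick_linear_cls(config: MyShardConfig) -> type:
    # Fall back to the vanilla layer when tensor parallelism is disabled.
    if config.enable_tensor_parallelism and config.tensor_parallel_size > 1:
        return ColumnParallelLinearSketch
    return nn.Linear
```
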
Frank Lee
74257cb446 [shardformer] refactored some doc and api (#4137)
* [shardformer] refactored some doc and api

* polish code
2023-07-04 16:05:01 +08:00
jiangmingyan
7f9b30335b [shardformer] write a shardformer example with bert finetuning (#4126)
* [shardformer] add benchmark of shardformer

* [shardformer] add benchmark of shardformer
2023-07-04 16:05:01 +08:00
Frank Lee
ae035d305d [shardformer] added embedding gradient check (#4124) 2023-07-04 16:05:01 +08:00
Frank Lee
44a190e6ac [shardformer] import huggingface implicitly (#4101) 2023-07-04 16:05:01 +08:00
Frank Lee
6a88bae4ec [shardformer] integrate with data parallelism (#4103) 2023-07-04 16:05:01 +08:00
Frank Lee
f3b6aaa6b7 [shardformer] supported fused normalization (#4112) 2023-07-04 16:05:01 +08:00
Frank Lee
b1c2901530 [shardformer] supported bloom model (#4098) 2023-07-04 16:05:01 +08:00
Kun Lin
8af29ee47a [shardformer] support vision transformer (#4096)
* first v of vit shardformer

* keep vit

* update

* vit shard: add VitAttention and VitLayer

* update the num-heads shard parameter

* finish test for vit

* add new_model_class & postprocess

* add vit readme

* delete old files & fix the conflict

* fix miscellaneous issues
2023-07-04 16:05:01 +08:00
jiangmingyan
ac80937138 [shardformer] shardformer support opt models (#4091)
* [shardformer] shardformer support opt models

* [shardformer] shardformer support opt models, fix

* [shardformer] shardformer support opt models, fix

* [shardformer] shardformer support opt models, fix
2023-07-04 16:05:01 +08:00
Frank Lee
d33a44e8c3 [shardformer] refactored layernorm (#4086) 2023-07-04 16:05:01 +08:00
Frank Lee
c4b1b65931 [test] fixed tests failed due to dtensor change (#4082)
* [test] fixed tests failed due to dtensor change

* polish code
2023-07-04 16:05:01 +08:00
FoolPlayer
92f6791095 [shardformer] Add layernorm (#4072)
* add layernorm to bert

* add layernorm test

* add layernorm test with load state dict

* add use_mixedfusedLN in shard config

* refactor policy to support fused_layernorm
2023-07-04 16:05:01 +08:00
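
Fused-normalization support boils down to swapping `torch.nn.LayerNorm` for a fused kernel when the shard config asks for it. A guarded sketch, assuming Apex's `FusedLayerNorm` as the fused implementation:

```python
import torch.nn as nn

try:
    # Apex provides the fused (mixed) layer-norm kernels referenced above.
    from apex.normalization import FusedLayerNorm
except ImportError:
    FusedLayerNorm = None


def build_layernorm(hidden_size: int, use_fused: bool) -> nn.Module:
    if use_fused and FusedLayerNorm is not None:
        return FusedLayerNorm(hidden_size)
    return nn.LayerNorm(hidden_size)
```
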
Frank Lee
70c58cfd4f [shardformer] supported fused qkv checkpoint (#4073) 2023-07-04 16:05:01 +08:00
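
Fused QKV weights need checkpoint-aware sharding: the weight stores Q, K and V concatenated, so a naive split across tensor-parallel ranks would hand one rank all of Q and another all of V. An illustrative sketch, assuming fusion along dim 0 (real layouts vary, e.g. GPT-2's Conv1D fuses along dim 1):

```python
import torch


def split_fused_qkv(weight: torch.Tensor, rank: int, world_size: int) -> torch.Tensor:
    """Shard a fused [Q; K; V] weight (fusion assumed along dim 0) for one rank."""
    q, k, v = weight.chunk(3, dim=0)

    def take(t: torch.Tensor) -> torch.Tensor:
        # Each rank takes its slice of Q, K and V separately.
        return t.chunk(world_size, dim=0)[rank]

    return torch.cat([take(q), take(k), take(v)], dim=0)
```
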
FoolPlayer
0803a61412 [shardformer] add linearconv1d test (#4067)
* add linearconv1d test

* add linearconv1d test
2023-07-04 16:05:01 +08:00
Frank Lee
8eb09a4c69 [shardformer] support module saving and loading (#4062)
* [shardformer] support module saving and loading

* polish code
2023-07-04 16:05:01 +08:00
FoolPlayer
7740c55c55 support kit use for bert/gpt test (#4055)
* support kit use for bert test

* support kit test for gpt2
2023-07-04 16:05:01 +08:00
Frank Lee
f22ddacef0 [shardformer] refactored the shardformer layer structure (#4053) 2023-07-04 16:05:01 +08:00
Frank Lee
58df720570 [shardformer] adapted T5 and LLaMa test to use kit (#4049)
* [shardformer] adapted T5 and LLaMa test to use kit

* polish code
2023-07-04 16:05:01 +08:00
FoolPlayer
4021b9a8a2 [shardformer] add gpt2 test and layer class refactor (#4041)
* add gpt2 test and layer class refactor

* add dropout in gpt2 policy
2023-07-04 16:05:01 +08:00
Frank Lee
d857f3dbba [shardformer] supported T5 and its variants (#4045) 2023-07-04 16:05:01 +08:00
Frank Lee
c1d5453e9f [shardformer] adapted llama to the new API (#4036) 2023-07-04 16:05:01 +08:00
FoolPlayer
74d176c8d8 [shardformer] fix bert and gpt downstream with new api (#4024)
* fix bert downstream with new api

* remove comment line
2023-07-04 16:05:01 +08:00
Frank Lee
e253a07007 [shardformer] updated doc (#4016) 2023-07-04 16:05:01 +08:00
FoolPlayer
df018fc305 support bert with new api 2023-07-04 16:05:01 +08:00
FoolPlayer
507c0ad368 add vocabembedding layer 2023-07-04 16:05:01 +08:00
Frank Lee
45d9384346 [shardformer] removed inplace tensor sharding (#4018) 2023-07-04 16:05:01 +08:00
Frank Lee
3893fa1a8d [shardformer] refactored embedding and dropout to parallel module (#4013)
* [shardformer] refactored embedding and dropout to parallel module

* polish code
2023-07-04 16:05:01 +08:00
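
A vocabulary-parallel embedding gives each tensor-parallel rank a contiguous slice of the vocabulary; tokens outside the local slice embed to zero and an all-reduce sums the partial results. A hedged sketch, not ColossalAI's actual module:

```python
import torch
import torch.distributed as dist
import torch.nn as nn


class VocabParallelEmbeddingSketch(nn.Module):
    def __init__(self, vocab_size: int, dim: int, rank: int, world_size: int):
        super().__init__()
        assert vocab_size % world_size == 0
        self.part = vocab_size // world_size
        self.start = rank * self.part  # first vocab id owned by this rank
        self.local = nn.Embedding(self.part, dim)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        remote = (ids < self.start) | (ids >= self.start + self.part)
        local_ids = (ids - self.start).clamp(0, self.part - 1)
        out = self.local(local_ids)
        out = out.masked_fill(remote.unsqueeze(-1), 0.0)  # zero rows for remote tokens
        dist.all_reduce(out)  # sum the partial embeddings across ranks
        return out
```
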
FoolPlayer
dfca9678fa integrate with dist layer (#4011) 2023-07-04 16:05:01 +08:00
Frank Lee
015af592f8 [shardformer] integrated linear 1D with dtensor (#3996)
* [shardformer] integrated linear 1D with dtensor

* polish code
2023-07-04 16:05:01 +08:00
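
Column-parallel (1D) linear splits the weight's output dimension across ranks; each rank computes its slice of the output and an all-gather restores the full activation when needed. An illustrative sketch with made-up names:

```python
import torch
import torch.distributed as dist
import torch.nn as nn


class Linear1DColSketch(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, world_size: int):
        super().__init__()
        assert out_dim % world_size == 0
        self.world_size = world_size
        # Each rank holds out_dim / world_size rows of the full weight.
        self.local = nn.Linear(in_dim, out_dim // world_size, bias=False)

    def forward(self, x: torch.Tensor, gather: bool = True) -> torch.Tensor:
        y = self.local(x)  # (..., out_dim / world_size)
        if not gather:
            # Leave sharded when a row-parallel layer consumes it next.
            return y
        parts = [torch.empty_like(y) for _ in range(self.world_size)]
        dist.all_gather(parts, y)
        return torch.cat(parts, dim=-1)
```
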