Commit Graph

2539 Commits

Author · SHA1 · Message · Date
Frank Lee · fee32a3b78 · [docker] added ssh and rdma support for docker (#4192) · 2023-07-07 15:31:51 +08:00
Frank Lee · 190a6ea9c2 · [dtensor] fixed readme file name and removed deprecated file (#4162) · 2023-07-04 18:21:11 +08:00
Frank Lee · cc3cbe9f6f · [workflow] show test duration (#4159) · 2023-07-04 18:11:46 +08:00
Hongxin Liu · 1908caad38 · [cli] hotfix launch command for multi-nodes (#4165) · 2023-07-04 17:54:40 +08:00
digger yu · 2ac24040eb · fix some typos in colossalai/shardformer (#4160) · 2023-07-04 17:53:39 +08:00
github-actions[bot] · c77b3b19be · [format] applied code formatting on changed files in pull request 4152 (#4157) · 2023-07-04 16:07:47 +08:00
    Co-authored-by: github-actions <github-actions@github.com>
Frank Lee · f447ca1811 · [chat] removed cache file (#4155) · 2023-07-04 16:05:01 +08:00
Frank Lee · 89f45eda5a · [shardformer] added development protocol for standardization (#4149) · 2023-07-04 16:05:01 +08:00
Frank Lee · 1fb0d95df0 · [shardformer] made tensor parallelism configurable (#4144) · 2023-07-04 16:05:01 +08:00
    * [shardformer] made tensor parallelism configurable
    * polish code
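
Because #4144 makes tensor parallelism configurable rather than always-on, the sharding pass has to consult a flag before rewriting any layer. A minimal sketch of such a toggle, assuming a hypothetical `ShardConfig` and `maybe_shard_linear` helper (illustrative only, not the actual ColossalAI API):

```python
from dataclasses import dataclass

import torch.nn as nn


@dataclass
class ShardConfig:
    # Hypothetical flags; the real ShardConfig may differ.
    enable_tensor_parallelism: bool = True
    tensor_parallel_size: int = 1


def maybe_shard_linear(layer: nn.Linear, cfg: ShardConfig) -> nn.Module:
    """Leave the layer untouched unless tensor parallelism is enabled."""
    if not cfg.enable_tensor_parallelism or cfg.tensor_parallel_size == 1:
        return layer
    assert layer.out_features % cfg.tensor_parallel_size == 0, (
        "output features must divide evenly across tensor-parallel ranks"
    )
    # A real pass would swap in a column- or row-parallel replacement here.
    return layer
```
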
Frank Lee · 74257cb446 · [shardformer] refactored some doc and api (#4137) · 2023-07-04 16:05:01 +08:00
    * [shardformer] refactored some doc and api
    * polish code
jiangmingyan · 7f9b30335b · [shardformer] write a shardformer example with bert finetuning (#4126) · 2023-07-04 16:05:01 +08:00
    * [shardformer] add benchmark of shardformer
    * [shardformer] add benchmark of shardformer
Frank Lee · ae035d305d · [shardformer] added embedding gradient check (#4124) · 2023-07-04 16:05:01 +08:00
Frank Lee · 44a190e6ac · [shardformer] import huggingface implicitly (#4101) · 2023-07-04 16:05:01 +08:00
Frank Lee · 6a88bae4ec · [shardformer] integrate with data parallelism (#4103) · 2023-07-04 16:05:01 +08:00
Frank Lee · f3b6aaa6b7 · [shardformer] supported fused normalization (#4112) · 2023-07-04 16:05:01 +08:00
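
Fused normalization (#4112) generally means substituting a fused CUDA kernel, such as Apex's `FusedLayerNorm`, for the eager `torch.nn.LayerNorm`. A hedged sketch of the substitution pattern, with the availability check and fallback as assumptions rather than the exact ColossalAI logic:

```python
import torch.nn as nn

try:
    # Apex ships a fused CUDA LayerNorm kernel.
    from apex.normalization import FusedLayerNorm
    FUSED_AVAILABLE = True
except ImportError:
    FUSED_AVAILABLE = False


def build_layernorm(hidden_size: int, eps: float = 1e-5, use_fused: bool = True) -> nn.Module:
    # Fall back to eager PyTorch when the fused kernel is missing or disabled.
    if use_fused and FUSED_AVAILABLE:
        return FusedLayerNorm(hidden_size, eps=eps)
    return nn.LayerNorm(hidden_size, eps=eps)
```
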
Frank Lee · b1c2901530 · [shardformer] supported bloom model (#4098) · 2023-07-04 16:05:01 +08:00
Kun Lin · 8af29ee47a · [shardformer] support vision transformer (#4096) · 2023-07-04 16:05:01 +08:00
    * first v of vit shardformer
    * keep vit
    * update
    * vit shard add vitattention vitlayer
    * update num head shard para
    * finish test for vit
    * add new_model_class & postprocess
    * add vit readme
    * delete old files & fix the conflict
    * fix sth
jiangmingyan · ac80937138 · [shardformer] shardformer support opt models (#4091) · 2023-07-04 16:05:01 +08:00
    * [shardformer] shardformer support opt models
    * [shardformer] shardformer support opt models, fix
    * [shardformer] shardformer support opt models, fix
    * [shardformer] shardformer support opt models, fix
Frank Lee · d33a44e8c3 · [shardformer] refactored layernorm (#4086) · 2023-07-04 16:05:01 +08:00
Frank Lee · c4b1b65931 · [test] fixed tests failed due to dtensor change (#4082) · 2023-07-04 16:05:01 +08:00
    * [test] fixed tests failed due to dtensor change
    * polish code
FoolPlayer · 92f6791095 · [shardformer] Add layernorm (#4072) · 2023-07-04 16:05:01 +08:00
    * add layernorm to bert
    * add layernorm test
    * add layernorm test with load state dict
    * add use_mixedfusedLN in shard config
    * refactor policy to support fused_layernorm
Frank Lee · 70c58cfd4f · [shardformer] supported fused qkv checkpoint (#4073) · 2023-07-04 16:05:01 +08:00
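
Supporting a fused QKV checkpoint (#4073) is subtle because the combined weight packs the Q, K, and V projections along the output dimension, so a naive contiguous split would hand one rank only Q rows and another only V rows. A simplified sketch of the per-projection split, assuming a `[3 * hidden, hidden]` layout (real checkpoint layouts vary by model):

```python
import torch


def split_fused_qkv(qkv_weight: torch.Tensor, world_size: int, rank: int) -> torch.Tensor:
    """Shard a fused QKV weight of shape [3 * hidden, hidden] for one rank."""
    three_hidden, hidden = qkv_weight.shape
    assert three_hidden == 3 * hidden
    # Split into the three projections first, shard each by output rows,
    # then re-fuse so the sharded layer keeps the familiar fused layout.
    q, k, v = qkv_weight.chunk(3, dim=0)
    return torch.cat(
        [p.chunk(world_size, dim=0)[rank] for p in (q, k, v)], dim=0
    )


# Example: hidden=8 with 2-way tensor parallelism -> each rank holds [12, 8].
w = torch.randn(24, 8)
shard0 = split_fused_qkv(w, world_size=2, rank=0)
assert shard0.shape == (12, 8)
```
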
FoolPlayer · 0803a61412 · [shardformer] add linearconv1d test (#4067) · 2023-07-04 16:05:01 +08:00
    * add linearconv1d test
    * add linearconv1d test
Frank Lee · 8eb09a4c69 · [shardformer] support module saving and loading (#4062) · 2023-07-04 16:05:01 +08:00
    * [shardformer] support module saving and loading
    * polish code
FoolPlayer · 7740c55c55 · support kit use for bert/gpt test (#4055) · 2023-07-04 16:05:01 +08:00
    * support kit use for bert test
    * support kit test for gpt2
Frank Lee · f22ddacef0 · [shardformer] refactored the shardformer layer structure (#4053) · 2023-07-04 16:05:01 +08:00
Frank Lee · 58df720570 · [shardformer] adapted T5 and LLaMa test to use kit (#4049) · 2023-07-04 16:05:01 +08:00
    * [shardformer] adapted T5 and LLaMa test to use kit
    * polish code
FoolPlayer · 4021b9a8a2 · [shardformer] add gpt2 test and layer class refactor (#4041) · 2023-07-04 16:05:01 +08:00
    * add gpt2 test and layer class refactor
    * add dropout in gpt2 policy
Frank Lee · d857f3dbba · [shardformer] supported T5 and its variants (#4045) · 2023-07-04 16:05:01 +08:00
Frank Lee · c1d5453e9f · [shardformer] adapted llama to the new API (#4036) · 2023-07-04 16:05:01 +08:00
FoolPlayer · 74d176c8d8 · [shardformer] fix bert and gpt downstream with new api (#4024) · 2023-07-04 16:05:01 +08:00
    * fix bert downstream with new api
    * remove comment line
Frank Lee · e253a07007 · [shardformer] updated doc (#4016) · 2023-07-04 16:05:01 +08:00
FoolPlayer · df018fc305 · support bert with new api · 2023-07-04 16:05:01 +08:00
FoolPlayer · 507c0ad368 · add vocabembedding layer · 2023-07-04 16:05:01 +08:00
Frank Lee · 45d9384346 · [shardformer] removed inplace tensor sharding (#4018) · 2023-07-04 16:05:01 +08:00
Frank Lee · 3893fa1a8d · [shardformer] refactored embedding and dropout to parallel module (#4013) · 2023-07-04 16:05:01 +08:00
    * [shardformer] refactored embedding and dropout to parallel module
    * polish code
FoolPlayer · dfca9678fa · integrate with dist layer (#4011) · 2023-07-04 16:05:01 +08:00
Frank Lee · 015af592f8 · [shardformer] integrated linear 1D with dtensor (#3996) · 2023-07-04 16:05:01 +08:00
    * [shardformer] integrated linear 1D with dtensor
    * polish code
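
The "linear 1D" layers (#3996) implement the Megatron-style column/row split of `nn.Linear`. The single-process sketch below verifies the column-parallel math, i.e. that concatenating the per-shard outputs reproduces the unsharded layer; the real module operates on distributed tensors, with an all-gather in place of the `torch.cat`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
full = nn.Linear(16, 32, bias=False)
x = torch.randn(4, 16)

world_size = 4
# Column parallelism: split the weight along the output dimension so each
# "rank" computes a disjoint slice of the output features.
weight_shards = full.weight.detach().chunk(world_size, dim=0)
partial_outputs = [x @ w.t() for w in weight_shards]
combined = torch.cat(partial_outputs, dim=-1)  # an all-gather in a real setup

assert torch.allclose(combined, full(x), atol=1e-6)
```
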
FoolPlayer · d3bc530849 · [shardformer] Refactor shardformer api (#4001) · 2023-07-04 16:05:01 +08:00
    * fix an error in readme
    * simplify code
    * refactor shardformer
    * add todo
    * remove slicer
    * resolve code review
Frank Lee · 611971248c · [device] support init device mesh from process group (#3990) · 2023-07-04 16:05:01 +08:00
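
Initializing a device mesh from an existing process group (#3990) boils down to laying the group's ranks out on a logical grid so parallel strategies can address rows (e.g. pipeline stages) and columns (e.g. tensor-parallel peers). A toy sketch of the rank-to-coordinate bookkeeping, not related to the actual `DeviceMesh` signature:

```python
import numpy as np


def build_mesh(ranks, mesh_shape):
    """Arrange the ranks of a process group on a logical 2D grid."""
    mesh = np.asarray(list(ranks)).reshape(mesh_shape)
    # Invert the layout: for each rank, record its (row, col) coordinate.
    coords = {int(r): (i, j) for (i, j), r in np.ndenumerate(mesh)}
    return mesh, coords


# Eight ranks arranged as a 2 x 4 mesh: row 0 holds ranks 0-3, row 1 holds 4-7.
mesh, coords = build_mesh(range(8), (2, 4))
assert coords[5] == (1, 1)
```
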
FoolPlayer · a2f9af810d · [shardformer] fix an error in readme (#3988) · 2023-07-04 16:05:01 +08:00
    * fix an error in readme
    * simplify code
FoolPlayer · f7774ec0f3 · [Shardformer] Downstream bert (#3979) · 2023-07-04 16:05:01 +08:00
    * add dist dropout in model
    * update docstring and bert policy with dropout
    * refactor basepolicy and sharded, update bert
    * update format
    * update gpt2 policy
    * update bert policy
    * remove unused code
    * update readme for new policy usage
    * add downstream model of bert
    * remove unused code
wukong1992 · c1c672d0f0 · [shardformer] shardformer support t5 model (#3994) · 2023-07-04 16:05:01 +08:00
    * test t5
wukong1992 · 6b30dfb7ce · [shardformer] support llama model using shardformer (#3969) · 2023-07-04 16:05:01 +08:00
    * adjust layer attr
FoolPlayer · 45927d5527 · [shardformer] Add dropout layer in shard model and refactor policy api (#3949) · 2023-07-04 16:05:01 +08:00
    * add dist dropout in model
    * update docstring and bert policy with dropout
    * refactor basepolicy and sharded, update bert
    * update format
    * update gpt2 policy
    * update bert policy
    * remove unused code
    * update readme for new policy usage
FoolPlayer · a73130482d · [shardformer] Unit test (#3928) · 2023-07-04 16:05:01 +08:00
    * fix bug in slicer, add slicer unit test
    * add dropout test
    * use pid as dropout seed
    * update dropout test with local pattern
    * add todo
FoolPlayer · f1cb5ac6bf · [shardformer] Align bert value (#3907) · 2023-07-04 16:05:01 +08:00
    * add bert align test, fix dist loss bug
    * forward and backward align
    * add ignore index
    * add shardformer CI
    * add gather_output optional for user in shardconfig
    * update readme with optional gather_output
    * add dist crossentropy loss test, remove unused files
    * remove unused file
    * remove unused file
    * rename the file
    * polish code
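
The distributed cross-entropy loss exercised in #3907 computes softmax statistics over logits whose vocabulary dimension is sharded, so no rank ever materializes the full logits. The single-process sketch below mimics the two cross-shard reductions (a max and a sum of exponentials, which would be all-reduces in practice) and checks the result against `F.cross_entropy`; it ignores `ignore_index` and all communication details:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 12)                # [batch, vocab]
target = torch.randint(0, 12, (5,))
shards = logits.chunk(4, dim=-1)           # vocab dim split across 4 "ranks"
shard_size = shards[0].shape[-1]

# Reduction 1: global max over the vocab dim (all-reduce MAX in practice).
global_max = torch.stack([s.max(dim=-1).values for s in shards]).max(dim=0).values
# Reduction 2: global sum of exponentials (all-reduce SUM in practice).
sum_exp = sum((s - global_max[:, None]).exp().sum(dim=-1) for s in shards)

# Each target logit lives on exactly one shard; gather it locally.
target_logit = torch.zeros(5)
for rank, s in enumerate(shards):
    lo = rank * shard_size
    mask = (target >= lo) & (target < lo + shard_size)
    target_logit[mask] = s[mask, target[mask] - lo]

loss = (global_max + sum_exp.log() - target_logit).mean()
assert torch.allclose(loss, F.cross_entropy(logits, target), atol=1e-6)
```
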
FoolPlayer · 79f8d5d54b · [shardformer] add gpt2 policy and modify shard and slicer to support (#3883) · 2023-07-04 16:05:01 +08:00
    * add gpt2 policy and modify shard and slicer to support
    * remove unused code
    * polish code
FoolPlayer · 70173e3123 · update README (#3909) · 2023-07-04 16:05:01 +08:00
FoolPlayer · ab8a47f830 · [shardformer] add Dropout layer support different dropout pattern (#3856) · 2023-07-04 16:05:01 +08:00
    * add dropout layer, add dropout test
    * modify seed manager as context manager
    * add a copy of col_nn.layer
    * add dist_crossentropy loss; separate module test
    * polish the code
    * fix dist crossentropy loss
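
The seed manager behind #3856 exists because tensor parallelism needs two dropout patterns: activations replicated across ranks must be dropped with identical masks, while sharded activations need decorrelated masks. A minimal sketch of that idea built on `torch.random.fork_rng` (illustrative; the real seed manager also tracks CUDA RNG states and process groups):

```python
import contextlib

import torch
import torch.nn.functional as F


@contextlib.contextmanager
def dropout_seed(base_seed: int, rank: int, same_across_ranks: bool):
    """Fork the CPU RNG so replicated tensors share a mask and sharded ones don't."""
    seed = base_seed if same_across_ranks else base_seed + rank
    with torch.random.fork_rng(devices=[]):  # fork the CPU RNG state only
        torch.manual_seed(seed)
        yield


x = torch.ones(64)

# Replicated pattern: every "rank" derives the same seed, so masks match.
same = []
for rank in range(2):
    with dropout_seed(42, rank, same_across_ranks=True):
        same.append(F.dropout(x, p=0.5, training=True))
assert torch.equal(same[0], same[1])

# Sharded pattern: per-rank seed offsets decorrelate the masks.
diff = []
for rank in range(2):
    with dropout_seed(42, rank, same_across_ranks=False):
        diff.append(F.dropout(x, p=0.5, training=True))
assert not torch.equal(diff[0], diff[1])
```
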