ec0866804c  flybird11111  2023-09-05 13:14:41 +08:00
    [shardformer] update shardformer readme (#4617)
    [shardformer] update shardformer readme
    [shardformer] update shardformer readme

a39a5c66fe  Hongxin Liu  2023-09-04 23:43:13 +08:00
    Merge branch 'main' into feature/shardformer

0a94fcd351  flybird11111  2023-09-04 21:46:29 +08:00
    [shardformer] update bert finetune example with HybridParallelPlugin (#4584)
    * [shardformer] fix opt test hanging
    * fix
    * test
    * test
    * test
    * fix test
    * fix test
    * remove print
    * add fix
    * [shardformer] add bert finetune example
    * [shardformer] add bert finetune example
    * [shardformer] add bert finetune example
    * [shardformer] add bert finetune example
    * [shardformer] add bert finetune example
    * [shardformer] add bert finetune example
    * [shardformer] fix epoch change
    * [shardformer] broadcast add pp group
    * [shardformer] fix opt test hanging
    * fix
    * test
    * test
    * [shardformer] zero1+pp and the corresponding tests (#4517)
    * pause
    * finish pp+zero1
    * Update test_shard_vit.py
    * [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516)
    * fix overlap bug and support bert, add overlap as an option in shardconfig
    * support overlap for chatglm and bloom
    * [shardformer] fix emerged bugs after updating transformers (#4526)
    * test
    * fix test
    * fix test
    * remove print
    * add fix
    * [shardformer] add bert finetune example
    * [shardformer] add bert finetune example
    * [shardformer] Add overlap support for gpt2 (#4535)
    * add overlap support for gpt2
    * remove unused code
    * remove unused code
    * [shardformer] support pp+tp+zero1 tests (#4531)
    * [shardformer] fix opt test hanging
    * fix
    * test
    * test
    * test
    * fix test
    * fix test
    * remove print
    * add fix
    * [shardformer] pp+tp+zero1
      [shardformer] pp+tp+zero1
      [shardformer] pp+tp+zero1
      [shardformer] pp+tp+zero1
      [shardformer] pp+tp+zero1
      [shardformer] pp+tp+zero1
    * [shardformer] pp+tp+zero1
    * [shardformer] pp+tp+zero1
    * [shardformer] pp+tp+zero1
    * [shardformer] pp+tp+zero1
    * [shardformer] fix submodule replacement bug when enabling pp (#4544)
    * [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540)
    * implement sharded optimizer saving
    * add more param info
    * finish implementation of sharded optimizer saving
    * fix bugs in optimizer sharded saving
    * add pp+zero test
    * param group loading
    * greedy loading of optimizer
    * fix bug when loading
    * implement optimizer sharded saving
    * add optimizer test & arrange checkpointIO utils
    * fix gemini sharding state_dict
    * add verbose option
    * add loading of master params
    * fix typehint
    * fix master/working mapping in fp16 amp
    * [shardformer] add bert finetune example
    * [shardformer] add bert finetune example
    * [shardformer] add bert finetune example
    * [shardformer] add bert finetune example
    * [shardformer] fix epoch change
    * [shardformer] broadcast add pp group
    * rebase feature/shardformer
    * update pipeline
    * [shardformer] fix
    * [shardformer] fix
    * [shardformer] bert finetune fix
    * [shardformer] add all_reduce operation to loss
      add all_reduce operation to loss
    * [shardformer] make compatible with pytree.
      make compatible with pytree.
    * [shardformer] disable tp
      disable tp
    * [shardformer] add 3d plugin to ci test
    * [shardformer] update num_microbatches to None
    * [shardformer] update microbatchsize
    * [shardformer] update assert
    * update scheduler
    * update scheduler
    ---------
    Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
    Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
    Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>

8d7b02290f  binmakeswell  2023-09-04 13:49:33 +08:00
    [doc] add llama2 benchmark (#4604)
    * [doc] add llama2 benchmark
    * [doc] add llama2 benchmark

f1ae8c9104  Tian Siyuan  2023-08-30 22:56:13 +08:00
    [example] change accelerate version (#4431)
    Co-authored-by: Siyuan Tian <siyuant@vmware.com>
    Co-authored-by: Hongxin Liu <lhx0217@gmail.com>

8e2e1992b8  ChengDaqi2023  2023-08-30 22:54:45 +08:00
    [example] update streamlit 0.73.1 to 1.11.1 (#4386)

0b00def881  Hongxin Liu  2023-08-28 17:59:11 +08:00
    [example] add llama2 example (#4527)
    * [example] transfer llama-1 example
    * [example] fit llama-2
    * [example] refactor scripts folder
    * [example] fit new gemini plugin
    * [cli] fix multinode runner
    * [example] fit gemini optim checkpoint
    * [example] refactor scripts
    * [example] update requirements
    * [example] update requirements
    * [example] rename llama to llama2
    * [example] update readme and pretrain script
    * [example] refactor scripts

27061426f7  Hongxin Liu  2023-08-24 09:29:25 +08:00
    [gemini] improve compatibility and add static placement policy (#4479)
    * [gemini] remove distributed-related part from colotensor (#4379)
    * [gemini] remove process group dependency
    * [gemini] remove tp part from colo tensor
    * [gemini] patch inplace op
    * [gemini] fix param op hook and update tests
    * [test] remove useless tests
    * [test] remove useless tests
    * [misc] fix requirements
    * [test] fix model zoo
    * [test] fix model zoo
    * [test] fix model zoo
    * [test] fix model zoo
    * [test] fix model zoo
    * [misc] update requirements
    * [gemini] refactor gemini optimizer and gemini ddp (#4398)
    * [gemini] update optimizer interface
    * [gemini] renaming gemini optimizer
    * [gemini] refactor gemini ddp class
    * [example] update gemini related example
    * [example] update gemini related example
    * [plugin] fix gemini plugin args
    * [test] update gemini ckpt tests
    * [gemini] fix checkpoint io
    * [example] fix opt example requirements
    * [example] fix opt example
    * [example] fix opt example
    * [example] fix opt example
    * [gemini] add static placement policy (#4443)
    * [gemini] add static placement policy
    * [gemini] fix param offload
    * [test] update gemini tests
    * [plugin] update gemini plugin
    * [plugin] update gemini plugin docstr
    * [misc] fix flash attn requirement
    * [test] fix gemini checkpoint io test
    * [example] update resnet example result (#4457)
    * [example] update bert example result (#4458)
    * [doc] update gemini doc (#4468)
    * [example] update gemini related examples (#4473)
    * [example] update gpt example
    * [example] update dreambooth example
    * [example] update vit
    * [example] update opt
    * [example] update palm
    * [example] update vit and opt benchmark
    * [hotfix] fix bert in model zoo (#4480)
    * [hotfix] fix bert in model zoo
    * [test] remove chatglm gemini test
    * [test] remove sam gemini test
    * [test] remove vit gemini test
    * [hotfix] fix opt tutorial example (#4497)
    * [hotfix] fix opt tutorial example
    * [hotfix] fix opt tutorial example

ff836790ae  Tian Siyuan  2023-08-15 00:22:57 +08:00
    [doc] fix a typo in examples/tutorial/auto_parallel/README.md (#4430)
    Co-authored-by: Siyuan Tian <siyuant@vmware.com>

089c365fa0  binmakeswell  2023-08-04 17:42:07 +08:00
    [doc] add Series A Funding and NeurIPS news (#4377)
    * [doc] add Series A Funding and NeurIPS news
    * [kernal] fix mha kernal
    * [CI] skip moe
    * [CI] fix requirements

16c0acc01b  caption  2023-08-01 16:25:25 +08:00
    [hotfix] update gradio 3.11 to 3.34.0 (#4329)

ef4b99ebcd  binmakeswell  2023-07-26 14:12:57 +08:00
    add llama example CI

7ff11b5537  binmakeswell  2023-07-17 21:07:44 +08:00
    [example] add llama pretraining (#4257)

4e9b09c222  github-actions[bot]  2023-07-12 17:35:58 +08:00
    Automated submodule synchronization (#4217)
    Co-authored-by: github-actions <github-actions@github.com>

2d40759a53  digger yu  2023-06-28 15:29:44 +08:00
    fix #3852 path error (#4058)

31dc302017  Jianghai  2023-06-27 16:40:46 +08:00
    [examples] copy resnet example to image (#4090)
    * copy resnet example
    * add pytest package
    * skip test_ci
    * skip test_ci
    * skip test_ci

4da324cd60  Baizhou Zhang  2023-06-26 23:50:04 +08:00
    [hotfix]fix argument naming in docs and examples (#4083)

160c64c645  LuGY  2023-06-19 11:22:42 +08:00
    [example] fix bucket size in example of gpt gemini (#4028)

b3ab7fbabf  Baizhou Zhang  2023-06-12 15:02:27 +08:00
    [example] update ViT example using booster api (#3940)

e277534a18  Liu Ziming  2023-06-09 08:44:18 +08:00
    Merge pull request #3905 from MaruyamaAya/dreambooth
    [example] Adding an example of training dreambooth with the new booster API

33eef714db  digger yu  2023-06-08 16:09:32 +08:00
    fix typo examples and docs (#3932)

9b5e7ce21f  Maruyama_Aya  2023-06-08 14:56:56 +08:00
    modify shell for check

407aa48461  digger yu  2023-06-08 14:28:34 +08:00
    fix typo examples/community/roberta (#3925)

730a092ba2  Maruyama_Aya  2023-06-08 13:38:18 +08:00
    modify shell for check

49567d56d1  Maruyama_Aya  2023-06-08 13:36:05 +08:00
    modify shell for check

039854b391  Maruyama_Aya  2023-06-08 13:17:58 +08:00
    modify shell for check

e417dd004e  Baizhou Zhang  2023-06-08 11:27:05 +08:00
    [example] update opt example using booster api (#3918)

cf4792c975  Maruyama_Aya  2023-06-08 11:15:10 +08:00
    modify shell for check

c94a33579b  Maruyama_Aya  2023-06-07 17:23:01 +08:00
    modify shell for check

b306cecf28  Liu Ziming  2023-06-07 16:05:00 +08:00
    [example] Modify palm example with the new booster API (#3913)
    * Modify torch version requirement to adapt torch 2.0
    * modify palm example using new booster API
    * roll back
    * fix port
    * polish
    * polish

a55fb00c18  wukong1992  2023-06-07 15:51:00 +08:00
    [booster] update bert example, using booster api (#3885)

4fc8bc68ac  Maruyama_Aya  2023-06-07 11:02:19 +08:00
    modify file path

b4437e88c3  Maruyama_Aya  2023-06-06 16:21:38 +08:00
    fixed port

79c9f776a9  Maruyama_Aya  2023-06-06 16:20:45 +08:00
    fixed port

d3379f0be7  Maruyama_Aya  2023-06-06 16:07:34 +08:00
    fixed model saving bugs

b29e1f0722  Maruyama_Aya  2023-06-06 15:50:03 +08:00
    change directory

1c1f71cbd2  Maruyama_Aya  2023-06-06 14:51:11 +08:00
    fixing insecure hash function

b56c7f4283  Maruyama_Aya  2023-06-06 14:09:27 +08:00
    update shell file

176010f289  Maruyama_Aya  2023-06-06 14:08:22 +08:00
    update performance evaluation

25447d4407  Maruyama_Aya  2023-06-05 11:47:07 +08:00
    modify path

60ec33bb18  Maruyama_Aya  2023-06-02 16:50:51 +08:00
    Add a new example of Dreambooth training using the booster API

5f79008c4a  jiangmingyan  2023-05-30 18:41:41 +08:00
    [example] update gemini examples (#3868)
    * [example]update gemini examples
    * [example]update gemini examples

518b31c059  digger yu  2023-05-24 14:51:49 +08:00
    [docs] change placememt_policy to placement_policy (#3829)
    * fix typo colossalai/autochunk auto_parallel amp
    * fix typo colossalai/auto_parallel nn utils etc.
    * fix typo colossalai/auto_parallel autochunk fx/passes etc.
    * fix typo docs/
    * change placememt_policy to placement_policy in docs/ and examples/

62c7e67f9f  github-actions[bot]  2023-05-22 14:42:09 +08:00
    [format] applied code formatting on changed files in pull request 3786 (#3787)
    Co-authored-by: github-actions <github-actions@github.com>

ad2cf58f50  binmakeswell  2023-05-19 18:03:56 +08:00
    [chat] add performance and tutorial (#3786)

15024e40d9  binmakeswell  2023-05-18 13:33:01 +08:00
    [auto] fix install cmd (#3772)

b7141c36dd  digger-yu  2023-05-10 17:12:03 +08:00
    [CI] fix some spelling errors (#3707)
    * fix spelling error with examples/comminity/
    * fix spelling error with tests/
    * fix some spelling error with tests/ colossalai/ etc.

3bf09efe74  Hongxin Liu  2023-05-08 15:44:03 +08:00
    [booster] update prepare dataloader method for plugin (#3706)
    * [booster] add prepare dataloader method for plug
    * [booster] update examples and docstr

f83ea813f5  Hongxin Liu  2023-05-08 10:42:30 +08:00
    [example] add train resnet/vit with booster example (#3694)
    * [example] add train vit with booster example
    * [example] update readme
    * [example] add train resnet with booster example
    * [example] enable ci
    * [example] enable ci
    * [example] add requirements
    * [hotfix] fix analyzer init
    * [example] update requirements

d556648885  Hongxin Liu  2023-05-06 11:53:13 +08:00
    [example] add finetune bert with booster example (#3693)