285 Commits

Author SHA1 Message Date
HELSON
bccbc15861 [MOE] changed parallelmode to dist process group (#460) 2022-03-19 13:46:29 +08:00
Jiarui Fang
0fcfb1e00d [test] make zero engine test really work (#447) 2022-03-17 17:24:25 +08:00
Jiarui Fang
237d08e7ee [zero] hybrid cpu adam (#445) 2022-03-17 15:05:41 +08:00
HELSON
dbdc9a7783 added Multiply Jitter and capacity factor eval for MoE (#434) 2022-03-16 16:47:44 +08:00
HELSON
3f70a2b12f removed noisy function during evaluation of MoE router (#419) 2022-03-15 12:06:09 +08:00
Jiang Zhuo
5a4a3b77d9 fix format (#376) 2022-03-11 15:50:28 +08:00
LuGY
de46450461 Added activation offload (#331)
* Added activation offload

* Fixed the import bug and switched to pytest
2022-03-11 15:50:28 +08:00
Kai Wang (Victor Kai)
53bb3bcc0a fix format (#362) 2022-03-11 15:50:28 +08:00
Yuer867
4a0f8c2c50 fix format parallel_2p5d (#357) 2022-03-11 15:50:28 +08:00
Liang Bowen
7eb87f516d flake8 style (#352) 2022-03-11 15:50:28 +08:00
xuqifan897
148207048e Qifan formatted file ColossalAI\colossalai\nn\layer\parallel_1d\layers.py (#342) 2022-03-11 15:50:28 +08:00
DouJS
cbb6436ff0 fix format for dir-[parallel_3d] (#333) 2022-03-11 15:50:28 +08:00
LuGY
a3269de5c9 [zero] cpu adam kernel (#288)
* Added CPU Adam

* finished the CPU Adam implementation

* updated the license

* deleted useless parameters; removed resnet

* modified the method of the CPU Adam unittest

* deleted some useless code

* removed useless code

Co-authored-by: ver217 <lhx0217@gmail.com>
Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: jiaruifang <fangjiarui123@gmail.com>
2022-03-11 15:50:28 +08:00
1SAA
82023779bb Added TPExpert for special situations 2022-03-11 15:50:28 +08:00
HELSON
36b8477228 Fixed parameter initialization in FFNExpert (#251) 2022-03-11 15:50:28 +08:00
アマデウス
e13293bb4c fixed CI dataset directory; fixed import error of 2.5d accuracy (#255) 2022-03-11 15:50:28 +08:00
1SAA
219df6e685 Optimized MoE layer and fixed some bugs;
Reduced the number of MoE tests;

Added FFNExperts and ViTMoE model
2022-03-11 15:50:28 +08:00
zbian
3dba070580 fixed padding index issue for vocab parallel embedding layers; updated 3D linear to be compatible with examples in the tutorial 2022-03-11 15:50:28 +08:00
アマデウス
9ee197d0e9 moved env variables to global variables; (#215)
added branch context;
added vocab parallel layers;
moved split_batch from load_batch to tensor parallel embedding layers;
updated gpt model;
updated unit test cases;
fixed few collective communicator bugs
2022-02-15 11:31:13 +08:00
HELSON
0f8c7f9804 Fixed docstring in colossalai (#171) 2022-01-21 10:44:30 +08:00
Frank Lee
e2089c5c15 adapted for sequence parallel (#163) 2022-01-20 13:44:51 +08:00
ver217
f68eddfb3d refactor kernel (#142) 2022-01-13 16:47:17 +08:00
BoxiangW
4a3d3446b0 Update layer integration documentation (#108)
Update the documentation of layer integration

Update _log_hook.py

Update _operation.py
2022-01-10 18:05:58 +08:00
HELSON
dceae85195 Added MoE parallel (#127) 2022-01-07 15:08:36 +08:00
ver217
7904baf6e1 fix layers/schedule for hybrid parallelization (#111) (#112) 2022-01-04 20:52:31 +08:00
ver217
96780e6ee4 Optimize pipeline schedule (#94)
* add pipeline shared module wrapper and update load batch

* added model parallel process group for amp and clip grad (#86)

* added model parallel process group for amp and clip grad

* update amp and clip with model parallel process group

* remove pipeline_prev/next group (#88)

* micro batch offload

* optimize pipeline gpu memory usage

* pipeline can receive tensor shape (#93)

* optimize pipeline gpu memory usage

* fix grad accumulation step counter

* rename classes and functions

Co-authored-by: Frank Lee <somerlee.9@gmail.com>
2021-12-30 15:56:46 +08:00
アマデウス
01a80cd86d Hotfix/Colossalai layers (#92)
* optimized 1d layer apis; reorganized nn.layer modules; fixed tests

* fixed 2.5d runtime issue

* reworked split batch, now called in trainer.schedule.load_batch

Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
2021-12-29 23:32:10 +08:00
アマデウス
0fedef4f3c Layer integration (#83)
* integrated parallel layers for ease of building models

* integrated 2.5d layers

* cleaned code and unit tests

* added log metric by step hook; updated imagenet benchmark; fixed some bugs

* reworked initialization; cleaned code

Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
2021-12-27 15:04:32 +08:00
HELSON
632e622de8 overlap computation and communication in 2d operations (#75) 2021-12-16 16:05:15 +08:00
Frank Lee
35813ed3c4 update examples and sphinx docs for the new api (#63) 2021-12-13 22:07:01 +08:00
Frank Lee
da01c234e1 Develop/experiments (#59)
* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7699.

* improved consistency between trainer, engine and schedule (#23)

Co-authored-by: 1SAA <c2h214748@gmail.com>

* Split conv2d, class token, and positional embedding in 2d; fixed random numbers in DDP; fixed convergence on CIFAR-10 and ImageNet-1k

* Integrate 1d tensor parallel in Colossal-AI (#39)

* fixed 1D and 2D convergence (#38)

* optimized 2D operations

* fixed 1D ViT convergence problem

* Feature/ddp (#49)

* remove redundant func in setup (#19) (#20)

* use env to control the language of doc (#24) (#25)

* Support TP-compatible Torch AMP and Update trainer API (#27)

* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7699.

* improved consistency between trainer, engine and schedule (#23)

Co-authored-by: 1SAA <c2h214748@gmail.com>

Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>

* add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29)

* add explanation for ViT example (#35) (#36)

* support torch ddp

* fix loss accumulation

* add log for ddp

* change seed

* modify timing hook

Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>

* Feature/pipeline (#40)

* remove redundant func in setup (#19) (#20)

* use env to control the language of doc (#24) (#25)

* Support TP-compatible Torch AMP and Update trainer API (#27)

* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7699.

* improved consistency between trainer, engine and schedule (#23)

Co-authored-by: 1SAA <c2h214748@gmail.com>

Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>

* add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29)

* add explanation for ViT example (#35) (#36)

* optimize communication of pipeline parallel

* fix grad clip for pipeline

Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>

* optimized 3d layer to fix slow computation; tested imagenet performance with 3d; reworked lr_scheduler config definition; fixed launch args; fixed some printing issues; simplified apis of 3d layers (#51)

* Update 2.5d layer code to get a similar accuracy on imagenet-1k dataset

* update api for better usability (#58)

Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>
Co-authored-by: puck_WCR <46049915+WANG-CR@users.noreply.github.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: アマデウス <kurisusnowdeng@users.noreply.github.com>
Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
2021-12-09 15:08:29 +08:00
ver217
dbe62c67b8 add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29) 2021-11-18 23:45:09 +08:00
Frank Lee
3defa32aee Support TP-compatible Torch AMP and Update trainer API (#27)
* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7699.

* improved consistency between trainer, engine and schedule (#23)

Co-authored-by: 1SAA <c2h214748@gmail.com>

Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>
2021-11-18 19:45:06 +08:00
ver217
3c7604ba30 update documentation 2021-10-29 09:29:20 +08:00
zbian
404ecbdcc6 Migrated project 2021-10-28 18:21:23 +02:00