Commit Graph

1395 Commits

Author SHA1 Message Date
Frank Lee
638a07a7f9 [test] fixed gemini plugin test (#3411)
* [test] fixed gemini plugin test

* polish code

* polish code
2023-04-03 17:12:22 +08:00
ver217
5f2e34e6c9 [booster] implement Gemini plugin (#3352)
* [booster] add gemini plugin

* [booster] update docstr

* [booster] gemini plugin add coloparam convertor

* [booster] fix coloparam convertor

* [booster] fix gemini plugin device

* [booster] add gemini plugin test

* [booster] gemini plugin ignore sync bn

* [booster] skip some model

* [booster] skip some model

* [booster] modify test world size

* [booster] modify test world size

* [booster] skip test
2023-03-31 16:06:13 +08:00
HELSON
1a1d68b053 [moe] add checkpoint for moe models (#3354)
* [moe] add checkpoint for moe models

* [hotfix] fix bugs in unit test
2023-03-31 09:20:33 +08:00
YuliangLiu0306
fee2af8610 [autoparallel] adapt autoparallel with new analyzer (#3261)
* [autoparallel] adapt autoparallel with new analyzer

* fix all node handler tests

* polish

* polish
2023-03-30 17:47:24 +08:00
Ofey Chan
8706a8c66c [NFC] polish colossalai/engine/gradient_handler/__init__.py code style (#3329) 2023-03-30 14:19:39 +08:00
yuxuan-lou
198a74b9fd [NFC] polish colossalai/context/random/__init__.py code style (#3327) 2023-03-30 14:19:26 +08:00
YuliangLiu0306
fbd2a9e05b [hotfix] meta_tensor_compatibility_with_torch2 2023-03-30 13:43:01 +08:00
Michelle
ad285e1656 [NFC] polish colossalai/fx/tracer/_tracer_utils.py (#3323)
* [NFC] polish colossalai/engine/schedule/_pipeline_schedule.py code style

* [NFC] polish colossalai/fx/tracer/_tracer_utils.py  code style

---------

Co-authored-by: Qianran Ma <qianranm@luchentech.com>
2023-03-29 17:53:32 +08:00
Xu Kai
64350029fe [NFC] polish colossalai/gemini/paramhooks/_param_hookmgr.py code style 2023-03-29 15:47:42 +08:00
RichardoLuo
1ce9d0c531 [NFC] polish initializer_data.py code style (#3287) 2023-03-29 15:22:21 +08:00
Ziheng Qin
1bed38ef37 [NFC] polish colossalai/cli/benchmark/models.py code style (#3290) 2023-03-29 15:22:21 +08:00
Kai Wang (Victor Kai)
964a28678f [NFC] polish initializer_3d.py code style (#3279) 2023-03-29 15:22:21 +08:00
Sze-qq
94eec1c5ad [NFC] polish colossalai/engine/gradient_accumulation/_gradient_accumulation.py code style (#3277)
Co-authored-by: siqi <siqi@siqis-MacBook-Pro.local>
2023-03-29 15:22:21 +08:00
Arsmart1
8af977f223 [NFC] polish colossalai/context/parallel_context.py code style (#3276) 2023-03-29 15:22:21 +08:00
Zirui Zhu
1168b50e33 [NFC] polish colossalai/engine/schedule/_pipeline_schedule_v2.py code style (#3275) 2023-03-29 15:22:21 +08:00
Tong Li
196d4696d0 [NFC] polish colossalai/nn/_ops/addmm.py code style (#3274) 2023-03-29 15:22:21 +08:00
lucasliunju
4b95464994 [NFC] polish colossalai/amp/__init__.py code style (#3272) 2023-03-29 15:22:21 +08:00
Xuanlei Zhao
6b3bb2c249 [NFC] polish code style (#3273) 2023-03-29 15:22:21 +08:00
CZYCW
4cadb25b96 [NFC] policy colossalai/fx/proxy.py code style (#3269) 2023-03-29 15:22:21 +08:00
Yuanchen
d58fa705b2 [NFC] polish code style (#3268)
Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
2023-03-29 15:22:21 +08:00
Camille Zhong
c4a226b729 [NFC] polish tensor_placement_policy.py code style (#3265) 2023-03-29 15:22:21 +08:00
CsRic
00778abc48 [NFC] polish colossalai/fx/passes/split_module.py code style (#3263)
Co-authored-by: csric <richcsr256@gmail.com>
2023-03-29 15:22:21 +08:00
jiangmingyan
488f37048c [NFC] polish colossalai/global_variables.py code style (#3259)
Co-authored-by: luchen <luchen@luchendeMBP.lan>
2023-03-29 15:22:21 +08:00
LuGY
1ff7d5bfa5 [NFC] polish colossalai/engine/gradient_handler/_moe_gradient_handler.py (#3260) 2023-03-29 15:22:21 +08:00
dayellow
204ca2f09a [NFC] polish colossalai/fx/profiler/experimental/profiler_module/embedding.py code style (#3256)
Co-authored-by: Minghao Huang <huangminghao@luchentech.com>
2023-03-29 15:22:21 +08:00
HELSON
02b058032d [fx] meta registration compatibility (#3253)
* [fx] meta registration compatibility

* fix error
2023-03-27 15:22:17 +08:00
Frank Lee
73d3e4d309 [booster] implemented the torch ddd + resnet example (#3232)
* [booster] implemented the torch ddd + resnet example

* polish code
2023-03-27 10:24:14 +08:00
YH
1a229045af Add interface for colo tesnor dp size (#3227) 2023-03-27 09:42:21 +08:00
YuliangLiu0306
4d5d8f98a4 [API] implement device mesh manager (#3221)
* [API] implement  device mesh manager

* polish
2023-03-24 13:39:12 +08:00
Frank Lee
cd142fbefa [api] implemented the checkpoint io module (#3205)
* [api] implemented the checkpoint io module

* polish code

* polish code
2023-03-23 10:53:17 +08:00
ver217
f8289d4221 [lazyinit] combine lazy tensor with dtensor (#3204)
* [lazyinit] lazy tensor add distribute

* [lazyinit] refactor distribute

* [lazyinit] add test dist lazy init

* [lazyinit] add verbose info for dist lazy init

* [lazyinit] fix rnn flatten weight op

* [lazyinit] polish test

* [lazyinit] polish test

* [lazyinit] fix lazy tensor data setter

* [lazyinit] polish test

* [lazyinit] fix clean

* [lazyinit] make materialize inplace

* [lazyinit] refactor materialize

* [lazyinit] refactor test distribute

* [lazyinit] fix requires_grad

* [lazyinit] fix tolist after materialization

* [lazyinit] refactor distribute module

* [lazyinit] polish docstr

* [lazyinit] polish lazy init context

* [lazyinit] temporarily skip test

* [lazyinit] polish test

* [lazyinit] add docstr
2023-03-23 10:53:06 +08:00
Frank Lee
e3ad88fb48 [booster] implemented the cluster module (#3191)
* [booster] implemented the cluster module

* polish code
2023-03-22 14:11:54 +08:00
YuliangLiu0306
f57d34958b [FX] refactor experimental tracer and adapt it with hf models (#3157)
* pass gpt trace and meta_prop

* pass t5 trace and meta_prop

* [FX] refactor experimental tracer and adapt it with hf models

* pass all mainstream model zoo

* fix CI

* fix CI

* fix CI

* fix CI

* fix CI

* fix CI

* fix CI

* fix CI

* skip tests

* fix CI

* using packaging version

* polish
2023-03-22 10:40:33 +08:00
Frank Lee
e7f3bed2d3 [booster] added the plugin base and torch ddp plugin (#3180)
* [booster] added the plugin base and torch ddp plugin

* polish code

* polish code

* polish code
2023-03-21 17:39:30 +08:00
Zihao
18dbe76cae [auto-parallel] add auto-offload feature (#3154)
* add auto-offload feature

* polish code

* fix syn offload runtime pass bug

* add offload example

* fix offload testing bug

* fix example testing bug
2023-03-21 14:17:41 +08:00
YuliangLiu0306
258b43317c [hotfix] layout converting issue (#3188) 2023-03-21 13:24:18 +08:00
YH
80aed29cd3 [zero] Refactor ZeroContextConfig class using dataclass (#3186) 2023-03-21 12:36:47 +08:00
YH
9d644ff09f Fix docstr for zero statedict (#3185) 2023-03-21 11:48:21 +08:00
zbian
7bc0afc901 updated flash attention usage 2023-03-20 17:57:04 +08:00
Frank Lee
a9b8402d93 [booster] added the accelerator implementation (#3159) 2023-03-20 13:59:24 +08:00
ver217
6ae8ed0407 [lazyinit] add correctness verification (#3147)
* [lazyinit] fix shared module

* [tests] add lazy init test utils

* [tests] add torchvision for lazy init

* [lazyinit] fix pre op fn

* [lazyinit] handle legacy constructor

* [tests] refactor lazy init test models

* [tests] refactor lazy init test utils

* [lazyinit] fix ops don't support meta

* [tests] lazy init test timm models

* [lazyinit] fix set data

* [lazyinit] handle apex layers

* [tests] lazy init test transformers models

* [tests] lazy init test torchaudio models

* [lazyinit] fix import path

* [tests] lazy init test torchrec models

* [tests] update torch version in CI

* [tests] revert torch version in CI

* [tests] skip lazy init test
2023-03-17 13:49:04 +08:00
Frank Lee
ed19290560 [booster] implemented mixed precision class (#3151)
* [booster] implemented mixed precision class

* polish code
2023-03-17 11:00:15 +08:00
YuliangLiu0306
2eca4cd376 [DTensor] refactor dtensor with new components (#3089)
* [DTensor] refactor dtensor with new components

* polish
2023-03-14 16:25:47 +08:00
ver217
ed8f60b93b [lazyinit] refactor lazy tensor and lazy init ctx (#3131)
* [lazyinit] refactor lazy tensor and lazy init ctx

* [lazyinit] polish docstr

* [lazyinit] polish docstr
2023-03-14 15:37:12 +08:00
Frank Lee
95a36eae63 [kernel] added kernel loader to softmax autograd function (#3093)
* [kernel] added kernel loader to softmax autograd function

* [release] v0.2.6
2023-03-10 14:27:09 +08:00
Super Daniel
fff98f06ed [analyzer] a minimal implementation of static graph analyzer (#2852)
* [hotfix] meta tensor default device.

* [siu] add experimental submodules to main branch.

* [siu]

* [siu]

* [analyzer] init.

* [analyzer] readme.

* [analyzer] readme.

* [analyzer] readme.

* [analyzer] readme.

* [test] add test.

* Update symbolic_trace.py

* mark skip tests.

* try except.

* try except.

* try except.

* s

* init

* init

* fix

* skip

* skip

---------

Co-authored-by: Daniel Shao <superdainiu@MININT-PVARVID.fareast.corp.microsoft.com>
Co-authored-by: Daniel Shao <superdainiu@Daniels-Mac.local>
2023-03-10 13:21:05 +08:00
Xuanlei Zhao
10c61de2f7 [autochunk] support vit (#3084)
support vit for autochunk
* support some new ops for vit
* fix some bugs
* add test for vit
2023-03-10 10:23:26 +08:00
YuliangLiu0306
8e4e8601b7 [DTensor] implement layout converter (#3055)
* [DTensor] refactor LayoutConverter for DTensor

* polish code

* polish docstring
2023-03-10 09:53:52 +08:00
Frank Lee
f19b49e164 [booster] init module structure and definition (#3056) 2023-03-09 11:27:46 +08:00
Xuanlei Zhao
2ca9728cbb [autochunk] refactor chunk memory estimation (#2762)
* refact memory code

* dont log free var memory

* add memory align

* update chunk target

* update setting for new memory

* finish test

* update tracer

* update typo

* update test
2023-03-08 16:22:30 +08:00