Commit Graph

2230 Commits

Author SHA1 Message Date
Tong Li
196d4696d0 [NFC] polish colossalai/nn/_ops/addmm.py code style (#3274) 2023-03-29 15:22:21 +08:00
lucasliunju
4b95464994 [NFC] polish colossalai/amp/__init__.py code style (#3272) 2023-03-29 15:22:21 +08:00
Xuanlei Zhao
6b3bb2c249 [NFC] polish code style (#3273) 2023-03-29 15:22:21 +08:00
CZYCW
4cadb25b96 [NFC] policy colossalai/fx/proxy.py code style (#3269) 2023-03-29 15:22:21 +08:00
Yuanchen
d58fa705b2 [NFC] polish code style (#3268)
Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
2023-03-29 15:22:21 +08:00
Camille Zhong
c4a226b729 [NFC] polish tensor_placement_policy.py code style (#3265) 2023-03-29 15:22:21 +08:00
CsRic
00778abc48 [NFC] polish colossalai/fx/passes/split_module.py code style (#3263)
Co-authored-by: csric <richcsr256@gmail.com>
2023-03-29 15:22:21 +08:00
jiangmingyan
488f37048c [NFC] polish colossalai/global_variables.py code style (#3259)
Co-authored-by: luchen <luchen@luchendeMBP.lan>
2023-03-29 15:22:21 +08:00
LuGY
1ff7d5bfa5 [NFC] polish colossalai/engine/gradient_handler/_moe_gradient_handler.py (#3260) 2023-03-29 15:22:21 +08:00
dayellow
204ca2f09a [NFC] polish colossalai/fx/profiler/experimental/profiler_module/embedding.py code style (#3256)
Co-authored-by: Minghao Huang <huangminghao@luchentech.com>
2023-03-29 15:22:21 +08:00
HELSON
02b058032d [fx] meta registration compatibility (#3253)
* [fx] meta registration compatibility

* fix error
2023-03-27 15:22:17 +08:00
Frank Lee
73d3e4d309 [booster] implemented the torch ddd + resnet example (#3232)
* [booster] implemented the torch ddd + resnet example

* polish code
2023-03-27 10:24:14 +08:00
YH
1a229045af Add interface for colo tesnor dp size (#3227) 2023-03-27 09:42:21 +08:00
YuliangLiu0306
4d5d8f98a4 [API] implement device mesh manager (#3221)
* [API] implement  device mesh manager

* polish
2023-03-24 13:39:12 +08:00
Frank Lee
cd142fbefa [api] implemented the checkpoint io module (#3205)
* [api] implemented the checkpoint io module

* polish code

* polish code
2023-03-23 10:53:17 +08:00
ver217
f8289d4221 [lazyinit] combine lazy tensor with dtensor (#3204)
* [lazyinit] lazy tensor add distribute

* [lazyinit] refactor distribute

* [lazyinit] add test dist lazy init

* [lazyinit] add verbose info for dist lazy init

* [lazyinit] fix rnn flatten weight op

* [lazyinit] polish test

* [lazyinit] polish test

* [lazyinit] fix lazy tensor data setter

* [lazyinit] polish test

* [lazyinit] fix clean

* [lazyinit] make materialize inplace

* [lazyinit] refactor materialize

* [lazyinit] refactor test distribute

* [lazyinit] fix requires_grad

* [lazyinit] fix tolist after materialization

* [lazyinit] refactor distribute module

* [lazyinit] polish docstr

* [lazyinit] polish lazy init context

* [lazyinit] temporarily skip test

* [lazyinit] polish test

* [lazyinit] add docstr
2023-03-23 10:53:06 +08:00
Frank Lee
e3ad88fb48 [booster] implemented the cluster module (#3191)
* [booster] implemented the cluster module

* polish code
2023-03-22 14:11:54 +08:00
YuliangLiu0306
f57d34958b [FX] refactor experimental tracer and adapt it with hf models (#3157)
* pass gpt trace and meta_prop

* pass t5 trace and meta_prop

* [FX] refactor experimental tracer and adapt it with hf models

* pass all mainstream model zoo

* fix CI

* fix CI

* fix CI

* fix CI

* fix CI

* fix CI

* fix CI

* fix CI

* skip tests

* fix CI

* using packaging version

* polish
2023-03-22 10:40:33 +08:00
Frank Lee
e7f3bed2d3 [booster] added the plugin base and torch ddp plugin (#3180)
* [booster] added the plugin base and torch ddp plugin

* polish code

* polish code

* polish code
2023-03-21 17:39:30 +08:00
Zihao
18dbe76cae [auto-parallel] add auto-offload feature (#3154)
* add auto-offload feature

* polish code

* fix syn offload runtime pass bug

* add offload example

* fix offload testing bug

* fix example testing bug
2023-03-21 14:17:41 +08:00
YuliangLiu0306
258b43317c [hotfix] layout converting issue (#3188) 2023-03-21 13:24:18 +08:00
YH
80aed29cd3 [zero] Refactor ZeroContextConfig class using dataclass (#3186) 2023-03-21 12:36:47 +08:00
YH
9d644ff09f Fix docstr for zero statedict (#3185) 2023-03-21 11:48:21 +08:00
zbian
7bc0afc901 updated flash attention usage 2023-03-20 17:57:04 +08:00
Frank Lee
a9b8402d93 [booster] added the accelerator implementation (#3159) 2023-03-20 13:59:24 +08:00
ver217
6ae8ed0407 [lazyinit] add correctness verification (#3147)
* [lazyinit] fix shared module

* [tests] add lazy init test utils

* [tests] add torchvision for lazy init

* [lazyinit] fix pre op fn

* [lazyinit] handle legacy constructor

* [tests] refactor lazy init test models

* [tests] refactor lazy init test utils

* [lazyinit] fix ops don't support meta

* [tests] lazy init test timm models

* [lazyinit] fix set data

* [lazyinit] handle apex layers

* [tests] lazy init test transformers models

* [tests] lazy init test torchaudio models

* [lazyinit] fix import path

* [tests] lazy init test torchrec models

* [tests] update torch version in CI

* [tests] revert torch version in CI

* [tests] skip lazy init test
2023-03-17 13:49:04 +08:00
Frank Lee
ed19290560 [booster] implemented mixed precision class (#3151)
* [booster] implemented mixed precision class

* polish code
2023-03-17 11:00:15 +08:00
YuliangLiu0306
2eca4cd376 [DTensor] refactor dtensor with new components (#3089)
* [DTensor] refactor dtensor with new components

* polish
2023-03-14 16:25:47 +08:00
ver217
ed8f60b93b [lazyinit] refactor lazy tensor and lazy init ctx (#3131)
* [lazyinit] refactor lazy tensor and lazy init ctx

* [lazyinit] polish docstr

* [lazyinit] polish docstr
2023-03-14 15:37:12 +08:00
Frank Lee
95a36eae63 [kernel] added kernel loader to softmax autograd function (#3093)
* [kernel] added kernel loader to softmax autograd function

* [release] v0.2.6
2023-03-10 14:27:09 +08:00
Super Daniel
fff98f06ed [analyzer] a minimal implementation of static graph analyzer (#2852)
* [hotfix] meta tensor default device.

* [siu] add experimental submodules to main branch.

* [siu]

* [siu]

* [analyzer] init.

* [analyzer] readme.

* [analyzer] readme.

* [analyzer] readme.

* [analyzer] readme.

* [test] add test.

* Update symbolic_trace.py

* mark skip tests.

* try except.

* try except.

* try except.

* s

* init

* init

* fix

* skip

* skip

---------

Co-authored-by: Daniel Shao <superdainiu@MININT-PVARVID.fareast.corp.microsoft.com>
Co-authored-by: Daniel Shao <superdainiu@Daniels-Mac.local>
2023-03-10 13:21:05 +08:00
Xuanlei Zhao
10c61de2f7 [autochunk] support vit (#3084)
support vit for autochunk
* support some new ops for vit
* fix some bugs
* add test for vit
2023-03-10 10:23:26 +08:00
YuliangLiu0306
8e4e8601b7 [DTensor] implement layout converter (#3055)
* [DTensor] refactor LayoutConverter for DTensor

* polish code

* polish docstring
2023-03-10 09:53:52 +08:00
Frank Lee
f19b49e164 [booster] init module structure and definition (#3056) 2023-03-09 11:27:46 +08:00
Xuanlei Zhao
2ca9728cbb [autochunk] refactor chunk memory estimation (#2762)
* refact memory code

* dont log free var memory

* add memory align

* update chunk target

* update setting for new memory

* finish test

* update tracer

* update typo

* update test
2023-03-08 16:22:30 +08:00
YuliangLiu0306
29386a54e6 [DTensor] refactor CommSpec (#3034) 2023-03-08 10:45:31 +08:00
YuliangLiu0306
cd2b0eaa8d [DTensor] refactor sharding spec (#2987)
* [autoparallel] refactor sharding spec

* rename function name
2023-03-07 11:08:11 +08:00
Ziyue Jiang
400f63012e [pipeline] Add Simplified Alpa DP Partition (#2507)
* add alpa dp split

* add alpa dp split

* use fwd+bwd instead of fwd only

---------

Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2023-03-07 10:34:31 +08:00
Super Daniel
b42d3d28ed [fx] remove depreciated algorithms. (#2312) (#2313) 2023-03-07 10:30:35 +08:00
github-actions[bot]
82503a96f2 [format] applied code formatting on changed files in pull request 2997 (#3008)
Co-authored-by: github-actions <github-actions@github.com>
2023-03-06 10:42:22 +08:00
binmakeswell
52a5078988 [doc] add ISC tutorial (#2997)
* [doc] add ISC tutorial

* [doc] add ISC tutorial

* [doc] add ISC tutorial

* [doc] add ISC tutorial
2023-03-06 10:36:38 +08:00
ver217
823f3b9cf4 [doc] add deepspeed citation and copyright (#2996)
* [doc] add deepspeed citation and copyright

* [doc] add deepspeed citation and copyright

* [doc] add deepspeed citation and copyright
2023-03-04 20:08:11 +08:00
YuliangLiu0306
e414e4092b [DTensor] implementation of dtensor (#2946)
* [DTensor] implementation of dtensor

* test layout convert

* polish
2023-03-01 16:34:58 +08:00
YuliangLiu0306
47fb214b3b [hotfix] add shard dim to aviod backward communication error (#2954) 2023-03-01 11:41:53 +08:00
ver217
090f14fd6b [misc] add reference (#2930)
* [misc] add reference

* [misc] add license
2023-02-28 18:07:24 +08:00
YuliangLiu0306
197d0bf4ed [autoparallel] apply repeat block to reduce solving time (#2912) 2023-02-28 11:03:30 +08:00
YH
a848091141 Fix port exception type (#2925) 2023-02-28 11:00:43 +08:00
zbian
61e687831d fixed using zero with tp cannot access weight correctly 2023-02-28 10:52:30 +08:00
YH
7b13f7db18 [zero] trivial zero optimizer refactoring (#2869)
* Fix mionr grad store interface

* Apply lint
2023-02-27 14:04:53 +08:00
Jiatong (Julius) Han
8c8a39be95 [hotfix]: Remove math.prod dependency (#2837)
* Remove math.prod dependency

* Fix style

* Fix style

---------

Co-authored-by: Jiatong Han <jiatong.han@u.nus.edu>
2023-02-23 23:56:15 +08:00