Commit Graph

152 Commits

Author SHA1 Message Date
binmakeswell
1f8ab6f1f5 [NFC] polish code format (#2367) 2023-01-06 15:34:48 +08:00
ziyuhuang123
7080a8edb0 [workflow] New version: Create workflow files for examples' auto check (#2298)
* [workflows]bug_repair

* [workflow]new_pr_fixing_bugs

Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2023-01-06 09:26:49 +08:00
YuliangLiu0306
9c9246c0d9 [device] alpha beta profiler (#2311)
* [device] alpha beta profiler

* add usage

* fix variable name
2023-01-05 16:39:55 +08:00
Ofey Chan
87d2defda6 [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/layer_norm_handler.py code style (#2305) 2023-01-04 15:09:57 +08:00
Zangwei Zheng
d1e5bafcd4 [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/__init__.py code style (#2291) 2023-01-04 15:09:57 +08:00
shenggan
950685873f [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/reshape_handler.py code style (#2292) 2023-01-04 15:09:57 +08:00
Zirui Zhu
1c29b173c9 [NFC] polish colossalai/auto_parallel/tensor_shard/node_handler/getitem_handler.py code style (#2289) 2023-01-04 15:09:57 +08:00
Boyuan Yao
d45695d94e Merge pull request #2258 from hpcaitech/debug/ckpt-autoparallel
[autockpt] provide option for activation checkpoint search in SPMD solver
2023-01-04 11:37:28 +08:00
Boyuan Yao
b904748210 [autoparallel] bypass MetaInfo when unavailable and modify BCAST_FUNC_OP metainfo (#2293)
* [autoparallel] align the data_ptr with the old version of auto activation checkpoint pipeline

* [autoparallel] using fwd_time and bwd_time instead of fwd_flop and bwd_flop

* [autoparallel] specify comm nodes' memory cost in construct chain

* [autoparallel] fix wrong runtime apply calculation

* [autoparallel] fix wrong runtime apply calculation

* [autoparallel] fix wrong runtime apply calculation

* [autoparallel] bypass metainfo when unavailable and modify BCAST_FUNC_OP
2023-01-03 20:28:01 +08:00
Super Daniel
8ea50d999e [hotfix] pass a parameter. (#2288)
* [autockpt] make it work.

* [autockpt] linearize / merge shape-consistency nodes.

* [autockpt] considering parameter and optimizer weights.

* [hotfix] pass a parameter.
2023-01-03 18:05:06 +08:00
Super Daniel
8e8900ff3f [autockpt] considering parameter and optimizer weights. (#2279)
* [autockpt] make it work.

* [autockpt] linearize / merge shape-consistency nodes.

* [autockpt] considering parameter and optimizer weights.
2023-01-03 16:55:49 +08:00
YuliangLiu0306
fb87322773 [autoparallel] fix spelling error (#2270) 2023-01-03 16:13:00 +08:00
Super Daniel
b0d21d0c4f [autockpt] linearize / merge shape-consistency nodes. (#2271)
* [autockpt] make it work.

* [autockpt] linearize / merge shape-consistency nodes.
2023-01-03 14:54:22 +08:00
YuliangLiu0306
4b29112ab2 [autoparallel] gpt2 autoparallel examples (#2267)
* [autoparallel] gpt2 autoparallel examples

* polish code

* polish code
2023-01-03 14:23:33 +08:00
Boyuan Yao
5c2ef9fc76 [autoparallel] modify comm nodes' memory cost in construct chain (#2263)
* [autoparallel] align the data_ptr with the old version of auto activation checkpoint pipeline

* [autoparallel] using fwd_time and bwd_time instead of fwd_flop and bwd_flop

* [autoparallel] specify comm nodes' memory cost in construct chain
2023-01-03 11:38:48 +08:00
Boyuan Yao
1ea99b869e [autoparallel] align the data_ptr with the old version of auto activation checkpoint pipeline (#2261) 2023-01-03 10:30:15 +08:00
Super Daniel
3ccf58aa76 [autockpt] make it work. (#2257) 2023-01-02 23:37:45 +08:00
Boyuan Yao
ac3739930d [autoparallel] modify construct chain in rotor solver (#2254) 2023-01-02 16:26:12 +08:00
Boyuan Yao
ab38aebace [autoparallel] Hook all meta information on ResNet nodes for auto activation checkpoint (#2248)
* [autoparallel] hook node meta on graph nodes for checkpoint solver

* [autoparallel] polish code

* [autoparallel] restore some node handlers

* colossalai/auto_parallel/passes/meta_info_prop.py

* [autoparallel] remove some unused import

* [autoparallel] hook bwd_mem_out
2023-01-02 16:25:18 +08:00
Boyuan Yao
c8c79102f0 [autoparallel] patch torch.flatten metainfo for autoparallel (#2247)
* [autoparallel] patch torch.flatten
2023-01-02 15:51:03 +08:00
YuliangLiu0306
8897b8f753 [autoparallel] autoparallel initialize (#2238) 2022-12-31 01:02:14 +08:00
Super Daniel
b7d0990c61 [autoparallel] fix construct meta info. (#2245) 2022-12-30 19:56:44 +08:00
YuliangLiu0306
3b1b91eaf4 [autoparallel] record parameter attribute in colotracer (#2217)
* [autoparallel] record parameter attribute in colotracer

* [autoparallel] fix construct_meta_info bug
2022-12-28 19:29:08 +08:00
Boyuan Yao
24246f7aa5 [autoparallel] Attach input, buffer and output tensor to MetaInfo class (#2162)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler

* [fx] add relu metainfo class

* [fx] restore profiler

* [autoparallel] modify metainfo input

* [autoparallel] add pooling metainfo

* [autoparallel] add F.linear metainfo generator

* [autoparallel] add binary elementwise metainfo

* [fx] recover profiler

* [autoparallel] fix forward memory calculation

* [autoparallel] modify constants.py

* [autoparallel] remove redundant print

* [autoparallel] add F.conv metainfo

* [autoparallel] linear fix

* [autoparallel] memory estimation for communication actions

* [autoparallel] fix docstring

* [autoparallel] fix variables name

* [autoparallel] attach tensor to metainfo class

* [autoparallel] fix dangerous try except

* [autoparallel] attach memory cost to shape consistency node

* [autoparallel] attach shape consistency node's metainfo to the node

* [autoparallel] remove todo in shape consistency memory estimation

* [autoparallel] fix the annotation
2022-12-28 13:37:40 +08:00
Boyuan Yao
d0bc5a1b34 [autoparallel] new metainfoprop based on metainfo class (#2179)
* [autoparallel] new metainfoprop to combine SPMD solver and checkpoint solver

* [autoparallel] new metainfoprop to combine SPMD solver and checkpoint solver

* [autoparallel] modify placeholder handler

* [autoparallel] modify metainfoprop

* [autoparallel] fix function typo

* [autoparallel] fix placeholder handler
2022-12-28 13:35:08 +08:00
YuliangLiu0306
78509124d3 [autoparallel] update getitem handler (#2207) 2022-12-27 19:58:32 +08:00
YuliangLiu0306
4851f2d607 [autoparallel] update_getattr_handler (#2193) 2022-12-26 21:57:39 +08:00
YuliangLiu0306
550f8f8905 [autoparallel] integrate_gpt_related_tests (#2134)
* [autoparallel] integrate_gpt_related_tests

* polish code

* polish code

* add GPT2Model into runtime test
2022-12-23 12:36:59 +08:00
Boyuan Yao
cfe2a9bd90 [autoparallel] memory estimation for shape consistency (#2144)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler

* [fx] add relu metainfo class

* [fx] restore profiler

* [autoparallel] modify metainfo input

* [autoparallel] add pooling metainfo

* [autoparallel] add F.linear metainfo generator

* [autoparallel] add binary elementwise metainfo

* [fx] recover profiler

* [autoparallel] fix forward memory calculation

* [autoparallel] modify constants.py

* [autoparallel] remove redundant print

* [autoparallel] add F.conv metainfo

* [autoparallel] linear fix

* [autoparallel] memory estimation for communication actions

* [autoparallel] fix docstring

* [autoparallel] fix variables name
2022-12-21 10:39:37 +08:00
YuliangLiu0306
1cce6e36ca [autoparallel] use metainfo in handler (#2149) 2022-12-20 10:31:22 +08:00
YuliangLiu0306
a3c6924deb [autoparallel] process size nodes in runtime pass (#2130)
* [autoparallel] process size nodes in runtime pass

* polish code
2022-12-14 16:10:50 +08:00
YuliangLiu0306
536560ccc0 [autoparallel] implement softmax handler (#2132) 2022-12-14 16:09:53 +08:00
YuliangLiu0306
cd0af9f7f6 [autoparallel] gpt2lp runtime test (#2113) 2022-12-12 18:06:40 +08:00
YuliangLiu0306
d3d4630495 [autoparallel] add sum handler (#2101) 2022-12-08 17:02:54 +08:00
YuliangLiu0306
3af7e65dea [autoparallel] complete gpt related module search (#2097) 2022-12-08 10:04:09 +08:00
YuliangLiu0306
7f72eb0510 [autoparallel] add embedding handler (#2089)
* [autoparallel] add embedding handler

* fix bugs
2022-12-07 09:41:46 +08:00
YuliangLiu0306
0e9db368ef [autoparallel] add tensor constructor handler (#2082) 2022-12-06 10:20:10 +08:00
YuliangLiu0306
cdf537a648 [autoparallel] add non_split linear strategy (#2078)
* [autoparallel] add non_split linear strategy

* polish
2022-12-06 10:19:33 +08:00
Boyuan Yao
cf0268da93 [autoparallel] Add F.conv metainfo (#2069)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler

* [fx] add relu metainfo class

* [fx] restore profiler

* [autoparallel] modify metainfo input

* [autoparallel] add pooling metainfo

* [autoparallel] add F.linear metainfo generator

* [autoparallel] add binary elementwise metainfo

* [fx] recover profiler

* [autoparallel] fix forward memory calculation

* [autoparallel] modify constants.py

* [autoparallel] remove redundant print

* [autoparallel] add F.conv metainfo

* [autoparallel] linear fix
2022-12-06 10:17:57 +08:00
YuliangLiu0306
f123476666 [autoparallel] complete gpt block searching (#2065)
* [autoparallel] complete gpt block searching

* fix test
2022-12-06 10:17:10 +08:00
Boyuan Yao
616da17fab [autoparallel] add binary elementwise metainfo for auto parallel (#2058)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler

* [fx] add relu metainfo class

* [fx] restore profiler

* [autoparallel] modify metainfo input

* [autoparallel] add pooling metainfo

* [autoparallel] add F.linear metainfo generator

* [autoparallel] add binary elementwise metainfo

* [fx] recover profiler

* [autoparallel] fix forward memory calculation

* [autoparallel] modify constants.py

* [autoparallel] remove redundant print
2022-12-04 15:18:51 +08:00
Boyuan Yao
4b40fbd743 [autoparallel] fix forward memory calculation (#2062) 2022-12-04 15:00:16 +08:00
YuliangLiu0306
e4293e5077 [hotfix] update test for latest version (#2060) 2022-12-02 18:12:30 +08:00
YuliangLiu0306
1c1fe44305 [autoparallel] adapt solver with self attention (#2037)
* [autoparallel] adapt solver with self attention

* polish code
2022-12-01 17:53:15 +08:00
YuliangLiu0306
0dbcd4a6f5 [autoparallel] add split handler (#2032)
* [autoparallel] add split handler

* add numerical test and runtime passes
2022-11-29 11:03:51 +08:00
YuliangLiu0306
81330b0352 [autoparallel] add experimental permute handler (#2029) 2022-11-27 20:26:52 +08:00
YuliangLiu0306
ea0f6b8df9 [autoparallel] add runtime pass and numerical test for view handler (#2018) 2022-11-25 15:50:16 +08:00
YuliangLiu0306
1438993113 [autoparallel] add experimental view handler (#2011)
* [autoparallel] add experimental view handler

* polish

* polish

* polish code

* rename variables
2022-11-24 11:34:41 +08:00
Boyuan Yao
6cd784ffee [autoparallel] Add metainfo support for F.linear (#1987)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler

* [fx] add relu metainfo class

* [fx] restore profiler

* [autoparallel] modify metainfo input

* [autoparallel] add pooling metainfo

* [autoparallel] add F.linear metainfo generator
2022-11-23 14:12:34 +08:00
YuliangLiu0306
155891113e [autoparallel] use pytree map style to process data (#1989) 2022-11-21 10:44:22 +08:00