695 Commits

Author SHA1 Message Date
Boyuan Yao
0385b26ebf [autoparallel] Patch meta information of torch.nn.LayerNorm (#2647)
* [autoparallel] layernorm metainfo patch

* [autoparallel] polish test
2023-02-10 14:29:24 +08:00
YuliangLiu0306
37df666f38 [autoparallel] refactor handlers which reshape input tensors (#2615)
* [autoparallel] refactor handlers which reshape input tensors

* polish
2023-02-08 15:02:49 +08:00
YuliangLiu0306
cb3d1bef62 [autoparallel] adapt autoparallel tests with latest api (#2626) 2023-02-08 15:02:12 +08:00
Boyuan Yao
90a9fdd91d [autoparallel] Patch meta information of torch.matmul (#2584)
* [autoparallel] matmul metainfo

* [auto_parallel] remove unused print

* [tests] skip test_matmul_handler when torch version is lower than 1.12.0
2023-02-08 11:05:31 +08:00
oahzxl
6ba8364881 [autochunk] support diffusion for autochunk (#2621)
* add alphafold benchmark

* rename alphafold test

* rename tests

* rename diffuser

* rename

* rename

* update transformer

* update benchmark

* update bench memory

* update transformer benchmark

* rename

* support diffuser

* support unet metainfo prop

* fix bug and simplify code

* update linear and support some op

* optimize max region search, support conv

* update unet test

* support some op

* support groupnorm and interpolate

* update flow search

* add fix dim in node flow

* fix utils

* rename

* support diffusion

* update diffuser

* update chunk search

* optimize imports

* import

* finish autochunk
2023-02-07 16:32:45 +08:00
oahzxl
c4b15661d7 [autochunk] add benchmark for transformer and alphafold (#2543) 2023-02-02 15:06:43 +08:00
oahzxl
05671fcb42 [autochunk] support multi outputs chunk search (#2538)
Support multi-output chunk search. Previously we only supported single-output chunk search. The new strategy is more flexible and improves performance by a large margin; for transformer, it reduces memory by 40% compared with the previous search strategy.

1. rewrite the search strategy to support multi-output chunk search
2. fix many, many bugs
3. update tests
2023-02-01 13:18:51 +08:00
oahzxl
63199c6687 [autochunk] support transformer (#2526) 2023-01-31 16:00:06 +08:00
Frank Lee
b55deb0662 [workflow] only report coverage for changed files (#2524)
* [workflow] only report coverage for changed files

* polish file
2023-01-30 21:28:27 +08:00
HELSON
b528eea0f0 [zero] add zero wrappers (#2523)
* [zero] add zero wrappers

* change names

* add wrapper functions to init
2023-01-29 17:52:58 +08:00
HELSON
077a5cdde4 [zero] fix gradient clipping in hybrid parallelism (#2521)
* [zero] fix gradient clipping in hybrid parallelism

* [testing] change model name to avoid pytest warning

* [hotfix] fix unit testing
2023-01-29 15:09:57 +08:00
HELSON
707b11d4a0 [gemini] update ddp strict mode (#2518)
* [zero] add strict ddp mode for chunk init

* [gemini] update gpt example
2023-01-28 14:35:25 +08:00
HELSON
2d1a7dfe5f [zero] add strict ddp mode (#2508)
* [zero] add strict ddp mode

* [polish] add comments for strict ddp mode

* [zero] fix test error
2023-01-20 14:04:38 +08:00
oahzxl
c04f183237 [autochunk] support parsing blocks (#2506) 2023-01-20 11:18:17 +08:00
oahzxl
72341e65f4 [auto-chunk] support extramsa (#3) (#2504) 2023-01-20 10:13:03 +08:00
oahzxl
ecccc91f21 [autochunk] support autochunk on evoformer (#2497) 2023-01-19 11:41:00 +08:00
HELSON
d565a24849 [zero] add unit testings for hybrid parallelism (#2486) 2023-01-18 10:36:10 +08:00
oahzxl
4953b4ace1 [autochunk] support evoformer tracer (#2485)
Support the full evoformer tracer, a main module of alphafold; previously we only supported a simplified version of it.
1. support some of evoformer's ops in fx
2. support evoformer tests
3. add repos for test code
2023-01-16 19:25:05 +08:00
YuliangLiu0306
67e1912b59 [autoparallel] support original activation ckpt on autoparallel system (#2468) 2023-01-16 16:25:13 +08:00
HELSON
21c88220ce [zero] add unit test for low-level zero init (#2474) 2023-01-15 10:42:01 +08:00
HELSON
a5dc4253c6 [zero] polish low level optimizer (#2473) 2023-01-13 14:56:17 +08:00
Jiarui Fang
867c8c2d3a [zero] low level optim supports ProcessGroup (#2464) 2023-01-13 10:05:58 +08:00
YuliangLiu0306
8221fd7485 [autoparallel] update binary elementwise handler (#2451)
* [autoparallel] update binary elementwise handler

* polish
2023-01-12 09:35:10 +08:00
HELSON
5521af7877 [zero] fix state_dict and load_state_dict for ddp ignored parameters (#2443)
* [ddp] add is_ddp_ignored

[ddp] rename to is_ddp_ignored

* [zero] fix state_dict and load_state_dict

* fix bugs

* [zero] update unit test for ZeroDDP
2023-01-11 14:55:41 +08:00
YuliangLiu0306
41429b9b28 [autoparallel] add shard option (#2423) 2023-01-11 13:40:33 +08:00
HELSON
bb4e9a311a [zero] add inference mode and its unit test (#2418) 2023-01-11 10:07:37 +08:00
oahzxl
61fdd3464a update doc 2023-01-10 12:29:09 +08:00
oahzxl
36ab2cb783 change import 2023-01-10 12:20:40 +08:00
oahzxl
7ab2db206f adapt new fx 2023-01-10 11:56:00 +08:00
oahzxl
e532679c95 Merge branch 'main' of https://github.com/oahzxl/ColossalAI into chunk 2023-01-10 11:29:01 +08:00
oahzxl
c1492e5013 add test in import 2023-01-10 11:20:28 +08:00
HELSON
ea13a201bb [polish] polish code for get_static_torch_model (#2405)
* [gemini] polish code

* [testing] remove code

* [gemini] make more robust
2023-01-09 17:41:38 +08:00
oahzxl
212b5b1b5f add comments 2023-01-09 16:29:33 +08:00
oahzxl
aafc3516a5 add available 2023-01-09 15:32:19 +08:00
oahzxl
d5c4f0bf95 code style 2023-01-09 15:22:09 +08:00
oahzxl
d106b271f8 add chunk search test 2023-01-09 15:19:08 +08:00
oahzxl
a005965d2d update codegen test 2023-01-09 14:57:47 +08:00
oahzxl
3abbaf8bc6 update codegen test 2023-01-09 14:53:04 +08:00
oahzxl
74b81395a2 update codegen test 2023-01-09 14:26:22 +08:00
oahzxl
18a51c87fe rename test 2023-01-09 14:20:54 +08:00
oahzxl
cb68ee864a set benchmark 2023-01-09 14:20:41 +08:00
Jiarui Fang
4e96039649 [device] find best logical mesh 2023-01-07 14:04:30 +08:00
Frank Lee
40d376c566 [setup] support pre-build and jit-build of cuda kernels (#2374)
* [setup] support pre-build and jit-build of cuda kernels

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
2023-01-06 20:50:26 +08:00
oahzxl
a6cdbf9161 separate trace flow 2023-01-06 17:24:23 +08:00
oahzxl
da4076846d rename 2023-01-06 17:09:37 +08:00
oahzxl
fd87d78a28 rename ambiguous variable 2023-01-06 14:28:04 +08:00
oahzxl
8a634af2f5 close mem and code print 2023-01-06 14:19:45 +08:00
oahzxl
1a6d2a740b take apart chunk code gen 2023-01-06 14:14:45 +08:00
HELSON
48d33b1b17 [gemini] add get static torch model (#2356) 2023-01-06 13:41:19 +08:00
oahzxl
d1f0773182 rename 2023-01-06 11:48:33 +08:00