Commit Graph

1867 Commits

Author SHA1 Message Date
HELSON
707b11d4a0 [gemini] update ddp strict mode (#2518)
* [zero] add strict ddp mode for chunk init

* [gemini] update gpt example
2023-01-28 14:35:25 +08:00
HELSON
2d1a7dfe5f [zero] add strict ddp mode (#2508)
* [zero] add strict ddp mode

* [polish] add comments for strict ddp mode

* [zero] fix test error
2023-01-20 14:04:38 +08:00
oahzxl
c04f183237 [autochunk] support parsing blocks (#2506) 2023-01-20 11:18:17 +08:00
Super Daniel
35c0c0006e [utils] lazy init. (#2148)
* [utils] lazy init.

* [utils] remove description.

* [utils] complete.

* [utils] finalize.

* [utils] fix names.
2023-01-20 10:49:00 +08:00
oahzxl
72341e65f4 [auto-chunk] support extramsa (#3) (#2504) 2023-01-20 10:13:03 +08:00
Ziyue Jiang
0f02b8c6e6 add avg partition (#2483)
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2023-01-19 13:54:50 +08:00
アマデウス
99d9713b02 Revert "Update parallel_context.py (#2408)"
This reverts commit 7d5640b9db.
2023-01-19 12:27:48 +08:00
oahzxl
ecccc91f21 [autochunk] support autochunk on evoformer (#2497) 2023-01-19 11:41:00 +08:00
oahzxl
5db3a5bf42 [fx] allow control of ckpt_codegen init (#2498)
* [fx] allow control of ckpt_codegen init

Currently in ColoGraphModule, ActivationCheckpointCodeGen is set automatically in __init__, which prevents any other codegen from being used.
So I add an arg to control whether ActivationCheckpointCodeGen is set in __init__.

* code style
2023-01-18 17:02:46 +08:00
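The idea described in the commit above can be sketched as follows. This is a hypothetical illustration of the pattern, not ColossalAI's actual API: the class and attribute names (`GraphModuleSketch`, `use_ckpt_codegen`, `codegen`) are invented for the example, and only the design (an `__init__` flag deciding whether the checkpoint codegen is installed, leaving room for a custom one) comes from the commit message.

```python
# Hypothetical sketch, not the real ColoGraphModule: an __init__ flag
# controls whether the activation-checkpoint codegen is installed.

class DefaultCodeGen:
    name = "default"

class ActivationCheckpointCodeGen:
    name = "activation_checkpoint"

class GraphModuleSketch:
    def __init__(self, use_ckpt_codegen: bool = True):
        # Install the checkpoint codegen only when requested, so a
        # caller who passes False can assign a custom codegen later.
        if use_ckpt_codegen:
            self.codegen = ActivationCheckpointCodeGen()
        else:
            self.codegen = DefaultCodeGen()

gm = GraphModuleSketch(use_ckpt_codegen=False)
print(gm.codegen.name)  # -> default
```

With the flag defaulting to True, existing callers keep the old automatic behavior while new callers can opt out.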
HELSON
d565a24849 [zero] add unit testings for hybrid parallelism (#2486) 2023-01-18 10:36:10 +08:00
oahzxl
4953b4ace1 [autochunk] support evoformer tracer (#2485)
support the full evoformer tracer, a main module of AlphaFold; previously we only supported a simplified version of it.
1. support some evoformer's op in fx
2. support evoformer test
3. add repos for test code
2023-01-16 19:25:05 +08:00
YuliangLiu0306
67e1912b59 [autoparallel] support origin activation ckpt on autoparallel system (#2468) 2023-01-16 16:25:13 +08:00
Ziyue Jiang
fef5c949c3 polish pp middleware (#2476)
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2023-01-13 16:56:01 +08:00
HELSON
a5dc4253c6 [zero] polish low level optimizer (#2473) 2023-01-13 14:56:17 +08:00
Frank Lee
8b7495dd54 [example] integrate seq-parallel tutorial with CI (#2463) 2023-01-13 14:40:05 +08:00
Jiarui Fang
867c8c2d3a [zero] low level optim supports ProcessGroup (#2464) 2023-01-13 10:05:58 +08:00
Frank Lee
14d9299360 [cli] fixed hostname mismatch error (#2465) 2023-01-12 14:52:09 +08:00
Haofan Wang
9358262992 Fix False warning in initialize.py (#2456)
* Update initialize.py

* pre-commit run check
2023-01-12 13:49:01 +08:00
YuliangLiu0306
8221fd7485 [autoparallel] update binary elementwise handler (#2451)
* [autoparallel] update binary elementwise handler

* polish
2023-01-12 09:35:10 +08:00
HELSON
2bfeb24308 [zero] add warning for ignored parameters (#2446) 2023-01-11 15:30:09 +08:00
Frank Lee
39163417a1 [example] updated the hybrid parallel tutorial (#2444)
* [example] updated the hybrid parallel tutorial

* polish code
2023-01-11 15:17:17 +08:00
HELSON
5521af7877 [zero] fix state_dict and load_state_dict for ddp ignored parameters (#2443)
* [ddp] add is_ddp_ignored

[ddp] rename to is_ddp_ignored

* [zero] fix state_dict and load_state_dict

* fix bugs

* [zero] update unit test for ZeroDDP
2023-01-11 14:55:41 +08:00
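The fix above can be illustrated with a minimal sketch. This is not ColossalAI's code; the `Param` class and the `ddp_ignored` attribute are invented stand-ins for the real `is_ddp_ignored` check, and only the idea (parameters flagged as DDP-ignored are excluded when building the state dict) comes from the commit.

```python
# Hypothetical sketch of the fix: skip DDP-ignored parameters when
# building a state dict, so ZeroDDP only saves what it manages.

class Param:
    def __init__(self, value, ddp_ignored=False):
        self.value = value
        self.ddp_ignored = ddp_ignored

def is_ddp_ignored(param):
    # Stand-in for the real is_ddp_ignored helper added in #2434.
    return param.ddp_ignored

def state_dict(params):
    # Keep only parameters that DDP actually manages.
    return {name: p.value for name, p in params.items()
            if not is_ddp_ignored(p)}

params = {"weight": Param([1.0]), "aux": Param([2.0], ddp_ignored=True)}
print(state_dict(params))  # -> {'weight': [1.0]}
```

The matching `load_state_dict` would apply the same filter, so a checkpoint saved without the ignored parameters loads cleanly.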
YuliangLiu0306
2731531bc2 [autoparallel] integrate device mesh initialization into autoparallelize (#2393)
* [autoparallel] integrate device mesh initialization into autoparallelize

* add megatron solution

* update gpt autoparallel examples with latest api

* adapt beta value to fit the current computation cost
2023-01-11 14:03:49 +08:00
Frank Lee
c72c827e95 [cli] provided more details if colossalai run fail (#2442) 2023-01-11 13:56:42 +08:00
Super Daniel
c41e59e5ad [fx] allow native ckpt trace and codegen. (#2438) 2023-01-11 13:49:59 +08:00
YuliangLiu0306
41429b9b28 [autoparallel] add shard option (#2423) 2023-01-11 13:40:33 +08:00
HELSON
7829aa094e [ddp] add is_ddp_ignored (#2434)
[ddp] rename to is_ddp_ignored
2023-01-11 12:22:45 +08:00
HELSON
bb4e9a311a [zero] add inference mode and its unit test (#2418) 2023-01-11 10:07:37 +08:00
Jiarui Fang
93f62dd152 [autochunk] add autochunk feature 2023-01-10 16:04:42 +08:00
HELSON
dddacd2d2c [hotfix] add norm clearing for the overflow step (#2416) 2023-01-10 15:43:06 +08:00
oahzxl
7ab2db206f adapt new fx 2023-01-10 11:56:00 +08:00
oahzxl
e532679c95 Merge branch 'main' of https://github.com/oahzxl/ColossalAI into chunk 2023-01-10 11:29:01 +08:00
Haofan Wang
7d5640b9db Update parallel_context.py (#2408) 2023-01-10 11:27:23 +08:00
oahzxl
fd818cf144 change imports 2023-01-10 11:10:45 +08:00
oahzxl
a591d45b29 add available 2023-01-10 10:56:39 +08:00
oahzxl
615e7e68d9 update doc 2023-01-10 10:44:07 +08:00
oahzxl
7d4abaa525 add doc 2023-01-10 09:59:47 +08:00
oahzxl
1be0ac3cbf add doc for trace indice 2023-01-09 17:59:52 +08:00
oahzxl
0b6af554df remove useless function 2023-01-09 17:46:43 +08:00
oahzxl
d914a21d64 rename 2023-01-09 17:45:36 +08:00
oahzxl
865f2e0196 rename 2023-01-09 17:42:25 +08:00
HELSON
ea13a201bb [polish] polish code for get_static_torch_model (#2405)
* [gemini] polish code

* [testing] remove code

* [gemini] make more robust
2023-01-09 17:41:38 +08:00
oahzxl
a4ed5b0d0d rename in doc 2023-01-09 17:41:26 +08:00
oahzxl
1bb1f2ad89 rename 2023-01-09 17:38:16 +08:00
oahzxl
cb9817f75d rename function from index to indice 2023-01-09 17:34:30 +08:00
oahzxl
0ea903b94e rename trace_index to trace_indice 2023-01-09 17:25:13 +08:00
Frank Lee
551cafec14 [doc] updated kernel-related optimisers' docstring (#2385)
* [doc] updated kernel-related optimisers' docstring

* polish doc
2023-01-09 17:13:53 +08:00
oahzxl
065f0b4c27 add doc for search 2023-01-09 17:11:51 +08:00
oahzxl
a68d240ed5 add doc for search chunk 2023-01-09 16:54:08 +08:00
oahzxl
1951f7fa87 code style 2023-01-09 16:30:16 +08:00