Commit Graph

1419 Commits

Author | SHA1 | Message | Date
YuliangLiu0306
c7eca40f51 Merge pull request #812 from FrankLeeeee/feature/cli
[cli] fixed single-node process launching
2022-04-20 11:40:07 +08:00
Jiarui Fang
3ddbd1bce1 [gemini] collect cpu-gpu moving volume in each iteration (#813) 2022-04-20 11:29:48 +08:00
FrankLeeeee
d522cb704e [cli] fixed single-node process launching 2022-04-20 10:46:51 +08:00
Jiarui Fang
61c20b44bc [log] local throughput metrics (#811)
* Revert "[zero] add ZeroTensorShardStrategy (#793)"

This reverts commit 88759e289e.

* [gemini] set cpu memory capacity

* [log] local throughput collecting

* polish

* polish

* polish

* polish code

* polish
2022-04-20 10:05:39 +08:00
ver217
dd92b90a68 [DO NOT MERGE] [zero] init fp16 params directly in ZeroInitContext (#808)
* init fp16 param directly

* polish code
2022-04-19 16:16:48 +08:00
Jiarui Fang
227d1cd4b3 [gemini] APIs to set cpu memory capacity (#809) 2022-04-19 16:05:22 +08:00
FrankLeeeee
f63e91d280 [cli] fixed a bug in user args and refactored the module structure 2022-04-19 15:15:16 +08:00
Jiarui Fang
e761ad2cd7 Revert "[zero] add ZeroTensorShardStrategy (#793)" (#806) 2022-04-19 14:40:02 +08:00
HELSON
88759e289e [zero] add ZeroTensorShardStrategy (#793) 2022-04-19 14:32:45 +08:00
Jiarui Fang
681addb512 [refactor] moving grad acc logic to engine (#804) 2022-04-19 14:03:21 +08:00
Frank Lee
05d9ae5999 [cli] add missing requirement (#805) 2022-04-19 13:56:59 +08:00
YuliangLiu0306
de2f581d43 [cli] added micro benchmarking for tp (#789)
* [CLI] add CLI launcher

* Revert "[CLI] add CLI launcher"

This reverts commit df7e6506d4.

* [CLI]add cli benchmark feature

* fix CodeFactor issues.

* refactor the module structure.
2022-04-19 12:08:28 +08:00
YuliangLiu0306
cfadc9df8e [cli] added distributed launcher command (#791)
* [CLI] add CLI launcher

* Revert "[CLI] add CLI launcher"

This reverts commit df7e6506d4.

* [CLI]add cli launcher feature

* remove testing message used during developing

* refactor the module structure.
2022-04-19 10:59:44 +08:00
Jiarui Fang
4d9332b4c5 [refactor] moving memtracer to gemini (#801) 2022-04-19 10:13:08 +08:00
Jiarui Fang
8711c706f4 [hotfix] fix grad offload when enabling reuse_fp16_shard 2022-04-18 14:58:21 +08:00
ver217
f1fa1a675f fix grad offload when enabling reuse_fp16_shard 2022-04-18 14:07:39 +08:00
HELSON
4c4388c46e [hotfix] fix memory leak in zero (#781) 2022-04-18 13:57:03 +08:00
Ziyue Jiang
4b01da24cd [TP] change the check assert in split batch 2d (#772) 2022-04-16 21:29:57 +08:00
ver217
846406a07a [gemini] fix auto tensor placement policy (#775) 2022-04-16 21:29:31 +08:00
HELSON
a65cbb7e4e [zero] refactor shard and gather operation (#773) 2022-04-15 14:41:31 +08:00
ver217
6e553748a7 polish sharded optim docstr and warning (#770) 2022-04-14 21:03:59 +08:00
LuGY
80e37eec42 fix the ckpt bugs when using DDP (#769) 2022-04-14 21:03:24 +08:00
Frank Lee
920fe31526 [compatibility] used backward-compatible API for global process group (#758) 2022-04-14 17:20:35 +08:00
Frank Lee
4ea49cb536 [test] added a decorator for address already in use error with backward compatibility (#760)
* [test] added a decorator for address already in use error with backward compatibility

* [test] added a decorator for address already in use error with backward compatibility
2022-04-14 16:48:44 +08:00
Jiarui Fang
10ef8afdd2 [gemini] init genimi individual directory (#754) 2022-04-14 16:40:26 +08:00
ver217
dcca614eee [hotfix] fix test_stateful_tensor_mgr (#762) 2022-04-14 15:50:09 +08:00
ver217
a93a7d7364 [hotfix] fix reuse_fp16_shard of sharded model (#756)
* fix reuse_fp16_shard

* disable test stm

* polish code
2022-04-14 14:56:46 +08:00
ver217
8f7ce94b8e [hotfix] fix auto tensor placement policy (#753) 2022-04-14 12:04:45 +08:00
HELSON
84c6700b2a [zero] refactor memstats_collector (#746) 2022-04-14 12:01:12 +08:00
アマデウス
b8899e0905 [TP] allow layernorm without bias (#750) 2022-04-14 11:43:56 +08:00
Jiarui Fang
3d7dc46d33 [zero] use factory pattern for tensor_placement_policy (#752) 2022-04-14 11:07:29 +08:00
ver217
4b048a8728 fix prepare grads in sharded optim (#749) 2022-04-13 22:36:11 +08:00
ver217
097772546e fix initialize about zero 2022-04-13 19:10:21 +08:00
ver217
e396bb71f2 [zero] add tensor placement policies (#743)
* add tensor placement policies

* polish comments

* polish comments

* update moe unit tests
2022-04-13 15:00:48 +08:00
HELSON
22c4b88d56 [zero] refactor ShardedParamV2 for convenience (#742) 2022-04-13 14:54:26 +08:00
HELSON
340e59f968 [utils] add synchronized cuda memory monitor (#740) 2022-04-13 10:50:54 +08:00
ver217
e6212f56cd [hotfix] fix memory leak in backward of sharded model (#741) 2022-04-13 09:59:05 +08:00
Frank Lee
a4e91bc87f [bug] fixed grad scaler compatibility with torch 1.8 (#735) 2022-04-12 16:04:21 +08:00
Jiarui Fang
53cb584808 [utils] correct cpu memory used and capacity in the context of multi-process (#726) 2022-04-12 14:57:54 +08:00
Jiarui Fang
7db3ccc79b [hotfix] remove duplicated param register to stateful tensor manager (#728) 2022-04-12 13:55:25 +08:00
Frank Lee
1cb7bdad3b [util] fixed communication API depth with PyTorch 1.9 (#721) 2022-04-12 09:44:40 +08:00
Frank Lee
2412429d54 [util] fixed activation checkpointing on torch 1.9 (#719) 2022-04-12 09:35:45 +08:00
Frank Lee
04ff5ea546 [utils] support detection of number of processes on current node (#723) 2022-04-12 09:28:19 +08:00
Jiarui Fang
4d90a7b513 [refactor] zero directory (#724) 2022-04-11 23:13:02 +08:00
Jiarui Fang
193dc8dacb [refactor] refactor the memory utils (#715) 2022-04-11 16:47:57 +08:00
HELSON
dbd96fe90a [zero] check whether gradients have inf and nan in gpu (#712) 2022-04-11 15:40:13 +08:00
ver217
715b86eadd [hotfix] fix stm cuda model data size (#710) 2022-04-11 15:10:39 +08:00
LuGY
140263a394 [hotfix]fixed bugs of assigning grad states to non leaf nodes (#711)
* fixed bugs of assigning grad states to non leaf nodes

* use detach()
2022-04-11 14:04:58 +08:00
Frank Lee
eda30a058e [compatibility] fixed tensor parallel compatibility with torch 1.9 (#700) 2022-04-11 13:44:50 +08:00
HELSON
a9b8300d54 [zero] improve adaptability for not-shard parameters (#708)
* adapt post grad hooks for not-shard parameters

* adapt optimizer for not-shard parameters

* offload gradients for not-replicated parameters
2022-04-11 13:38:51 +08:00