Baizhou Zhang
c9625dbb63
[shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540)
* implement sharded optimizer saving
* add more param info
* finish implementation of sharded optimizer saving
* fix bugs in optimizer sharded saving
* add pp+zero test
* param group loading
* greedy loading of optimizer
* fix bug when loading
* implement optimizer sharded saving
* add optimizer test & arrange checkpointIO utils
* fix gemini sharding state_dict
* add verbose option
* add loading of master params
* fix typehint
* fix master/working mapping in fp16 amp
2023-08-31 14:50:47 +08:00