[checkpointio] support unsharded checkpointIO for hybrid parallel (#4774)

* support unsharded saving/loading for model

* support unsharded saving for optimizer

* update doc

* support unsharded loading for optimizer

* small fix
Baizhou Zhang
2023-09-26 10:58:03 +08:00
committed by GitHub
parent a2db75546d
commit 64a08b2dc3
4 changed files with 197 additions and 28 deletions


@@ -74,8 +74,6 @@ This plugin implements the combination of various parallel training strategies a
 > ⚠ When using this plugin, only the subset of Huggingface transformers supported by Shardformer are compatible with tensor parallel, pipeline parallel and optimization tools. Mainstream transformers such as Llama 1, Llama 2, OPT, Bloom, Bert and GPT2 etc. are all supported by Shardformer.
-> ⚠ This plugin only supports sharded checkpointing methods for model/optimizer at present. Unsharded checkpointing methods will be supported in future release.
 {{ autodoc:colossalai.booster.plugin.HybridParallelPlugin }}
 ### Torch DDP Plugin
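
Below is a minimal sketch of how the unsharded checkpointing enabled by this commit might be used through the Booster interface with `HybridParallelPlugin`. It is not taken from this diff: the `shard=False` flag, the plugin arguments (`tp_size`, `pp_size`), and the checkpoint file names are assumptions based on the existing Booster checkpoint API.

```python
# Hypothetical usage sketch; assumes a torchrun-launched distributed environment
# with enough GPUs for tp_size * pp_size, and the Booster checkpoint API with a
# `shard` switch (assumed) to choose unsharded (single-file) checkpoints.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from torch.optim import Adam
from transformers import GPT2Config, GPT2LMHeadModel

colossalai.launch_from_torch(config={})

# Tensor parallelism across 2 ranks, no pipeline parallelism (example values).
plugin = HybridParallelPlugin(tp_size=2, pp_size=1)
booster = Booster(plugin=plugin)

model = GPT2LMHeadModel(GPT2Config())
optimizer = Adam(model.parameters(), lr=1e-3)
model, optimizer, *_ = booster.boost(model, optimizer)

# Save model and optimizer states as single unsharded checkpoint files.
booster.save_model(model, "model.pt", shard=False)
booster.save_optimizer(optimizer, "optimizer.pt", shard=False)

# Load them back into the boosted (sharded) model and optimizer.
booster.load_model(model, "model.pt")
booster.load_optimizer(optimizer, "optimizer.pt")
```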