Booster Checkpoint
Author: Hongxin Liu
Prerequisite: Booster API
Introduction
We introduced the Booster API in the previous tutorial. In this tutorial, we will show how to save and load checkpoints using Booster.
Model Checkpoint
{{ autodoc:colossalai.booster.Booster.save_model }}
Model must be boosted by colossalai.booster.Booster before saving. checkpoint is the path to the saved checkpoint: a file path if shard=False, and a directory path otherwise. If shard=True, the checkpoint is saved in a sharded way, which is useful when the checkpoint is too large to fit in a single file. Our sharded checkpoint format is compatible with huggingface/transformers.
{{ autodoc:colossalai.booster.Booster.load_model }}
Model must be boosted by colossalai.booster.Booster before loading. The checkpoint format (single-file or sharded) is detected automatically and loaded accordingly.
Optimizer Checkpoint
{{ autodoc:colossalai.booster.Booster.save_optimizer }}
Optimizer must be boosted by colossalai.booster.Booster before saving.
{{ autodoc:colossalai.booster.Booster.load_optimizer }}
Optimizer must be boosted by colossalai.booster.Booster before loading.
LR Scheduler Checkpoint
{{ autodoc:colossalai.booster.Booster.save_lr_scheduler }}
LR scheduler must be boosted by colossalai.booster.Booster before saving. checkpoint is the local path to the checkpoint file.
{{ autodoc:colossalai.booster.Booster.load_lr_scheduler }}
LR scheduler must be boosted by colossalai.booster.Booster before loading. checkpoint is the local path to the checkpoint file.
Checkpoint design
More details about checkpoint design can be found in our discussion A Unified Checkpoint System Design.