ColossalAI/docs/source/en/basics/booster_checkpoint.md
Baizhou Zhang 1d454733c4
[doc] Update booster user documents. (#4669)
* update booster_api.md

* update booster_checkpoint.md

* update booster_plugins.md

* move transformers importing inside function

* fix Dict typing

* fix autodoc bug

* small fix
2023-09-12 10:47:23 +08:00

2.0 KiB

Booster Checkpoint

Author: Hongxin Liu

Prerequisite:

Introduction

We've introduced the Booster API in the previous tutorial. In this tutorial, we will introduce how to save and load checkpoints using booster.

Model Checkpoint

{{ autodoc:colossalai.booster.Booster.save_model }}

Model must be boosted by colossalai.booster.Booster before saving. checkpoint is the path to saved checkpoint. It can be a file, if shard=False. Otherwise, it should be a directory. If shard=True, the checkpoint will be saved in a sharded way. This is useful when the checkpoint is too large to be saved in a single file. Our sharded checkpoint format is compatible with huggingface/transformers, so you can use huggingface from_pretrained method to load model from our sharded checkpoint.

{{ autodoc:colossalai.booster.Booster.load_model }}

Model must be boosted by colossalai.booster.Booster before loading. It will detect the checkpoint format automatically, and load in corresponding way.

Optimizer Checkpoint

{{ autodoc:colossalai.booster.Booster.save_optimizer }}

Optimizer must be boosted by colossalai.booster.Booster before saving.

{{ autodoc:colossalai.booster.Booster.load_optimizer }}

Optimizer must be boosted by colossalai.booster.Booster before loading.

LR Scheduler Checkpoint

{{ autodoc:colossalai.booster.Booster.save_lr_scheduler }}

LR scheduler must be boosted by colossalai.booster.Booster before saving. checkpoint is the local path to checkpoint file.

{{ autodoc:colossalai.booster.Booster.load_lr_scheduler }}

LR scheduler must be boosted by colossalai.booster.Booster before loading. checkpoint is the local path to checkpoint file.

Checkpoint design

More details about checkpoint design can be found in our discussion A Unified Checkpoint System Design.