[doc] update booster tutorials (#3718)

* [booster] update booster tutorials#3717

* [booster] update booster tutorials#3717, fix

* [booster] update booster tutorials#3717, update setup doc

* [booster] update booster tutorials#3717, update setup doc

* [booster] update booster tutorials#3717, update setup doc

* [booster] update booster tutorials#3717, update setup doc

* [booster] update booster tutorials#3717, update setup doc

* [booster] update booster tutorials#3717, update setup doc

* [booster] update booster tutorials#3717, rename colossalai booster.md

* [booster] update booster tutorials#3717, rename colossalai booster.md

* [booster] update booster tutorials#3717, rename colossalai booster.md

* [booster] update booster tutorials#3717, fix

* [booster] update booster tutorials#3717, fix

* [booster] update tutorials#3717, update booster api doc

* [booster] update tutorials#3717, modify file

* [booster] update tutorials#3717, modify file

* [booster] update tutorials#3717, modify file

* [booster] update tutorials#3717, modify file

* [booster] update tutorials#3717, modify file

* [booster] update tutorials#3717, modify file

* [booster] update tutorials#3717, modify file

* [booster] update tutorials#3717, fix reference link

* [booster] update tutorials#3717, fix reference link

* [booster] update tutorials#3717, fix reference link

* [booster] update tutorials#3717, fix reference link

* [booster] update tutorials#3717, fix reference link

* [booster] update tutorials#3717, fix reference link

* [booster] update tutorials#3717, fix reference link

* [booster] update tutorials#3713

* [booster] update tutorials#3713, modify file
This commit is contained in:
jiangmingyan
2023-05-18 11:41:56 +08:00
committed by GitHub
parent 05759839bd
commit d449525acf
7 changed files with 213 additions and 17 deletions


@@ -0,0 +1,89 @@
# Booster API
Author: [Mingyan Jiang](https://github.com/jiangmingyan)
**Prerequisite:**
- [Distributed Training](../concepts/distributed_training.md)
- [Colossal-AI Overview](../concepts/colossalai_overview.md)
**Example Code**
- [Train with Booster](https://github.com/hpcaitech/ColossalAI/blob/main/examples/tutorial/new_api/cifar_resnet/README.md)
## Introduction
In our new design, `colossalai.booster` replaces the role of `colossalai.initialize` to seamlessly inject features into your training components (e.g. model, optimizer, dataloader). With these new APIs, you can integrate your model with our parallelism features more conveniently. Calling `colossalai.booster` is also the standard procedure before you enter your training loop. The sections below cover how `colossalai.booster` works and what you should take note of.
### Plugin
A plugin is an important component that manages the parallel configuration (e.g. the Gemini plugin encapsulates the Gemini acceleration solution). Currently supported plugins are as follows:
***GeminiPlugin:*** This plugin wraps the Gemini acceleration solution, i.e. ZeRO with chunk-based memory management.
***TorchDDPPlugin:*** This plugin wraps PyTorch's DDP acceleration solution; it implements data parallelism at the module level and can run across multiple machines.
***LowLevelZeroPlugin:*** This plugin wraps stage 1 and stage 2 of the Zero Redundancy Optimizer (ZeRO). Stage 1: shards optimizer states across data-parallel workers/GPUs. Stage 2: shards optimizer states and gradients across data-parallel workers/GPUs.
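A minimal sketch of how a plugin is picked and handed to the booster is shown below; the plugin choices and default arguments here are only illustrative, and exact constructor options may differ across versions:
```python
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin, LowLevelZeroPlugin, TorchDDPPlugin

# pick exactly one plugin for a given training run
plugin = TorchDDPPlugin()               # module-level data parallelism (PyTorch DDP)
# plugin = GeminiPlugin()               # ZeRO with chunk-based memory management
# plugin = LowLevelZeroPlugin(stage=2)  # ZeRO stage 1 or 2

booster = Booster(plugin=plugin)
```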
### API of booster
{{ autodoc:colossalai.booster.Booster }}
{{ autodoc:colossalai.booster.Booster.boost }}
{{ autodoc:colossalai.booster.Booster.backward }}
{{ autodoc:colossalai.booster.Booster.no_sync }}
{{ autodoc:colossalai.booster.Booster.save_model }}
{{ autodoc:colossalai.booster.Booster.load_model }}
{{ autodoc:colossalai.booster.Booster.save_optimizer }}
{{ autodoc:colossalai.booster.Booster.load_optimizer }}
{{ autodoc:colossalai.booster.Booster.save_lr_scheduler }}
{{ autodoc:colossalai.booster.Booster.load_lr_scheduler }}
## Usage
In a typical workflow, you should launch the distributed environment at the beginning of the training script and create the objects you need (such as models, optimizers, loss functions and data loaders) first. Then, call `colossalai.booster` to inject features into these objects. After that, you can use our booster APIs and the returned objects to continue the rest of your training process.
A pseudo-code example is shown below:
```python
import torch
from torch.optim import SGD
from torchvision.models import resnet18
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin
def train(rank, world_size, port):
    # launch the distributed environment; rank, world_size and port are
    # assumed to be provided by your launcher
    colossalai.launch(config=dict(), rank=rank, world_size=world_size, port=port, host='localhost')

    # create the plugin, the booster and the plain training objects
    plugin = TorchDDPPlugin()
    booster = Booster(plugin=plugin)
    model = resnet18()
    criterion = lambda x: x.mean()
    optimizer = SGD(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)

    # boost the training objects to inject the plugin's features
    model, optimizer, criterion, _, scheduler = booster.boost(model, optimizer, criterion, lr_scheduler=scheduler)

    # run one training step; the backward pass must go through the booster
    x = torch.randn(4, 3, 224, 224)
    x = x.to('cuda')
    output = model(x)
    loss = criterion(output)
    booster.backward(loss, optimizer)
    optimizer.clip_grad_by_norm(1.0)
    optimizer.step()
    scheduler.step()

    # save and reload the model with the booster's checkpoint API
    save_path = "./model"
    booster.save_model(model, save_path, True, True, "", 10, use_safetensors=True)

    new_model = resnet18()
    booster.load_model(new_model, save_path)
```
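Continuing the pseudo-code above, the optimizer and lr scheduler can be checkpointed through the same booster APIs listed earlier; the paths below are only illustrative:
```python
# save the optimizer and lr scheduler states (illustrative paths)
booster.save_optimizer(optimizer, "./optimizer")
booster.save_lr_scheduler(scheduler, "./scheduler.pt")

# ...and restore them later into boosted objects
booster.load_optimizer(optimizer, "./optimizer")
booster.load_lr_scheduler(scheduler, "./scheduler.pt")
```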
More design details can be found in [this discussion](https://github.com/hpcaitech/ColossalAI/discussions/3046).
<!-- doc-test-command: torchrun --standalone --nproc_per_node=1 booster_api.py -->


@@ -87,14 +87,13 @@ import colossalai
args = colossalai.get_default_parser().parse_args()
# launch distributed environment
colossalai.launch(config=<CONFIG>,
colossalai.launch(config=args.config,
rank=args.rank,
world_size=args.world_size,
host=args.host,
port=args.port,
backend=args.backend
)
```
@@ -107,12 +106,21 @@ First, we need to set the launch method in our code. As this is a wrapper of the
use `colossalai.launch_from_torch`. The arguments required for the distributed environment, such as rank, world size, host and port, are all set by the PyTorch
launcher and can be read directly from environment variables.
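As a rough illustration (not the library's actual implementation), the launcher-provided values correspond to the standard PyTorch environment variables:
```python
# sketch: torchrun exports these variables for each spawned process
import os

rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])
host = os.environ["MASTER_ADDR"]
port = int(os.environ["MASTER_PORT"])
```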
config.py
```python
BATCH_SIZE = 512
LEARNING_RATE = 3e-3
WEIGHT_DECAY = 0.3
NUM_EPOCHS = 2
```
train.py
```python
import colossalai
colossalai.launch_from_torch(
config=<CONFIG>,
config="./config.py",
)
...
```
Next, we can easily start multiple processes with `colossalai run` in your terminal. Below is an example to run the code