Support TP-compatible Torch AMP and Update trainer API (#27)

* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7699.

* improved consistency between trainer, engine and schedule (#23)

Co-authored-by: 1SAA <c2h214748@gmail.com>

Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>
Author: Frank Lee
Date: 2021-11-18 19:45:06 +08:00 (committed by GitHub)
Parent: 2b05de4c64
Commit: 3defa32aee
80 changed files with 2194 additions and 1584 deletions


@@ -0,0 +1,5 @@
colossalai.engine.amp.amp\_type
===============================

.. automodule:: colossalai.engine.amp.amp_type
   :members:


@@ -0,0 +1,5 @@
colossalai.engine.amp.grad\_scaler
==================================

.. automodule:: colossalai.engine.amp.grad_scaler
   :members:


@@ -0,0 +1,12 @@
colossalai.engine.amp
=====================

.. automodule:: colossalai.engine.amp
   :members:

.. toctree::
   :maxdepth: 2

   colossalai.engine.amp.amp_type
   colossalai.engine.amp.grad_scaler


@@ -1,5 +0,0 @@
colossalai.engine.amp\_type
===========================

.. automodule:: colossalai.engine.amp_type
   :members:


@@ -7,11 +7,6 @@ colossalai.engine
.. toctree::
   :maxdepth: 2

   colossalai.engine.amp
   colossalai.engine.gradient_handler
   colossalai.engine.schedule


@@ -21,7 +21,6 @@ colossalai
.. toctree::
   :maxdepth: 2

   colossalai.constants
   colossalai.core
   colossalai.initialize


@@ -0,0 +1,5 @@
colossalai.utils.checkpointing
==============================

.. automodule:: colossalai.utils.checkpointing
   :members:


@@ -9,6 +9,7 @@ colossalai.utils
   :maxdepth: 2

   colossalai.utils.activation_checkpoint
   colossalai.utils.checkpointing
   colossalai.utils.common
   colossalai.utils.cuda
   colossalai.utils.memory


@@ -17,38 +17,40 @@ parallel = dict(
)
```
The name of the dictionary variable should be **parallel**. All the arguments, including **parallel** itself, are
optional, and the data, pipeline, and tensor parallel sizes default to 1. The value of data, pipeline, and tensor can be
an int representing the size of the corresponding parallel dimension, or a dictionary with a key called "size". The key
"mode" specifies the tensor parallelism method.
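As an illustration, a configuration following these rules might look like the sketch below; the sizes and the `'2d'` mode are example values to adapt to your own hardware, not a recommended setting.

```python
# Hypothetical layout: pipeline size 2 and tensor size 4. With 16 GPUs in
# total, the data parallel size is then detected automatically as 2.
parallel = dict(
    pipeline=dict(size=2),
    tensor=dict(size=4, mode='2d'),
)
```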
## Data Parallel
Data parallelism is the most common way to distribute a training task: the data is split into several shards and each
device trains on a single shard. The configuration for data parallelism is detected and set automatically; you do not
have to set it explicitly in your configuration. When the data parallel size is larger than 1, Colossal-AI automatically
adds a distributed data sampler to the dataloader to shard the dataset. For example, with 16 GPUs and the sketch above,
the data parallel size would be inferred as 16 / (2 × 4) = 2.
## 1D, 2D, 2.5D and 3D Parallel
To enable hybrid parallelism, we provide an array of tensor parallelism methods and list the paper behind each one.
These parallel modes need to work with the distributed layers provided by Colossal-AI.
- 1D: [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053)
- 2D: [An Efficient 2D Method for Training Super-Large Deep Learning Models](https://arxiv.org/abs/2104.05343)
2D parallel relies on the SUMMA matrix multiplication algorithm and splits the input data, model weights and layer
outputs along two different dimensions. The tensor chunks are distributed over a 2D mesh of $P = N^2$ devices where
$N$ is the number of tensor chunks in a single dimension.
- 2.5D: [2.5-dimensional distributed model training](https://arxiv.org/abs/2105.14500)
Inspired by the 2.5D matrix multiplication algorithm, 2.5D parallelism further parallelizes 2D tensor parallelism:
$P = N^2 d$ processors are arranged into $d$ layers, where each layer performs matrix multiplication operations
independently with a dimension $N$.
- 3D: [Maximizing Parallelism in Distributed Training for Huge Neural Networks](https://arxiv.org/abs/2105.14450)
We also introduce a 3D tensor parallelism that parallelizes neural networks on a 3D processor cube. This method
achieves the optimal, $O(P^{1/3})$ communication overhead on $P$ processors, while both computation and memory usage
are evenly distributed through optimized load balancing of parameters as well as activations.
```python
# 1D parallel
...
```

@@ -78,12 +80,12 @@ parallel = dict(
## Pipeline Parallel (experimental)
Pipeline parallelism splits the model into several partitions by layer. For example, let's assume we have a simple
model which consists of two linear layers. We have two GPUs, and we can allocate the first linear layer to the first GPU
and the second layer to the second GPU. This example of course wastes computing resources and is only meant to
demonstrate the idea of pipeline parallelism.
As PyTorch is based on a dynamic computation graph, the computation flow is not known until execution. To support
pipeline parallelism in PyTorch, you may need to add one more attribute, `layers_cfg`, to your model class to tell
Colossal-AI the sequence of execution. One example you can refer to is `colossalai.nn.model.VanillaResNet`.
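To make this concrete, here is a minimal sketch of a model carrying such an attribute; the two-layer model and the exact entries of `layers_cfg` are hypothetical, so consult `colossalai.nn.model.VanillaResNet` for the real schema.

```python
import torch.nn as nn


class TwoLayerNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(784, 256)
        self.layer2 = nn.Linear(256, 10)
        # Hypothetical layers_cfg: it lists the submodules in execution order
        # so that the model can be partitioned into pipeline stages.
        self.layers_cfg = [
            dict(layer='layer1'),
            dict(layer='layer2'),
        ]

    def forward(self, x):
        return self.layer2(self.layer1(x))
```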
@@ -192,9 +194,9 @@ class VanillaResNet(BaseModel):
]
```
You can set the number of pipeline stages in your configuration file. When the pipeline size is larger than 1,
Colossal-AI automatically creates the pipeline schedule, which defines the forward and backward steps. You can specify
how many microbatches to run in each step in the `schedule` configuration.
```python
parallel = dict(
@@ -206,10 +208,11 @@ schedule = dict(
    num_microbatches = 4  # set the number of microbatches per step
)
```
This feature is still in development and is only experimental for now.
## Sequence Parallel (experimental)
Sequence parallelism supports long-sequence modelling such as document-level text understanding and medical imaging.
This method is proposed in [Sequence Parallelism: Making 4D Parallelism Possible](https://arxiv.org/abs/2105.13120).
This feature is still in development and is only experimental for now.


@@ -1,8 +1,8 @@
# Quick demo
Colossal-AI is an integrated large-scale deep learning system with efficient parallelization techniques. The system can
accelerate model training on distributed systems with multiple GPUs by applying parallelization techniques. The system
can also run on systems with only one GPU. Quick demos showing how to use Colossal-AI are given below.
## Single GPU
@@ -32,25 +32,17 @@ realizes the training process.
```python
import colossalai
from colossalai.core import global_context as gpc
from colossalai.logging import get_global_dist_logger
from colossalai.trainer import Trainer


def run_trainer():
    engine, train_dataloader, test_dataloader = colossalai.initialize()
    logger = get_global_dist_logger()

    trainer = Trainer(engine=engine,
                      hooks_cfg=gpc.config.hooks,
                      verbose=True)
    logger.info("trainer is built", ranks=[0])
@@ -58,11 +50,13 @@ def run_trainer():
    trainer.fit(
        train_dataloader=train_dataloader,
        test_dataloader=test_dataloader,
        epochs=gpc.config.num_epochs,
        hooks_cfg=gpc.config.hooks,
        display_progress=True,
        test_interval=2
    )


if __name__ == '__main__':
    run_trainer()
```
@@ -72,9 +66,9 @@ Zoo. The detailed substitution process is elaborated [here](model.md).
## Features
Colossal-AI provides a collection of parallel training components for you. We aim to support you with your development
of distributed deep learning models just like how you write single-GPU deep learning models. We provide friendly tools
to kickstart distributed training in a few lines.
- [Data Parallelism](parallelization.md)
- [Pipeline Parallelism](parallelization.md)


@@ -4,40 +4,36 @@ Colossal-AI is an integrated large-scale deep learning system with efficient parallelization techniques
## Single-GPU System

On non-distributed systems with a GPU, Colossal-AI reaches the current baseline efficiency for model training.
[Here](https://colab.research.google.com/drive/1fJnqqFzPuzZ_kn1lwCpG2nh3l2ths0KE?usp=sharing#scrollTo=cQ_y7lBG09LS) we
give a Google Colab example showing how to train a LeNet model on the CIFAR10 dataset with Colossal-AI on a
non-distributed system.
## Multi-GPU System

When training deep learning models on multi-GPU distributed systems, Colossal-AI uses efficient parallelization
techniques to significantly accelerate the training process; these techniques are described in detail in the
[Parallelization](parallelization.md) section below. The following code trains a ViT model on a distributed system with
four GPUs, where the `HOST` variable is the IP address of your distributed system. Note that the code below uses the
[Slurm](https://slurm.schedmd.com/documentation.html) job scheduler.
```bash
HOST=xxx.xxx.xxx.xxx srun ./scripts/slurm_dist_train.sh ./examples/run_trainer.py ./configs/vit/vit_2d.py
```
`./configs/vit/vit_2d.py` is a [configuration file](config.md). Colossal-AI uses configuration files to define the
parameters needed during training, such as the model type, the dataset, the optimizer, and the learning rate scheduler.
You can train different models by writing different configuration files. `./examples/run_trainer.py` is a standard
training script; its full code is attached below. The script reads the training parameters from the configuration file
and trains the model.
```python
import colossalai
from colossalai.core import global_context as gpc
from colossalai.logging import get_global_dist_logger
from colossalai.trainer import Trainer


def run_trainer():
    engine, train_dataloader, test_dataloader = colossalai.initialize()
    logger = get_global_dist_logger()

    trainer = Trainer(engine=engine,
                      hooks_cfg=gpc.config.hooks,
                      verbose=True)
    logger.info("trainer is built", ranks=[0])
@@ -45,11 +41,13 @@ def run_trainer():
    trainer.fit(
        train_dataloader=train_dataloader,
        test_dataloader=test_dataloader,
        epochs=gpc.config.num_epochs,
        hooks_cfg=gpc.config.hooks,
        display_progress=True,
        test_interval=2
    )


if __name__ == '__main__':
    run_trainer()
```


@@ -2,9 +2,9 @@
## Build your engine
To better understand how the `Engine` class works, let's start from the concept of the process function in common
engines. The process function usually controls the behavior over a batch of a dataset, and the `Engine` class just
controls the process function. Here we give a standard process function in the following code block.
```python
def process_function(dataloader, model, criterion, optim):
@@ -16,32 +16,33 @@ def process_function(dataloader, model, criterion, optim):
    optim.step()
```
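For reference, a complete process function along these lines might look like the sketch below; it is plain PyTorch with illustrative names, not the exact function used by Colossal-AI.

```python
def process_function(dataloader, model, criterion, optim):
    # fetch one batch, run a forward and a backward pass, then update weights
    data, label = next(iter(dataloader))
    optim.zero_grad()
    output = model(data)
    loss = criterion(output, label)
    loss.backward()
    optim.step()
```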
In `ignite.engine` or `keras.engine`, the process function is always provided by users. However, it is tricky for users
to write their own process functions for pipeline parallelism. Aiming at offering accessible hybrid parallelism for
users, we provide the powerful `Engine` class. This class enables pipeline parallelism and offers a
one-forward-one-backward non-interleaving strategy. Also, you can use a pre-defined learning rate scheduler in
the `Engine` class to adjust the learning rate during training.
In order to build your engine, just set the variables `model`, `criterion`, `optimizer`, `lr_scheduler` and `schedule`.
The following code block provides an example. **The engine is automatically created from the config file for you if you
start with `colossalai.initialize`.**
```python
import torch
import torch.nn as nn
import torchvision.models as models
import colossalai
from colossalai.engine import Engine

model = models.resnet18()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
lr_scheduler = colossalai.nn.lr_scheduler.CosineAnnealingLR(optimizer, 1000)
schedule = colossalai.engine.NoPipelineSchedule()

MyEngine = Engine(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    lr_scheduler=lr_scheduler,
    step_schedule=schedule
)
```
@@ -51,21 +52,24 @@ More information regarding the class can be found in the API references.
### Overview
To learn how to customize a trainer which meets your needs, let's first take a look at the `Trainer` class. We highly
recommend that you read the *Get Started* section and *Build your engine* first.
The `Trainer` class enables researchers and engineers to use our system more conveniently. Instead of having to write
your own scripts, you can simply construct your own trainer by calling the `Trainer` class, just like what we did in the
following code block.
```python
MyTrainer = Trainer(my_engine)
```
After that, you can use the `fit` method to train or evaluate your model. In order to make our `Trainer` class even more
powerful, we incorporate a set of handy tools into the class. For example, you can monitor or record the running states
and metrics which indicate the current performance of the model. These functions are realized by hooks. The `BasicHook`
class allows you to execute your hook functions at specified times. We have already created some practical hooks for
you, as listed below. What you need to do is just pick the ones which suit your needs. Detailed descriptions of the
class can be found in the API references.
```python
hooks = [
@@ -80,18 +84,21 @@ hooks = [
]
```
These hook functions will record metrics, elapsed time and memory usage and write them to log after each epoch. Besides,
they print the current loss and accuracy to let users monitor the performance of the model.
### Hook
If you have specific needs, feel free to extend our `BaseHook` class to add your own functions, or our `MetricHook`
class to write a metric collector. These hook functions can be called at twelve points in the trainer's life cycle.
Besides, you can define the priorities of all hooks to arrange their execution order. More information can be found in
the API references.
### Metric
You can write your own metrics by extending our `Metric` class. It should be used with the `MetricHook` class. When you
write your own metric hooks, please set the priority carefully and make sure the hook is called before other hooks which
might require the results of the metric hook.
We've already provided some metric hooks and we store metric objects in `runner.states['metrics']`. It is a dictionary
and metrics can be accessed by their names.


@@ -14,28 +14,30 @@ def process_function(dataloader, model, criterion, optim):
    optim.step()
```
`ignite.engine``keras.engine`中,进程函数需要由用户提供,然而,用户很难为流水线并行编写进程函数。为了向用户提供方便的混合并行,我们提供了具备强大功能的`Engine`类,该类支持流水线并行,并提供前向传播后向传播不交织的策略。同时,您可以在`Engine`类中使用您事先定义好的学习率调度器来在训练过程中调整学习率。
`ignite.engine``keras.engine`中,进程函数需要由用户提供,然而,用户很难为流水线并行编写进程函数。为了向用户提供方便的混合并行,我们提供了具备强大功能的`Engine`
类,该类支持流水线并行,并提供前向传播后向传播不交织的策略。同时,您可以在`Engine`类中使用您事先定义好的学习率调度器来在训练过程中调整学习率。
To build your engine, you only need to define variables such as `model`, `criterion`, `optimizer`, `lr_scheduler` and
`schedule`; the following code block gives an example. **If you start with `colossalai.initialize`, the engine is built
automatically from the config file.**
```python
import torch
import torch.nn as nn
import torchvision.models as models
import colossalai
from colossalai.engine import Engine

model = models.resnet18()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
lr_scheduler = colossalai.nn.lr_scheduler.CosineAnnealingLR(optimizer, 1000)
schedule = colossalai.engine.NoPipelineSchedule()

MyEngine = Engine(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    lr_scheduler=lr_scheduler,
    step_schedule=schedule
)
```
@@ -48,10 +50,12 @@ MyEngine = Engine(
The `Trainer` class is designed to let researchers and engineers use our system more conveniently. You do not need to
write your own scripts; simply call the `Trainer` class to construct your trainer, as done in the following code block.
```python
MyTrainer = Trainer(my_engine)
```
After that, you can use the `fit` method to train or evaluate your model. Besides, to make our `Trainer` class even more
powerful, we have added a series of handy tools. For example, you can continuously monitor and record the model's
running state and performance during training; these functions are all realized through hooks. The `BasicHook` class
lets you execute your hook functions at specified times. As shown in the code block below, we have pre-defined some
practical hooks for you; all you need to do is find the ones that meet your needs. More information about the class can
be found in the API references.
```python
hooks = [
    ...
]
```

@@ -70,7 +74,8 @@ hooks = [
### Hooks

If you have specific needs, you can extend our `BaseHook` class to add your own hook functions, or extend our
`MetricHook` class to write the metric collectors you need. These hook functions can be called at twelve points in the
`Trainer` life cycle. More information about the class can be found in the API references.

### Metrics