[doc] FP8 training and communication document (#6050)
* Add FP8 training and communication document
* add fp8 docstring for plugins
* fix typo
* fix typo
@@ -9,6 +9,7 @@ Author: [Mingyan Jiang](https://github.com/jiangmingyan)
**Related Paper**
- [Accelerating Scientific Computations with Mixed Precision Algorithms](https://arxiv.org/abs/0808.2794)
- [FP8 Formats for Deep Learning](https://arxiv.org/pdf/2209.05433)
## Introduction
@@ -60,7 +61,11 @@ However, there are other operations, like reductions, which require the dynamic
## AMP in Colossal-AI
We support three AMP training methods and allow the user to train with AMP with no code change. If you want to train with AMP, just assign `mixed_precision` with `fp16` when you instantiate the `Booster`. Next we will support `bf16`.
Currently we only support `fp8` mixed precision training for the `Linear` layer. Please specify the `use_fp8` parameter when creating the plugin object.

To reduce the communication volume between nodes in low-bandwidth scenarios, we support FP8 communication compression. Please specify the `fp8_communication` parameter when creating the plugin object.
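For example, a minimal sketch of enabling both options, assuming a plugin that accepts both flags such as `GeminiPlugin`:

```python
# A minimal sketch: turn on FP8 training for Linear layers and FP8
# communication compression when building the plugin, then pass it to Booster.
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin

plugin = GeminiPlugin(
    use_fp8=True,            # FP8 mixed precision for Linear layers
    fp8_communication=True,  # compress inter-node communication to FP8
)
booster = Booster(plugin=plugin)
```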
### Start with Booster
@@ -74,7 +79,6 @@ instantiate `Booster` with `mixed_precision="fp16"`, then you can train with tor
'fp16': torch amp
'fp16_apex': apex amp,
'bf16': bf16,
'fp8': fp8,
'fp16_naive': naive amp
"""
from colossalai.booster import Booster
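For instance, the string passed to `mixed_precision` selects one of the backends in the mapping above; a minimal sketch:

```python
from colossalai.booster import Booster

# 'fp16' selects torch amp according to the mapping above; a plugin can also
# be supplied via the `plugin` argument.
booster = Booster(mixed_precision='fp16')
```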
@@ -128,6 +132,10 @@ The output model is converted to AMP model of smaller memory consumption.
If your input model is already too large to fit on a GPU, please instantiate your model weights with `dtype=torch.float16`.
Otherwise, try smaller models or check out more parallel training techniques!
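A minimal sketch of doing so with a placeholder model (any module that accepts the standard `dtype`/`device` factory arguments works the same way):

```python
import torch

# Placeholder model: create the weights directly in float16 on the GPU so a
# full-precision copy never has to fit in device memory.
model = torch.nn.Transformer(d_model=1024, nhead=16,
                             dtype=torch.float16, device='cuda')
```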
### FP8 Communication
In low-bandwidth scenarios, to reduce the communication load between multiple nodes, we support FP8 communication compression. It can be enabled by passing `fp8_communication=True` when you create the plugin object (such as `GeminiPlugin`). The all-to-all, all-gather, and P2P operations between nodes will then use the FP8 format for data transmission. FP8 communication for reduction operators such as all-reduce and reduce-scatter is currently not supported due to lack of support in the NCCL library.
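A sketch of enabling it in a multi-node hybrid-parallel setup, assuming `HybridParallelPlugin` also accepts the flag and that the distributed environment has already been launched:

```python
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

# FP8 compression applies to all-to-all, all-gather and P2P traffic between
# nodes; reduction collectives still run in higher precision.
plugin = HybridParallelPlugin(
    tp_size=2,
    pp_size=2,
    fp8_communication=True,
)
booster = Booster(plugin=plugin)
```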
## Hands-on Practice
Now we will introduce the use of AMP with Colossal-AI. In this practice, we will use Torch AMP as an example.
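Before going into detail, here is a rough end-to-end sketch of a single training step with Torch AMP through the `Booster` (the model, data, and optimizer are placeholders, and the exact `launch_from_torch` signature depends on your Colossal-AI version):

```python
import torch
import colossalai
from colossalai.booster import Booster

colossalai.launch_from_torch()                 # initialize the distributed environment
booster = Booster(mixed_precision='fp16')      # 'fp16' -> torch amp

model = torch.nn.Linear(128, 10).cuda()        # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)

x = torch.randn(32, 128, device='cuda')        # placeholder batch
y = torch.randint(0, 10, (32,), device='cuda')
loss = criterion(model(x), y)
booster.backward(loss, optimizer)              # AMP-aware (scaled) backward
optimizer.step()
optimizer.zero_grad()
```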