fix typo docs/

digger yu
2023-05-24 09:53:21 +08:00
committed by Frank Lee
parent 34966378e8
commit e90fdb1000
23 changed files with 30 additions and 30 deletions

View File

@@ -56,7 +56,7 @@ follow the steps below to create a new distributed initialization.
world_size: int,
config: Config,
data_parallel_size: int,
- pipeline_parlalel_size: int,
+ pipeline_parallel_size: int,
tensor_parallel_size: int,
arg1,
arg2):
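
For orientation, here is a minimal sketch of what a new initializer built around this constructor might look like. The class name `MyNewInitializer` is hypothetical, and the base class it should extend plus the registration mechanism are framework-specific details omitted here:

```python
# Hypothetical skeleton of a new initializer; in practice it would subclass the
# framework's process-group initializer and be registered with it.
class MyNewInitializer:
    def __init__(self,
                 world_size: int,
                 config,                      # a Colossal-AI Config object
                 data_parallel_size: int,
                 pipeline_parallel_size: int,
                 tensor_parallel_size: int,
                 arg1,
                 arg2):
        # Store the parallel sizes so they can be used when building process groups.
        self.world_size = world_size
        self.config = config
        self.data_parallel_size = data_parallel_size
        self.pipeline_parallel_size = pipeline_parallel_size
        self.tensor_parallel_size = tensor_parallel_size
        # arg1/arg2 are placeholders for initializer-specific options.
        self.arg1 = arg1
        self.arg2 = arg2
```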

View File

@@ -121,7 +121,7 @@ Inside the initialization of Experts, the local expert number of each GPU will b
## Train Your Model
- Do not to forget to use `colossalai.initialize` function in `colosalai` to add gradient handler for the engine.
+ Do not forget to use the `colossalai.initialize` function in `colossalai` to add a gradient handler for the engine.
We handle the back-propagation of MoE models for you.
In `colossalai.initialize`, we will automatically create a `MoeGradientHandler` object to process gradients.
You can find more information about the handler `MoeGradientHandler` in the colossal directory.
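
As a rough, hedged sketch (the tiny model and dataloader below are placeholders rather than the MoE example from these docs), the call pattern looks like this:

```python
import colossalai
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and data just to illustrate the call pattern; substitute your MoE model.
model = torch.nn.Linear(16, 4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
train_dataloader = DataLoader(
    TensorDataset(torch.randn(64, 16), torch.randint(0, 4, (64,))), batch_size=8)

# Assumes the distributed environment was launched beforehand,
# e.g. via colossalai.launch_from_torch.
engine, train_dataloader, _, _ = colossalai.initialize(
    model=model, optimizer=optimizer, criterion=criterion, train_dataloader=train_dataloader)

# For MoE models, a MoeGradientHandler is attached automatically so that expert
# gradients are synchronized when engine.backward(loss) is called.
```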

View File

@@ -53,7 +53,7 @@ export CHECKPOINT_DIR="your_opt_checkpoint_path"
# the ${CONFIG_DIR} must contain a server.sh file as the entry of service
export CONFIG_DIR="config_file_path"
- docker run --gpus all --rm -it -p 8020:8020 -v ${CHECKPOINT_DIR}:/model_checkpoint -v ${CONFIG_DIR}:/config --ipc=host energonai:lastest
+ docker run --gpus all --rm -it -p 8020:8020 -v ${CHECKPOINT_DIR}:/model_checkpoint -v ${CONFIG_DIR}:/config --ipc=host energonai:latest
```
Then open `https://[IP-ADDRESS]:8020/docs#` in your browser to try it out!

View File

@@ -69,7 +69,7 @@ After the forward operation of the embedding module, each word in all sequences
<figcaption>The embedding module</figcaption>
</figure>
- Each transformer layer contains two blocks. The self-attention operation is called in the first block and a two-layer percepton is located in the second block.
+ Each transformer layer contains two blocks. The self-attention operation is called in the first block and a two-layer perceptron is located in the second block.
<figure style={{textAlign: "center"}}>
<img src="https://s2.loli.net/2022/08/17/LAVzDlpRcj4dYeb.png"/>
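
As a plain PyTorch sketch (independent of Colossal-AI's own tensor-parallel layers, and only meant to make the two blocks concrete):

```python
import torch.nn as nn

class TransformerLayer(nn.Module):
    """One layer: a self-attention block followed by a two-layer perceptron (MLP) block."""

    def __init__(self, hidden_size: int, num_heads: int, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(hidden_size)
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(hidden_size)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, mlp_ratio * hidden_size),
            nn.GELU(),
            nn.Linear(mlp_ratio * hidden_size, hidden_size),
        )

    def forward(self, x):
        # Block 1: self-attention with a residual connection.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out
        # Block 2: two-layer perceptron with a residual connection.
        return x + self.mlp(self.norm2(x))
```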

View File

@@ -195,7 +195,7 @@ def build_cifar(batch_size):
## Training ViT using pipeline
- You can set the size of pipeline parallel and number of microbatches in config. `NUM_CHUNKS` is useful when using interleved-pipeline (for more details see [Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM](https://arxiv.org/abs/2104.04473) ). The original batch will be split into `num_microbatches`, and each stage will load a micro batch each time. Then we will generate an approriate schedule for you to execute the pipeline training. If you don't need the output and label of model, you can set `return_output_label` to `False` when calling `trainer.fit()` which can further reduce GPU memory usage.
+ You can set the size of pipeline parallel and number of microbatches in config. `NUM_CHUNKS` is useful when using interleaved-pipeline (for more details see [Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM](https://arxiv.org/abs/2104.04473) ). The original batch will be split into `num_microbatches`, and each stage will load a micro batch each time. Then we will generate an appropriate schedule for you to execute the pipeline training. If you don't need the output and label of model, you can set `return_output_label` to `False` when calling `trainer.fit()` which can further reduce GPU memory usage.
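
A hypothetical config fragment showing where these knobs live; the values are placeholders and the field names follow the config style used elsewhere in these docs:

```python
# config.py (illustrative values only)
BATCH_SIZE = 256
NUM_MICRO_BATCHES = 4      # the original batch is split into 4 micro-batches
NUM_CHUNKS = 1             # set >1 to use the interleaved pipeline schedule

parallel = dict(
    pipeline=2,            # number of pipeline stages
)
```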
You should `export DATA=/path/to/cifar`.

View File

@@ -16,14 +16,14 @@ In this example for ViT model, Colossal-AI provides three different parallelism
We will show you how to train ViT on CIFAR-10 dataset with these parallelism techniques. To run this example, you will need 2-4 GPUs.
- ## Tabel of Contents
+ ## Table of Contents
1. Colossal-AI installation
2. Steps to train ViT with data parallelism
3. Steps to train ViT with pipeline parallelism
4. Steps to train ViT with tensor parallelism or hybrid parallelism
## Colossal-AI Installation
- You can install Colossal-AI pacakage and its dependencies with PyPI.
+ You can install Colossal-AI package and its dependencies with PyPI.
```bash
pip install colossalai
```
@@ -31,7 +31,7 @@ pip install colossalai
## Data Parallelism
- Data parallism is one basic way to accelerate model training process. You can apply data parallelism to training by only two steps:
+ Data parallelism is one basic way to accelerate model training process. You can apply data parallelism to training by only two steps:
1. Define a configuration file
2. Change a few lines of code in train script
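
For step 1, a minimal configuration file could be as small as the sketch below (the values are placeholders; plain data parallelism needs no dedicated `parallel` section):

```python
# config.py -- for pure data parallelism only the training
# hyper-parameters need to be declared here.
BATCH_SIZE = 128
NUM_EPOCHS = 200
```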
@@ -94,7 +94,7 @@ from torchvision import transforms
from torchvision.datasets import CIFAR10
```
- #### Lauch Colossal-AI
+ #### Launch Colossal-AI
In the train script, you need to initialize the distributed environment for Colossal-AI after your config file is prepared. We call this process `launch`. Colossal-AI provides several launch methods to initialize the distributed backend. In most cases, you can use `colossalai.launch` and `colossalai.get_default_parser` to pass the parameters via the command line. Besides, Colossal-AI can utilize the existing launch tool provided by PyTorch, which many users are familiar with, via `colossalai.launch_from_torch`. For more details, you can view the related [documents](https://www.colossalai.org/docs/basics/launch_colossalai).
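
A hedged sketch of that launch step (the config path is a placeholder):

```python
import colossalai

# Option 1: pass host/port/rank arguments from the command line.
parser = colossalai.get_default_parser()
args = parser.parse_args()
colossalai.launch(config='./config.py',
                  rank=args.rank,
                  world_size=args.world_size,
                  host=args.host,
                  port=args.port)

# Option 2: reuse the environment set up by torchrun / torch.distributed.launch.
# colossalai.launch_from_torch(config='./config.py')
```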
@@ -613,7 +613,7 @@ NUM_MICRO_BATCHES = parallel['pipeline']
TENSOR_SHAPE = (BATCH_SIZE // NUM_MICRO_BATCHES, SEQ_LENGTH, HIDDEN_SIZE)
```
- Ohter configs:
+ Other configs:
```python
# hyper parameters
# BATCH_SIZE is as per GPU

View File

@@ -14,9 +14,9 @@ In our new design, `colossalai.booster` replaces the role of `colossalai.initial
### Plugin
Plugin is an important component that manages parallel configuration (e.g., the Gemini plugin encapsulates the Gemini acceleration solution). Currently supported plugins are as follows:
- ***GeminiPlugin:*** This plugin wrapps the Gemini acceleration solution, that ZeRO with chunk-based memory management.
+ ***GeminiPlugin:*** This plugin wraps the Gemini acceleration solution, that is, ZeRO with chunk-based memory management.
- ***TorchDDPPlugin:*** This plugin wrapps the DDP acceleration solution, it implements data parallelism at the module level which can run across multiple machines.
+ ***TorchDDPPlugin:*** This plugin wraps the DDP acceleration solution; it implements data parallelism at the module level and can run across multiple machines.
***LowLevelZeroPlugin:*** This plugin wraps stage 1/2 of the Zero Redundancy Optimizer. Stage 1: shards optimizer states across data parallel workers/GPUs. Stage 2: shards optimizer states + gradients across data parallel workers/GPUs.
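
For instance, a plugin is handed to the `Booster`, and the training objects are wrapped through `booster.boost` (a minimal sketch with a placeholder model and optimizer):

```python
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

colossalai.launch_from_torch(config={})

# Placeholder model, optimizer and loss just to show the wrapping step.
model = torch.nn.Linear(32, 8)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = torch.nn.MSELoss()

# Swap in GeminiPlugin or LowLevelZeroPlugin here without changing the rest
# of the training loop.
booster = Booster(plugin=TorchDDPPlugin())
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)
```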

View File

@@ -52,7 +52,7 @@ An instance of class [ComputeSpec](https://colossalai.readthedocs.io/en/latest/c
## Example
- Let's see an example. A ColoTensor is initialized and sharded on 8 GPUs using tp_degree=4, dp_dgree=2. And then the tensor is sharded along the last dim among the TP process groups. Finally, we reshard it along the first dim (0 dim) among the TP process groups. We encourage users to run the code and observe the shape of each tensor.
+ Let's see an example. A ColoTensor is initialized and sharded on 8 GPUs using tp_degree=4, dp_degree=2. And then the tensor is sharded along the last dim among the TP process groups. Finally, we reshard it along the first dim (0 dim) among the TP process groups. We encourage users to run the code and observe the shape of each tensor.
```python

View File

@@ -67,7 +67,7 @@ Given $P=q \times q \times q$ processors, we present the theoretical computation
## Usage
- To enable 3D tensor parallelism for our model, e.g. on 8 GPUs, we need to configure the parallism setting as below.
+ To enable 3D tensor parallelism for our model, e.g. on 8 GPUs, we need to configure the parallelism setting as below.
```python
CONFIG = dict(parallel=dict(
data=1,

View File

@@ -75,7 +75,7 @@ Build your model, optimizer, loss function, lr scheduler and dataloaders. Note t
NUM_EPOCHS = 200
BATCH_SIZE = 128
GRADIENT_CLIPPING = 0.1
- # build resnetå
+ # build resnet
model = resnet34(num_classes=10)
# build dataloaders
train_dataset = CIFAR10(root=Path(os.environ.get('DATA', './data')),

View File

@@ -53,7 +53,7 @@ It's compatible with all parallel methods in ColossalAI.
> ⚠ It only offloads optimizer states on CPU. This means it only affects CPU training or Zero/Gemini with offloading.
- ## Exampls
+ ## Examples
Let's start with two simple examples -- training GPT with different methods. These examples rely on `transformers`.
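
A minimal sketch of such a setup; the tiny GPT-2 configuration is a placeholder, and using `HybridAdam` here reflects the CPU-offload note above rather than the exact example code:

```python
from transformers import GPT2Config, GPT2LMHeadModel
from colossalai.nn.optimizer import HybridAdam

# A deliberately tiny GPT-2 so the sketch stays lightweight.
config = GPT2Config(n_layer=2, n_head=4, n_embd=128)
model = GPT2LMHeadModel(config)

# HybridAdam is Colossal-AI's Adam variant typically used when optimizer
# states are kept on the CPU, in line with the offloading note above.
optimizer = HybridAdam(model.parameters(), lr=1e-3)
```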

View File

@@ -156,4 +156,4 @@ trainer.fit(train_dataloader=train_dataloader,
display_progress=True)
```
- We use `2` pipeline stages and the batch will be splitted into `4` micro batches.
+ We use `2` pipeline stages and the batch will be split into `4` micro batches.

View File

@@ -72,7 +72,7 @@ chunk_manager = init_chunk_manager(model=module,
gemini_manager = GeminiManager(placement_policy, chunk_manager)
```
- `hidden_dim` is the hidden dimension of DNN. Users can provide this argument to speed up searching. If users do not know this argument before training, it is ok. We will use a default value 1024. `min_chunk_size_mb` is the the minimum chunk size in MegaByte. If the aggregate size of parameters is still samller than the minimum chunk size, all parameters will be compacted into one small chunk.
+ `hidden_dim` is the hidden dimension of the DNN. Users can provide this argument to speed up searching. If users do not know this argument before training, it is OK; we will use a default value of 1024. `min_chunk_size_mb` is the minimum chunk size in megabytes. If the aggregate size of parameters is still smaller than the minimum chunk size, all parameters will be compacted into one small chunk.
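
Spelled out with those two keywords, the call above might look like this (the values are illustrative; any other arguments of `init_chunk_manager` are omitted):

```python
# Extends the snippet above; `module`, `placement_policy`, `GeminiManager`
# and `init_chunk_manager` are the same names used there.
chunk_manager = init_chunk_manager(model=module,
                                   hidden_dim=1024,
                                   min_chunk_size_mb=32)
gemini_manager = GeminiManager(placement_policy, chunk_manager)
```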
Initialization of the optimizer.
```python