diff --git a/README-zh-Hans.md b/README-zh-Hans.md
new file mode 100644
index 000000000..208f85671
--- /dev/null
+++ b/README-zh-Hans.md
@@ -0,0 +1,207 @@
+# Colossal-AI
+
+[![logo](./docs/images/Colossal-AI_logo.png)](https://www.colossalai.org/)
+
+Paper | Documentation | Examples | Forum | Blog
+
+[![Build](https://github.com/hpcaitech/ColossalAI/actions/workflows/PR_CI.yml/badge.svg)](https://github.com/hpcaitech/ColossalAI/actions/workflows/PR_CI.yml)
+[![Documentation](https://readthedocs.org/projects/colossalai/badge/?version=latest)](https://colossalai.readthedocs.io/en/latest/?badge=latest)
+[![codebeat badge](https://codebeat.co/badges/bfe8f98b-5d61-4256-8ad2-ccd34d9cc156)](https://codebeat.co/projects/github-com-hpcaitech-colossalai-main)
+
+| [English](README.md) | [中文](README-zh-Hans.md) |
+
+An integrated large-scale model training system with efficient parallelization techniques.
+
+## Features
+
+Colossal-AI provides a collection of parallel training components for you. Our goal is to make distributed training of your AI models as simple as ordinary single-GPU training. We provide friendly tools to kickstart distributed training in a few lines; a config sketch follows this list.
+
+- Data Parallelism
+- Pipeline Parallelism
+- 1D, 2D, 2.5D, 3D tensor parallelism
+- Sequence parallelism
+- Friendly trainer and engine
+- Extensible for new parallelism
+- Mixed Precision Training
+- Zero Redundancy Optimizer (ZeRO)
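+
+Most of these features are switched on declaratively through the configuration file passed to `colossalai.launch` (mixed precision and ZeRO are enabled through similar config entries; see the documentation). The snippet below is a minimal, assumed sketch rather than an authoritative reference: the `parallel` schema follows this release's documented config style, and the exact keys may differ in other versions.
+
+```python
+# config.py -- a minimal, assumed sketch of a Colossal-AI config file
+# hybrid parallelism on 8 GPUs: 2 pipeline stages x 4-way 2D tensor parallelism
+parallel = dict(
+    pipeline=2,
+    tensor=dict(size=4, mode='2d'),
+)
+```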
+
+## Examples
+### ViT
+
+![ViT_TP](./docs/images/ViT_TP.png)
+
+- 14x larger batch size
+- 5x faster training
+
+### GPT-3 & GPT-2
+
+![GPT_2_3](./docs/images/GPT_2_3.png)
+
+- GPT-3: free 50% GPU resources, or 10.7% acceleration
+- GPT-2: 11x lower GPU memory usage, or superlinear scaling
+
+### BERT
+
+![BERT_seq](./docs/images/BERT_seq.png)
+
+- 2x faster training
+- 1.5x sequence length
+
+Please visit our [documentation and tutorials](https://www.colossalai.org/) for more details.
+
+
+## Installation
+
+### PyPI
+
+```bash
+pip install colossalai
+```
+If you have CUDA, NVCC, and torch installed, this command will also build and install the CUDA extension.
+
+If you don't want to install the CUDA extension, add `--global-option="--no_cuda_ext"` to the command, for example:
+```bash
+pip install colossalai --global-option="--no_cuda_ext"
+```
+
+If you want to use `ZeRO`, you can run:
+```bash
+pip install colossalai[zero]
+```
+
+### Install From Source
+
+> The version of Colossal-AI will be in line with the main branch of the repository. Feel free to raise an issue if you encounter any problem. :)
+
+```shell
+git clone https://github.com/hpcaitech/ColossalAI.git
+cd ColossalAI
+# install dependencies
+pip install -r requirements/requirements.txt
+
+# install colossalai
+pip install .
+```
+
+If you don't want to install and enable CUDA kernel fusion (required when using fused optimizers):
+
+```shell
+pip install --global-option="--no_cuda_ext" .
+```
+
+## Use Docker
+
+Run the following command to build a docker image from the Dockerfile we provide.
+
+```bash
+cd ColossalAI
+docker build -t colossalai ./docker
+```
+
+Run the following command to start the docker container in interactive mode.
+
+```bash
+docker run -ti --gpus all --rm --ipc=host colossalai bash
+```
+
+## Contributing
+
+Contributions to this project are welcome. Please refer to the [Contributing Guide](./CONTRIBUTING.md).
+
+
+## Quick View
+
+### Start Distributed Training in Lines
+
+```python
+import colossalai
+from colossalai.utils import get_dataloader
+
+
+# my_config can be the path to a config file or a dict object
+# 'localhost' only applies to single-node training; specify the node name when using multiple nodes
+colossalai.launch(
+    config=my_config,
+    rank=rank,
+    world_size=world_size,
+    backend='nccl',
+    port=29500,
+    host='localhost'
+)
+
+# build your model
+model = ...
+
+# build your dataset; the dataloader applies a distributed data sampler by default
+train_dataset = ...
+train_dataloader = get_dataloader(dataset=train_dataset,
+                                  shuffle=True
+                                  )
+
+
+# build your optimizer
+optimizer = ...
+
+# build your loss function
+criterion = ...
+
+# initialize colossalai
+engine, train_dataloader, _, _ = colossalai.initialize(
+    model=model,
+    optimizer=optimizer,
+    criterion=criterion,
+    train_dataloader=train_dataloader
+)
+
+# start training
+engine.train()
+for epoch in range(NUM_EPOCHS):
+    for data, label in train_dataloader:
+        engine.zero_grad()
+        output = engine(data)
+        loss = engine.criterion(output, label)
+        engine.backward(loss)
+        engine.step()
+
+```
+
+### Build a Simple 2D Parallel Model
+
+Suppose we have a huge MLP model whose large hidden size makes it hard to fit on a single GPU. We can distribute the model's weights across GPUs in a 2D mesh while still writing the model in the way you are familiar with.
+
+```python
+from colossalai.nn import Linear2D
+import torch.nn as nn
+
+
+class MLP_2D(nn.Module):
+
+    def __init__(self):
+        super().__init__()
+        self.linear_1 = Linear2D(in_features=1024, out_features=16384)
+        self.linear_2 = Linear2D(in_features=16384, out_features=1024)
+
+    def forward(self, x):
+        x = self.linear_1(x)
+        x = self.linear_2(x)
+        return x
+
+```
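+
+To make the memory saving concrete: with 2D tensor parallelism on a q x q device mesh (for example the `tensor=dict(size=4, mode='2d')` config sketched earlier, where q = 2), each `Linear2D` weight is split along both of its dimensions, so every GPU holds only a 1/q^2 tile of the full matrix. A quick arithmetic check of that claim, under the stated assumption about the partitioning scheme:
+
+```python
+# assumed illustration: per-GPU weight tile of MLP_2D's linear_1 on a 2x2 mesh
+q = 2                                      # mesh is q x q GPUs, i.e. size = q * q = 4
+full_params = 1024 * 16384                 # linear_1 weight on a single GPU
+tile_params = (1024 // q) * (16384 // q)   # per-GPU tile under 2D partitioning
+print(full_params // tile_params)          # -> 4: each GPU stores a quarter of the weight
+```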
+
+
+## Cite Us
+
+```
+@article{bian2021colossal,
+  title={Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
+  author={Bian, Zhengda and Liu, Hongxin and Wang, Boxiang and Huang, Haichen and Li, Yongbin and Wang, Chuanrui and Cui, Fan and You, Yang},
+  journal={arXiv preprint arXiv:2110.14883},
+  year={2021}
+}
+```
diff --git a/README.md b/README.md
index 93282185f..65e05991b 100644
--- a/README.md
+++ b/README.md
@@ -13,9 +13,52 @@
 [![Build](https://github.com/hpcaitech/ColossalAI/actions/workflows/PR_CI.yml/badge.svg)](https://github.com/hpcaitech/ColossalAI/actions/workflows/PR_CI.yml)
 [![Documentation](https://readthedocs.org/projects/colossalai/badge/?version=latest)](https://colossalai.readthedocs.io/en/latest/?badge=latest)
 [![codebeat badge](https://codebeat.co/badges/bfe8f98b-5d61-4256-8ad2-ccd34d9cc156)](https://codebeat.co/projects/github-com-hpcaitech-colossalai-main)
+
+ | [English](README.md) | [中文](README-zh-Hans.md) |
 
 An integrated large-scale model training system with efficient parallelization techniques.
 
+
+## Features
+
+Colossal-AI provides a collection of parallel training components for you. We aim to support you to write your
+distributed deep learning models just like how you write your single-GPU model. We provide friendly tools to kickstart
+distributed training in a few lines.
+
+- Data Parallelism
+- Pipeline Parallelism
+- 1D, 2D, 2.5D, 3D tensor parallelism
+- Sequence parallelism
+- Friendly trainer and engine
+- Extensible for new parallelism
+- Mixed Precision Training
+- Zero Redundancy Optimizer (ZeRO)
+
+## Examples
+### ViT
+
+![ViT_TP](./docs/images/ViT_TP.png)
+
+- 14x larger batch size
+- 5x faster training
+
+### GPT-3 & GPT-2
+
+![GPT_2_3](./docs/images/GPT_2_3.png)
+
+- Free 50% GPU resources, or 10.7% acceleration for GPT-3
+- 11x lower GPU RAM, or superlinear scaling for GPT-2
+
+### BERT
+
+![BERT_seq](./docs/images/BERT_seq.png)
+
+- 2x faster training
+- 50% longer sequence length
+
+Please visit our [documentation and tutorials](https://www.colossalai.org/) for more details.
+
+
 ## Installation
 
 ### PyPI
@@ -37,7 +80,7 @@
 ### Install From Source
 
-> The documentation will be in line with the main branch of the repository. Feel free to raise an issue if you encounter any problem. :)
+> The version of Colossal-AI will be in line with the main branch of the repository. Feel free to raise an issue if you encounter any problem. :)
 
 ```shell
 git clone https://github.com/hpcaitech/ColossalAI.git
 cd ColossalAI
@@ -107,13 +150,13 @@ train_dataloader = get_dataloader(dataset=dataset,
                                   )
 
 
-# build your
+# build your optimizer
 optimizer = ...
 
 # build your loss function
 criterion = ...
 
-# build your lr_scheduler
+# initialize colossalai
 engine, train_dataloader, _, _ = colossalai.initialize(
     model=model,
     optimizer=optimizer,
     criterion=criterion,
     train_dataloader=train_dataloader
@@ -157,21 +200,6 @@ class MLP_2D(nn.Module):
 
 ```
 
-
-## Features
-
-Colossal-AI provides a collection of parallel training components for you. We aim to support you to write your
-distributed deep learning models just like how you write your single-GPU model. We provide friendly tools to kickstart
-distributed training in a few lines.
-
-- Data Parallelism
-- Pipeline Parallelism
-- 1D, 2D, 2.5D, 3D and sequence parallelism
-- Friendly trainer and engine
-- Extensible for new parallelism
-- Mixed Precision Training
-- Zero Redundancy Optimizer (ZeRO)
-
-Please visit our [documentation and tutorials](https://www.colossalai.org/) for more details.
 
 ## Cite Us
diff --git a/docs/images/BERT_seq.png b/docs/images/BERT_seq.png
new file mode 100644
index 000000000..1cdf78269
Binary files /dev/null and b/docs/images/BERT_seq.png differ
diff --git a/docs/images/GPT_2_3.png b/docs/images/GPT_2_3.png
new file mode 100644
index 000000000..08181c29d
Binary files /dev/null and b/docs/images/GPT_2_3.png differ
diff --git a/docs/images/ViT_TP.png b/docs/images/ViT_TP.png
new file mode 100644
index 000000000..f142cfefd
Binary files /dev/null and b/docs/images/ViT_TP.png differ