[nfc] fix typo and author name (#5089)

parent fd3567e089
commit 0d482302a1
@@ -1,6 +1,6 @@
 # Lazy initialization
 
-Author: [Hongxiu Liu](https://github.com/ver217)
+Author: [Hongxin Liu](https://github.com/ver217)
 
 **Prerequisite:**
 - [Train with booster](../basics/booster_api.md)
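For context on the hunk above: lazy initialization builds the model on a meta device so that no real parameter memory is allocated until the model is sharded. A minimal sketch, assuming the `LazyInitContext` API from `colossalai.lazy` (the import path and the small Llama config used here are illustrative and may differ across versions):

```python
# Minimal sketch of lazy initialization; assumes colossalai.lazy.LazyInitContext.
from colossalai.lazy import LazyInitContext
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(num_hidden_layers=4)  # small config, for illustration only

with LazyInitContext():
    # Inside the context, parameters are recorded as meta tensors rather than
    # allocated, so even a very large model can be constructed cheaply.
    model = LlamaForCausalLM(config)

# The model holds no real weights yet; it is materialized later, typically by
# booster.boost() together with a plugin that shards it across devices.
```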
@@ -20,7 +20,7 @@ Author: [Baizhou Zhang](https://github.com/Fridge003), [Bin Jia](https://github.
 
 ## Introduction
 
-When training large transformer models such as LLaMa-2 70B or OPT 175B, model parallelism methods that divide a huge model into smaller shards, including tensor parallelism or pipeline parallism, are essential so as to meet the limitation of GPU memory.
+When training large transformer models such as LLaMa-2 70B or OPT 175B, model parallelism methods that divide a huge model into smaller shards, including tensor parallelism or pipeline parallelism, are essential so as to meet the limitation of GPU memory.
 However, manually cutting model and rewriting its forward/backword logic could be difficult for users who are not familiar with distributed training.
 Meanwhile, the Huggingface transformers library has gradually become users' first choice of model source, and most mainstream large models have been open-sourced in Huggingface transformers model library.
 
@@ -321,7 +321,7 @@ For example, when training LlaMa-2 with tensor parallel size as 2, the attribute
 
 3. Replacing the `forward` methods implemented by original Huggingface
    Transformers libraries with our customized `forward` methods.
-   This replacement is essential for pipeline paralellism, where a customiozed function is needed to pass intermediate hidden states between different pipeline stages.
+   This replacement is essential for pipeline parallelism, where a customized function is needed to pass intermediate hidden states between different pipeline stages.
    Also, optimization methods such as flash attention or sequence parallel can be injected into the `forward` process through our customized `forward` method.
 
 4. Replacing the whole copy of model parameters and optimizer states with incomplete ones controlled by current device (this is why it's called Shardformer).
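Step 3 in the hunk above boils down to swapping a module's bound `forward` for a customized one at runtime. A self-contained toy sketch of that mechanism; `ToyAttention` and `custom_forward` are illustrative stand-ins, not ColossalAI internals:

```python
# Toy illustration of replacing a module's forward method in place.
from types import MethodType

import torch
import torch.nn as nn

class ToyAttention(nn.Module):
    """Stand-in for a stock Huggingface attention block."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.proj(hidden_states)

def custom_forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
    # A real customized forward would call an optimized kernel (e.g. flash
    # attention) or hand intermediate hidden states to the next pipeline stage.
    return self.proj(hidden_states)

model = nn.Sequential(ToyAttention(16), ToyAttention(16))
for module in model.modules():
    if isinstance(module, ToyAttention):
        # Bind the replacement so `self` resolves to this module instance.
        module.forward = MethodType(custom_forward, module)

print(model(torch.randn(2, 16)).shape)  # torch.Size([2, 16])
```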
@@ -1,6 +1,6 @@
 # Zero Redundancy Optimizer with chunk-based memory management
 
-Author: [Hongxiu Liu](https://github.com/ver217), [Jiarui Fang](https://github.com/feifeibear), [Zijian Ye](https://github.com/ZijianYY)
+Author: [Hongxin Liu](https://github.com/ver217), [Jiarui Fang](https://github.com/feifeibear), [Zijian Ye](https://github.com/ZijianYY)
 
 **Prerequisite:**
 - [Train with booster](../basics/booster_api.md)
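For context on the document touched above: chunk-based ZeRO is exposed through ColossalAI's Gemini plugin of the Booster API covered in the prerequisite. A minimal single-step sketch, assuming `GeminiPlugin` and `Booster` from `colossalai.booster` (launch and constructor arguments vary across ColossalAI versions):

```python
# Minimal sketch of chunk-based ZeRO training via the Gemini plugin;
# run under torchrun so that rank/world-size env vars are set.
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin

colossalai.launch_from_torch()  # older versions require launch_from_torch(config={})

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.MSELoss()

# Gemini shards parameters, gradients and optimizer states into chunks
# and moves them between device and host memory as needed.
plugin = GeminiPlugin()
booster = Booster(plugin=plugin)
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)

x = torch.randn(8, 1024).cuda()
loss = criterion(model(x), torch.randn(8, 1024).cuda())
booster.backward(loss, optimizer)  # backward must go through the booster
optimizer.step()
optimizer.zero_grad()
```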
The Chinese (zh-Hans) versions of both documents receive the same author-name fix:

@@ -1,6 +1,6 @@
 # Lazy initialization
 
-Author: [Hongxiu Liu](https://github.com/ver217)
+Author: [Hongxin Liu](https://github.com/ver217)
 
 **Prerequisite:**
 - [Train with booster](../basics/booster_api.md)
@@ -1,6 +1,6 @@
 # Zero Redundancy Optimizer (ZeRO) with chunk-based memory management
 
-Author: [Hongxiu Liu](https://github.com/ver217), [Jiarui Fang](https://github.com/feifeibear), [Zijian Ye](https://github.com/ZijianYY)
+Author: [Hongxin Liu](https://github.com/ver217), [Jiarui Fang](https://github.com/feifeibear), [Zijian Ye](https://github.com/ZijianYY)
 
 **Prerequisite:**
 - [Train with booster](../basics/booster_api.md)