[doc] fix doc typo (#5256)

* [doc] fix annotation display
* [doc] fix llama2 doc

parent e830ef917d
commit c174c4fc5f
@@ -116,18 +116,18 @@ We will follow this roadmap to develop Shardformer:
 
 | model | tensor parallel | pipeline parallel | lazy initialization | xformer | flash attn2 | jit fused operator | fused layernorm | sequence parallel | overlap |
 | :------: | :-----: | :-----: | :--------: | :---------: | :------: | :-----: | :-----: | :--------: | :---------: |
-| bert | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [x] |
+| bert | [√] | [√] | [√] | [√] | [√] | [√] | [√] | [√] | [√] |
-| t5 | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [ ] | [ ] |
+| t5 | [√] | [√] | [√] | [√] | [√] | [√] | [√] | [ ] | [ ] |
-| llama V1/V2 | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [ ] | [ ] |
+| llama V1/V2 | [√] | [√] | [√] | [√] | [√] | [√] | [√] | [ ] | [ ] |
-| gpt2 | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [x] |
+| gpt2 | [√] | [√] | [√] | [√] | [√] | [√] | [√] | [√] | [√] |
-| opt | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [ ] | [ ] |
+| opt | [√] | [√] | [√] | [√] | [√] | [√] | [√] | [ ] | [ ] |
-| bloom | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [x] |
+| bloom | [√] | [√] | [√] | [√] | [√] | [√] | [√] | [√] | [√] |
-| chatglm2 | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [x] |
+| chatglm2 | [√] | [√] | [√] | [√] | [√] | [√] | [√] | [√] | [√] |
-| vit | [x] | [x] | [ ] | [x] | [x] | [x] | [x] | [ ] | [ ] |
+| vit | [√] | [√] | [ ] | [√] | [√] | [√] | [√] | [ ] | [ ] |
-| whisper | [x] | [x] | [x] | [x] | [x] | [ ] | [x] | [ ] | [ ] |
+| whisper | [√] | [√] | [√] | [√] | [√] | [ ] | [√] | [ ] | [ ] |
-| sam | [x] | [ ] | [ ] | [x] | [x] | [x] | [x] | [ ] | [ ] |
+| sam | [√] | [ ] | [ ] | [√] | [√] | [√] | [√] | [ ] | [ ] |
-| blip2 | [x] | [ ] | [ ] | [x] | [x] | [x] | [x] | [ ] | [ ] |
+| blip2 | [√] | [ ] | [ ] | [√] | [√] | [√] | [√] | [ ] | [ ] |
-| falcon | [x] | [x] | [x] | [x] | [x] | [ ] | [x] | [ ] | [ ] |
+| falcon | [√] | [√] | [√] | [√] | [√] | [ ] | [√] | [ ] | [ ] |
 | roberta | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] |
 | albert | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] |
 | ernie | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] |

@@ -137,7 +137,7 @@ We will follow this roadmap to develop Shardformer:
 | swin | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] |
 | swin V2 | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] |
 | qwen | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] |
-| mistral | [x] | [ ] | [ ] | [x] | [x] | [x] | [x] | [ ] | [ ] |
+| mistral | [√] | [ ] | [ ] | [√] | [√] | [√] | [√] | [ ] | [ ] |
 
 
 ## 💡 API Design
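For readers mapping the roadmap columns above onto code, the sketch below shows roughly how several of those features (tensor parallel, flash attn2, jit fused operator, fused layernorm) might be switched on through Shardformer for a supported model such as bert. The `ShardConfig`/`ShardFormer` names and keyword arguments are assumptions about the Shardformer interface around the time of this commit, not something stated in this diff, and may differ between ColossalAI versions.

```python
# Illustrative sketch only (not the documented API): enabling several of the
# roadmap features above for a supported model such as bert. The ShardConfig
# keyword names are assumptions and may differ between ColossalAI versions.
import colossalai
from transformers import BertConfig, BertForSequenceClassification

from colossalai.shardformer import ShardConfig, ShardFormer

# Assumes the script is launched with `torchrun`/`colossalai run` so that the
# distributed environment variables are already set.
colossalai.launch_from_torch(config={})

model = BertForSequenceClassification(BertConfig())

shard_config = ShardConfig(
    enable_tensor_parallelism=True,   # "tensor parallel" column
    enable_fused_normalization=True,  # "fused layernorm" column
    enable_flash_attention=True,      # "flash attn2" column
    enable_jit_fused=True,            # "jit fused operator" column
)
shard_former = ShardFormer(shard_config=shard_config)
sharded_model, shared_params = shard_former.optimize(model)
```

The remaining columns (pipeline parallel, sequence parallel, overlap) are usually exercised through the booster plugins rather than a bare `ShardConfig`; that, too, is an assumption about typical usage rather than a statement from this commit.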
@@ -6,7 +6,6 @@
 </p>
 
 - 70 billion parameter LLaMA2 model training accelerated by 195%
-[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/llama2)
 [[blog]](https://www.hpc-ai.tech/blog/70b-llama2-training)
 
 ### LLaMA1

@@ -15,7 +14,6 @@
 </p>
 
 - 65-billion-parameter large model pretraining accelerated by 38%
-[[code]](https://github.com/hpcaitech/ColossalAI/tree/example/llama/examples/language/llama)
 [[blog]](https://www.hpc-ai.tech/blog/large-model-pretraining)
 
 ## Dataset
@@ -123,7 +121,7 @@ Here we will show an example of how to run training
 llama pretraining with `gemini, batch_size=16, sequence_length=4096, gradient_checkpoint=True, flash_attn=True`.
 
 #### a. Running environment
-This experiment was performed on 4 computing nodes with 32 A800 GPUs in total for LLaMA-1 65B. The nodes are
+This experiment was performed on 4 computing nodes with 32 A800/H800 80GB GPUs in total for LLaMA-1 65B or LLaMA-2 70B. The nodes are
 connected with RDMA and GPUs within one node are fully connected with NVLink.
 
 #### b. Running command
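As context for the benchmark settings quoted in that hunk (`gemini, batch_size=16, sequence_length=4096, gradient_checkpoint=True, flash_attn=True`), here is a rough sketch of how such a run is typically wired up with the Gemini plugin and the Booster API. The tiny model config, hyperparameters, and some keyword names are illustrative assumptions; the actual benchmark script in the llama2 example directory remains the authoritative reference, and flash attention handling is omitted here.

```python
# Rough sketch of a Gemini-based pretraining setup matching the settings above.
# The tiny LlamaConfig and hyperparameters are placeholders so the sketch stays
# lightweight; the real benchmark uses 65B/70B configs and its own data pipeline.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam
from transformers import LlamaConfig, LlamaForCausalLM

# Assumes launch via `torchrun`/`colossalai run` on the nodes described above.
colossalai.launch_from_torch(config={})

plugin = GeminiPlugin(precision="bf16")   # the `gemini` setting
booster = Booster(plugin=plugin)

config = LlamaConfig(hidden_size=256, intermediate_size=688,
                     num_hidden_layers=2, num_attention_heads=4,
                     max_position_embeddings=4096)  # sequence_length=4096
model = LlamaForCausalLM(config)
model.gradient_checkpointing_enable()     # gradient_checkpoint=True

optimizer = HybridAdam(model.parameters(), lr=3e-4)
model, optimizer, *_ = booster.boost(model, optimizer)

# A training loop would then feed token batches of shape (16, 4096),
# i.e. batch_size=16 at sequence_length=4096.
```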