Commit Graph

289 Commits

Frank Lee
cb18922c47 [doc] added documentation to chunk and chunk manager (#1094)
* [doc] added documentation to chunk and chunk manager

* polish code

* polish code

* polish code
2022-06-10 15:33:06 +08:00
ver217
1f894e033f [gemini] zero supports gemini (#1093)
* add placement policy

* add gemini mgr

* update mem stats collector

* update zero

* update zero optim

* fix bugs

* zero optim monitor os

* polish unit test

* polish unit test

* add assert
2022-06-10 14:48:28 +08:00
Frank Lee
2b2dc1c86b [pipeline] refactor the pipeline module (#1087)
* [pipeline] refactor the pipeline module

* polish code
2022-06-10 11:27:38 +08:00
ver217
be01db37c8 [tensor] refactor chunk mgr and impl MemStatsCollectorV2 (#1077)
* polish chunk manager

* polish unit test

* impl add_extern_static_tensor for chunk mgr

* add mem stats collector v2

* polish code

* polish unit test

* polish code

* polish get chunks
2022-06-09 20:56:34 +08:00
Ziyue Jiang
0653c63eaa [Tensor] 1d row embedding (#1075)
* Add CPU 1d row embedding

* polish
2022-06-08 12:04:59 +08:00
Ziyue Jiang
4fc748f69b [Tensor] fix optimizer for CPU parallel (#1069) 2022-06-06 17:36:11 +08:00
Jiarui Fang
49832b2344 [refactor] add nn.parallel module (#1068) 2022-06-06 15:34:41 +08:00
Ziyue Jiang
6754f1b77f fix module utils bug (#1066) 2022-06-06 12:11:48 +08:00
Jiarui Fang
a00644079e reorganize colotensor directory (#1062)
* reorganize colotensor directory

* polish code
2022-06-03 18:04:22 +08:00
Ziyue Jiang
df9dcbbff6 [Tensor] add hybrid device demo and fix bugs (#1059) 2022-06-03 12:09:49 +08:00
ver217
51b9a49655 [zero] add zero optimizer for ColoTensor (#1046)
* add zero optimizer

* torch ok

* unit test ok

* polish code

* fix bugs

* polish unit test

* polish zero optim

* polish colo ddp v2

* refactor folder structure

* add comment

* polish unit test

* polish zero optim

* polish unit test
2022-06-02 12:13:15 +08:00
ver217
9492a561c3 [tensor] ColoTensor supports ZeRO (#1015)
* impl chunk manager

* impl param op hook

* add reduce_chunk

* add zero hook v2

* add zero dp

* fix TensorInfo

* impl load balancing when using zero without chunk

* fix zero hook

* polish chunk

* fix bugs

* ddp ok

* zero ok

* polish code

* fix bugs about load balancing

* polish code

* polish code

* add end-to-end test

* polish code

* polish code

* polish code

* fix typo

* add test_chunk

* fix bugs

* fix bugs

* polish code
2022-05-31 12:00:12 +08:00
ver217
cefc29ff06 [tensor] impl ColoDDP for ColoTensor (#1009)
* impl ColoDDP for ColoTensor

* polish code
2022-05-21 13:52:04 +08:00
Ziheng Qin
571f12eff3 [NFC] polish colossalai/nn/layer/utils/common.py code style (#983) 2022-05-17 10:25:06 +08:00
shenggan
18542b47fc [NFC] polish colossalai/nn/layer/parallel_2d/layers.py code style (#976) 2022-05-17 10:25:06 +08:00
Zirui Zhu
598cde4a0f [NFC] polish colossalai/nn/layer/parallel_2p5d/layers.py code style (#972) 2022-05-17 10:25:06 +08:00
LuGY
fb5bc6cb28 [NFC] polish colossalai/nn/layer/parallel_3d/layers.py code style (#966) 2022-05-17 10:25:06 +08:00
ver217
58580b50fe Revert "[NFC] Hotfix/format (#984)" (#986)
This reverts commit 0772828fba.
2022-05-17 10:23:38 +08:00
binmakeswell
0772828fba [NFC] Hotfix/format (#984)
* [NFC] Polish colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.cu code style. (#937)

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/cuda_util.h code style (#939)

* [NFC] polish colossalai/kernel/cuda_native/csrc/cpu_adam.cpp code style (#936)

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/block_reduce.h code style (#938)

* [NFC] polish moe_cuda_kernel.cu code style (#940)

Co-authored-by: Xiao Ye <xiaoye2@illinois.edu>

* [NFC] polish colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax_cuda.cu code style (#943)

* [NFC] polish colossalai/kernel/cuda_native/csrc/moe_cuda.cpp code style (#942)

* [NFC] polish colossalai/kernel/cuda_native/csrc/cpu_adam.h code style (#945)

* [NFC] polish colossalai/kernel/jit/bias_gelu.py code style (#946)

Co-authored-by: jnbai <897086360@qq.com>

* [NFC] polish colossalai/kernel/cuda_native/csrc/scaled_masked_softmax_cuda.cu code style (#949)

Co-authored-by: Jiatong <jiatong.han@u.nus.edu>

* [NFC] polish colossalai/builder/pipeline.py code style (#951)

* [NFC] polish colossalai/kernel/cuda_native/csrc/multihead_attention_1d.cpp code style (#952)

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/cross_entropy.cu code style (#953)

Co-authored-by: 何晓昕 <cautious@hexiaoxins-MacBook-Pro.local>

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/softmax_kernels.cu code style (#954)

* [NFC] polish colossalai/kernel/cuda_native/scaled_softmax.py code style (#955)

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/context.h code style (#956)

Co-authored-by: RichardoLuo <14049555596@qq.com>

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/cross_entropy_layer.h code style (#957)

* [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_l2norm_kernel.cu code style (#958)

* [NFC] polish colossalai/kernel/cuda_native/csrc/multihead_attention_1d.h code style (#962)

* [NFC] polish colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax.cpp code style (#959)

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/general_kernels.cu code style (#963)

Co-authored-by: Arsmart123 <202476410arsmart@gmail.com>

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/softmax.h code style (#964)

* [NFC] polish __init__.py code style (#965)

* [NFC] polish colossalai/nn/layer/parallel_3d/layers.py code style (#966)

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/feed_forward.h code style (#968)

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/dropout.h code style (#970)

* [NFC] polish colossalai/nn/layer/parallel_2p5d/layers.py code style (#972)

* [NFC] polish colossalai/kernel/cuda_native/csrc/layer_norm_cuda.cpp code style (#973)

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/normalize_kernels.cu code style (#974)

* [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_scale_kernel.cu code style (#977)

* [NFC] polish colossalai/nn/layer/parallel_2d/layers.py code style (#976)

* [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_sgd_kernel.cu code style (#978)

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/dropout_kernels.cu code style (#979)

* [NFC] polish colossalai/kernel/cuda_native/layer_norm.py code style (#980)

* [NFC] polish colossalai/nn/layer/utils/common.py code style (#983)

Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
Co-authored-by: yuxuan-lou <83441848+yuxuan-lou@users.noreply.github.com>
Co-authored-by: Geng Zhang <34452939+zxgx@users.noreply.github.com>
Co-authored-by: Maruyama_Aya <38985202+MaruyamaAya@users.noreply.github.com>
Co-authored-by: XYE <92607131+Itok2000u@users.noreply.github.com>
Co-authored-by: Xiao Ye <xiaoye2@illinois.edu>
Co-authored-by: HaoyuQin <79465534+coder-chin@users.noreply.github.com>
Co-authored-by: wky <64853922+wangkuangyi@users.noreply.github.com>
Co-authored-by: bajiaoyu517 <59548007+bajiaoyu517@users.noreply.github.com>
Co-authored-by: luoling-LC <105470086+luoling-LC@users.noreply.github.com>
Co-authored-by: jnbai <897086360@qq.com>
Co-authored-by: JT.Han <59948448+JThh@users.noreply.github.com>
Co-authored-by: Jiatong <jiatong.han@u.nus.edu>
Co-authored-by: xyupeng <99191637+xyupeng@users.noreply.github.com>
Co-authored-by: Sze-qq <68757353+Sze-qq@users.noreply.github.com>
Co-authored-by: Cautiousss <48676630+Cautiousss@users.noreply.github.com>
Co-authored-by: 何晓昕 <cautious@hexiaoxins-MacBook-Pro.local>
Co-authored-by: Luxios22 <67457897+Luxios22@users.noreply.github.com>
Co-authored-by: Wangbo Zhao(黑色枷锁) <56866854+wangbo-zhao@users.noreply.github.com>
Co-authored-by: RichardoLuo <50363844+RichardoLuo@users.noreply.github.com>
Co-authored-by: RichardoLuo <14049555596@qq.com>
Co-authored-by: doubleHU <98150031+huxin711@users.noreply.github.com>
Co-authored-by: runluo <68489000+run-qiao@users.noreply.github.com>
Co-authored-by: MaxT <854721132@qq.com>
Co-authored-by: superhao1995 <804673818@qq.com>
Co-authored-by: ziyu huang <huang0ziyu@gmail.com>
Co-authored-by: Arsmart123 <202476410arsmart@gmail.com>
Co-authored-by: Yuer867 <62204893+Yuer867@users.noreply.github.com>
Co-authored-by: lucasliunju <lucasliunju@gmail.com>
Co-authored-by: LuGY <74758262+Gy-Lu@users.noreply.github.com>
Co-authored-by: ExtremeViscent <zhangyiqi55732@sina.com>
Co-authored-by: Xu Kai <xukai16@foxmail.com>
Co-authored-by: Zirui Zhu <zhuzr21@gmail.com>
Co-authored-by: Ofey Chan <ofey206@gmail.com>
Co-authored-by: DouJS <dujiangsu@163.com>
Co-authored-by: Jie Zhu <chore.08-protist@icloud.com>
Co-authored-by: shenggan <csg19971016@gmail.com>
Co-authored-by: Kai Wang (Victor Kai) <37533040+kaiwang960112@users.noreply.github.com>
Co-authored-by: puck_WCR <46049915+WANG-CR@users.noreply.github.com>
Co-authored-by: Ziheng Qin <37519855+henryqin1997@users.noreply.github.com>
2022-05-17 09:54:49 +08:00
HELSON
e5ea3fdeef [gemini] add GeminiMemoryManager (#832)
* refactor StatefulTensor, tensor utilities

* add unit test for GeminiMemoryManager
2022-04-24 13:08:48 +08:00
Ziyue Jiang
4b01da24cd [TP] change the check assert in split batch 2d (#772) 2022-04-16 21:29:57 +08:00
アマデウス
b8899e0905 [TP] allow layernorm without bias (#750) 2022-04-14 11:43:56 +08:00
Frank Lee
eda30a058e [compatibility] fixed tensor parallel compatibility with torch 1.9 (#700) 2022-04-11 13:44:50 +08:00
HELSON
a9b8300d54 [zero] improve adaptability for non-sharded parameters (#708)
* adapt post grad hooks for non-sharded parameters

* adapt optimizer for non-sharded parameters

* offload gradients for non-replicated parameters
2022-04-11 13:38:51 +08:00
アマデウス
3fc8a204dc Corrected 3d vocab parallel embedding (#707) 2022-04-11 10:17:55 +08:00
HELSON
b31daed4cf fix bugs in CPU adam (#633)
* add cpu adam counter for all cpu adam

* fixed updating error in adam kernel
2022-04-02 17:04:05 +08:00
Liang Bowen
828e465622 [hotfix] Raise messages for indivisible batch sizes with tensor parallelism (#622) 2022-04-02 16:12:04 +08:00
アマデウス
77ad24bf94 [model checkpoint] updated saving/loading for 3d layers (#597) 2022-04-01 16:52:47 +08:00
アマデウス
93089ed708 [model checkpoint] updated saving/loading for 2.5d layers (#596) 2022-04-01 16:52:33 +08:00
アマデウス
c50bfb807b [model checkpoint] updated saving/loading for 1d layers (#594) 2022-04-01 16:51:52 +08:00
アマデウス
7636d518e1 [model checkpoint] updated saving/loading for 2d layers (#595) 2022-04-01 16:50:34 +08:00
アマデウス
cd13b63832 [model checkpoint] reworked unified layers for ease of save/load states (#593) 2022-04-01 16:49:56 +08:00
Ziyue Jiang
1c40ee8749 [TP] add assert for tp1d (#621) 2022-04-01 16:44:23 +08:00
ver217
e619a651fb polish optimizer docstring (#619) 2022-04-01 16:27:03 +08:00
ver217
8432dc7080 polish moe docstring (#618) 2022-04-01 16:15:36 +08:00
ver217
104cbbb313 [hotfix] add hybrid adam to __init__ (#584) 2022-03-31 19:08:34 +08:00
HELSON
e6d50ec107 [zero] adapt zero for unsharded parameters (#561)
* support existing sharded and unsharded parameters in zero

* add unit test for moe-zero model init

* polish moe gradient handler
2022-03-31 18:34:11 +08:00
Wesley
46c9ba33da update code format 2022-03-31 17:15:08 +08:00
Wesley
666cfd094a fix parallel_input flag for Linear1D_Col gather_output 2022-03-31 17:15:08 +08:00
Liang Bowen
2c45efc398 html refactor (#555) 2022-03-31 11:36:56 +08:00
LuGY
c44d797072 [docs] updated docs of hybrid adam and cpu adam (#552) 2022-03-30 18:14:59 +08:00
Ziyue Jiang
763dc325f1 [TP] Add gather_out arg to Linear (#541) 2022-03-30 09:35:46 +08:00
HELSON
8c90d4df54 [zero] add zero context manager to change config during initialization (#546) 2022-03-29 17:57:59 +08:00
Liang Bowen
ec5086c49c Refactored docstring to Google style 2022-03-29 17:17:47 +08:00
LuGY
105c5301c3 [zero] added hybrid adam, removed loss scale in adam (#527)
* [zero] added hybrid adam, removed loss scale of adam

* remove useless code
2022-03-25 18:03:54 +08:00
LuGY
6a3f9fda83 [cuda] modify the fused adam, support hybrid of fp16 and fp32 (#497) 2022-03-25 14:15:53 +08:00
Jiarui Fang
a445e118cf [polish] polish singleton and global context (#500) 2022-03-23 18:03:39 +08:00
ver217
9ec1ce6ab1 [zero] sharded model supports the reuse of fp16 shard (#495)
* sharded model supports reuse of fp16 shard

* rename variable

* polish code

* polish code

* polish code
2022-03-23 14:59:59 +08:00
HELSON
c9023d4078 [MOE] support PR-MOE (#488) 2022-03-22 16:48:22 +08:00
ver217
62b0a8d644 [zero] sharded optim support hybrid cpu adam (#486)
* sharded optim support hybrid cpu adam

* update unit test

* polish docstring
2022-03-22 14:56:59 +08:00