Commit Graph

289 Commits

Frank Lee
cb18922c47 [doc] added documentation to chunk and chunk manager (#1094)
* [doc] added documentation to chunk and chunk manager

* polish code

* polish code

* polish code
2022-06-10 15:33:06 +08:00
ver217
1f894e033f [gemini] zero supports gemini (#1093)
* add placement policy

* add gemini mgr

* update mem stats collector

* update zero

* update zero optim

* fix bugs

* zero optim monitor os

* polish unit test

* polish unit test

* add assert
2022-06-10 14:48:28 +08:00
Frank Lee
2b2dc1c86b [pipeline] refactor the pipeline module (#1087)
* [pipeline] refactor the pipeline module

* polish code
2022-06-10 11:27:38 +08:00
ver217
be01db37c8 [tensor] refactor chunk mgr and impl MemStatsCollectorV2 (#1077)
* polish chunk manager

* polish unit test

* impl add_extern_static_tensor for chunk mgr

* add mem stats collector v2

* polish code

* polish unit test

* polish code

* polish get chunks
2022-06-09 20:56:34 +08:00
Ziyue Jiang
0653c63eaa [Tensor] 1d row embedding (#1075)
* Add CPU 1d row embedding

* polish
2022-06-08 12:04:59 +08:00
Ziyue Jiang
4fc748f69b [Tensor] fix optimizer for CPU parallel (#1069) 2022-06-06 17:36:11 +08:00
Jiarui Fang
49832b2344 [refactor] add nn.parallel module (#1068) 2022-06-06 15:34:41 +08:00
Ziyue Jiang
6754f1b77f fix module utils bug (#1066) 2022-06-06 12:11:48 +08:00
Jiarui Fang
a00644079e reorganize colotensor directory (#1062)
* reorganize colotensor directory

* polish code
2022-06-03 18:04:22 +08:00
Ziyue Jiang
df9dcbbff6 [Tensor] add hybrid device demo and fix bugs (#1059) 2022-06-03 12:09:49 +08:00
ver217
51b9a49655 [zero] add zero optimizer for ColoTensor (#1046)
* add zero optimizer

* torch ok

* unit test ok

* polish code

* fix bugs

* polish unit test

* polish zero optim

* polish colo ddp v2

* refactor folder structure

* add comment

* polish unit test

* polish zero optim

* polish unit test
2022-06-02 12:13:15 +08:00
ver217
9492a561c3 [tensor] ColoTensor supports ZeRO (#1015)
* impl chunk manager

* impl param op hook

* add reduce_chunk

* add zero hook v2

* add zero dp

* fix TensorInfo

* impl load balancing when using zero without chunk

* fix zero hook

* polish chunk

* fix bugs

* ddp ok

* zero ok

* polish code

* fix bugs about load balancing

* polish code

* polish code

* add end-to-end test

* polish code

* polish code

* polish code

* fix typo

* add test_chunk

* fix bugs

* fix bugs

* polish code
2022-05-31 12:00:12 +08:00
ver217
cefc29ff06 [tensor] impl ColoDDP for ColoTensor (#1009)
* impl ColoDDP for ColoTensor

* polish code
2022-05-21 13:52:04 +08:00
Ziheng Qin
571f12eff3 [NFC] polish colossalai/nn/layer/utils/common.py code style (#983) 2022-05-17 10:25:06 +08:00
shenggan
18542b47fc [NFC] polish colossalai/nn/layer/parallel_2d/layers.py code style (#976) 2022-05-17 10:25:06 +08:00
Zirui Zhu
598cde4a0f [NFC] polish colossalai/nn/layer/parallel_2p5d/layers.py code style (#972) 2022-05-17 10:25:06 +08:00
LuGY
fb5bc6cb28 [NFC] polish colossalai/nn/layer/parallel_3d/layers.py code style (#966) 2022-05-17 10:25:06 +08:00
ver217
58580b50fe Revert "[NFC] Hotfix/format (#984)" (#986)
This reverts commit 0772828fba.
2022-05-17 10:23:38 +08:00
binmakeswell
0772828fba [NFC] Hotfix/format (#984)
* [NFC] Polish colossalai/kernel/cuda_native/csrc/multi_tensor_lamb.cu code style. (#937)

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/cuda_util.h code style (#939)

* [NFC] polish colossalai/kernel/cuda_native/csrc/cpu_adam.cpp code style (#936)

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/block_reduce.h code style (#938)

* [NFC] polish moe_cuda_kernel.cu code style (#940)

Co-authored-by: Xiao Ye <xiaoye2@illinois.edu>

* [NFC] polish colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax_cuda.cu code style (#943)

* [NFC] polish colossalai/kernel/cuda_native/csrc/moe_cuda.cpp code style (#942)

* [NFC] polish colossalai/kernel/cuda_native/csrc/cpu_adam.h code style (#945)

* [NFC] polish colossalai/kernel/jit/bias_gelu.py code style (#946)

Co-authored-by: jnbai <897086360@qq.com>

* [NFC] polish colossalai/kernel/cuda_native/csrc/scaled_masked_softmax_cuda.cu code style (#949)

Co-authored-by: Jiatong <jiatong.han@u.nus.edu>

* [NFC] polish colossalai/builder/pipeline.py code style (#951)

* [NFC] polish colossalai/kernel/cuda_native/csrc/multihead_attention_1d.cpp code style (#952)

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/cross_entropy.cu code style (#953)

Co-authored-by: 何晓昕 <cautious@hexiaoxins-MacBook-Pro.local>

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/softmax_kernels.cu code style (#954)

* [NFC] polish colossalai/kernel/cuda_native/scaled_softmax.py code style (#955)

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/context.h code style (#956)

Co-authored-by: RichardoLuo <14049555596@qq.com>

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/cross_entropy_layer.h code style (#957)

* [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_l2norm_kernel.cu code style (#958)

* [NFC] polish colossalai/kernel/cuda_native/csrc/multihead_attention_1d.h code style (#962)

* [NFC] polish colossalai/kernel/cuda_native/csrc/scaled_upper_triang_masked_softmax.cpp code style (#959)

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/general_kernels.cu code style (#963)

Co-authored-by: Arsmart123 <202476410arsmart@gmail.com>

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/softmax.h code style (#964)

* [NFC] polish __init__.py code style (#965)

* [NFC] polish colossalai/nn/layer/parallel_3d/layers.py code style (#966)

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/feed_forward.h code style (#968)

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/include/dropout.h code style (#970)

* [NFC] polish colossalai/nn/layer/parallel_2p5d/layers.py code style (#972)

* [NFC] polish colossalai/kernel/cuda_native/csrc/layer_norm_cuda.cpp code style (#973)

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/normalize_kernels.cu code style (#974)

* [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_scale_kernel.cu code style (#977)

* [NFC] polish colossalai/nn/layer/parallel_2d/layers.py code style (#976)

* [NFC] polish colossalai/kernel/cuda_native/csrc/multi_tensor_sgd_kernel.cu code style (#978)

* [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/dropout_kernels.cu code style (#979)

* [NFC] polish colossalai/kernel/cuda_native/layer_norm.py code style (#980)

* [NFC] polish colossalai/nn/layer/utils/common.py code style (#983)

Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
Co-authored-by: yuxuan-lou <83441848+yuxuan-lou@users.noreply.github.com>
Co-authored-by: Geng Zhang <34452939+zxgx@users.noreply.github.com>
Co-authored-by: Maruyama_Aya <38985202+MaruyamaAya@users.noreply.github.com>
Co-authored-by: XYE <92607131+Itok2000u@users.noreply.github.com>
Co-authored-by: Xiao Ye <xiaoye2@illinois.edu>
Co-authored-by: HaoyuQin <79465534+coder-chin@users.noreply.github.com>
Co-authored-by: wky <64853922+wangkuangyi@users.noreply.github.com>
Co-authored-by: bajiaoyu517 <59548007+bajiaoyu517@users.noreply.github.com>
Co-authored-by: luoling-LC <105470086+luoling-LC@users.noreply.github.com>
Co-authored-by: jnbai <897086360@qq.com>
Co-authored-by: JT.Han <59948448+JThh@users.noreply.github.com>
Co-authored-by: Jiatong <jiatong.han@u.nus.edu>
Co-authored-by: xyupeng <99191637+xyupeng@users.noreply.github.com>
Co-authored-by: Sze-qq <68757353+Sze-qq@users.noreply.github.com>
Co-authored-by: Cautiousss <48676630+Cautiousss@users.noreply.github.com>
Co-authored-by: 何晓昕 <cautious@hexiaoxins-MacBook-Pro.local>
Co-authored-by: Luxios22 <67457897+Luxios22@users.noreply.github.com>
Co-authored-by: Wangbo Zhao(黑色枷锁) <56866854+wangbo-zhao@users.noreply.github.com>
Co-authored-by: RichardoLuo <50363844+RichardoLuo@users.noreply.github.com>
Co-authored-by: RichardoLuo <14049555596@qq.com>
Co-authored-by: doubleHU <98150031+huxin711@users.noreply.github.com>
Co-authored-by: runluo <68489000+run-qiao@users.noreply.github.com>
Co-authored-by: MaxT <854721132@qq.com>
Co-authored-by: superhao1995 <804673818@qq.com>
Co-authored-by: ziyu huang <huang0ziyu@gmail.com>
Co-authored-by: Arsmart123 <202476410arsmart@gmail.com>
Co-authored-by: Yuer867 <62204893+Yuer867@users.noreply.github.com>
Co-authored-by: lucasliunju <lucasliunju@gmail.com>
Co-authored-by: LuGY <74758262+Gy-Lu@users.noreply.github.com>
Co-authored-by: ExtremeViscent <zhangyiqi55732@sina.com>
Co-authored-by: Xu Kai <xukai16@foxmail.com>
Co-authored-by: Zirui Zhu <zhuzr21@gmail.com>
Co-authored-by: Ofey Chan <ofey206@gmail.com>
Co-authored-by: DouJS <dujiangsu@163.com>
Co-authored-by: Jie Zhu <chore.08-protist@icloud.com>
Co-authored-by: shenggan <csg19971016@gmail.com>
Co-authored-by: Kai Wang (Victor Kai) <37533040+kaiwang960112@users.noreply.github.com>
Co-authored-by: puck_WCR <46049915+WANG-CR@users.noreply.github.com>
Co-authored-by: Ziheng Qin <37519855+henryqin1997@users.noreply.github.com>
2022-05-17 09:54:49 +08:00
HELSON
e5ea3fdeef [gemini] add GeminiMemoryManager (#832)
* refactor StatefulTensor, tensor utilities

* add unit test for GeminiMemoryManager
2022-04-24 13:08:48 +08:00
Ziyue Jiang
4b01da24cd [TP] change the check assert in split batch 2d (#772) 2022-04-16 21:29:57 +08:00
アマデウス
b8899e0905 [TP] allow layernorm without bias (#750) 2022-04-14 11:43:56 +08:00
Frank Lee
eda30a058e [compatibility] fixed tensor parallel compatibility with torch 1.9 (#700) 2022-04-11 13:44:50 +08:00
HELSON
a9b8300d54 [zero] improve adaptability for non-sharded parameters (#708)
* adapt post grad hooks for non-sharded parameters

* adapt optimizer for non-sharded parameters

* offload gradients for non-replicated parameters
2022-04-11 13:38:51 +08:00
アマデウス
3fc8a204dc Corrected 3d vocab parallel embedding (#707) 2022-04-11 10:17:55 +08:00
HELSON
b31daed4cf fix bugs in CPU adam (#633)
* add cpu adam counter for all cpu adam

* fixed updating error in adam kernel
2022-04-02 17:04:05 +08:00
Liang Bowen
828e465622 [hotfix] Raise messages for indivisible batch sizes with tensor parallelism (#622) 2022-04-02 16:12:04 +08:00
アマデウス
77ad24bf94 [model checkpoint] updated saving/loading for 3d layers (#597) 2022-04-01 16:52:47 +08:00
アマデウス
93089ed708 [model checkpoint] updated saving/loading for 2.5d layers (#596) 2022-04-01 16:52:33 +08:00
アマデウス
c50bfb807b [model checkpoint] updated saving/loading for 1d layers (#594) 2022-04-01 16:51:52 +08:00
アマデウス
7636d518e1 [model checkpoint] updated saving/loading for 2d layers (#595) 2022-04-01 16:50:34 +08:00
アマデウス
cd13b63832 [model checkpoint] reworked unified layers for ease of save/load states (#593) 2022-04-01 16:49:56 +08:00
Ziyue Jiang
1c40ee8749 [TP] add assert for tp1d (#621) 2022-04-01 16:44:23 +08:00
ver217
e619a651fb polish optimizer docstring (#619) 2022-04-01 16:27:03 +08:00
ver217
8432dc7080 polish moe docstring (#618) 2022-04-01 16:15:36 +08:00
ver217
104cbbb313 [hotfix] add hybrid adam to __init__ (#584) 2022-03-31 19:08:34 +08:00
HELSON
e6d50ec107 [zero] adapt zero for unsharded parameters (#561)
* support existing sharded and unsharded parameters in zero

* add unit test for moe-zero model init

* polish moe gradient handler
2022-03-31 18:34:11 +08:00
Wesley
46c9ba33da update code format 2022-03-31 17:15:08 +08:00
Wesley
666cfd094a fix parallel_input flag for Linear1D_Col gather_output 2022-03-31 17:15:08 +08:00
Liang Bowen
2c45efc398 html refactor (#555) 2022-03-31 11:36:56 +08:00
LuGY
c44d797072 [docs] updated docs of hybrid adam and cpu adam (#552) 2022-03-30 18:14:59 +08:00
Ziyue Jiang
763dc325f1 [TP] Add gather_out arg to Linear (#541) 2022-03-30 09:35:46 +08:00
HELSON
8c90d4df54 [zero] add zero context manager to change config during initialization (#546) 2022-03-29 17:57:59 +08:00
Liang Bowen
ec5086c49c Refactored docstring to Google style 2022-03-29 17:17:47 +08:00
LuGY
105c5301c3 [zero] added hybrid adam, removed loss scale in adam (#527)
* [zero] added hybrid adam, removed loss scale of adam

* remove useless code
2022-03-25 18:03:54 +08:00
LuGY
6a3f9fda83 [cuda] modify the fused adam, support hybrid of fp16 and fp32 (#497) 2022-03-25 14:15:53 +08:00
Jiarui Fang
a445e118cf [polish] polish singleton and global context (#500) 2022-03-23 18:03:39 +08:00
ver217
9ec1ce6ab1 [zero] sharded model supports the reuse of fp16 shard (#495)
* sharded model supports reuse of fp16 shard

* rename variable

* polish code

* polish code

* polish code
2022-03-23 14:59:59 +08:00
HELSON
c9023d4078 [MOE] support PR-MOE (#488) 2022-03-22 16:48:22 +08:00
ver217
62b0a8d644 [zero] sharded optim support hybrid cpu adam (#486)
* sharded optim support hybrid cpu adam

* update unit test

* polish docstring
2022-03-22 14:56:59 +08:00