Commit Graph

2276 Commits

hxwang
803878b2fd [moe] full test for deepseek and mixtral (pp + sp to fix) 2024-08-01 10:06:59 +08:00
hxwang
7077d38d5a [moe] finalize test (no pp) 2024-08-01 10:06:59 +08:00
haze188
2cddeac717 moe sp + ep bug fix 2024-08-01 10:06:59 +08:00
hxwang
877d94bb8c [moe] init moe plugin comm setting with sp 2024-08-01 10:06:59 +08:00
hxwang
09d6280d3e [chore] minor fix 2024-08-01 10:06:59 +08:00
Haze188
404b16faf3 [Feature] MoE Ulysses Support (#5918)
* moe sp support

* moe sp bug fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-08-01 10:06:59 +08:00
hxwang
3e2b6132b7 [moe] clean legacy code 2024-08-01 10:06:59 +08:00
hxwang
74eccac0db [moe] test deepseek 2024-08-01 10:06:59 +08:00
botbw
dc583aa576 [moe] implement tp 2024-08-01 10:06:59 +08:00
hxwang
102b784a10 [chore] arg pass & remove drop token 2024-08-01 10:06:59 +08:00
botbw
8dbb86899d [chore] trivial fix 2024-08-01 10:06:59 +08:00
botbw
014faf6c5a [chore] manually revert unintended commit 2024-08-01 10:06:59 +08:00
botbw
9b9b76bdcd [moe] add mixtral dp grad scaling when not all experts are activated 2024-08-01 10:06:59 +08:00
botbw
e28e05345b [moe] implement submesh initialization 2024-08-01 10:06:59 +08:00
haze188
5ed5e8cfba solve hang when parallel mode = pp + dp 2024-08-01 10:06:59 +08:00
botbw
13b48ac0aa [zero] solve hang 2024-08-01 10:06:59 +08:00
botbw
b5bfeb2efd [moe] implement transit between non moe tp and ep 2024-08-01 10:06:59 +08:00
botbw
37443cc7e4 [test] pass mixtral shardformer test 2024-08-01 10:06:59 +08:00
hxwang
46c069b0db [zero] solve hang 2024-08-01 10:06:59 +08:00
hxwang
0fad23c691 [chore] handle non member group 2024-08-01 10:06:59 +08:00
hxwang
a249e71946 [test] mixtral pp shard test 2024-08-01 10:06:59 +08:00
hxwang
8ae8525bdf [moe] fix plugin 2024-08-01 10:06:59 +08:00
hxwang
0b76b57cd6 [test] add mixtral transformer test 2024-08-01 10:06:59 +08:00
hxwang
f9b6fcf81f [test] add mixtral for sequence classification 2024-08-01 10:06:59 +08:00
Hongxin Liu
060892162a [zero] hotfix update master params (#5951) 2024-07-30 13:36:00 +08:00
Runyu Lu
bcf0181ecd [Feat] Distrifusion Acceleration Support for Diffusion Inference (#5895)
* Distrifusion Support source

* comp comm overlap optimization

* sd3 benchmark

* pixart distrifusion bug fix

* sd3 bug fix and benchmark

* generation bug fix

* naming fix

* add docstring, fix counter and shape error

* add reference

* readme and requirement
2024-07-30 10:43:26 +08:00
Hongxin Liu
7b38964e3a [shardformer] hotfix attn mask (#5947) 2024-07-29 19:10:06 +08:00
Hongxin Liu
9664b1bc19 [shardformer] hotfix attn mask (#5945) 2024-07-29 13:58:27 +08:00
Edenzzzz
2069472e96 [Hotfix] Fix ZeRO typo #5936
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-07-25 09:59:58 +08:00
Hongxin Liu
5fd0592767 [fp8] support all-gather flat tensor (#5932) 2024-07-24 16:55:20 +08:00
Gao, Ruiyuan
5fb958cc83 [FIX BUG] convert env param to int (#5934) 2024-07-24 10:30:40 +08:00
Insu Jang
a521ffc9f8 Add n_fused as an input from native_module (#5894) 2024-07-23 23:15:39 +08:00
Hongxin Liu
e86127925a [plugin] support all-gather overlap for hybrid parallel (#5919)
* [plugin] fixed all-gather overlap support for hybrid parallel
2024-07-18 15:33:03 +08:00
GuangyaoZhang
5b969fd831 fix shardformer fp8 communication training degradation 2024-07-18 07:16:36 +00:00
GuangyaoZhang
6a20f07b80 remove all to all 2024-07-17 07:14:55 +00:00
GuangyaoZhang
5a310b9ee1 fix rebase 2024-07-17 03:43:23 +00:00
GuangyaoZhang
457a0de79f shardformer fp8 2024-07-16 06:56:51 +00:00
アマデウス
530283dba0 fix object_to_tensor usage when torch>=2.3.0 (#5820) 2024-07-16 13:59:25 +08:00
Guangyao Zhang
2e28c793ce [compatibility] support torch 2.2 (#5875)
* Support Pytorch 2.2.2

* keep build_on_pr file and update .compatibility
2024-07-16 13:59:25 +08:00
Guangyao Zhang
1c961b20f3 [ShardFormer] fix qwen2 sp (#5903) 2024-07-15 13:58:06 +08:00
Stephan Kö
45c49dde96 [Auto Parallel]: Speed up intra-op plan generation by 44% (#5446)
* Remove unnecessary calls to deepcopy

* Build DimSpec's difference dict only once

This change considerably speeds up construction of DimSpec objects. The difference_dict is the same for each DimSpec object, so a single copy of it is enough (see the sketch after this entry).

* Fix documentation of DimSpec's difference method
2024-07-15 12:05:06 +08:00
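The change described above is the classic build-once, share-everywhere pattern: since every DimSpec carries an identical difference_dict, the table can be constructed a single time and reused by all instances. A minimal sketch of that pattern with a hypothetical DimSpec-like class (names and table contents are illustrative, not the actual ColossalAI implementation):

```python
class DimSpec:
    """Sketch: cache a lookup table that is identical for every instance."""

    _difference_dict = None  # shared across all instances, built lazily

    def __init__(self, shard_list):
        self.shard_list = shard_list

    @classmethod
    def _get_difference_dict(cls):
        # Build the expensive table once per process instead of once per object,
        # which is the optimization the commit describes.
        if cls._difference_dict is None:
            cls._difference_dict = cls._build_difference_dict()
        return cls._difference_dict

    @staticmethod
    def _build_difference_dict():
        # Placeholder contents: maps (source, target) sharding patterns
        # to a conversion cost.
        return {("R", "S0"): 1, ("S0", "R"): 1, ("R", "R"): 0}

    def difference(self, other):
        key = (",".join(self.shard_list), ",".join(other.shard_list))
        return self._get_difference_dict().get(key, 0)
```

With this layout, constructing many DimSpec objects no longer rebuilds the same dictionary each time, which is where the reported speed-up comes from.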
pre-commit-ci[bot]
51f916b11d [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2024-07-12 07:33:45 +00:00
BurkeHulk
1f1b856354 Merge remote-tracking branch 'origin/feature/fp8_comm' into feature/fp8_comm
# Conflicts:
#	colossalai/quantization/fp8.py
2024-07-12 15:29:41 +08:00
BurkeHulk
e88190184a support fp8 communication in pipeline parallelism 2024-07-12 15:25:25 +08:00
BurkeHulk
1e1959467e fix scaling algorithm in FP8 casting 2024-07-12 15:23:37 +08:00
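FP8 casting generally relies on a per-tensor scale derived from the tensor's maximum magnitude so that values fit the narrow FP8 range. A generic sketch of amax-based scaling (not necessarily the exact algorithm fixed in this commit; requires PyTorch >= 2.1 for torch.float8_e4m3fn):

```python
import torch

def fp8_cast(x: torch.Tensor):
    """Scale a tensor into the E4M3 range, cast to FP8, and return the
    scale so the receiver can dequantize after communication."""
    fp8_max = torch.finfo(torch.float8_e4m3fn).max  # ~448 for E4M3
    amax = x.abs().max().clamp(min=1e-12)           # avoid division by zero
    scale = fp8_max / amax.float()
    x_fp8 = (x.float() * scale).to(torch.float8_e4m3fn)
    return x_fp8, scale

def fp8_dequant(x_fp8: torch.Tensor, scale: torch.Tensor, dtype=torch.float32):
    """Undo the scaling after the FP8 payload has been transferred."""
    return (x_fp8.to(torch.float32) / scale).to(dtype)
```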
Hongxin Liu
c068ef0fa0 [zero] support all-gather overlap (#5898)
* [zero] support all-gather overlap

* [zero] add overlap all-gather flag

* [misc] fix typo

* [zero] update api
2024-07-11 18:59:59 +08:00
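The overlap feature named above follows the usual pattern of launching the parameter all-gather asynchronously so the communication runs alongside compute. A minimal sketch of that pattern with plain torch.distributed (illustrative only, not the ColossalAI plugin code; `flat_shard` is a hypothetical flattened parameter shard):

```python
import torch
import torch.distributed as dist

def prefetch_params(flat_shard: torch.Tensor, group=None):
    """Kick off an async all-gather of a flattened parameter shard so it
    can overlap with ongoing computation."""
    world_size = dist.get_world_size(group)
    gathered = torch.empty(world_size * flat_shard.numel(),
                           dtype=flat_shard.dtype, device=flat_shard.device)
    handle = dist.all_gather_into_tensor(gathered, flat_shard,
                                         group=group, async_op=True)
    return gathered, handle

# Typical use: start the gather for layer i+1, run layer i, then wait.
# gathered, handle = prefetch_params(next_layer_shard)
# out = layer_i(inputs)   # compute overlaps with the in-flight all-gather
# handle.wait()           # full parameters for layer i+1 are now available
```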
GuangyaoZhang
dbfa7d39fc fix typo 2024-07-10 08:13:26 +00:00
Guangyao Zhang
669849d74b [ShardFormer] Add Ulysses Sequence Parallelism support for Command-R, Qwen2 and ChatGLM (#5897) 2024-07-10 11:34:25 +08:00
Edenzzzz
fbf33ecd01 [Feature] Enable PP + SP for llama (#5868)
* fix cross-PP-stage position id length diff bug

* fix typo

* fix typo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use a one cross entropy func for all shardformer models

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-07-09 18:05:20 +08:00
Runyu Lu
66abf1c6e8 [HotFix] CI, import, requirements-test for #5838 (#5892)
* [Hot Fix] CI, import, requirements-test

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-07-08 22:32:06 +08:00