Commit Graph

3866 Commits

Author SHA1 Message Date
wangbluo
6fb1322db1 fix 2024-09-25 18:56:18 +08:00
wangbluo
65c8297710 fix the attn 2024-09-25 18:51:03 +08:00
wangbluo
cfd9eda628 fix the ring attn 2024-09-25 18:34:29 +08:00
duanjunwen
83163fa70c [fix] fix traverse; traverse dict --> traverse tensor List; 2024-09-25 06:38:11 +00:00
duanjunwen
fc8b016887 [fix] fix stage_indices; 2024-09-25 06:15:45 +00:00
binmakeswell
cbaa104216
release FP8 news (#6068)
* add FP8 news

* release FP8 news

* release FP8 news
2024-09-25 11:57:16 +08:00
duanjunwen
8501202a35
Merge pull request #6065 from duanjunwen/dev/zero_bubble
[Feat] Support zero bubble with shardformer input
2024-09-24 19:17:37 +08:00
duanjunwen
7e6f793c51 [fix] fix detach_output_obj clone; 2024-09-24 08:08:32 +00:00
duanjunwen
6c1e1550ae [fix] fix dumb clone; 2024-09-23 06:43:49 +00:00
duanjunwen
a875212a42 [fix] fix ci --> oom in 4096 hidden dim; 2024-09-23 05:55:16 +00:00
duanjunwen
c114d1429a [fix] fix detach clone release order; 2024-09-23 04:00:24 +00:00
duanjunwen
da3220f48c [fix] fix pipeline util func deallocate --> release_tensor_data; fix bwd_b loss bwd branch; 2024-09-20 09:48:35 +00:00
duanjunwen
1739df423c [fix] fix fwd branch, fwd pass both micro_batch & internal_inputs 2024-09-20 07:34:43 +00:00
duanjunwen
b6616f544e [fix] rm comments; 2024-09-20 07:29:41 +00:00
duanjunwen
c6d6ee39bd [fix] use tree_flatten replace dict traverse; 2024-09-20 07:18:49 +00:00
duanjunwen
26783776f1 [fix] fix input_tensors buffer append input_obj(dict) --> Tuple (microbatch, input_obj) , and all bwd b related cal logic; 2024-09-20 06:41:19 +00:00
duanjunwen
4753bf7add [fix] fix mem assert; 2024-09-19 08:27:47 +00:00
duanjunwen
a115106f8d [fix] fix bwd w input; 2024-09-19 08:10:05 +00:00
duanjunwen
349272c71f [fix] update bwd b&w input; dict --> list[torch.Tensor] 2024-09-19 07:47:01 +00:00
duanjunwen
6ee9584b9a [fix] fix require_grad & deallocate call; 2024-09-19 05:53:03 +00:00
duanjunwen
1f5c7258aa Merge remote-tracking branch 'upstream/feature/zerobubble' into dev/zero_bubble 2024-09-19 03:52:13 +00:00
Hongxin Liu
dabc2e7430
[release] update version (#6062) 2024-09-19 10:45:32 +08:00
Camille Zhong
f9546ba0be
[ColossalEval] support for vllm (#6056)
* support vllm

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* modify vllm and update readme

* run pre-commit

* remove duplicated lines and refine code

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update param name

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refine code

* update readme

* refine code

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-18 17:09:45 +08:00
duanjunwen
af2c2f8092 [feat] add more test; 2024-09-18 07:51:54 +00:00
duanjunwen
3dbad102cf [fix] fix zerobubble pp for shardformer type input; 2024-09-18 07:14:34 +00:00
botbw
4fa6b9509c
[moe] add parallel strategy for shared_expert && fix test for deepseek (#6063) 2024-09-18 10:09:01 +08:00
Wang Binluo
63314ce4e4
Merge pull request #6064 from wangbluo/fix_attn
[sp] : fix the attention kernel for sp
2024-09-18 10:08:15 +08:00
wangbluo
10e4f7da72 fix 2024-09-16 13:45:04 +08:00
Wang Binluo
37e35230ff
Merge pull request #6061 from wangbluo/sp_fix
[sp] : fix the attention kernel for sp
2024-09-14 20:54:35 +08:00
wangbluo
827ef3ee9a fix 2024-09-14 10:40:35 +00:00
Guangyao Zhang
bdb125f83f
[doc] FP8 training and communication document (#6050)
* Add FP8 training and communication document

* add fp8 docstring for plugins

* fix typo

* fix typo
2024-09-14 11:01:05 +08:00
Guangyao Zhang
f20b066c59
[fp8] Disable all_gather intranode. Disable Redundant all_gather fp8 (#6059)
* all_gather only internode, fix pytest

* fix cuda arch <89 compile pytest error

* fix pytest failure

* disable all_gather_into_tensor_flat_fp8

* fix fp8 format

* fix pytest

* fix conversations

* fix chunk tuple to list
2024-09-14 10:40:01 +08:00
wangbluo
b582319273 fix 2024-09-13 10:24:41 +00:00
wangbluo
0ad3129cb9 fix 2024-09-13 09:01:26 +00:00
wangbluo
0b14a5512e fix 2024-09-13 07:06:14 +00:00
botbw
696fced0d7
[fp8] fix missing fp8_comm flag in mixtral (#6057) 2024-09-13 14:30:05 +08:00
wangbluo
dc032172c3 fix 2024-09-13 06:00:58 +00:00
wangbluo
f393867cff fix 2024-09-13 05:24:52 +00:00
wangbluo
6eb8832366 fix 2024-09-13 05:06:56 +00:00
wangbluo
683179cefd fix 2024-09-13 03:40:56 +00:00
wangbluo
0a01e2a453 fix the attn 2024-09-13 03:38:35 +00:00
pre-commit-ci[bot]
216d54e374 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2024-09-13 02:38:40 +00:00
wangbluo
fdd84b9087 fix the sp 2024-09-13 02:32:03 +00:00
duanjunwen
9bc3b6e220 [feat] moehybrid support zerobubble; 2024-09-12 02:51:46 +00:00
flybird11111
a35a078f08
[doc] update sp doc (#6055)
* update sp doc

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* fix

* fix

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-11 17:25:14 +08:00
Hongxin Liu
13946c4448
[fp8] hotfix backward hook (#6053)
* [fp8] hotfix backward hook

* [fp8] hotfix pipeline loss accumulation
2024-09-11 16:11:25 +08:00
duanjunwen
11ae6848c6
[zerobubble]Support ZeroBubble Pipeline (#6034)
* [feat] add zerobubble pp (just a frame now); add POC test for dx_dw; add test for zerobubble;

* [feat] add dw test;

* [fix] fix weight not close;

* [update] update text;

* [feat] add test run_fwd_bwd automatic scheduling;

* [feat] split communication and calculation; fix pop empty send_bwd_buffer error;

* [feat] add test for p & p grad;

* [feat] add comments for ZBV func;

* [fix] rm useless assign and comments;

* [fix] fix ci test; add pytest;

* [feat] add run_fwd_bwd_with_microbatch  (replace input) & test; add p&p.grad assert close test & all pass;

* [feat] add apply v_schedule graph; p & p.grad assert err exist;

* [fix] update

* [feat] fix ci; add assert;

* [feat] fix poc format

* [feat] fix func name & ci; add comments;

* [fix] fix poc test; add comments in poc;

* [feat] add optim backward_b_by_grad

* [feat] fix optimizer bwd b & w; support return accum loss & output

* [feat] add fwd_bwd_step, run_fwd_only;

* [fix] fix optim bwd; add license for v_schedule; remove redundant attributes; fix schedule loop "while"--> "for"; add communication dict;

* [fix] fix communication_map;

* [feat] update test; rm comments;

* [fix] rm zbv in hybridplugin

* [fix] fix optim bwd;

* [fix] fix optim bwd;

* [fix] rm output.data after send fwd;

* [fix] fix bwd step if condition; remove useless comments and format info;

* [fix] fix detach output & release output;

* [fix] rm requir_grad for output;

* [fix] fix requir grad position and detach position and input&output local buffer append position;

* [feat] add memory assertion;

* [fix] fix mem check;

* [fix] mem assertion

* [fix] fix mem assertion

* [fix] fix mem; use a new model shape; only assert mem less and equal than theo;

* [fix] fix model zoo import;

* [fix] fix redundant detach & clone; add buffer assertion in the end;

* [fix] add output_obj_grad assert None at bwd b step; replace input_obj.require_grad_ with treemap;

* [fix] update optim state dict assert (include param group & state); fix mem assert after add optim;

* [fix] add testcase with microbatch 4;
2024-09-10 17:33:09 +08:00
botbw
c54c4fcd15
[hotfix] moe hybrid parallelism benchmark & follow-up fix (#6048)
* [example] pass use_fp8_comm flag to all plugins

* [example] add mixtral benchmark

* [moe] refine assertion and check

* [moe] fix mixtral & add more tests

* [moe] consider checking dp * sp group and moe_dp_group

* [mixtral] remove gate tp & add more tests

* [deepseek] fix tp & sp for deepseek

* [mixtral] minor fix

* [deepseek] add deepseek benchmark
2024-09-10 17:30:53 +08:00
Wenxuan Tan
8fd25d6e09
[Feature] Split cross-entropy computation in SP (#5959)
* halfway

* fix cross-PP-stage position id length diff bug

* fix typo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unified cross entropy func for all shardformer models

* remove redundant lines

* add basic ring attn; debug cross entropy

* fwd bwd logic complete

* fwd bwd logic complete; add experimental triton rescale

* precision tests passed

* precision tests passed

* fix typos and remove misc files

* update softmax_lse shape by new interface

* change tester name

* remove buffer clone; support packed seq layout

* add varlen tests

* fix typo

* all tests passed

* add dkv_group; fix mask

* remove debug statements

* adapt chatglm, command-R, qwen

* debug

* halfway

* fix cross-PP-stage position id length diff bug

* fix typo

* fix typo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unified cross entropy func for all shardformer models

* remove redundant lines

* add basic ring attn; debug cross entropy

* fwd bwd logic complete

* fwd bwd logic complete; add experimental triton rescale

* precision tests passed

* precision tests passed

* fix typos and remove misc files

* add sp_mode to benchmark; fix varlen interface

* update softmax_lse shape by new interface

* add varlen tests

* fix typo

* all tests passed

* add dkv_group; fix mask

* remove debug statements

* add comments

* q1 index only once

* remove events to simplify stream sync

* simplify forward/backward logic

* 2d ring forward passed

* 2d ring backward passed

* fixes

* fix ring attn loss

* 2D ring backward + llama passed

* merge

* update logger

* fix typo

* rebase

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

* remove typos

* fixes

* support GPT

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-10 12:06:50 +08:00
Hongxin Liu
b3db1058ec
[release] update version (#6041)
* [release] update version

* [devops] update comp test

* [devops] update comp test debug

* [devops] debug comp test

* [devops] debug comp test

* [devops] debug comp test

* [devops] debug comp test

* [devops] debug comp test
2024-09-10 10:31:09 +08:00