flybird11111
17062c83b9
[hotfix] fix hybrid checkpointio for sp+dp (#6184)
* Update hybrid_parallel_plugin.py
* Update hybrid_parallel_plugin.py
* Update hybrid_parallel_plugin.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update build_on_pr.yml
* Update test_zerobubble_pp.py
* fix
* fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-02-06 17:21:04 +08:00
botbw
4fa6b9509c
[moe] add parallel strategy for shared_expert && fix test for deepseek (#6063)
2024-09-18 10:09:01 +08:00
botbw
c54c4fcd15
[hotfix] moe hybrid parallelism benchmark & follow-up fix (#6048)
* [example] pass use_fp8_comm flag to all plugins
* [example] add mixtral benchmark
* [moe] refine assertion and check
* [moe] fix mixtral & add more tests
* [moe] consider checking dp * sp group and moe_dp_group
* [mixtral] remove gate tp & add more tests
* [deepseek] fix tp & sp for deepseek
* [mixtral] minor fix
* [deepseek] add deepseek benchmark
2024-09-10 17:30:53 +08:00
botbw
62cdac6b7b
[chore] remove redundant test case, print string & reduce test tokens
2024-08-01 10:06:59 +08:00
haze188
70793ce9ed
[misc] fix ci failure: change default value to false in moe plugin
2024-08-01 10:06:59 +08:00
hxwang
cb01c0d5ce
[moe] refactor mesh assignment
2024-08-01 10:06:59 +08:00
hxwang
6c39f0b144
[test] add check
2024-08-01 10:06:59 +08:00
haze188
b2952a5982
[moe] deepseek moe sp support
2024-08-01 10:06:59 +08:00
hxwang
067e18f7e9
[test] fix test: test_zero1_2
2024-08-01 10:06:59 +08:00
hxwang
70c9924d0d
[chore] solve moe ckpt test failure and some other arg pass failure
2024-08-01 10:06:59 +08:00
hxwang
46037c2ccd
[chore] minor fix after rebase
2024-08-01 10:06:59 +08:00
hxwang
803878b2fd
[moe] full test for deepseek and mixtral (pp + sp to fix)
2024-08-01 10:06:59 +08:00