Commit Graph

3 Commits

Author  SHA1        Message                                                               Date
hxwang  70c9924d0d  [chore] solve moe ckpt test failure and some other arg pass failure   2024-08-01 10:06:59 +08:00
hxwang  74eccac0db  [moe] test deepseek                                                   2024-08-01 10:06:59 +08:00
botbw   9b9b76bdcd  [moe] add mixtral dp grad scaling when not all experts are activated  2024-08-01 10:06:59 +08:00