2 Commits

Author SHA1 Message Date
YeAnbang
14f237ce7e
[feat] Support boxed math reward (#6284)
* fix pp+tp, fix dataloader

* fixed plugin micro-batch size

* support boxed reward

* add boxed reward

* fix pp state dict incomplete issue

* Revert "fix pp state dict incomplete issue"

This reverts commit 6c1b3b694f.
2025-04-29 16:46:47 +08:00
Tong Li
8e6c9a4ab3
add reward related function
2025-02-23 11:02:54 +08:00