5 Commits

Author    SHA1        Message                      Date
YeAnbang  6a6634b6e8  add ppo                      2025-03-07 18:29:34 +08:00
Tong Li   d03cdea949  update reward fn             2025-03-06 10:53:48 +08:00
Tong Li   070907dd7f  polish                       2025-02-28 10:16:42 +08:00
Tong Li   ffd3878a1e  add simple grpo              2025-02-23 22:54:26 +08:00
Tong Li   8e6c9a4ab3  add reward related function  2025-02-23 11:02:54 +08:00