Commit Graph

16 Commits

Author SHA1 Message Date
pre-commit-ci[bot]
6e096362ef [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-03-07 10:43:03 +00:00
YeAnbang
c8e13a9403 run pre-commit 2025-03-07 18:40:31 +08:00
YeAnbang
d31f9e4d0f run pre-commit 2025-03-07 18:30:19 +08:00
YeAnbang
6a6634b6e8 add ppo 2025-03-07 18:29:34 +08:00
pre-commit-ci[bot]
eb6337f07f [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-03-06 08:29:59 +00:00
Tong Li
0cc0c843ed add save 2025-03-06 16:26:14 +08:00
Tong Li
0f566cc2d4 add algo selection 2025-03-06 14:29:22 +08:00
Tong Li
d03cdea949 update reward fn 2025-03-06 10:53:48 +08:00
Tong Li
678f5a9eca update loss 2025-03-06 10:53:03 +08:00
Tong Li
b96d69055e grpo consumer 2025-03-06 10:51:27 +08:00
Tong Li
070907dd7f polish 2025-02-28 10:16:42 +08:00
Tong Li
f736d747e3 update grpo 2025-02-25 18:12:04 +08:00
Tong Li
ffd3878a1e add simple grpo 2025-02-23 22:54:26 +08:00
Tong Li
8e6c9a4ab3 add reward related function 2025-02-23 11:02:54 +08:00
Hongxin Liu
de282dd694 [feature] fit RL style generation (#6213)
* [feature] fit rl style generation

* [doc] add docstr

* [doc] add docstr
2025-02-21 17:28:19 +08:00
Hongxin Liu
43c9b5fb44 [chat] add distributed impl (#6210) 2025-02-21 15:24:23 +08:00