pre-commit-ci[bot] | 6e096362ef | [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci | 2025-03-07 10:43:03 +00:00 |
YeAnbang | c8e13a9403 | run pre-commit | 2025-03-07 18:40:31 +08:00 |
YeAnbang | d31f9e4d0f | run pre-commit | 2025-03-07 18:30:19 +08:00 |
YeAnbang | 6a6634b6e8 | add ppo | 2025-03-07 18:29:34 +08:00 |
pre-commit-ci[bot] | eb6337f07f | [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci | 2025-03-06 08:29:59 +00:00 |
Tong Li | 0cc0c843ed | add save | 2025-03-06 16:26:14 +08:00 |
Tong Li | 0f566cc2d4 | add algo selection | 2025-03-06 14:29:22 +08:00 |
Tong Li | d03cdea949 | update reward fn | 2025-03-06 10:53:48 +08:00 |
Tong Li | 678f5a9eca | update loss | 2025-03-06 10:53:03 +08:00 |
Tong Li | b96d69055e | grpo consumer | 2025-03-06 10:51:27 +08:00 |
Tong Li | 070907dd7f | polish | 2025-02-28 10:16:42 +08:00 |
Tong Li | f736d747e3 | update grpo | 2025-02-25 18:12:04 +08:00 |
Tong Li | ffd3878a1e | add simple grpo | 2025-02-23 22:54:26 +08:00 |
Tong Li | 8e6c9a4ab3 | add reward related function | 2025-02-23 11:02:54 +08:00 |
Hongxin Liu | de282dd694 | [feature] fit RL style generation (#6213): [feature] fit rl style generation; [doc] add docstr; [doc] add docstr | 2025-02-21 17:28:19 +08:00 |
Hongxin Liu | 43c9b5fb44 | [chat] add distributed impl (#6210) | 2025-02-21 15:24:23 +08:00 |