YeAnbang
|
352a8e0430
|
fix code evaluation
|
2025-08-05 14:04:10 +08:00 |
|
YeAnbang
|
594c2c6522
|
[feat] Support one-behind to reduce bubble time. Add profiling code (#6353)
* support n_behind, add profiling
* fix bugs
* fix visualization
* fix behind
* fix loop issue
* add profiling
* fix update
* update assert
* remove assert
---------
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
|
2025-08-05 14:04:10 +08:00 |
|
Tong Li
|
685e0bd8da
|
add dp rank for multi-dp (#6351)
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
|
2025-08-05 14:04:10 +08:00 |
|
YeAnbang
|
b314da19f4
|
fix small bug
|
2025-08-05 14:04:10 +08:00 |
|
YeAnbang
|
245c8c2fbc
|
implement memory efficient logprob
|
2025-08-05 14:04:10 +08:00 |
|
YeAnbang
|
a960990f1e
|
optimize pp log_softmax OOM
|
2025-08-05 14:04:10 +08:00 |
|
YeAnbang
|
0f71c79760
|
fix num_update_per_episode
|
2025-08-05 14:04:09 +08:00 |
|
YeAnbang
|
73384bea19
|
Update README.md
|
2025-08-05 14:04:09 +08:00 |
|
YeAnbang
|
80c576f5ea
|
add ray timeout handling instruction
|
2025-08-05 14:04:09 +08:00 |
|
YeAnbang
|
79a7b99fe6
|
update readme
|
2025-08-05 14:04:09 +08:00 |
|
YeAnbang
|
6a0b809fd1
|
modify readme
|
2025-08-05 14:04:09 +08:00 |
|
YeAnbang
|
3b3c48d9a8
|
Manually schedule resources and support auto master address assigning
|
2025-08-05 14:04:09 +08:00 |
|
Tong Li
|
3a4681fdd9
|
fix pp memory issue (#6344)
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
|
2025-08-05 14:04:09 +08:00 |
|
Tong Li
|
6ae54a6dce
|
move out evaluation func (#6343)
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
|
2025-08-05 14:04:09 +08:00 |
|
pre-commit-ci[bot]
|
72b2d989df
|
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
|
2025-08-05 14:04:09 +08:00 |
|
YeAnbang
|
9dbb0ff89f
|
remove debug code
|
2025-08-05 14:04:09 +08:00 |
|
YeAnbang
|
de40c736d0
|
fix bug, tested
|
2025-08-05 14:04:09 +08:00 |
|
YeAnbang
|
177144794b
|
support code generation tasks
|
2025-08-05 14:04:09 +08:00 |
|
YeAnbang
|
a9a3f374e5
|
fix typo and parameter description
|
2025-08-05 14:04:09 +08:00 |
|
pre-commit-ci[bot]
|
8d52441f6d
|
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
|
2025-08-05 14:04:08 +08:00 |
|
Tong Li
|
a246bf25c3
|
add overlength sample count (#6332)
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
|
2025-08-05 14:02:59 +08:00 |
|
YeAnbang
|
60510010d1
|
address conversation
|
2025-08-05 14:02:36 +08:00 |
|
Tong Li
|
382307a62c
|
fix default eval setting (#6321)
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
|
2025-08-05 14:02:35 +08:00 |
|
YeAnbang
|
2a39d3afd9
|
address conversation
|
2025-08-05 14:02:02 +08:00 |
|
YeAnbang
|
4b1c515f52
|
fix missing tags parameter
|
2025-08-05 14:01:45 +08:00 |
|
Tong Li
|
5bbfe1567f
|
fix empty tensor (#6319)
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
|
2025-08-05 14:01:45 +08:00 |
|
YeAnbang
|
70c3daa4ee
|
add uuid to rollout log
|
2025-08-05 14:01:43 +08:00 |
|
YeAnbang
|
06cfbe313b
|
fix metric calculation
|
2025-08-05 14:01:20 +08:00 |
|
YeAnbang
|
c7c73df60a
|
fix logging rollouts
|
2025-08-05 14:01:20 +08:00 |
|
YeAnbang
|
9cbc5dd924
|
upgrade reward functions
|
2025-08-05 14:01:20 +08:00 |
|
YeAnbang
|
6095274be6
|
support logging rollouts to wandb
|
2025-08-05 14:01:20 +08:00 |
|
YeAnbang
|
654aefc3c3
|
address conversation
|
2025-08-05 14:01:18 +08:00 |
|
YeAnbang
|
e7f61be51a
|
fix evaluation
|
2025-08-05 14:00:44 +08:00 |
|
Tong Li
|
6ebd813b5f
|
handle empty index
|
2025-08-05 14:00:43 +08:00 |
|
YeAnbang
|
88f49ddc5e
|
remove redundant code and fix bugs
|
2025-08-05 13:59:56 +08:00 |
|
YeAnbang
|
d19f1f21b6
|
move prompt-level-filtering to buffer side
|
2025-08-05 13:59:56 +08:00 |
|
YeAnbang
|
f79dbdb2df
|
move prompt-level-filtering to buffer side
|
2025-08-05 13:59:56 +08:00 |
|
YeAnbang
|
0d0fef771f
|
disable wandb tb syncing
|
2025-08-05 13:59:56 +08:00 |
|
YeAnbang
|
280aa0b830
|
use consumer global step
|
2025-08-05 13:59:56 +08:00 |
|
Tong Li
|
5a6e4a6d75
|
[feat] Support prompt level dynamic (#6300)
* adjust to dynamic prompt bs
* remove debug
* update pad seq (#6303)
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
* adjust to dynamic prompt bs
* remove debug
* fix dp issue
* fix
* fix default settings
---------
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
|
2025-08-05 13:59:53 +08:00 |
|
YeAnbang
|
3416a4fc9c
|
move logging to producer
|
2025-08-05 13:59:03 +08:00 |
|
YeAnbang
|
af4366f0cb
|
Support evaluation during training
|
2025-08-05 13:59:03 +08:00 |
|
Tong Li
|
4ac7d065a6
|
update pad seq (#6303)
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
|
2025-08-05 13:59:03 +08:00 |
|
YeAnbang
|
9544c51a74
|
[fix] revert reward update and evaluation (#6295)
* Revert "rewrite reward fn"
This reverts commit d06042b434.
* Revert "upgrade reward math verification"
This reverts commit a6085ff676.
* Revert "fix bug"
This reverts commit 01640ebd65.
* Revert "reuse comm-group"
This reverts commit bd61918dcf.
* Revert "Support evaluation during training"
This reverts commit 57a88395fe.
|
2025-08-05 13:59:02 +08:00 |
|
YeAnbang
|
06b892bf4d
|
rewrite reward fn
|
2025-08-05 13:59:02 +08:00 |
|
YeAnbang
|
9642b75581
|
upgrade reward math verification
|
2025-08-05 13:59:02 +08:00 |
|
YeAnbang
|
1be993de3e
|
fix bug
|
2025-08-05 13:59:02 +08:00 |
|
YeAnbang
|
de0c267f5a
|
reuse comm-group
|
2025-08-05 13:59:02 +08:00 |
|
YeAnbang
|
16600f3509
|
Support evaluation during training
|
2025-08-05 13:59:02 +08:00 |
|
Tong Li
|
6a1bd833e0
|
[feat] Sync shard model (#6289)
* [feat] support hybrid parallel model sync
* update consumer and producer
* update files
* update producer
* remove print
* update
---------
Co-authored-by: duanjunwen <935724073@qq.com>
Co-authored-by: YeAnbang <44796419+YeAnbang@users.noreply.github.com>
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
|
2025-08-05 13:59:02 +08:00 |
|