pre-commit-ci[bot]
72b2d989df
[pre-commit.ci] auto fixes from pre-commit.com hooks
...
for more information, see https://pre-commit.ci
2025-08-05 14:04:09 +08:00
YeAnbang
9dbb0ff89f
remove debug code
2025-08-05 14:04:09 +08:00
YeAnbang
de40c736d0
fix bug, tested
2025-08-05 14:04:09 +08:00
YeAnbang
177144794b
support code generation tasks
2025-08-05 14:04:09 +08:00
YeAnbang
a9a3f374e5
fix typo and parameter description
2025-08-05 14:04:09 +08:00
pre-commit-ci[bot]
8d52441f6d
[pre-commit.ci] auto fixes from pre-commit.com hooks
...
for more information, see https://pre-commit.ci
2025-08-05 14:04:08 +08:00
Tong Li
a246bf25c3
add overlength sample count (#6332)
...
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:02:59 +08:00
YeAnbang
60510010d1
address conversation
2025-08-05 14:02:36 +08:00
Tong Li
382307a62c
fix default eval setting (#6321)
...
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:02:35 +08:00
YeAnbang
2a39d3afd9
address conversation
2025-08-05 14:02:02 +08:00
YeAnbang
4b1c515f52
fix missing tags parameter
2025-08-05 14:01:45 +08:00
Tong Li
5bbfe1567f
fix empty tensor (#6319)
...
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:01:45 +08:00
YeAnbang
70c3daa4ee
add uuid to rollout log
2025-08-05 14:01:43 +08:00
YeAnbang
06cfbe313b
fix metric calculation
2025-08-05 14:01:20 +08:00
YeAnbang
c7c73df60a
fix logging rollouts
2025-08-05 14:01:20 +08:00
YeAnbang
9cbc5dd924
upgrade reward functions
2025-08-05 14:01:20 +08:00
YeAnbang
6095274be6
support logging rollouts to wandb
2025-08-05 14:01:20 +08:00
YeAnbang
654aefc3c3
address conversation
2025-08-05 14:01:18 +08:00
YeAnbang
e7f61be51a
fix evaluation
2025-08-05 14:00:44 +08:00
Tong Li
6ebd813b5f
handle empty index
2025-08-05 14:00:43 +08:00
YeAnbang
88f49ddc5e
remove redundant code and fix bugs
2025-08-05 13:59:56 +08:00
YeAnbang
d19f1f21b6
move prompt-level-filtering to buffer side
2025-08-05 13:59:56 +08:00
YeAnbang
f79dbdb2df
move prompt-level-filtering to buffer side
2025-08-05 13:59:56 +08:00
YeAnbang
0d0fef771f
disable wandb tb syncing
2025-08-05 13:59:56 +08:00
YeAnbang
280aa0b830
use consumer global step
2025-08-05 13:59:56 +08:00
Tong Li
5a6e4a6d75
[feat] Support prompt level dynamic (#6300)
...
* adjust to dynamic prompt bs
* remove debug
* update pad seq (#6303)
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
* adjust to dynamic prompt bs
* remove debug
* fix dp issue
* fix
* fix default settings
---------
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 13:59:53 +08:00
YeAnbang
3416a4fc9c
move logging to producer
2025-08-05 13:59:03 +08:00
YeAnbang
af4366f0cb
Support evaluation during training
2025-08-05 13:59:03 +08:00
Tong Li
4ac7d065a6
update pad seq (#6303)
...
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 13:59:03 +08:00
YeAnbang
9544c51a74
[fix] revert reward update and evaluation (#6295)
...
* Revert "rewrite reward fn"
This reverts commit d06042b434.
* Revert "upgrade reward math verification"
This reverts commit a6085ff676.
* Revert "fix bug"
This reverts commit 01640ebd65.
* Revert "reuse comm-group"
This reverts commit bd61918dcf.
* Revert "Support evaluation during training"
This reverts commit 57a88395fe.
2025-08-05 13:59:02 +08:00
YeAnbang
06b892bf4d
rewrite reward fn
2025-08-05 13:59:02 +08:00
YeAnbang
9642b75581
upgrade reward math verification
2025-08-05 13:59:02 +08:00
YeAnbang
1be993de3e
fix bug
2025-08-05 13:59:02 +08:00
YeAnbang
de0c267f5a
reuse comm-group
2025-08-05 13:59:02 +08:00
YeAnbang
16600f3509
Support evaluation during training
2025-08-05 13:59:02 +08:00
Tong Li
6a1bd833e0
[feat] Sync shard model (#6289)
...
* [feat] support hybrid parallel model sync
* update consumer and producer
* update files
* update producer
* remove print
* update
---------
Co-authored-by: duanjunwen <935724073@qq.com>
Co-authored-by: YeAnbang <44796419+YeAnbang@users.noreply.github.com>
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 13:59:02 +08:00
YeAnbang
e181318d51
[feat] Support boxed math reward (#6284)
...
* fix pp+tp, fix dataloader
* fixed plugin micro-batch size
* support boxed reward
* add boxed reward
* fix pp state dict incomplete issue
* Revert "fix pp state dict incomplete issue"
This reverts commit 6c1b3b694f.
2025-08-05 13:59:02 +08:00
YeAnbang
fb4e507d00
fix pp+tp, fix dataloader (#6280)
2025-08-05 13:59:02 +08:00
Tong Li
37a8be7651
fix save issue (#6279)
...
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 13:59:02 +08:00
YeAnbang
673682e716
fix checkpoint naming; add num_epoch parameter (#6277)
2025-08-05 13:59:02 +08:00
YeAnbang
5f913e8b77
[feat] Support DAPO (#6263)
...
* update help information
* update style
* fix
* minor fix
* support PP training
* add pp support
* remove unused code
* address conversation
* fix memory leakage support tp+pp
* move empty cache
* move empty cache
* add DAPO support
* remove format reward
* fix filtering, still buggy
* small fix
* add DAPO support
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* tested multi-node training; fix bind_batch bug
* fix conversation; support sleep mode
* support reusing excessive samples
* add dynamic batching control flag
* add dynamic batching control flag
* refactored
* fix logging
---------
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-08-05 13:59:02 +08:00
Tong Li
b34d707cdc
[feat] Add final save at the end (#6274)
...
* add final save
* default 1 episode
2025-08-05 13:59:02 +08:00
Tong Li
befd4f1487
add prompt template (#6273)
...
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 13:59:02 +08:00
YeAnbang
3bd6fa3c67
[hot-fix] Fix memory leakage bug, support TP+PP (#6258)
...
* update help information
* update style
* fix
* minor fix
* support PP training
* add pp support
* remove unused code
* address conversation
* fix memory leakage support tp+pp
* move empty cache
* move empty cache
---------
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 13:59:02 +08:00
YeAnbang
5d79b9e692
[Distributed RLHF] Integration of PP (#6257)
...
* update help information
* update style
* fix
* minor fix
* support PP training
* add pp support
* remove unused code
* address conversation
---------
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 13:59:02 +08:00
YeAnbang
12da4d14aa
[feat] add microbatch forwarding (#6251)
...
* add microbatch forwarding
* fix forward microbatch
* fix producer OOM
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* change project name
* fix temperature annealing
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* address conversation
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-08-05 13:59:02 +08:00
YeAnbang
c627b60551
update logging
2025-08-05 13:59:02 +08:00
YeAnbang
23aac43dcf
simplify vllm preprocessing input ids
2025-08-05 13:59:02 +08:00
YeAnbang
16e68a071d
fix logprob, add filtering, temperature annealing, lr descent
2025-08-05 13:59:02 +08:00
YeAnbang
f983071b10
fix vllm
2025-08-05 13:59:02 +08:00