YeAnbang
eb158eb201
fix ci; remove test cases that failed on 3080 (those with tps), can pass locally
2025-11-12 18:35:34 +08:00
YeAnbang
7f91b7e6f5
fix ci; specify flash-attn version
2025-11-11 15:38:41 +08:00
YeAnbang
1b65963c02
fix readme
2025-11-10 15:47:18 +08:00
YeAnbang
4c53210aaf
Merge branch 'grpo-zero-bubble-rebase' of https://github.com/hpcaitech/ColossalAI into grpo-zero-bubble-rebase
2025-11-07 19:22:31 +08:00
YeAnbang
535eba85e2
update readme
2025-11-07 19:19:54 +08:00
pre-commit-ci[bot]
6f7e8595fc
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-11-07 08:18:24 +00:00
YeAnbang
40b6a914f3
all tests passed
2025-11-07 16:08:36 +08:00
YeAnbang
c865de32a5
cherry pick zero bubble RL
2025-11-06 15:12:51 +08:00
YeAnbang
2336d7f6d6
fix race condition
2025-11-06 10:59:57 +08:00
YeAnbang
ddda79c36f
add entropy
2025-11-06 10:57:32 +08:00
YeAnbang
dba0c0c4ed
fix code evaluation
2025-11-06 10:54:58 +08:00
YeAnbang
b47b610d98
add code for zero-bubble implementation
2025-11-06 10:51:07 +08:00
Yanjia0
e5fdefa6cf
update B200 info/img/benchmark ( #6385 )
* Update README.md
text update
* Update README.md
image update
* Update README.md
add benchmark
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-09-26 14:54:08 +08:00
sglucas
083766d54c
Add new implementations of RL algorithms ( #6383 )
* add new algorithm
* move common calculations
* delete data
* move common calculations of rewards
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-09-03 13:48:06 +08:00
Wenxuan Tan
48a673dcb0
[Ring Attention] Add more detailed references ( #6294 )
* fix
* fix
2025-08-26 21:51:16 +08:00
YeAnbang
4ac2227488
Merge pull request #6378 from hpcaitech/grpo-latest-rebase-fix-resume
[feat] fix resume training
2025-08-18 17:09:53 +08:00
Hanks
b38248d35f
Merge pull request #6376 from hpcaitech/grpo-latest-rebase-main
[feat] Add distributed RLFT training framework
2025-08-15 17:24:47 +08:00
YeAnbang
fe1f429574
Merge branch 'grpo-latest-rebase-main' of https://github.com/hpcaitech/ColossalAI into grpo-latest-rebase-main
2025-08-15 10:16:49 +08:00
YeAnbang
4152c0b30f
fix dist log prob test
2025-08-15 10:11:54 +08:00
pre-commit-ci[bot]
73bdfd8891
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-08-14 11:05:42 +00:00
YeAnbang
99ba48fc40
Merge branch 'grpo-latest-rebase-main' of https://github.com/hpcaitech/ColossalAI into grpo-latest-rebase-main
2025-08-14 19:03:04 +08:00
YeAnbang
762150cf51
fix ci
2025-08-14 19:00:30 +08:00
YeAnbang
bbc5fb4ed8
fix ci
2025-08-14 18:59:54 +08:00
Hanks
94e972fda6
Update timeout
2025-08-14 09:42:21 +08:00
Hanks
c83dc66645
Update timeout
2025-08-14 09:39:49 +08:00
Hanks
9db9892f63
reduce memory consumption
2025-08-13 16:45:43 +08:00
Hanks
b6a5f678cd
reduce memory consumption
2025-08-13 16:37:49 +08:00
YeAnbang
e589ec505e
support resume training
2025-08-12 08:10:56 +00:00
pre-commit-ci[bot]
08a1244ef1
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-08-06 06:16:38 +00:00
YeAnbang
32b2148670
tested after rebasing, fix importance sampling bug
2025-08-06 06:15:15 +00:00
YeAnbang
3746f73854
fix missing or wrong file during rebase
2025-08-05 14:41:12 +08:00
YeAnbang
118a66fd46
[Fix] Add L2 Regularization ( #6372 )
* fix no L2 regularization error
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-08-05 14:04:42 +08:00
YeAnbang
c7829769e9
hotfix entropy calculation ( #6364 )
2025-08-05 14:04:42 +08:00
YeAnbang
3d9dd34973
add entropy ( #6363 )
2025-08-05 14:04:40 +08:00
YeAnbang
eafbc89b1b
fix style
2025-08-05 14:04:10 +08:00
YeAnbang
352a8e0430
fix code evaluation
2025-08-05 14:04:10 +08:00
YeAnbang
594c2c6522
[feat] Support one-behind to reduce bubble time. Add profiling code ( #6353 )
* support n_behind, add profiling
* fix bugs
* fix visualization
* fix behind
* fix loop issue
* add profiling
* fix update
* update assert
* remove assert
---------
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:04:10 +08:00
Tong Li
685e0bd8da
add dp rank for multi-dp ( #6351 )
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:04:10 +08:00
YeAnbang
b314da19f4
fix small bug
2025-08-05 14:04:10 +08:00
YeAnbang
245c8c2fbc
implement memory efficient logprob
2025-08-05 14:04:10 +08:00
YeAnbang
a960990f1e
optimize pp log_softmax OOM
2025-08-05 14:04:10 +08:00
YeAnbang
0f71c79760
fix num_update_per_episode
2025-08-05 14:04:09 +08:00
YeAnbang
73384bea19
Update README.md
2025-08-05 14:04:09 +08:00
YeAnbang
80c576f5ea
add ray timeout handling instruction
2025-08-05 14:04:09 +08:00
YeAnbang
79a7b99fe6
update readme
2025-08-05 14:04:09 +08:00
YeAnbang
6a0b809fd1
modify readme
2025-08-05 14:04:09 +08:00
YeAnbang
3b3c48d9a8
Manually schedule resources and support auto master address assigning
2025-08-05 14:04:09 +08:00
Tong Li
3a4681fdd9
fix pp memory issue ( #6344 )
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:04:09 +08:00
Tong Li
6ae54a6dce
move out evaluation func ( #6343 )
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:04:09 +08:00
pre-commit-ci[bot]
72b2d989df
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-08-05 14:04:09 +08:00