Commit Graph

3973 Commits

Author SHA1 Message Date
YeAnbang
eb158eb201 fix ci; remove test cases that failed on 3080 (those with tps), can pass locally 2025-11-12 18:35:34 +08:00
YeAnbang
7f91b7e6f5 fix ci; specify flash-attn version 2025-11-11 15:38:41 +08:00
YeAnbang
1b65963c02 fix readme 2025-11-10 15:47:18 +08:00
YeAnbang
4c53210aaf Merge branch 'grpo-zero-bubble-rebase' of https://github.com/hpcaitech/ColossalAI into grpo-zero-bubble-rebase 2025-11-07 19:22:31 +08:00
YeAnbang
535eba85e2 update readme 2025-11-07 19:19:54 +08:00
pre-commit-ci[bot]
6f7e8595fc [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-11-07 08:18:24 +00:00
YeAnbang
40b6a914f3 all tests passed 2025-11-07 16:08:36 +08:00
YeAnbang
c865de32a5 cherry pick zero bubble RL 2025-11-06 15:12:51 +08:00
YeAnbang
2336d7f6d6 fix race condition 2025-11-06 10:59:57 +08:00
YeAnbang
ddda79c36f add entropy 2025-11-06 10:57:32 +08:00
YeAnbang
dba0c0c4ed fix code evaluation 2025-11-06 10:54:58 +08:00
YeAnbang
b47b610d98 add code for zero-bubble implementation 2025-11-06 10:51:07 +08:00
Yanjia0
e5fdefa6cf update B200 info/img/benchmark (#6385)
* Update README.md

text update

* Update README.md

image update

* Update README.md

add benchmark

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-09-26 14:54:08 +08:00
sglucas
083766d54c Add new implementations of RL algorithms (#6383)
* add new algorithm

* move common calculations

* delete data

* move common calculations of rewards

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-09-03 13:48:06 +08:00
Wenxuan Tan
48a673dcb0 [Ring Attention] Add more detailed references (#6294)
* fix

* fix
2025-08-26 21:51:16 +08:00
YeAnbang
4ac2227488 Merge pull request #6378 from hpcaitech/grpo-latest-rebase-fix-resume
[feat] fix resume training
2025-08-18 17:09:53 +08:00
Hanks
b38248d35f Merge pull request #6376 from hpcaitech/grpo-latest-rebase-main
[feat] Add distributed RLFT training framework
2025-08-15 17:24:47 +08:00
YeAnbang
fe1f429574 Merge branch 'grpo-latest-rebase-main' of https://github.com/hpcaitech/ColossalAI into grpo-latest-rebase-main 2025-08-15 10:16:49 +08:00
YeAnbang
4152c0b30f fix dist log prob test 2025-08-15 10:11:54 +08:00
pre-commit-ci[bot]
73bdfd8891 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-08-14 11:05:42 +00:00
YeAnbang
99ba48fc40 Merge branch 'grpo-latest-rebase-main' of https://github.com/hpcaitech/ColossalAI into grpo-latest-rebase-main 2025-08-14 19:03:04 +08:00
YeAnbang
762150cf51 fix ci 2025-08-14 19:00:30 +08:00
YeAnbang
bbc5fb4ed8 fix ci 2025-08-14 18:59:54 +08:00
Hanks
94e972fda6 Update timeout 2025-08-14 09:42:21 +08:00
Hanks
c83dc66645 Update timeout 2025-08-14 09:39:49 +08:00
Hanks
9db9892f63 reduce memory consumption 2025-08-13 16:45:43 +08:00
Hanks
b6a5f678cd reduce memory consumption 2025-08-13 16:37:49 +08:00
YeAnbang
e589ec505e support resume training 2025-08-12 08:10:56 +00:00
pre-commit-ci[bot]
08a1244ef1 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-08-06 06:16:38 +00:00
YeAnbang
32b2148670 tested after rebasing, fix importance sampling bug 2025-08-06 06:15:15 +00:00
YeAnbang
3746f73854 fix missing or wrong file during rebase 2025-08-05 14:41:12 +08:00
YeAnbang
118a66fd46 [Fix] Add L2 Regularization (#6372)
* fix no L2 regularization error

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-08-05 14:04:42 +08:00
YeAnbang
c7829769e9 hotfix entropy calculation (#6364) 2025-08-05 14:04:42 +08:00
YeAnbang
3d9dd34973 add entropy (#6363) 2025-08-05 14:04:40 +08:00
YeAnbang
eafbc89b1b fix style 2025-08-05 14:04:10 +08:00
YeAnbang
352a8e0430 fix code evaluation 2025-08-05 14:04:10 +08:00
YeAnbang
594c2c6522 [feat] Support one-behind to reduce bubble time. Add profiling code (#6353)
* support n_behind, add profiling

* fix bugs

* fix visualization

* fix behind

* fix loop issue

* add profiling

* fix update

* update assert

* remove assert

---------

Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:04:10 +08:00
Tong Li
685e0bd8da add dp rank for multi-dp (#6351)
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:04:10 +08:00
YeAnbang
b314da19f4 fix small bug 2025-08-05 14:04:10 +08:00
YeAnbang
245c8c2fbc implement memory efficient logprob 2025-08-05 14:04:10 +08:00
YeAnbang
a960990f1e optimize pp log_softmax OOM 2025-08-05 14:04:10 +08:00
YeAnbang
0f71c79760 fix num_update_per_episode 2025-08-05 14:04:09 +08:00
YeAnbang
73384bea19 Update README.md 2025-08-05 14:04:09 +08:00
YeAnbang
80c576f5ea add ray timeout handling instruction 2025-08-05 14:04:09 +08:00
YeAnbang
79a7b99fe6 update readme 2025-08-05 14:04:09 +08:00
YeAnbang
6a0b809fd1 modify readme 2025-08-05 14:04:09 +08:00
YeAnbang
3b3c48d9a8 Manually schedule resources and support auto master address assigning 2025-08-05 14:04:09 +08:00
Tong Li
3a4681fdd9 fix pp memory issue (#6344)
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:04:09 +08:00
Tong Li
6ae54a6dce move out evaluation func (#6343)
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:04:09 +08:00
pre-commit-ci[bot]
72b2d989df [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-08-05 14:04:09 +08:00