YeAnbang
eb158eb201
fix ci; remove test cases that failed on 3080 (those with tps), can pass locally
2025-11-12 18:35:34 +08:00
YeAnbang
1b65963c02
fix readme
2025-11-10 15:47:18 +08:00
YeAnbang
4c53210aaf
Merge branch 'grpo-zero-bubble-rebase' of https://github.com/hpcaitech/ColossalAI into grpo-zero-bubble-rebase
2025-11-07 19:22:31 +08:00
YeAnbang
535eba85e2
update readme
2025-11-07 19:19:54 +08:00
pre-commit-ci[bot]
6f7e8595fc
[pre-commit.ci] auto fixes from pre-commit.com hooks
...
for more information, see https://pre-commit.ci
2025-11-07 08:18:24 +00:00
YeAnbang
40b6a914f3
all tests passed
2025-11-07 16:08:36 +08:00
YeAnbang
c865de32a5
cherry pick zero bubble RL
2025-11-06 15:12:51 +08:00
YeAnbang
2336d7f6d6
fix race condition
2025-11-06 10:59:57 +08:00
YeAnbang
ddda79c36f
add entropy
2025-11-06 10:57:32 +08:00
YeAnbang
b47b610d98
add code for zero-bubble implementation
2025-11-06 10:51:07 +08:00
sglucas
083766d54c
Add new implementations of RL algorithms (#6383)
...
* add new algorithm
* move common calculations
* delete data
* move common calculations of rewards
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-09-03 13:48:06 +08:00
YeAnbang
4ac2227488
Merge pull request #6378 from hpcaitech/grpo-latest-rebase-fix-resume
...
[feat] fix resume training
2025-08-18 17:09:53 +08:00
pre-commit-ci[bot]
73bdfd8891
[pre-commit.ci] auto fixes from pre-commit.com hooks
...
for more information, see https://pre-commit.ci
2025-08-14 11:05:42 +00:00
YeAnbang
99ba48fc40
Merge branch 'grpo-latest-rebase-main' of https://github.com/hpcaitech/ColossalAI into grpo-latest-rebase-main
2025-08-14 19:03:04 +08:00
YeAnbang
762150cf51
fix ci
2025-08-14 19:00:30 +08:00
YeAnbang
bbc5fb4ed8
fix ci
2025-08-14 18:59:54 +08:00
YeAnbang
e589ec505e
support resume training
2025-08-12 08:10:56 +00:00
pre-commit-ci[bot]
08a1244ef1
[pre-commit.ci] auto fixes from pre-commit.com hooks
...
for more information, see https://pre-commit.ci
2025-08-06 06:16:38 +00:00
YeAnbang
32b2148670
tested after rebasing, fix importance sampling bug
2025-08-06 06:15:15 +00:00
YeAnbang
3746f73854
fix missing or wrong file during rebase
2025-08-05 14:41:12 +08:00
YeAnbang
118a66fd46
[Fix] Add L2 Regularization (#6372)
...
* fix no L2 regularization error
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-08-05 14:04:42 +08:00
YeAnbang
c7829769e9
hotfix entropy calculation (#6364)
2025-08-05 14:04:42 +08:00
YeAnbang
3d9dd34973
add entropy (#6363)
2025-08-05 14:04:40 +08:00
YeAnbang
eafbc89b1b
fix style
2025-08-05 14:04:10 +08:00
YeAnbang
352a8e0430
fix code evaluation
2025-08-05 14:04:10 +08:00
YeAnbang
594c2c6522
[feat] Support one-behind to reduce bubble time. Add profiling code (#6353)
...
* support n_behind, add profiling
* fix bugs
* fix visualization
* fix behind
* fix loop issue
* add profiling
* fix update
* update assert
* remove assert
---------
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:04:10 +08:00
Tong Li
685e0bd8da
add dp rank for multi-dp (#6351)
...
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:04:10 +08:00
YeAnbang
b314da19f4
fix small bug
2025-08-05 14:04:10 +08:00
YeAnbang
245c8c2fbc
implement memory efficient logprob
2025-08-05 14:04:10 +08:00
YeAnbang
a960990f1e
optimize pp log_softmax OOM
2025-08-05 14:04:10 +08:00
YeAnbang
0f71c79760
fix num_update_per_episode
2025-08-05 14:04:09 +08:00
YeAnbang
73384bea19
Update README.md
2025-08-05 14:04:09 +08:00
YeAnbang
80c576f5ea
add ray timeout handling instruction
2025-08-05 14:04:09 +08:00
YeAnbang
79a7b99fe6
update readme
2025-08-05 14:04:09 +08:00
YeAnbang
6a0b809fd1
modify readme
2025-08-05 14:04:09 +08:00
YeAnbang
3b3c48d9a8
Manually schedule resources and support auto master address assigning
2025-08-05 14:04:09 +08:00
Tong Li
3a4681fdd9
fix pp memory issue (#6344)
...
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:04:09 +08:00
Tong Li
6ae54a6dce
move out evaluation func (#6343)
...
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:04:09 +08:00
pre-commit-ci[bot]
72b2d989df
[pre-commit.ci] auto fixes from pre-commit.com hooks
...
for more information, see https://pre-commit.ci
2025-08-05 14:04:09 +08:00
YeAnbang
9dbb0ff89f
remove debug code
2025-08-05 14:04:09 +08:00
YeAnbang
de40c736d0
fix bug, tested
2025-08-05 14:04:09 +08:00
YeAnbang
177144794b
support code generation tasks
2025-08-05 14:04:09 +08:00
YeAnbang
a9a3f374e5
fix typo and parameter description
2025-08-05 14:04:09 +08:00
pre-commit-ci[bot]
8d52441f6d
[pre-commit.ci] auto fixes from pre-commit.com hooks
...
for more information, see https://pre-commit.ci
2025-08-05 14:04:08 +08:00
Tong Li
a246bf25c3
add overlength sample count (#6332)
...
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:02:59 +08:00
YeAnbang
60510010d1
address conversation
2025-08-05 14:02:36 +08:00
Tong Li
382307a62c
fix default eval setting (#6321)
...
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:02:35 +08:00
YeAnbang
2a39d3afd9
address conversation
2025-08-05 14:02:02 +08:00
YeAnbang
4b1c515f52
fix missing tags parameter
2025-08-05 14:01:45 +08:00
Tong Li
5bbfe1567f
fix empty tensor (#6319)
...
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:01:45 +08:00