YeAnbang
4152c0b30f
fix dist log prob test
2025-08-15 10:11:54 +08:00
YeAnbang
99ba48fc40
Merge branch 'grpo-latest-rebase-main' of https://github.com/hpcaitech/ColossalAI into grpo-latest-rebase-main
2025-08-14 19:03:04 +08:00
YeAnbang
762150cf51
fix ci
2025-08-14 19:00:30 +08:00
YeAnbang
bbc5fb4ed8
fix ci
2025-08-14 18:59:54 +08:00
Hanks
94e972fda6
Update timeout
2025-08-14 09:42:21 +08:00
Hanks
c83dc66645
Update timeout
2025-08-14 09:39:49 +08:00
Hanks
9db9892f63
reduce memory consumption
2025-08-13 16:45:43 +08:00
Hanks
b6a5f678cd
reduce memory consumption
2025-08-13 16:37:49 +08:00
pre-commit-ci[bot]
08a1244ef1
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-08-06 06:16:38 +00:00
YeAnbang
32b2148670
tested after rebasing, fix importance sampling bug
2025-08-06 06:15:15 +00:00
YeAnbang
3746f73854
fix missing or wrong file during rebase
2025-08-05 14:41:12 +08:00
YeAnbang
118a66fd46
[Fix] Add L2 Regularization ( #6372 )
* fix no L2 regularization error
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-08-05 14:04:42 +08:00
YeAnbang
c7829769e9
hotfix entropy calculation ( #6364 )
2025-08-05 14:04:42 +08:00
YeAnbang
3d9dd34973
add entropy ( #6363 )
2025-08-05 14:04:40 +08:00
YeAnbang
eafbc89b1b
fix style
2025-08-05 14:04:10 +08:00
YeAnbang
352a8e0430
fix code evaluation
2025-08-05 14:04:10 +08:00
YeAnbang
594c2c6522
[feat] Support one-behind to reduce bubble time. Add profiling code ( #6353 )
* support n_behind, add profiling
* fix bugs
* fix visualization
* fix behind
* fix loop issue
* add profiling
* fix update
* update assert
* remove assert
---------
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:04:10 +08:00
Tong Li
685e0bd8da
add dp rank for multi-dp ( #6351 )
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:04:10 +08:00
YeAnbang
b314da19f4
fix small bug
2025-08-05 14:04:10 +08:00
YeAnbang
245c8c2fbc
implement memory efficient logprob
2025-08-05 14:04:10 +08:00
YeAnbang
a960990f1e
optimize pp log_softmax OOM
2025-08-05 14:04:10 +08:00
YeAnbang
0f71c79760
fix num_update_per_episode
2025-08-05 14:04:09 +08:00
YeAnbang
73384bea19
Update README.md
2025-08-05 14:04:09 +08:00
YeAnbang
80c576f5ea
add ray timeout handling instruction
2025-08-05 14:04:09 +08:00
YeAnbang
79a7b99fe6
update readme
2025-08-05 14:04:09 +08:00
YeAnbang
6a0b809fd1
modify readme
2025-08-05 14:04:09 +08:00
YeAnbang
3b3c48d9a8
Manually schedule resources and support auto master address assigning
2025-08-05 14:04:09 +08:00
Tong Li
3a4681fdd9
fix pp memory issue ( #6344 )
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:04:09 +08:00
Tong Li
6ae54a6dce
move out evaluation func ( #6343 )
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:04:09 +08:00
pre-commit-ci[bot]
72b2d989df
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-08-05 14:04:09 +08:00
YeAnbang
9dbb0ff89f
remove debug code
2025-08-05 14:04:09 +08:00
YeAnbang
de40c736d0
fix bug, tested
2025-08-05 14:04:09 +08:00
YeAnbang
177144794b
support code generation tasks
2025-08-05 14:04:09 +08:00
YeAnbang
a9a3f374e5
fix typo and parameter description
2025-08-05 14:04:09 +08:00
pre-commit-ci[bot]
8d52441f6d
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-08-05 14:04:08 +08:00
Tong Li
a246bf25c3
add overlength sample count ( #6332 )
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:02:59 +08:00
YeAnbang
60510010d1
address conversation
2025-08-05 14:02:36 +08:00
Tong Li
382307a62c
fix default eval setting ( #6321 )
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:02:35 +08:00
YeAnbang
2a39d3afd9
address conversation
2025-08-05 14:02:02 +08:00
YeAnbang
4b1c515f52
fix missing tags parameter
2025-08-05 14:01:45 +08:00
Tong Li
5bbfe1567f
fix empty tensor ( #6319 )
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 14:01:45 +08:00
YeAnbang
70c3daa4ee
add uuid to rollout log
2025-08-05 14:01:43 +08:00
YeAnbang
06cfbe313b
fix metric calculation
2025-08-05 14:01:20 +08:00
YeAnbang
c7c73df60a
fix logging rollouts
2025-08-05 14:01:20 +08:00
YeAnbang
9cbc5dd924
upgrade reward functions
2025-08-05 14:01:20 +08:00
YeAnbang
6095274be6
support logging rollouts to wandb
2025-08-05 14:01:20 +08:00
YeAnbang
654aefc3c3
address conversation
2025-08-05 14:01:18 +08:00
YeAnbang
e7f61be51a
fix evaluation
2025-08-05 14:00:44 +08:00
Tong Li
6ebd813b5f
handle empty index
2025-08-05 14:00:43 +08:00
YeAnbang
88f49ddc5e
remove redundant code and fix bugs
2025-08-05 13:59:56 +08:00