Commit Graph

3874 Commits

Author SHA1 Message Date
duanjunwen
455185345e [Feature] Support Distributed LogProb for GRPO Training (#6247)
* [fix] fix qwen VocabParallelLMHead1D and gather output

* fix tp bug

* fix consumer

* [feat] Support Distributed LogProb for GRPO Training

* [fix] fix loss func

* [fix] fix log prob plugin

* [fix] fix qwen modeling param

* [fix] rm comments

* [fix] rm hard-code; fix non-dist version

* [fix] fix test file param name and benchmark tp gather output=True/False

* [fix] rm non-dist version in dist log prob

* [fix] fix comments

* [fix] fix dis log prob plugin

* [fix] fix test case

* [fix] fix qwen VocabParallelLMHead1D and gather output

* [fix] fix DistLogProb comments

* [fix] restore tp size

* [fix] fix comments

* [fix] fix comment; fix LogSoftmax usage

---------

Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 13:59:02 +08:00
YeAnbang
35dabd718e fix transformers backend 2025-08-05 13:59:02 +08:00
Tong Li
e224673c44 setup update 2025-08-05 13:59:02 +08:00
Tong Li
bfc45829c3 print results 2025-08-05 13:59:02 +08:00
Tong Li
30c7ddd9f1 convert to 8 generation 2025-08-05 13:59:02 +08:00
Tong Li
a2ae82a417 fix consumer 2025-08-05 13:59:02 +08:00
Tong Li
b19355f8f0 fix tp bug 2025-08-05 13:59:02 +08:00
Tong Li
69a1a325ee detach 2025-08-05 13:59:02 +08:00
Tong Li
b951d0b224 add response length 2025-08-05 13:59:02 +08:00
Tong Li
a4862a2349 fix reward score 2025-08-05 13:59:02 +08:00
Tong Li
a537aa1c20 update reward 2025-08-05 13:59:02 +08:00
Tong Li
c8db826782 update reward fn 2025-08-05 13:59:02 +08:00
Tong Li
fe017d34c5 update grpo 2025-08-05 13:59:02 +08:00
pre-commit-ci[bot]
bc538ba049 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-08-05 13:59:02 +08:00
pre-commit-ci[bot]
f71d422690 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-08-05 13:59:01 +08:00
Tong Li
246f16d7bc update select algo 2025-08-05 13:59:01 +08:00
Tong Li
88eb6e5f04 add save 2025-08-05 13:59:01 +08:00
Tong Li
1f15dc70df add algo selection 2025-08-05 13:59:01 +08:00
Tong Li
cc4cc78169 update loader 2025-08-05 13:59:01 +08:00
Tong Li
5c75d5b07c update example 2025-08-05 13:59:01 +08:00
Tong Li
f8899dda70 update reward fn 2025-08-05 13:59:01 +08:00
Tong Li
9754a11398 update loss 2025-08-05 13:59:01 +08:00
Tong Li
5f178a7d24 grpo consumer 2025-08-05 13:59:01 +08:00
Tong Li
b7842f8a5d modify data loader 2025-08-05 13:59:01 +08:00
Tong Li
718c4b76cc polish 2025-08-05 13:59:01 +08:00
Tong Li
1f07b716bf update grpo 2025-08-05 13:59:01 +08:00
Tong Li
40d601802d add simple grpo 2025-08-05 13:59:01 +08:00
Tong Li
fa1272f9f2 add reward related function 2025-08-05 13:59:01 +08:00
Hongxin Liu
7a2d455136 [feature] fit RL style generation (#6213)
* [feature] fit rl style generation

* [doc] add docstr

* [doc] add docstr
2025-08-05 13:59:01 +08:00
Hongxin Liu
162bb42321 [chat] add distributed impl (#6210) 2025-08-05 13:59:01 +08:00
Hanks
edd65a84dd
Merge pull request #6362 from hpcaitech/CI/test_build_on_schedule
[CI] Fix CI build on schedule error
2025-07-15 14:25:10 +08:00
botbw
908c634686 [CI] disable timm_regnetv_040 as aten::_unique2 is not supported 2025-07-14 07:50:38 +00:00
botbw
e285eb6993 [CI] install flash-attn 2.7.4.post1 2025-07-14 02:38:02 +00:00
botbw
d097224d90
[feat] support qwen3 in shardformer 2025-07-10 13:57:52 +08:00
Hanks
97f4bee9d8
Merge pull request #6340 from hpcaitech/release/v0.5.0
[release] update version
2025-06-04 13:57:10 +08:00
BurkeHulk
e00c9bbf38 upgrade python 2025-06-03 18:51:39 +08:00
BurkeHulk
91f08c64a7 upgrade python 2025-06-03 18:41:37 +08:00
BurkeHulk
043c46941c upgrade python 2025-06-03 18:38:07 +08:00
Hanks
916a8fef0e
Update release_test_pypi_before_merge.yml 2025-06-03 18:25:01 +08:00
Hanks
0ba96e88d2
Update release_test_pypi_before_merge.yml 2025-06-03 18:12:19 +08:00
Hanks
b9535f3c44
Update version.txt 2025-06-03 17:56:08 +08:00
Hanks
c4fe9e812e
Update release_pypi_after_merge.yml 2025-06-03 17:55:49 +08:00
Hanks
6dfedea98b
Update release_test_pypi_before_merge.yml 2025-06-03 17:55:21 +08:00
Hanks
b4ec405778
Merge pull request #6336 from BurkeHulk/fix/update-test-config
[fix] fix CI machine tag
2025-06-03 09:46:06 +08:00
BurkeHulk
067dd43246 fix pre-commit err 2025-06-02 18:13:34 +08:00
BurkeHulk
c9cba49ab5 fix CI machine tag 2025-06-02 17:45:40 +08:00
Hanks
fd56b22278
Merge pull request #6334 from flybird11111/main
[release] release version
2025-06-02 17:08:24 +08:00
Hanks
6f19618bb4
[fix] fix_lazy_init for deepseek model in transformers 2025-06-02 11:31:45 +08:00
Hanks
060102372e
Update release_pypi_after_merge.yml 2025-05-30 16:54:27 +08:00
Hanks
374dcd4da9
Update release_test_pypi_before_merge.yml 2025-05-30 16:09:14 +08:00