YeAnbang
3bd6fa3c67
[hot-fix] Fix memory leakage bug, support TP+PP (#6258)
* update help information
* update style
* fix
* minor fix
* support PP training
* add pp support
* remove unused code
* address conversation
* fix memory leakage support tp+pp
* move empty cache
* move empty cache
---------
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 13:59:02 +08:00
YeAnbang
5d79b9e692
[Distributed RLHF] Integration of PP (#6257)
* update help information
* update style
* fix
* minor fix
* support PP training
* add pp support
* remove unused code
* address conversation
---------
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 13:59:02 +08:00
YeAnbang
12da4d14aa
[feat] add microbatch forwarding (#6251)
* add microbatch forwarding
* fix forward microbatch
* fix producer OOM
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* change project name
* fix temperature annealing
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* address conversation
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-08-05 13:59:02 +08:00
YeAnbang
c627b60551
update logging
2025-08-05 13:59:02 +08:00
YeAnbang
23aac43dcf
simplify vllm preprocessing input ids
2025-08-05 13:59:02 +08:00
YeAnbang
16e68a071d
fix logprob, add filtering, temperature annealing, lr descent
2025-08-05 13:59:02 +08:00
YeAnbang
f983071b10
fix vllm
2025-08-05 13:59:02 +08:00
duanjunwen
455185345e
[Feature] Support Distributed LogProb for GRPO Training (#6247)
* [fix] fix qwen VocabParallelLMHead1D and gather output
* fix tp bug
* fix consumer
* [feat] Support Distributed LogProb for GRPO Training
* [fix] fix loss func
* [fix] fix log prob plugin
* [fix] fix qwen modeling param
* [fix] rm comments
* [fix] rm hard-code;fix non-dist version
* [fix] fix test file param name and benchmark tp gather output=True/False
* [fix] rm non-dist version in dist log prob
* [fix] fix comments
* [fix] fix dis log prob plugin
* [fix] fix test case
* [fix] fix qwen VocabParallelLMHead1D and gather output
* [fix] fix DistLogProb comments
* [fix] restore tp size
* [fix] fix comments
* [fix] fix comment; fix LogSoftmax usage
---------
Co-authored-by: Tong Li <tong.li35271158@gmail.com>
2025-08-05 13:59:02 +08:00
YeAnbang
35dabd718e
fix transformers backend
2025-08-05 13:59:02 +08:00
Tong Li
e224673c44
setup update
2025-08-05 13:59:02 +08:00
Tong Li
bfc45829c3
print results
2025-08-05 13:59:02 +08:00
Tong Li
30c7ddd9f1
convert to 8 generation
2025-08-05 13:59:02 +08:00
Tong Li
a2ae82a417
fix consumer
2025-08-05 13:59:02 +08:00
Tong Li
b19355f8f0
fix tp bug
2025-08-05 13:59:02 +08:00
Tong Li
69a1a325ee
detach
2025-08-05 13:59:02 +08:00
Tong Li
b951d0b224
add response length
2025-08-05 13:59:02 +08:00
Tong Li
a4862a2349
fix reward score
2025-08-05 13:59:02 +08:00
Tong Li
a537aa1c20
update reward
2025-08-05 13:59:02 +08:00
Tong Li
c8db826782
update reward fn
2025-08-05 13:59:02 +08:00
Tong Li
fe017d34c5
update grpo
2025-08-05 13:59:02 +08:00
pre-commit-ci[bot]
bc538ba049
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-08-05 13:59:02 +08:00
pre-commit-ci[bot]
f71d422690
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-08-05 13:59:01 +08:00
Tong Li
246f16d7bc
update select algo
2025-08-05 13:59:01 +08:00
Tong Li
88eb6e5f04
add save
2025-08-05 13:59:01 +08:00
Tong Li
1f15dc70df
add algo selection
2025-08-05 13:59:01 +08:00
Tong Li
cc4cc78169
update loader
2025-08-05 13:59:01 +08:00
Tong Li
5c75d5b07c
update example
2025-08-05 13:59:01 +08:00
Tong Li
f8899dda70
update reward fn
2025-08-05 13:59:01 +08:00
Tong Li
9754a11398
update loss
2025-08-05 13:59:01 +08:00
Tong Li
5f178a7d24
grpo consumer
2025-08-05 13:59:01 +08:00
Tong Li
b7842f8a5d
modify data loader
2025-08-05 13:59:01 +08:00
Tong Li
718c4b76cc
polish
2025-08-05 13:59:01 +08:00
Tong Li
1f07b716bf
update grpo
2025-08-05 13:59:01 +08:00
Tong Li
40d601802d
add simple grpo
2025-08-05 13:59:01 +08:00
Tong Li
fa1272f9f2
add reward related function
2025-08-05 13:59:01 +08:00
Hongxin Liu
7a2d455136
[feature] fit RL style generation (#6213)
* [feature] fit rl style generation
* [doc] add docstr
* [doc] add docstr
2025-08-05 13:59:01 +08:00
Hongxin Liu
162bb42321
[chat] add distributed impl (#6210)
2025-08-05 13:59:01 +08:00
Hanks
edd65a84dd
Merge pull request #6362 from hpcaitech/CI/test_build_on_schedule
[CI] Fix CI build on schedule error
2025-07-15 14:25:10 +08:00
botbw
908c634686
[CI] disable timm_regnetv_040 as aten::_unique2 is not supported
2025-07-14 07:50:38 +00:00
botbw
e285eb6993
[CI] install flash-attn 2.7.4.post1
2025-07-14 02:38:02 +00:00
botbw
d097224d90
[feat] support qwen3 in shardformer
2025-07-10 13:57:52 +08:00
Hanks
97f4bee9d8
Merge pull request #6340 from hpcaitech/release/v0.5.0
[release] update version
2025-06-04 13:57:10 +08:00
BurkeHulk
e00c9bbf38
upgrade python
2025-06-03 18:51:39 +08:00
BurkeHulk
91f08c64a7
upgrade python
2025-06-03 18:41:37 +08:00
BurkeHulk
043c46941c
upgrade python
2025-06-03 18:38:07 +08:00
Hanks
916a8fef0e
Update release_test_pypi_before_merge.yml
2025-06-03 18:25:01 +08:00
Hanks
0ba96e88d2
Update release_test_pypi_before_merge.yml
2025-06-03 18:12:19 +08:00
Hanks
b9535f3c44
Update version.txt
2025-06-03 17:56:08 +08:00
Hanks
c4fe9e812e
Update release_pypi_after_merge.yml
2025-06-03 17:55:49 +08:00
Hanks
6dfedea98b
Update release_test_pypi_before_merge.yml
2025-06-03 17:55:21 +08:00