ColossalAI

mirror of https://github.com/hpcaitech/ColossalAI.git synced 2025-08-19 00:17:18 +00:00

Author	SHA1	Message	Date
YeAnbang	9cbc5dd924	upgrade reward functions	2025-08-05 14:01:20 +08:00
YeAnbang	6095274be6	support logging rollouts to wandb	2025-08-05 14:01:20 +08:00
YeAnbang	654aefc3c3	address conversation	2025-08-05 14:01:18 +08:00
YeAnbang	e7f61be51a	fix evaluation	2025-08-05 14:00:44 +08:00
Tong Li	6ebd813b5f	handle empty index	2025-08-05 14:00:43 +08:00
YeAnbang	88f49ddc5e	remove redundant code and fix bugs	2025-08-05 13:59:56 +08:00
YeAnbang	d19f1f21b6	move prompt-level-filtering to buffer side	2025-08-05 13:59:56 +08:00
YeAnbang	f79dbdb2df	move prompt-level-filtering to buffer side	2025-08-05 13:59:56 +08:00
YeAnbang	0d0fef771f	disable wandb tb syncing	2025-08-05 13:59:56 +08:00
YeAnbang	280aa0b830	use consumer global step	2025-08-05 13:59:56 +08:00
Tong Li	5a6e4a6d75	[feat] Support prompt level dynamic (#6300) * adjust to dynamic prompt bs * remove debug * update pad seq (#6303) Co-authored-by: Tong Li <tong.li35271158@gmail.com> * adjust to dynamic prompt bs * remove debug * fix dp issue * fix * fix default settings --------- Co-authored-by: Tong Li <tong.li35271158@gmail.com>	2025-08-05 13:59:53 +08:00
YeAnbang	3416a4fc9c	move logging to producer	2025-08-05 13:59:03 +08:00
YeAnbang	af4366f0cb	Support evaluation during training	2025-08-05 13:59:03 +08:00
Tong Li	4ac7d065a6	update pad seq (#6303) Co-authored-by: Tong Li <tong.li35271158@gmail.com>	2025-08-05 13:59:03 +08:00
YeAnbang	9544c51a74	[fix] revert reward update and evaluation (#6295) * Revert "rewrite reward fn" This reverts commit `d06042b434`. * Revert "upgrade reward math verification" This reverts commit `a6085ff676`. * Revert "fix bug" This reverts commit `01640ebd65`. * Revert "reuse comm-group" This reverts commit `bd61918dcf`. * Revert "Support evaluation during training" This reverts commit `57a88395fe`.	2025-08-05 13:59:02 +08:00
YeAnbang	06b892bf4d	rewrite reward fn	2025-08-05 13:59:02 +08:00
YeAnbang	9642b75581	upgrade reward math verification	2025-08-05 13:59:02 +08:00
YeAnbang	1be993de3e	fix bug	2025-08-05 13:59:02 +08:00
YeAnbang	de0c267f5a	reuse comm-group	2025-08-05 13:59:02 +08:00
YeAnbang	16600f3509	Support evaluation during training	2025-08-05 13:59:02 +08:00
Tong Li	6a1bd833e0	[feat] Sync shard model (#6289) * [feat] support hybrid parallel model sync * update consumer and producer * update files * update producer * remove print * update --------- Co-authored-by: duanjunwen <935724073@qq.com> Co-authored-by: YeAnbang <44796419+YeAnbang@users.noreply.github.com> Co-authored-by: Tong Li <tong.li35271158@gmail.com>	2025-08-05 13:59:02 +08:00
YeAnbang	e181318d51	[feat] Support boxed math reward (#6284) * fix pp+tp, fix dataloader * fixed plugin micro-batch size * support boxed reward * add boxed reward * fix pp state dict incomplete issue * Revert "fix pp state dict incomplete issue" This reverts commit `6c1b3b694f`.	2025-08-05 13:59:02 +08:00
YeAnbang	fb4e507d00	fix pp+tp, fix dataloader (#6280)	2025-08-05 13:59:02 +08:00
Tong Li	37a8be7651	fix save issue (#6279) Co-authored-by: Tong Li <tong.li35271158@gmail.com>	2025-08-05 13:59:02 +08:00
YeAnbang	673682e716	fix checkpoint naming; add num_epoch parameter (#6277)	2025-08-05 13:59:02 +08:00
YeAnbang	5f913e8b77	[feat] Support DAPO (#6263) * update help information * update style * fix * minor fix * support PP training * add pp support * remove unused code * address conversation * fix memory leakage support tp+pp * move empty cache * move empty cache * add DAPO support * remove format reward * fix filtering, still buggy * small fix * add DAPO support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * tested multi-node training; fix bind_batch bug * fix conversation; support sleep mode * support reusing excessive samples * add dynamic batching control flag * add dynamic batching control flag * refactored * fix logging --------- Co-authored-by: Tong Li <tong.li35271158@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-08-05 13:59:02 +08:00
Tong Li	b34d707cdc	[feat] Add final save at the end (#6274) * add final save * default 1 episode	2025-08-05 13:59:02 +08:00
Tong Li	befd4f1487	add prompt template (#6273) Co-authored-by: Tong Li <tong.li35271158@gmail.com>	2025-08-05 13:59:02 +08:00
YeAnbang	3bd6fa3c67	[hot-fix] Fix memory leakage bug, support TP+PP (#6258) * update help information * update style * fix * minor fix * support PP training * add pp support * remove unused code * address conversation * fix memory leakage support tp+pp * move empty cache * move empty cache --------- Co-authored-by: Tong Li <tong.li35271158@gmail.com>	2025-08-05 13:59:02 +08:00
YeAnbang	5d79b9e692	[Distributed RLHF] Integration of PP (#6257) * update help information * update style * fix * minor fix * support PP training * add pp support * remove unused code * address conversation --------- Co-authored-by: Tong Li <tong.li35271158@gmail.com>	2025-08-05 13:59:02 +08:00
YeAnbang	12da4d14aa	[feat] add microbatch forwarding (#6251) * add microbatch forwarding * fix forward microbatch * fix producer OOM * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change project name * fix temperature annealing * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address conversation --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-08-05 13:59:02 +08:00
YeAnbang	c627b60551	update logging	2025-08-05 13:59:02 +08:00
YeAnbang	23aac43dcf	simplify vllm preprocessing input ids	2025-08-05 13:59:02 +08:00
YeAnbang	16e68a071d	fix logprob, add filtering, temperature annealing, lr descent	2025-08-05 13:59:02 +08:00
YeAnbang	f983071b10	fix vllm	2025-08-05 13:59:02 +08:00
duanjunwen	455185345e	[Feature] Support Distributed LogProb for GRPO Training (#6247) * [fix] fix qwen VocabParallelLMHead1D and gather output * fix tp bug * fix consumer * [feat] Support Distributed LogProb for GRPO Training * [fix] fix loss func * [fix] fix log prob plugin * [fix] fix qwen modeling param * [fix] rm comments * [fix] rm hard-code;fix non-dist version * [fix] fix test file param name and benchmark tp gather output=True/False * [fix] rm non-dist version in dist log prob * [fix] fix comments * [fix] fix dis log prob plugin * [fix] fix test case * [fix] fix qwen VocabParallelLMHead1D and gather output * [fix] fix DistLogProb comments * [fix] restore tp size * [fix] fix comments * [fix] fix comment; fix LogSoftmax usage --------- Co-authored-by: Tong Li <tong.li35271158@gmail.com>	2025-08-05 13:59:02 +08:00
YeAnbang	35dabd718e	fix transformers backend	2025-08-05 13:59:02 +08:00
Tong Li	e224673c44	setup update	2025-08-05 13:59:02 +08:00
Tong Li	bfc45829c3	print results	2025-08-05 13:59:02 +08:00
Tong Li	30c7ddd9f1	convert to 8 generation	2025-08-05 13:59:02 +08:00
Tong Li	a2ae82a417	fix consumer	2025-08-05 13:59:02 +08:00
Tong Li	b19355f8f0	fix tp bug	2025-08-05 13:59:02 +08:00
Tong Li	69a1a325ee	detach	2025-08-05 13:59:02 +08:00
Tong Li	b951d0b224	add response length	2025-08-05 13:59:02 +08:00
Tong Li	a4862a2349	fix reward score	2025-08-05 13:59:02 +08:00
Tong Li	a537aa1c20	update reward	2025-08-05 13:59:02 +08:00
Tong Li	c8db826782	update reward fn	2025-08-05 13:59:02 +08:00
Tong Li	fe017d34c5	update grpo	2025-08-05 13:59:02 +08:00
pre-commit-ci[bot]	bc538ba049	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-08-05 13:59:02 +08:00
pre-commit-ci[bot]	f71d422690	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-08-05 13:59:01 +08:00


3909 Commits