ColossalAI

mirror of https://github.com/hpcaitech/ColossalAI.git synced 2025-12-01 00:24:04 +00:00

Author	SHA1	Message	Date
YeAnbang	fb4e507d00	fix pp+tp, fix dataloader (#6280)	2025-08-05 13:59:02 +08:00
Tong Li	37a8be7651	fix save issue (#6279) Co-authored-by: Tong Li <tong.li35271158@gmail.com>	2025-08-05 13:59:02 +08:00
YeAnbang	673682e716	fix checkpoint naming; add num_epoch parameter (#6277)	2025-08-05 13:59:02 +08:00
YeAnbang	5f913e8b77	[feat] Support DAPO (#6263) * update help information * update style * fix * minor fix * support PP training * add pp support * remove unused code * address conversation * fix memory leakage support tp+pp * move empty cache * move empty cache * add DAPO support * remove format reward * fix filtering, still buggy * small fix * add DAPO support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * tested multi-node training; fix bind_batch bug * fix conversation; support sleep mode * support reusing excessive samples * add dynamic batching control flag * add dynamic batching control flag * refactored * fix logging --------- Co-authored-by: Tong Li <tong.li35271158@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-08-05 13:59:02 +08:00
Tong Li	b34d707cdc	[feat] Add final save at the end (#6274) * add final save * default 1 episode	2025-08-05 13:59:02 +08:00
Tong Li	befd4f1487	add prompt template (#6273) Co-authored-by: Tong Li <tong.li35271158@gmail.com>	2025-08-05 13:59:02 +08:00
YeAnbang	3bd6fa3c67	[hot-fix] Fix memory leakage bug, support TP+PP (#6258) * update help information * update style * fix * minor fix * support PP training * add pp support * remove unused code * address conversation * fix memory leakage support tp+pp * move empty cache * move empty cache --------- Co-authored-by: Tong Li <tong.li35271158@gmail.com>	2025-08-05 13:59:02 +08:00
YeAnbang	5d79b9e692	[Distributed RLHF] Integration of PP (#6257) * update help information * update style * fix * minor fix * support PP training * add pp support * remove unused code * address conversation --------- Co-authored-by: Tong Li <tong.li35271158@gmail.com>	2025-08-05 13:59:02 +08:00
YeAnbang	12da4d14aa	[feat] add microbatch forwarding (#6251) * add microbatch forwarding * fix forward microbatch * fix producer OOM * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change project name * fix temperature annealing * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address conversation --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-08-05 13:59:02 +08:00
YeAnbang	c627b60551	update logging	2025-08-05 13:59:02 +08:00
YeAnbang	23aac43dcf	simplify vllm preprocessing input ids	2025-08-05 13:59:02 +08:00
YeAnbang	16e68a071d	fix logprob, add filtering, temperature annealing, lr descent	2025-08-05 13:59:02 +08:00
YeAnbang	f983071b10	fix vllm	2025-08-05 13:59:02 +08:00
duanjunwen	455185345e	[Feature] Support Distributed LogProb for GRPO Training (#6247) * [fix] fix qwen VocabParallelLMHead1D and gather output * fix tp bug * fix consumer * [feat] Support Distributed LogProb for GRPO Training * [fix] fix loss func * [fix] fix log prob plugin * [fix] fix qwen modeling param * [fix] rm comments * [fix] rm hard-code;fix non-dist version * [fix] fix test file param name and benchmark tp gather output=True/False * [fix] rm non-dist version in dist log prob * [fix] fix comments * [fix] fix dis log prob plugin * [fix] fix test case * [fix] fix qwen VocabParallelLMHead1D and gather output * [fix] fix DistLogProb comments * [fix] restore tp size * [fix] fix comments * [fix] fix comment; fix LogSoftmax usage --------- Co-authored-by: Tong Li <tong.li35271158@gmail.com>	2025-08-05 13:59:02 +08:00
YeAnbang	35dabd718e	fix transformers backend	2025-08-05 13:59:02 +08:00
Tong Li	e224673c44	setup update	2025-08-05 13:59:02 +08:00
Tong Li	bfc45829c3	print results	2025-08-05 13:59:02 +08:00
Tong Li	30c7ddd9f1	convert to 8 generation	2025-08-05 13:59:02 +08:00
Tong Li	a2ae82a417	fix consumer	2025-08-05 13:59:02 +08:00
Tong Li	69a1a325ee	detach	2025-08-05 13:59:02 +08:00
Tong Li	b951d0b224	add response length	2025-08-05 13:59:02 +08:00
Tong Li	a4862a2349	fix reward score	2025-08-05 13:59:02 +08:00
Tong Li	a537aa1c20	update reward	2025-08-05 13:59:02 +08:00
Tong Li	c8db826782	update reward fn	2025-08-05 13:59:02 +08:00
Tong Li	fe017d34c5	update grpo	2025-08-05 13:59:02 +08:00
pre-commit-ci[bot]	bc538ba049	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-08-05 13:59:02 +08:00
pre-commit-ci[bot]	f71d422690	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-08-05 13:59:01 +08:00
Tong Li	246f16d7bc	update select algo	2025-08-05 13:59:01 +08:00
Tong Li	88eb6e5f04	add save	2025-08-05 13:59:01 +08:00
Tong Li	1f15dc70df	add algo selection	2025-08-05 13:59:01 +08:00
Tong Li	cc4cc78169	update loader	2025-08-05 13:59:01 +08:00
Tong Li	5c75d5b07c	update example	2025-08-05 13:59:01 +08:00
Tong Li	f8899dda70	update reward fn	2025-08-05 13:59:01 +08:00
Tong Li	9754a11398	update loss	2025-08-05 13:59:01 +08:00
Tong Li	5f178a7d24	grpo consumer	2025-08-05 13:59:01 +08:00
Tong Li	b7842f8a5d	modify data loader	2025-08-05 13:59:01 +08:00
Tong Li	718c4b76cc	polish	2025-08-05 13:59:01 +08:00
Tong Li	1f07b716bf	update grpo	2025-08-05 13:59:01 +08:00
Tong Li	40d601802d	add simple grpo	2025-08-05 13:59:01 +08:00
Tong Li	fa1272f9f2	add reward related function	2025-08-05 13:59:01 +08:00
Hongxin Liu	7a2d455136	[feature] fit RL style generation (#6213) * [feature] fit rl style generation * [doc] add docstr * [doc] add docstr	2025-08-05 13:59:01 +08:00
Hongxin Liu	162bb42321	[chat] add distributed impl (#6210)	2025-08-05 13:59:01 +08:00
duanjunwen	44d4053fec	[HotFix] update load lora model Readme; (#6240) * [fix] update load lora model Readme; * [fix] update lora infer readme * [fix] remove useless comments	2025-03-07 14:14:26 +08:00
Hongxin Liu	56fe130b15	[hotfix] fix lora load (#6231) * [hotfix] fix lora load * [hotfix] fix hp load * accelerate deepseek loading	2025-03-01 19:04:14 +08:00
pre-commit-ci[bot]	7595c453a5	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-02-20 10:25:19 +00:00
YeAnbang	53834b74b9	fix num_train_step update	2025-02-20 18:24:04 +08:00
YeAnbang	0171884664	fix inference rebatching bug	2025-02-20 17:28:49 +08:00
Hongxin Liu	f73ae55394	[application] add lora sft example data (#6198)	2025-02-18 20:18:18 +08:00
Tong Li	f8b9e88484	[application] Update README (#6196) * remove unused ray * remove unused readme * update readme * update readme * update * update * add link * update readme * update readme * fix link * update code * update cititaion * update * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update readme * update project * add images * update link * update note --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-02-18 20:17:56 +08:00
Hongxin Liu	d54642a263	[application] add lora sft example (#6192) * [application] add lora sft example * update requirements * update readme * update comment * update ci	2025-02-18 13:06:38 +08:00

115 Commits