ColossalAI

mirror of https://github.com/hpcaitech/ColossalAI.git synced 2025-08-10 12:22:28 +00:00

YeAnbang 26d859f68e [feat] Support DAPO (#6263)	2025-04-25 17:39:17 +08:00

* update help information
* update style
* fix
* minor fix
* support PP training
* add pp support
* remove unused code
* address conversation
* fix memory leakage support tp+pp
* move empty cache
* move empty cache
* add DAPO support
* remove format reward
* fix filtering, still buggy
* small fix
* add DAPO support
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* tested multi-node training; fix bind_batch bug
* fix conversation; support sleep mode
* support reusing excessive samples
* add dynamic batching control flag
* add dynamic batching control flag
* refactored
* fix logging

---------

Co-authored-by: Tong Li <tong.li35271158@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
callbacks	[ColossalChat] Update RLHF V2 (#5286)	2024-03-29 14:12:29 +08:00
__init__.py	Add GRPO and Support RLVR for PPO (#6186)	2025-02-18 09:43:36 +08:00
base.py	Add GRPO and Support RLVR for PPO (#6186)	2025-02-18 09:43:36 +08:00
dpo.py	Add GRPO and Support RLVR for PPO (#6186)	2025-02-18 09:43:36 +08:00
grpo.py	Add GRPO and Support RLVR for PPO (#6186)	2025-02-18 09:43:36 +08:00
kto.py	Add GRPO and Support RLVR for PPO (#6186)	2025-02-18 09:43:36 +08:00
orpo.py	Add GRPO and Support RLVR for PPO (#6186)	2025-02-18 09:43:36 +08:00
ppo.py	Add GRPO and Support RLVR for PPO (#6186)	2025-02-18 09:43:36 +08:00
rm.py	Add GRPO and Support RLVR for PPO (#6186)	2025-02-18 09:43:36 +08:00
sft.py	Add GRPO and Support RLVR for PPO (#6186)	2025-02-18 09:43:36 +08:00
utils.py	[feat] Support DAPO (#6263)	2025-04-25 17:39:17 +08:00