ColossalAI/applications/ColossalChat/coati/trainer
YeAnbang 26d859f68e
[feat] Support DAPO (#6263)
* update help information

* update style

* fix

* minor fix

* support PP training

* add pp support

* remove unused code

* address conversation

* fix memory leakage; support tp+pp

* move empty cache

* move empty cache

* add DAPO support

* remove format reward

* fix filtering, still buggy

* small fix

* add DAPO support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* tested multi-node training; fix bind_batch bug

* fix conversation; support sleep mode

* support reusing excess samples

* add dynamic batching control flag

* add dynamic batching control flag

* refactored

* fix logging

---------

Co-authored-by: Tong Li <tong.li35271158@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-04-25 17:39:17 +08:00
Name         Last commit                                Last updated
callbacks    [ColossalChat] Update RLHF V2 (#5286)      2024-03-29 14:12:29 +08:00
__init__.py  Add GRPO and Support RLVR for PPO (#6186)  2025-02-18 09:43:36 +08:00
base.py      Add GRPO and Support RLVR for PPO (#6186)  2025-02-18 09:43:36 +08:00
dpo.py       Add GRPO and Support RLVR for PPO (#6186)  2025-02-18 09:43:36 +08:00
grpo.py      Add GRPO and Support RLVR for PPO (#6186)  2025-02-18 09:43:36 +08:00
kto.py       Add GRPO and Support RLVR for PPO (#6186)  2025-02-18 09:43:36 +08:00
orpo.py      Add GRPO and Support RLVR for PPO (#6186)  2025-02-18 09:43:36 +08:00
ppo.py       Add GRPO and Support RLVR for PPO (#6186)  2025-02-18 09:43:36 +08:00
rm.py        Add GRPO and Support RLVR for PPO (#6186)  2025-02-18 09:43:36 +08:00
sft.py       Add GRPO and Support RLVR for PPO (#6186)  2025-02-18 09:43:36 +08:00
utils.py     [feat] Support DAPO (#6263)                2025-04-25 17:39:17 +08:00