Commit Graph

363 Commits

Author SHA1 Message Date
YeAnbang
35dabd718e fix transformers backend 2025-08-05 13:59:02 +08:00
Tong Li
e224673c44 setup update 2025-08-05 13:59:02 +08:00
Tong Li
bfc45829c3 print results 2025-08-05 13:59:02 +08:00
Tong Li
30c7ddd9f1 convert to 8 generation 2025-08-05 13:59:02 +08:00
Tong Li
a2ae82a417 fix consumer 2025-08-05 13:59:02 +08:00
Tong Li
69a1a325ee detach 2025-08-05 13:59:02 +08:00
Tong Li
b951d0b224 add response length 2025-08-05 13:59:02 +08:00
Tong Li
a4862a2349 fix reward score 2025-08-05 13:59:02 +08:00
Tong Li
a537aa1c20 update reward 2025-08-05 13:59:02 +08:00
Tong Li
c8db826782 update reward fn 2025-08-05 13:59:02 +08:00
Tong Li
fe017d34c5 update grpo 2025-08-05 13:59:02 +08:00
pre-commit-ci[bot]
bc538ba049 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-08-05 13:59:02 +08:00
pre-commit-ci[bot]
f71d422690 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-08-05 13:59:01 +08:00
Tong Li
246f16d7bc update select algo 2025-08-05 13:59:01 +08:00
Tong Li
88eb6e5f04 add save 2025-08-05 13:59:01 +08:00
Tong Li
1f15dc70df add algo selection 2025-08-05 13:59:01 +08:00
Tong Li
cc4cc78169 update loader 2025-08-05 13:59:01 +08:00
Tong Li
5c75d5b07c update example 2025-08-05 13:59:01 +08:00
Tong Li
f8899dda70 update reward fn 2025-08-05 13:59:01 +08:00
Tong Li
9754a11398 update loss 2025-08-05 13:59:01 +08:00
Tong Li
5f178a7d24 grpo consumer 2025-08-05 13:59:01 +08:00
Tong Li
b7842f8a5d modify data loader 2025-08-05 13:59:01 +08:00
Tong Li
718c4b76cc polish 2025-08-05 13:59:01 +08:00
Tong Li
1f07b716bf update grpo 2025-08-05 13:59:01 +08:00
Tong Li
40d601802d add simple grpo 2025-08-05 13:59:01 +08:00
Tong Li
fa1272f9f2 add reward related function 2025-08-05 13:59:01 +08:00
Hongxin Liu
7a2d455136 [feature] fit RL style generation (#6213)
* [feature] fit rl style generation

* [doc] add docstr

* [doc] add docstr
2025-08-05 13:59:01 +08:00
Hongxin Liu
162bb42321 [chat] add distributed impl (#6210) 2025-08-05 13:59:01 +08:00
duanjunwen
44d4053fec [HotFix] update load lora model Readme; (#6240)
* [fix] update load lora model Readme;

* [fix] update lora infer readme

* [fix] remove useless comments
2025-03-07 14:14:26 +08:00
Hongxin Liu
56fe130b15 [hotfix] fix lora load (#6231)
* [hotfix] fix lora load

* [hotfix] fix hp load

* accelerate deepseek loading
2025-03-01 19:04:14 +08:00
pre-commit-ci[bot]
7595c453a5 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-02-20 10:25:19 +00:00
YeAnbang
53834b74b9 fix num_train_step update 2025-02-20 18:24:04 +08:00
YeAnbang
0171884664 fix inference rebatching bug 2025-02-20 17:28:49 +08:00
Hongxin Liu
f73ae55394 [application] add lora sft example data (#6198) 2025-02-18 20:18:18 +08:00
Tong Li
f8b9e88484 [application] Update README (#6196)
* remove unused ray

* remove unused readme

* update readme

* update readme

* update

* update

* add link

* update readme

* update readme

* fix link

* update code

* update citation

* update

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update readme

* update project

* add images

* update link

* update note

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-02-18 20:17:56 +08:00
Hongxin Liu
d54642a263 [application] add lora sft example (#6192)
* [application] add lora sft example

* update requirements

* update readme

* update comment

* update ci
2025-02-18 13:06:38 +08:00
YeAnbang
d20c8ffd97 Add GRPO and Support RLVR for PPO (#6186)
* add grpo, support rlvr

* add grpo, support rlvr

* tested deepseek r1 pipeline

* add ci

* verify grpo r1

* verify grpo r1

* update readme, remove unused code

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove path

* clean code

* fix circular import

* fix ci OOM

* fix ci OOM

* skip kto tp, fix qwen generation

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-02-18 09:43:36 +08:00
flybird11111
aaafb38851 [Device]Support npu (#6159)
* support npu

* support pretrain

support pretrain

fix

* support lora

fix

fix

* support chatglm

fix

fix

fix

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

fix

fix

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

fix

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

fix

fix

fix

* Update train.py

* Update train.py

* fix

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* fix

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-12-17 15:42:39 +08:00
Tong Li
30a9443132 [Coati] Refine prompt for better inference (#6117)
* refine prompt

* update prompt

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-08 11:00:37 +08:00
Tong Li
7a60161035 update readme (#6116) 2024-11-06 17:24:08 +08:00
Tong Li
89a9a600bc [MCTS] Add self-refined MCTS (#6098)
* add reasoner

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update code

* delete llama

* update prompts

* update readme

* update readme

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-10-24 17:51:19 +08:00
Tong Li
4c8e85ee0d [Coati] Train DPO using PP (#6054)
* update dpo

* remove unsupported plugin

* update msg

* update dpo

* remove unsupported plugin

* update msg

* update template

* update dataset

* add pp for dpo

* update dpo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add dpo fn

* update dpo

* update dpo

* update dpo

* update dpo

* minor update

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update loss

* update help

* polish code

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-10-11 19:32:00 +08:00
Camille Zhong
f9546ba0be [ColossalEval] support for vllm (#6056)
* support vllm

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* modify vllm and update readme

* run pre-commit

* remove duplicated lines and refine code

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update param name

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refine code

* update readme

* refine code

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-18 17:09:45 +08:00
Tong Li
c650a906db [Hotfix] Remove deprecated install (#6042)
* remove deprecated install

* remove unused folder
2024-09-03 10:33:18 +08:00
Tong Li
0d3a85d04f add fused norm (#6038) 2024-08-28 17:12:51 +08:00
Tong Li
4a68efb7da [Colossal-LLaMA] Refactor latest APIs (#6030)
* refactor latest code

* update api

* add dummy dataset

* update Readme

* add setup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update files

* add PP support

* update arguments

* update argument

* reorg folder

* update version

* remove IB info

* update utils

* update readme

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update save for zero

* update save

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add apex

* update

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-08-28 17:01:58 +08:00
Tong Li
39e2597426 [ColossalChat] Add PP support (#6001)
* support pp training

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update rm

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update test case

* fix

* change to 4

* fix eval

* test

* add pp

* hotfix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support pp training

* update rm

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update test case

* fix

* change to 4

* fix eval

* test

* add pp

* hotfix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

* skip pp eval

* update all reduce

* update sft

* update ignore

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update no cache

* add eval

* remove fi

* remove debug

* remove parentheses to avoid warning

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "add eval"

This reverts commit 3ab2f6fa32.

* add all reduce

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-08-21 10:47:39 +08:00
YeAnbang
ed97d3a5d3 [Chat] fix readme (#5989)
* fix readme

* fix readme, tokenization fully tested

* fix readme, tokenization fully tested

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: root <root@notebook-8f919155-6035-47b4-9c6f-1be133b9e2c9-0.notebook-8f919155-6035-47b4-9c6f-1be133b9e2c9.colossal-ai.svc.cluster.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-08-12 14:55:17 +08:00
Tong Li
ad3fa4f49c [Hotfix] README link (#5966)
* update ignore

* update readme

* run style

* update readme
2024-08-08 18:04:47 +08:00
YeAnbang
0b2d55c4ab Support overall loss, update KTO logging 2024-08-02 06:51:38 +00:00