Hongxin Liu
d54642a263
[application] add lora sft example (#6192)
* [application] add lora sft example
* update requirements
* update readme
* update comment
* update ci
2025-02-18 13:06:38 +08:00
YeAnbang
d20c8ffd97
Add GRPO and Support RLVR for PPO (#6186)
* add grpo, support rlvr
* add grpo, support rlvr
* tested deepseek r1 pipeline
* add ci
* verify grpo r1
* verify grpo r1
* update readme, remove unused code
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* remove path
* clean code
* fix circular import
* fix ci OOM
* fix ci OOM
* skip kto tp, fix qwen generation
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-02-18 09:43:36 +08:00
Wenxuan Tan
d383449fc4
[CI] Remove triton version for compatibility bug; update req torch >=2.2 (#6018)
* remove triton version
* remove torch 2.2
* remove torch 2.1
* debug
* remove 2.1 build tests
* require torch >=2.2
---------
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-08-27 10:12:21 +08:00
Tong Li
39e2597426
[ColossalChat] Add PP support (#6001)
* support pp training
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update rm
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* refactor
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update test case
* fix
* change to 4
* fix eval
* test
* add pp
* hotfix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* support pp training
* update rm
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* refactor
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update test case
* fix
* change to 4
* fix eval
* test
* add pp
* hotfix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update
* skip pp eval
* update all reduce
* update sft
* update ignore
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update no cache
* add eval
* remove fi
* remove debug
* remove parentheses to avoid warning
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Revert "add eval"
This reverts commit 3ab2f6fa32.
* add all reduce
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-08-21 10:47:39 +08:00
YeAnbang
09d5ffca1a
add kto
2024-07-18 07:54:11 +00:00
YeAnbang
790e1362a6
merge
2024-06-07 07:01:32 +00:00
YeAnbang
b1031f7244
fix ci
2024-06-07 07:01:31 +00:00
YeAnbang
df5e9c53cf
[ColossalChat] Update RLHF V2 (#5286)
* Add dpo. Fix sft, ppo, lora. Refactor all
* fix and tested ppo
* 2nd round refactor
* add ci tests
* fix ci
* fix ci
* fix readme, style
* fix readme style
* fix style, fix benchmark
* reproduce benchmark result, remove useless files
* rename to ColossalChat
* use new image
* fix ci workflow
* fix ci
* use local model/tokenizer for ci tests
* fix ci
* fix ci
* fix ci
* fix ci timeout
* fix rm progress bar. fix ci timeout
* fix ci
* fix ci typo
* remove 3d plugin from ci temporarily
* test environment
* cannot save optimizer
* support chat template
* fix readme
* fix path
* test ci locally
* restore build_or_pr
* fix ci data path
* fix benchmark
* fix ci, move ci tests to 3080, disable fast tokenizer
* move ci to 85
* support flash attention 2
* add all-in-one data preparation script. Fix colossal-llama2-chat chat template
* add hardware requirements
* move ci test data
* fix save_model, add unwrap
* fix missing bos
* fix missing bos; support grad accumulation with gemini
* fix ci
* fix ci
* fix ci
* fix llama2 chat template config
* debug sft
* debug sft
* fix colossalai version requirement
* fix ci
* add sanity check to prevent NaN loss
* fix requirements
* add dummy data generation script
* add dummy data generation script
* add dummy data generation script
* add dummy data generation script
* update readme
* update readme
* update readme and ignore
* fix logger bug
* support parallel_output
* modify data preparation logic
* fix tokenization
* update lr
* fix inference
* run pre-commit
---------
Co-authored-by: Tong Li <tong.li352711588@gmail.com>
2024-03-29 14:12:29 +08:00
Frank Lee
73f4dc578e
[workflow] updated CI image (#5318)
2024-01-29 11:53:07 +08:00
Wenhao Chen
7b9b86441f
[chat]: update rm, add wandb and fix bugs (#4471)
* feat: modify forward fn of critic and reward model
* feat: modify calc_action_log_probs
* to: add wandb in sft and rm trainer
* feat: update train_sft
* feat: update train_rm
* style: modify type annotation and add warning
* feat: pass tokenizer to ppo trainer
* to: modify trainer base and maker base
* feat: add wandb in ppo trainer
* feat: pass tokenizer to generate
* test: update generate fn tests
* test: update train tests
* fix: remove action_mask
* feat: remove unused code
* fix: fix wrong ignore_index
* fix: fix mock tokenizer
* chore: update requirements
* revert: modify make_experience
* fix: fix inference
* fix: add padding side
* style: modify _on_learn_batch_end
* test: use mock tokenizer
* fix: use bf16 to avoid overflow
* fix: fix workflow
* [chat] fix gemini strategy
* [chat] fix
* sync: update colossalai strategy
* fix: fix args and model dtype
* fix: fix checkpoint test
* fix: fix requirements
* fix: fix missing import and wrong arg
* fix: temporarily skip gemini test in stage 3
* style: apply pre-commit
* fix: temporarily skip gemini test in stage 1&2
---------
Co-authored-by: Mingyan Jiang <1829166702@qq.com>
2023-09-20 15:53:58 +08:00
ver217
1c43bfd54e
[coati] update ci
2023-08-30 10:55:56 +08:00
Wenhao Chen
da4f7b855f
[chat] fix bugs and add unit tests (#4213)
* style: rename replay buffer
Experience replay is typically used for off-policy algorithms.
Using this name in PPO may be misleading.
* fix: fix wrong zero2 default arg
* test: update experience tests
* style: rename zero_pad fn
* fix: defer init in CycledDataLoader
* test: add benchmark test
* style: rename internal fn of generation
* style: rename internal fn of lora
* fix: remove unused loss fn
* fix: remove unused utils fn
* refactor: remove generate_with_actor fn
* fix: fix type annotation
* test: add models tests
* fix: skip llama due to long execution time
* style: modify dataset
* style: apply formatter
* perf: update reward dataset
* fix: fix wrong IGNORE_INDEX in sft dataset
* fix: remove DataCollatorForSupervisedDataset
* test: add dataset tests
* style: apply formatter
* style: rename test_ci to test_train
* feat: add llama in inference
* test: add inference tests
* test: change test scripts directory
* fix: update ci
* fix: fix typo
* fix: skip llama due to oom
* fix: fix file mod
* style: apply formatter
* refactor: remove duplicated llama_gptq
* style: apply formatter
* to: update rm test
* feat: add tokenizer arg
* feat: add download model script
* test: update train tests
* fix: modify gemini load and save pretrained
* test: update checkpoint io test
* to: modify nproc_per_node
* fix: do not remove existing dir
* fix: modify save path
* test: add random choice
* fix: fix sft path
* fix: enlarge nproc_per_node to avoid oom
* fix: add num_retry
* fix: make lora config of rm and critic consistent
* fix: add warning about lora weights
* fix: skip some gpt2 tests
* fix: remove grad ckpt in rm and critic due to errors
* refactor: directly use Actor in train_sft
* test: add more arguments
* fix: disable grad ckpt when using lora
* fix: fix save_pretrained and related tests
* test: enable zero2 tests
* revert: remove useless fn
* style: polish code
* test: modify test args
2023-08-02 10:17:36 +08:00
Wenhao Chen
3d8d5d0d58
[chat] use official transformers and fix some issues (#4117)
* feat: remove on_learn_epoch fn as not used
* revert: add _on_learn_epoch fn
* feat: remove NaiveStrategy
* test: update train_prompts tests
* fix: remove prepare_llama_tokenizer_and_embedding
* test: add lora arg
* feat: remove roberta support in train_prompts due to runtime errs
* feat: remove deberta & roberta in rm as not used
* test: remove deberta and roberta tests
* feat: remove deberta and roberta models as not used
* fix: remove calls to roberta
* fix: remove prepare_llama_tokenizer_and_embedding
* chore: update transformers version
* docs: update transformers version
* fix: fix actor inference
* fix: fix ci
* feat: change llama pad token to unk
* revert: revert ddp setup_distributed
* fix: change llama pad token to unk
* revert: undo unnecessary changes
* fix: use pip to install transformers
2023-07-04 13:49:09 +08:00
Hongxin Liu
b5f0566363
[chat] add distributed PPO trainer (#3740)
* Detached ppo (#9)
* run the base
* working on dist ppo
* sync
* detached trainer
* update detached trainer. no maker update function
* facing init problem
* 1 maker 1 trainer detached run. but no model update
* facing cuda problem
* fix save functions
* verified maker update
* nothing
* add ignore
* analyze loss issue
* remove some debug codes
* facing 2m1t stuck issue
* 2m1t verified
* do not use torchrun
* working on 2m2t
* working on 2m2t
* initialize strategy in ray actor env
* facing actor's init order issue
* facing ddp model update issue (need to unwrap ddp)
* unwrap ddp actor
* checking 1m2t stuck problem
* nothing
* set timeout for trainer choosing. It solves the stuck problem!
* delete some debug output
* rename to sync with upstream
* rename to sync with upstream
* coati rename
* nothing
* I am going to detach the replay buffer from the trainer and make it a Ray Actor. Two benefits: 1. support TP trainer. 2. asynchronous buffer operations
* experience_maker_holder performs target-revolving _send_experience() instead of length comparison.
* move code to ray subfolder
* working on pipeline inference
* apply comments
* working on pipeline strategy. in progress.
* remove pipeline code. clean this branch
* update remote parameters by state_dict. no test
* nothing
* state_dict sharding transfer
* merge debug branch
* gemini _unwrap_model fix
* simplify code
* simplify code & fix LoRALinear AttributeError
* critic unwrapped state_dict
---------
Co-authored-by: csric <richcsr256@gmail.com>
* [chat] add performance evaluator and fix bugs (#10)
* [chat] add performance evaluator for ray
* [chat] refactor debug arg
* [chat] support hf config
* [chat] fix generation
* [chat] add 1mmt dummy example
* [chat] fix gemini ckpt
* split experience to send (#11)
Co-authored-by: csric <richcsr256@gmail.com>
* [chat] refactor trainer and maker (#12)
* [chat] refactor experience maker holder
* [chat] refactor model init
* [chat] refactor trainer args
* [chat] refactor model init
* [chat] refactor trainer
* [chat] refactor experience sending logic and training loop args (#13)
* [chat] refactor experience send logic
* [chat] refactor trainer
* [chat] refactor trainer
* [chat] refactor experience maker
* [chat] refactor pbar
* [chat] refactor example folder (#14)
* [chat] support quant (#15)
* [chat] add quant
* [chat] add quant example
* prompt example (#16)
* prompt example
* prompt load csv data
* remove legacy try
---------
Co-authored-by: csric <richcsr256@gmail.com>
* [chat] add mmmt dummy example and refactor experience sending (#17)
* [chat] add mmmt dummy example
* [chat] refactor naive strategy
* [chat] fix stuck problem
* [chat] fix naive strategy
* [chat] optimize experience maker sending logic
* [chat] refactor sending assignment
* [chat] refactor performance evaluator (#18)
* Prompt Example & requires_grad state_dict & sharding state_dict (#19)
* prompt example
* prompt load csv data
* remove legacy try
* maker models require_grad set to False
* working on zero redundancy update
* mmmt_prompt example; naive strategy requires_grad state_dict & sharding; maker model requires_no_grad.
* remove legacy examples
* remove legacy examples
* remove replay buffer tp state. bad design
---------
Co-authored-by: csric <richcsr256@gmail.com>
* state_dict sending adapts to new unwrap function (#20)
* prompt example
* prompt load csv data
* remove legacy try
* maker models require_grad set to False
* working on zero redundancy update
* mmmt_prompt example; naive strategy requires_grad state_dict & sharding; maker model requires_no_grad.
* remove legacy examples
* remove legacy examples
* remove replay buffer tp state. bad design
* opt benchmark
* better script
* nothing
* [chat] strategy refactor unwrap model
* [chat] strategy refactor save model
* [chat] add docstr
* [chat] refactor trainer save model
* [chat] fix strategy typing
* [chat] refactor trainer save model
* [chat] update readme
* [chat] fix unit test
* working on lora reconstruction
* state_dict sending adapts to new unwrap function
* remove comments
---------
Co-authored-by: csric <richcsr256@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>
* [chat-ray] add readme (#21)
* add readme
* transparent graph
* add note background
---------
Co-authored-by: csric <richcsr256@gmail.com>
* [chat] get images from url (#22)
* Refactor/chat ray (#23)
* [chat] lora add todo
* [chat] remove unused pipeline strategy
* [chat] refactor example structure
* [chat] setup ci for ray
* [chat-ray] Support LoRA trainer. LoRA weights reconstruction. (#24)
* lora support prototype
* lora support
* 1mmt lora & remove useless code
---------
Co-authored-by: csric <richcsr256@gmail.com>
* [chat] fix test ci for ray
* [chat] fix test ci requirements for ray
* [chat] fix ray runtime env
* [chat] fix ray runtime env
* [chat] fix example ci docker args
* [chat] add debug info in trainer
* [chat] add nccl debug info
* [chat] skip ray test
* [doc] fix typo
---------
Co-authored-by: csric <59389055+CsRic@users.noreply.github.com>
Co-authored-by: csric <richcsr256@gmail.com>
2023-06-07 10:41:16 +08:00
Hongxin Liu
179558a87a
[devops] fix chat ci (#3628)
2023-04-24 10:55:14 +08:00
Camille Zhong
36a519b49f
Update test_ci.sh
update
Update test_ci.sh
Update test_ci.sh
Update test_ci.sh
Update test_ci.sh
Update test_ci.sh
Update test_ci.sh
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update test_ci.sh
Update test_ci.sh
update
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
update ci
Update test_ci.sh
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update test_ci.sh
Update test_ci.sh
Update run_chatgpt_examples.yml
Update test_ci.sh
Update test_ci.sh
Update test_ci.sh
update test ci
RoBERTa for RLHF Stage 2 & 3 (still in testing)
Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)"
This reverts commit 06741d894d.
Add RoBERTa for RLHF stage 2 & 3
1. add roberta folder under model folder
2. add roberta option in train_reward_model.py
3. add some tests in testci
Update test_ci.sh
Revert "Update test_ci.sh"
This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a.
Add RoBERTa for RLHF Stage 2 & 3 (test)
RoBERTa for RLHF Stage 2 & 3 (still in testing)
Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)"
This reverts commit 06741d894d.
Add RoBERTa for RLHF stage 2 & 3
1. add roberta folder under model folder
2. add roberta option in train_reward_model.py
3. add some tests in testci
Update test_ci.sh
Revert "Update test_ci.sh"
This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a.
update roberta with coati
chat ci update
Revert "chat ci update"
This reverts commit 17ae7ae01fa752bd3289fc39069868fde99cf846.
[test]chat_update_ci
Update test_ci.sh
Update test_ci.sh
test
Update gpt_critic.py
Update gpt_critic.py
Update run_chatgpt_unit_tests.yml
update test ci
update
update
update
update
Update test_ci.sh
update
Update test_ci.sh
Update test_ci.sh
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
2023-04-18 14:33:12 +08:00
Frank Lee
169ed4d24e
[workflow] purged extension cache before GPT test (#3128)
2023-03-14 10:11:32 +08:00
ver217
9c0943ecdb
[chatgpt] optimize generation kwargs (#2717)
* [chatgpt] ppo trainer use default generate args
* [chatgpt] example remove generation preparing fn
* [chatgpt] benchmark remove generation preparing fn
* [chatgpt] fix ci
2023-02-15 13:59:58 +08:00
ver217
f6b4ca4e6c
[devops] add chatgpt ci (#2713)
2023-02-15 10:53:54 +08:00