11 Commits

Author SHA1 Message Date
Hongxin Liu
641b1ee71a
[devops] remove post commit ci (#5566)
* [devops] remove post commit ci

* [misc] run pre-commit on all files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-08 15:09:40 +08:00
YeAnbang
df5e9c53cf
[ColossalChat] Update RLHF V2 (#5286)
* Add dpo. Fix sft, ppo, lora. Refactor all

* fix and tested ppo

* 2 nd round refactor

* add ci tests

* fix ci

* fix ci

* fix readme, style

* fix readme style

* fix style, fix benchmark

* reproduce benchmark result, remove useless files

* rename to ColossalChat

* use new image

* fix ci workflow

* fix ci

* use local model/tokenizer for ci tests

* fix ci

* fix ci

* fix ci

* fix ci timeout

* fix rm progress bar. fix ci timeout

* fix ci

* fix ci typo

* remove 3d plugin from ci temporary

* test environment

* cannot save optimizer

* support chat template

* fix readme

* fix path

* test ci locally

* restore build_or_pr

* fix ci data path

* fix benchmark

* fix ci, move ci tests to 3080, disable fast tokenizer

* move ci to 85

* support flash attention 2

* add all-in-one data preparation script. Fix colossal-llama2-chat chat template

* add hardware requirements

* move ci test data

* fix save_model, add unwrap

* fix missing bos

* fix missing bos; support grad accumulation with gemini

* fix ci

* fix ci

* fix ci

* fix llama2 chat template config

* debug sft

* debug sft

* fix colossalai version requirement

* fix ci

* add sanity check to prevent NaN loss

* fix requirements

* add dummy data generation script

* add dummy data generation script

* add dummy data generation script

* add dummy data generation script

* update readme

* update readme

* update readme and ignore

* fix logger bug

* support parallel_output

* modify data preparation logic

* fix tokenization

* update lr

* fix inference

* run pre-commit

---------

Co-authored-by: Tong Li <tong.li352711588@gmail.com>
2024-03-29 14:12:29 +08:00
Frank Lee
84500b7799
[workflow] fixed testmon cache in build CI (#3806)
* [workflow] fixed testmon cache in build CI

* polish code
2023-05-24 14:59:40 +08:00
Frank Lee
b3472d32e0
[workflow]auto comment with test coverage report (#2419)
* [workflow]auto comment with test coverage report

* polish code

* polish yaml
2023-01-10 22:30:16 +08:00
Frank Lee
53bb8682a2
[worfklow] added coverage test (#2399)
* [worfklow] added coverage test

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
2023-01-09 17:57:57 +08:00
Frank Lee
40d376c566
[setup] support pre-build and jit-build of cuda kernels (#2374)
* [setup] support pre-build and jit-build of cuda kernels

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
2023-01-06 20:50:26 +08:00
Frank Lee
81e0da7fa8
[setup] supported conda-installed torch (#2048)
* [setup] supported conda-installed torch

* polish code
2022-11-30 16:45:15 +08:00
Jiarui Fang
f86a703bcf
[NFC] update gitignore remove DS_Store (#1830)
2022-11-08 17:18:15 +08:00
アマデウス
354b7954d1
[model checkpoint] added unit tests for checkpoint save/load (#599)
2022-04-01 16:53:32 +08:00
アマデウス
9ee197d0e9
moved env variables to global variables; (#215)
added branch context;
added vocab parallel layers;
moved split_batch from load_batch to tensor parallel embedding layers;
updated gpt model;
updated unit test cases;
fixed few collective communicator bugs
2022-02-15 11:31:13 +08:00
zbian
404ecbdcc6
Migrated project
2021-10-28 18:21:23 +02:00