Commit Graph

280 Commits

Author SHA1 Message Date
Frank Lee
6718a2f285
[workflow] cancel duplicated workflow jobs (#3960) 2023-06-12 15:11:27 +08:00
digger yu
1aadeedeea
fix typo .github/workflows/scripts/ (#3946) 2023-06-09 10:30:50 +08:00
Frank Lee
5e2132dcff
[workflow] added docker latest tag for release (#3920) 2023-06-07 15:37:37 +08:00
Hongxin Liu
c25d421f3e
[devops] hotfix testmon cache clean logic (#3917) 2023-06-07 12:39:12 +08:00
Hongxin Liu
b5f0566363
[chat] add distributed PPO trainer (#3740)
* Detached ppo (#9)

* run the base

* working on dist ppo

* sync

* detached trainer

* update detached trainer. no maker update function

* facing init problem

* 1 maker 1 trainer detached run. but no model update

* facing cuda problem

* fix save functions

* verified maker update

* nothing

* add ignore

* analyze loss issue

* remove some debug codes

* facing 2m1t stuck issue

* 2m1t verified

* do not use torchrun

* working on 2m2t

* working on 2m2t

* initialize strategy in ray actor env

* facing actor's init order issue

* facing ddp model update issue (need to unwrap ddp)

* unwrap ddp actor

* checking 1m2t stuck problem

* nothing

* set timeout for trainer choosing. It solves the stuck problem!

* delete some debug output

* rename to sync with upstream

* rename to sync with upstream

* coati rename

* nothing

* I am going to detach the replaybuffer from trainer and make it a Ray Actor. Two benefits: 1. support TP trainer. 2. asynchronized buffer operations

* experience_maker_holder performs target-revolving _send_experience() instead of length comparison.

* move code to ray subfolder

* working on pipeline inference

* apply comments

* working on pipeline strategy. in progress.

* remove pipeline code. clean this branch

* update remote parameters by state_dict. no test

* nothing

* state_dict sharding transfer

* merge debug branch

* gemini _unwrap_model fix

* simplify code

* simplify code & fix LoRALinear AttributeError

* critic unwrapped state_dict

---------

Co-authored-by: csric <richcsr256@gmail.com>

* [chat] add performance evaluator and fix bugs (#10)

* [chat] add performance evaluator for ray

* [chat] refactor debug arg

* [chat] support hf config

* [chat] fix generation

* [chat] add 1mmt dummy example

* [chat] fix gemini ckpt

* split experience to send (#11)

Co-authored-by: csric <richcsr256@gmail.com>

* [chat] refactor trainer and maker (#12)

* [chat] refactor experience maker holder

* [chat] refactor model init

* [chat] refactor trainer args

* [chat] refactor model init

* [chat] refactor trainer

* [chat] refactor experience sending logic and training loop args (#13)

* [chat] refactor experience send logic

* [chat] refactor trainer

* [chat] refactor trainer

* [chat] refactor experience maker

* [chat] refactor pbar

* [chat] refactor example folder (#14)

* [chat] support quant (#15)

* [chat] add quant

* [chat] add quant example

* prompt example (#16)

* prompt example

* prompt load csv data

* remove legacy try

---------

Co-authored-by: csric <richcsr256@gmail.com>

* [chat] add mmmt dummy example and refactor experience sending (#17)

* [chat] add mmmt dummy example

* [chat] refactor naive strategy

* [chat] fix stuck problem

* [chat] fix naive strategy

* [chat] optimize experience maker sending logic

* [chat] refactor sending assignment

* [chat] refactor performance evaluator (#18)

* Prompt Example & requires_grad state_dict & sharding state_dict (#19)

* prompt example

* prompt load csv data

* remove legacy try

* maker models require_grad set to False

* working on zero redundancy update

* mmmt_prompt example; naive strategy requires_grad state_dict & sharding; maker model requires_no_grad.

* remove legacy examples

* remove legacy examples

* remove replay buffer tp state. bad design

---------

Co-authored-by: csric <richcsr256@gmail.com>

* state_dict sending adapts to new unwrap function (#20)

* prompt example

* prompt load csv data

* remove legacy try

* maker models require_grad set to False

* working on zero redundancy update

* mmmt_prompt example; naive strategy requires_grad state_dict & sharding; maker model requires_no_grad.

* remove legacy examples

* remove legacy examples

* remove replay buffer tp state. bad design

* opt benchmark

* better script

* nothing

* [chat] strategy refactor unwrap model

* [chat] strategy refactor save model

* [chat] add docstr

* [chat] refactor trainer save model

* [chat] fix strategy typing

* [chat] refactor trainer save model

* [chat] update readme

* [chat] fix unit test

* working on lora reconstruction

* state_dict sending adapts to new unwrap function

* remove comments

---------

Co-authored-by: csric <richcsr256@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>

* [chat-ray] add readme (#21)

* add readme

* transparent graph

* add note background

---------

Co-authored-by: csric <richcsr256@gmail.com>

* [chat] get images from url (#22)

* Refactor/chat ray (#23)

* [chat] lora add todo

* [chat] remove unused pipeline strategy

* [chat] refactor example structure

* [chat] setup ci for ray

* [chat-ray] Support LoRA trainer. LoRA weights reconstruction. (#24)

* lora support prototype

* lora support

* 1mmt lora & remove useless code

---------

Co-authored-by: csric <richcsr256@gmail.com>

* [chat] fix test ci for ray

* [chat] fix test ci requirements for ray

* [chat] fix ray runtime env

* [chat] fix ray runtime env

* [chat] fix example ci docker args

* [chat] add debug info in trainer

* [chat] add nccl debug info

* [chat] skip ray test

* [doc] fix typo

---------

Co-authored-by: csric <59389055+CsRic@users.noreply.github.com>
Co-authored-by: csric <richcsr256@gmail.com>
2023-06-07 10:41:16 +08:00
Hongxin Liu
41fb7236aa
[devops] hotfix CI about testmon cache (#3910)
* [devops] hotfix CI about testmon cache

* [devops] fix testmon cache on pr
2023-06-06 18:58:58 +08:00
Hongxin Liu
ec9bbc0094
[devops] improving testmon cache (#3902)
* [devops] improving testmon cache

* [devops] fix branch name with slash

* [devops] fix branch name with slash

* [devops] fix edit action

* [devops] fix edit action

* [devops] fix edit action

* [devops] fix edit action

* [devops] fix edit action

* [devops] fix edit action

* [devops] update readme
2023-06-06 11:32:31 +08:00
Frank Lee
ae959a72a5
[workflow] fixed workflow check for docker build (#3849) 2023-05-25 16:42:34 +08:00
Frank Lee
54e97ed7ea
[workflow] supported test on CUDA 10.2 (#3841) 2023-05-25 14:14:34 +08:00
Frank Lee
84500b7799
[workflow] fixed testmon cache in build CI (#3806)
* [workflow] fixed testmon cache in build CI

* polish code
2023-05-24 14:59:40 +08:00
Frank Lee
05b8a8de58
[workflow] changed doc build to run on schedule and release (#3825)
* [workflow] changed doc build to run on schedule and release

* polish code
2023-05-24 10:50:19 +08:00
digger yu
7f8203af69
fix typo colossalai/auto_parallel autochunk fx/passes etc. (#3808) 2023-05-24 09:01:50 +08:00
Frank Lee
1e3b64f26c
[workflow] enabled doc build from a forked repo (#3815) 2023-05-23 17:49:53 +08:00
Frank Lee
ad93c736ea
[workflow] enable testing for develop & feature branch (#3801) 2023-05-23 11:21:15 +08:00
Frank Lee
788e07dbc5
[workflow] fixed the docker build workflow (#3794)
* [workflow] fixed the docker build workflow

* polish code
2023-05-22 16:30:32 +08:00
liuzeming
4d29c0f8e0
Fix/docker action (#3266)
* [docker] Add ARG VERSION to determine the Tag

* [workflow] fixed the version in the release docker workflow

---------

Co-authored-by: liuzeming <liuzeming@4paradigm.com>
2023-05-22 15:04:00 +08:00
Hongxin Liu
b4788d63ed
[devops] fix doc test on pr (#3782) 2023-05-19 16:28:57 +08:00
Hongxin Liu
5dd573c6b6
[devops] fix ci for document check (#3751)
* [doc] add test info

* [devops] update doc check ci

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] remove debug info and update invalid doc

* [devops] add essential comments
2023-05-17 11:24:22 +08:00
Hongxin Liu
c03bd7c6b2
[devops] make build on PR run automatically (#3748)
* [devops] make build on PR run automatically

* [devops] update build on pr condition
2023-05-17 11:17:37 +08:00
Hongxin Liu
afb239bbf8
[devops] update torch version of CI (#3725)
* [test] fix flop tensor test

* [test] fix autochunk test

* [test] fix lazyinit test

* [devops] update torch version of CI

* [devops] enable testmon

* [devops] fix ci

* [devops] fix ci

* [test] fix checkpoint io test

* [test] fix cluster test

* [test] fix timm test

* [devops] fix ci

* [devops] fix ci

* [devops] fix ci

* [devops] fix ci

* [devops] force sync to test ci

* [test] skip fsdp test
2023-05-15 17:20:56 +08:00
Hongxin Liu
50793b35f4
[gemini] accelerate inference (#3641)
* [gemini] support don't scatter after inference

* [chat] update colossalai strategy

* [chat] fix opt benchmark

* [chat] update opt benchmark

* [gemini] optimize inference

* [test] add gemini inference test

* [chat] fix unit test ci

* [chat] fix ci

* [chat] fix ci

* [chat] skip checkpoint test
2023-04-26 16:32:40 +08:00
Hongxin Liu
179558a87a
[devops] fix chat ci (#3628) 2023-04-24 10:55:14 +08:00
digger-yu
633bac2f58
[doc] .github/workflows/README.md (#3605)
Fixed several word spelling errors
change "compatiblity" to "compatibility" etc.
2023-04-20 10:36:28 +08:00
Camille Zhong
36a519b49f
Update test_ci.sh
update

Update test_ci.sh

Update test_ci.sh

Update test_ci.sh

Update test_ci.sh

Update test_ci.sh

Update test_ci.sh

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update test_ci.sh

Update test_ci.sh

update

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

update ci

Update test_ci.sh

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml

Update test_ci.sh

Update test_ci.sh

Update run_chatgpt_examples.yml

Update test_ci.sh

Update test_ci.sh

Update test_ci.sh

update test ci

RoBERTa for RLHF Stage 2 & 3 (still in testing)

Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)"

This reverts commit 06741d894d.

Add RoBERTa for RLHF stage 2 & 3

1. add roberta folder under model folder
2. add  roberta option in train_reward_model.py
3. add some test in testci

Update test_ci.sh

Revert "Update test_ci.sh"

This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a.

Add RoBERTa for RLHF Stage 2 & 3 (test)

RoBERTa for RLHF Stage 2 & 3 (still in testing)

Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)"

This reverts commit 06741d894d.

Add RoBERTa for RLHF stage 2 & 3

1. add roberta folder under model folder
2. add  roberta option in train_reward_model.py
3. add some test in testci

Update test_ci.sh

Revert "Update test_ci.sh"

This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a.

update roberta with coati

chat ci update

Revert "chat ci update"

This reverts commit 17ae7ae01fa752bd3289fc39069868fde99cf846.

[test]chat_update_ci

Update test_ci.sh

Update test_ci.sh

test

Update gpt_critic.py

Update gpt_critic.py

Update run_chatgpt_unit_tests.yml

update test ci

update

update

update

update

Update test_ci.sh

update

Update test_ci.sh

Update test_ci.sh

Update run_chatgpt_examples.yml

Update run_chatgpt_examples.yml
2023-04-18 14:33:12 +08:00
digger-yu
6e7e43c6fe
[doc] Update .github/workflows/README.md (#3577)
Optimization Code
I think there were two extra $ entered here, which have been deleted
2023-04-17 16:27:38 +08:00
Frank Lee
80eba05b0a
[test] refactor tests with spawn (#3452)
* [test] added spawn decorator

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
2023-04-06 14:51:35 +08:00
Hakjin Lee
1653063fce
[CI] Fix pre-commit workflow (#3238) 2023-03-27 09:41:08 +08:00
Frank Lee
169ed4d24e
[workflow] purged extension cache before GPT test (#3128) 2023-03-14 10:11:32 +08:00
Frank Lee
91ccf97514
[workflow] fixed doc build trigger condition (#3072) 2023-03-09 17:31:41 +08:00
Frank Lee
8fedc8766a
[workflow] supported conda package installation in doc test (#3028)
* [workflow] supported conda package installation in doc test

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
2023-03-07 14:21:26 +08:00
Frank Lee
2cd6ba3098
[workflow] fixed the post-commit failure when no formatting needed (#3020)
* [workflow] fixed the post-commit failure when no formatting needed

* polish code

* polish code

* polish code
2023-03-07 13:35:45 +08:00
Frank Lee
2e427ddf42
[revert] recover "[refactor] restructure configuration files (#2977)" (#3022)
This reverts commit 35c8f4ce47.
2023-03-07 13:31:23 +08:00
Saurav Maheshkar
35c8f4ce47
[refactor] restructure configuration files (#2977)
* gh: move CONTRIBUTING to .github

* chore: move isort config to pyproject

* chore: move pytest config to pyproject

* chore: move yapf config to pyproject

* chore: move clang-format config to pre-commit
2023-03-05 20:29:34 +08:00
Frank Lee
77b88a3849
[workflow] added auto doc test on PR (#2929)
* [workflow] added auto doc test on PR

* [workflow] added doc test workflow

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
2023-02-28 11:10:38 +08:00
Frank Lee
e33c043dec
[workflow] moved pre-commit to post-commit (#2895) 2023-02-24 14:41:33 +08:00
LuGY
dbd0fd1522
[CI/CD] fix nightly release CD running on forked repo (#2812)
* [CI/CD] fix nightly release CD running on forked repo

* fix misunderstanding of dispatch

* remove some build condition, enable notify even when release failed
2023-02-18 13:27:13 +08:00
ver217
9c0943ecdb
[chatgpt] optimize generation kwargs (#2717)
* [chatgpt] ppo trainer use default generate args

* [chatgpt] example remove generation preparing fn

* [chatgpt] benchmark remove generation preparing fn

* [chatgpt] fix ci
2023-02-15 13:59:58 +08:00
Frank Lee
2045d45ab7
[doc] updated documentation version list (#2715) 2023-02-15 11:24:18 +08:00
ver217
f6b4ca4e6c
[devops] add chatgpt ci (#2713) 2023-02-15 10:53:54 +08:00
Frank Lee
89f8975fb8
[workflow] fixed tensor-nvme build caching (#2711) 2023-02-15 10:12:55 +08:00
Frank Lee
5cd8cae0c9
[workflow] fixed community report ranking (#2680) 2023-02-13 17:04:49 +08:00
Frank Lee
c44fd0c867
[workflow] added trigger to build doc upon release (#2678) 2023-02-13 16:53:26 +08:00
Frank Lee
327bc06278
[workflow] added doc build test (#2675)
* [workflow] added doc build test

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
2023-02-13 15:55:57 +08:00
Frank Lee
94f87f9651
[workflow] fixed gpu memory check condition (#2659) 2023-02-10 09:59:07 +08:00
Frank Lee
85b2303b55
[doc] migrate the markdown files (#2652) 2023-02-09 14:21:38 +08:00
Frank Lee
8518263b80
[test] fixed the triton version for testing (#2608) 2023-02-07 13:49:38 +08:00
Frank Lee
aa7e9e4794
[workflow] fixed the test coverage report (#2614)
* [workflow] fixed the test coverage report

* polish code
2023-02-07 11:50:53 +08:00
Frank Lee
b3973b995a
[workflow] fixed test coverage report (#2611) 2023-02-07 11:02:56 +08:00
Frank Lee
f566b0ce6b
[workflow] fixed broken release workflows (#2604) 2023-02-06 21:40:19 +08:00
Frank Lee
f7458d3ec7
[release] v0.2.1 (#2602)
* [release] v0.2.1

* polish code
2023-02-06 20:46:18 +08:00
Frank Lee
719c4d5553
[doc] updated readme for CI/CD (#2600) 2023-02-06 17:42:15 +08:00
Frank Lee
4d582893a7
[workflow] added cuda extension build test before release (#2598)
* [workflow] added cuda extension build test before release

* polish code
2023-02-06 17:07:41 +08:00
Frank Lee
0c03802bff
[workflow] hooked pypi release with lark (#2596) 2023-02-06 16:29:04 +08:00
Frank Lee
fd90245399
[workflow] hooked docker release with lark (#2594) 2023-02-06 16:15:46 +08:00
Frank Lee
d6cc8f313e
[workflow] added test-pypi check before release (#2591)
* [workflow] added test-pypi check before release

* polish code
2023-02-06 15:42:08 +08:00
Frank Lee
2059408edc
[workflow] fixed the typo in the example check workflow (#2589) 2023-02-06 15:03:54 +08:00
Frank Lee
5767f8e394
[workflow] hook compatibility test failure to lark (#2586) 2023-02-06 14:56:31 +08:00
Frank Lee
186ddce2c4
[workflow] hook example test alert with lark (#2585) 2023-02-06 14:38:35 +08:00
Frank Lee
788e138960
[workflow] added notification if scheduled build fails (#2574)
* [workflow] added notification if scheduled build fails

* polish code

* polish code
2023-02-06 14:03:13 +08:00
Frank Lee
8af5a0799b
[workflow] added discussion stats to community report (#2572)
* [workflow] added discussion stats to community report

* polish code
2023-02-06 13:47:59 +08:00
Frank Lee
b0c29d1b4c
[workflow] refactored compatibility test workflow for maintainability (#2560) 2023-02-06 13:47:50 +08:00
Frank Lee
76edb04b0d
[workflow] adjust the GPU memory threshold for scheduled unit test (#2558)
* [workflow] adjust the GPU memory threshold for scheduled unit test

* polish code
2023-02-06 13:47:25 +08:00
Frank Lee
ba47517342
[workflow] fixed example check workflow (#2554)
* [workflow] fixed example check workflow

* polish yaml
2023-02-06 13:46:52 +08:00
Frank Lee
fb1a4c0d96
[doc] fixed issue link in pr template (#2577) 2023-02-06 10:29:24 +08:00
Frank Lee
2eb4268b47
[workflow] fixed typos in the leaderboard workflow (#2567) 2023-02-03 17:25:56 +08:00
Frank Lee
7b4ad6e0fc
[workflow] added contributor and user-engagement report (#2564)
* [workflow] added contributor and user-engagement report

* polish code

* polish code
2023-02-03 17:12:35 +08:00
Frank Lee
578374d0de
[doc] fixed the typo in pr template (#2556) 2023-02-03 10:47:00 +08:00
Frank Lee
8438c35a5f
[doc] added pull request template (#2550)
* [doc] added pull request template

* polish code

* polish code
2023-02-02 18:16:03 +08:00
Frank Lee
b55deb0662
[workflow] only report coverage for changed files (#2524)
* [workflow] only report coverage for changed files

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file

* polish file
2023-01-30 21:28:27 +08:00
Frank Lee
0af793836c
[workflow] fixed changed file detection (#2515) 2023-01-26 16:34:19 +08:00
Frank Lee
579dba572f
[workflow] fixed the skip condition of example weekly check workflow (#2481) 2023-01-16 10:05:41 +08:00
Frank Lee
32c46e146e
[workflow] automated bdist wheel build (#2459)
* [workflow] automated bdist wheel build

* polish workflow

* polish readme

* polish readme
2023-01-12 10:57:02 +08:00
Frank Lee
c9ec5190a0
[workflow] automated the compatibility test (#2453)
* [workflow] automated the compatibility test

* polish code
2023-01-11 23:40:16 +08:00
Frank Lee
483efdabc5
[workflow] fixed the on-merge condition check (#2452) 2023-01-11 17:22:11 +08:00
Frank Lee
1b7587d958
[workflow] make test coverage report collapsable (#2436) 2023-01-11 13:37:48 +08:00
Frank Lee
a3e5496156
[example] improved the clarity of the example readme (#2427)
* [example] improved the clarity of the example readme

* polish workflow

* polish workflow

* polish workflow

* polish workflow

* polish workflow

* polish workflow
2023-01-11 10:46:32 +08:00
Frank Lee
21256674e9
[workflow] report test coverage even if below threshold (#2431) 2023-01-11 10:44:52 +08:00
Frank Lee
cd38167c1a
[doc] added documentation for CI/CD (#2420)
* [doc] added documentation for CI/CD

* polish markdown

* polish markdown

* polish markdown
2023-01-10 22:30:32 +08:00
Frank Lee
b3472d32e0
[workflow] auto comment with test coverage report (#2419)
* [workflow] auto comment with test coverage report

* polish code

* polish yaml
2023-01-10 22:30:16 +08:00
Frank Lee
57b6157b6c
[workflow] auto comment if precommit check fails (#2417) 2023-01-10 15:06:27 +08:00
Frank Lee
9d432230ba
[workflow] added translation for non-english comments (#2414) 2023-01-10 12:06:01 +08:00
Frank Lee
4befaabace
[workflow] added precommit check for code consistency (#2401)
* [workflow] added precommit check for code consistency

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
2023-01-10 11:40:04 +08:00
Frank Lee
8327932d2c
[workflow] refactored the example check workflow (#2411)
* [workflow] refactored the example check workflow

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
2023-01-10 11:26:19 +08:00
Frank Lee
8de8de9fa3
[docker] updated Dockerfile and release workflow (#2410) 2023-01-10 09:26:14 +08:00
Frank Lee
53bb8682a2
[workflow] added coverage test (#2399)
* [workflow] added coverage test

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
2023-01-09 17:57:57 +08:00
Frank Lee
d3f5ce9efb
[workflow] added nightly release to pypi (#2403) 2023-01-09 16:21:44 +08:00
Frank Lee
2add870138
[workflow] added missing file change detection output (#2387) 2023-01-09 09:18:44 +08:00
ziyuhuang123
7080a8edb0
[workflow]New version: Create workflow files for examples' auto check (#2298)
* [workflows]bug_repair

* [workflow]new_pr_fixing_bugs

Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2023-01-06 09:26:49 +08:00
Frank Lee
f1bc2418c4
[setup] make cuda extension build optional (#2336)
* [setup] make cuda extension build optional

* polish code

* polish code

* polish code
2023-01-05 15:13:11 +08:00
Frank Lee
6e34cc0830
[workflow] fixed pypi release workflow error (#2328) 2023-01-05 10:52:43 +08:00
Frank Lee
2916eed34a
[workflow] fixed pypi release workflow error (#2327) 2023-01-05 10:48:38 +08:00
Frank Lee
8d8dec09ba
[workflow] added workflow to release to pypi upon version change (#2320)
* [workflow] added workflow to release to pypi upon version change

* polish code

* polish code

* polish code
2023-01-05 10:40:18 +08:00
Frank Lee
693ef121a1
[workflow] removed unused assign reviewer workflow (#2318) 2023-01-05 10:40:07 +08:00
Frank Lee
e8dfa2e2e0
[workflow] rebuild cuda kernels when kernel-related files change (#2317) 2023-01-04 17:23:59 +08:00
binmakeswell
bb6245612d
[GitHub] update issue template (#2023)
* Update bug-report.yml

* Update documentation.yml

* Update bug-report.yml

* Update feature_request.yml

* Update proposal.yml
2022-11-25 09:13:27 +08:00
Frank Lee
254ee2c54f
[workflow] removed unused pypi release workflow (#2022) 2022-11-24 17:27:55 +08:00
Frank Lee
7242bffc5f
[workflow] fixed the python and cpu arch mismatch (#2010) 2022-11-23 17:24:17 +08:00
Frank Lee
56a3dcdabd
[workflow] fixed the typo in condarc (#2006) 2022-11-23 16:05:30 +08:00
Frank Lee
7ad9bd14d8
[workflow] added conda cache and fixed no-compilation bug in release (#2005) 2022-11-23 15:52:42 +08:00
ver217
f8a7148dec
[kernel] move all symlinks of kernel to colossalai._C (#1971) 2022-11-17 13:42:33 +08:00
yuxuan-lou
cc27adceb0
[NFC] polish .github/workflows/scripts/build_colossalai_wheel.py code style (#1856) 2022-11-09 16:54:09 +08:00
Ofey Chan
e5b1a0c9be
[NFC] polish .github/workflows/scripts/generate_release_draft.py code style (#1855) 2022-11-09 15:28:33 +08:00
Kai Wang (Victor Kai)
a3b1d07ca4
[NFC] polish workflows code style (#1854) 2022-11-09 14:50:09 +08:00
RichardoLuo
81a642fe8d
[NFC] polish <.github/workflows/release_nightly.yml> code style (#1851)
Co-authored-by: RichardoLuo <14049555596@qq.com>
2022-11-09 14:48:53 +08:00
xyupeng
b0a138aa22
[NFC] polish .github/workflows/build.yml code style (#1837) 2022-11-09 12:08:47 +08:00
Maruyama_Aya
90833b45dd
[NFC] polish .github/workflows/release_docker.yml code style 2022-11-09 12:08:47 +08:00
shenggan
b0706fbb00
[NFC] polish .github/workflows/submodule.yml code style (#1822) 2022-11-09 12:08:47 +08:00
Arsmart1
fc8d8b1b9c
[NFC] polish .github/workflows/draft_github_release_post.yml code style (#1820) 2022-11-09 12:08:47 +08:00
Zangwei Zheng
25993db98a
[NFC] polish .github/workflows/build_gpu_8.yml code style (#1813) 2022-11-09 12:08:47 +08:00
Arsmart1
8860d37846
[NFC] polish .github/workflows/scripts/build_colossalai_wheel.py code style (#1721) 2022-10-19 12:20:51 +08:00
Frank Lee
9d0560af9c
[workflow] handled the git directory ownership error (#1741) 2022-10-19 11:59:11 +08:00
Frank Lee
725666d6a9
[workflow] deactivate conda environment before removing (#1606) 2022-09-19 12:05:33 +08:00
Frank Lee
6474e31556
[workflow] added TensorNVMe to compatibility test (#1449) 2022-08-12 17:06:29 +08:00
ver217
c415240db6
[nvme] CPUAdam and HybridAdam support NVMe offload (#1360)
* impl nvme optimizer

* update cpu adam

* add unit test

* update hybrid adam

* update docstr

* add TODOs

* update CI

* fix CI

* fix CI

* fix CI path

* fix CI path

* fix CI path

* fix install tensornvme

* fix CI

* fix CI path

* fix CI env variables

* test CI

* test CI

* fix CI

* fix nvme optim __del__

* fix adam __del__

* fix nvme optim

* fix CI env variables

* fix nvme optim import

* test CI

* test CI

* fix CI
2022-07-26 17:25:24 +08:00
Frank Lee
11d1436a67
[workflow] update docker build workflow to use proxy (#1334) 2022-07-18 14:09:41 +08:00
Frank Lee
069d6fdc84
[workflow] update 8-gpu test to use torch 1.11 (#1332) 2022-07-18 11:41:13 +08:00
Frank Lee
659a740738
[workflow] roll back to use torch 1.11 for unit testing (#1325) 2022-07-15 17:20:17 +08:00
Frank Lee
4d5dbf48a6
[workflow] fixed trigger condition for 8-gpu unit test (#1323) 2022-07-15 15:00:02 +08:00
Frank Lee
7c2634f4b3
[workflow] updated release bdist workflow (#1318)
* [workflow] updated release bdist workflow

* polish workflow

* polish workflow
2022-07-15 09:40:58 +08:00
Frank Lee
efdc240f1f
[workflow] disable SHM for compatibility CI on rtx3080 (#1315) 2022-07-14 17:44:43 +08:00
Frank Lee
c9c37dcc4d
[workflow] updated pytorch compatibility test (#1311) 2022-07-14 16:45:17 +08:00
lucasliunju
339520c6e0
[NFC] polish build_colossalai_wheel.py code style (#1306) 2022-07-14 10:41:01 +08:00
Frank Lee
ca73028a3a
[workflow] auto-publish docker image upon release (#1164) 2022-06-23 14:51:59 +08:00
Frank Lee
d415d73286
[workflow] fixed release post workflow (#1154) 2022-06-22 11:55:21 +08:00
Frank Lee
c77da0dc81
[workflow] fixed format error in yaml file (#1145) 2022-06-22 11:31:24 +08:00
Frank Lee
d1918304bb
[workflow] added workflow to auto draft the release post (#1144) 2022-06-21 14:43:25 +08:00
Frank Lee
e61dc31b05
[ci] added scripts to auto-generate release post text (#1142)
* [ci] added scripts to auto-generate release post text

* polish code
2022-06-21 12:22:53 +08:00
Frank Lee
5a9d8ef4d5
[workflow] fixed 8-gpu test workflow (#1101) 2022-06-13 13:50:22 +08:00
Frank Lee
03e52ecba3
[workflow] added regular 8 GPU testing (#1099)
* [workflow] added regular 8 GPU testing

* polish workflow
2022-06-10 17:38:15 +08:00
Frank Lee
1bd8a72fc9
[workflow] disable p2p via shared memory on non-nvlink machine (#1086) 2022-06-09 15:24:35 +08:00
Frank Lee
65ee6dcc20
[test] ignore 8 gpu test (#1080)
* [test] ignore 8 gpu test

* polish code

* polish workflow

* polish workflow
2022-06-08 23:14:18 +08:00
Frank Lee
cfa6c1b46b
[ci] fixed nightly build workflow (#1040) 2022-05-31 10:43:18 +08:00
Frank Lee
ee50497db2
[ci] fixed nightly build workflow (#1029) 2022-05-26 11:42:50 +08:00
Frank Lee
58a7dd2ede
[ci] fixed nightly build workflow (#1022)
* [ci] fixed nightly build workflow

* [ci] fixed nightly build workflow

* [ci] fixed nightly build workflow
2022-05-24 22:38:56 +08:00
Frank Lee
1a76c88aba
[ci] added nightly build (#1018) (#1019) 2022-05-24 17:56:01 +08:00
Frank Lee
e17a43184b
[ci] update the docker image name (#1017) 2022-05-24 16:53:39 +08:00
Frank Lee
f0f35216f1
[ci] added wheel build scripts (#910)
* [ci] added wheel build scripts

* polish code and workflow

* polish code and workflow

* polish code and workflow

* polish code and workflow

* polish code and workflow

* polish code and workflow

* polish code and workflow

* polish code and workflow

* polish code and workflow

* polish code and workflow

* polish code and workflow

* [ci] polish wheel build scripts
2022-05-05 16:06:39 +08:00
ver217
16122d5fac
update release bdist CI (#902) 2022-04-28 17:52:57 +08:00
ver217
e46e423c00
add CI for releasing bdist wheel (#901) 2022-04-28 17:40:53 +08:00
Frank Lee
1258af71cc
[ci] cache cuda extension (#860) 2022-04-25 10:03:47 +08:00
ver217
70e8dd418b
[hotfix] update requirements-test (#701) 2022-04-08 16:52:36 +08:00
Frank Lee
1ae94ea85a
[ci] remove ipc config for rootless docker (#694) 2022-04-08 10:15:52 +08:00
Frank Lee
dbe8e030fb
[ci] added missing field in workflow (#692) 2022-04-07 18:07:15 +08:00
Frank Lee
0372ed7951
[ci] update workflow trigger condition and support options (#691) 2022-04-07 17:53:03 +08:00
Frank Lee
eace69387d
[ci] fixed compatibility workflow (#678) 2022-04-06 16:19:34 +08:00
Frank Lee
cc236916c6
[ci] replace the ngc docker image with self-built pytorch image (#672) 2022-04-06 14:10:17 +08:00
binmakeswell
e0f875a8e2
[GitHub] Add prefix and label in issue template (#652) 2022-04-02 16:09:25 +08:00
Frank Lee
97933b6710
[devops] recover tsinghua pip source due to proxy issue (#509) 2022-03-24 16:11:49 +08:00
Frank Lee
65ad47c35c
[devops] remove tsinghua source for pip (#507) 2022-03-24 14:12:02 +08:00
Frank Lee
44f7bcb277
[devops] remove tsinghua source for pip (#505) 2022-03-24 14:03:05 +08:00