Frank Lee
6718a2f285
[workflow] cancel duplicated workflow jobs ( #3960 )
2023-06-12 15:11:27 +08:00
digger yu
1aadeedeea
fix typo .github/workflows/scripts/ ( #3946 )
2023-06-09 10:30:50 +08:00
Frank Lee
5e2132dcff
[workflow] added docker latest tag for release ( #3920 )
2023-06-07 15:37:37 +08:00
Hongxin Liu
c25d421f3e
[devops] hotfix testmon cache clean logic ( #3917 )
2023-06-07 12:39:12 +08:00
Hongxin Liu
b5f0566363
[chat] add distributed PPO trainer ( #3740 )
...
* Detached ppo (#9 )
* run the base
* working on dist ppo
* sync
* detached trainer
* update detached trainer. no maker update function
* facing init problem
* 1 maker 1 trainer detached run. but no model update
* facing cuda problem
* fix save functions
* verified maker update
* nothing
* add ignore
* analyze loss issue
* remove some debug codes
* facing 2m1t stuck issue
* 2m1t verified
* do not use torchrun
* working on 2m2t
* working on 2m2t
* initialize strategy in ray actor env
* facing actor's init order issue
* facing ddp model update issue (need unwrap ddp)
* unwrap ddp actor
* checking 1m2t stuck problem
* nothing
* set timeout for trainer choosing. It solves the stuck problem!
* delete some debug output
* rename to sync with upstream
* rename to sync with upstream
* coati rename
* nothing
* I am going to detach the replaybuffer from trainer and make it a Ray Actor. Two benefits: 1. support TP trainer. 2. asynchronized buffer operations
* experience_maker_holder performs target-revolving _send_experience() instead of length comparison.
* move code to ray subfolder
* working on pipeline inference
* apply comments
* working on pipeline strategy. in progress.
* remove pipeline code. clean this branch
* update remote parameters by state_dict. no test
* nothing
* state_dict sharding transfer
* merge debug branch
* gemini _unwrap_model fix
* simplify code
* simplify code & fix LoRALinear AttributeError
* critic unwrapped state_dict
---------
Co-authored-by: csric <richcsr256@gmail.com>
* [chat] add performance evaluator and fix bugs (#10 )
* [chat] add performance evaluator for ray
* [chat] refactor debug arg
* [chat] support hf config
* [chat] fix generation
* [chat] add 1mmt dummy example
* [chat] fix gemini ckpt
* split experience to send (#11 )
Co-authored-by: csric <richcsr256@gmail.com>
* [chat] refactor trainer and maker (#12 )
* [chat] refactor experience maker holder
* [chat] refactor model init
* [chat] refactor trainer args
* [chat] refactor model init
* [chat] refactor trainer
* [chat] refactor experience sending logic and training loop args (#13 )
* [chat] refactor experience send logic
* [chat] refactor trainer
* [chat] refactor trainer
* [chat] refactor experience maker
* [chat] refactor pbar
* [chat] refactor example folder (#14 )
* [chat] support quant (#15 )
* [chat] add quant
* [chat] add quant example
* prompt example (#16 )
* prompt example
* prompt load csv data
* remove legacy try
---------
Co-authored-by: csric <richcsr256@gmail.com>
* [chat] add mmmt dummy example and refactor experience sending (#17 )
* [chat] add mmmt dummy example
* [chat] refactor naive strategy
* [chat] fix stuck problem
* [chat] fix naive strategy
* [chat] optimize experience maker sending logic
* [chat] refactor sending assignment
* [chat] refactor performance evaluator (#18 )
* Prompt Example & requires_grad state_dict & sharding state_dict (#19 )
* prompt example
* prompt load csv data
* remove legacy try
* maker models require_grad set to False
* working on zero redundancy update
* mmmt_prompt example; naive strategy requires_grad state_dict & sharding; maker model requires_no_grad.
* remove legacy examples
* remove legacy examples
* remove replay buffer tp state. bad design
---------
Co-authored-by: csric <richcsr256@gmail.com>
* state_dict sending adapts to new unwrap function (#20 )
* prompt example
* prompt load csv data
* remove legacy try
* maker models require_grad set to False
* working on zero redundancy update
* mmmt_prompt example; naive strategy requires_grad state_dict & sharding; maker model requires_no_grad.
* remove legacy examples
* remove legacy examples
* remove replay buffer tp state. bad design
* opt benchmark
* better script
* nothing
* [chat] strategy refactor unwrap model
* [chat] strategy refactor save model
* [chat] add docstr
* [chat] refactor trainer save model
* [chat] fix strategy typing
* [chat] refactor trainer save model
* [chat] update readme
* [chat] fix unit test
* working on lora reconstruction
* state_dict sending adapts to new unwrap function
* remove comments
---------
Co-authored-by: csric <richcsr256@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>
* [chat-ray] add readme (#21 )
* add readme
* transparent graph
* add note background
---------
Co-authored-by: csric <richcsr256@gmail.com>
* [chat] get images from url (#22 )
* Refactor/chat ray (#23 )
* [chat] lora add todo
* [chat] remove unused pipeline strategy
* [chat] refactor example structure
* [chat] setup ci for ray
* [chat-ray] Support LoRA trainer. LoRA weights reconstruction. (#24 )
* lora support prototype
* lora support
* 1mmt lora & remove useless code
---------
Co-authored-by: csric <richcsr256@gmail.com>
* [chat] fix test ci for ray
* [chat] fix test ci requirements for ray
* [chat] fix ray runtime env
* [chat] fix ray runtime env
* [chat] fix example ci docker args
* [chat] add debug info in trainer
* [chat] add nccl debug info
* [chat] skip ray test
* [doc] fix typo
---------
Co-authored-by: csric <59389055+CsRic@users.noreply.github.com>
Co-authored-by: csric <richcsr256@gmail.com>
2023-06-07 10:41:16 +08:00
Hongxin Liu
41fb7236aa
[devops] hotfix CI about testmon cache ( #3910 )
...
* [devops] hotfix CI about testmon cache
* [devops] fix testmon cache on pr
2023-06-06 18:58:58 +08:00
Hongxin Liu
ec9bbc0094
[devops] improving testmon cache ( #3902 )
...
* [devops] improving testmon cache
* [devops] fix branch name with slash
* [devops] fix branch name with slash
* [devops] fix edit action
* [devops] fix edit action
* [devops] fix edit action
* [devops] fix edit action
* [devops] fix edit action
* [devops] fix edit action
* [devops] update readme
2023-06-06 11:32:31 +08:00
Frank Lee
ae959a72a5
[workflow] fixed workflow check for docker build ( #3849 )
2023-05-25 16:42:34 +08:00
Frank Lee
54e97ed7ea
[workflow] supported test on CUDA 10.2 ( #3841 )
2023-05-25 14:14:34 +08:00
Frank Lee
84500b7799
[workflow] fixed testmon cache in build CI ( #3806 )
...
* [workflow] fixed testmon cache in build CI
* polish code
2023-05-24 14:59:40 +08:00
Frank Lee
05b8a8de58
[workflow] changed doc build to be on schedule and release ( #3825 )
...
* [workflow] changed doc build to be on schedule and release
* polish code
2023-05-24 10:50:19 +08:00
digger yu
7f8203af69
fix typo colossalai/auto_parallel autochunk fx/passes etc. ( #3808 )
2023-05-24 09:01:50 +08:00
Frank Lee
1e3b64f26c
[workflow] enabled doc build from a forked repo ( #3815 )
2023-05-23 17:49:53 +08:00
Frank Lee
ad93c736ea
[workflow] enable testing for develop & feature branch ( #3801 )
2023-05-23 11:21:15 +08:00
Frank Lee
788e07dbc5
[workflow] fixed the docker build workflow ( #3794 )
...
* [workflow] fixed the docker build workflow
* polish code
2023-05-22 16:30:32 +08:00
liuzeming
4d29c0f8e0
Fix/docker action ( #3266 )
...
* [docker] Add ARG VERSION to determine the Tag
* [workflow] fixed the version in the release docker workflow
---------
Co-authored-by: liuzeming <liuzeming@4paradigm.com>
2023-05-22 15:04:00 +08:00
Hongxin Liu
b4788d63ed
[devops] fix doc test on pr ( #3782 )
2023-05-19 16:28:57 +08:00
Hongxin Liu
5dd573c6b6
[devops] fix ci for document check ( #3751 )
...
* [doc] add test info
* [devops] update doc check ci
* [devops] add debug info
* [devops] add debug info
* [devops] add debug info
* [devops] add debug info
* [devops] add debug info
* [devops] add debug info
* [devops] add debug info
* [devops] add debug info
* [devops] add debug info
* [devops] add debug info
* [devops] remove debug info and update invalid doc
* [devops] add essential comments
2023-05-17 11:24:22 +08:00
Hongxin Liu
c03bd7c6b2
[devops] make build on PR run automatically ( #3748 )
...
* [devops] make build on PR run automatically
* [devops] update build on pr condition
2023-05-17 11:17:37 +08:00
Hongxin Liu
afb239bbf8
[devops] update torch version of CI ( #3725 )
...
* [test] fix flop tensor test
* [test] fix autochunk test
* [test] fix lazyinit test
* [devops] update torch version of CI
* [devops] enable testmon
* [devops] fix ci
* [devops] fix ci
* [test] fix checkpoint io test
* [test] fix cluster test
* [test] fix timm test
* [devops] fix ci
* [devops] fix ci
* [devops] fix ci
* [devops] fix ci
* [devops] force sync to test ci
* [test] skip fsdp test
2023-05-15 17:20:56 +08:00
Hongxin Liu
50793b35f4
[gemini] accelerate inference ( #3641 )
...
* [gemini] support don't scatter after inference
* [chat] update colossalai strategy
* [chat] fix opt benchmark
* [chat] update opt benchmark
* [gemini] optimize inference
* [test] add gemini inference test
* [chat] fix unit test ci
* [chat] fix ci
* [chat] fix ci
* [chat] skip checkpoint test
2023-04-26 16:32:40 +08:00
Hongxin Liu
179558a87a
[devops] fix chat ci ( #3628 )
2023-04-24 10:55:14 +08:00
digger-yu
633bac2f58
[doc] .github/workflows/README.md ( #3605 )
...
Fixed several word spelling errors
change "compatiblity" to "compatibility" etc.
2023-04-20 10:36:28 +08:00
Camille Zhong
36a519b49f
Update test_ci.sh
...
update
Update test_ci.sh
Update test_ci.sh
Update test_ci.sh
Update test_ci.sh
Update test_ci.sh
Update test_ci.sh
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update test_ci.sh
Update test_ci.sh
update
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
update ci
Update test_ci.sh
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
Update test_ci.sh
Update test_ci.sh
Update run_chatgpt_examples.yml
Update test_ci.sh
Update test_ci.sh
Update test_ci.sh
update test ci
RoBERTa for RLHF Stage 2 & 3 (still in testing)
Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)"
This reverts commit 06741d894d.
Add RoBERTa for RLHF stage 2 & 3
1. add roberta folder under model folder
2. add roberta option in train_reward_model.py
3. add some test in testci
Update test_ci.sh
Revert "Update test_ci.sh"
This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a.
Add RoBERTa for RLHF Stage 2 & 3 (test)
RoBERTa for RLHF Stage 2 & 3 (still in testing)
Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)"
This reverts commit 06741d894d.
Add RoBERTa for RLHF stage 2 & 3
1. add roberta folder under model folder
2. add roberta option in train_reward_model.py
3. add some test in testci
Update test_ci.sh
Revert "Update test_ci.sh"
This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a.
update roberta with coati
chat ci update
Revert "chat ci update"
This reverts commit 17ae7ae01fa752bd3289fc39069868fde99cf846.
[test]chat_update_ci
Update test_ci.sh
Update test_ci.sh
test
Update gpt_critic.py
Update gpt_critic.py
Update run_chatgpt_unit_tests.yml
update test ci
update
update
update
update
Update test_ci.sh
update
Update test_ci.sh
Update test_ci.sh
Update run_chatgpt_examples.yml
Update run_chatgpt_examples.yml
2023-04-18 14:33:12 +08:00
digger-yu
6e7e43c6fe
[doc] Update .github/workflows/README.md ( #3577 )
...
Optimization Code
I think there were two extra $ entered here, which have been deleted
2023-04-17 16:27:38 +08:00
Frank Lee
80eba05b0a
[test] refactor tests with spawn ( #3452 )
...
* [test] added spawn decorator
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
2023-04-06 14:51:35 +08:00
Hakjin Lee
1653063fce
[CI] Fix pre-commit workflow ( #3238 )
2023-03-27 09:41:08 +08:00
Frank Lee
169ed4d24e
[workflow] purged extension cache before GPT test ( #3128 )
2023-03-14 10:11:32 +08:00
Frank Lee
91ccf97514
[workflow] fixed doc build trigger condition ( #3072 )
2023-03-09 17:31:41 +08:00
Frank Lee
8fedc8766a
[workflow] supported conda package installation in doc test ( #3028 )
...
* [workflow] supported conda package installation in doc test
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
2023-03-07 14:21:26 +08:00
Frank Lee
2cd6ba3098
[workflow] fixed the post-commit failure when no formatting needed ( #3020 )
...
* [workflow] fixed the post-commit failure when no formatting needed
* polish code
* polish code
* polish code
2023-03-07 13:35:45 +08:00
Frank Lee
2e427ddf42
[revert] recover "[refactor] restructure configuration files ( #2977 )" ( #3022 )
...
This reverts commit 35c8f4ce47.
2023-03-07 13:31:23 +08:00
Saurav Maheshkar
35c8f4ce47
[refactor] restructure configuration files ( #2977 )
...
* gh: move CONTRIBUTING to .github
* chore: move isort config to pyproject
* chore: move pytest config to pyproject
* chore: move yapf config to pyproject
* chore: move clang-format config to pre-commit
2023-03-05 20:29:34 +08:00
Frank Lee
77b88a3849
[workflow] added auto doc test on PR ( #2929 )
...
* [workflow] added auto doc test on PR
* [workflow] added doc test workflow
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
2023-02-28 11:10:38 +08:00
Frank Lee
e33c043dec
[workflow] moved pre-commit to post-commit ( #2895 )
2023-02-24 14:41:33 +08:00
LuGY
dbd0fd1522
[CI/CD] fix nightly release CD running on forked repo ( #2812 )
...
* [CI/CD] fix nightly release CD running on forked repo
* fix misunderstanding of dispatch
* remove some build condition, enable notify even when release failed
2023-02-18 13:27:13 +08:00
ver217
9c0943ecdb
[chatgpt] optimize generation kwargs ( #2717 )
...
* [chatgpt] ppo trainer use default generate args
* [chatgpt] example remove generation preparing fn
* [chatgpt] benchmark remove generation preparing fn
* [chatgpt] fix ci
2023-02-15 13:59:58 +08:00
Frank Lee
2045d45ab7
[doc] updated documentation version list ( #2715 )
2023-02-15 11:24:18 +08:00
ver217
f6b4ca4e6c
[devops] add chatgpt ci ( #2713 )
2023-02-15 10:53:54 +08:00
Frank Lee
89f8975fb8
[workflow] fixed tensor-nvme build caching ( #2711 )
2023-02-15 10:12:55 +08:00
Frank Lee
5cd8cae0c9
[workflow] fixed community report ranking ( #2680 )
2023-02-13 17:04:49 +08:00
Frank Lee
c44fd0c867
[workflow] added trigger to build doc upon release ( #2678 )
2023-02-13 16:53:26 +08:00
Frank Lee
327bc06278
[workflow] added doc build test ( #2675 )
...
* [workflow] added doc build test
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
2023-02-13 15:55:57 +08:00
Frank Lee
94f87f9651
[workflow] fixed gpu memory check condition ( #2659 )
2023-02-10 09:59:07 +08:00
Frank Lee
85b2303b55
[doc] migrate the markdown files ( #2652 )
2023-02-09 14:21:38 +08:00
Frank Lee
8518263b80
[test] fixed the triton version for testing ( #2608 )
2023-02-07 13:49:38 +08:00
Frank Lee
aa7e9e4794
[workflow] fixed the test coverage report ( #2614 )
...
* [workflow] fixed the test coverage report
* polish code
2023-02-07 11:50:53 +08:00
Frank Lee
b3973b995a
[workflow] fixed test coverage report ( #2611 )
2023-02-07 11:02:56 +08:00
Frank Lee
f566b0ce6b
[workflow] fixed broken release workflows ( #2604 )
2023-02-06 21:40:19 +08:00
Frank Lee
f7458d3ec7
[release] v0.2.1 ( #2602 )
...
* [release] v0.2.1
* polish code
2023-02-06 20:46:18 +08:00
Frank Lee
719c4d5553
[doc] updated readme for CI/CD ( #2600 )
2023-02-06 17:42:15 +08:00
Frank Lee
4d582893a7
[workflow] added cuda extension build test before release ( #2598 )
...
* [workflow] added cuda extension build test before release
* polish code
2023-02-06 17:07:41 +08:00
Frank Lee
0c03802bff
[workflow] hooked pypi release with lark ( #2596 )
2023-02-06 16:29:04 +08:00
Frank Lee
fd90245399
[workflow] hooked docker release with lark ( #2594 )
2023-02-06 16:15:46 +08:00
Frank Lee
d6cc8f313e
[workflow] added test-pypi check before release ( #2591 )
...
* [workflow] added test-pypi check before release
* polish code
2023-02-06 15:42:08 +08:00
Frank Lee
2059408edc
[workflow] fixed the typo in the example check workflow ( #2589 )
2023-02-06 15:03:54 +08:00
Frank Lee
5767f8e394
[workflow] hook compatibility test failure to lark ( #2586 )
2023-02-06 14:56:31 +08:00
Frank Lee
186ddce2c4
[workflow] hook example test alert with lark ( #2585 )
2023-02-06 14:38:35 +08:00
Frank Lee
788e138960
[workflow] added notification if scheduled build fails ( #2574 )
...
* [workflow] added notification if scheduled build fails
* polish code
* polish code
2023-02-06 14:03:13 +08:00
Frank Lee
8af5a0799b
[workflow] added discussion stats to community report ( #2572 )
...
* [workflow] added discussion stats to community report
* polish code
2023-02-06 13:47:59 +08:00
Frank Lee
b0c29d1b4c
[workflow] refactored compatibility test workflow for maintainability ( #2560 )
2023-02-06 13:47:50 +08:00
Frank Lee
76edb04b0d
[workflow] adjust the GPU memory threshold for scheduled unit test ( #2558 )
...
* [workflow] adjust the GPU memory threshold for scheduled unit test
* polish code
2023-02-06 13:47:25 +08:00
Frank Lee
ba47517342
[workflow] fixed example check workflow ( #2554 )
...
* [workflow] fixed example check workflow
* polish yaml
2023-02-06 13:46:52 +08:00
Frank Lee
fb1a4c0d96
[doc] fixed issue link in pr template ( #2577 )
2023-02-06 10:29:24 +08:00
Frank Lee
2eb4268b47
[workflow] fixed typos in the leaderboard workflow ( #2567 )
2023-02-03 17:25:56 +08:00
Frank Lee
7b4ad6e0fc
[workflow] added contributor and user-engagement report ( #2564 )
...
* [workflow] added contributor and user-engagement report
* polish code
* polish code
2023-02-03 17:12:35 +08:00
Frank Lee
578374d0de
[doc] fixed the typo in pr template ( #2556 )
2023-02-03 10:47:00 +08:00
Frank Lee
8438c35a5f
[doc] added pull request template ( #2550 )
...
* [doc] added pull request template
* polish code
* polish code
2023-02-02 18:16:03 +08:00
Frank Lee
b55deb0662
[workflow] only report coverage for changed files ( #2524 )
...
* [workflow] only report coverage for changed files
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
* polish file
2023-01-30 21:28:27 +08:00
Frank Lee
0af793836c
[workflow] fixed changed file detection ( #2515 )
2023-01-26 16:34:19 +08:00
Frank Lee
579dba572f
[workflow] fixed the skip condition of example weekly check workflow ( #2481 )
2023-01-16 10:05:41 +08:00
Frank Lee
32c46e146e
[workflow] automated bdist wheel build ( #2459 )
...
* [workflow] automated bdist wheel build
* polish workflow
* polish readme
* polish readme
2023-01-12 10:57:02 +08:00
Frank Lee
c9ec5190a0
[workflow] automated the compatibility test ( #2453 )
...
* [workflow] automated the compatibility test
* polish code
2023-01-11 23:40:16 +08:00
Frank Lee
483efdabc5
[workflow] fixed the on-merge condition check ( #2452 )
2023-01-11 17:22:11 +08:00
Frank Lee
1b7587d958
[workflow] make test coverage report collapsable ( #2436 )
2023-01-11 13:37:48 +08:00
Frank Lee
a3e5496156
[example] improved the clarity of the example readme ( #2427 )
...
* [example] improved the clarity of the example readme
* polish workflow
* polish workflow
* polish workflow
* polish workflow
* polish workflow
* polish workflow
2023-01-11 10:46:32 +08:00
Frank Lee
21256674e9
[workflow] report test coverage even if below threshold ( #2431 )
2023-01-11 10:44:52 +08:00
Frank Lee
cd38167c1a
[doc] added documentation for CI/CD ( #2420 )
...
* [doc] added documentation for CI/CD
* polish markdown
* polish markdown
* polish markdown
2023-01-10 22:30:32 +08:00
Frank Lee
b3472d32e0
[workflow]auto comment with test coverage report ( #2419 )
...
* [workflow]auto comment with test coverage report
* polish code
* polish yaml
2023-01-10 22:30:16 +08:00
Frank Lee
57b6157b6c
[workflow] auto comment if precommit check fails ( #2417 )
2023-01-10 15:06:27 +08:00
Frank Lee
9d432230ba
[workflow] added translation for non-english comments ( #2414 )
2023-01-10 12:06:01 +08:00
Frank Lee
4befaabace
[workflow] added precommit check for code consistency ( #2401 )
...
* [workflow] added precommit check for code consistency
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
2023-01-10 11:40:04 +08:00
Frank Lee
8327932d2c
[workflow] refactored the example check workflow ( #2411 )
...
* [workflow] refactored the example check workflow
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
2023-01-10 11:26:19 +08:00
Frank Lee
8de8de9fa3
[docker] updated Dockerfile and release workflow ( #2410 )
2023-01-10 09:26:14 +08:00
Frank Lee
53bb8682a2
[workflow] added coverage test ( #2399 )
...
* [workflow] added coverage test
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
* polish code
2023-01-09 17:57:57 +08:00
Frank Lee
d3f5ce9efb
[workflow] added nightly release to pypi ( #2403 )
2023-01-09 16:21:44 +08:00
Frank Lee
2add870138
[workflow] added missing file change detection output ( #2387 )
2023-01-09 09:18:44 +08:00
ziyuhuang123
7080a8edb0
[workflow]New version: Create workflow files for examples' auto check ( #2298 )
...
* [workflows]bug_repair
* [workflow]new_pr_fixing_bugs
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2023-01-06 09:26:49 +08:00
Frank Lee
f1bc2418c4
[setup] make cuda extension build optional ( #2336 )
...
* [setup] make cuda extension build optional
* polish code
* polish code
* polish code
2023-01-05 15:13:11 +08:00
Frank Lee
6e34cc0830
[workflow] fixed pypi release workflow error ( #2328 )
2023-01-05 10:52:43 +08:00
Frank Lee
2916eed34a
[workflow] fixed pypi release workflow error ( #2327 )
2023-01-05 10:48:38 +08:00
Frank Lee
8d8dec09ba
[workflow] added workflow to release to pypi upon version change ( #2320 )
...
* [workflow] added workflow to release to pypi upon version change
* polish code
* polish code
* polish code
2023-01-05 10:40:18 +08:00
Frank Lee
693ef121a1
[workflow] removed unused assign reviewer workflow ( #2318 )
2023-01-05 10:40:07 +08:00
Frank Lee
e8dfa2e2e0
[workflow] rebuild cuda kernels when kernel-related files change ( #2317 )
2023-01-04 17:23:59 +08:00
binmakeswell
bb6245612d
[GitHub] update issue template ( #2023 )
...
* Update bug-report.yml
* Update documentation.yml
* Update bug-report.yml
* Update feature_request.yml
* Update proposal.yml
2022-11-25 09:13:27 +08:00
Frank Lee
254ee2c54f
[workflow] removed unused pypi release workflow ( #2022 )
2022-11-24 17:27:55 +08:00
Frank Lee
7242bffc5f
[workflow] fixed the python and cpu arch mismatch ( #2010 )
2022-11-23 17:24:17 +08:00
Frank Lee
56a3dcdabd
[workflow] fixed the typo in condarc ( #2006 )
2022-11-23 16:05:30 +08:00
Frank Lee
7ad9bd14d8
[workflow] added conda cache and fixed no-compilation bug in release ( #2005 )
2022-11-23 15:52:42 +08:00
ver217
f8a7148dec
[kernel] move all symlinks of kernel to colossalai._C ( #1971 )
2022-11-17 13:42:33 +08:00
yuxuan-lou
cc27adceb0
[NFC] polish .github/workflows/scripts/build_colossalai_wheel.py code style ( #1856 )
2022-11-09 16:54:09 +08:00
Ofey Chan
e5b1a0c9be
[NFC] polish .github/workflows/scripts/generate_release_draft.py code style ( #1855 )
2022-11-09 15:28:33 +08:00
Kai Wang (Victor Kai)
a3b1d07ca4
[NFC] polish workflows code style ( #1854 )
2022-11-09 14:50:09 +08:00
RichardoLuo
81a642fe8d
[NFC] polish <.github/workflows/release_nightly.yml> code style ( #1851 )
...
Co-authored-by: RichardoLuo <14049555596@qq.com>
2022-11-09 14:48:53 +08:00
xyupeng
b0a138aa22
[NFC] polish .github/workflows/build.yml code style ( #1837 )
2022-11-09 12:08:47 +08:00
Maruyama_Aya
90833b45dd
[NFC] polish .github/workflows/release_docker.yml code style
2022-11-09 12:08:47 +08:00
shenggan
b0706fbb00
[NFC] polish .github/workflows/submodule.yml code style ( #1822 )
2022-11-09 12:08:47 +08:00
Arsmart1
fc8d8b1b9c
[NFC] polish .github/workflows/draft_github_release_post.yml code style ( #1820 )
2022-11-09 12:08:47 +08:00
Zangwei Zheng
25993db98a
[NFC] polish .github/workflows/build_gpu_8.yml code style ( #1813 )
2022-11-09 12:08:47 +08:00
Arsmart1
8860d37846
[NFC] polish .github/workflows/scripts/build_colossalai_wheel.py code style ( #1721 )
2022-10-19 12:20:51 +08:00
Frank Lee
9d0560af9c
[workflow] handled the git directory ownership error ( #1741 )
2022-10-19 11:59:11 +08:00
Frank Lee
725666d6a9
[workflow] deactivate conda environment before removing ( #1606 )
2022-09-19 12:05:33 +08:00
Frank Lee
6474e31556
[workflow] added TensorNVMe to compatibility test ( #1449 )
2022-08-12 17:06:29 +08:00
ver217
c415240db6
[nvme] CPUAdam and HybridAdam support NVMe offload ( #1360 )
...
* impl nvme optimizer
* update cpu adam
* add unit test
* update hybrid adam
* update docstr
* add TODOs
* update CI
* fix CI
* fix CI
* fix CI path
* fix CI path
* fix CI path
* fix install tensornvme
* fix CI
* fix CI path
* fix CI env variables
* test CI
* test CI
* fix CI
* fix nvme optim __del__
* fix adam __del__
* fix nvme optim
* fix CI env variables
* fix nvme optim import
* test CI
* test CI
* fix CI
2022-07-26 17:25:24 +08:00
Frank Lee
11d1436a67
[workflow] update docker build workflow to use proxy ( #1334 )
2022-07-18 14:09:41 +08:00
Frank Lee
069d6fdc84
[workflow] update 8-gpu test to use torch 1.11 ( #1332 )
2022-07-18 11:41:13 +08:00
Frank Lee
659a740738
[workflow] roll back to use torch 1.11 for unit testing ( #1325 )
2022-07-15 17:20:17 +08:00
Frank Lee
4d5dbf48a6
[workflow] fixed trigger condition for 8-gpu unit test ( #1323 )
2022-07-15 15:00:02 +08:00
Frank Lee
7c2634f4b3
[workflow] updated release bdist workflow ( #1318 )
...
* [workflow] updated release bdist workflow
* polish workflow
* polish workflow
2022-07-15 09:40:58 +08:00
Frank Lee
efdc240f1f
[workflow] disable SHM for compatibility CI on rtx3080 ( #1315 )
2022-07-14 17:44:43 +08:00
Frank Lee
c9c37dcc4d
[workflow] updated pytorch compatibility test ( #1311 )
2022-07-14 16:45:17 +08:00
lucasliunju
339520c6e0
[NFC] polish build_colossalai_wheel.py code style ( #1306 )
2022-07-14 10:41:01 +08:00
Frank Lee
ca73028a3a
[workflow] auto-publish docker image upon release ( #1164 )
2022-06-23 14:51:59 +08:00
Frank Lee
d415d73286
[workflow] fixed release post workflow ( #1154 )
2022-06-22 11:55:21 +08:00
Frank Lee
c77da0dc81
[workflow] fixed format error in yaml file ( #1145 )
2022-06-22 11:31:24 +08:00
Frank Lee
d1918304bb
[workflow] added workflow to auto draft the release post ( #1144 )
2022-06-21 14:43:25 +08:00
Frank Lee
e61dc31b05
[ci] added scripts to auto-generate release post text ( #1142 )
...
* [ci] added scripts to auto-generate release post text
* polish code
2022-06-21 12:22:53 +08:00
Frank Lee
5a9d8ef4d5
[workflow] fixed 8-gpu test workflow ( #1101 )
2022-06-13 13:50:22 +08:00
Frank Lee
03e52ecba3
[workflow] added regular 8 GPU testing ( #1099 )
...
* [workflow] added regular 8 GPU testing
* polish workflow
2022-06-10 17:38:15 +08:00
Frank Lee
1bd8a72fc9
[workflow] disable p2p via shared memory on non-nvlink machine ( #1086 )
2022-06-09 15:24:35 +08:00
Frank Lee
65ee6dcc20
[test] ignore 8 gpu test ( #1080 )
...
* [test] ignore 8 gpu test
* polish code
* polish workflow
* polish workflow
2022-06-08 23:14:18 +08:00
Frank Lee
cfa6c1b46b
[ci] fixed nightly build workflow ( #1040 )
2022-05-31 10:43:18 +08:00
Frank Lee
ee50497db2
[ci] fixed nightly build workflow ( #1029 )
2022-05-26 11:42:50 +08:00
Frank Lee
58a7dd2ede
[ci] fixed nightly build workflow ( #1022 )
...
* [ci] fixed nightly build workflow
* [ci] fixed nightly build workflow
* [ci] fixed nightly build workflow
2022-05-24 22:38:56 +08:00
Frank Lee
1a76c88aba
[ci] added nightly build ( #1018 ) ( #1019 )
2022-05-24 17:56:01 +08:00
Frank Lee
e17a43184b
[ci] update the docker image name ( #1017 )
2022-05-24 16:53:39 +08:00
Frank Lee
f0f35216f1
[ci] added wheel build scripts ( #910 )
...
* [ci] added wheel build scripts
* polish code and workflow
* polish code and workflow
* polish code and workflow
* polish code and workflow
* polish code and workflow
* polish code and workflow
* polish code and workflow
* polish code and workflow
* polish code and workflow
* polish code and workflow
* polish code and workflow
* [ci] polish wheel build scripts
2022-05-05 16:06:39 +08:00
ver217
16122d5fac
update release bdist CI ( #902 )
2022-04-28 17:52:57 +08:00
ver217
e46e423c00
add CI for releasing bdist wheel ( #901 )
2022-04-28 17:40:53 +08:00
Frank Lee
1258af71cc
[ci] cache cuda extension ( #860 )
2022-04-25 10:03:47 +08:00
ver217
70e8dd418b
[hotfix] update requirements-test ( #701 )
2022-04-08 16:52:36 +08:00
Frank Lee
1ae94ea85a
[ci] remove ipc config for rootless docker ( #694 )
2022-04-08 10:15:52 +08:00
Frank Lee
dbe8e030fb
[ci] added missing field in workflow ( #692 )
2022-04-07 18:07:15 +08:00
Frank Lee
0372ed7951
[ci] update workflow trigger condition and support options ( #691 )
2022-04-07 17:53:03 +08:00
Frank Lee
eace69387d
[ci] fixed compatibility workflow ( #678 )
2022-04-06 16:19:34 +08:00
Frank Lee
cc236916c6
[ci] replace the ngc docker image with self-built pytorch image ( #672 )
2022-04-06 14:10:17 +08:00
binmakeswell
e0f875a8e2
[GitHub] Add prefix and label in issue template ( #652 )
2022-04-02 16:09:25 +08:00
Frank Lee
97933b6710
[devops] recover tsinghua pip source due to proxy issue ( #509 )
2022-03-24 16:11:49 +08:00
Frank Lee
65ad47c35c
[devops] remove tsinghua source for pip ( #507 )
2022-03-24 14:12:02 +08:00
Frank Lee
44f7bcb277
[devops] remove tsinghua source for pip ( #505 )
2022-03-24 14:03:05 +08:00