ColossalAI

mirror of https://github.com/hpcaitech/ColossalAI.git synced 2025-08-12 21:25:53 +00:00

Author	SHA1	Message	Date
Frank Lee	6718a2f285	[workflow] cancel duplicated workflow jobs (#3960 )	2023-06-12 15:11:27 +08:00
digger yu	1aadeedeea	fix typo .github/workflows/scripts/ (#3946 )	2023-06-09 10:30:50 +08:00
Frank Lee	5e2132dcff	[workflow] added docker latest tag for release (#3920 )	2023-06-07 15:37:37 +08:00
Hongxin Liu	c25d421f3e	[devops] hotfix testmon cache clean logic (#3917 )	2023-06-07 12:39:12 +08:00
Hongxin Liu	b5f0566363	[chat] add distributed PPO trainer (#3740 ) * Detached ppo (#9) * run the base * working on dist ppo * sync * detached trainer * update detached trainer. no maker update function * facing init problem * 1 maker 1 trainer detached run. but no model update * facing cuda problem * fix save functions * verified maker update * nothing * add ignore * analyize loss issue * remove some debug codes * facing 2m1t stuck issue * 2m1t verified * do not use torchrun * working on 2m2t * working on 2m2t * initialize strategy in ray actor env * facing actor's init order issue * facing ddp model update issue (need unwarp ddp) * unwrap ddp actor * checking 1m2t stuck problem * nothing * set timeout for trainer choosing. It solves the stuck problem! * delete some debug output * rename to sync with upstream * rename to sync with upstream * coati rename * nothing * I am going to detach the replaybuffer from trainer and make it a Ray Actor. Two benefits: 1. support TP trainer. 2. asynchronized buffer operations * experience_maker_holder performs target-revolving _send_experience() instead of length comparison. * move code to ray subfolder * working on pipeline inference * apply comments * working on pipeline strategy. in progress. * remove pipeline code. clean this branch * update remote parameters by state_dict. 
no test * nothing * state_dict sharding transfer * merge debug branch * gemini _unwrap_model fix * simplify code * simplify code & fix LoRALinear AttributeError * critic unwrapped state_dict --------- Co-authored-by: csric <richcsr256@gmail.com> * [chat] add perfomance evaluator and fix bugs (#10) * [chat] add performance evaluator for ray * [chat] refactor debug arg * [chat] support hf config * [chat] fix generation * [chat] add 1mmt dummy example * [chat] fix gemini ckpt * split experience to send (#11) Co-authored-by: csric <richcsr256@gmail.com> * [chat] refactor trainer and maker (#12) * [chat] refactor experience maker holder * [chat] refactor model init * [chat] refactor trainer args * [chat] refactor model init * [chat] refactor trainer * [chat] refactor experience sending logic and training loop args (#13) * [chat] refactor experience send logic * [chat] refactor trainer * [chat] refactor trainer * [chat] refactor experience maker * [chat] refactor pbar * [chat] refactor example folder (#14) * [chat] support quant (#15) * [chat] add quant * [chat] add quant example * prompt example (#16) * prompt example * prompt load csv data * remove legacy try --------- Co-authored-by: csric <richcsr256@gmail.com> * [chat] add mmmt dummy example and refactor experience sending (#17) * [chat] add mmmt dummy example * [chat] refactor naive strategy * [chat] fix struck problem * [chat] fix naive strategy * [chat] optimize experience maker sending logic * [chat] refactor sending assignment * [chat] refactor performance evaluator (#18) * Prompt Example & requires_grad state_dict & sharding state_dict (#19) * prompt example * prompt load csv data * remove legacy try * maker models require_grad set to False * working on zero redundancy update * mmmt_prompt example; naive strategy requires_grad state_dict & sharding; maker model requires_no_grad. * remove legacy examples * remove legacy examples * remove replay buffer tp state. 
bad design --------- Co-authored-by: csric <richcsr256@gmail.com> * state_dict sending adapts to new unwrap function (#20) * prompt example * prompt load csv data * remove legacy try * maker models require_grad set to False * working on zero redundancy update * mmmt_prompt example; naive strategy requires_grad state_dict & sharding; maker model requires_no_grad. * remove legacy examples * remove legacy examples * remove replay buffer tp state. bad design * opt benchmark * better script * nothing * [chat] strategy refactor unwrap model * [chat] strategy refactor save model * [chat] add docstr * [chat] refactor trainer save model * [chat] fix strategy typing * [chat] refactor trainer save model * [chat] update readme * [chat] fix unit test * working on lora reconstruction * state_dict sending adapts to new unwrap function * remove comments --------- Co-authored-by: csric <richcsr256@gmail.com> Co-authored-by: ver217 <lhx0217@gmail.com> * [chat-ray] add readme (#21) * add readme * transparent graph * add note background --------- Co-authored-by: csric <richcsr256@gmail.com> * [chat] get images from url (#22) * Refactor/chat ray (#23) * [chat] lora add todo * [chat] remove unused pipeline strategy * [chat] refactor example structure * [chat] setup ci for ray * [chat-ray] Support LoRA trainer. LoRA weights reconstruction. (#24) * lora support prototype * lora support * 1mmt lora & remove useless code --------- Co-authored-by: csric <richcsr256@gmail.com> * [chat] fix test ci for ray * [chat] fix test ci requirements for ray * [chat] fix ray runtime env * [chat] fix ray runtime env * [chat] fix example ci docker args * [chat] add debug info in trainer * [chat] add nccl debug info * [chat] skip ray test * [doc] fix typo --------- Co-authored-by: csric <59389055+CsRic@users.noreply.github.com> Co-authored-by: csric <richcsr256@gmail.com>	2023-06-07 10:41:16 +08:00
Hongxin Liu	41fb7236aa	[devops] hotfix CI about testmon cache (#3910 ) * [devops] hotfix CI about testmon cache * [devops] fix testmon cahe on pr	2023-06-06 18:58:58 +08:00
Hongxin Liu	ec9bbc0094	[devops] improving testmon cache (#3902 ) * [devops] improving testmon cache * [devops] fix branch name with slash * [devops] fix branch name with slash * [devops] fix edit action * [devops] fix edit action * [devops] fix edit action * [devops] fix edit action * [devops] fix edit action * [devops] fix edit action * [devops] update readme	2023-06-06 11:32:31 +08:00
Frank Lee	ae959a72a5	[workflow] fixed workflow check for docker build (#3849 )	2023-05-25 16:42:34 +08:00
Frank Lee	54e97ed7ea	[workflow] supported test on CUDA 10.2 (#3841 )	2023-05-25 14:14:34 +08:00
Frank Lee	84500b7799	[workflow] fixed testmon cache in build CI (#3806 ) * [workflow] fixed testmon cache in build CI * polish code	2023-05-24 14:59:40 +08:00
Frank Lee	05b8a8de58	[workflow] changed to doc build to be on schedule and release (#3825 ) * [workflow] changed to doc build to be on schedule and release * polish code	2023-05-24 10:50:19 +08:00
digger yu	7f8203af69	fix typo colossalai/auto_parallel autochunk fx/passes etc. (#3808 )	2023-05-24 09:01:50 +08:00
Frank Lee	1e3b64f26c	[workflow] enblaed doc build from a forked repo (#3815 )	2023-05-23 17:49:53 +08:00
Frank Lee	ad93c736ea	[workflow] enable testing for develop & feature branch (#3801 )	2023-05-23 11:21:15 +08:00
Frank Lee	788e07dbc5	[workflow] fixed the docker build workflow (#3794 ) * [workflow] fixed the docker build workflow * polish code	2023-05-22 16:30:32 +08:00
liuzeming	4d29c0f8e0	Fix/docker action (#3266 ) * [docker] Add ARG VERSION to determine the Tag * [workflow] fixed the version in the release docker workflow --------- Co-authored-by: liuzeming <liuzeming@4paradigm.com>	2023-05-22 15:04:00 +08:00
Hongxin Liu	b4788d63ed	[devops] fix doc test on pr (#3782 )	2023-05-19 16:28:57 +08:00
Hongxin Liu	5dd573c6b6	[devops] fix ci for document check (#3751 ) * [doc] add test info * [devops] update doc check ci * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] add debug info * [devops] remove debug info and update invalid doc * [devops] add essential comments	2023-05-17 11:24:22 +08:00
Hongxin Liu	c03bd7c6b2	[devops] make build on PR run automatically (#3748 ) * [devops] make build on PR run automatically * [devops] update build on pr condition	2023-05-17 11:17:37 +08:00
Hongxin Liu	afb239bbf8	[devops] update torch version of CI (#3725 ) * [test] fix flop tensor test * [test] fix autochunk test * [test] fix lazyinit test * [devops] update torch version of CI * [devops] enable testmon * [devops] fix ci * [devops] fix ci * [test] fix checkpoint io test * [test] fix cluster test * [test] fix timm test * [devops] fix ci * [devops] fix ci * [devops] fix ci * [devops] fix ci * [devops] force sync to test ci * [test] skip fsdp test	2023-05-15 17:20:56 +08:00
Hongxin Liu	50793b35f4	[gemini] accelerate inference (#3641 ) * [gemini] support don't scatter after inference * [chat] update colossalai strategy * [chat] fix opt benchmark * [chat] update opt benchmark * [gemini] optimize inference * [test] add gemini inference test * [chat] fix unit test ci * [chat] fix ci * [chat] fix ci * [chat] skip checkpoint test	2023-04-26 16:32:40 +08:00
Hongxin Liu	179558a87a	[devops] fix chat ci (#3628 )	2023-04-24 10:55:14 +08:00
digger-yu	633bac2f58	[doc] .github/workflows/README.md (#3605 ) Fixed several word spelling errors change "compatiblity" to "compatibility" etc.	2023-04-20 10:36:28 +08:00
Camille Zhong	36a519b49f	Update test_ci.sh update Update test_ci.sh Update test_ci.sh Update test_ci.sh Update test_ci.sh Update test_ci.sh Update test_ci.sh Update run_chatgpt_examples.yml Update run_chatgpt_examples.yml Update run_chatgpt_examples.yml Update run_chatgpt_examples.yml Update run_chatgpt_examples.yml Update run_chatgpt_examples.yml Update test_ci.sh Update test_ci.sh update Update run_chatgpt_examples.yml Update run_chatgpt_examples.yml update ci Update test_ci.sh Update run_chatgpt_examples.yml Update run_chatgpt_examples.yml Update run_chatgpt_examples.yml Update run_chatgpt_examples.yml Update run_chatgpt_examples.yml Update run_chatgpt_examples.yml Update run_chatgpt_examples.yml Update test_ci.sh Update test_ci.sh Update run_chatgpt_examples.yml Update test_ci.sh Update test_ci.sh Update test_ci.sh update test ci RoBERTa for RLHF Stage 2 & 3 (still in testing) Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)" This reverts commit `06741d894d`. Add RoBERTa for RLHF stage 2 & 3 1. add roberta folder under model folder 2. add roberta option in train_reward_model.py 3. add some test in testci Update test_ci.sh Revert "Update test_ci.sh" This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a. Add RoBERTa for RLHF Stage 2 & 3 (test) RoBERTa for RLHF Stage 2 & 3 (still in testing) Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)" This reverts commit `06741d894d`. Add RoBERTa for RLHF stage 2 & 3 1. add roberta folder under model folder 2. add roberta option in train_reward_model.py 3. add some test in testci Update test_ci.sh Revert "Update test_ci.sh" This reverts commit 9c7352b81766f3177d31eeec0ec178a301df966a. update roberta with coati chat ci update Revert "chat ci update" This reverts commit 17ae7ae01fa752bd3289fc39069868fde99cf846. 
[test]chat_update_ci Update test_ci.sh Update test_ci.sh test Update gpt_critic.py Update gpt_critic.py Update run_chatgpt_unit_tests.yml update test ci update update update update Update test_ci.sh update Update test_ci.sh Update test_ci.sh Update run_chatgpt_examples.yml Update run_chatgpt_examples.yml	2023-04-18 14:33:12 +08:00
digger-yu	6e7e43c6fe	[doc] Update .github/workflows/README.md (#3577 ) Optimization Code I think there were two extra $ entered here, which have been deleted	2023-04-17 16:27:38 +08:00
Frank Lee	80eba05b0a	[test] refactor tests with spawn (#3452 ) * [test] added spawn decorator * polish code * polish code * polish code * polish code * polish code * polish code	2023-04-06 14:51:35 +08:00
Hakjin Lee	1653063fce	[CI] Fix pre-commit workflow (#3238 )	2023-03-27 09:41:08 +08:00
Frank Lee	169ed4d24e	[workflow] purged extension cache before GPT test (#3128 )	2023-03-14 10:11:32 +08:00
Frank Lee	91ccf97514	[workflow] fixed doc build trigger condition (#3072 )	2023-03-09 17:31:41 +08:00
Frank Lee	8fedc8766a	[workflow] supported conda package installation in doc test (#3028 ) * [workflow] supported conda package installation in doc test * polish code * polish code * polish code * polish code * polish code * polish code	2023-03-07 14:21:26 +08:00
Frank Lee	2cd6ba3098	[workflow] fixed the post-commit failure when no formatting needed (#3020 ) * [workflow] fixed the post-commit failure when no formatting needed * polish code * polish code * polish code	2023-03-07 13:35:45 +08:00
Frank Lee	2e427ddf42	[revert] recover "[refactor] restructure configuration files (#2977 )" (#3022 ) This reverts commit `35c8f4ce47`.	2023-03-07 13:31:23 +08:00
Saurav Maheshkar	35c8f4ce47	[refactor] restructure configuration files (#2977 ) * gh: move CONTRIBUTING to .github * chore: move isort config to pyproject * chore: move pytest config to pyproject * chore: move yapf config to pyproject * chore: move clang-format config to pre-commit	2023-03-05 20:29:34 +08:00
Frank Lee	77b88a3849	[workflow] added auto doc test on PR (#2929 ) * [workflow] added auto doc test on PR * [workflow] added doc test workflow * polish code * polish code * polish code * polish code * polish code * polish code * polish code	2023-02-28 11:10:38 +08:00
Frank Lee	e33c043dec	[workflow] moved pre-commit to post-commit (#2895 )	2023-02-24 14:41:33 +08:00
LuGY	dbd0fd1522	[CI/CD] fix nightly release CD running on forked repo (#2812 ) * [CI/CD] fix nightly release CD running on forker repo * fix misunderstanding of dispatch * remove some build condition, enable notify even when release failed	2023-02-18 13:27:13 +08:00
ver217	9c0943ecdb	[chatgpt] optimize generation kwargs (#2717 ) * [chatgpt] ppo trainer use default generate args * [chatgpt] example remove generation preparing fn * [chatgpt] benchmark remove generation preparing fn * [chatgpt] fix ci	2023-02-15 13:59:58 +08:00
Frank Lee	2045d45ab7	[doc] updated documentation version list (#2715 )	2023-02-15 11:24:18 +08:00
ver217	f6b4ca4e6c	[devops] add chatgpt ci (#2713 )	2023-02-15 10:53:54 +08:00
Frank Lee	89f8975fb8	[workflow] fixed tensor-nvme build caching (#2711 )	2023-02-15 10:12:55 +08:00
Frank Lee	5cd8cae0c9	[workflow] fixed communtity report ranking (#2680 )	2023-02-13 17:04:49 +08:00
Frank Lee	c44fd0c867	[workflow] added trigger to build doc upon release (#2678 )	2023-02-13 16:53:26 +08:00
Frank Lee	327bc06278	[workflow] added doc build test (#2675 ) * [workflow] added doc build test * polish code * polish code * polish code * polish code * polish code * polish code * polish code * polish code * polish code	2023-02-13 15:55:57 +08:00
Frank Lee	94f87f9651	[workflow] fixed gpu memory check condition (#2659 )	2023-02-10 09:59:07 +08:00
Frank Lee	85b2303b55	[doc] migrate the markdown files (#2652 )	2023-02-09 14:21:38 +08:00
Frank Lee	8518263b80	[test] fixed the triton version for testing (#2608 )	2023-02-07 13:49:38 +08:00
Frank Lee	aa7e9e4794	[workflow] fixed the test coverage report (#2614 ) * [workflow] fixed the test coverage report * polish code	2023-02-07 11:50:53 +08:00
Frank Lee	b3973b995a	[workflow] fixed test coverage report (#2611 )	2023-02-07 11:02:56 +08:00
Frank Lee	f566b0ce6b	[workflow] fixed broken rellease workflows (#2604 )	2023-02-06 21:40:19 +08:00
Frank Lee	f7458d3ec7	[release] v0.2.1 (#2602 ) * [release] v0.2.1 * polish code	2023-02-06 20:46:18 +08:00
Frank Lee	719c4d5553	[doc] updated readme for CI/CD (#2600 )	2023-02-06 17:42:15 +08:00
Frank Lee	4d582893a7	[workflow] added cuda extension build test before release (#2598 ) * [workflow] added cuda extension build test before release * polish code	2023-02-06 17:07:41 +08:00
Frank Lee	0c03802bff	[workflow] hooked pypi release with lark (#2596 )	2023-02-06 16:29:04 +08:00
Frank Lee	fd90245399	[workflow] hooked docker release with lark (#2594 )	2023-02-06 16:15:46 +08:00
Frank Lee	d6cc8f313e	[workflow] added test-pypi check before release (#2591 ) * [workflow] added test-pypi check before release * polish code	2023-02-06 15:42:08 +08:00
Frank Lee	2059408edc	[workflow] fixed the typo in the example check workflow (#2589 )	2023-02-06 15:03:54 +08:00
Frank Lee	5767f8e394	[workflow] hook compatibility test failure to lark (#2586 )	2023-02-06 14:56:31 +08:00
Frank Lee	186ddce2c4	[workflow] hook example test alert with lark (#2585 )	2023-02-06 14:38:35 +08:00
Frank Lee	788e138960	[workflow] added notification if scheduled build fails (#2574 ) * [workflow] added notification if scheduled build fails * polish code * polish code	2023-02-06 14:03:13 +08:00
Frank Lee	8af5a0799b	[workflow] added discussion stats to community report (#2572 ) * [workflow] added discussion stats to community report * polish code	2023-02-06 13:47:59 +08:00
Frank Lee	b0c29d1b4c	[workflow] refactored compatibility test workflow for maintenability (#2560 )	2023-02-06 13:47:50 +08:00
Frank Lee	76edb04b0d	[workflow] adjust the GPU memory threshold for scheduled unit test (#2558 ) * [workflow] adjust the GPU memory threshold for scheduled unit test * polish code	2023-02-06 13:47:25 +08:00
Frank Lee	ba47517342	[workflow] fixed example check workflow (#2554 ) * [workflow] fixed example check workflow * polish yaml	2023-02-06 13:46:52 +08:00
Frank Lee	fb1a4c0d96	[doc] fixed issue link in pr template (#2577 )	2023-02-06 10:29:24 +08:00
Frank Lee	2eb4268b47	[workflow] fixed typos in the leaderboard workflow (#2567 )	2023-02-03 17:25:56 +08:00
Frank Lee	7b4ad6e0fc	[workflow] added contributor and user-engagement report (#2564 ) * [workflow] added contributor and user-engagement report * polish code * polish code	2023-02-03 17:12:35 +08:00
Frank Lee	578374d0de	[doc] fixed the typo in pr template (#2556 )	2023-02-03 10:47:00 +08:00
Frank Lee	8438c35a5f	[doc] added pull request template (#2550 ) * [doc] added pull request template * polish code * polish code	2023-02-02 18:16:03 +08:00
Frank Lee	b55deb0662	[workflow] only report coverage for changed files (#2524 ) * [workflow] only report coverage for changed files * polish file * polish file * polish file * polish file * polish file * polish file * polish file * polish file * polish file * polish file * polish file * polish file * polish file * polish file * polish file * polish file * polish file * polish file * polish file * polish file * polish file * polish file	2023-01-30 21:28:27 +08:00
Frank Lee	0af793836c	[workflow] fixed changed file detection (#2515 )	2023-01-26 16:34:19 +08:00
Frank Lee	579dba572f	[workflow] fixed the skip condition of example weekly check workflow (#2481 )	2023-01-16 10:05:41 +08:00
Frank Lee	32c46e146e	[workflow] automated bdist wheel build (#2459 ) * [workflow] automated bdist wheel build * polish workflow * polish readme * polish readme	2023-01-12 10:57:02 +08:00
Frank Lee	c9ec5190a0	[workflow] automated the compatiblity test (#2453 ) * [workflow] automated the compatiblity test * polish code	2023-01-11 23:40:16 +08:00
Frank Lee	483efdabc5	[workflow] fixed the on-merge condition check (#2452 )	2023-01-11 17:22:11 +08:00
Frank Lee	1b7587d958	[workflow] make test coverage report collapsable (#2436 )	2023-01-11 13:37:48 +08:00
Frank Lee	a3e5496156	[example] improved the clarity yof the example readme (#2427 ) * [example] improved the clarity yof the example readme * polish workflow * polish workflow * polish workflow * polish workflow * polish workflow * polish workflow	2023-01-11 10:46:32 +08:00
Frank Lee	21256674e9	[workflow] report test coverage even if below threshold (#2431 )	2023-01-11 10:44:52 +08:00
Frank Lee	cd38167c1a	[doc] added documentation for CI/CD (#2420 ) * [doc] added documentation for CI/CD * polish markdown * polish markdown * polish markdown	2023-01-10 22:30:32 +08:00
Frank Lee	b3472d32e0	[workflow]auto comment with test coverage report (#2419 ) * [workflow]auto comment with test coverage report * polish code * polish yaml	2023-01-10 22:30:16 +08:00
Frank Lee	57b6157b6c	[workflow] auto comment if precommit check fails (#2417 )	2023-01-10 15:06:27 +08:00
Frank Lee	9d432230ba	[workflow] added translation for non-english comments (#2414 )	2023-01-10 12:06:01 +08:00
Frank Lee	4befaabace	[workflow] added precommit check for code consistency (#2401 ) * [workflow] added precommit check for code consistency * polish code * polish code * polish code * polish code * polish code * polish code * polish code	2023-01-10 11:40:04 +08:00
Frank Lee	8327932d2c	[workflow] refactored the example check workflow (#2411 ) * [workflow] refactored the example check workflow * polish code * polish code * polish code * polish code * polish code * polish code * polish code * polish code * polish code * polish code * polish code	2023-01-10 11:26:19 +08:00
Frank Lee	8de8de9fa3	[docker] updated Dockerfile and release workflow (#2410 )	2023-01-10 09:26:14 +08:00
Frank Lee	53bb8682a2	[worfklow] added coverage test (#2399 ) * [worfklow] added coverage test * polish code * polish code * polish code * polish code * polish code * polish code * polish code * polish code	2023-01-09 17:57:57 +08:00
Frank Lee	d3f5ce9efb	[workflow] added nightly release to pypi (#2403 )	2023-01-09 16:21:44 +08:00
Frank Lee	2add870138	[workflow] added missing file change detection output (#2387 )	2023-01-09 09:18:44 +08:00
ziyuhuang123	7080a8edb0	[workflow]New version: Create workflow files for examples' auto check (#2298 ) * [workflows]bug_repair * [workflow]new_pr_fixing_bugs Co-authored-by: binmakeswell <binmakeswell@gmail.com>	2023-01-06 09:26:49 +08:00
Frank Lee	f1bc2418c4	[setup] make cuda extension build optional (#2336 ) * [setup] make cuda extension build optional * polish code * polish code * polish code	2023-01-05 15:13:11 +08:00
Frank Lee	6e34cc0830	[workflow] fixed pypi release workflow error (#2328 )	2023-01-05 10:52:43 +08:00
Frank Lee	2916eed34a	[workflow] fixed pypi release workflow error (#2327 )	2023-01-05 10:48:38 +08:00
Frank Lee	8d8dec09ba	[workflow] added workflow to release to pypi upon version change (#2320 ) * [workflow] added workflow to release to pypi upon version change * polish code * polish code * polish code	2023-01-05 10:40:18 +08:00
Frank Lee	693ef121a1	[workflow] removed unused assign reviewer workflow (#2318 )	2023-01-05 10:40:07 +08:00
Frank Lee	e8dfa2e2e0	[workflow] rebuild cuda kernels when kernel-related files change (#2317 )	2023-01-04 17:23:59 +08:00
binmakeswell	bb6245612d	[GitHub] update issue template (#2023 ) * Update bug-report.yml * Update documentation.yml * Update bug-report.yml * Update feature_request.yml * Update proposal.yml	2022-11-25 09:13:27 +08:00
Frank Lee	254ee2c54f	[workflow] removed unused pypi release workflow (#2022 )	2022-11-24 17:27:55 +08:00
Frank Lee	7242bffc5f	[workflow] fixed the python and cpu arch mismatch (#2010 )	2022-11-23 17:24:17 +08:00
Frank Lee	56a3dcdabd	[workflow] fixed the typo in condarc (#2006 )	2022-11-23 16:05:30 +08:00
Frank Lee	7ad9bd14d8	[workflow] added conda cache and fixed no-compilation bug in release (#2005 )	2022-11-23 15:52:42 +08:00
ver217	f8a7148dec	[kernel] move all symlinks of kernel to `colossalai._C` (#1971 )	2022-11-17 13:42:33 +08:00
yuxuan-lou	cc27adceb0	[NFC] polish .github/workflows/scripts/build_colossalai_wheel.py code style (#1856 )	2022-11-09 16:54:09 +08:00
Ofey Chan	e5b1a0c9be	[NFC] polish .github/workflows/scripts/generate_release_draft.py code style (#1855 )	2022-11-09 15:28:33 +08:00
Kai Wang (Victor Kai)	a3b1d07ca4	[NFC] polish workflows code style (#1854 )	2022-11-09 14:50:09 +08:00
RichardoLuo	81a642fe8d	[NFC] polish <.github/workflows/release_nightly.yml> code style (#1851 ) Co-authored-by: RichardoLuo <14049555596@qq.com>	2022-11-09 14:48:53 +08:00
xyupeng	b0a138aa22	[NFC] polish .github/workflows/build.yml code style (#1837 )	2022-11-09 12:08:47 +08:00
Maruyama_Aya	90833b45dd	[NFC] polish .github/workflows/release_docker.yml code style	2022-11-09 12:08:47 +08:00
shenggan	b0706fbb00	[NFC] polish .github/workflows/submodule.yml code style (#1822 )	2022-11-09 12:08:47 +08:00
Arsmart1	fc8d8b1b9c	[NFC] polish .github/workflows/draft_github_release_post.yml code style (#1820 )	2022-11-09 12:08:47 +08:00
Zangwei Zheng	25993db98a	[NFC] polish .github/workflows/build_gpu_8.yml code style (#1813 )	2022-11-09 12:08:47 +08:00
Arsmart1	8860d37846	[NFC] polish .github/workflows/scripts/build_colossalai_wheel.py code style (#1721 )	2022-10-19 12:20:51 +08:00
Frank Lee	9d0560af9c	[workflow] handled the git directory ownership error (#1741 )	2022-10-19 11:59:11 +08:00
Frank Lee	725666d6a9	[workflow] deactivate conda environment before removing (#1606 )	2022-09-19 12:05:33 +08:00
Frank Lee	6474e31556	[workflow] added TensorNVMe to compatibility test (#1449 )	2022-08-12 17:06:29 +08:00
ver217	c415240db6	[nvme] CPUAdam and HybridAdam support NVMe offload (#1360 ) * impl nvme optimizer * update cpu adam * add unit test * update hybrid adam * update docstr * add TODOs * update CI * fix CI * fix CI * fix CI path * fix CI path * fix CI path * fix install tensornvme * fix CI * fix CI path * fix CI env variables * test CI * test CI * fix CI * fix nvme optim __del__ * fix adam __del__ * fix nvme optim * fix CI env variables * fix nvme optim import * test CI * test CI * fix CI	2022-07-26 17:25:24 +08:00
Frank Lee	11d1436a67	[workflow] update docker build workflow to use proxy (#1334 )	2022-07-18 14:09:41 +08:00
Frank Lee	069d6fdc84	[workflow] update 8-gpu test to use torch 1.11 (#1332 )	2022-07-18 11:41:13 +08:00
Frank Lee	659a740738	[workflow] roll back to use torch 1.11 for unit testing (#1325 )	2022-07-15 17:20:17 +08:00
Frank Lee	4d5dbf48a6	[workflow] fixed trigger condition for 8-gpu unit test (#1323 )	2022-07-15 15:00:02 +08:00
Frank Lee	7c2634f4b3	[workflow] updated release bdist workflow (#1318 ) * [workflow] updated release bdist workflow * polish workflow * polish workflow	2022-07-15 09:40:58 +08:00
Frank Lee	efdc240f1f	[workflow] disable SHM for compatibility CI on rtx3080 (#1315 )	2022-07-14 17:44:43 +08:00
Frank Lee	c9c37dcc4d	[workflow] updated pytorch compatibility test (#1311 )	2022-07-14 16:45:17 +08:00
lucasliunju	339520c6e0	[NFC] polish build_colossalai_wheel.py code style (#1306 )	2022-07-14 10:41:01 +08:00
Frank Lee	ca73028a3a	[workflow] auto-publish docker image upon release (#1164 )	2022-06-23 14:51:59 +08:00
Frank Lee	d415d73286	[workflow] fixed release post workflow (#1154 )	2022-06-22 11:55:21 +08:00
Frank Lee	c77da0dc81	[workflow] fixed format error in yaml file (#1145 )	2022-06-22 11:31:24 +08:00
Frank Lee	d1918304bb	[workflow] added workflow to auto draft the release post (#1144 )	2022-06-21 14:43:25 +08:00
Frank Lee	e61dc31b05	[ci] added scripts to auto-generate release post text (#1142 ) * [ci] added scripts to auto-generate release post text * polish code	2022-06-21 12:22:53 +08:00
Frank Lee	5a9d8ef4d5	[workflow] fixed 8-gpu test workflow (#1101 )	2022-06-13 13:50:22 +08:00
Frank Lee	03e52ecba3	[workflow] added regular 8 GPU testing (#1099 ) * [workflow] added regular 8 GPU testing * polish workflow	2022-06-10 17:38:15 +08:00
Frank Lee	1bd8a72fc9	[workflow] disable p2p via shared memory on non-nvlink machine (#1086 )	2022-06-09 15:24:35 +08:00
Frank Lee	65ee6dcc20	[test] ignore 8 gpu test (#1080 ) * [test] ignore 8 gpu test * polish code * polish workflow * polish workflow	2022-06-08 23:14:18 +08:00
Frank Lee	cfa6c1b46b	[ci] fixed nightly build workflow (#1040 )	2022-05-31 10:43:18 +08:00
Frank Lee	ee50497db2	[ci] fixed nightly build workflow (#1029 )	2022-05-26 11:42:50 +08:00
Frank Lee	58a7dd2ede	[ci] fixed nightly build workflow (#1022 ) * [ci] fixed nightly build workflow * [ci] fixed nightly build workflow * [ci] fixed nightly build workflow	2022-05-24 22:38:56 +08:00
Frank Lee	1a76c88aba	[ci] added nightly build (#1018 ) (#1019 )	2022-05-24 17:56:01 +08:00
Frank Lee	e17a43184b	[ci] update the docker image name (#1017 )	2022-05-24 16:53:39 +08:00
Frank Lee	f0f35216f1	[ci] added wheel build scripts (#910 ) * [ci] added wheel build scripts * polish code and workflow * polish code and workflow * polish code and workflow * polish code and workflow * polish code and workflow * polish code and workflow * polish code and workflow * polish code and workflow * polish code and workflow * polish code and workflow * polish code and workflow * [ci] polish wheel build scripts	2022-05-05 16:06:39 +08:00
ver217	16122d5fac	update release bdist CI (#902 )	2022-04-28 17:52:57 +08:00
ver217	e46e423c00	add CI for releasing bdist wheel (#901 )	2022-04-28 17:40:53 +08:00
Frank Lee	1258af71cc	[ci] cache cuda extension (#860 )	2022-04-25 10:03:47 +08:00
ver217	70e8dd418b	[hotfix] update requirements-test (#701 )	2022-04-08 16:52:36 +08:00
Frank Lee	1ae94ea85a	[ci] remove ipc config for rootless docker (#694 )	2022-04-08 10:15:52 +08:00
Frank Lee	dbe8e030fb	[ci] added missing field in workflow (#692 )	2022-04-07 18:07:15 +08:00
Frank Lee	0372ed7951	[ci] update workflow trigger condition and support options (#691 )	2022-04-07 17:53:03 +08:00
Frank Lee	eace69387d	[ci] fixed compatibility workflow (#678 )	2022-04-06 16:19:34 +08:00
Frank Lee	cc236916c6	[ci] replace the dngc ocker image with self-built pytorch image (#672 )	2022-04-06 14:10:17 +08:00
binmakeswell	e0f875a8e2	[GitHub] Add prefix and label in issue template (#652 )	2022-04-02 16:09:25 +08:00
Frank Lee	97933b6710	[devops] recover tsinghua pip source due to proxy issue (#509 )	2022-03-24 16:11:49 +08:00
Frank Lee	65ad47c35c	[devops] remove tsinghua source for pip (#507 )	2022-03-24 14:12:02 +08:00
Frank Lee	44f7bcb277	[devops] remove tsinghua source for pip (#505 )	2022-03-24 14:03:05 +08:00

280 Commits