20 Commits

Author SHA1 Message Date
BurkeHulk
c9cba49ab5 fix CI machine tag 2025-06-02 17:45:40 +08:00
flybird11111
cac878d7b7 fix 2025-05-29 11:10:37 +08:00
flybird11111
45dd5a7cf4 release 2025-05-29 10:47:23 +08:00
flybird11111
4afff92138 fix 2025-05-28 11:13:44 +08:00
Wenxuan Tan
d383449fc4 [CI] Remove triton version for compatibility bug; update req torch >=2.2 (#6018)
* remove triton version

* remove torch 2.2

* remove torch 2.1

* debug

* remove 2.1 build tests

* require torch >=2.2

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-08-27 10:12:21 +08:00
flybird11111
4b9bec8176 [test ci]Feature/fp8 comm (#5981)
* fix

* fix

* fix
2024-08-08 17:19:21 +08:00
Yuanheng Zhao
5bbab1533a [ci] Fix example tests (#5714)
* [fix] revise timeout value on example CI

* trivial
2024-05-14 16:08:51 +08:00
Hongxin Liu
a7790a92e8 [devops] fix example test ci (#5504) 2024-03-26 15:09:05 +08:00
Hongxin Liu
070df689e6 [devops] fix extention building (#5427) 2024-03-05 15:35:54 +08:00
Frank Lee
73f4dc578e [workflow] updated CI image (#5318) 2024-01-29 11:53:07 +08:00
Hongxin Liu
7f3400b560 [devops] update torch versoin in ci (#5217) 2024-01-03 11:46:33 +08:00
Wenhao Chen
7172459e74 [shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088)
* [shardformer] implement policy for all GPT-J models and test

* [shardformer] support interleaved pipeline parallel for bert finetune

* [shardformer] shardformer support falcon (#4883)

* [shardformer]: fix interleaved pipeline for bert model (#5048)

* [hotfix]: disable seq parallel for gptj and falcon, and polish code (#5093)

* Add Mistral support for Shardformer (#5103)

* [shardformer] add tests to mistral (#5105)

---------

Co-authored-by: Pengtai Xu <henryxu880@gmail.com>
Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: eric8607242 <e0928021388@gmail.com>
2023-11-28 16:54:42 +08:00
Hongxin Liu
b5f9e37c70 [legacy] clean up legacy code (#4743)
* [legacy] remove outdated codes of pipeline (#4692)

* [legacy] remove cli of benchmark and update optim (#4690)

* [legacy] remove cli of benchmark and update optim

* [doc] fix cli doc test

* [legacy] fix engine clip grad norm

* [legacy] remove outdated colo tensor (#4694)

* [legacy] remove outdated colo tensor

* [test] fix test import

* [legacy] move outdated zero to legacy (#4696)

* [legacy] clean up utils (#4700)

* [legacy] clean up utils

* [example] update examples

* [legacy] clean up amp

* [legacy] fix amp module

* [legacy] clean up gpc (#4742)

* [legacy] clean up context

* [legacy] clean core, constants and global vars

* [legacy] refactor initialize

* [example] fix examples ci

* [example] fix examples ci

* [legacy] fix tests

* [example] fix gpt example

* [example] fix examples ci

* [devops] fix ci installation

* [example] fix examples ci
2023-09-18 16:31:06 +08:00
Hongxin Liu
536397cc95 [devops] fix concurrency group (#4667) 2023-09-11 15:32:50 +08:00
Hongxin Liu
a686f9ddc8 [devops] fix concurrency group and compatibility test (#4665)
* [devops] fix concurrency group

* [devops] fix compatibility test

* [devops] fix tensornvme install

* [devops] fix tensornvme install

* [devops] fix colossalai install
2023-09-08 13:49:40 +08:00
Hongxin Liu
c7b60f7547 [devops] cancel previous runs in the PR (#4546) 2023-08-30 23:07:21 +08:00
Frank Lee
4110d1f0d4 [workflow] cancel duplicated workflow jobs (#3960) 2023-06-12 09:50:57 +08:00
Frank Lee
ad93c736ea [workflow] enable testing for develop & feature branch (#3801) 2023-05-23 11:21:15 +08:00
Frank Lee
719c4d5553 [doc] updated readme for CI/CD (#2600) 2023-02-06 17:42:15 +08:00
Frank Lee
ba47517342 [workflow] fixed example check workflow (#2554)
* [workflow] fixed example check workflow

* polish yaml
2023-02-06 13:46:52 +08:00