221 Commits

Author SHA1 Message Date
YuliangLiu0306
c20529fe78
[examples] update autoparallel tutorial demo (#2449)
* [examples] update autoparallel tutorial demo

* add test_ci.sh

* polish

* add conda yaml
2023-01-12 14:30:58 +08:00
Haofan Wang
cfd1d5ee49
[example] fixed seed error in train_dreambooth_colossalai.py (#2445) 2023-01-11 16:56:15 +08:00
Frank Lee
ac18a445fa
[example] updated large-batch optimizer tutorial (#2448)
* [example] updated large-batch optimizer tutorial

* polish code

* polish code
2023-01-11 16:27:31 +08:00
Frank Lee
39163417a1
[example] updated the hybrid parallel tutorial (#2444)
* [example] updated the hybrid parallel tutorial

* polish code
2023-01-11 15:17:17 +08:00
YuliangLiu0306
2731531bc2
[autoparallel] integrate device mesh initialization into autoparallelize (#2393)
* [autoparallel] integrate device mesh initialization into autoparallelize

* add megatron solution

* update gpt autoparallel examples with latest api

* adapt beta value to fit the current computation cost
2023-01-11 14:03:49 +08:00
Frank Lee
a3e5496156
[example] improved the clarity of the example readme (#2427)
* [example] improved the clarity of the example readme

* polish workflow

* polish workflow

* polish workflow

* polish workflow

* polish workflow

* polish workflow
2023-01-11 10:46:32 +08:00
Frank Lee
63be79d505
[example] removed duplicated stable diffusion example (#2424) 2023-01-11 10:07:18 +08:00
ZijianYY
fe0f7970a2
[examples] adding tflops to PaLM (#2365) 2023-01-10 16:18:56 +08:00
HELSON
d84e747975
[hotfix] add DISTPAN argument for benchmark (#2412)
* change the benchmark config file

* change config

* revert config file

* rename distpan to distplan
2023-01-10 11:39:25 +08:00
Frank Lee
8327932d2c
[workflow] refactored the example check workflow (#2411)
* [workflow] refactored the example check workflow

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
2023-01-10 11:26:19 +08:00
HELSON
498b5ca993
[hotfix] fix gpt gemini example (#2404)
* [hotfix] fix gpt gemini example

* [example] add new assertions
2023-01-09 15:52:17 +08:00
jiaruifang
b2e0d502b8
[doc] hotfix #2377 2023-01-07 19:44:50 +08:00
Jiarui Fang
8f72b6f8fb
[hotfix] fix implement error in diffusers 2023-01-07 07:56:39 +08:00
1SAA
33f3023e19
[hotfix] fix implement error in diffusers 2023-01-06 18:37:18 +08:00
Jiarui Fang
12c8bf38d7
[Pipeline] Refine GPT PP Example 2023-01-06 18:03:45 +08:00
Ziyue Jiang
ad00894f7f
polish 2023-01-06 16:03:16 +08:00
Jiarui Fang
1aaeb596c6
[example] gpt, shard init on all processes (#2366) 2023-01-06 15:44:50 +08:00
Ziyue Jiang
3a15b20421
Move GPT PP Example 2023-01-06 14:48:58 +08:00
HELSON
48d33b1b17
[gemini] add get static torch model (#2356) 2023-01-06 13:41:19 +08:00
Fazzie-Maqianli
7a332b1734
Merge pull request #2338 from haofanwang/patch-1
Fix a typo in train_dreambooth_colossalai.py
2023-01-06 11:50:18 +08:00
YuliangLiu0306
8b1e0dfd80
[example] upload auto parallel gpt2 demo (#2354) 2023-01-06 11:38:38 +08:00
Jiarui Fang
00a9c781fd
[example] add google doc for benchmark results of GPT (#2355) 2023-01-06 11:38:15 +08:00
Jiarui Fang
509a87f3ff
[example] make gpt example directory more clear (#2353) 2023-01-06 11:11:26 +08:00
Ikko Eltociear Ashimine
5e4bced0a3
[NFC] Update roberta/README.md (#2350) 2023-01-06 10:09:14 +08:00
Jiarui Fang
35e22be2f6
[example] simplify opt example (#2344) 2023-01-06 10:08:41 +08:00
ziyuhuang123
7080a8edb0
[workflow]New version: Create workflow files for examples' auto check (#2298)
* [workflows]bug_repair

* [workflow]new_pr_fixing_bugs

Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2023-01-06 09:26:49 +08:00
binmakeswell
d7352bef2c
[example] add example requirement (#2345) 2023-01-06 09:03:29 +08:00
Haofan Wang
7ce965c7cc
Update requirement_colossalai.txt (#2348) 2023-01-05 21:16:42 +08:00
ZijianYY
f7fd592bf4
[examples]adding tp to PaLM (#2319) 2023-01-05 17:57:50 +08:00
Haofan Wang
9edd0aa75e
Update train_dreambooth_colossalai.py
accelerator.num_processes -> gpc.get_world_size(ParallelMode.DATA)
2023-01-05 15:49:57 +08:00
Fazzie-Maqianli
89f26331e9
[example] diffusion update diffusion, Dreambooth (#2329) 2023-01-05 11:23:26 +08:00
binmakeswell
e512ca9c24
[doc] update stable diffusion link (#2322)
* [doc] update link
2023-01-04 19:38:06 +08:00
Fazzie-Maqianli
a9b27b9265
[example] fix dreambooth format (#2315) 2023-01-04 16:20:00 +08:00
Jiarui Fang
32253315b4
[example] update diffusion readme with official lightning (#2304) 2023-01-04 13:13:38 +08:00
HELSON
e00cedd181
[example] update gemini benchmark bash (#2306) 2023-01-04 11:59:26 +08:00
binmakeswell
c8144223b8
[doc] update diffusion doc (#2296) 2023-01-03 21:27:44 +08:00
ZijianYY
df1d6dc553
[examples] using args and combining two versions for PaLM (#2284) 2023-01-03 17:49:00 +08:00
Ziyue Jiang
ac863a01d6
[example] add benchmark (#2276)
* add benchmark

* merge common func

* add total and avg tflops

Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2023-01-03 17:20:59 +08:00
BlueRum
1405b4381e
[example] fix save_load bug for dreambooth (#2280) 2023-01-03 17:13:29 +08:00
Jiarui Fang
879df8b943
[example] GPT polish readme (#2274) 2023-01-03 15:46:52 +08:00
Ziyue Jiang
9654df0e9a
Add GPT PP Example (#2272)
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2023-01-03 15:17:26 +08:00
YuliangLiu0306
4b29112ab2
[autoparallel] gpt2 autoparallel examples (#2267)
* [autoparallel] gpt2 autoparallel examples

* polish code

* polish code
2023-01-03 14:23:33 +08:00
HELSON
09c0102fe6
[example] fix gpt example with 0.1.10 (#2265) 2023-01-03 13:38:14 +08:00
Fazzie-Maqianli
89f048a88a
[example] clear diffuser image (#2262) 2023-01-03 10:57:02 +08:00
Frank Lee
89542ceb44
[doc] updated the stable diffusion on docker usage (#2244)
* [doc] updated the stable diffusion on docker usage

* polish doc
2022-12-30 18:00:20 +08:00
Jiarui Fang
50cdf5430e
[example] diffusion install from docker (#2239)
* [builder] builder for scaled_upper_triang_masked_softmax

* add missing files

* fix a bug

* polish code

* [example] diffusion install from docker
2022-12-30 16:25:24 +08:00
Jiarui Fang
db4cbdc7fb
[builder] builder for scaled_upper_triang_masked_softmax (#2234) 2022-12-30 09:58:00 +08:00
HELSON
31fe84237b
[example] fix benchmark.sh for gpt example (#2229) 2022-12-29 23:00:14 +08:00
Jiarui Fang
2cdecc9f38
[example] make palm + GeminiDPP work (#2227) 2022-12-29 14:28:31 +08:00
ZijianYY
63cc77173b
[example] Palm adding gemini, still has bugs (#2221) 2022-12-29 14:01:09 +08:00
HELSON
7010e18134
[example] update gpt example (#2225) 2022-12-29 12:01:45 +08:00
Jiarui Fang
49c601da21
[example] add benchmark.sh for gpt (#2226) 2022-12-29 12:00:00 +08:00
HELSON
3629e611cd
[example] update gpt benchmark (#2219) 2022-12-29 10:51:42 +08:00
ZijianYY
92de90dfb3
[examples] replace einsum with matmul (#2210) 2022-12-28 19:03:06 +08:00
Jiarui Fang
7675792100
[builder] raise Error when CUDA_HOME is not set (#2213) 2022-12-28 16:07:08 +08:00
HELSON
78a89d9b41
[diffusion] update readme (#2214) 2022-12-28 16:06:48 +08:00
Jiarui Fang
d96cc37e32
[example] update GPT example benchmark results (#2212) 2022-12-28 14:28:12 +08:00
Jiarui Fang
d5e3e3ec01
[example] update gpt example for larger model scale (#2211) 2022-12-28 13:54:08 +08:00
Jiarui Fang
29868a9ec1
[example] update gpt readme with performance (#2206) 2022-12-27 17:39:53 +08:00
BlueRum
6642cebdbe
[example] Change some training settings for diffusion (#2195) 2022-12-26 15:22:20 +08:00
ziyuhuang123
4363ff3e41
'[NFC] fix some typos' (#2175) 2022-12-25 18:41:39 +08:00
Fazzie-Maqianli
ce3c4eca7b
[example] support Dreambooth (#2188) 2022-12-23 16:47:30 +08:00
BlueRum
1cf6d92d7c
[example] diffuser, support quant inference for stable diffusion (#2186) 2022-12-23 16:06:29 +08:00
Jiarui Fang
65f56f49e8
[example] gpt demo more accurate tflops (#2178) 2022-12-22 20:51:35 +08:00
ziyuhuang123
cf5028363c
'diffusion-typo-change' 2022-12-22 10:28:59 +08:00
Jiarui Fang
27327a4c90
[example] add palm pytorch version (#2172) 2022-12-22 10:15:34 +08:00
Jiarui Fang
a4b4bb01d6
[example] update vit readme (#2155) 2022-12-20 15:56:54 +08:00
Jiarui Fang
2cfe685b9f
[example] add vit missing functions (#2154) 2022-12-20 15:03:26 +08:00
HELSON
a7d95b7024
[example] add zero1, zero2 example in GPT examples (#2146)
* [example] add zero1 and zero2 for GPT

* update readme in gpt example

* polish code

* change init value

* update readme
2022-12-20 14:30:27 +08:00
Fazzie
cea4292ae5
support stable diffusion v2 2022-12-13 14:26:49 +08:00
ZijianYY
fa9d1aea71
[example] update GPT README (#2095) 2022-12-07 15:47:37 +08:00
YuliangLiu0306
edf4cd46c5
[examples] update autoparallel demo (#2061) 2022-12-01 18:50:58 +08:00
Super Daniel
2edbef13cc
[fx] add more meta_registry for MetaTensor execution. (#2000)
* [sc] add examples for auto checkpoint.

* merge upstream

* [fx] add more meta_registry for MetaTensor execution.
2022-11-23 10:55:46 +08:00
Fazzie-Maqianli
b5dbb46172
[example] add diffusion inference (#1986) 2022-11-20 18:35:29 +08:00
mandoxzhang
52bd106627
add RoBERTa (#1980)
* update roberta

* update roberta & readme

* update roberta & readme

* update roberta & readme
2022-11-18 14:04:49 +08:00
Jiarui Fang
f7e276fa71
[Gemini] add GeminiAdamOptimizer (#1960) 2022-11-16 14:44:28 +08:00
Jiarui Fang
60abd86d6a
[example] enhance GPT demo (#1959)
* [example] enhance GPT demo

* Update README.md

Co-authored-by: binmakeswell <binmakeswell@gmail.com>
2022-11-16 11:36:27 +08:00
Fazzie
a09f88ab07
update model download in README 2022-11-16 11:17:30 +08:00
Fazzie-Maqianli
6bdd0a90ca
update lightning version (#1954) 2022-11-15 16:57:48 +08:00
binmakeswell
9183e0dec5
[tutorial] polish all README (#1946) 2022-11-14 19:49:32 +08:00
Frank Lee
de56b563b9
[tutorial] added missing dummy dataloader (#1944) 2022-11-14 04:09:03 -06:00
Frank Lee
c6ea65011f
[tutorial] fixed pipeline bug for sequence parallel (#1943) 2022-11-14 04:06:57 -06:00
Jiarui Fang
cf68cc92ac
[example] add vit (#1942)
* [ColoTensor] ColoInitContext initialize parameters in shard mode.

* polish

* [example] add vit
2022-11-14 17:28:03 +08:00
YuliangLiu0306
c7925c5d08
[sc demo] add requirements to spmd README (#1941) 2022-11-14 17:22:45 +08:00
Boyuan Yao
d5f5e06d82
[SC] remove redundant hands on (#1939)
* [sc] SC tutorial for auto checkpoint

* [sc] polish examples

* [sc] polish readme

* [sc] polish readme and help information

* [sc] polish readme and help information

* [sc] modify auto checkpoint benchmark

* [sc] remove imgs

* [sc] remove redundant handson
2022-11-14 03:05:21 -06:00
binmakeswell
41868f7605
[tutorial] polish README and OPT files (#1930)
* [tutorial] polish README and OPT files

* [tutorial] polish README and OPT files

* [tutorial] polish README and OPT files
2022-11-13 13:09:58 +08:00
ver217
b0b7a786b7
[tutorial] add synthetic dataset for opt (#1924) 2022-11-13 03:26:11 +08:00
Frank Lee
0486048453
[tutorial] updated hybrid parallel readme (#1928)
* [tutorial] updated hybrid parallel readme

* polish code
2022-11-13 03:25:01 +08:00
Frank Lee
807cbdb87d
[tutorial] added synthetic data for sequence parallel (#1927)
* [tutorial] added synthetic data for sequence parallel

* polish code
2022-11-13 03:24:02 +08:00
Frank Lee
abf4c27f6a
[tutorial] removed huggingface model warning (#1925) 2022-11-12 23:12:18 +08:00
Frank Lee
d43a671ad6
Hotfix/tutorial readme index (#1922)
* [tutorial] removed tutorial index in readme

* [tutorial] removed tutorial index in readme
2022-11-12 18:24:52 +08:00
Boyuan Yao
24cbee0ebe
[tutorial] modify hands-on of auto activation checkpoint (#1920)
* [sc] SC tutorial for auto checkpoint

* [sc] polish examples

* [sc] polish readme

* [sc] polish readme and help information

* [sc] polish readme and help information

* [sc] modify auto checkpoint benchmark

* [sc] remove imgs
2022-11-12 18:21:03 +08:00
Frank Lee
ff16773ded
[tutorial] added synthetic data for hybrid parallel (#1921)
* [tutorial] added synthetic data for hybrid parallel

* polish code
2022-11-12 18:18:55 +08:00
Frank Lee
3c42fdbedc
[tutorial] added synthetic data for hybrid parallel (#1919) 2022-11-12 17:49:48 +08:00
Frank Lee
1b0dd05940
[tutorial] added synthetic dataset for auto parallel demo (#1918) 2022-11-12 17:14:32 +08:00
Frank Lee
acd9abc5ca
[tutorial] updated auto parallel demo with latest data path (#1917) 2022-11-12 16:55:19 +08:00
Frank Lee
d53415bc10
[tutorial] added data script and updated readme (#1916) 2022-11-12 16:38:41 +08:00
binmakeswell
155e202318
[example] update auto_parallel img path (#1910) 2022-11-11 23:43:22 +08:00
Boyuan Yao
d5c5bc219e
[SC] add GPT example for auto checkpoint (#1889)
* [sc] SC tutorial for auto checkpoint

* [sc] polish examples

* [sc] polish readme

* [sc] polish readme and help information

* [sc] polish readme and help information
2022-11-11 23:17:25 +08:00
binmakeswell
11ee8ae478
[tutorial] add cifar10 for diffusion (#1907) 2022-11-11 19:03:50 +08:00