mirror of https://github.com/hpcaitech/ColossalAI.git synced 2025-05-04 22:48:15 +00:00
Commit Graph

196 Commits

Author SHA1 Message Date
wukong1992
3229f93e30
[booster] add warning for torch fsdp plugin doc () 2023-05-25 14:00:02 +08:00
digger yu
518b31c059
[docs] change placememt_policy to placement_policy ()
* fix typo colossalai/autochunk auto_parallel amp

* fix typo colossalai/auto_parallel nn utils etc.

* fix typo colossalai/auto_parallel autochunk fx/passes  etc.

* fix typo docs/

* change placememt_policy to placement_policy in docs/ and examples/
2023-05-24 14:51:49 +08:00
digger yu
e90fdb1000 fix typo docs/ 2023-05-24 13:57:43 +08:00
jiangmingyan
725365f297
Merge pull request from jiangmingyan/amp
[doc] update amp document
2023-05-23 18:58:16 +08:00
jiangmingyan
278fcbc444 [doc]fix 2023-05-23 17:53:11 +08:00
jiangmingyan
8aa1fb2c7f [doc]fix 2023-05-23 17:50:30 +08:00
Hongxin Liu
19d153057e
[doc] add warning about fsdp plugin () 2023-05-23 17:16:10 +08:00
jiangmingyan
c425a69d52 [doc] add removed change of config.py 2023-05-23 16:42:36 +08:00
jiangmingyan
75272ef37b [doc] add removed warning 2023-05-23 16:34:30 +08:00
Mingyan Jiang
a520610bd9 [doc] update amp document 2023-05-23 16:20:29 +08:00
Mingyan Jiang
1167bf5b10 [doc] update amp document 2023-05-23 16:20:17 +08:00
Mingyan Jiang
8c62e50dbb [doc] update amp document 2023-05-23 16:20:01 +08:00
jiangmingyan
ef02d7ef6d
[doc] update gradient accumulation ()
* [doc]update gradient accumulation

* [doc]update gradient accumulation

* [doc]update gradient accumulation

* [doc]update gradient accumulation

* [doc]update gradient accumulation, fix

* [doc]update gradient accumulation, fix

* [doc]update gradient accumulation, fix

* [doc]update gradient accumulation, add sidebars

* [doc]update gradient accumulation, fix

* [doc]update gradient accumulation, fix

* [doc]update gradient accumulation, fix

* [doc]update gradient accumulation, resolve comments

* [doc]update gradient accumulation, resolve comments

* fix
2023-05-23 10:52:30 +08:00
github-actions[bot]
62c7e67f9f
[format] applied code formatting on changed files in pull request 3786 ()
Co-authored-by: github-actions <github-actions@github.com>
2023-05-22 14:42:09 +08:00
jiangmingyan
fe1561a884
[doc] update gradient clipping document ()
* [doc] update gradient clipping document

* [doc] update gradient clipping document

* [doc] update gradient clipping document

* [doc] update gradient clipping document

* [doc] update gradient clipping document

* [doc] update gradient clipping document

* [doc] update gradient clipping doc, fix sidebars.json

* [doc] update gradient clipping doc, fix doc test
2023-05-22 14:13:15 +08:00
Yanjia0
d9393b85f1
[doc] add deprecated warning on doc Basics section ()
* Update colotensor_concept.md

* Update configure_parallelization.md

* Update define_your_config.md

* Update engine_trainer.md

* Update initialize_features.md

* Update model_checkpoint.md

* Update colotensor_concept.md

* Update configure_parallelization.md

* Update define_your_config.md

* Update engine_trainer.md

* Update initialize_features.md

* Update model_checkpoint.md
2023-05-22 11:12:53 +08:00
Hongxin Liu
72688adb2f
[doc] add booster docstring and fix autodoc ()
* [doc] add docstr for booster methods

* [doc] fix autodoc
2023-05-22 10:56:47 +08:00
Hongxin Liu
60e6a154bc
[doc] add tutorial for booster checkpoint ()
* [doc] add checkpoint related docstr for booster

* [doc] add en checkpoint doc

* [doc] add zh checkpoint doc

* [doc] add booster checkpoint doc in sidebar

* [doc] add caution about ckpt for plugins

* [doc] add doctest placeholder

* [doc] add doctest placeholder

* [doc] add doctest placeholder
2023-05-19 18:05:08 +08:00
binmakeswell
ad2cf58f50
[chat] add performance and tutorial () 2023-05-19 18:03:56 +08:00
Hongxin Liu
21e29e2212
[doc] add tutorial for booster plugins ()
* [doc] add en booster plugins doc

* [doc] add booster plugins doc in sidebar

* [doc] add zh booster plugins doc

* [doc] fix zh booster plugin translation

* [doc] reorganize tutorials order of basic section

* [devops] force sync to test ci
2023-05-19 12:12:42 +08:00
Hongxin Liu
5ce6c9d86f
[doc] add tutorial for cluster utils ()
* [doc] add en cluster utils doc

* [doc] add zh cluster utils doc

* [doc] add cluster utils doc in sidebar
2023-05-19 12:12:20 +08:00
jiangmingyan
48bd056761
[doc] update hybrid parallelism doc () 2023-05-18 14:16:13 +08:00
jiangmingyan
d449525acf
[doc] update booster tutorials ()
* [booster] update booster tutorials#3717

* [booster] update booster tutorials#3717, fix

* [booster] update booster tutorials#3717, update setup doc

* [booster] update booster tutorials#3717, update setup doc

* [booster] update booster tutorials#3717, update setup doc

* [booster] update booster tutorials#3717, update setup doc

* [booster] update booster tutorials#3717, update setup doc

* [booster] update booster tutorials#3717, update setup doc

* [booster] update booster tutorials#3717, rename colossalai booster.md

* [booster] update booster tutorials#3717, rename colossalai booster.md

* [booster] update booster tutorials#3717, rename colossalai booster.md

* [booster] update booster tutorials#3717, fix

* [booster] update booster tutorials#3717, fix

* [booster] update tutorials#3717, update booster api doc

* [booster] update tutorials#3717, modify file

* [booster] update tutorials#3717, modify file

* [booster] update tutorials#3717, modify file

* [booster] update tutorials#3717, modify file

* [booster] update tutorials#3717, modify file

* [booster] update tutorials#3717, modify file

* [booster] update tutorials#3717, modify file

* [booster] update tutorials#3717, fix reference link

* [booster] update tutorials#3717, fix reference link

* [booster] update tutorials#3717, fix reference link

* [booster] update tutorials#3717, fix reference link

* [booster] update tutorials#3717, fix reference link

* [booster] update tutorials#3717, fix reference link

* [booster] update tutorials#3717, fix reference link

* [booster] update tutorials#3713

* [booster] update tutorials#3713, modify file
2023-05-18 11:41:56 +08:00
Hongxin Liu
5dd573c6b6
[devops] fix ci for document check ()
* [doc] add test info

* [devops] update doc check ci

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] add debug info

* [devops] remove debug info and update invalid doc

* [devops] add essential comments
2023-05-17 11:24:22 +08:00
digger-yu
b9a8dff7e5
[doc] Fix typo under colossalai and doc ()
* Fixed several spelling errors under colossalai

* Fix the spelling error in colossalai and docs directory

* Cautious Changed the spelling error under the example folder

* Update runtime_preparation_pass.py

revert autograft to autograd

* Update search_chunk.py

utile to until

* Update check_installation.py

change misteach to mismatch in line 91

* Update 1D_tensor_parallel.md

revert to perceptron

* Update 2D_tensor_parallel.md

revert to perceptron in line 73

* Update 2p5D_tensor_parallel.md

revert to perceptron in line 71

* Update 3D_tensor_parallel.md

revert to perceptron in line 80

* Update README.md

revert to resnet in line 42

* Update reorder_graph.py

revert to indice in line 7

* Update p2p.py

revert to megatron in line 94

* Update initialize.py

revert to torchrun in line 198

* Update routers.py

change to detailed in line 63

* Update routers.py

change to detailed in line 146

* Update README.md

revert random number in line 402
2023-04-26 11:38:43 +08:00
digger-yu
9edeadfb24
[doc] Update 1D_tensor_parallel.md ()
Display format optimization, same as fix #3562
Simultaneous modification of en version
2023-04-17 12:19:53 +08:00
digger-yu
1c7734bc94
[doc] Update 1D_tensor_parallel.md ()
Display format optimization, fix bug#3562
Specific changes
1. "This is called a column-parallel fashion" Translate to Chinese
2. use the ```math code block syntax to display a math expression as a block, No modification of formula content

Please check that the math formula is displayed correctly
If OK, I will change the format of the English version of the formula in parallel
2023-04-14 22:12:32 +08:00
digger-yu
a3ac48ef3d
[doc] Update README-zh-Hans.md ()
Fixing document link errors using absolute paths
2023-04-12 23:09:30 +08:00
binmakeswell
0c0455700f
[doc] add requirement and highlight application ()
* [doc] add requirement and highlight application

* [doc] link example and application
2023-04-10 17:37:16 +08:00
Frank Lee
4e9989344d
[doc] updated contributor list () 2023-04-06 17:47:59 +08:00
Frank Lee
80eba05b0a
[test] refactor tests with spawn ()
* [test] added spawn decorator

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
2023-04-06 14:51:35 +08:00
ver217
26b7aac0be
[zero] reorganize zero/gemini folder structure ()
* [zero] refactor low-level zero folder structure

* [zero] fix legacy zero import path

* [zero] fix legacy zero import path

* [zero] remove useless import

* [zero] refactor gemini folder structure

* [zero] refactor gemini folder structure

* [zero] refactor legacy zero import path

* [zero] refactor gemini folder structure

* [zero] refactor gemini folder structure

* [zero] refactor gemini folder structure

* [zero] refactor legacy zero import path

* [zero] fix test import path

* [zero] fix test

* [zero] fix circular import

* [zero] update import
2023-04-04 13:48:16 +08:00
binmakeswell
15a74da79c
[doc] add Intel cooperation news ()
* [doc] add Intel cooperation news

* [doc] add Intel cooperation news
2023-03-30 11:45:01 +08:00
binmakeswell
31c78f2be3
[doc] add ColossalChat news ()
* [doc] add ColossalChat news

* [doc] add ColossalChat news
2023-03-29 09:27:55 +08:00
binmakeswell
682af61396
[doc] add ColossalChat ()
* [doc] add ColossalChat
2023-03-29 02:35:10 +08:00
Saurav Maheshkar
20d1c99444
[refactor] update docs ()
* refactor: README-zh-Hans

* refactor: REFERENCE

* docs: update paths in README
2023-03-20 10:52:01 +08:00
Frank Lee
3213347b49
[doc] fixed typos in docs/README.md () 2023-03-10 10:32:14 +08:00
Frank Lee
416a50dbd7
[doc] moved doc test command to bottom () 2023-03-09 18:10:45 +08:00
Frank Lee
ea0b52c12e
[doc] specified operating system requirement ()
* [doc] specified operating system requirement

* polish code
2023-03-07 18:04:10 +08:00
ver217
378d827c6b
[doc] update nvme offload doc ()
* [doc] update nvme offload doc

* [doc] add doc testing cmd and requirements

* [doc] add api reference

* [doc] add dependencies
2023-03-07 17:49:01 +08:00
Frank Lee
8fedc8766a
[workflow] supported conda package installation in doc test ()
* [workflow] supported conda package installation in doc test

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
2023-03-07 14:21:26 +08:00
Frank Lee
e0a1c1321c
[doc] added reference to related works ()
* [doc] added reference to related works

* polish code
2023-03-04 17:32:22 +08:00
github-actions[bot]
dca98937f8
[format] applied code formatting on changed files in pull request 2933 ()
Co-authored-by: github-actions <github-actions@github.com>
2023-02-28 15:41:52 +08:00
binmakeswell
8264cd7ef1
[doc] add env scope () 2023-02-28 15:39:51 +08:00
Frank Lee
b8804aa60c
[doc] added readme for documentation () 2023-02-28 14:04:52 +08:00
Frank Lee
9e3b8b7aff
[doc] removed read-the-docs () 2023-02-28 11:28:24 +08:00
Frank Lee
77b88a3849
[workflow] added auto doc test on PR ()
* [workflow] added auto doc test on PR

* [workflow] added doc test workflow

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code

* polish code
2023-02-28 11:10:38 +08:00
binmakeswell
0afb55fc5b
[doc] add os scope, update tutorial install and tips () 2023-02-27 14:59:27 +08:00
YuliangLiu0306
cf6409dd40
Hotfix/auto parallel zh doc ()
* [hotfix] fix autoparallel zh docs

* polish

* polish
2023-02-19 15:57:14 +08:00
YuliangLiu0306
2059fdd6b0
[hotfix] add copyright for solver and device mesh ()
* [hotfix] add copyright for solver and device mesh

* add readme

* add alpa license

* polish
2023-02-18 21:14:38 +08:00
Frank Lee
e376954305
[doc] add opt service doc () 2023-02-16 15:45:26 +08:00
Frank Lee
5479fdd5b8
[doc] updated documentation version list () 2023-02-15 17:39:50 +08:00
Frank Lee
2045d45ab7
[doc] updated documentation version list () 2023-02-15 11:24:18 +08:00
Frank Lee
0966008839
[doc] fixed the sidebar item key ()
Frank Lee
6d60634433
[doc] added documentation sidebar translation () 2023-02-13 10:10:12 +08:00
Frank Lee
81ea66d25d
[release] v0.2.3 ()
* [release] v0.2.3

* polish code
2023-02-13 09:51:25 +08:00
YuliangLiu0306
8de85051b3
[Docs] layout converting management () 2023-02-10 18:38:32 +08:00
Frank Lee
b673e5f78b
[release] v0.2.2 () 2023-02-10 11:01:24 +08:00
Frank Lee
cd4f02bed8
[doc] fixed compatibility with docusaurus ()
Frank Lee
a4ae43f071
[doc] added docusaurus-based version control () 2023-02-09 16:38:49 +08:00
Frank Lee
85b2303b55
[doc] migrate the markdown files () 2023-02-09 14:21:38 +08:00
Frank Lee
d3480396f8
[doc] updated the sphinx theme () 2023-02-08 13:48:08 +08:00
binmakeswell
a01278e810
Update requirements.txt 2022-11-18 18:57:18 +08:00
Jiarui Fang
cc0ed7cf33
[Gemini] ZeROHookV2 -> GeminiZeROHook () 2022-11-17 14:43:49 +08:00
Ziyue Jiang
63f250bbd4
fix file name ()
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-10-25 16:48:48 +08:00
ver217
d068af81a3
[doc] update rst and docstring ()
* update rst

* add zero docstr

* fix docstr

* remove fx.tracer.meta_patch

* fix docstr

* fix docstr

* update fx rst

* fix fx docstr

* remove useless rst
2022-07-21 15:54:53 +08:00
Jiarui Fang
4165eabb1e
[hotfix] remove potential circular import ()
* make it faster

* [hotfix] remove circle import
2022-07-14 13:44:26 +08:00
Jiarui Fang
4d9332b4c5
[refactor] moving memtracer to gemini () 2022-04-19 10:13:08 +08:00
ver217
f69507dd22
update rst () 2022-04-01 15:46:38 +08:00
Liang Bowen
2c45efc398
html refactor () 2022-03-31 11:36:56 +08:00
LuGY
c44d797072
[docs] updated docs of hybrid adam and cpu adam ()
ver217
ffca99d187
[doc] update apidoc () 2022-03-25 18:29:43 +08:00
ver217
9caa8b6481
docs get correct release version () 2022-03-22 14:24:41 +08:00
ver217
7e30068a22
[doc] update rst ()
* update rst

* remove empty rst
2022-03-21 10:52:45 +08:00
binmakeswell
ce7b2c9ae3 update README and images path () 2022-03-11 15:50:28 +08:00
binmakeswell
08eccfe681 add community group and update issue template() 2022-03-11 15:50:28 +08:00
Sze-qq
3312d716a0 update experimental visualization () 2022-03-11 15:50:28 +08:00
binmakeswell
753035edd3 add Chinese README 2022-03-11 15:50:28 +08:00
WANG-CR
6fb550acdb update logo 2022-01-21 12:31:07 +08:00
ver217
1949d3a889
update doc requirements and rtd conf () 2022-01-19 19:46:43 +08:00
Frank Lee
be85a0f366 removed tutorial markdown and refreshed rst files for consistency 2022-01-19 17:01:37 +08:00
binmakeswell
17ce8569a8
add logo at homepage, add forum in issue template () 2022-01-19 14:29:31 +08:00
puck_WCR
9473a1b9c8
AMP docstring/markdown update () 2022-01-18 18:33:36 +08:00
ver217
96780e6ee4
Optimize pipeline schedule ()
* add pipeline shared module wrapper and update load batch

* added model parallel process group for amp and clip grad ()

* added model parallel process group for amp and clip grad

* update amp and clip with model parallel process group

* remove pipeline_prev/next group ()

* micro batch offload

* optimize pipeline gpu memory usage

* pipeline can receive tensor shape ()

* optimize pipeline gpu memory usage

* fix grad accumulation step counter

* rename classes and functions

Co-authored-by: Frank Lee <somerlee.9@gmail.com>
2021-12-30 15:56:46 +08:00
ver217
8f02a88db2
add interleaved pipeline, fix naive amp and update pipeline model initializer () 2021-12-20 23:26:19 +08:00
Frank Lee
35813ed3c4
update examples and sphinx docs for the new api () 2021-12-13 22:07:01 +08:00
ver217
7d3711058f
fix zero3 fp16 and add zero3 model context () 2021-12-10 17:48:50 +08:00
Frank Lee
9a0466534c
update markdown docs (english) () 2021-12-10 14:37:33 +08:00
Frank Lee
da01c234e1
Develop/experiments ()
* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel ()

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7699.

* improved consistency between trainer, engine and schedule ()

Co-authored-by: 1SAA <c2h214748@gmail.com>

* Split conv2d, class token, positional embedding in 2d, Fix random number in ddp
Fix convergence in cifar10, Imagenet1000

* Integrate 1d tensor parallel in Colossal-AI ()

* fixed 1D and 2D convergence ()

* optimized 2D operations

* fixed 1D ViT convergence problem

* Feature/ddp ()

* remove redundancy func in setup () ()

* use env to control the language of doc () ()

* Support TP-compatible Torch AMP and Update trainer API ()

* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel ()

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7699.

* improved consistency between trainer, engine and schedule ()

Co-authored-by: 1SAA <c2h214748@gmail.com>

Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>

* add an example of ViT-B/16 and remove w_norm clipping in LAMB ()

* add explanation for ViT example () ()

* support torch ddp

* fix loss accumulation

* add log for ddp

* change seed

* modify timing hook

Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>

* Feature/pipeline ()

* remove redundancy func in setup () ()

* use env to control the language of doc () ()

* Support TP-compatible Torch AMP and Update trainer API ()

* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel ()

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7699.

* improved consistency between trainer, engine and schedule ()

Co-authored-by: 1SAA <c2h214748@gmail.com>

Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>

* add an example of ViT-B/16 and remove w_norm clipping in LAMB ()

* add explanation for ViT example () ()

* optimize communication of pipeline parallel

* fix grad clip for pipeline

Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>

* optimized 3d layer to fix slow computation; tested imagenet performance with 3d; reworked lr_scheduler config definition; fixed launch args; fixed some printing issues; simplified apis of 3d layers ()

* Update 2.5d layer code to get a similar accuracy on imagenet-1k dataset

* update api for better usability ()

update api for better usability

Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>
Co-authored-by: puck_WCR <46049915+WANG-CR@users.noreply.github.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: アマデウス <kurisusnowdeng@users.noreply.github.com>
Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
2021-12-09 15:08:29 +08:00
Frank Lee
3defa32aee
Support TP-compatible Torch AMP and Update trainer API ()
* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel ()

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7699.

* improved consistency between trainer, engine and schedule ()

Co-authored-by: 1SAA <c2h214748@gmail.com>

Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>
2021-11-18 19:45:06 +08:00
ver217
2b05de4c64
use env to control the language of doc () () 2021-11-15 16:53:56 +08:00
binmakeswell
05e7069a5b fixed some typos in the documents, added blog link and paper author information in README 2021-11-03 17:18:43 +08:00
Fan Cui
18ba66e012 added Chinese documents and fixed some typos in English documents 2021-11-02 23:28:44 +08:00
ver217
50982c0b7d reorder parallelization methods in parallelization documentation 2021-11-01 14:31:55 +08:00
ver217
3c7604ba30 update documentation 2021-10-29 09:29:20 +08:00
zbian
404ecbdcc6 Migrated project 2021-10-28 18:21:23 +02:00