Commit Graph

953 Commits

Author SHA1 Message Date
Boyuan Yao
d5c5bc219e [SC] add GPT example for auto checkpoint (#1889)
* [sc] SC tutorial for auto checkpoint

* [sc] polish examples

* [sc] polish readme

* [sc] polish readme and help information

* [sc] polish readme and help information
2022-11-11 23:17:25 +08:00
Junming Wu
14a0b18305 [NFC] polish colossalai/amp/naive_amp/__init__.py code style (#1905) 2022-11-11 17:49:18 +08:00
HELSON
6e51d296f0 [zero] migrate zero1&2 (#1878)
* add zero1&2 optimizer

* rename test ditectory

* rename test files

* change tolerance in test
2022-11-11 09:26:40 +08:00
Super Daniel
cc55ff0aa4 [autoparallel] user-friendly API for CheckpointSolver. (#1879)
Merge for SC tutorial
2022-11-10 20:59:28 +08:00
Super Daniel
448248b27c [fx] metainfo_trace as an API. (#1873)
* [fx] metainfo_trace as an API.

* [fx] add return.
2022-11-10 20:58:37 +08:00
Jiarui Fang
986f8cbaa7 [inference] overlap comm and compute in Linear1D_Row when stream_chunk_num > 1 (#1876) 2022-11-10 17:36:42 +08:00
YuliangLiu0306
1b494ad73c [autoparallel] fix linear logical convert issue (#1857) 2022-11-10 17:19:22 +08:00
Jiarui Fang
c2947dadf1 [inference] streaming Linear 1D Row inference (#1874) 2022-11-10 17:03:21 +08:00
Frank Lee
e6ec99d389 [utils] fixed lazy init context (#1867) 2022-11-10 15:17:20 +08:00
zbian
653b0a620e added skip_bias_add for non-tp linear 2022-11-09 15:41:08 +08:00
LuGY
94329fc139 [NFC] polish colossalai/amp/apex_amp/__init__.py code style (#1853) 2022-11-09 14:49:42 +08:00
zbian
1559a09fb7 [NFC] polish amp.naive_amp.grad_scaler code style 2022-11-09 13:38:15 +08:00
HELSON
72c9448920 [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/operator_handler.py code style (#1845) 2022-11-09 12:08:47 +08:00
Genghan Zhang
b25030cc07 [NFC] polish ./colossalai/amp/torch_amp/__init__.py code style (#1836) 2022-11-09 12:08:47 +08:00
Sze-qq
95ac4f88ea [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/conv_handler.py code style (#1829)
Co-authored-by: siqi <siqi@siqis-MacBook-Pro.local>
2022-11-09 12:08:47 +08:00
Ziyue Jiang
5da03c936d [NFC] polish colossalai/amp/torch_amp/_grad_scaler.py code style (#1823)
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2022-11-09 12:08:47 +08:00
Fazzie-Maqianli
399f84d8f6 [NFC] polish colossalai/amp/naive_amp/_fp16_optimizer.py code style (#1819) 2022-11-09 12:08:47 +08:00
CsRic
9623ec1b02 [NFC] polish colossalai/amp/naive_amp/_utils.py code style (#1816)
* [NFC] polish colossalai/nn/metric/accuracy_2p5d.py code style (#1714)

* [NFC] polish colossalai/zero/sharded_param/__init__.py code style

* [NFC] polish colossalai/amp/naive_amp/_utils.py code style

Co-authored-by: shenggan <csg19971016@gmail.com>
Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>
2022-11-09 12:08:47 +08:00
binmakeswell
3c3714fc2a [NFC] polish strategies_constructor.py code style (#1806) 2022-11-09 12:08:47 +08:00
Jiarui Fang
3ce4463fe6 [utils] remove lazy_memory_allocate from ColoInitContext (#1844) 2022-11-09 11:50:33 +08:00
Jiarui Fang
fba34efb5a version to 0.1.11rc2 (#1832) 2022-11-08 17:25:15 +08:00
YuliangLiu0306
49216d7ab1 [autoparallel] fix bugs caused by negative dim key (#1808)
* [autoparallel] fix bugs caused by negative dim key

* fix import error

* fix matmul test issue

* fix unit test issue
2022-11-08 17:03:50 +08:00
アマデウス
4268ae017b [kernel] added jit warmup (#1792) 2022-11-08 16:22:23 +08:00
YuliangLiu0306
f6032ddb17 [autoparallel] fix bias addition module (#1800) 2022-11-08 16:21:25 +08:00
Jiarui Fang
cd5a0d56fa [Gemini] make gemini usage simple (#1821) 2022-11-08 15:53:13 +08:00
ver217
99870726b1 [CheckpointIO] a uniform checkpoint I/O module (#1689) 2022-11-08 15:15:13 +08:00
Boyuan Yao
629172b319 [autoparallel] add batch norm metainfo (#1815)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test

* [autoparallel] add batchnorm metainfo class

* [autoparallel] fix batchnorm unit test function declaration

* [fx] restore profiler
2022-11-08 15:05:26 +08:00
Super Daniel
441d584e4a [fx] add a symbolic_trace api. (#1812)
* [fx] add a symbolic_trace api.

* [fx] fix import errors.
2022-11-08 13:59:20 +08:00
xcnick
e0da01ea71 [hotfix] fix build error when torch version >= 1.13 (#1803) 2022-11-08 09:40:24 +08:00
oahzxl
9639ea88fc [kernel] more flexible flashatt interface (#1804) 2022-11-07 17:02:09 +08:00
Zihao
20e255d4e8 MemStatsCollectorStatic (#1765) 2022-11-07 16:49:03 +08:00
Boyuan Yao
327d07c44a [autoparallel] add conv metainfo class for auto parallel (#1796)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel

* [fx] add conv metainfo class

* [fx] restore profiler

* [fx] restore meta profiler

* [autoparallel] modify unit test

* [fx] modify unit test
2022-11-07 16:15:35 +08:00
oahzxl
501a9e9cd2 [hotfix] polish flash attention (#1802) 2022-11-07 14:30:22 +08:00
Jiarui Fang
218c75fd9d [NFC] polish type hint for shape consistency (#1801)
* [NFC] polish type hint for shape consistency

* polish code

* polish code
2022-11-07 14:13:03 +08:00
Jiarui Fang
c248800359 [kernel] skip tests of flash_attn and triton when they are not available (#1798) 2022-11-07 13:41:13 +08:00
YuliangLiu0306
e34e850a4c [autoparallel]add essential CommActions for broadcast oprands (#1793) 2022-11-04 18:36:42 +08:00
Boyuan Yao
05ce3d369f [fx] Add linear metainfo class for auto parallel (#1783)
* [fx] metainfo class for auto parallel

* [fx] add unit test for linear metainfo

* [fx] fix bwd param for linear

* [fx] modify unit test

* [fx] modify unit test

* [fx] modify import

* [fx] modify import

* [fx] modify import

* [fx] move meta profiler to auto parallel
2022-11-04 10:55:09 +08:00
Super Daniel
e8a9bebc87 [autoparallel] refactor and add rotorc. (#1789)
* [autoparallel] refactor and add rotorc.

* [autoparallel] refactor and add rotorc.
2022-11-03 12:32:51 +08:00
YuliangLiu0306
2c4c7b3618 [autoparallel] add getattr handler (#1767)
* [autoparallel] add getattr haandler

* polish code

* add extra processes for Parameters

* add unit test for param resharding cost

* add docstring and polish test
2022-11-03 12:31:33 +08:00
HELSON
c6a1a62636 [hotfix] fix zero's incompatibility with checkpoint in torch-1.12 (#1786)
* [hotfix] fix zero's incompatibility with checkpoint in torch-1.12

* [zero] add cpu shard init

* [zero] add tiny example test

* [colo_tensor] fix bugs for torch-1.11
2022-11-02 16:11:34 +08:00
kurisusnowdeng
0b8161fab8 updated tp layers 2022-11-02 12:19:38 +08:00
Jiarui Fang
cb5a587e9a [hotfix] polish chunk import (#1787) 2022-11-02 12:10:52 +08:00
YuliangLiu0306
e859380bf7 [fx] support module with bias addition (#1780)
* [autoparallel] refactor tracer to fix bias addition issue

* [fx] support module with bias addition

* create bias_addition_module

* refactor file structure

* polish code

* fix unit test
2022-11-01 22:53:51 +08:00
Frank Lee
f3f19a5c47 [autoparallel] added matmul handler (#1763)
* [autoparallel] added matmul handler

* polish code
2022-11-01 15:14:53 +08:00
Ziyue Jiang
4df0194976 [Pipeline]Adapt to Pipelinable OPT (#1782) 2022-11-01 14:18:50 +08:00
YuliangLiu0306
27de252334 [autoparallel] fix conv handler numerical test (#1771) 2022-11-01 10:43:44 +08:00
Super Daniel
1e88811c7a [autoparallel] move ckpt solvers to autoparallel folder / refactor code (#1764)
* [autoparallel] first move.

* [autoparallel] add solver rotor.

* [autoparallel] add ckpt solvers.

* [autoparallel] modify codegen.

* [fx] fix annotation in test.

* [fx] remove check.

* [autoparallel] polish docstring.

* [fx] refactor MetaTensor.
2022-11-01 10:43:15 +08:00
Jiarui Fang
f34dab4270 [compatibility] ChunkMgr import error (#1772) 2022-10-28 14:48:54 +08:00
YuliangLiu0306
b0f7c8bde8 [autoparallel] update CommSpec to CommActions (#1768)
* [autoparallel] update CommSpec to CommActions

* polish code
2022-10-28 09:57:43 +08:00
YuliangLiu0306
b4cc59b61e [autoparallel] add numerical test for node strategies (#1760)
* [autoparallel] add numerical test for node strategies

* polish code

* polish code
2022-10-27 10:42:54 +08:00