Commit Graph

1298 Commits

YuliangLiu0306
8e4e8601b7 [DTensor] implement layout converter (#3055)
* [DTensor] refactor LayoutConverter for DTensor

* polish code

* polish docstring
2023-03-10 09:53:52 +08:00
Frank Lee
f19b49e164 [booster] init module structure and definition (#3056) 2023-03-09 11:27:46 +08:00
Xuanlei Zhao
2ca9728cbb [autochunk] refactor chunk memory estimation (#2762)
* refactor memory code

* don't log free var memory

* add memory alignment

* update chunk target

* update setting for new memory

* finish test

* update tracer

* fix typo

* update test
2023-03-08 16:22:30 +08:00
YuliangLiu0306
29386a54e6 [DTensor] refactor CommSpec (#3034) 2023-03-08 10:45:31 +08:00
YuliangLiu0306
cd2b0eaa8d [DTensor] refactor sharding spec (#2987)
* [autoparallel] refactor sharding spec

* rename function name
2023-03-07 11:08:11 +08:00
Ziyue Jiang
400f63012e [pipeline] Add Simplified Alpa DP Partition (#2507)
* add alpa dp split

* use fwd+bwd instead of fwd only

---------

Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2023-03-07 10:34:31 +08:00
Super Daniel
b42d3d28ed [fx] remove deprecated algorithms. (#2312) (#2313) 2023-03-07 10:30:35 +08:00
github-actions[bot]
82503a96f2 [format] applied code formatting on changed files in pull request 2997 (#3008)
Co-authored-by: github-actions <github-actions@github.com>
2023-03-06 10:42:22 +08:00
binmakeswell
52a5078988 [doc] add ISC tutorial (#2997)
* [doc] add ISC tutorial
2023-03-06 10:36:38 +08:00
ver217
823f3b9cf4 [doc] add deepspeed citation and copyright (#2996)
* [doc] add deepspeed citation and copyright
2023-03-04 20:08:11 +08:00
YuliangLiu0306
e414e4092b [DTensor] implementation of dtensor (#2946)
* [DTensor] implementation of dtensor

* test layout convert

* polish
2023-03-01 16:34:58 +08:00
YuliangLiu0306
47fb214b3b [hotfix] add shard dim to avoid backward communication error (#2954) 2023-03-01 11:41:53 +08:00
ver217
090f14fd6b [misc] add reference (#2930)
* [misc] add reference

* [misc] add license
2023-02-28 18:07:24 +08:00
YuliangLiu0306
197d0bf4ed [autoparallel] apply repeat block to reduce solving time (#2912) 2023-02-28 11:03:30 +08:00
YH
a848091141 Fix port exception type (#2925) 2023-02-28 11:00:43 +08:00
zbian
61e687831d fixed an issue where weights could not be accessed correctly when using zero with tp 2023-02-28 10:52:30 +08:00
YH
7b13f7db18 [zero] trivial zero optimizer refactoring (#2869)
* Fix minor grad store interface

* Apply lint
2023-02-27 14:04:53 +08:00
Jiatong (Julius) Han
8c8a39be95 [hotfix]: Remove math.prod dependency (#2837)
* Remove math.prod dependency

* Fix style

---------

Co-authored-by: Jiatong Han <jiatong.han@u.nus.edu>
2023-02-23 23:56:15 +08:00
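
Note: math.prod only exists on Python 3.8 and later, so removing the dependency keeps older interpreters supported. A minimal sketch of the standard replacement (the helper name is illustrative, not taken from the PR):

    import operator
    from functools import reduce

    def prod(iterable, start=1):
        # Python 3.7-compatible stand-in for math.prod (added in 3.8):
        # multiply all elements, returning `start` for an empty iterable.
        return reduce(operator.mul, iterable, start)

    assert prod([2, 3, 4]) == 24
    assert prod([]) == 1  # empty product defaults to the start value
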
YuliangLiu0306
819e25d8b1 [hotfix] fix autoparallel compatibility test issues (#2754) 2023-02-23 17:28:36 +08:00
YuliangLiu0306
0f392d7403 [autoparallel] find repeat blocks (#2854)
* [autoparallel] find repeat blocks

* polish
2023-02-23 17:28:19 +08:00
junxu
c52edcf0eb Rename class method of ZeroDDP (#2692) 2023-02-22 15:05:53 +08:00
HELSON
6e4ac08172 [hotfix] fix chunk size cannot be divided (#2867)
* [hotfix] fix chunk size cannot be divided

* [hotfix] use numpy for python3.8
2023-02-22 15:04:46 +08:00
Boyuan Yao
eae77c831d [autoparallel] Patch meta information for nodes that will not be handled by SPMD solver (#2823)
* [autoparallel] non spmd meta information generator

* [autoparallel] patch meta information for non spmd nodes
2023-02-22 10:28:56 +08:00
Boyuan Yao
c7764d3f22 [autoparallel] Patch meta information of torch.where (#2822)
* [autoparallel] patch meta information of torch.where

* [autoparallel] pre-commit modified
2023-02-22 10:28:21 +08:00
Boyuan Yao
fcc4097efa [autoparallel] Patch meta information of torch.tanh() and torch.nn.Dropout (#2773)
* [autoparallel] tanh meta information

* [autoparallel] remove redundant code

* [autoparallel] patch meta information of torch.nn.Dropout
2023-02-22 10:27:59 +08:00
Frank Lee
935346430f [cli] handled version check exceptions (#2848)
* [cli] handled version check exceptions

* polish code
2023-02-21 17:04:49 +08:00
Frank Lee
918bc94b6b [triton] added copyright information for flash attention (#2835)
* [triton] added copyright information for flash attention

* polish code
2023-02-21 11:25:57 +08:00
Boyuan Yao
7ea6bc7f69 [autoparallel] Patch tensor related operations meta information (#2789)
* [autoparallel] tensor related meta information prototype

* [autoparallel] tensor related meta information
2023-02-20 17:38:55 +08:00
Michelle
c008d4ad0c [NFC] polish colossalai/engine/schedule/_pipeline_schedule.py code style (#2744) 2023-02-20 10:38:40 +08:00
YuliangLiu0306
2059fdd6b0 [hotfix] add copyright for solver and device mesh (#2803)
* [hotfix] add copyright for solver and device mesh

* add readme

* add alpa license

* polish
2023-02-18 21:14:38 +08:00
Boyuan Yao
8593ae1a3f [autoparallel] rotor solver refactor (#2813)
* [autoparallel] rotor solver refactor
2023-02-18 11:30:15 +08:00
HELSON
56ddc9ca7a [hotfix] add correct device for fake_param (#2796) 2023-02-17 15:29:07 +08:00
Boyuan Yao
a2b43e393d [autoparallel] Patch meta information of torch.nn.Embedding (#2760)
* [autoparallel] embedding metainfo

* [autoparallel] fix function name in test_activation_metainfo

* [autoparallel] undo changes in activation metainfo and related tests
2023-02-17 10:39:48 +08:00
Boyuan Yao
8e3f66a0d1 [zero] fix wrong import (#2777) 2023-02-17 10:26:07 +08:00
Nikita Shulga
01066152f1 Don't use torch._six (#2775)
* Don't use `torch._six`

This is a private API that was removed by https://github.com/pytorch/pytorch/pull/94709

* Update common.py
2023-02-17 09:22:45 +08:00
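
Note: torch._six mostly re-exported standard helpers, so the migration is a drop-in switch to the standard library. A minimal sketch, assuming the common inf and string_classes usages (the exact symbols touched in this PR may differ):

    from math import inf  # replaces torch._six.inf

    # Replacement for torch._six.string_classes.
    string_classes = (str, bytes)

    def is_text(obj) -> bool:
        # Type check that previously relied on torch._six.string_classes.
        return isinstance(obj, string_classes)

    assert is_text("hello") and not is_text(3.14)
    assert inf > 1e308  # math.inf behaves like the removed alias
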
binmakeswell
93b788b95a Merge branch 'main' into fix/format 2023-02-15 20:23:51 +08:00
xyupeng
2fd528b9f4 [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/graph_analysis.py code style (#2737) 2023-02-15 22:57:45 +08:00
YuliangLiu0306
1dc003c169 [autoparallel] distinguish different parallel strategies (#2699) 2023-02-15 22:28:28 +08:00
YH
ae86a29e23 Refactor method of grad store (#2687) 2023-02-15 22:27:58 +08:00
Zirui Zhu
c9e3ee389e [NFC] polish colossalai/context/process_group_initializer/initializer_2d.py code style (#2726) 2023-02-15 22:27:13 +08:00
Zangwei Zheng
1819373e5c [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/batch_norm_handler.py code style (#2728) 2023-02-15 22:26:13 +08:00
Wangbo Zhao(黑色枷锁)
8331420520 [NFC] polish colossalai/cli/cli.py code style (#2734) 2023-02-15 22:25:28 +08:00
ziyuhuang123
d344313533 [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/embedding_handler.py code style (#2725) 2023-02-15 16:31:40 +08:00
Xue Fuzhao
e81caeb4bc [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/cost_graph.py code style (#2720)
Co-authored-by: Fuzhao Xue <fuzhao@login2.ls6.tacc.utexas.edu>
2023-02-15 16:12:45 +08:00
yuxuan-lou
51c45c2460 [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/where_handler.py code style (#2723) 2023-02-15 16:12:24 +08:00
YuliangLiu0306
21d6a48f4d [autoparallel] add shard option (#2696)
* [autoparallel] add shard option

* polish
2023-02-15 13:48:28 +08:00
YuliangLiu0306
5b24987fa7 [autoparallel] fix parameters sharding bug (#2716) 2023-02-15 12:25:50 +08:00
Ziyue Jiang
4603538ddd [NFC] polish colossalai/context/process_group_initializer/initializer_sequence.py code style (#2712)
Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>
2023-02-15 10:53:38 +08:00
YuliangLiu0306
cb2c6a2415 [autoparallel] refactor runtime pass (#2644)
* [autoparallel] refactor runtime pass

* add unit test

* polish
2023-02-15 10:36:19 +08:00
Zihao
b3d10db5f1 [NFC] polish colossalai/cli/launcher/__init__.py code style (#2709) 2023-02-15 09:57:22 +08:00