Commit Graph

318 Commits

Author SHA1 Message Date
HELSON
c6a1a62636 [hotfix] fix zero's incompatibility with checkpoint in torch-1.12 (#1786)
* [hotfix] fix zero's incompatibility with checkpoint in torch-1.12

* [zero] add cpu shard init

* [zero] add tiny example test

* [colo_tensor] fix bugs for torch-1.11
2022-11-02 16:11:34 +08:00
kurisusnowdeng
0b8161fab8 updated tp layers 2022-11-02 12:19:38 +08:00
Sze-qq
23703c9dd6 [NFC] polish colossalai/nn/metric/_utils.py code style (#1727) 2022-10-19 12:20:51 +08:00
Ofey Chan
7e62af28a0 [NFC] polish accuracy_2d.py code style (#1719) 2022-10-19 12:20:51 +08:00
yuxuan-lou
2b49ca80a3 [NFC] polish colossalai/nn/lr_scheduler/linear.py code style (#1716) 2022-10-19 12:20:51 +08:00
shenggan
e1d780030d [NFC] polish colossalai/nn/metric/accuracy_2p5d.py code style (#1714) 2022-10-19 12:20:51 +08:00
HELSON
1468e4bcfc [zero] add constant placement policy (#1705)
* fixes a memory leak when a parameter is in fp16 during ZeroDDP init.
* bans chunk release on CUDA; a chunk may be released only when it is about to be offloaded.
* adds a constant placement policy, which lets users allocate a reserved caching memory space for parameters.
2022-10-14 17:53:16 +08:00
binmakeswell
5f41463a76 add optimizer README for tutorials (#1707) 2022-10-14 09:10:18 +00:00
Jiarui Fang
21962e1593 [embedding] rename FreqAwareEmbedding -> CachedEmbedding (#1699) 2022-10-13 22:22:27 +08:00
Jiarui Fang
363fc2861a [embeddings] more detailed timer (#1692) 2022-10-12 12:01:21 +08:00
jim
e5ab6be72e [hotfix] fix colotensor.type() raising NotImplementedError (#1682) 2022-10-10 10:13:31 +08:00
HELSON
b28991dd0a [feature] A new ZeRO implementation (#1644) 2022-10-09 09:18:51 +08:00
Jiarui Fang
c638bec028 [embedding] polish async copy (#1657) 2022-09-27 14:37:03 +08:00
Jiarui Fang
988570e4a6 [embedding] add more detail profiling (#1656) 2022-09-27 13:43:59 +08:00
Jiarui Fang
e1f97fd2b8 [embedding] print profiling results (#1654) 2022-09-27 12:50:33 +08:00
Jiarui Fang
04443605a5 [embedding] non-blocking cpu-gpu copy (#1647) 2022-09-26 14:57:57 +08:00
CsRic
0767f67a0f [embedding] isolate cache_op from forward (#1645)
Co-authored-by: ric <mkkt_bkkt@mail.ustc.edu.cn>
2022-09-26 11:18:59 +08:00
Jiarui Fang
c5d39215f6 Revert "[feature] new zero implementation (#1623)" (#1643)
This reverts commit 5be118f405.
2022-09-26 10:06:03 +08:00
HELSON
5be118f405 [feature] new zero implementation (#1623) 2022-09-24 19:58:18 +08:00
Jiarui Fang
e57df80325 [embeddings] cache option (#1635) 2022-09-23 16:40:18 +08:00
HELSON
a088022efc [moe] fix moe bugs (#1633) 2022-09-23 15:33:57 +08:00
HELSON
f7f2248771 [moe] fix MoE bugs (#1628)
* remove forced FP32 modules

* correct the positions of no_shard contexts
2022-09-22 13:56:30 +08:00
Jiarui Fang
38c68b5b9a [embedding] rollback for better FAW performance (#1625) 2022-09-22 11:16:25 +08:00
Jiarui Fang
504ff1d101 [embeddings] use cache_ratio instead of cuda_row_num (#1611) 2022-09-20 14:33:04 +08:00
Jiarui Fang
a19eb80998 [embedding] updates some default parameters 2022-09-15 15:45:17 +08:00
CsRic
f3403ff98e [embeddings] add already_split_along_rank flag for tablewise mode (#1584) 2022-09-13 10:50:34 +08:00
Sze-qq
2144cbae8c [NFC] polish colossalai/nn/lr_scheduler/multistep.py code style (#1572) 2022-09-08 22:11:04 +08:00
superhao1995
e4bf7ae667 [NFC] polish colossalai/nn/lr_scheduler/torch.py code style (#1571)
Co-authored-by: Research <research@soccf-snr3-017.comp.nus.edu.sg>
2022-09-08 22:11:04 +08:00
Jiatong Han
3263cdf57f [NFC] polish colossalai/nn/parallel/data_parallel.py code style (#1570)
Co-authored-by: JThh <jiatong.han@u.nus.edu>
2022-09-08 22:11:04 +08:00
DouJS
f586887a90 [NFC] polish colossalai/nn/layer/colossalai_layer/dropout.py code style (#1568) 2022-09-08 22:11:04 +08:00
BigOneLiXiaoMing
0c4c9aa6e0 [NFC] polish colossalai/nn/_ops/embedding.py code style (#1561) 2022-09-08 22:11:04 +08:00
Ofey Chan
7cc052f6c0 [NFC] polish colossalai/nn/layer/colossalai_layer/linear.py (#1556) 2022-09-08 22:11:04 +08:00
yuxuan-lou
413f9c19f4 [NFC] polish colossalai/nn/_ops/layernorm.py code style (#1555) 2022-09-08 22:11:04 +08:00
shenggan
8edb777cc2 [NFC] polish colossalai/nn/loss/loss_2p5d.py code style (#1553) 2022-09-08 22:11:04 +08:00
Maruyama_Aya
bd2d789832 [NFC] polish colossalai/nn/_ops/embedding_bag.py code style (#1552) 2022-09-08 22:11:04 +08:00
binmakeswell
73e9eb13b7 [NFC] polish colossalai/nn/lr_scheduler/cosine.py code style 2022-09-08 22:11:04 +08:00
CsRic
a389ac4ec9 [embedding] cache_embedding small improvement (#1564) 2022-09-08 16:41:19 +08:00
ver217
10dd8226b1 add gather_output for VocabParallelClassifier1D (#1569) 2022-09-08 16:40:56 +08:00
ver217
ae71036cd2 [utils] refactor parallel layers checkpoint and bcast model on loading checkpoint (#1548)
* refactor parallel layer

* broadcast rank0 model after load ckpt
2022-09-06 20:18:35 +08:00
Jiarui Fang
64169f3e8f [embedding] polish parallel embedding tablewise (#1545) 2022-09-06 10:41:20 +08:00
CsRic
964123ae0f [embedding] freq_aware_embedding: add small functions for caller application (#1537) 2022-09-05 15:12:53 +08:00
Jiarui Fang
521078ffc9 [embedding] fix a bug in table wise sharding (#1538) 2022-09-02 15:48:35 +08:00
Jiarui Fang
87134524fd [embedding] tablewise sharding polish (#1535) 2022-09-02 11:09:37 +08:00
CsRic
5156d5b4f8 [embedding] add tablewise sharding for FAW (#1526) 2022-09-01 17:55:41 +08:00
Jiarui Fang
4537d39df9 [doc] docstring for FreqAwareEmbeddingBag (#1525) 2022-08-31 13:52:30 +08:00
Jiarui Fang
9a9ef65313 [FAW] cpu caching operations (#1520) 2022-08-30 14:50:02 +08:00
Jiarui Fang
af5438caa2 [FAW] refactor reorder() for CachedParamMgr (#1514) 2022-08-29 14:22:07 +08:00
Jiarui Fang
9feee6d06b [FAW] LFU initialize with dataset freq (#1513) 2022-08-29 12:52:53 +08:00
CsRic
1b8fee8e9c [FAW] shrink freq_cnter size (#1509) 2022-08-29 11:44:55 +08:00
Jiarui Fang
ba61109b6c [FAW] remove code related to chunk (#1501) 2022-08-26 14:23:30 +08:00