mirror of https://github.com/hpcaitech/ColossalAI.git synced 2025-05-02 13:45:36 +00:00
Commit Graph

216 Commits

Author SHA1 Message Date
pre-commit-ci[bot]
7c2f79fa98
[pre-commit.ci] pre-commit autoupdate ()
* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/PyCQA/autoflake: v2.2.1 → v2.3.1](https://github.com/PyCQA/autoflake/compare/v2.2.1...v2.3.1)
- [github.com/pycqa/isort: 5.12.0 → 5.13.2](https://github.com/pycqa/isort/compare/5.12.0...5.13.2)
- [github.com/psf/black-pre-commit-mirror: 23.9.1 → 24.4.2](https://github.com/psf/black-pre-commit-mirror/compare/23.9.1...24.4.2)
- [github.com/pre-commit/mirrors-clang-format: v13.0.1 → v18.1.7](https://github.com/pre-commit/mirrors-clang-format/compare/v13.0.1...v18.1.7)
- [github.com/pre-commit/pre-commit-hooks: v4.3.0 → v4.6.0](https://github.com/pre-commit/pre-commit-hooks/compare/v4.3.0...v4.6.0)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-07-01 17:16:41 +08:00
Hongxin Liu
7f8b16635b
[misc] refactor launch API and tensor constructor ()
* [misc] remove config arg from initialize

* [misc] remove old tensor constructor

* [plugin] add npu support for ddp

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [devops] fix doc test ci

* [test] fix test launch

* [doc] update launch doc

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-04-29 10:40:11 +08:00
Edenzzzz
d83c633ca6
[hotfix] Fix examples' missing pad token & auto parallel codegen bug ()
* fix no pad token bug

* fixed some auto parallel codegen bugs, but it might not run on torch 2.1

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-04-18 18:15:50 +08:00
Stephan Kölker
5d380a1a21
[hotfix] Fix wrong import in meta_registry () 2024-02-20 19:24:43 +08:00
Hongxin Liu
d202cc28c0
[npu] change device to accelerator api ()
* update accelerator

* fix timer

* fix amp

* update

* fix

* fix bug

* add error raise

* fix autocast

* fix set device

* remove doc accelerator

* update doc

* update doc

* update doc

* use nullcontext

* update cpu

* update null context

* change time limit for example

* update

* update

* update

* update

* [npu] polish accelerator code

---------

Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com>
Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>
2024-01-09 10:20:05 +08:00
Hongxin Liu
e5ce4c8ea6
[npu] add npu support for gemini and zero ()
* [npu] setup device utils ()

* [npu] add npu device support

* [npu] support low level zero

* [test] update npu zero plugin test

* [hotfix] fix import

* [test] recover tests

* [npu] gemini support npu ()

* [npu] refactor device utils

* [gemini] support npu

* [example] llama2+gemini support npu

* [kernel] add arm cpu adam kernel ()

* [kernel] add arm cpu adam

* [optim] update adam optimizer

* [kernel] arm cpu adam remove bf16 support
2023-11-20 16:12:41 +08:00
Hongxin Liu
079bf3cb26
[misc] update pre-commit and run all files ()
* [misc] update pre-commit

* [misc] run pre-commit

* [misc] remove useless configuration files

* [misc] ignore cuda for clang-format
2023-09-19 14:20:26 +08:00
Hongxin Liu
b5f9e37c70
[legacy] clean up legacy code ()
* [legacy] remove outdated codes of pipeline ()

* [legacy] remove cli of benchmark and update optim ()

* [legacy] remove cli of benchmark and update optim

* [doc] fix cli doc test

* [legacy] fix engine clip grad norm

* [legacy] remove outdated colo tensor ()

* [legacy] remove outdated colo tensor

* [test] fix test import

* [legacy] move outdated zero to legacy ()

* [legacy] clean up utils ()

* [legacy] clean up utils

* [example] update examples

* [legacy] clean up amp

* [legacy] fix amp module

* [legacy] clean up gpc ()

* [legacy] clean up context

* [legacy] clean core, constants and global vars

* [legacy] refactor initialize

* [example] fix examples ci

* [example] fix examples ci

* [legacy] fix tests

* [example] fix gpt example

* [example] fix examples ci

* [devops] fix ci installation

* [example] fix examples ci
2023-09-18 16:31:06 +08:00
Hongxin Liu
554aa9592e
[legacy] move communication and nn to legacy and refactor logger ()
* [legacy] move communication to legacy ()

* [legacy] refactor logger and clean up legacy codes ()

* [legacy] make logger independent of gpc

* [legacy] make optim independent of registry

* [legacy] move test engine to legacy

* [legacy] move nn to legacy ()

* [legacy] move nn to legacy

* [checkpointio] fix save hf config

* [test] remove useless rpc pp test

* [legacy] fix nn init

* [example] skip tutorial hybrid parallel example

* [devops] test doc check

* [devops] test doc check
2023-09-11 16:24:28 +08:00
Hongxin Liu
ac178ca5c1 [legacy] move builder and registry to legacy () 2023-09-05 21:53:10 +08:00
Lufang Chen
12c95a9fed
fix runtime prepare pass ()
Co-authored-by: lufang.chen <lufang.chen@nio.com>
2023-08-30 17:29:38 +08:00
Wenhao Chen
fee553288b [NFC] polish runtime_preparation_pass style () 2023-07-26 14:12:57 +08:00
YeAnbang
3883db452c [NFC] polish unary_elementwise_generator.py code style ()
Co-authored-by: aye42 <aye42@gatech.edu>
2023-07-26 14:12:57 +08:00
Yanjia0
c614a99d28 [NFC] polish colossalai/auto_parallel/offload/amp_optimizer.py code style () 2023-07-26 14:12:57 +08:00
Frank Lee
c4b1b65931 [test] fixed tests that failed due to dtensor change ()
* [test] fixed tests that failed due to dtensor change

* polish code
2023-07-04 16:05:01 +08:00
digger yu
e2d81eba0d
[nfc] fix typo colossalai/ applications/ ()
* fix typo colossalai/autochunk auto_parallel amp

* fix typo colossalai/auto_parallel nn utils etc.

* fix typo colossalai/auto_parallel autochunk fx/passes etc.

* fix typo docs/

* change placememt_policy to placement_policy in docs/ and examples/

* fix typo colossalai/ applications/
2023-05-25 16:19:41 +08:00
digger yu
7f8203af69
fix typo colossalai/auto_parallel autochunk fx/passes etc. () 2023-05-24 09:01:50 +08:00
digger yu
9265f2d4d7
[NFC] fix typo colossalai/auto_parallel nn utils etc. ()
* fix typo colossalai/autochunk auto_parallel amp

* fix typo colossalai/auto_parallel nn utils etc.
2023-05-23 15:28:20 +08:00
digger yu
32f81f14d4
[NFC] fix typo colossalai/amp auto_parallel autochunk () 2023-05-19 13:50:00 +08:00
digger yu
1baeb39c72
[NFC] fix typo with colossalai/auto_parallel/tensor_shard ()
* fix typo applications/ and colossalai/ date 5.11

* fix typo colossalai/
2023-05-17 11:13:23 +08:00
digger-yu
ad6460cf2c
[NFC] fix typo applications/ and colossalai/ () 2023-05-15 11:46:25 +08:00
digger-yu
b9a8dff7e5
[doc] Fix typos under colossalai and doc ()
* Fixed several spelling errors under colossalai

* Fix the spelling errors in the colossalai and docs directories

* Cautiously changed the spelling errors under the example folder

* Update runtime_preparation_pass.py

revert autograft to autograd

* Update search_chunk.py

utile to until

* Update check_installation.py

change misteach to mismatch in line 91

* Update 1D_tensor_parallel.md

revert to perceptron

* Update 2D_tensor_parallel.md

revert to perceptron in line 73

* Update 2p5D_tensor_parallel.md

revert to perceptron in line 71

* Update 3D_tensor_parallel.md

revert to perceptron in line 80

* Update README.md

revert to resnet in line 42

* Update reorder_graph.py

revert to indice in line 7

* Update p2p.py

revert to megatron in line 94

* Update initialize.py

revert to torchrun in line 198

* Update routers.py

change to detailed in line 63

* Update routers.py

change to detailed in line 146

* Update README.md

revert random number in line 402
2023-04-26 11:38:43 +08:00
YuliangLiu0306
ffcdbf0f65
[autoparallel] integrate auto parallel feature with new tracer ()
* [autoparallel] integrate new analyzer in module level

* unify the profiling method

* polish

* fix no codegen bug

* fix pass bug

* fix liveness test

* polish
2023-04-04 17:40:45 +08:00
ver217
26b7aac0be
[zero] reorganize zero/gemini folder structure ()
* [zero] refactor low-level zero folder structure

* [zero] fix legacy zero import path

* [zero] fix legacy zero import path

* [zero] remove useless import

* [zero] refactor gemini folder structure

* [zero] refactor gemini folder structure

* [zero] refactor legacy zero import path

* [zero] refactor gemini folder structure

* [zero] refactor gemini folder structure

* [zero] refactor gemini folder structure

* [zero] refactor legacy zero import path

* [zero] fix test import path

* [zero] fix test

* [zero] fix circular import

* [zero] update import
2023-04-04 13:48:16 +08:00
Frank Lee
638a07a7f9
[test] fixed gemini plugin test ()
* [test] fixed gemini plugin test

* polish code

* polish code
2023-04-03 17:12:22 +08:00
YuliangLiu0306
fee2af8610
[autoparallel] adapt autoparallel with new analyzer ()
* [autoparallel] adapt autoparallel with new analyzer

* fix all node handler tests

* polish

* polish
2023-03-30 17:47:24 +08:00
Zihao
18dbe76cae
[auto-parallel] add auto-offload feature ()
* add auto-offload feature

* polish code

* fix sync offload runtime pass bug

* add offload example

* fix offload testing bug

* fix example testing bug
2023-03-21 14:17:41 +08:00
YuliangLiu0306
47fb214b3b
[hotfix] add shard dim to avoid backward communication error () 2023-03-01 11:41:53 +08:00
YuliangLiu0306
197d0bf4ed
[autoparallel] apply repeat block to reduce solving time () 2023-02-28 11:03:30 +08:00
YuliangLiu0306
819e25d8b1
[hotfix] fix autoparallel compatibility test issues () 2023-02-23 17:28:36 +08:00
YuliangLiu0306
0f392d7403
[autoparallel] find repeat blocks ()
* [autoparallel] find repeat blocks

* polish

* polish

* polish
2023-02-23 17:28:19 +08:00
Boyuan Yao
eae77c831d
[autoparallel] Patch meta information for nodes that will not be handled by SPMD solver ()
* [autoparallel] non spmd meta information generator

* [autoparallel] patch meta information for non spmd nodes
2023-02-22 10:28:56 +08:00
Boyuan Yao
c7764d3f22
[autoparallel] Patch meta information of torch.where ()
* [autoparallel] patch meta information of torch.where

* [autoparallel] pre-commit modified
2023-02-22 10:28:21 +08:00
Boyuan Yao
fcc4097efa
[autoparallel] Patch meta information of torch.tanh() and torch.nn.Dropout ()
* [autoparallel] tanh meta information

* [autoparallel] remove redundant code

* [autoparallel] patch meta information of torch.nn.Dropout
2023-02-22 10:27:59 +08:00
Boyuan Yao
7ea6bc7f69
[autoparallel] Patch tensor related operations meta information ()
* [autoparallel] tensor related meta information prototype

* [autoparallel] tensor related meta information

* [autoparallel] tensor related meta information

* [autoparallel] tensor related meta information

* [autoparallel] tensor related meta information
2023-02-20 17:38:55 +08:00
YuliangLiu0306
2059fdd6b0
[hotfix] add copyright for solver and device mesh ()
* [hotfix] add copyright for solver and device mesh

* add readme

* add alpa license

* polish
2023-02-18 21:14:38 +08:00
Boyuan Yao
8593ae1a3f
[autoparallel] rotor solver refactor ()
* [autoparallel] rotor solver refactor

* [autoparallel] rotor solver refactor
2023-02-18 11:30:15 +08:00
Boyuan Yao
a2b43e393d
[autoparallel] Patch meta information of torch.nn.Embedding ()
* [autoparallel] embedding metainfo

* [autoparallel] fix function name in test_activation_metainfo

* [autoparallel] undo changes in activation metainfo and related tests
2023-02-17 10:39:48 +08:00
YuliangLiu0306
1dc003c169
[autoparallel] distinguish different parallel strategies () 2023-02-15 22:28:28 +08:00
YuliangLiu0306
21d6a48f4d
[autoparallel] add shard option ()
* [autoparallel] add shard option

* polish
2023-02-15 13:48:28 +08:00
YuliangLiu0306
5b24987fa7
[autoparallel] fix parameters sharding bug () 2023-02-15 12:25:50 +08:00
YuliangLiu0306
cb2c6a2415
[autoparallel] refactor runtime pass ()
* [autoparallel] refactor runtime pass

* add unit test

* polish
2023-02-15 10:36:19 +08:00
YuliangLiu0306
0b2a738393
[autoparallel] remove deprecated codes () 2023-02-15 09:54:32 +08:00
YuliangLiu0306
7fa6be49d2
[autoparallel] test compatibility for gemini and auto parallel () 2023-02-15 09:43:29 +08:00
Boyuan Yao
40c916b192
[autoparallel] Patch meta information of torch.nn.functional.softmax and torch.nn.Softmax ()
* [autoparallel] softmax metainfo

* [autoparallel] softmax metainfo
2023-02-13 16:09:22 +08:00
Boyuan Yao
0385b26ebf
[autoparallel] Patch meta information of torch.nn.LayerNorm ()
* [autoparallel] layernorm metainfo patch

* [autoparallel] polish test
2023-02-10 14:29:24 +08:00
YuliangLiu0306
37df666f38
[autoparallel] refactor handlers which reshape input tensors ()
* [autoparallel] refactor handlers which reshape input tensors

* polish
2023-02-08 15:02:49 +08:00
YuliangLiu0306
28398f1c70
add overlap option () 2023-02-08 15:02:31 +08:00
YuliangLiu0306
cb3d1bef62
[autoparallel] adapt autoparallel tests with latest api () 2023-02-08 15:02:12 +08:00
Boyuan Yao
90a9fdd91d
[autoparallel] Patch meta information of torch.matmul ()
* [autoparallel] matmul metainfo

* [auto_parallel] remove unused print

* [tests] skip test_matmul_handler when torch version is lower than 1.12.0
2023-02-08 11:05:31 +08:00