mirror of https://github.com/hpcaitech/ColossalAI.git synced 2025-05-06 07:28:12 +00:00
Commit Graph

3297 Commits

Author SHA1 Message Date
Yuanheng Zhao
498f42c45b
[NFC] fix requirements () 2024-05-22 12:08:49 +08:00
Yuanheng Zhao
bd38fe6b91
[NFC] Fix code factors on inference triton kernels () 2024-05-21 22:12:15 +08:00
Yuanheng Zhao
c2c8c9cf17
[ci] Temporary fix for build on pr ()
* temporary fix for CI

* timeout to 90
2024-05-21 18:20:57 +08:00
Yuanheng Zhao
c06208e72c
Merge pull request from yuanheng-zhao/inference/sync/main
[sync] Sync feature/colossal-infer with main
2024-05-21 11:26:37 +08:00
Yuanheng Zhao
8633c15da9 [sync] Sync feature/colossal-infer with main 2024-05-20 15:50:53 +00:00
Yuanheng Zhao
d8b1ea4ac9
[doc] Update Inference Readme ()
* [doc] update inference readme

* add contents

* trivial
2024-05-20 22:50:04 +08:00
Yuanheng Zhao
bdf9a001d6
[Fix/Inference] Add unsupported auto-policy error message ()
* [fix] auto policy error message

* trivial
2024-05-20 22:49:18 +08:00
Yuanheng Zhao
283c407a19
[Inference] Fix Inference Generation Config and Sampling ()
* refactor and add

* config default values

* fix gen config passing

* fix rpc generation config
2024-05-19 15:08:42 +08:00
flybird11111
9d83c6d715
[lazy] fix lazy cls init ()
* fix

* fix

* fix

* fix

* fix

* remove kernel install

* rebase

revert

fix

* fix

* fix
2024-05-17 18:18:59 +08:00
Yuanheng Zhao
8bcfe360fd
[example] Update Inference Example ()
* [example] update inference example
2024-05-17 11:28:53 +08:00
binmakeswell
2011b1356a
[misc] Update PyTorch version in docs ()
* [misc] Update PyTorch version in docs

* [misc] Update PyTorch version in docs
2024-05-16 13:54:32 +08:00
傅剑寒
a8d459f99a
[Inference] Delete duplicated package () 2024-05-16 10:49:03 +08:00
Jianghai
f47f2fbb24
[Inference] Fix API server, test and example ()
* fix api server

* fix generation config

* fix api server

* fix comments

* fix infer hanging bug

* resolve comments, change backend to free port
2024-05-15 15:47:31 +08:00
Tong Li
913c920ecc
[Colossal-LLaMA] Fix sft issue for llama2 ()
* fix minor issue

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-15 10:52:11 +08:00
Runyu Lu
74c47921fa
[Fix] Llama3 Load/Omit CheckpointIO Temporarily ()
* Fix Llama3 Load error
* Omit Checkpoint IO Temporarily
2024-05-14 20:17:43 +08:00
Yuanheng Zhao
5bbab1533a
[ci] Fix example tests ()
* [fix] revise timeout value on example CI

* trivial
2024-05-14 16:08:51 +08:00
傅剑寒
121d7ad629
[Inference] Delete duplicated copy_vector () 2024-05-14 14:35:33 +08:00
Edenzzzz
43995ee436
[Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor ()
* [feat] Add distributed lamb; minor fixes in DeviceMesh ()

* init: add dist lamb; add debiasing for lamb

* dist lamb tester mostly done

* all tests passed

* add comments

* all tests passed. Removed debugging statements

* moved setup_distributed inside plugin. Added dist layout caching

* organize better

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>

* [hotfix] Improve tester precision by removing ZeRO on vanilla lamb ()

Co-authored-by: Edenzzzz <wtan45@wisc.edu>

* [optim] add distributed came ()

* test CAME under LowLevelZeroOptimizer wrapper

* test CAME TP row and col pass

* test CAME zero pass

* came zero add master and worker param id convert

* came zero test pass

* came zero test pass

* test distributed came passed

* reform code, Modify some expressions and add comments

* minor fix of test came

* minor fix of dist_came and test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fix of dist_came and test

* rebase dist-optim

* rebase dist-optim

* fix remaining comments

* add test dist came using booster api

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [optim] Distributed Adafactor ()

* [feature] solve conflict; update optimizer readme;

* [feature] update optimize readme;

* [fix] fix testcase;

* [feature] Add transformer-bert to testcase;solve a bug related to indivisible shape (induction in use_zero and tp is row parallel);

* [feature] Add transformers_bert model zoo in testcase;

* [feature] add user documentation to docs/source/feature.

* [feature] add API Reference & Sample to optimizer Readme; add state check for bert exam;

* [feature] modify user documentation;

* [fix] fix readme format issue;

* [fix] add zero=0 in testcase; cached augment in dict;

* [fix] fix precision issue;

* [feature] add distributed rms;

* [feature] remove useless comment in testcase;

* [fix] Remove useless test; open zero test; remove fp16 test in bert exam;

* [feature] Extract distributed rms function;

* [feature] add booster + lowlevelzeroPlugin in test;

* [feature] add Start_with_booster_API case in md; add Supporting Information in md;

* [fix] Also remove state movement in base adafactor;

* [feature] extract factor function;

* [feature] add LowLevelZeroPlugin test;

* [fix] add tp=False and zero=True in logic;

* [fix] fix use zero logic;

* [feature] add row residue logic in column parallel factor;

* [feature] add check optim state func;

* [feature] Remove duplicate logic;

* [feature] update optim state check func and precision test bug;

* [fix] update/fix optim state; precision issue still exists;

* [fix] Add use_zero check in _rms; Add plugin support info in Readme; Add Dist Adafactor init Info;

* [feature] removed print & comments in utils;

* [feature] update Readme;

* [feature] add LowLevelZeroPlugin test with Bert model zoo;

* [fix] fix logic in _rms;

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [fix] remove comments in testcase;

* [feature] add zh-Han Readme;

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Feature] refactor dist came; fix precision error; add low level zero test with bert model zoo; ()

* [feature] daily update;

* [fix] fix dist came;

* [feature] refactor dist came; fix precision error; add low level zero test with bert model zoo;

* [fix] open rms; fix low level zero test; fix dist came test function name;

* [fix] remove redundant test;

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Feature] Add Galore (Adam, Adafactor) and distributed GaloreAdamW8bit ()

* init: add dist lamb; add debiasing for lamb

* dist lamb tester mostly done

* all tests passed

* add comments

* all tests passed. Removed debugging statements

* moved setup_distributed inside plugin. Added dist layout caching

* organize better

* update comments

* add initial distributed galore

* add initial distributed galore

* add galore set param utils; change setup_distributed interface

* projected grad precision passed

* basic precision tests passed

* tests passed; located svd precision issue in fwd-bwd; banned these tests

* Plugin DP + TP tests passed

* move get_shard_dim to d_tensor

* add comments

* remove useless files

* remove useless files

* fix zero typo

* improve interface

* remove moe changes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix import

* fix deepcopy

* update came & adafactor to main

* fix param map

* fix typo

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Hotfix] Remove one buggy test case from dist_adafactor for now ()


Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

---------

Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: chongqichuizi875 <107315010+chongqichuizi875@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: duanjunwen <54985467+duanjunwen@users.noreply.github.com>
Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
2024-05-14 13:52:45 +08:00
Steve Luo
7806842f2d
add paged-attention v2: support seq length split across thread blocks () 2024-05-14 12:46:54 +08:00
Runyu Lu
18d67d0e8e
[Feat] Inference RPC Server Support ()
* rpc support source
* kv cache logical/physical disaggregation
* sampler refactor
* colossalai launch built in
* Unit test
* Rpyc support

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-14 10:00:55 +08:00
hugo-syn
393c8f5b7f
[hotfix] fix inference typo () 2024-05-13 21:06:44 +08:00
Edenzzzz
785cd9a9c9
[misc] Update PyTorch version in docs ()
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-05-13 12:02:52 +08:00
yuehuayingxueluo
de4bf3dedf
[Inference] Adapt repetition_penalty and no_repeat_ngram_size ()
* Adapt repetition_penalty and no_repeat_ngram_size

* fix no_repeat_ngram_size_logit_process

* remove batch_updated

* fix annotation

* modified code based on review feedback

* rm get_batch_token_ids
2024-05-11 15:13:25 +08:00
傅剑寒
50104ab340
[Inference/Feat] Add convert_fp8 op for fp8 test in the future ()
* add convert_fp8 op for fp8 test in the future

* rerun ci
2024-05-10 18:39:54 +08:00
Wang Binluo
537f6a3855
[Shardformer] Fix the num_heads assert for llama model and qwen model ()
* fix the num_heads assert

* fix the transformers import

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix the import

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-10 15:33:39 +08:00
Wang Binluo
a3cc68ca93
[Shardformer] Support the Qwen2 model ()
* feat: support qwen2 model

* fix: modify model config and add Qwen2RMSNorm

* fix qwen2 model conflicts

* test: add qwen2 shard test

* to: add qwen2 auto policy

* support qwen model

* fix the conflicts

* add try catch

* add transformers version for qwen2

* add the ColoAttention for the qwen2 model

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add the unit test version check

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix the test input bug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix the version check

* fix the version check

---------

Co-authored-by: Wenhao Chen <cwher@outlook.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-05-09 20:04:25 +08:00
傅剑寒
bfad39357b
[Inference/Feat] Add quant kvcache interface ()
* add quant kvcache interface

* delete unused output

* complete args comments
2024-05-09 18:03:24 +08:00
Jianghai
492520dbdb
Merge pull request from hpcaitech/feat/online-serving
[Feature]Online Serving
2024-05-09 17:19:45 +08:00
CjhHa1
5d9a49483d [Inference] Add example test_ci script 2024-05-09 05:44:05 +00:00
flybird11111
d4c5ef441e
[gemini] Remove registered gradient hooks ()
* fix gemini

fix gemini

* fix

fix
2024-05-09 10:29:49 +08:00
CjhHa1
bc9063adf1 resolve rebase conflicts on Branch feat/online-serving 2024-05-08 15:20:53 +00:00
Jianghai
61a1b2e798 [Inference] Fix bugs and docs for feat/online-server ()
* fix test bugs

* add do sample test

* del useless lines

* fix comments

* fix tests

* delete version tag

* delete version tag

* add

* del test server

* fix test

* fix

* Revert "add"

This reverts commit b9305fb024.
2024-05-08 15:20:53 +00:00
CjhHa1
7bbb28e48b [Inference] resolve rebase conflicts
fix
2024-05-08 15:20:53 +00:00
Jianghai
c064032865 [Online Server] Chat API for streaming and non-streaming responses ()
* fix bugs

* fix bugs

* fix api server

* fix api server

* add chat api and test

* del request.n
2024-05-08 15:20:53 +00:00
Jianghai
de378cd2ab [Inference] Finish Online Serving Test, add streaming output API, continuous batching test and example ()
* finish online test and add examples

* fix test_contionus_batching

* fix some bugs

* fix bash

* fix

* fix inference

* finish revision

* fix typos

* revision
2024-05-08 15:20:52 +00:00
Jianghai
69cd7e069d [Inference] Add async and sync API server using FastAPI ()
* add api server

* fix

* add

* add completion service and fix bug

* add generation config

* revise shardformer

* fix bugs

* add docstrings and fix some bugs

* fix bugs and add choices for prompt template
2024-05-08 15:18:28 +00:00
yuehuayingxueluo
d482922035
[Inference] Support the logic related to ignoring EOS token ()
* Adapt temperature processing logic

* add ValueError for top_p and top_k

* add GQA Test

* fix except_msg

* support ignore EOS token

* change variable's name

* fix annotation
2024-05-08 19:59:10 +08:00
yuehuayingxueluo
9c2fe7935f
[Inference] Adapt temperature processing logic ()
* Adapt temperature processing logic

* add ValueError for top_p and top_k

* add GQA Test

* fix except_msg
2024-05-08 17:58:29 +08:00
Yuanheng Zhao
12e7c28d5e
[hotfix] fix OpenMOE example import path () 2024-05-08 15:48:47 +08:00
Wang Binluo
22297789ab
Merge pull request from wangbluo/parallel_output
[Shardformer] Add Parallel output for shardformer models
2024-05-07 22:59:42 -05:00
Yuanheng Zhao
55cc7f3df7
[Fix] Fix Inference Example, Tests, and Requirements ()
* clean requirements

* modify example inference struct

* add test ci scripts

* mark test_infer as submodule

* rm deprecated cls & deps

* import of HAS_FLASH_ATTN

* prune inference tests to be run

* prune triton kernel tests

* increment pytest timeout mins

* revert import path in openmoe
2024-05-08 11:30:15 +08:00
Yuanheng Zhao
f9afe0addd
[hotfix] Fix KV Heads Number Assignment in KVCacheManager ()
- Fix KV heads number assignment in KVCacheManager, as well as the method of accessing it
2024-05-07 23:13:14 +08:00
wangbluo
4e50cce26b fix the mistral model 2024-05-07 09:17:56 +00:00
wangbluo
a8408b4d31 remove comment code 2024-05-07 07:08:56 +00:00
pre-commit-ci[bot]
ca56b93d83 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2024-05-07 07:07:09 +00:00
wangbluo
108ddfb795 add parallel_output for the opt model 2024-05-07 07:05:53 +00:00
pre-commit-ci[bot]
88f057ce7c [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2024-05-07 07:03:47 +00:00
Edenzzzz
58954b2986
[misc] Add an existing issue checkbox in bug report ()
Co-authored-by: Wenxuan(Eden) Tan <wtan45@wisc.edu>
2024-05-07 12:18:50 +08:00
flybird11111
77ec773388
[zero] Remove registered gradient hooks ()
* remove registered hooks

fix

fix

fix zero

fix

fix

fix

fix

fix zero

fix zero

fix

fix

fix

* fix

fix

fix
2024-05-07 12:01:38 +08:00
Edenzzzz
c25f83c85f
fix missing pad token ()
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
2024-05-06 18:17:26 +08:00