Jianghai
|
c6cd629e7a
|
[Inference]ADD Bench Chatglm2 script (#4963)
* add bench chatglm
* fix bug and make utils
---------
Co-authored-by: CjhHa1 <cjh18671720497outlook.com>
|
2023-10-24 13:11:15 +08:00 |
|
Xu Kai
|
785802e809
|
[inference] add reference and fix some bugs (#4937)
* add reference and fix some bugs
* update gptq init
---------
Co-authored-by: Xu Kai <xukai16@foxamil.com>
|
2023-10-20 13:39:34 +08:00 |
|
Jianghai
|
013a4bedf0
|
[inference]fix import bug and delete down useless init (#4830)
* fix import bug and release useless init
* fix
* fix
* fix
|
2023-10-04 09:18:45 +08:00 |
|
Xu Kai
|
946ab56c48
|
[feature] add gptq for inference (#4754)
* [gptq] add gptq kernel (#4416)
* add gptq
* refactor code
* fix tests
* replace auto-gptq
* rename inference/quant
* refactor test
* add auto-gptq as an option
* reset requirements
* change assert and check auto-gptq
* add import warnings
* change test flash attn version
* remove example
* change requirements of flash_attn
* modify tests
* [skip ci] change requirements-test
* [gptq] faster gptq cuda kernel (#4494)
* [skip ci] add cuda kernels
* add license
* [skip ci] fix max_input_len
* format files & change test size
* [skip ci]
* [gptq] add gptq tensor parallel (#4538)
* add gptq tensor parallel
* add gptq tp
* delete print
* add test gptq check
* add test auto gptq check
* [gptq] combine gptq and kv cache manager (#4706)
* combine gptq and kv cache manager
* add init bits
* delete useless code
* add model path
* delete useless print and update test
* delete useless import
* move option gptq to shard config
* change replace linear to shardformer
* update bloom policy
* delete useless code
* fix import bug and delete useless code
* change colossalai/gptq to colossalai/quant/gptq
* update import linear for tests
* delete useless code and mv gptq_kernel to kernel directory
* fix triton kernel
* add triton import
|
2023-09-22 11:02:50 +08:00 |
|