Commit Graph

5 Commits

Author SHA1 Message Date
Xu Kai
c6295c3381
[Refactor] remove useless inference code (#5022)
* remove useless code

* fix quant model

* fix test import bug

* mv original inference legacy

* fix chatglm2
2023-11-10 14:47:06 +08:00
Bin Jia
81b8f5e76a
[Inference Refactor] Merge chatglm2 with pp and tp (#5023)
merge chatglm with pp and tp
2023-11-09 14:46:19 +08:00
Xu Kai
450115bd0f
[refactor] refactor gptq and smoothquant llama (#5012)
* refactor gptq and smoothquant llama

* fix import error

* fix linear import torch-int

* fix smoothquant llama import error

* fix import accelerate error

* fix bug

* fix import smooth cuda

* fix smoothcuda
2023-11-09 10:12:11 +08:00
Bin Jia
48d0a58d10
add support for bloom (#5008)
2023-11-09 10:12:11 +08:00
Bin Jia
b6696beb04
[Pipeline Inference] Merge pp with tp (#4993)
* refactor pipeline into new CaiInferEngine

* update llama modeling forward

* merge tp with pp

* update docstring

* optimize test workflow and example

* fix typo

* add assert and todo
2023-11-01 12:46:21 +08:00