Xu Kai
c6295c3381
[Refactor] remove useless inference code (#5022)
* remove useless code
* fix quant model
* fix test import bug
* mv original inference to legacy
* fix chatglm2
2023-11-10 14:47:06 +08:00
Xu Kai
450115bd0f
[refactor] refactor gptq and smoothquant llama (#5012)
* refactor gptq and smoothquant llama
* fix import error
* fix linear import torch-int
* fix smoothquant llama import error
* fix import accelerate error
* fix bug
* fix import smooth cuda
* fix smoothcuda
2023-11-09 10:12:11 +08:00
Bin Jia
b6696beb04
[Pipeline Inference] Merge pp with tp (#4993)
* refactor pipeline into new CaiInferEngine
* update llama modeling forward
* merge tp with pp
* update docstring
* optimize test workflow and example
* fix typo
* add assert and todo
2023-11-01 12:46:21 +08:00