pre-commit-ci[bot] | df612434c9 | [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci | 2024-06-14 16:27:46 +08:00
digger yu | 385e85afd4 | [hotfix] fix typo s/keywrods/keywords etc. (#5429) | 2024-03-12 11:25:16 +08:00
digger yu | 16c96d4d8c | [hotfix] fix typo change _descrption to _description (#5331) | 2024-03-05 21:47:48 +08:00
Hongxin Liu | 070df689e6 | [devops] fix extention building (#5427) | 2024-03-05 15:35:54 +08:00
|
Xu Kai
|
fd6482ad8c
|
[inference] Refactor inference architecture (#5057)
* [inference] support only TP (#4998)
* support only tp
* enable tp
* add support for bloom (#5008)
* [refactor] refactor gptq and smoothquant llama (#5012)
* refactor gptq and smoothquant llama
* fix import error
* fix linear import torch-int
* fix smoothquant llama import error
* fix import accelerate error
* fix bug
* fix import smooth cuda
* fix smoothcuda
* [Inference Refactor] Merge chatglm2 with pp and tp (#5023)
merge chatglm with pp and tp
* [Refactor] remove useless inference code (#5022)
* remove useless code
* fix quant model
* fix test import bug
* mv original inference legacy
* fix chatglm2
* [Refactor] refactor policy search and quant type controlling in inference (#5035)
* [Refactor] refactor policy search and quant type controlling in inference
* [inference] update readme (#5051)
* update readme
* update readme
* fix architecture
* fix table
* fix table
* [inference] update example (#5053)
* update example
* fix run.sh
* fix rebase bug
* fix some errors
* update readme
* add some features
* update interface
* update readme
* update benchmark
* add requirements-infer
---------
Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
|
2023-11-19 21:05:05 +08:00 |
|