* remove useless code
* fix quant model
* fix test import bug
* move original inference to legacy
* fix chatglm2