[inference] chatglm2 infer demo (#4724)

* add chatglm2

* add

* gather needed kernels

* fix some bugs

* finish context forward

* finish context stage

* fix

* add

* pause

* add

* fix bugs

* finish chatglm

* fix bug

* change some logic

* fix bugs

* change some logics

* add

* add

* add

* fix

* fix tests

* fix
Jianghai
2023-09-22 11:12:50 +08:00
committed by GitHub
parent 946ab56c48
commit ce7ade3882
15 changed files with 1692 additions and 14 deletions

@@ -39,6 +39,21 @@ config = ChatGLMConfig(
    padded_vocab_size=65024,
    hidden_size=64,
    num_attention_heads=8,
    kv_channels=16,
    rmsnorm=True,
    original_rope=True,
    use_cache=True,
    torch_dtype=torch.float32,
)
infer_config = ChatGLMConfig(
    num_layers=2,
    padded_vocab_size=65024,
    hidden_size=128,
    num_attention_heads=8,
    multi_query_attention=True,
    multi_query_group_num=2,
    kv_channels=16,
    rmsnorm=True,
    original_rope=True,
    use_cache=True,