Commit Graph

5 Commits

Author SHA1 Message Date
Steve Luo
f7aecc0c6b
feat rmsnorm cuda kernel and add unittest, benchmark script (#5417)
2024-03-08 16:21:12 +08:00
yuehuayingxueluo
2a718c8be8
Optimized the execution interval time between cuda kernels caused by view and memcopy (#5390)
* opt_view_and_memcopy

* fix bugs in ci

* fix ci bugs

* update benchmark scripts

* fix ci bugs
2024-02-21 13:23:57 +08:00
yuehuayingxueluo
21ad4a27f9
[Inference/opt]Optimize the mid tensor of RMS Norm (#5350)
* opt rms_norm

* fix bugs in rms_layernorm
2024-02-02 15:06:01 +08:00
yuehuayingxueluo
249644c23b
[Inference]Replace Attention layer and MLP layer by shardformer to optimize the weight transpose operation, add fused_qkv and fused linear_add (#5340)
* add fused qkv

* replace attn and mlp by shardformer

* fix bugs in mlp

* add docstrings

* fix test_inference_engine.py

* add optimize unbind

* add fused_addmm

* rm squeeze(1)

* refactor codes

* fix ci bugs

* rename ShardFormerLlamaMLP and ShardFormerLlamaAttention

* Removed the dependency on LlamaFlashAttention2

* rollback test_inference_engine.py
2024-02-01 15:49:39 +08:00
yuehuayingxueluo
e8f0642f28
[Inference]Add Nopadding Llama Modeling (#5327)
* add nopadding llama modeling

* add nopadding_llama.py

* rm unused codes

* fix bugs in test_xine_copy.py

* fix code style
2024-01-30 10:31:46 +08:00