Yuanheng Zhao
5f98a9d68a
[Infer] Optimize Blocked KVCache And Kernels Using It (#5325)
* revise shape of kvcache (context attn kernel)
* revise shape of kvcache (flash decoding kernel)
* revise shape of kvcache (kvcache copy) and attn func
* init of kvcache in kvcache manager
* revise llama modeling
* revise block size retrieval
* use torch for rms_norm benchmarking
* revise block size retrieval
2024-01-30 16:06:09 +08:00