ColossalAI/colossalai
Yuanheng Zhao 6e487e7d3c [kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (#5274)
* prevent re-creating intermediate tensors

* add singleton class holding intermediate values

* fix triton kernel api

* add benchmark in pytest

* fix kernel api and add benchmark

* revise flash decoding triton kernel in/out shapes

* fix calling of triton kernel in modeling

* fix pytest: extract to util functions
2024-01-19 15:47:16 +08:00
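
The first two items in the commit message (avoid re-creating intermediate tensors, hold them in a singleton) can be illustrated roughly as follows. This is a minimal sketch of the general pattern, not the code from this commit: the class name `IntermediateBufferCache`, its `get`/`clear` methods, and the `zeros` allocator are all hypothetical, and a plain Python list stands in for a GPU tensor so the example runs without any dependencies.

```python
from typing import Any, Callable, Dict, Tuple


class IntermediateBufferCache:
    """Hypothetical singleton that keeps intermediate buffers alive between calls."""

    _instance = None

    def __new__(cls):
        # Create the single shared instance on first use, then always return it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._buffers = {}
        return cls._instance

    def get(self, name: str, shape: Tuple[int, ...], alloc: Callable[[Tuple[int, ...]], Any]) -> Any:
        """Return the cached buffer for (name, shape), allocating it only once."""
        key = (name, tuple(shape))
        buf = self._buffers.get(key)
        if buf is None:
            buf = alloc(tuple(shape))
            self._buffers[key] = buf
        return buf

    def clear(self) -> None:
        """Drop all cached buffers, e.g. when the batch configuration changes."""
        self._buffers.clear()


def zeros(shape: Tuple[int, ...]) -> list:
    # Assumption: stand-in allocator; a real kernel wrapper would allocate a device tensor.
    n = 1
    for d in shape:
        n *= d
    return [0.0] * n


cache = IntermediateBufferCache()
first = cache.get("mid_output", (2, 4), zeros)
second = IntermediateBufferCache().get("mid_output", (2, 4), zeros)
print(first is second)  # the second call reuses the buffer instead of allocating again
```

Keying the cache on both a name and a shape means a change in batch size or sequence length allocates a fresh buffer rather than silently reusing one of the wrong size.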