Yuanheng Zhao
|
6e487e7d3c
|
[kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (#5274)
* prevent re-creating intermediate tensors
* add singleton class holding intermediate values
* fix triton kernel api
* add benchmark in pytest
* fix kernel api and add benchmark
* revise flash decoding triton kernel in/out shapes
* fix calling of triton kernel in modeling
* fix pytest: extract to util functions
|
2024-01-19 15:47:16 +08:00 |
|
Yuanheng Zhao
|
1513f20f4d
|
[kernel] Add flash decoding triton kernel for blocked kv cache (#5249)
* add flash decoding unpad triton kernel
* rename flash decoding kernel
* add kernel testing (draft)
* revise pytest
* support kv group (GQA)
* (trivial) fix api and pytest
* (trivial) func renaming
* (trivial) func/file renaming
* refactor pytest for attention
* (trivial) format and consistent vars of context/decode attn
* (trivial) remove test redundancy
|
2024-01-11 13:46:14 +00:00 |
|
Yuanheng Zhao
|
2bb92243d4
|
[Inference/NFC] Clean outdated inference tests and deprecated kernels (#5159)
* [inference/nfc] remove outdated inference tests
* remove outdated kernel tests
* remove deprecated triton kernels
* remove imports from deprecated kernels
|
2024-01-11 13:39:29 +00:00 |
|
Cuiqing Li (李崔卿)
|
bce919708f
|
[Kernels]added flash-decoding of triton (#5063)
* added flash-decoding of triton based on lightllm kernel
* add req
* clean
* clean
* delete build.sh
---------
Co-authored-by: cuiqing.li <lixx336@gmail.com>
|
2023-11-20 13:58:29 +08:00 |
|