Commit Graph

6 Commits

Author SHA1 Message Date
Steve Luo
725fbd2ed0 [Inference] Remove unnecessary float4_ and rename float8_ to float8 (#5679) 2024-05-06 10:55:34 +08:00
傅剑寒
9df016fc45 [Inference] Fix quant bits order (#5681) 2024-04-30 19:38:00 +08:00
傅剑寒
ef8e4ffe31 [Inference/Feat] Add kvcache quant support for fused_rotary_embedding_cache_copy (#5680) 2024-04-30 18:33:53 +08:00
傅剑寒
808ee6e4ad [Inference/Feat] Feat quant kvcache step2 (#5674) 2024-04-30 11:26:36 +08:00
傅剑寒
8ccb6714e7 [Inference/Feat] Add kvcache quantization support for FlashDecoding (#5656) 2024-04-26 19:40:37 +08:00
傅剑寒
279300dc5f [Inference/Refactor] Refactor compilation mechanism and unified multi hw (#5613)
* refactor compilation mechanism and unified multi hw

* fix file path bug

* add __init__.py to make pybind a module to avoid relative path error caused by softlink

* delete duplicated macros

* fix macros bug in gcc
2024-04-24 14:17:54 +08:00