[Kernels] Added Triton flash-decoding (#5063)

* added flash-decoding of Triton based on the lightllm kernel
* add req
* clean
* clean
* delete build.sh

---------

Co-authored-by: cuiqing.li <lixx336@gmail.com>
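The commit message names flash-decoding without explaining it. As it is commonly described, flash-decoding speeds up single-token decoding by splitting the KV cache along the sequence axis into chunks. Each chunk computes a partial attention result plus its running max and softmax denominator, and a final reduction rescales the partials to a common max and merges them. The pure-Python sketch below shows only that split-and-merge arithmetic. It is not the lightllm Triton kernel this commit builds on, and every function name in it (`attend_chunk`, `flash_decode`, `naive_attention`) is hypothetical and used for illustration only.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend_chunk(q: Vector, keys: List[Vector], values: List[Vector],
                 scale: float) -> Tuple[Vector, float, float]:
    """Hypothetical helper: attention of one query over one KV chunk.

    Returns (chunk-normalized output, chunk max score, sum of exp(score - max)).
    """
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    l = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / l for d in range(dim)]
    return out, m, l


def flash_decode(q: Vector, keys: List[Vector], values: List[Vector],
                 chunk_size: int) -> Vector:
    """Hypothetical sketch of split-KV decoding for a single query token."""
    scale = 1.0 / math.sqrt(len(q))
    # Stage 1: independent partial results per KV chunk (parallel on a GPU).
    partials = [
        attend_chunk(q, keys[i:i + chunk_size], values[i:i + chunk_size], scale)
        for i in range(0, len(keys), chunk_size)
    ]
    # Stage 2: rescale every partial to the global max, then merge.
    m_global = max(m for _, m, _ in partials)
    total = sum(l * math.exp(m - m_global) for _, m, l in partials)
    dim = len(values[0])
    return [
        sum(out[d] * l * math.exp(m - m_global) for out, m, l in partials) / total
        for d in range(dim)
    ]


def naive_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Reference: ordinary softmax attention over the whole KV cache at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


if __name__ == "__main__":
    q = [0.5, -1.0, 0.25, 2.0]
    keys = [[(i * 7 + d) % 5 - 2.0 for d in range(4)] for i in range(10)]
    values = [[float(i + d) for d in range(4)] for i in range(10)]
    # Chunks of 3 over 10 keys give sizes 3, 3, 3, 1; the merged result
    # should match the single-pass reference up to floating-point error.
    fd = flash_decode(q, keys, values, chunk_size=3)
    ref = naive_attention(q, keys, values)
    print(max(abs(a - b) for a, b in zip(fd, ref)))
```

The merge is exact because each partial, multiplied by `l * exp(m - m_global)`, equals that chunk's un-normalized sum of `exp(score - m_global) * v`, so adding them and dividing by `total` reproduces the full softmax. A real kernel would run stage 1 across thread blocks and stage 2 as a separate reduction.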
2025-09-03 18:19:58 +00:00 · 2023-11-20 13:58:29 +08:00
parent fd6482ad8c
commit bce919708f
6 changed files with 82 additions and 43 deletions
--- a/requirements/requirements-infer.txt
+++ b/requirements/requirements-infer.txt
@@ -2,6 +2,6 @@ transformers==4.34.0
 packaging
 ninja
 auto-gptq==0.5.0
-git+https://github.com/ModelTC/lightllm.git@28c1267cfca536b7b4f28e921e03de735b003039
+git+https://github.com/ModelTC/lightllm.git@ece7b43f8a6dfa74027adc77c2c176cff28c76c8
 git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
 git+https://github.com/Dao-AILab/flash-attention.git@017716451d446e464dde9aca3a3c1ed2209caaa9