ColossalAI

mirror of https://github.com/hpcaitech/ColossalAI.git synced 2026-04-11 14:43:10 +00:00

Files

Yuanheng Zhao 5d4c1fe8f5 [Fix/Inference] Fix GQA Triton and Support Llama3 (#5624)

* [fix] GQA calling of flash decoding triton

* fix kv cache alloc shape

* fix rotary triton - GQA

* fix sequence max length assigning

* Sequence max length logic

* fix scheduling and spec-dec

* skip without import error

* fix pytest - skip without ImportError

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

2024-04-23 13:09:55 +08:00

__init__.py

fix bugs in request_handler

2024-01-11 13:39:56 +00:00

glide_llama.py

[Inference/SpecDec] Support GLIDE Drafter Model (#5455)

2024-04-10 11:07:52 +08:00

nopadding_baichuan.py

[inference/model] Adapted to the baichuan2-7B model (#5591)

2024-04-15 16:53:02 +08:00

nopadding_llama.py

[Fix/Inference] Fix GQA Triton and Support Llama3 (#5624)

2024-04-23 13:09:55 +08:00