[Inference/SpecDec] Support GLIDE Drafter Model (#5455)

* add glide-llama policy and modeling

* update glide modeling, compatible with transformers 4.36.2

* revise glide llama modeling/usage

* fix issues of glimpsing large kv

* revise the way params are re-loaded for glide drafter

* fix drafter and engine tests

* enable convert to glide strict=False

* revise glide llama modeling

* revise vicuna prompt template

* revise drafter and tests

* apply usage of glide model in engine
Yuanheng Zhao
2024-04-01 21:54:24 +08:00
committed by Yuanheng
parent 912e24b2aa
commit d85d91435a
10 changed files with 722 additions and 82 deletions

@@ -1,4 +1,4 @@
 from .drafter import Drafter
-from .struct import DrafterOutput
+from .struct import DrafterOutput, GlideInput
-__all__ = ["Drafter", "DrafterOutput"]
+__all__ = ["Drafter", "DrafterOutput", "GlideInput"]