[Inference] Add CacheBlock and KV-Cache Manager (#5156)

* [Inference] Add KVCache Manager

* function refactored

* add test for KVCache Manager

* add attr beam width

* Revise alloc func in CacheManager

* Fix docs and pytests

* add tp slicing for head number

* optimize shapes of tensors used as physical cache

* Apply using InferenceConfig on KVCacheManager

* rm duplicate config file

* Optimize cache allocation: use contiguous cache

* Fix config in pytest (and config)
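The bullets on tensor-parallel head slicing and contiguous cache allocation imply some simple shape arithmetic. The following is a minimal sketch of that arithmetic, not the commit's code. The config field names, the per-layer `(num_blocks, heads_per_rank, block_size, head_dim)` layout, and the 2-byte dtype are all hypothetical. None of them are taken from `InferenceConfig` or `KVCacheManager`.

```python
from dataclasses import dataclass


@dataclass
class CacheShapeConfig:
    # Hypothetical fields for illustration; not InferenceConfig's actual attributes.
    num_layers: int
    num_attention_heads: int
    head_dim: int
    block_size: int       # token slots per cache block
    num_blocks: int       # blocks allocated per layer
    tp_size: int = 1      # tensor-parallel world size
    dtype_bytes: int = 2  # e.g. fp16/bf16


def heads_per_rank(cfg: CacheShapeConfig) -> int:
    # Tensor parallelism splits attention heads evenly across ranks.
    if cfg.num_attention_heads % cfg.tp_size != 0:
        raise ValueError("num_attention_heads must be divisible by tp_size")
    return cfg.num_attention_heads // cfg.tp_size


def kv_cache_shape(cfg: CacheShapeConfig) -> tuple:
    # Assumed layout: one contiguous tensor per layer (for K, and another for V),
    # whose first dimension is indexed by physical block id.
    return (cfg.num_blocks, heads_per_rank(cfg), cfg.block_size, cfg.head_dim)


def kv_cache_bytes_per_rank(cfg: CacheShapeConfig) -> int:
    n, h, b, d = kv_cache_shape(cfg)
    per_tensor = n * h * b * d * cfg.dtype_bytes
    return 2 * cfg.num_layers * per_tensor  # K and V for every layer
```

Consider 32 layers, 32 heads of size 128, 16-token blocks, 512 blocks, and `tp_size=2`. Each rank then holds 16 heads, each per-layer tensor has shape `(512, 16, 16, 128)`, and the whole cache takes 2**31 bytes (2 GiB) per rank. Allocating each layer's cache as one tensor up front, rather than one tensor per block, avoids many small allocations. A block id then becomes just an index into the first dimension.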
Author: Yuanheng Zhao, 2023-12-11 10:56:18 +08:00
Committed by: FrankLeeeee
Parent: fab9b931d9
Commit: 3de2e62299
5 changed files with 516 additions and 7 deletions


@@ -0,0 +1,4 @@
from .block_cache import CacheBlock
from .kvcache_manager import KVCacheManager

__all__ = ["CacheBlock", "KVCacheManager"]
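The package exports a `CacheBlock` and a `KVCacheManager`, but this hunk does not show their interfaces. The following is therefore a hypothetical sketch of the general idea of a block-based KV-cache manager: logical blocks with a fixed number of token slots, a free pool, and a per-sequence block table. The class names `ToyCacheBlock` and `ToyKVCacheManager` and the methods `allocate`, `free`, and `num_free_blocks` are invented for illustration and are not the API added by this commit.

```python
from typing import Dict, List, Optional


class ToyCacheBlock:
    """Hypothetical logical cache block holding up to `block_size` token slots."""

    def __init__(self, block_id: int, block_size: int) -> None:
        self.block_id = block_id
        self.block_size = block_size
        self.allocated_size = 0  # token slots currently in use
        self.ref_count = 0       # owners of this block; always 1 here (no sharing modeled)

    @property
    def available_space(self) -> int:
        return self.block_size - self.allocated_size


class ToyKVCacheManager:
    """Hypothetical manager mapping each sequence id to a table of block ids."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self._blocks = [ToyCacheBlock(i, block_size) for i in range(num_blocks)]
        # Free list used as a stack; reversed so the lowest ids are handed out first.
        self._free: List[int] = list(range(num_blocks - 1, -1, -1))
        self._tables: Dict[int, List[int]] = {}

    def num_free_blocks(self) -> int:
        return len(self._free)

    def allocate(self, seq_id: int, num_tokens: int) -> List[int]:
        """Reserve slots for `num_tokens` more tokens and return the block table."""
        table = self._tables.get(seq_id, [])
        tail: Optional[ToyCacheBlock] = self._blocks[table[-1]] if table else None
        # Fill the partially used last block before taking new ones.
        fill_tail = min(num_tokens, tail.available_space) if tail is not None else 0
        rest = num_tokens - fill_tail
        needed = -(-rest // self.block_size)  # ceiling division
        # Check capacity before mutating anything, so a failure leaves state intact.
        if needed > len(self._free):
            raise RuntimeError(f"need {needed} blocks, only {len(self._free)} free")
        if tail is not None:
            tail.allocated_size += fill_tail
        for _ in range(needed):
            block = self._blocks[self._free.pop()]
            block.allocated_size = min(rest, self.block_size)
            block.ref_count = 1
            rest -= block.allocated_size
            table.append(block.block_id)
        self._tables[seq_id] = table
        return list(table)

    def free(self, seq_id: int) -> None:
        """Release a sequence's blocks; a block rejoins the pool at ref_count 0."""
        for block_id in self._tables.pop(seq_id, []):
            block = self._blocks[block_id]
            block.ref_count -= 1
            if block.ref_count == 0:
                block.allocated_size = 0
                self._free.append(block_id)
```

With 4 blocks of 16 slots, `allocate(0, 20)` returns `[0, 1]`. A following `allocate(0, 12)` fills the rest of block 1 without taking a new block. The sketch keeps the logical bookkeeping separate from the physical cache tensors, so a block id can index directly into a contiguous per-layer tensor.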