# Colossal-Infer

## Introduction

Colossal-Infer is a library for inference of LLMs and MLMs. It is built on top of Colossal AI.

## Structures

### Overview

## Roadmap
- [ ] design of structures
- [ ] Core components
  - [ ] engine
  - [ ] request handler
  - [ ] kv cache manager
  - [ ] modeling
  - [ ] custom layers
- [ ] online server
- [ ] supported models
  - [ ] llama2
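To make the "kv cache manager" item above concrete, here is a minimal sketch of block-based KV cache bookkeeping: a fixed pool of physical cache blocks is handed out to sequences and reclaimed when they finish. All class and method names here are hypothetical illustrations, not the actual Colossal-Infer API.

```python
class KVCacheManager:
    """Hypothetical sketch: hands out fixed-size KV cache blocks to sequences
    and returns them to a free pool when a sequence completes."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        # Contiguous pool of physical block ids available for allocation.
        self.free_blocks = list(range(num_blocks))
        # Maps a sequence id to the list of block ids it currently holds.
        self.block_tables: dict[int, list[int]] = {}

    def alloc(self, seq_id: int, num_tokens: int) -> list[int]:
        """Reserve enough blocks to hold num_tokens KV entries for one sequence."""
        needed = -(-num_tokens // self.block_size)  # ceiling division
        if needed > len(self.free_blocks):
            raise RuntimeError("KV cache is full")
        blocks = [self.free_blocks.pop() for _ in range(needed)]
        self.block_tables.setdefault(seq_id, []).extend(blocks)
        return blocks

    def free(self, seq_id: int) -> None:
        """Return all blocks held by a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))


# Example: 8 physical blocks of 16 token slots each.
mgr = KVCacheManager(num_blocks=8, block_size=16)
mgr.alloc(seq_id=0, num_tokens=40)   # 40 tokens need 3 blocks of 16
assert len(mgr.free_blocks) == 5
mgr.free(seq_id=0)                   # blocks go back to the pool
assert len(mgr.free_blocks) == 8
```

The design choice this illustrates: allocating the cache in fixed-size blocks from one contiguous pool avoids fragmenting GPU memory per request, which is why the roadmap mentions optimizing cache allocation.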