mirror of https://github.com/hpcaitech/ColossalAI.git synced 2026-04-11 14:43:10 +00:00

Yuanheng Zhao 3de2e62299 [Inference] Add CacheBlock and KV-Cache Manager (#5156)

* [Inference] Add KVCache Manager

* function refactored

* add test for KVCache Manager

* add attr beam width

* Revise alloc func in CacheManager

* Fix docs and pytests

* add tp slicing for head number

* optimize shapes of tensors used as physical cache

* Apply using InferenceConfig on KVCacheManager

* rm duplicate config file

* Optimize cache allocation: use contiguous cache

* Fix config in pytest (and config)

2024-01-11 13:39:29 +00:00

core: [Inference] Add CacheBlock and KV-Cache Manager (#5156), 2024-01-11 13:39:29 +00:00
kv_cache: [Inference] Add CacheBlock and KV-Cache Manager (#5156), 2024-01-11 13:39:29 +00:00
__init__.py: [Inference] First PR for rebuild colossal-infer (#5143), 2024-01-11 13:39:29 +00:00
readme.md: [Inference] Add readme (roadmap) and fulfill request handler (#5147), 2024-01-11 13:39:29 +00:00
sequence.py: [Inference] First PR for rebuild colossal-infer (#5143), 2024-01-11 13:39:29 +00:00

readme.md

Colossal-Infer

Introduction

Colossal-Infer is a library for LLM and MLM inference, built on top of Colossal-AI.

Structures

Overview

https://n4fyd3ptax.feishu.cn/docx/MhlmdHsGkoeoslx9fqucPO17n9b?openbrd=1&doc_app_id=501&blockId=WCGBdWI9hobOEsxkW5uc8HM6n3b&blockType=whiteboard&blockToken=Cca3wKWk7hPnJxbkCX6cMxPQnqd#WCGBdWI9hobOEsxkW5uc8HM6n3b

Roadmap

[] design of structures
[] Core components
  - [] engine
  - [] request handler
  - [] kv cache manager
  - [] modeling
  - [] custom layers
  - [] online server
[] supported models
  - [] llama2
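
To make the "kv cache manager" roadmap item and the commit notes above ("add tp slicing for head number", "use contiguous cache") more concrete, here is a minimal sketch of a block-based KV-cache allocator. It is an illustration under stated assumptions, not the code added in #5156: the names `CacheBlock` fields, `ToyKVCacheManager`, `allocate`, `free`, and the cache shape layout are all hypothetical, and no tensors are actually allocated.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CacheBlock:
    """Hypothetical bookkeeping record for one fixed-size block of cache slots."""
    block_id: int
    block_size: int   # maximum number of tokens the block can hold
    ref_count: int = 0
    filled: int = 0   # number of token slots currently in use

    @property
    def free_slots(self) -> int:
        return self.block_size - self.filled


class ToyKVCacheManager:
    """Assumed design: a fixed pool of blocks plus a per-sequence block table."""

    def __init__(self, num_blocks: int, block_size: int,
                 num_heads: int, head_dim: int, tp_size: int = 1) -> None:
        if num_heads % tp_size != 0:
            raise ValueError("num_heads must be divisible by tp_size")
        self.block_size = block_size
        # Tensor parallelism: each rank stores only its slice of the heads.
        self.heads_per_rank = num_heads // tp_size
        # Assumed layout of one contiguous per-layer cache buffer on this rank.
        self.cache_shape: Tuple[int, int, int, int] = (
            num_blocks, self.heads_per_rank, block_size, head_dim)
        self.blocks = [CacheBlock(i, block_size) for i in range(num_blocks)]
        # Reversed so that pop() hands out the lowest block id first.
        self.free_ids: List[int] = list(range(num_blocks - 1, -1, -1))
        self.block_tables: Dict[int, List[int]] = {}

    def allocate(self, seq_id: int, num_tokens: int) -> List[int]:
        """Reserve slots for num_tokens new tokens; return the block table."""
        table = self.block_tables.get(seq_id, [])
        tail_free = self.blocks[table[-1]].free_slots if table else 0
        overflow = max(0, num_tokens - tail_free)
        needed = -(-overflow // self.block_size)  # ceiling division
        # Check capacity before mutating any state.
        if needed > len(self.free_ids):
            raise RuntimeError("not enough free cache blocks")
        if table:
            self.blocks[table[-1]].filled += min(tail_free, num_tokens)
        for _ in range(needed):
            block = self.blocks[self.free_ids.pop()]
            block.ref_count = 1
            block.filled = min(self.block_size, overflow)
            overflow -= block.filled
            table.append(block.block_id)
        self.block_tables[seq_id] = table
        return table

    def free(self, seq_id: int) -> None:
        """Return all blocks of a finished sequence to the free pool."""
        for block_id in self.block_tables.pop(seq_id, []):
            block = self.blocks[block_id]
            block.ref_count = 0
            block.filled = 0
            self.free_ids.append(block_id)


manager = ToyKVCacheManager(num_blocks=4, block_size=16,
                            num_heads=32, head_dim=128, tp_size=2)
prompt_table = manager.allocate(seq_id=0, num_tokens=20)  # prompt: two blocks
decode_table = manager.allocate(seq_id=0, num_tokens=1)   # one decoded token
```

In this sketch the decode step reuses the partially filled tail block instead of taking a new one, which is the main reason to track a per-block fill count. A real manager would also back the block table with actual key and value tensors of the computed shape.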