ColossalAI

mirror of https://github.com/hpcaitech/ColossalAI.git synced 2026-02-21 14:32:09 +00:00

Files

yuehuayingxueluo 4f28cb43c0 [inference]Optimize the usage of the mid tensors space in flash attn (#5304)

* opt flash attn

* opt tmp tensor

* fix benchmark_llama

* fix code style

* fix None logic for output tensor

* fix adapted to get_xine_cache

* add comment

* fix ci bugs

* fix some codes

* rm duplicated codes

* rm duplicated codes

* fix code style

* add _get_dtype in config.py

2024-01-26 14:00:10 +08:00

__init__.py

[Inference] Add CacheBlock and KV-Cache Manager (#5156)

2024-01-11 13:39:29 +00:00

block_cache.py

[Inference] Add CacheBlock and KV-Cache Manager (#5156)

2024-01-11 13:39:29 +00:00

kvcache_manager.py

[inference]Optimize the usage of the mid tensors space in flash attn (#5304)

2024-01-26 14:00:10 +08:00