mirror of https://github.com/hpcaitech/ColossalAI.git synced 2025-05-05 15:08:18 +00:00
ColossalAI/tests
Yuanheng Zhao 3de2e62299 [Inference] Add CacheBlock and KV-Cache Manager ()
* [Inference] Add KVCache Manager

* function refactored

* add test for KVCache Manager

* add attr beam width

* Revise alloc func in CacheManager

* Fix docs and pytests

* add tp slicing for head number

* optimize shapes of tensors used as physical cache

* Apply using InferenceConfig on KVCacheManager

* rm duplicate config file

* Optimize cache allocation: use contiguous cache

* Fix config in pytest (and config)
2024-01-11 13:39:29 +00:00
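The commit above introduces a `CacheBlock` and a KV-cache manager with tensor-parallel (TP) slicing of the head count. The sketch below shows the general bookkeeping behind a block-based KV cache. It assumes fixed-size blocks, a free list, a per-sequence block table, and attention heads split evenly across TP ranks. The names `SimpleKVCacheManager`, `allocate`, `free` and `heads_per_rank` are hypothetical and are not ColossalAI's actual API. The sketch tracks token slots only and allocates no tensors.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass
class CacheBlock:
    """Hypothetical fixed-size block holding KV entries for up to block_size tokens."""
    block_id: int
    block_size: int
    allocated_size: int = 0  # token slots already in use

    @property
    def available_space(self) -> int:
        return self.block_size - self.allocated_size


class SimpleKVCacheManager:
    """Minimal block-based KV-cache bookkeeping (illustrative, not ColossalAI's API)."""

    def __init__(self, num_blocks: int, block_size: int, num_heads: int, tp_size: int = 1):
        if num_heads % tp_size != 0:
            raise ValueError("num_heads must be divisible by tp_size")
        # Assumption: under TP each rank caches only its own slice of the heads.
        self.heads_per_rank = num_heads // tp_size
        self.block_size = block_size
        self._blocks = [CacheBlock(i, block_size) for i in range(num_blocks)]
        self._free: Deque[int] = deque(range(num_blocks))
        # Sequence id -> ordered list of block ids holding that sequence's KV cache.
        self.block_tables: Dict[int, List[int]] = {}

    def num_free_blocks(self) -> int:
        return len(self._free)

    def allocate(self, seq_id: int, num_tokens: int) -> List[int]:
        """Reserve slots for num_tokens new tokens; fails without side effects if short."""
        table = self.block_tables.get(seq_id, [])
        last_space = self._blocks[table[-1]].available_space if table else 0
        # New blocks needed after filling the partially used last block (ceil division).
        needed = (max(0, num_tokens - last_space) + self.block_size - 1) // self.block_size
        if needed > len(self._free):
            raise RuntimeError("not enough free cache blocks")
        remaining = num_tokens
        if table:
            take = min(remaining, last_space)
            self._blocks[table[-1]].allocated_size += take
            remaining -= take
        for _ in range(needed):
            block = self._blocks[self._free.popleft()]
            take = min(remaining, self.block_size)
            block.allocated_size = take
            remaining -= take
            table.append(block.block_id)
        self.block_tables[seq_id] = table
        return table

    def free(self, seq_id: int) -> None:
        """Return every block of a finished sequence to the free list."""
        for block_id in self.block_tables.pop(seq_id, []):
            self._blocks[block_id].allocated_size = 0
            self._free.append(block_id)
```

Checking capacity before touching any block keeps a failed `allocate` free of side effects. The commit's "contiguous cache" bullet suggests the physical K/V storage is one pre-allocated buffer indexed by block id. The block ids above would map onto such a buffer, but that mapping is not modeled here.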
kit [ci] fixed ddp test () 2024-01-11 17:16:32 +08:00
test_analyzer [misc] update pre-commit and run all files () 2023-09-19 14:20:26 +08:00
test_auto_parallel [misc] update pre-commit and run all files () 2023-09-19 14:20:26 +08:00
test_autochunk [misc] update pre-commit and run all files () 2023-09-19 14:20:26 +08:00
test_booster [ci] fixed booster test () 2024-01-11 16:04:45 +08:00
test_checkpoint_io [workflow] fixed build CI () 2024-01-10 22:34:16 +08:00
test_cluster [misc] update pre-commit and run all files () 2023-09-19 14:20:26 +08:00
test_config [misc] update pre-commit and run all files () 2023-09-19 14:20:26 +08:00
test_device [misc] update pre-commit and run all files () 2023-09-19 14:20:26 +08:00
test_fx [misc] update pre-commit and run all files () 2023-09-19 14:20:26 +08:00
test_gptq [feature] add gptq for inference () 2023-09-22 11:02:50 +08:00
test_infer [Inference] Add CacheBlock and KV-Cache Manager () 2024-01-11 13:39:29 +00:00
test_infer_ops/triton [Inference/NFC] Clean outdated inference tests and deprecated kernels () 2024-01-11 13:39:29 +00:00
test_lazy [workflow] fixed build CI () 2024-01-10 22:34:16 +08:00
test_legacy [npu] add npu support for gemini and zero () 2023-11-20 16:12:41 +08:00
test_moe [hotfix]: modify create_ep_hierarchical_group and add test () 2023-11-17 10:53:00 +08:00
test_optimizer [test] merge old components to test to model zoo () 2023-10-20 10:35:08 +08:00
test_pipeline [pipeline] A more general _communicate in p2p () 2024-01-08 15:37:27 +08:00
test_shardformer [ci] fix shardformer tests. () 2024-01-11 19:07:45 +08:00
test_smoothquant [inference] Add smoothquant for llama () 2023-10-16 11:28:44 +08:00
test_tensor [misc] update pre-commit and run all files () 2023-09-19 14:20:26 +08:00
test_utils [misc] update pre-commit and run all files () 2023-09-19 14:20:26 +08:00
test_zero [npu] add npu support for gemini and zero () 2023-11-20 16:12:41 +08:00
__init__.py [zero] Update sharded model v2 using sharded param v2 () 2022-03-11 15:50:28 +08:00