[inference] release (#5747)

* [inference] release

* [inference] release

* [inference] release

* [inference] release

* [inference] release

* [inference] release

* [inference] release
binmakeswell
2024-05-23 17:44:06 +08:00
committed by GitHub
parent df6747603f
commit 4647ec28c8
3 changed files with 39 additions and 44 deletions


@@ -18,8 +18,15 @@
## 📌 Introduction
-ColossalAI-Inference is a module that accelerates the inference of Transformers models, especially LLMs. It leverages high-performance kernels, KV cache, paged attention, continuous batching, and other techniques to speed up LLM inference, and exposes simple, unified APIs for ease of use.
+ColossalAI-Inference is a module that accelerates the inference of Transformers models, especially LLMs. It leverages high-performance kernels, KV cache, paged attention, continuous batching, and other techniques to speed up LLM inference, and exposes simple, unified APIs for ease of use. [[blog]](https://hpc-ai.com/blog/colossal-inference)
+<p align="center">
+   <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference/colossal-inference-v1-1.png" width=1000/>
+</p>
+<p align="center">
+   <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference/colossal-inference-v1-2.png" width=1000/>
+</p>
## 🕹 Usage
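
A minimal end-to-end sketch of the unified API described above, assuming the `InferenceConfig`/`InferenceEngine` interface exported by `colossalai.inference`; the checkpoint path and configuration values are illustrative, not prescriptive:

```python
import torch
import transformers
import colossalai
from colossalai.inference import InferenceConfig, InferenceEngine

# Initialize the distributed environment (older releases may require
# colossalai.launch_from_torch(config={})).
colossalai.launch_from_torch()

# Load a HuggingFace Transformers model and tokenizer as usual.
model_path = "lmsys/vicuna-7b-v1.3"  # illustrative checkpoint
model = transformers.LlamaForCausalLM.from_pretrained(model_path).cuda()
tokenizer = transformers.AutoTokenizer.from_pretrained(model_path)

# Configure inference: dtype, batch size, and sequence-length budgets
# govern how the paged KV cache is allocated.
inference_config = InferenceConfig(
    dtype=torch.float16,
    max_batch_size=4,
    max_input_len=1024,
    max_output_len=512,
    use_cuda_kernel=True,
)

# Wrap the model in an engine that applies the optimized kernels,
# paged attention, and continuous batching mentioned in the introduction.
engine = InferenceEngine(model, tokenizer, inference_config, verbose=True)

# Generate responses for a batch of prompts.
prompts = ["Introduce some landmarks in Paris."]
response = engine.generate(prompts=prompts)
print(response)
```

The engine, rather than the raw `model.generate`, is the entry point: it owns the KV-cache pages and the request queue, so prompts submitted together can be continuously batched on the GPU.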