ColossalAI

mirror of https://github.com/hpcaitech/ColossalAI.git synced 2025-05-06 07:28:12 +00:00

yuehuayingxueluo f366a5ea1f [Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418)	2024-03-13 17:20:03 +08:00
* add rotary embedding kernel
* add rotary_embedding_kernel
* add fused rotary_emb and kvcache memcopy
* add fused_rotary_emb_and_cache_kernel.cu
* add fused_rotary_emb_and_memcopy
* fix bugs in fused_rotary_emb_and_cache_kernel.cu
* fix ci bugs
* use vec memcopy and opt the global memory access
* fix code style
* fix test_rotary_embdding_unpad.py
* codes revised based on the review comments
* fix bugs about include path
* rm inline
benchmark_ops	[Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418)	2024-03-13 17:20:03 +08:00
benchmark_llama.py	Optimized the execution interval time between cuda kernels caused by view and memcopy (#5390)	2024-02-21 13:23:57 +08:00
build_smoothquant_weight.py	[inference] refactor examples and fix schedule (#5077)	2023-11-21 10:46:03 +08:00
run_benchmark.sh	[Fix/Inference] Fix format of input prompts and input model in inference engine (#5395)	2024-02-23 10:51:35 +08:00
run_llama_inference.py	[npu] change device to accelerator api (#5239)	2024-01-09 10:20:05 +08:00