github/ColossalAI
mirror of https://github.com/hpcaitech/ColossalAI.git synced 2026-04-26 17:53:08 +00:00
Tree: bc1da87366d81e144f1f133801d5f20520433c52
Path: ColossalAI/tests/test_infer
Latest commit: bc1da87366 by yuehuayingxueluo, 2024-02-23 10:51:35 +08:00
[Fix/Inference] Fix format of input prompts and input model in inference engine (#5395)
* Fix bugs in inference_engine
* fix bugs in engine.py
* rm CUDA_VISIBLE_DEVICES
* add request_ids in generate
* fix bug in engine.py
* add logger.debug for BatchBucket
test_models
    [Infer] Optimize Blocked KVCache And Kernels Using It (#5325)
    2024-01-30 16:06:09 +08:00
test_ops/triton
    Optimized the execution interval time between cuda kernels caused by view and memcopy (#5390)
    2024-02-21 13:23:57 +08:00
_utils.py
    [Inference] Add the logic of the inference engine (#5173)
    2024-01-11 13:39:56 +00:00
test_batch_bucket.py
    [Fix/Inference] Fix format of input prompts and input model in inference engine (#5395)
    2024-02-23 10:51:35 +08:00
test_config_and_struct.py
    [Inference] Optimize and Refactor Inference Batching/Scheduling (#5367)
    2024-02-19 17:18:20 +08:00
test_inference_engine.py
    [Inference] User Experience: update the logic of default tokenizer and generation config. (#5337)
    2024-02-07 17:55:48 +08:00
test_kvcache_manager.py
    [Inference] Optimize and Refactor Inference Batching/Scheduling (#5367)
    2024-02-19 17:18:20 +08:00
test_request_handler.py
    [Inference] Optimize and Refactor Inference Batching/Scheduling (#5367)
    2024-02-19 17:18:20 +08:00