[Inference] Optimize and Refactor Inference Batching/Scheduling (#5367)

* add kvcache manager funcs for batching

* add batch bucket for batching (see the sketch after this list)

* revise RunningList struct in handler

* add kvcache/batch funcs for compatibility

* use new batching methods

* fix indexing bugs

* revise abort logic

* use cpu seq lengths/block tables

* rm unused attr in Sequence

* fix type conversion/default arg

* add and revise pytests

* revise pytests, rm unused tests

* rm unused statements

* fix pop finished indexing issue

* fix: use index in batch when retrieving inputs/updating seqs

* use dict instead of odict in batch struct

* arg type hinting

* fix make compress

* refine comments

* fix: pop_n_seqs to pop the first n seqs (illustrated in the sketch after this list)

* add check in request handler

* remove redundant conversion

* fix test for request handler

* fix pop method in batch bucket

* fix prefill adding
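
The batch bucket and the `pop_n_seqs` behavior referenced above can be pictured with a minimal sketch. Everything here is reconstructed from the commit messages alone: the class name `BatchBucket`, its fields, and `add_seq`/`pop_n_seqs` are hypothetical, not the actual ColossalAI implementation. The sketch also shows why a plain dict can replace OrderedDict in the batch struct: dicts preserve insertion order since Python 3.7.

```python
# Minimal sketch of a batch bucket, assuming the interface suggested by the
# commit messages above; all names and fields are hypothetical.
from typing import Dict, List


class Sequence:
    """Stand-in for the inference Sequence type (illustrative fields only)."""

    def __init__(self, request_id: int, input_token_id: List[int]):
        self.request_id = request_id
        self.input_token_id = input_token_id


class BatchBucket:
    """Holds the sequences of one running batch.

    A plain dict is used instead of OrderedDict: since Python 3.7, dict
    preserves insertion order, which is all the batch needs.
    """

    def __init__(self, max_batch_size: int):
        self.max_batch_size = max_batch_size
        self._sequences: Dict[int, Sequence] = {}  # request_id -> Sequence

    def add_seq(self, seq: Sequence) -> bool:
        """Add a sequence if the bucket still has room."""
        if len(self._sequences) >= self.max_batch_size:
            return False
        self._sequences[seq.request_id] = seq
        return True

    def pop_n_seqs(self, n: int) -> List[Sequence]:
        """Pop the *first* n sequences in insertion order.

        The fix noted above: take entries from the front of the dict
        instead of popping arbitrary ones, so scheduling stays FIFO.
        """
        first_ids = list(self._sequences.keys())[:n]
        return [self._sequences.pop(rid) for rid in first_ids]


# Usage: fill the bucket, then pop the two oldest sequences.
bucket = BatchBucket(max_batch_size=4)
for rid in range(3):
    bucket.add_seq(Sequence(request_id=rid, input_token_id=[1, 2, 3]))
popped = bucket.pop_n_seqs(2)
assert [s.request_id for s in popped] == [0, 1]
```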

Author: Yuanheng Zhao
Date: 2024-02-19 17:18:20 +08:00
Committed by: GitHub
Commit: b21aac5bae (parent 8c69debdc7)
11 changed files with 902 additions and 112 deletions

@@ -15,7 +15,6 @@ def check_config_and_inference():
         input_token_id=[1, 2, 3],
         block_size=16,
         sample_params=None,
-        block_table=None,
         eos_token_id=2,
         pad_token_id=2,
         max_output_len=256,
@@ -27,7 +26,6 @@ def check_config_and_inference():
         input_token_id=[4, 5, 6],
         block_size=16,
         sample_params=None,
-        block_table=None,
         eos_token_id=2,
         pad_token_id=2,
         max_output_len=256,
@@ -39,7 +37,6 @@ def check_config_and_inference():
         input_token_id=[7, 8, 9],
         block_size=16,
         sample_params=None,
-        block_table=None,
         eos_token_id=2,
         pad_token_id=2,
         max_output_len=256,
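
These hunks delete the explicit `block_table=None` argument from each `Sequence` in the test, matching the "rm unused attr in Sequence" message above: a sequence no longer carries its own block table. A minimal sketch of the presumable new ownership follows; `KVCacheManager`, its fields, and `allocate`/`free` are illustrative names only, not the actual API of this PR.

```python
# Hypothetical sketch: block tables owned by the cache manager, keyed by
# request id, rather than passed into Sequence(block_table=...).
from typing import Dict, List


class KVCacheManager:
    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self._free_blocks: List[int] = list(range(num_blocks))
        # Block tables kept as plain Python lists (CPU side), in the spirit
        # of the "use cpu seq lengths/block tables" message; they would only
        # be copied into device tensors when model inputs are assembled.
        self._block_tables: Dict[int, List[int]] = {}

    def allocate(self, request_id: int, num_tokens: int) -> List[int]:
        """Hand out enough cache blocks to cover num_tokens."""
        needed = -(-num_tokens // self.block_size)  # ceil division
        blocks = [self._free_blocks.pop() for _ in range(needed)]
        self._block_tables[request_id] = blocks
        return blocks

    def free(self, request_id: int) -> None:
        """Return a finished request's blocks to the free pool."""
        self._free_blocks.extend(self._block_tables.pop(request_id, []))


# Usage: a 3-token prompt with block_size=16 needs a single block.
manager = KVCacheManager(num_blocks=64, block_size=16)
table = manager.allocate(request_id=1, num_tokens=3)
assert len(table) == 1
manager.free(request_id=1)
```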