[Feat] Tensor Model Parallel Support For Inference (#5563)

* naive tensor parallel support (initial source)

* [fix] precision, model loading, and framework refactor

* add tp unit test

* docstring

* fix do_sample
Author: Runyu Lu
Date: 2024-04-18 16:56:46 +08:00
Committed by: GitHub
Parent: be396ad6cc
Commit: e37ee2fb65
8 changed files with 640 additions and 150 deletions


@@ -40,7 +40,7 @@ def check_inference_engine(use_cuda_graph=False, batch_size=32):
     input_len = 1024
     output_len = 128
-    do_sample = True
+    do_sample = False
     top_p = 0.5
     top_k = 50
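The hunk above disables sampling in the test harness. A plausible rationale, sketched here as an assumption (the helper `next_token` below is illustrative, not part of the PR): with `do_sample=True`, the next token is drawn stochastically, so a tensor-parallel run and a single-GPU reference can produce different outputs even when their logits agree, making an equality check flaky. Greedy decoding (`do_sample=False`) is a pure argmax and is reproducible.

```python
import random

def next_token(logits, do_sample, top_k=50, rng=None):
    """Pick a next-token id from raw (positive) scores.

    Greedy (do_sample=False): deterministic argmax.
    Sampling (do_sample=True): stochastic draw from the top-k
    candidates, so repeated runs may diverge.
    """
    if not do_sample:
        # Greedy decoding: always the highest-scoring token.
        return max(range(len(logits)), key=logits.__getitem__)
    rng = rng or random.Random()
    top = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)[:top_k]
    weights = [logits[i] for i in top]  # random.choices normalizes these
    return rng.choices(top, weights=weights, k=1)[0]

scores = [0.1, 2.5, 0.3, 1.7]
# Greedy decoding yields the same token on every run.
greedy_runs = [next_token(scores, do_sample=False) for _ in range(5)]
assert greedy_runs == [1] * 5
```

Under this assumption, comparing engine outputs token-for-token against a baseline only makes sense with `do_sample=False`; `top_p` and `top_k` then have no effect on the result.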