[Inference] Support the logic related to ignoring EOS token (#5693)

* Adapt temperature processing logic

* add ValueError for top_p and top_k

* add GQA Test

* fix except_msg

* support ignore EOS token

* change variable's name

* fix annotation
Author: yuehuayingxueluo
Date: 2024-05-08 19:59:10 +08:00
Committed by: GitHub
Parent: 9c2fe7935f
Commit: d482922035
3 changed files with 9 additions and 1 deletion


@@ -662,6 +662,7 @@ class InferenceEngine:
                 self.tokenizer.eos_token_id,
                 self.tokenizer.pad_token_id,
                 max_output_len=max_new_tokens,
+                ignore_eos=self.inference_config.ignore_eos,
             )
             self.request_handler.add_sequence(sequence)
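The diff threads `ignore_eos` from the inference config into each sequence. The sketch below is a minimal, hypothetical illustration (not the ColossalAI implementation) of how such a flag typically changes a generation loop: when `ignore_eos` is set, emitting the EOS token no longer terminates decoding, so the sequence runs until `max_output_len`. All names (`is_finished`, `generate`, `EOS_TOKEN_ID`, the toy model) are assumptions for illustration.

```python
# Illustrative sketch of ignore_eos semantics; names are hypothetical.
EOS_TOKEN_ID = 2  # assumed EOS id for this example


def is_finished(output_ids, eos_token_id, max_output_len, ignore_eos):
    """Return True when generation for one sequence should stop."""
    if len(output_ids) >= max_output_len:
        return True  # the length cap always applies
    if not ignore_eos and output_ids and output_ids[-1] == eos_token_id:
        return True  # EOS ends generation only when it is not ignored
    return False


def generate(next_token_fn, max_output_len, ignore_eos=False):
    """Greedy decoding loop driven by a user-supplied next-token callable."""
    output_ids = []
    while not is_finished(output_ids, EOS_TOKEN_ID, max_output_len, ignore_eos):
        output_ids.append(next_token_fn(output_ids))
    return output_ids


if __name__ == "__main__":
    # A toy "model" that emits EOS on its third step.
    toy = lambda ids: EOS_TOKEN_ID if len(ids) == 2 else 7
    print(len(generate(toy, max_output_len=8)))                   # 3: stops at EOS
    print(len(generate(toy, max_output_len=8, ignore_eos=True)))  # 8: runs to the cap
```

With `ignore_eos=False` the toy run stops as soon as EOS appears; with `ignore_eos=True` the same model decodes the full `max_output_len` tokens, which is useful for benchmarking fixed-length generation.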