[Inference] Support the logic related to ignoring EOS token (#5693)

* Adapt temperature processing logic

* add ValueError for top_p and top_k

* add GQA Test

* fix except_msg

* support ignore EOS token

* change variable's name

* fix annotation
Author: yuehuayingxueluo
Date: 2024-05-08 19:59:10 +08:00
Committed by: GitHub
Parent: 9c2fe7935f
Commit: d482922035
3 changed files with 9 additions and 1 deletion


@@ -662,6 +662,7 @@ class InferenceEngine:
                 self.tokenizer.eos_token_id,
                 self.tokenizer.pad_token_id,
                 max_output_len=max_new_tokens,
+                ignore_eos=self.inference_config.ignore_eos,
             )
             self.request_handler.add_sequence(sequence)
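The diff threads `ignore_eos` from the inference config into each sequence. The sketch below is a minimal, hypothetical illustration (not the ColossalAI implementation) of how such a flag typically changes a generation loop: when `ignore_eos` is set, emitting the EOS token no longer terminates decoding, so the sequence runs until `max_output_len`. All names (`is_finished`, `generate`, `EOS_TOKEN_ID`, the toy model) are assumptions for illustration.

```python
# Illustrative sketch of ignore_eos semantics; names are hypothetical.
EOS_TOKEN_ID = 2  # assumed EOS id for this example


def is_finished(output_ids, eos_token_id, max_output_len, ignore_eos):
    """Return True when generation for one sequence should stop."""
    if len(output_ids) >= max_output_len:
        return True  # the length cap always applies
    if not ignore_eos and output_ids and output_ids[-1] == eos_token_id:
        return True  # EOS ends generation only when it is not ignored
    return False


def generate(next_token_fn, max_output_len, ignore_eos=False):
    """Greedy decoding loop driven by a user-supplied next-token callable."""
    output_ids = []
    while not is_finished(output_ids, EOS_TOKEN_ID, max_output_len, ignore_eos):
        output_ids.append(next_token_fn(output_ids))
    return output_ids


if __name__ == "__main__":
    # A toy "model" that emits EOS on its third step.
    toy = lambda ids: EOS_TOKEN_ID if len(ids) == 2 else 7
    print(len(generate(toy, max_output_len=8)))                   # 3: stops at EOS
    print(len(generate(toy, max_output_len=8, ignore_eos=True)))  # 8: runs to the cap
```

With `ignore_eos=False` the toy run stops as soon as EOS appears; with `ignore_eos=True` the same model decodes the full `max_output_len` tokens, which is useful for benchmarking fixed-length generation.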