[Inference] Fix API server, test and example (#5712)

* fix api server

* fix generation config

* fix api server

* fix comments

* fix infer hanging bug

* resolve comments, change backend to free port
This commit is contained in:
Jianghai
2024-05-15 15:47:31 +08:00
committed by GitHub
parent 74c47921fa
commit f47f2fbb24
5 changed files with 73 additions and 32 deletions
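
A minimal sketch of why a server might pass `generation_config` with each `generate` call rather than rely only on a config stored on a shared engine object. This is an illustration under stated assumptions, not the project's implementation: `ToyEngine`, `GenerationConfig`, `generate_shared`, `generate_explicit`, and both handlers are hypothetical names, and the commit itself does not say this is the motivation.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class GenerationConfig:
    # Hypothetical stand-in for a real generation config object.
    max_new_tokens: int


class ToyEngine:
    """Hypothetical engine used only for illustration; not the real engine API."""

    def __init__(self) -> None:
        self.generation_config = GenerationConfig(max_new_tokens=16)

    async def generate_shared(self, prompt: str) -> int:
        # Yield to the event loop once, as a real engine might while queuing work.
        await asyncio.sleep(0)
        # Reads whatever config the most recent caller stored on the engine.
        return self.generation_config.max_new_tokens

    async def generate_explicit(self, prompt: str, generation_config: GenerationConfig) -> int:
        await asyncio.sleep(0)
        # Uses the config that travelled with this specific request.
        return generation_config.max_new_tokens


async def handle_shared(engine: ToyEngine, prompt: str, config: GenerationConfig) -> int:
    # Mutate shared engine state, then generate.
    engine.generation_config = config
    return await engine.generate_shared(prompt)


async def handle_explicit(engine: ToyEngine, prompt: str, config: GenerationConfig) -> int:
    # Pass the config along with the request instead of storing it on the engine.
    return await engine.generate_explicit(prompt, generation_config=config)


async def demo() -> tuple[list[int], list[int]]:
    engine = ToyEngine()
    short, long = GenerationConfig(8), GenerationConfig(256)
    # Two concurrent requests with different configs, handled both ways.
    shared = await asyncio.gather(
        handle_shared(engine, "a", short), handle_shared(engine, "b", long)
    )
    explicit = await asyncio.gather(
        handle_explicit(engine, "a", short), handle_explicit(engine, "b", long)
    )
    return list(shared), list(explicit)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this toy setup the shared-state handlers both observe the config written last, while the explicit handlers each keep their own value.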

@@ -23,7 +23,7 @@ class CompletionServing:
# it is not a intuitive way
self.engine.engine.generation_config = generation_config
-result_generator = self.engine.generate(request_id, prompt=prompt)
+result_generator = self.engine.generate(request_id, prompt=prompt, generation_config=generation_config)
if await request.is_disconnected():
# Abort the request if the client disconnects.