Yuanheng Zhao
|
573f270537
|
[Infer] Serving example w/ ray-serve (multiple GPU case) (#4841)
* fix imports
* add ray-serve with Colossal-Infer tp
* trivial: send requests script
* add README
* fix worker port
* fix readme
* use app builder and autoscaling
* trivial: input args
* clean code; revise readme
* testci (skip example test)
* use auto model/tokenizer
* revert imports fix (fixed in other PRs)
|
2023-10-02 17:48:38 +08:00 |
|