mirror of
https://github.com/hpcaitech/ColossalAI.git
synced 2025-09-07 20:10:17 +00:00
resolve rebase conflicts on Branch feat/online-serving
colossalai/inference/server/README.md
@@ -0,0 +1,27 @@
# Online Service

Colossal-Inference supports an online service built on FastAPI, covering both simple completion and chat. Follow the commands below to start a server that offers both functionalities. For now only `Llama` models are supported; support for other models is planned.

# Usage

```bash
# First, launch an API server locally.
python3 -m colossalai.inference.server.api_server --model "<path of your llama2 model>" --chat_template "{% for message in messages %}
{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}"

# Second, open `http://127.0.0.1:8000/docs` to check the API.

# For the completion service, invoke:
curl -X POST http://127.0.0.1:8000/completion -H 'Content-Type: application/json' -d '{"prompt":"hello, who are you? ","stream":"False"}'

# For the chat service, invoke (confirm the route and field names on the /docs page):
curl -X POST http://127.0.0.1:8000/completion -H 'Content-Type: application/json' -d '{"converation":
[{"role": "system", "content": "you are a helpful assistant"},
{"role": "user", "content": "what is 1+1?"}],
"stream": "False"}'

# To test a simple generation, use the generate API:
curl -X POST http://127.0.0.1:8000/generate -H 'Content-Type: application/json' -d '{"prompt":"hello, who are you? ","stream":"False"}'
```
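
The curl calls above can also be issued from Python. The following is a minimal sketch that uses only the standard library. The helper names `build_request` and `post_json` are hypothetical and not part of Colossal-Inference; the address, endpoint names, and payload fields are copied from the curl examples and assume the server is running locally.

```python
import json
import urllib.request

# Address used in the curl examples; adjust if the server listens elsewhere.
BASE_URL = "http://127.0.0.1:8000"


def build_request(endpoint: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for one of the server endpoints (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")  # json.dumps always emits valid JSON
    return urllib.request.Request(
        url=f"{base_url}/{endpoint.lstrip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(endpoint: str, payload: dict, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text (hypothetical helper)."""
    with urllib.request.urlopen(build_request(endpoint, payload), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example calls (require a running server):
# print(post_json("completion", {"prompt": "hello, who are you? ", "stream": "False"}))
# print(post_json("generate", {"prompt": "hello, who are you? ", "stream": "False"}))
```

Building the body with `json.dumps` avoids hand-written JSON mistakes such as trailing commas, which a JSON parser rejects.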

Streaming output is also supported: set `stream` to `True` in the request body.
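
A streamed response can be read incrementally instead of waiting for the full body. The sketch below is one possible client, assuming the server sends UTF-8 text as it is generated; the exact wire format is not described here, so check it against the `/docs` page. The helper names `iter_text_chunks` and `stream_completion` are hypothetical.

```python
import codecs
import io
import json
import urllib.request
from typing import Iterator


def iter_text_chunks(stream: io.BufferedIOBase, chunk_size: int = 256) -> Iterator[str]:
    """Yield decoded text as bytes arrive, without splitting multi-byte UTF-8 characters."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    while True:
        # read1 returns whatever is available (up to chunk_size) and b"" at end of stream.
        chunk = stream.read1(chunk_size)
        if not chunk:
            break
        text = decoder.decode(chunk)  # holds back an incomplete trailing character
        if text:
            yield text
    # Flush the decoder; this raises if the stream ended in the middle of a character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


def stream_completion(prompt: str, base_url: str = "http://127.0.0.1:8000") -> Iterator[str]:
    """POST to the completion endpoint with streaming enabled and yield text pieces."""
    # Assumption: `stream` is passed as the string "True", mirroring the "False" in the curl examples.
    payload = json.dumps({"prompt": prompt, "stream": "True"}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/completion",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        yield from iter_text_chunks(response)


# Example (requires a running server):
# for piece in stream_completion("hello, who are you? "):
#     print(piece, end="", flush=True)
```

The incremental decoder matters because a network chunk can end in the middle of a multi-byte character; decoding each chunk independently would fail on such boundaries.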