resolve rebase conflicts on Branch feat/online-serving

2025-09-07 20:10:17 +00:00 · 2024-05-08 10:36:42 +00:00
parent 61a1b2e798
commit bc9063adf1
4 changed files with 31 additions and 13 deletions
--- a/colossalai/inference/server/README.md
+++ b/colossalai/inference/server/README.md
@@ -0,0 +1,27 @@
+# Online Service
+Colossal-Inference supports fast-api based online service. Simple completion and chat are both supported. Follow the commands below and
+you can simply construct a server with both completion and chat functionalities. For now we only support `Llama` model, we will fullfill
+the blank quickly.
+
+# Usage
+```bash
+# First, Lauch an API locally.
+python3 -m colossalai.inference.server.api_server  --model path of your llama2 model --chat_template "{% for message in messages %}
+{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}"
+
+
+# Second, you can turn to the page `http://127.0.0.1:8000/docs` to check the api
+
+# For completion service, you can invoke it
+curl -X POST  http://127.0.0.1:8000/completion  -H 'Content-Type: application/json'  -d '{"prompt":"hello, who are you? ","stream":"False"}'
+
+# For chat service, you can invoke it
+curl -X POST  http://127.0.0.1:8000/completion  -H 'Content-Type: application/json'  -d  '{"converation":
+                [{"role": "system", "content": "you are a helpful assistant"},
+                {"role": "user", "content": "what is 1+1?"},],
+                "stream": "False",}'
+# If you just want to test a simple generation, turn to generate api
+curl -X POST  http://127.0.0.1:8000/generate  -H 'Content-Type: application/json'  -d '{"prompt":"hello, who are you? ","stream":"False"}'
+
+```
+We also support streaming output, simply change the `stream` to `True` in the request body.