[tutorial] edited hands-on practices (#1899)

* Add handson to ColossalAI.
* Change names of handsons and edit sequence parallel example.
* Edit wrong folder name
* resolve conflict
* delete readme
2025-09-10 13:30:19 +00:00 · 2022-11-11 04:08:17 -05:00
parent d9bf83e084
commit ca6e75bc28
121 changed files with 20464 additions and 0 deletions
--- a/examples/tutorial/opt/inference/README.md
+++ b/examples/tutorial/opt/inference/README.md
@@ -0,0 +1,77 @@
+# Overview
+
+This is an example showing how to run OPT generation. The OPT model is implemented using ColossalAI.
+
+It supports tensor parallelism, batching and caching.
+
+# How to run
+
+Run OPT-125M:
+```shell
+python opt_fastapi.py opt-125m
+```
+
+It will launch an HTTP server on `0.0.0.0:7070` by default; the host and port can be customized. Open `localhost:7070/docs` in your browser to see the OpenAPI docs.
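The routes exposed by the server can also be inspected programmatically. The sketch below assumes the server keeps FastAPI's default schema location, `/openapi.json`, which the README does not confirm; `fetch_openapi_schema` and `list_routes` are illustrative helper names, not part of the example code.

```python
import json
from typing import Dict, List, Tuple
from urllib.request import urlopen

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_openapi_schema(base_url: str = "http://localhost:7070") -> Dict:
    """Download the OpenAPI schema from a running server."""
    # Assumption: the schema lives at FastAPI's default path, /openapi.json.
    with urlopen(f"{base_url}/openapi.json") as response:
        return json.load(response)


def list_routes(schema: Dict) -> List[Tuple[str, str]]:
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema."""
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Path items may also hold non-method keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                routes.append((method.upper(), path))
    return sorted(routes)
```

With the server running, `list_routes(fetch_openapi_schema())` would list whatever endpoints `opt_fastapi.py` registers.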
+
+## Configure
+
+### Configure model
+```shell
+python opt_fastapi.py <model>
+```
+Available models: opt-125m, opt-6.7b, opt-30b, opt-175b.
+
+### Configure tensor parallelism
+```shell
+python opt_fastapi.py <model> --tp <TensorParallelismWorldSize>
+```
+The `<TensorParallelismWorldSize>` can be an integer in `[1, #GPUs]`. The default is `1`.
+
+### Configure checkpoint
+```shell
+python opt_fastapi.py <model> --checkpoint <CheckpointPath>
+```
+The `<CheckpointPath>` can be a file path or a directory path. If it's a directory path, all files under the directory will be loaded.
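The file-or-directory rule can be pictured with a small helper. This is only a sketch: `resolve_checkpoint_files` is a hypothetical name, and it assumes that "all files under the directory" means the directory's direct children, which the README does not specify.

```python
from pathlib import Path
from typing import List


def resolve_checkpoint_files(checkpoint_path: str) -> List[Path]:
    """Return the checkpoint file(s) that a file or directory path refers to."""
    path = Path(checkpoint_path)
    if path.is_file():
        return [path]
    if path.is_dir():
        # Assumption: only direct children are loaded (no recursion);
        # sorting makes the load order deterministic.
        return sorted(p for p in path.iterdir() if p.is_file())
    raise FileNotFoundError(f"checkpoint path does not exist: {checkpoint_path}")
```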
+
+### Configure queue
+```shell
+python opt_fastapi.py <model> --queue_size <QueueSize>
+```
+The `<QueueSize>` can be an integer in `[0, MAXINT]`. If it's `0`, the request queue size is infinite. If it's a positive integer, incoming requests will be dropped when the request queue is full (the response will have HTTP status code 406).
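The queue semantics described here map closely onto Python's standard `queue.Queue`, which also treats a size of `0` as unbounded. The following is a minimal sketch of that behavior, not the server's actual code; `make_request_queue` and `try_enqueue` are hypothetical names.

```python
import queue

HTTP_OK = 200
HTTP_NOT_ACCEPTABLE = 406  # status the README reports for dropped requests


def make_request_queue(queue_size: int) -> queue.Queue:
    """Create a request queue; a size of 0 means unbounded."""
    # queue.Queue treats maxsize <= 0 as infinite, matching `--queue_size 0`.
    return queue.Queue(maxsize=queue_size)


def try_enqueue(request_queue: queue.Queue, request: str) -> int:
    """Enqueue without blocking; return the status code a client would see."""
    try:
        request_queue.put_nowait(request)
    except queue.Full:
        return HTTP_NOT_ACCEPTABLE  # queue is full, so the request is dropped
    return HTTP_OK
```

For example, with `make_request_queue(2)` the third call to `try_enqueue` returns 406 until a worker removes an item from the queue.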
+
+### Configure batching
+```shell
+python opt_fastapi.py <model> --max_batch_size <MaxBatchSize>
+```
+The `<MaxBatchSize>` can be an integer in `[1, MAXINT]`. The engine forms batches whose size is less than or equal to this value.
+
+Note that the batch size is not always equal to `<MaxBatchSize>`, as some consecutive requests may not be batched.
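One way to see why batches can come out smaller than the maximum is the sketch below. It assumes, purely for illustration, that consecutive requests can share a batch only when they carry the same batching key (for instance, identical generation settings); the README does not state the engine's real criterion, and `make_batches` is a hypothetical helper.

```python
from typing import Hashable, List, Sequence, Tuple

Request = Tuple[Hashable, str]  # (hypothetical batching key, prompt)


def make_batches(requests: Sequence[Request], max_batch_size: int) -> List[List[Request]]:
    """Greedily group consecutive requests that share a key, up to max_batch_size."""
    batches: List[List[Request]] = []
    for request in requests:
        current = batches[-1] if batches else None
        if current and len(current) < max_batch_size and current[-1][0] == request[0]:
            current.append(request)  # same key and room left: extend the batch
        else:
            batches.append([request])  # key changed or batch full: start a new one
    return batches
```

Under this assumption, a request with a different key that arrives between two compatible requests splits them into separate batches, even when `max_batch_size` would have allowed one.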
+
+### Configure caching
+```shell
+python opt_fastapi.py <model> --cache_size <CacheSize> --cache_list_size <CacheListSize>
+```
+This will cache `<CacheSize>` unique requests. For each unique request, it caches `<CacheListSize>` different results. A random result will be returned if the cache is hit.
+
+The `<CacheSize>` can be an integer in `[0, MAXINT]`. If it's `0`, caching is disabled. The `<CacheListSize>` can be an integer in `[1, MAXINT]`.
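The two cache parameters can be illustrated with a small in-memory structure. This is a sketch under stated assumptions rather than the engine's implementation: the class name `ListCache` is hypothetical, and the eviction policy (dropping the oldest request once `cache_size` is reached) is assumed because the README does not describe one.

```python
import random
from collections import OrderedDict
from typing import List, Optional


class ListCache:
    """Keep up to cache_list_size results for each of up to cache_size requests."""

    def __init__(self, cache_size: int, cache_list_size: int) -> None:
        self.cache_size = cache_size
        self.cache_list_size = cache_list_size
        self._entries: "OrderedDict[str, List[str]]" = OrderedDict()

    def add(self, request: str, result: str) -> None:
        if self.cache_size == 0:
            return  # a cache size of 0 disables caching
        results = self._entries.get(request)
        if results is None:
            if len(self._entries) >= self.cache_size:
                # Assumption: evict the oldest cached request (FIFO).
                self._entries.popitem(last=False)
            results = self._entries[request] = []
        if len(results) < self.cache_list_size and result not in results:
            results.append(result)

    def get(self, request: str) -> Optional[str]:
        """Return a random cached result on a hit, or None on a miss."""
        results = self._entries.get(request)
        return random.choice(results) if results else None
```

Returning a random entry from the per-request list keeps repeated identical prompts from always producing the same generated text.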
+
+### Other configurations
+```shell
+python opt_fastapi.py -h
+```
+
+# How to benchmark
+```shell
+cd benchmark
+locust
+```
+
+Then open the web interface link shown in your console.
+
+# Pre-process pre-trained weights
+
+## OPT-66B
+See [script/processing_ckpt_66b.py](./script/processing_ckpt_66b.py).
+
+## OPT-175B
+See [script/process-opt-175b](./script/process-opt-175b/).