[chat] add performance and tutorial (#3786)
@@ -48,6 +48,7 @@ The following pic shows how we collected the data.
## Stage1 - Supervised instruction tuning
Stage1 is supervised instruction fine-tuning, which uses the datasets mentioned earlier to fine-tune the model.
[[Stage1 tutorial video]](https://www.youtube.com/watch?v=-qFBZFmOJfg)
You can run `examples/train_sft.sh` to start supervised instruction fine-tuning.
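Only the first line of the script's `torchrun` command survives in the hunk header below. For orientation, here is a minimal sketch of a direct launch; the flag names and values (`--pretrain`, `--model`, `--strategy`, `--save_path`) are assumptions drawn from the repo's example scripts, not an authoritative listing:

```
# Minimal sketch -- flag names/values are assumed; verify against train_sft.py
torchrun --standalone --nproc_per_node=4 train_sft.py \
    --pretrain "bigscience/bloom-560m" \
    --model bloom \
    --strategy colossalai_zero2 \
    --save_path output/sft-ckpt
```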
@@ -83,6 +84,7 @@ torchrun --standalone --nproc_per_node=4 train_sft.py \
## Stage2 - Training reward model
In stage 2 we train a reward model. Human annotators rank different outputs for the same prompt, and these rankings supervise the reward model so that it learns to assign a corresponding score to each output.
[[Stage2 tutorial video]](https://www.youtube.com/watch?v=gMx2CApKhuo)
You can run `examples/train_rm.sh` to start reward model training.
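The reward model is launched the same way as stage 1. A minimal sketch with assumed flag names (verify against `train_rm.py`):

```
# Minimal sketch -- flag names/values are assumed; verify against train_rm.py
torchrun --standalone --nproc_per_node=4 train_rm.py \
    --pretrain "bigscience/bloom-560m" \
    --model bloom \
    --strategy colossalai_zero2
```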
@@ -141,6 +143,7 @@ Stage3 uses reinforcement learning algorithm, which is the most complex part of
You can run `examples/train_prompts.sh` to start PPO training.
You can also use the following command to start PPO training.
[[Stage3 tutorial video]](https://www.youtube.com/watch?v=Z8wwSHxPL9g)
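The command in the original hunk is cut off after its first line; the continuation shown below is a sketch with assumed flag names (a reward-model checkpoint flag such as `--rm_path` is also likely required), so verify against `train_prompts.py`: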
```
# Sketch continuation -- flag names are assumed; verify against train_prompts.py
torchrun --standalone --nproc_per_node=4 train_prompts.py \
    --pretrain <path_to_sft_model> \
    --strategy colossalai_zero2
```