[chat] add performance and tutorial (#3786)
@@ -48,6 +48,7 @@ The following pic shows how we collected the data.
## Stage1 - Supervised instruction tuning
Stage1 is supervised instruction fine-tuning, which uses the datasets mentioned earlier to fine-tune the model.
[[Stage1 tutorial video]](https://www.youtube.com/watch?v=-qFBZFmOJfg)
You can run `examples/train_sft.sh` to start supervised instruction fine-tuning.
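Only the first line of the script's `torchrun` command survives in the hunk header below. For orientation, here is a minimal sketch of a direct launch; the flag names and values (`--pretrain`, `--model`, `--strategy`, `--save_path`) are assumptions drawn from the repo's example scripts, not an authoritative listing:

```
# Minimal sketch -- flag names/values are assumed; verify against train_sft.py
torchrun --standalone --nproc_per_node=4 train_sft.py \
    --pretrain "bigscience/bloom-560m" \
    --model bloom \
    --strategy colossalai_zero2 \
    --save_path output/sft-ckpt
```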
@@ -83,6 +84,7 @@ torchrun --standalone --nproc_per_node=4 train_sft.py \
## Stage2 - Training reward model
In stage 2 we train a reward model. Human annotators rank different outputs for the same prompt, and these rankings supervise the reward model so that it learns to assign a corresponding score to each output.
[[Stage2 tutorial video]](https://www.youtube.com/watch?v=gMx2CApKhuo)
You can run `examples/train_rm.sh` to start reward model training.
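The reward model is launched the same way as stage 1. A minimal sketch with assumed flag names (verify against `train_rm.py`):

```
# Minimal sketch -- flag names/values are assumed; verify against train_rm.py
torchrun --standalone --nproc_per_node=4 train_rm.py \
    --pretrain "bigscience/bloom-560m" \
    --model bloom \
    --strategy colossalai_zero2
```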
@@ -141,6 +143,7 @@ Stage3 uses reinforcement learning algorithm, which is the most complex part of
You can run `examples/train_prompts.sh` to start PPO training.
You can also use the following command to start PPO training.
[[Stage3 tutorial video]](https://www.youtube.com/watch?v=Z8wwSHxPL9g)
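The command in the original hunk is cut off after its first line; the continuation shown below is a sketch with assumed flag names (a reward-model checkpoint flag such as `--rm_path` is also likely required), so verify against `train_prompts.py`: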
```
# Sketch continuation -- flag names are assumed; verify against train_prompts.py
torchrun --standalone --nproc_per_node=4 train_prompts.py \
    --pretrain <path_to_sft_model> \
    --strategy colossalai_zero2
```