From 5134ad5d1abf95fe63a72452953f894b9630ea93 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com>
Date: Wed, 29 Mar 2023 02:35:40 +0800
Subject: [PATCH] [format] applied code formatting on changed files in pull request 3296 (#3298)

Co-authored-by: github-actions
---
 applications/Chat/README.md          | 2 +-
 applications/Chat/examples/README.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/applications/Chat/README.md b/applications/Chat/README.md
index c7553041a..3e431ca9e 100644
--- a/applications/Chat/README.md
+++ b/applications/Chat/README.md
@@ -17,7 +17,7 @@
 - [Stage1 - Supervised instructs tuning](#stage1---supervised-instructs-tuning)
 - [Stage2 - Training reward model](#stage2---training-reward-model)
 - [Stage3 - Training model with reinforcement learning by human feedback](#stage3---training-model-with-reinforcement-learning-by-human-feedback)
-- [Inference - After Training](#inference---after-training)
+- [Inference - After Training](#inference---after-training)
 - [Coati7B examples](#coati7b-examples)
 - [Generation](#generation)
 - [Open QA](#open-qa)
diff --git a/applications/Chat/examples/README.md b/applications/Chat/examples/README.md
index 95404368c..56d8cbb15 100644
--- a/applications/Chat/examples/README.md
+++ b/applications/Chat/examples/README.md
@@ -100,7 +100,7 @@ Model performance in [Anthropics paper](https://arxiv.org/abs/2204.05862):
 - --max_len: max sentence length for generation, type=int, default=512
 - --test: whether is only tesing, if it's ture, the dataset will be small
 
-## Stage3 - Training model using prompts with RL
+## Stage3 - Training model using prompts with RL
 
 Stage3 uses reinforcement learning algorithm, which is the most complex part of the training process, as shown below:
 