[Coati] Train DPO using PP (#6054)

* update dpo * remove unsupport plugin * update msg * update dpo * remove unsupport plugin * update msg * update template * update dataset * add pp for dpo * update dpo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add dpo fn * update dpo * update dpo * update dpo * update dpo * minor update * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update loss * update help * polish code --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-09-02 17:46:42 +00:00 · 2024-10-11 19:32:00 +08:00
parent dc2cdaf3e8
commit 4c8e85ee0d
8 changed files with 529 additions and 236 deletions
--- a/applications/ColossalChat/examples/data_preparation_scripts/prepare_dataset.py
+++ b/applications/ColossalChat/examples/data_preparation_scripts/prepare_dataset.py
@@ -73,8 +73,7 @@ def main():
        "--conversation_template_config",
        type=str,
        default="conversation_template_config",
-        help="Path \
-        to save conversation template config files.",
+        help="Path to save conversation template config files.",
    )
    parser.add_argument("--data_cache_dir", type=str, default="cache", help="Data cache directory")
    parser.add_argument(