[Shardformer] Add parallel output for shardformer models (bloom, falcon) (#5702)
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* add parallel cross entropy output for the falcon model & fix some typos in bloom.py
* fix module name error: self.model -> self.transformers in the bloom and falcon models
* fix the overflow bug of the distributed cross entropy loss function when training with fp16
* add a dtype argument to the parallel cross entropy loss function
* fix dtype-related typos and prettify loss.py
* fix grad dtype and update the dtype mismatch error
* fix typo bugs
@@ -348,6 +348,7 @@ class OPTPipelineForwards:
                     shift_labels,
                     process_group=shard_config.tensor_parallel_process_group,
                     vocab_size=self.lm_head.out_features,
+                    dtype=self.model.decoder.dtype,
                 )
             else:
                 loss_fct = CrossEntropyLoss()
@@ -988,6 +989,7 @@ def get_lm_forward_with_dist_cross_entropy(shard_config: ShardConfig):
             shift_labels,
             process_group=shard_config.tensor_parallel_process_group,
             vocab_size=self.lm_head.out_features,
+            dtype=self.model.decoder.dtype,
         )

     if not return_dict:
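The new dtype argument at these call sites feeds the distributed (vocab-parallel) cross entropy in shardformer's loss.py. Below is a minimal sketch, under assumptions, of how such a loss can avoid the fp16 overflow the commit message describes: accumulate the softmax arithmetic in fp32 and cast the result back to the requested dtype. The function name vocab_parallel_cross_entropy and its body are illustrative, not the actual implementation from this commit; vocab_size is accepted only for signature parity with the call sites above.

import torch
import torch.distributed as dist


def vocab_parallel_cross_entropy(logits, labels, process_group=None,
                                 vocab_size=None, dtype=torch.float32):
    # Cross entropy over a vocabulary sharded across tensor-parallel ranks.
    # Hypothetical sketch; vocab_size is unused here (the real loss uses it
    # to account for padded vocabulary shards).
    rank = dist.get_rank(process_group) if process_group is not None else 0
    world = dist.get_world_size(process_group) if process_group is not None else 1
    local_vocab = logits.size(-1)
    vocab_start = rank * local_vocab

    # The fp16 overflow fix: do the exp/sum/log arithmetic in fp32.
    logits = logits.float()

    # Subtract the global max for numerical stability.
    logits_max = logits.max(dim=-1, keepdim=True).values
    if world > 1:
        dist.all_reduce(logits_max, op=dist.ReduceOp.MAX, group=process_group)
    logits = logits - logits_max

    # Denominator of softmax: sum of exp over the full, sharded vocabulary.
    sum_exp = logits.exp().sum(dim=-1)
    if world > 1:
        dist.all_reduce(sum_exp, group=process_group)

    # Numerator: the target logit, which lives on exactly one rank.
    in_shard = (labels >= vocab_start) & (labels < vocab_start + local_vocab)
    local_labels = (labels - vocab_start).clamp(min=0, max=local_vocab - 1)
    picked = logits.gather(-1, local_labels.unsqueeze(-1)).squeeze(-1)
    picked = torch.where(in_shard, picked, torch.zeros_like(picked))
    if world > 1:
        dist.all_reduce(picked, group=process_group)

    # Ignore padded label positions (convention: label == -100).
    valid = labels != -100
    loss = (sum_exp.log() - picked)[valid].mean()

    # Cast back to the dtype requested by the caller (the new dtype argument).
    return loss.to(dtype)

With a signature like this, the call sites in the diff would pass process_group=shard_config.tensor_parallel_process_group, vocab_size=self.lm_head.out_features, and dtype=self.model.decoder.dtype, so the returned loss matches the model's compute dtype while the reduction itself stays in fp32.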