[shardformer] update llama2/opt finetune example and fix llama2 policy (#4645)

* [shardformer] update shardformer readme

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] change dataset

* [shardformer] fix CI

* [shardformer] fix

* [example] update opt example

* [example] resolve comments

* fix
Author: flybird11111
Date: 2023-09-09 22:45:36 +08:00
Committed by: GitHub
Parent: a686f9ddc8
Commit: 7486ed7d3a
12 changed files with 165 additions and 167 deletions


@@ -43,10 +43,8 @@ class LlamaPolicy(Policy):
         if self.shard_config.enable_tensor_parallelism:
             decoder_attribute_replacement = {
-                "self_attn.hidden_size":
-                    self.model.config.hidden_size // self.shard_config.tensor_parallel_size,
-                "self_attn.num_heads":
-                    self.model.config.num_attention_heads // self.shard_config.tensor_parallel_size,
+                "self_attn.hidden_size": self.model.config.hidden_size // self.shard_config.tensor_parallel_size,
+                "self_attn.num_heads": self.model.config.num_attention_heads // self.shard_config.tensor_parallel_size,
             }
             if getattr(self.model.config, "num_key_value_heads", False):
                 decoder_attribute_replacement["self_attn.num_key_value_heads"] = \
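As context for the change above, here is a minimal, self-contained sketch (hypothetical helper and class names, not the actual ColossalAI policy code) of what this attribute replacement computes: hidden size and query heads are divided evenly across tensor-parallel ranks, and when the model config defines `num_key_value_heads` (grouped-query attention, e.g. Llama 2 70B), the key/value heads are divided as well.

```python
# Hypothetical standalone sketch of the per-layer attribute replacement under
# tensor parallelism; names are illustrative, not the real ColossalAI API.

def build_decoder_attribute_replacement(config, tensor_parallel_size: int) -> dict:
    """Compute per-rank attention attributes for one decoder layer."""
    # Hidden size and query heads are split evenly across tensor-parallel ranks.
    replacement = {
        "self_attn.hidden_size": config.hidden_size // tensor_parallel_size,
        "self_attn.num_heads": config.num_attention_heads // tensor_parallel_size,
    }
    # Grouped-query attention (e.g. Llama 2 70B) exposes num_key_value_heads,
    # which must be split too so each rank holds its share of the KV heads.
    if getattr(config, "num_key_value_heads", False):
        replacement["self_attn.num_key_value_heads"] = (
            config.num_key_value_heads // tensor_parallel_size
        )
    return replacement


class _DummyLlamaConfig:
    # Roughly Llama 2 70B-shaped values, used only for illustration.
    hidden_size = 8192
    num_attention_heads = 64
    num_key_value_heads = 8


print(build_decoder_attribute_replacement(_DummyLlamaConfig(), tensor_parallel_size=4))
# -> {'self_attn.hidden_size': 2048, 'self_attn.num_heads': 16, 'self_attn.num_key_value_heads': 2}
```

In the actual policy these values come from `self.model.config` and `self.shard_config`, as shown in the diff above.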