[inference/model] Adapted to the Baichuan2-7B model (#5591)

* Adapted to the Baichuan2-7B model

* Modified according to the review comments.

* Modified the method of obtaining random weights.

* Modified according to the review comments.

* Changed the MLP layer 'NOTE' comment.
Author: yuehuayingxueluo
Date: 2024-04-15 16:53:02 +08:00
Committed by: GitHub
Parent: d4cb023b62
Commit: 56b222eff8
8 changed files with 354 additions and 2 deletions


@@ -479,7 +479,7 @@ class NopadLlamaAttention(LlamaAttention):
         return attn_output


-# NOTE This will cause the result to be different from the transformer in some cases.
+# NOTE This will cause difference as out length increases.
 class NopadLlamaMLP(LlamaMLP):
     def __init__(
         self,
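
For context on that NOTE: a common reason no-padding MLP implementations drift from the transformers reference is that they fuse the gate and up projections into one stacked weight and apply them with a single batched matmul, which changes the floating-point accumulation order; in half precision the tiny per-layer gaps can compound as the generated output grows. Below is a minimal, hypothetical sketch of that pattern. The class name, weight layout, and shapes are assumptions for illustration, not the code from this commit.

# Hypothetical sketch of a fused gate/up MLP in the style of a no-padding LlamaMLP.
# The gate and up weights are stacked and applied with one torch.bmm instead of two
# separate nn.Linear calls; the result matches transformers' LlamaMLP only up to
# floating-point rounding, which is the kind of difference the NOTE describes.
import torch
import torch.nn.functional as F


class FusedGateUpMLP(torch.nn.Module):
    def __init__(self, hidden_size: int, intermediate_size: int, dtype=torch.float32):
        super().__init__()
        # [2, hidden, intermediate]: slice 0 is the gate weight, slice 1 the up weight.
        # Real inference code would load these from a checkpoint and run in float16 on GPU.
        self.gate_up_weight = torch.nn.Parameter(
            torch.randn(2, hidden_size, intermediate_size, dtype=dtype) * 0.02
        )
        self.down_weight = torch.nn.Parameter(
            torch.randn(intermediate_size, hidden_size, dtype=dtype) * 0.02
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: [num_tokens, hidden] -- tokens from all sequences are
        # flattened with no padding, so there is no separate batch/seq_len dimension.
        gate_out, up_out = torch.bmm(hidden_states.expand(2, -1, -1), self.gate_up_weight)
        return torch.mm(F.silu(gate_out) * up_out, self.down_weight)


if __name__ == "__main__":
    mlp = FusedGateUpMLP(hidden_size=64, intermediate_size=128)
    tokens = torch.randn(10, 64)  # 10 un-padded tokens
    print(mlp(tokens).shape)      # torch.Size([10, 64])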