[Inference] Fused the gate and up projections in the MLP, and optimized the autograd process. (#5365)

* fused the gate and up proj in mlp
* fix code styles
* opt auto_grad
* rollback test_inference_engine.py
* modifications based on the review feedback.
* fix bugs in flash attn
* Change reshape to view
* fix test_rmsnorm_triton.py
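
For readers unfamiliar with the term, fusing the gate and up projections generally means computing both with a single matrix multiplication instead of two. The sketch below illustrates that general idea in NumPy. It is not the code from this commit: the function names (`fuse_gate_up`, `mlp_fused`, `mlp_unfused`), the weight layout (concatenation along the output dimension), and the SiLU-gated MLP form are assumptions chosen for illustration.

```python
import numpy as np


def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))


def mlp_unfused(x, w_gate, w_up, w_down):
    # Baseline: two separate matmuls for the gate and up projections.
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down


def fuse_gate_up(w_gate, w_up):
    # Hypothetical helper: concatenate the two weight matrices once,
    # ahead of time, along the output (column) dimension.
    return np.concatenate([w_gate, w_up], axis=1)


def mlp_fused(x, w_gate_up, w_down):
    # One matmul yields both projections; split the result in half.
    gate, up = np.split(x @ w_gate_up, 2, axis=-1)
    return (silu(gate) * up) @ w_down


# Small random example (sizes are arbitrary).
rng = np.random.default_rng(0)
hidden, inter = 8, 16
x = rng.standard_normal((4, hidden))
w_gate = rng.standard_normal((hidden, inter))
w_up = rng.standard_normal((hidden, inter))
w_down = rng.standard_normal((inter, hidden))

w_gate_up = fuse_gate_up(w_gate, w_up)
out_fused = mlp_fused(x, w_gate_up, w_down)
out_ref = mlp_unfused(x, w_gate, w_up, w_down)
```

Both paths compute the same result up to floating-point rounding; the fused variant trades a one-time weight concatenation for one fewer matmul launch per forward pass.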
2025-09-06 19:40:28 +00:00 · 2024-02-06 19:38:25 +08:00
parent 1dedb57747
commit 35382a7fbf
10 changed files with 484 additions and 50 deletions
--- a/colossalai/kernel/triton/rms_layernorm.py
+++ b/colossalai/kernel/triton/rms_layernorm.py
@@ -49,7 +49,6 @@ if HAS_TRITON:
            # Write output
            tl.store(Y + cols, y.to(tl.float16), mask=mask)

-    @torch.no_grad()
    def rms_layernorm(x, weight, eps, norm_output=None):
        # allocate output
        y = torch.empty_like(x) if norm_output is None else norm_output