[Inference] Fused the gate and up projections in the MLP and optimized the autograd process (#5365)

* fuse the gate and up projections in the MLP

* fix code style

* optimize autograd

* roll back test_inference_engine.py

* apply changes based on review feedback

* fix bugs in flash attention

* change reshape to view

* fix test_rmsnorm_triton.py
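The gate/up fusion named in the title can be illustrated with a minimal PyTorch sketch. It assumes a LLaMA-style SwiGLU MLP built from gate_proj, up_proj and down_proj linear layers with a SiLU activation; those layer names and the class names UnfusedMLP and FusedGateUpMLP are assumptions for illustration, not code from this commit. Stacking the two weight matrices lets a single matmul replace two, and chunk splits the result back into views without copying.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class UnfusedMLP(nn.Module):
    """Reference SwiGLU MLP with separate gate and up projections (assumed LLaMA-style layout)."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Two separate GEMMs over the same input x.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


class FusedGateUpMLP(nn.Module):
    """Same MLP, but the gate and up weights are stacked so one GEMM computes both."""

    def __init__(self, ref: UnfusedMLP):
        super().__init__()
        # nn.Linear stores weight as (out_features, in_features), so concatenating
        # along dim 0 yields a (2 * intermediate_size, hidden_size) weight.
        fused = torch.cat([ref.gate_proj.weight, ref.up_proj.weight], dim=0)
        self.gate_up_weight = nn.Parameter(fused.detach().clone())
        self.down_proj = ref.down_proj  # reuse the original down projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate_up = F.linear(x, self.gate_up_weight)  # one matmul instead of two
        gate, up = gate_up.chunk(2, dim=-1)         # first half = gate, second half = up
        return self.down_proj(F.silu(gate) * up)
```

Both modules produce the same output up to floating-point rounding; the fused one reads x once and launches one projection kernel where the unfused one launches two.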
yuehuayingxueluo
2024-02-06 19:38:25 +08:00
committed by GitHub
parent 1dedb57747
commit 35382a7fbf
10 changed files with 484 additions and 50 deletions

@@ -49,7 +49,6 @@ if HAS_TRITON:
            # Write output
            tl.store(Y + cols, y.to(tl.float16), mask=mask)
    @torch.no_grad()
    def rms_layernorm(x, weight, eps, norm_output=None):
        # allocate output
        y = torch.empty_like(x) if norm_output is None else norm_output
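A plain PyTorch reference is useful when checking a kernel like rms_layernorm, for example in a test such as test_rmsnorm_triton.py. The sketch below is hypothetical: the name rms_layernorm_reference is made up, and it assumes the kernel computes RMSNorm as y = x / sqrt(mean(x^2) + eps) * weight with float32 accumulation. It keeps the signature shown in the diff, including the optional preallocated norm_output buffer, runs under @torch.no_grad() so no autograd graph is recorded, and flattens with view rather than reshape. Unlike the Triton kernel, which always stores float16, it casts to the output buffer's dtype.

```python
import torch


@torch.no_grad()  # inference only: no autograd graph is recorded
def rms_layernorm_reference(x, weight, eps, norm_output=None):
    """Hypothetical pure-PyTorch RMSNorm reference.

    Computes y = x / sqrt(mean(x^2, dim=-1) + eps) * weight, accumulating in
    float32 and casting to the output dtype. If norm_output is given, the
    result is written into that buffer in place and the buffer is returned.
    """
    # allocate output (or reuse the caller's buffer)
    y = torch.empty_like(x) if norm_output is None else norm_output
    hidden = x.shape[-1]
    # .view never copies: it needs a memory layout compatible with the new
    # shape (e.g. a contiguous tensor) and raises otherwise, whereas
    # .reshape silently falls back to a copy.
    x2d = x.view(-1, hidden).float()
    y2d = y.view(-1, hidden)  # shares storage with y
    rstd = torch.rsqrt(x2d.pow(2).mean(dim=-1, keepdim=True) + eps)
    y2d.copy_((x2d * rstd * weight.float()).to(y.dtype))
    return y
```

Writing into y through y2d fills the caller's buffer without an extra allocation, which is the point of the norm_output argument.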