This commit is contained in:
Edenzzzz 2025-05-06 14:14:22 -05:00
parent 35c2c44d52
commit 35f45ffd36


@@ -410,7 +410,7 @@ class RingAttention(torch.autograd.Function):
We also adopt the double ring topology from LoongTrain to fully utilize available
NICs on each node, by computing attention within an inner ring first and then sending all KVs to the next
ring at once.
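The double-ring communication order described above can be illustrated with a small schedule generator. This is a hypothetical sketch, not the actual RingAttention implementation: `double_ring_schedule` and its parameters are illustrative names, and it only models which rank's KV chunk each rank attends to at each step (inner-ring rotation first, then one outer-ring hop per node).

```python
def double_ring_schedule(num_nodes: int, gpus_per_node: int) -> dict[int, list[int]]:
    """Hypothetical illustration of a double-ring KV rotation order.

    For each global rank, return the order of source ranks whose KV chunk
    it attends to: rotate within the inner (intra-node) ring first, then
    hop to the previous node on the outer ring and rotate again.
    """
    world = num_nodes * gpus_per_node
    schedule: dict[int, list[int]] = {}
    for rank in range(world):
        node, local = divmod(rank, gpus_per_node)
        order = []
        for outer in range(num_nodes):
            # One outer-ring hop: all KVs of a node move to the next node at once.
            src_node = (node - outer) % num_nodes
            for inner in range(gpus_per_node):
                # Inner-ring rotation within a node, which only uses intra-node links.
                src_local = (local - inner) % gpus_per_node
                order.append(src_node * gpus_per_node + src_local)
        schedule[rank] = order
    return schedule

sched = double_ring_schedule(num_nodes=2, gpus_per_node=2)
# Every rank sees each of the 4 KV chunks exactly once.
assert all(sorted(v) == list(range(4)) for v in sched.values())
```

The point of the two nested loops is that the expensive inter-node send happens only once per outer iteration, while the inner iterations reuse intra-node NICs, which matches the stated motivation for adopting the LoongTrain topology.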
-Our implementation references
+Our implementation references code from
- ring-flash-attention: https://github.com/zhuzilin/ring-flash-attention/tree/main
- Megatron Context Parallel: https://github.com/NVIDIA/TransformerEngine/pull/726
References: