Mirror of https://github.com/hpcaitech/ColossalAI.git (synced 2025-09-02 01:28:31 +00:00)
[hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926)
* [hotfix] Add layer norm gradients all-reduce for sequence parallel (#4915)
  * Add layer norm gradients all-reduce for sequence parallel.
  * Skip pipeline inference test.
* [hotfix] Fix policies of sequence parallel (#4922)
  * Add layer norm gradients all-reduce for sequence parallel.
  * Fix parameter passing when calling get_autopolicy.

  Co-authored-by: littsk <1214689160@qq.com>
* Hotfix/add grad all reduce for sequence parallel (#4927)
  * Add layer norm gradients all-reduce for sequence parallel.
  * Fix parameter passing when calling get_autopolicy.
  * Fix a bug caused by using the wrong variables.

  Co-authored-by: littsk <1214689160@qq.com>
* Fix policy initialization.
* Fix Bloom and ChatGLM policies.
* Polish the code that handles layer norm.
* Fix the MoE module.
* Polish the class-initialization code.

Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
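The core of the fix: under sequence parallelism, LayerNorm weights and biases are replicated on every rank of the sequence-parallel group, so their gradients have to be summed across that group after the backward pass. Below is a minimal, hedged sketch of that pattern, not the ColossalAI implementation; the helper name and the sp_group handle are assumptions for illustration.

import torch.distributed as dist
from torch import nn

def allreduce_layernorm_grads(model: nn.Module, sp_group) -> None:
    # Sum LayerNorm parameter gradients over the sequence-parallel group so
    # that every replica applies an identical update at optimizer step time.
    for module in model.modules():
        if isinstance(module, nn.LayerNorm):
            for param in module.parameters():
                if param.grad is not None:
                    dist.all_reduce(param.grad, op=dist.ReduceOp.SUM, group=sp_group)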
@@ -700,7 +700,7 @@ class LowLevelZeroOptimizer(OptimizerWrapper):
     ############################
     # this method is used to sync gradient manually
-    def sync_grad(self):
+    def _sync_grad(self):
         for group_id in range(self.num_param_groups):
             param_group = self._working_param_groups[group_id]
             for param in param_group:
@@ -713,7 +713,7 @@ class LowLevelZeroOptimizer(OptimizerWrapper):
         # if not overlapping communication (no reduction hook is attached) when zero1
         # we need to manually reduce these gradients
         if not partition_grad and not self._overlap_communication:
-            self.sync_grad()
+            self._sync_grad()
         else:
             self._run_reduction()
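For context, a self-contained sketch of the manual-reduction branch above: with ZeRO-1 and no overlapped communication, gradients are all-reduced (and averaged) over the data-parallel group right before the optimizer step. The function name and the params/dp_group arguments are assumptions for illustration; the real optimizer routes this through _sync_grad and _run_reduction rather than reducing parameter by parameter.

import torch.distributed as dist

def manually_reduce_grads(params, dp_group) -> None:
    # Average gradients across the data-parallel group when no reduction hook
    # was attached during backward (the `not self._overlap_communication` case).
    world_size = dist.get_world_size(group=dp_group)
    for p in params:
        if p.requires_grad and p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM, group=dp_group)
            p.grad.div_(world_size)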