Optimize pipeline schedule (#94)

* add pipeline shared module wrapper and update load batch

* added model parallel process group for amp and clip grad (#86)

* added model parallel process group for amp and clip grad

* update amp and clip grad with the model parallel process group (see the sketch after this list)

* remove pipeline_prev/next group (#88)

* micro batch offload

* optimize pipeline gpu memory usage

* pipeline can receive tensor shape (#93); see the shape-handshake sketch below

* optimize pipeline gpu memory usage

* fix grad accumulation step counter

* rename classes and functions
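
The amp/clip-grad bullets above describe computing gradient statistics over a model parallel process group rather than the default world. A minimal sketch of that idea, not the project's actual implementation: `clip_grad_norm_mp` and `mp_group` are illustrative names, and for simplicity every gradient is treated as a rank-local shard (real code must avoid double-counting parameters that are replicated across the group).

```python
import torch
import torch.distributed as dist

def clip_grad_norm_mp(parameters, max_norm, mp_group=None):
    # With model parallelism each rank holds only a shard of the weights,
    # so the squared gradient norm must be summed across the model parallel
    # process group before a global clip coefficient can be computed.
    grads = [p.grad for p in parameters if p.grad is not None]
    if not grads:
        return torch.tensor(0.0)
    local_sq = torch.stack([g.detach().norm(2) ** 2 for g in grads]).sum()
    if mp_group is not None:
        dist.all_reduce(local_sq, op=dist.ReduceOp.SUM, group=mp_group)
    total_norm = local_sq.sqrt()
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1:
        for g in grads:
            g.detach().mul_(clip_coef)
    return total_norm
```

With `mp_group = dist.new_group(...)` spanning one model parallel replica, every rank in the group computes the same `total_norm` and therefore applies the same clip coefficient.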

Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Author: ver217
Date: 2021-12-30 15:56:46 +08:00
Committed by: GitHub
Parent: e5b9f9a08d
Commit: 96780e6ee4
29 changed files with 423 additions and 290 deletions

@@ -26,8 +26,6 @@ follow the steps below to create a new distributed initialization.
 GLOBAL = 'global'
 DATA = 'data'
 PIPELINE = 'pipe'
-PIPELINE_PREV = 'pipe_prev'
-PIPELINE_NEXT = 'pipe_next'
 ...
 NEW_MODE = 'new_mode' # define your mode here
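
With the dedicated prev/next groups gone (the removal shown above), stage-to-stage traffic is plain point-to-point communication, and the "pipeline can receive tensor shape (#93)" bullet means the receiver can be told the activation shape up front. A rough sketch of both paths with raw `torch.distributed` calls; the function names and the ndim-then-shape handshake are assumptions for illustration, and sender and receiver must agree on dtype and device:

```python
import torch
import torch.distributed as dist

def send_to_next_stage(tensor, next_rank, shape_known=False):
    # Send shape metadata first unless the receiver was given the shape
    # up front (the #93 optimization, which skips these small messages).
    if not shape_known:
        dist.send(torch.tensor([tensor.dim()], dtype=torch.long), dst=next_rank)
        dist.send(torch.tensor(tensor.shape, dtype=torch.long), dst=next_rank)
    dist.send(tensor.contiguous(), dst=next_rank)

def recv_from_prev_stage(prev_rank, tensor_shape=None, dtype=torch.float32):
    # With a known shape the buffer is allocated directly; otherwise fall
    # back to receiving ndim, then the shape, then the payload.
    if tensor_shape is None:
        ndim = torch.empty(1, dtype=torch.long)
        dist.recv(ndim, src=prev_rank)
        shape = torch.empty(int(ndim.item()), dtype=torch.long)
        dist.recv(shape, src=prev_rank)
        tensor_shape = torch.Size(shape.tolist())
    buffer = torch.empty(tensor_shape, dtype=dtype)
    dist.recv(buffer, src=prev_rank)
    return buffer
```

Passing the shape through the schedule saves the metadata round per micro-batch and, more importantly, lets the receive buffer be allocated before the payload arrives.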

@@ -18,8 +18,6 @@ class ParallelMode(Enum):
 GLOBAL = 'global'
 DATA = 'data'
 PIPELINE = 'pipe'
-PIPELINE_PREV = 'pipe_prev'
-PIPELINE_NEXT = 'pipe_next'
 ...
 NEW_MODE = 'new_mode' # define your mode here
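
Both hunks show the same extension point: `ParallelMode` is a plain Python `Enum`, so creating a new distributed initialization starts with a new member. A self-contained sketch of the pattern; the member list is a subset of the real enum and `NEW_MODE` is the placeholder from the snippet above:

```python
from enum import Enum

class ParallelMode(Enum):
    # A subset of the real enum; PIPELINE_PREV/PIPELINE_NEXT were removed
    # by this commit, so only the single PIPELINE group remains for
    # stage-to-stage communication.
    GLOBAL = 'global'
    DATA = 'data'
    PIPELINE = 'pipe'
    NEW_MODE = 'new_mode' # define your mode here

# Enum members compare by identity and carry their string value:
assert ParallelMode.NEW_MODE.value == 'new_mode'
```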