[shardformer, pipeline] add gradient_checkpointing_ratio and heterogeneous shard policy for llama (#5508)

* feat: add `GradientCheckpointConfig` and `PipelineGradientCheckpointConfig` (see the usage sketch after this list)

* feat: apply `GradientCheckpointConfig` to policy and llama_forward

* feat: move `distribute_layer` and `get_stage_index` to PipelineStageManager (see the layer-distribution sketch after this list)

* fix: add optional args for `distribute_layer` and `get_stage_index`

* fix: fix changed API calls

* test: update llama tests

* style: polish `GradientCheckpointConfig`

* fix: fix pipeline utils tests
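
The first item is the user-facing piece: a config object that bounds how much activation recomputation each pipeline stage does. Below is a minimal usage sketch, assuming `PipelineGradientCheckpointConfig` is exported from `colossalai.shardformer` and that `HybridParallelPlugin` accepts the `gradient_checkpoint_config` argument this commit introduces; the distributed launch and model/optimizer setup are omitted.

    # Minimal sketch (see assumptions above); run under a distributed
    # launcher such as `colossalai run`, with init/model setup omitted.
    from colossalai.booster import Booster
    from colossalai.booster.plugin import HybridParallelPlugin
    from colossalai.shardformer import PipelineGradientCheckpointConfig

    # Ask each pipeline stage to recompute activations for roughly half
    # of its layers instead of all of them.
    ckpt_config = PipelineGradientCheckpointConfig(gradient_checkpointing_ratio=0.5)

    plugin = HybridParallelPlugin(
        tp_size=1,
        pp_size=2,
        num_microbatches=4,
        gradient_checkpoint_config=ckpt_config,  # new argument in this commit
    )
    booster = Booster(plugin=plugin)
    # model, optimizer, *_ = booster.boost(model, optimizer)

Note that the ratio only matters if gradient checkpointing is enabled on the model itself (e.g. `model.gradient_checkpointing_enable()` for Hugging Face models).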
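The third item moves layer placement into `PipelineStageManager`, which is what enables a heterogeneous shard policy: stages may own different numbers of layers, and the checkpointing budget can then be computed per stage. The helpers below are a self-contained approximation of that bookkeeping, not the library's actual methods (the real ones live on `PipelineStageManager` and take extra arguments, e.g. for interleaved model chunks):

    from typing import List, Tuple

    def distribute_layers(num_layers: int, num_stages: int) -> List[int]:
        # Spread layers as evenly as possible; where the remainder lands
        # is a policy choice (here, the later stages absorb it).
        base, remainder = divmod(num_layers, num_stages)
        return [base + (1 if s >= num_stages - remainder else 0) for s in range(num_stages)]

    def get_stage_index(layers_per_stage: List[int], stage: int) -> Tuple[int, int]:
        # Translate per-stage layer counts into the [start, end) slice of
        # the model's layer list owned by `stage`.
        start = sum(layers_per_stage[:stage])
        return start, start + layers_per_stage[stage]

    assert distribute_layers(8, 3) == [2, 3, 3]
    assert get_stage_index([2, 3, 3], 1) == (2, 5)
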
Author: Wenhao Chen
Date: 2024-04-01 11:34:58 +08:00
Committed by: GitHub
Parent: df5e9c53cf
Commit: e614aa34f3
28 changed files with 396 additions and 213 deletions

@@ -49,9 +49,9 @@ if HAS_LLAMA:
     loss_fn_for_seq_classification = lambda output: output["logits"].mean()
     config = LlamaConfig(
-        num_hidden_layers=4,
-        hidden_size=128,
-        intermediate_size=256,
+        num_hidden_layers=8,
+        hidden_size=32,
+        intermediate_size=64,
         num_attention_heads=4,
         max_position_embeddings=128,
         num_labels=16,