[hotfix] add copyright for solver and device mesh (#2803)
* [hotfix] add copyright for solver and device mesh
* add readme
* add alpa license
* polish
@@ -37,9 +37,6 @@ Colossal-AI’s auto-parallelism searches for strategies in regard to each opera
## Distributed Tensor and Shape-Consistency System
The Colossal-AI system uses a device mesh, similar to PyTorch's latest DTensor release, to manage its cluster. It annotates the storage status of each tensor with a sharding spec, which governs how that tensor is distributed across the cluster. A shape-consistency manager then automatically transforms tensors between different sharding specs, allowing tensors to be sliced and redistributed seamlessly: the manager guarantees that the output of an upstream operator is laid out correctly for the downstream operator, regardless of how that operator's input is sharded. This makes Colossal-AI versatile and easy to use, because users never need to reason about the storage status of a tensor when performing operations on it.
<figure style={{textAlign: "center"}}>
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/auto_parallel/shape_consistency.png"/>
</figure>
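To make the idea concrete, here is a minimal conceptual sketch of a sharding spec and a shape-consistency conversion step. This is not ColossalAI's actual API: the names `DeviceMesh`, `ShardingSpec`, and `convert`, and the list-of-mesh-axes representation, are all illustrative assumptions. A real manager would plan the cheapest sequence of collectives rather than printing them.

```python
# Conceptual sketch only -- illustrative stand-ins, not ColossalAI's real API.
# A sharding spec records, per tensor dimension, which device-mesh axis
# (if any) that dimension is sharded over.

class DeviceMesh:
    def __init__(self, shape):
        self.shape = shape  # e.g. (2, 2) for a 2x2 mesh of 4 devices

class ShardingSpec:
    def __init__(self, dim_to_mesh_axis):
        # dim_to_mesh_axis[i] = mesh axis sharding tensor dim i, or None
        self.dim_to_mesh_axis = dim_to_mesh_axis

def convert(src: ShardingSpec, dst: ShardingSpec, mesh: DeviceMesh):
    """Toy shape-consistency step: gather every dim sharded in `src` but not
    in `dst`, then re-shard the dims that `dst` requires. A real manager
    would use `mesh` and profiled costs to pick the cheapest collectives."""
    for dim, axis in enumerate(src.dim_to_mesh_axis):
        if axis is not None and dst.dim_to_mesh_axis[dim] != axis:
            print(f"all-gather tensor dim {dim} over mesh axis {axis}")
    for dim, axis in enumerate(dst.dim_to_mesh_axis):
        if axis is not None and src.dim_to_mesh_axis[dim] != axis:
            print(f"re-shard tensor dim {dim} over mesh axis {axis}")

mesh = DeviceMesh((2, 2))
row_sharded = ShardingSpec([0, None])  # dim 0 split over mesh axis 0
col_sharded = ShardingSpec([None, 1])  # dim 1 split over mesh axis 1
convert(row_sharded, col_sharded, mesh)
# -> all-gather tensor dim 0 over mesh axis 0
# -> re-shard tensor dim 1 over mesh axis 1
```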
Here are some key advantages of Colossal-AI compared to PyTorch DTensor:
Colossal-AI's device mesh uses cluster performance metrics and profiling results to estimate the time consumed by different communication operators. This lets Colossal-AI optimize communication between nodes and improve overall system efficiency.
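The source does not spell out the estimator, but a common way to model collective-communication time is an alpha-beta cost model: a fixed per-step latency plus a bandwidth term. The sketch below assumes this model; the `ALPHA` and `BETA` constants are made-up stand-ins for profiled values, not real measurements.

```python
# Hedged sketch of an alpha-beta communication cost model. The ring
# all-reduce/all-gather step counts are standard, but the latency and
# bandwidth constants below are assumed, not profiled.
ALPHA = 5e-6      # per-step latency in seconds (assumed)
BETA = 1 / 100e9  # seconds per byte, i.e. 1 / bandwidth (assumed: 100 GB/s)

def all_reduce_cost(num_bytes: int, num_devices: int) -> float:
    """Estimated time of a ring all-reduce: 2(n-1) steps, each moving
    one 1/n-sized shard of the tensor."""
    steps = 2 * (num_devices - 1)
    return steps * (ALPHA + BETA * num_bytes / num_devices)

def all_gather_cost(num_bytes: int, num_devices: int) -> float:
    """Estimated time of a ring all-gather: n-1 steps, each moving
    one 1/n-sized shard."""
    steps = num_devices - 1
    return steps * (ALPHA + BETA * num_bytes / num_devices)

# Example: compare collectives for a 1 GiB tensor on 8 GPUs.
size = 1 << 30
print(f"all-reduce: {all_reduce_cost(size, 8) * 1e3:.1f} ms")  # ~18.9 ms
print(f"all-gather: {all_gather_cost(size, 8) * 1e3:.1f} ms")  # ~9.4 ms
```

A solver that compares strategies can rank candidate sharding plans by summing such per-operator estimates, which is why profiled cluster metrics matter.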
@@ -11,7 +11,3 @@ Detailed instructions can be found in its `README.md`.
Colossal-Auto's automatic search for activation checkpointing finds the most efficient checkpointing scheme within a given memory budget, rather than simply maximizing memory compression. To avoid a lengthy search for an optimal activation checkpoint, Colossal-Auto implements a two-stage search process. This lets the system find a feasible distributed training solution in a reasonable amount of time while still benefiting from activation checkpointing for memory management. The integration of activation checkpointing in Colossal-AI improves the efficiency and effectiveness of large model training. You can follow the [Resnet example](https://github.com/hpcaitech/ColossalAI/tree/main/examples/tutorial/auto_parallel).
Detailed instructions can be found in its `README.md`.
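For intuition, here is a toy budget-constrained checkpoint search, not Colossal-Auto's actual two-stage solver. The layer table, its memory and recompute numbers, and the `best_plan` helper are all made up for illustration; the real solver avoids this brute-force enumeration.

```python
from itertools import combinations

# Toy model: each layer has an activation-memory cost and a recompute-time
# cost; checkpointing a layer frees its activation memory at the price of
# recomputing it in the backward pass. All numbers below are invented.
layers = [
    ("conv1", 4e8, 0.02), ("block1", 8e8, 0.05),
    ("block2", 8e8, 0.05), ("fc", 1e8, 0.01),
]

def best_plan(budget_bytes: float):
    """Return the checkpoint set with minimum total recompute time whose
    retained activations fit in `budget_bytes` (brute force for clarity)."""
    total = sum(mem for _, mem, _ in layers)
    best = None
    for r in range(len(layers) + 1):
        for ckpt in combinations(layers, r):
            kept = total - sum(mem for _, mem, _ in ckpt)
            if kept <= budget_bytes:
                cost = sum(t for _, _, t in ckpt)
                if best is None or cost < best[0]:
                    best = (cost, [name for name, _, _ in ckpt])
    return best

print(best_plan(budget_bytes=1.0e9))  # -> (0.07, ['conv1', 'block1'])
```

This captures the objective the paragraph describes: minimize recomputation time subject to the memory budget, rather than checkpointing everything.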
<figure style={{textAlign: "center"}}>
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/auto_parallel/auto_ckpt.jpg"/>
</figure>