diff --git a/README.md b/README.md
index 8528ac2a8..dd181341e 100644
--- a/README.md
+++ b/README.md
@@ -38,24 +38,22 @@ distributed training in a few lines.
 ## Examples
 ### ViT
-
-
+
 - 14x larger batch size, and 5x faster training for Tensor Parallel = 64
 ### GPT-3
-
-
+
 - Free 50% GPU resources, or 10.7% acceleration
 ### GPT-2
-
+
 - 11x lower GPU RAM, or superlinear scaling
 ### BERT
-
+
 - 2x faster training, or 50% longer sequence length