ColossalAI/model_zoo/vit/vision_transformer_from_config.py
Frank Lee da01c234e1 Develop/experiments (#59)
* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7699.

* improved consistency between trainer, engine and schedule (#23)

Co-authored-by: 1SAA <c2h214748@gmail.com>

* Split conv2d, class token, positional embedding in 2d, Fix random number in ddp
Fix convergence in cifar10, Imagenet1000

* Integrate 1d tensor parallel in Colossal-AI (#39)

* fixed 1D and 2D convergence (#38)

* optimized 2D operations

* fixed 1D ViT convergence problem

* Feature/ddp (#49)

* remove redundancy func in setup (#19) (#20)

* use env to control the language of doc (#24) (#25)

* Support TP-compatible Torch AMP and Update trainer API (#27)

Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>

* add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29)

* add explanation for ViT example (#35) (#36)

* support torch ddp

* fix loss accumulation

* add log for ddp

* change seed

* modify timing hook

Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>

* Feature/pipeline (#40)

* optimize communication of pipeline parallel

* fix grad clip for pipeline

Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>

* optimized 3d layer to fix slow computation ; tested imagenet performance with 3d; reworked lr_scheduler config definition; fixed launch args; fixed some printing issues; simplified apis of 3d layers (#51)

* Update 2.5d layer code to get a similar accuracy on imagenet-1k dataset

* update api for better usability (#58)

update api for better usability

Co-authored-by: 1SAA <c2h214748@gmail.com>
Co-authored-by: ver217 <lhx0217@gmail.com>
Co-authored-by: puck_WCR <46049915+WANG-CR@users.noreply.github.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: アマデウス <kurisusnowdeng@users.noreply.github.com>
Co-authored-by: BoxiangW <45734921+BoxiangW@users.noreply.github.com>
2021-12-09 15:08:29 +08:00

88 lines
2.4 KiB
Python

#!/usr/bin/env python
# -*- encoding: utf-8 -*-

import torch

from colossalai.registry import MODELS
from colossalai.nn.model.model_from_config import ModelFromConfig


@MODELS.register_module
class VisionTransformerFromConfig(ModelFromConfig):
    """Vision Transformer from
    `"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" <https://arxiv.org/pdf/2010.11929>`_.
    """

    def __init__(self,
                 embedding_cfg: dict,
                 norm_cfg: dict,
                 block_cfg: dict,
                 head_cfg: dict,
                 token_fusion_cfg: dict = None,
                 embed_dim=768,
                 depth=12,
                 drop_path_rate=0.,
                 tensor_splitting_cfg: dict = None):
        super().__init__()
        self.embed_dim = embed_dim
        self.num_tokens = 1
        self.tensor_splitting_cfg = tensor_splitting_cfg
        # stochastic depth decay rule
        dpr = [x.item() for x in torch.linspace(0, drop_path_rate, depth)]
        if token_fusion_cfg is None:
            token_fusion_cfg = []
        else:
            token_fusion_cfg = [token_fusion_cfg]

        self.layers_cfg = [
            embedding_cfg,

            # input tensor splitting
            *self._generate_tensor_splitting_cfg(),
            *token_fusion_cfg,

            # blocks
            *self._generate_block_cfg(
                dpr=dpr, block_cfg=block_cfg, depth=depth),

            # norm
            norm_cfg,

            # head
            head_cfg
        ]

    def _fuse_tokens(self, x):
        # prepend the class token to the patch token sequence
        cls_token = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat((cls_token, x), dim=1)
        return x

    def _generate_block_cfg(self, dpr, depth, block_cfg):
        blocks_cfg = []
        for i in range(depth):
            _cfg = block_cfg.copy()
            # copy the nested drop-path config as well: dict.copy() is shallow,
            # so without this every block would share one droppath_cfg object
            # and end up with the last drop-path rate instead of its own
            _cfg['droppath_cfg'] = block_cfg['droppath_cfg'].copy()
            _cfg['droppath_cfg']['drop_path'] = dpr[i]
            blocks_cfg.append(_cfg)
        return blocks_cfg

    def _generate_tensor_splitting_cfg(self):
        if self.tensor_splitting_cfg:
            return [self.tensor_splitting_cfg]
        else:
            return []

    def forward(self, x):  # [512, 3, 32, 32]
        for layer in self.layers:
            if isinstance(x, tuple):
                x = layer(*x)
            else:
                x = layer(x)
        return x  # [256, 5]

    def init_weights(self):
        # TODO: add init weights
        pass
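
Note that this class only assembles configuration dictionaries in self.layers_cfg; the ModelFromConfig base class presumably turns each entry into an actual module (forward iterates over self.layers, which is not built here). The standalone sketch below, using plain PyTorch and no ColossalAI imports, illustrates the kind of config the constructor consumes and how the stochastic depth decay rule spreads drop-path rates linearly from 0 to drop_path_rate across the blocks. All 'type' values and config keys other than droppath_cfg/drop_path are hypothetical placeholders, not real ColossalAI layer names.

import torch

# Hypothetical configs in the style this class consumes; 'type' values are
# placeholders, not actual ColossalAI layer names.
embedding_cfg = dict(type='ViTPatchEmbedding', img_size=32, patch_size=4, embed_dim=768)
block_cfg = dict(type='ViTBlock',
                 attention_cfg=dict(type='ViTSelfAttention', num_heads=12),
                 droppath_cfg=dict(type='ViTDropPath', drop_path=0.))
norm_cfg = dict(type='LayerNorm', normalized_shape=768)
head_cfg = dict(type='ViTHead', embed_dim=768, num_classes=10)

depth = 12
drop_path_rate = 0.1

# Stochastic depth decay rule used in the constructor above: the i-th block
# gets drop_path = drop_path_rate * i / (depth - 1), i.e. 0 for the first
# block and drop_path_rate for the last one.
dpr = [x.item() for x in torch.linspace(0, drop_path_rate, depth)]

blocks_cfg = []
for i in range(depth):
    _cfg = block_cfg.copy()
    _cfg['droppath_cfg'] = block_cfg['droppath_cfg'].copy()
    _cfg['droppath_cfg']['drop_path'] = dpr[i]
    blocks_cfg.append(_cfg)

layers_cfg = [embedding_cfg, *blocks_cfg, norm_cfg, head_cfg]
print([round(c['droppath_cfg']['drop_path'], 3) for c in blocks_cfg])
# -> [0.0, 0.009, 0.018, ..., 0.1]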