* [builder] builder for scaled_upper_triang_masked_softmax
* add missing files
* fix a bug
* polish code
* [example] diffusion install from docker