* add benchmark * merge common func * add total and avg tflops Co-authored-by: Ziyue Jiang <ziyue.jiang@gmail.com>