VLAG: A very fast locality approximation model for GPU kernels with regular access patterns
Abstract
Performance modeling plays an important role in optimal hardware design and optimized application implementation. This paper presents a very low overhead performance model, called VLAG, that approximates the data localities exploited by GPU kernels. VLAG takes source-code-level information as input and estimates per-memory-access-instruction, per-data-array, and per-kernel localities within GPU kernels. VLAG is applicable only to kernels with regular memory access patterns. VLAG was experimentally evaluated on an NVIDIA Maxwell GPU. For two different matrix multiplication (MM) kernels, it yielded average errors of 7.68% and 6.29%, respectively. The measured slowdown of VLAG for MM was 1.4X, which is negligible compared with other approaches such as trace-driven simulation. © 2017 IEEE.
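As an illustration of the class of kernels the abstract refers to (not an example from the paper itself), the sketch below shows a naive CUDA matrix multiplication kernel whose global-memory indices are affine functions of thread/block IDs and the loop counter. This regularity is what makes per-access locality estimation from source-level information plausible. All names (matmul_naive, A, B, C, N) are illustrative placeholders.

```cuda
// Minimal sketch: a matrix-multiplication kernel with regular (affine,
// statically analyzable) global-memory access patterns. Names are
// illustrative and not taken from the paper.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void matmul_naive(const float *A, const float *B, float *C, int N)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < N; ++k) {
            // Both index expressions are affine in (row, col, k), so the
            // reuse pattern of each access is visible in the source code.
            acc += A[row * N + k] * B[k * N + col];
        }
        C[row * N + col] = acc;
    }
}

int main()
{
    const int N = 256;
    size_t bytes = (size_t)N * N * sizeof(float);
    float *A, *B, *C;
    cudaMallocManaged(&A, bytes);
    cudaMallocManaged(&B, bytes);
    cudaMallocManaged(&C, bytes);
    for (int i = 0; i < N * N; ++i) { A[i] = 1.0f; B[i] = 2.0f; }

    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);
    matmul_naive<<<grid, block>>>(A, B, C, N);
    cudaDeviceSynchronize();

    printf("C[0] = %f\n", C[0]);  // expect 2*N = 512 with these inputs

    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```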