Acceleration of dRMSD Calculation and Efficient Usage of GPU Caches

Authors

FILIPOVIČ Jiří, PLHÁK Jan, STŘELÁK David

Year of publication 2015
Type Article in Proceedings
Conference Proceedings of IEEE International Conference on High Performance Computing & Simulation
MU Faculty or unit

Faculty of Informatics

Citation
Doi http://dx.doi.org/10.1109/HPCSim.2015.7237020
Field Informatics
Keywords RMSD; GPU; code optimization; cache
Description In this paper, we introduce GPU acceleration of the dRMSD algorithm, which is used to compare different structures of a molecule. Compared with a multithreaded CPU implementation, we reached a 13.4x speedup in clustering and a 62.7x speedup in 1:1 dRMSD computation using a mid-range GPU. The dRMSD computation exhibits strong memory locality and is therefore compute-bound. Alongside a conservative implementation using shared memory, we implemented variants of the algorithm that rely on GPU caches to maintain memory locality. Our cache-based implementation reaches 96.5 % and 91.6 % of the shared-memory performance on Fermi and Maxwell, respectively. We identified several performance pitfalls related to cache blocking in compute-bound codes and suggest optimization techniques to improve performance.
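The abstract does not state the dRMSD formula or show any code, so the following is only an illustrative CPU sketch in Python, not the authors' GPU implementation. It assumes the commonly used definition of distance RMSD: the root mean square of the differences between corresponding intra-molecular atom-pair distances in two conformations, taken over all pairs i < j. The function names `drmsd` and `drmsd_tiled` and the `tile` parameter are hypothetical. The tiled variant only shows the loop-blocking idea that cache or shared-memory blocking builds on: it visits the same atom pairs in small blocks, so each block reuses a limited set of atoms.

```python
import math

def drmsd(a, b):
    """Distance RMSD of two conformations given as lists of (x, y, z).

    Assumed definition: sqrt(mean over pairs i < j of (dA_ij - dB_ij)^2).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two conformations with the same atom count (>= 2)")
    n = len(a)
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = math.dist(a[i], a[j]) - math.dist(b[i], b[j])
            total += diff * diff
    return math.sqrt(total / (n * (n - 1) / 2))

def drmsd_tiled(a, b, tile=2):
    """Same result as drmsd, but pairs are visited tile by tile.

    Illustrative blocking only: each (i0, j0) tile touches at most
    2 * tile atoms per conformation, which is the locality that a
    cache or shared-memory implementation would try to exploit.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two conformations with the same atom count (>= 2)")
    n = len(a)
    total = 0.0
    for i0 in range(0, n, tile):
        for j0 in range(i0, n, tile):          # upper-triangular tiles only
            for i in range(i0, min(i0 + tile, n)):
                # inside a diagonal tile, skip pairs with j <= i
                for j in range(max(j0, i + 1), min(j0 + tile, n)):
                    diff = math.dist(a[i], a[j]) - math.dist(b[i], b[j])
                    total += diff * diff
    return math.sqrt(total / (n * (n - 1) / 2))
```

Because dRMSD compares internal distances only, the two conformations do not need to be superimposed first: translating or rotating either structure leaves the result unchanged. The inner loop does O(N^2) arithmetic on O(N) input data, which is consistent with the abstract's remark that the computation has strong memory locality.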