Acceleration of dRMSD Calculation and Efficient Usage of GPU Caches
Authors | |
---|---|
Year of publication | 2015 |
Type | Article in Proceedings |
Conference | Proceedings of IEEE International Conference on High Performance Computing & Simulation |
MU Faculty or unit | |
Citation | |
Doi | http://dx.doi.org/10.1109/HPCSim.2015.7237020 |
Field | Informatics |
Keywords | RMSD; GPU; code optimization; cache |
Description | In this paper, we introduce GPU acceleration of the dRMSD algorithm, which is used to compare different structures of a molecule. Compared to a multithreaded CPU implementation, we have reached a 13.4x speedup in clustering and a 62.7x speedup in 1:1 dRMSD computation using a mid-range GPU. The dRMSD computation exhibits strong memory locality and is thus compute-bound. Along with a conservative implementation using shared memory, we have implemented variants of the algorithm that rely on GPU caches to preserve memory locality. Our cache-based implementation reaches 96.5 % and 91.6 % of the shared memory performance on Fermi and Maxwell, respectively. We have identified several performance pitfalls related to cache blocking in compute-bound codes and suggest optimization techniques to improve performance. |
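
For readers unfamiliar with the metric, the following is a minimal CPU reference sketch of dRMSD in Python. It assumes the common definition, the root mean square of the differences between corresponding intra-molecular atom-pair distances in two conformations; the exact normalization used in the paper may differ. The function name `drmsd` and the list-of-tuples input format are illustrative choices, not taken from the paper, and the code is not the GPU implementation described above.

```python
import math


def drmsd(a, b):
    """Distance RMSD between two conformations of the same molecule.

    `a` and `b` are sequences of (x, y, z) tuples, one per atom, listed in
    the same atom order. This input layout is an assumption made for the sketch.
    """
    if len(a) != len(b):
        raise ValueError("conformations must have the same number of atoms")
    n = len(a)
    if n < 2:
        raise ValueError("at least two atoms are required")

    total = 0.0
    # Visit every unordered atom pair (i, j) with i < j.
    for i in range(n - 1):
        for j in range(i + 1, n):
            dist_a = math.dist(a[i], a[j])  # pair distance in conformation a
            dist_b = math.dist(b[i], b[j])  # pair distance in conformation b
            total += (dist_a - dist_b) ** 2

    pairs = n * (n - 1) // 2  # number of unordered atom pairs
    return math.sqrt(total / pairs)
```

Because only internal distances are compared, the result does not change when one conformation is translated or rotated, so no superposition step is needed. The double loop also shows why the computation has strong memory locality: each atom's coordinates are reused in n - 1 pair distances, so O(n^2) arithmetic is performed on O(n) data.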