Acceleration of dRMSD Calculation and Efficient Usage of GPU Caches
Title in Czech | Akcelerace dRMSD výpočtu a efektivní užití GPU cache |
---|---|
Authors | |
Year of publication | 2015 |
Type | Article in proceedings |
Conference | Proceedings of IEEE International Conference on High Performance Computing & Simulation |
MU faculty / department | |
Citation | |
DOI | http://dx.doi.org/10.1109/HPCSim.2015.7237020 |
Field | Informatics |
Keywords | RMSD; GPU; code optimization; cache |
Description | In this paper, we introduce GPU acceleration of the dRMSD algorithm, which is used to compare different structures of a molecule. Compared to a multithreaded CPU implementation, we reached a 13.4x speedup in clustering and a 62.7x speedup in 1:1 dRMSD computation using a mid-range GPU. The dRMSD computation exhibits strong memory locality and is therefore compute-bound. Along with a conservative implementation using shared memory, we implemented variants of the algorithm that rely on GPU caches to maintain memory locality. Our cache-based implementation reaches 96.5 % and 91.6 % of the shared-memory performance on Fermi and Maxwell GPUs, respectively. We identified several performance pitfalls related to cache blocking in compute-bound codes and suggest optimization techniques to improve their performance. |
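To make the description above more concrete, here is a minimal sketch in plain Python, assuming the common definition of dRMSD between two conformations A and B of the same N-atom molecule: sqrt((2 / (N(N-1))) * sum over i < j of (d_ij(A) - d_ij(B))^2), where d_ij is the distance between atoms i and j. The record does not state the paper's exact normalization or kernel design. The function names and the `tile` parameter are hypothetical. `drmsd_tiled` is not GPU code. It only illustrates the loop-blocking pattern: each tile reuses a small set of atoms many times, which is the locality a kernel can capture either in shared memory or in cache. `drmsd_matrix` shows the all-vs-all form a clustering step would consume, as opposed to a single 1:1 comparison.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) coordinates of one atom


def _num_pairs(a: Sequence[Point], b: Sequence[Point]) -> int:
    """Validate inputs and return the number of atom pairs i < j."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("conformations must have the same number (>= 2) of atoms")
    return n * (n - 1) // 2


def drmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Reference 1:1 dRMSD: RMS difference of all intra-molecular distances.

    Only distances within each conformation are compared, so no
    superposition (rotation/translation fitting) is needed.
    """
    pairs = _num_pairs(a, b)
    total = 0.0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            diff = math.dist(a[i], a[j]) - math.dist(b[i], b[j])
            total += diff * diff
    return math.sqrt(total / pairs)


def drmsd_tiled(a: Sequence[Point], b: Sequence[Point], tile: int = 4) -> float:
    """Same sum, traversing the i < j pair space tile by tile (hypothetical helper).

    Each (i0, j0) tile reads only atoms i0..i0+tile-1 and j0..j0+tile-1
    of each conformation, so a small working set is reused many times.
    Summation order differs from drmsd(), so results may differ in the
    last few floating-point bits.
    """
    pairs = _num_pairs(a, b)
    n = len(a)
    total = 0.0
    for i0 in range(0, n, tile):
        for j0 in range(i0, n, tile):  # only tiles on or above the diagonal
            for i in range(i0, min(i0 + tile, n)):
                for j in range(max(j0, i + 1), min(j0 + tile, n)):
                    diff = math.dist(a[i], a[j]) - math.dist(b[i], b[j])
                    total += diff * diff
    return math.sqrt(total / pairs)


def drmsd_matrix(confs: Sequence[Sequence[Point]]) -> List[List[float]]:
    """Symmetric all-vs-all dRMSD matrix with a zero diagonal."""
    m = len(confs)
    out = [[0.0] * m for _ in range(m)]
    for p in range(m):
        for q in range(p + 1, m):
            out[p][q] = out[q][p] = drmsd(confs[p], confs[q])
    return out
```

For example, a right triangle `[(0,0,0), (1,0,0), (0,1,0)]` compared with the same triangle scaled by 2 has distance differences of 1, 1 and sqrt(2), giving sqrt(4/3), about 1.1547. A rigidly rotated and translated copy gives a value of (numerically) zero, because dRMSD depends only on internal distances.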
Related projects: