Software Prefetching for Unstructured Mesh Applications

Accepted version
Peer-reviewed

Type

Article

Authors

Hadade, I 
Jones, TM 
Wang, F 
Mare, LD 

Abstract

This article demonstrates the utility and implementation of software prefetching in an unstructured finite volume computational fluid dynamics code of representative size and complexity to an industrial application and across a number of modern processors. We present the benefits of auto-tuning for finding the optimal prefetch distance values across different computational kernels and architectures and demonstrate the importance of choosing the right prefetch destination across the available cache levels for best performance. We discuss the impact of the data layout on the number of prefetch instructions required in kernels with indirect addressing patterns and show how to best implement them in an existing large-scale computational fluid dynamics application. Through this, we show significant full application speed-ups on a range of processors and realistic test cases in both single core/tile and full socket configurations, such as 1.14× on the Intel Xeon Sandy Bridge, 1.09× on the Intel Xeon Broadwell, 1.29× on the Intel Xeon Skylake, 1.99× on the in-order Intel Xeon Phi Knights Corner coprocessor, and 1.51× on the out-of-order Intel Xeon Phi Knights Landing many-core processor.
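
The general idea of software prefetching in a kernel with indirect addressing can be illustrated with a minimal sketch. This is not the article's code: the function `edge_sum`, the arrays `node_val` and `edge_node`, and the parameter `dist` are hypothetical names chosen for illustration, and the sketch assumes a GCC- or Clang-compatible compiler that provides `__builtin_prefetch`. The prefetch distance `dist` is the kind of value an auto-tuner would search over, and the last argument of the builtin is a temporal-locality hint whose mapping to a particular cache level depends on the target.

```c
#include <stddef.h>

/*
 * Hypothetical gather kernel: accumulates node values referenced
 * through an index array, as in an edge loop over an unstructured mesh.
 * `dist` is the prefetch distance in loop iterations (an assumed
 * tunable, not a value taken from the article).
 */
double edge_sum(const double *node_val, const int *edge_node,
                size_t n_edges, size_t dist)
{
    double acc = 0.0;
    for (size_t e = 0; e < n_edges; ++e) {
        if (e + dist < n_edges) {
            /* Request the data that iteration e + dist will read.
             * Second argument 0 = prefetch for read; third argument 3 =
             * high temporal locality. Lower values (2, 1, 0) hint at
             * less reuse; how this maps to cache levels is
             * target-specific. */
            __builtin_prefetch(&node_val[edge_node[e + dist]], 0, 3);
        }
        acc += node_val[edge_node[e]];
    }
    return acc;
}
```

The index array `edge_node` is read sequentially, so hardware prefetchers usually handle it; the indirectly addressed `node_val` entries are the ones a hardware prefetcher typically cannot predict, which is why the explicit prefetch targets them. The result does not depend on `dist`, so different distances can be timed against each other safely.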

Keywords

Software prefetching, unstructured mesh, irregular memory access, auto-tuning, performance optimisation

Journal Title

ACM Transactions on Parallel Computing

Journal ISSN

2329-4949
2329-4957

Volume

7

Publisher

Association for Computing Machinery (ACM)

Rights

All rights reserved

Sponsorship

Engineering and Physical Sciences Research Council (EP/K026399/1)