Separable projection integrals for higher-order correlators of the cosmic microwave sky: Acceleration by factors exceeding 100

Published version
Peer-reviewed

Authors

Briggs, JP 
Pennycook, SJ 
Fergusson, JR 
Jäykkä, J 
Shellard, EPS 

Abstract

© 2016. We present a case study describing efforts to optimise and modernise "Modal", the simulation and analysis pipeline used by the Planck satellite experiment for constraining general non-Gaussian models of the early universe via the bispectrum (or three-point correlator) of the cosmic microwave background radiation. We focus on one particular element of the code: the projection of bispectra from the end of inflation to the spherical shell at decoupling, which defines the CMB we observe today. This code involves a three-dimensional inner product between two functions, one of which requires an integral, on a non-rectangular domain containing a sparse grid. We show that by employing separable methods this calculation can be reduced to a one-dimensional summation plus two integrations, reducing the overall dimensionality from four to three. The introduction of separable functions also solves the issue of the non-rectangular sparse grid. This separable method can become unstable in certain scenarios and so the slower non-separable integral must be calculated instead. We present a discussion of the optimisation of both approaches. We demonstrate significant speed-ups of ≈100×, arising from a combination of algorithmic improvements and architecture-aware optimisations targeted at improving thread and vectorisation behaviour. The resulting MPI/OpenMP hybrid code is capable of executing on clusters containing processors and/or coprocessors, with strong-scaling efficiency of 98.6% on up to 16 nodes. We find that a single coprocessor outperforms two processor sockets by a factor of 1.3× and that running the same code across a combination of both microarchitectures improves performance-per-node by a factor of 3.38×. By making bispectrum calculations competitive with those for the power spectrum (or two-point correlator) we are now able to consider joint analysis for cosmological science exploitation of new data.
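
The general idea behind separable methods can be illustrated with a toy example. The sketch below is not taken from Modal and does not reproduce its projection integral: it assumes a rectangular grid, a simple trapezoidal rule, and hypothetical test functions that factorise exactly into one-dimensional pieces. Under those assumptions, a three-dimensional quadrature of the product of two functions collapses into a product of three one-dimensional sums, so the cost drops from O(n^3) to O(n) evaluations.

```python
import math


def trapezoid_rule(n, a, b):
    """Return trapezoidal-rule nodes and weights for n points on [a, b]."""
    h = (b - a) / (n - 1)
    nodes = [a + i * h for i in range(n)]
    weights = [h] * n
    weights[0] = weights[-1] = h / 2
    return nodes, weights


def inner_product_direct(f, g, nodes, weights):
    """Brute-force 3D quadrature of f*g on a cubic grid: O(n^3) work."""
    total = 0.0
    for x, wx in zip(nodes, weights):
        for y, wy in zip(nodes, weights):
            for z, wz in zip(nodes, weights):
                total += wx * wy * wz * f(x, y, z) * g(x, y, z)
    return total


def inner_product_separable(f_factors, g_factors, nodes, weights):
    """Same quadrature when f and g factorise: three 1D sums, O(n) work."""
    result = 1.0
    for f1, g1 in zip(f_factors, g_factors):
        result *= sum(w * f1(t) * g1(t) for t, w in zip(nodes, weights))
    return result


# Hypothetical separable test functions (assumptions for illustration only):
# f(x, y, z) = sin(x) cos(y) exp(z),  g(x, y, z) = x (1 + y) z^2
f_factors = (math.sin, math.cos, math.exp)
g_factors = (lambda t: t, lambda t: 1.0 + t, lambda t: t * t)


def f(x, y, z):
    return f_factors[0](x) * f_factors[1](y) * f_factors[2](z)


def g(x, y, z):
    return g_factors[0](x) * g_factors[1](y) * g_factors[2](z)


nodes, weights = trapezoid_rule(21, 0.0, 1.0)
direct = inner_product_direct(f, g, nodes, weights)
separable = inner_product_separable(f_factors, g_factors, nodes, weights)
print(direct, separable)
```

The two results agree to floating-point rounding because the triple sum factorises exactly when both the integrand and the quadrature weights are products of one-dimensional terms. The domain described in the abstract is non-rectangular and sparse, so the real calculation is more involved than this rectangular toy case.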

Keywords

Cosmology, Xeon Phi, Many-core, Nested parallelism

Journal Title

Journal of Computational Physics

Journal ISSN

0021-9991
1090-2716

Volume

310

Publisher

Elsevier

Sponsorship

Science and Technology Facilities Council (ST/L000636/1)
Science and Technology Facilities Council (ST/J005673/1)
Science and Technology Facilities Council (ST/K00333X/1)
Science and Technology Facilities Council (ST/M00418X/1)
Science and Technology Facilities Council (ST/M007065/1)
Science and Technology Facilities Council (ST/H008586/1)

This research is supported by STFC consolidated grant ST/L000636/1 and funded in part by the Intel® Parallel Computing Centre program. This work was undertaken on the COSMOS Shared Memory system at DAMTP, University of Cambridge, operated on behalf of the STFC DiRAC HPC Facility. This equipment is funded by BIS National E-infrastructure capital grant ST/J005673/1 and STFC grants ST/H008586/1 and ST/K00333X/1.