Parallelisation of greedy algorithms for compressive sensing reconstruction
Compressive Sensing (CS) is a technique that allows a signal to be compressed at the same time as it is captured. The capture-and-compress step is modelled as linear sampling, which can represent a variety of physical processes or signal-processing operations. Rather than explicitly identifying redundancies in the source signal, CS exploits the signal's sparsity in order to reconstruct it from the compressed samples. While linear sampling is far less burdensome than conventional compression, this saving is offset by the high computational cost of reconstructing a signal captured using CS. Even some of the fastest reconstruction techniques, known as greedy pursuits, can pose a significant burden on large problems, consuming a great deal of memory as well as compute time.
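As a concrete illustration of the linear-sampling model described above, the following sketch captures a sparse signal with a random Gaussian sampling matrix; the dimensions and matrix choice here are illustrative assumptions, not parameters taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 256, 64, 5                 # signal length, number of samples, sparsity

# A k-sparse source signal: only k of its n entries are non-zero.
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

# Linear sampling: one matrix-vector product captures and compresses at once.
A = rng.standard_normal((m, n)) / np.sqrt(m)   # sampling matrix
y = A @ x                                      # the m compressed samples

print(y.shape)   # (64,) — far fewer values than the 256-sample signal
```

Reconstruction then amounts to recovering the sparse `x` from `y` and `A`, which is where the computational cost lies.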
Parallel computing is the foundation of the field of High Performance Computing (HPC). Modern supercomputers are generally composed of large clusters of standard servers, connected by a dedicated low-latency, high-bandwidth interconnect. On such a cluster, an appropriately written program can harness vast quantities of memory and computational power. However, in order to exploit a parallel compute resource, an algorithm usually has to be redesigned from the ground up. In this thesis I describe the development of parallel variants of two greedy algorithms commonly used in CS reconstruction, Matching Pursuit (MP) and Orthogonal Matching Pursuit (OMP), resulting in the new distributed algorithms DistMP and DistOMP. I present results from experiments showing how DistMP and DistOMP can utilise a compute cluster to solve CS problems much more quickly than a single computer could alone. A speed-up of up to a factor of 76 is observed with DistMP when utilising 210 workers across 14 servers, compared to a single worker. Finally, I demonstrate how DistOMP can solve a problem with a 429 GB equivalent sampling matrix in as little as 62 minutes using a 16-node compute cluster.
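For background, a minimal serial OMP can be sketched as follows: at each step, select the sampling-matrix column most correlated with the current residual, then re-fit all selected columns by least squares. This is an illustrative single-worker sketch only, not the DistOMP implementation developed in the thesis; the problem sizes and random data are assumptions.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily recover a k-sparse x with y ≈ A x."""
    r = y.copy()                     # residual starts as the full measurement
    support = []                     # indices of selected columns ("atoms")
    for _ in range(k):
        # Select the column most correlated with the residual.
        j = int(np.argmax(np.abs(A.T @ r)))
        if j not in support:
            support.append(j)
        # Least-squares fit over the current support, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        r = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

# Small synthetic example (dimensions chosen for illustration).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
A /= np.linalg.norm(A, axis=0)       # unit-norm columns
x_true = np.zeros(50)
x_true[[5, 17, 33]] = [1.5, -2.0, 0.7]
y = A @ x_true
x_hat = omp(A, y, 3)
```

MP differs in that it skips the least-squares re-fit, updating only the coefficient of the newly selected atom; the per-iteration correlation step `A.T @ r` is the dominant cost that a distributed variant can split across workers.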