Guided Automatic Binary Parallelisation
For decades, the software industry has amassed a vast repository of pre-compiled libraries and executables which are still valuable and actively in use. However, for a significant fraction of these binaries, most of the source code is absent or is written in old languages, making it practically impossible to recompile them for new generations of hardware. As the number of cores in chip multi-processors (CMPs) continue to scale, the performance of this legacy software becomes increasingly sub-optimal. Rewriting new optimised and parallel software would be a time-consuming and expensive task. Without source code, existing automatic performance enhancing and parallelisation techniques are not applicable for legacy software or parts of new applications linked with legacy libraries.
In this dissertation, three tools are presented to address the challenge of optimising legacy binaries. The first, GBR (Guided Binary Recompilation), is a tool that recompiles stripped application binaries without the need for the source code or relocation information. GBR performs static binary analysis to determine how recompilation should be undertaken, and produces a domain-specific hint program. This hint program is loaded and interpreted by the GBR dynamic runtime, which is built on top of the open-source dynamic binary translator, DynamoRIO. In this manner, complicated recompilation of the target binary is carried out to achieve optimised execution on a real system. The problem of limited dataflow and type information is addressed through cooperation between the hint program and JIT optimisation. The utility of GBR is demonstrated by software prefetch and vectorisation optimisations to achieve performance improvements compared to their original native execution.
The second tool is called BEEP (Binary Emulator for Estimating Parallelism), an extension to GBR for binary instrumentation. BEEP is used to identify potential thread-level parallelism through static binary analysis and binary instrumentation. BEEP performs preliminary static analysis on binaries and encodes all statically-undecided questions into a hint program. The hint program is interpreted by GBR so that on-demand binary instrumentation codes are inserted to answer the questions from runtime information. BEEP incorporates a few parallel cost models to evaluate identified parallelism under different parallelisation paradigms.
The third tool is named GABP (Guided Automatic Binary Parallelisation), an extension to GBR for parallelisation. GABP focuses on loops from sequential application binaries and automatically extracts thread-level parallelism from them on-the-fly, under the direction of the hint program, for efficient parallel execution. It employs a range of runtime schemes, such as thread-level speculation and synchronisation, to handle runtime data dependences. GABP achieves a geometric mean of speedup of 1.91x on binaries from SPEC CPU2006 on a real x86-64 eight-core system compared to native sequential execution. Performance is obtained for SPEC CPU2006 executables compiled from a variety of source languages and by different compilers.