
COMET: Communication-optimised multi-threaded error-detection technique

Accepted version
Peer-reviewed

Type

Conference Object

Authors

Mitropoulou, K 
Porpodas, V 
Jones, TM 

Abstract

© 2016 ACM. Relentless technology scaling has made transistors more vulnerable to soft, or transient, errors. To keep systems robust against these errors, current error-detection techniques use different types of redundancy at the hardware or software level. A consequence of these additional protection mechanisms is that the protected systems tend to run slower. In particular, software error-detection techniques degrade performance considerably, limiting their uptake. This paper focuses on software redundant multi-threading error detection, a compiler-based technique that uses redundant cores within a multi-core system to perform error checking. Implementations of this scheme feature two threads that execute almost the same code: the main thread runs the original code and the checker thread executes code to verify the correctness of the original. The main thread communicates the values that require checking to the checker thread, which uses them in its comparisons. We identify a major performance bottleneck in existing schemes: poorly performing inter-core communication and the code generated for it. Our study shows that this bottleneck severely limits the performance of existing techniques, because the two threads require extremely fine-grained communication, on the order of once every few instructions. We alleviate this bottleneck with a series of compiler-level code-generation optimisations. We propose COMET (Communication-Optimised Multi-threaded Error-detection Technique), which improves performance across the NAS Parallel Benchmarks by 31.4% on average compared to the state of the art, without affecting fault coverage.
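The main/checker scheme described in the abstract can be illustrated with a minimal sketch. It assumes that two Python threads stand in for the two cores and that a `queue.Queue` stands in for the inter-core communication channel. The names `main_thread`, `checker_thread`, `run` and the `fault_at` parameter (which simulates a transient bit flip) are hypothetical. This is not COMET's compiler-generated code and does not show its optimisations. It only shows the baseline pattern in which every checked value is sent individually, which is the fine-grained communication the abstract identifies as the bottleneck.

```python
import queue
import threading


def main_thread(data, q, out, fault_at=None):
    """Runs the 'original' computation and forwards each value needing a check."""
    total = 0
    for i, x in enumerate(data):
        total += x * x
        if i == fault_at:
            total ^= 1  # hypothetical fault injection: flip the lowest bit
        q.put(total)  # one send per checked value (fine-grained communication)
    out["result"] = total


def checker_thread(data, q, status):
    """Redundantly recomputes each value and compares it with the one received."""
    total = 0
    for i, x in enumerate(data):
        total += x * x
        received = q.get()  # blocks until the main thread has sent the value
        if received != total:
            status["detected_at"] = i  # mismatch: report the error and stop checking
            return


def run(data, fault_at=None):
    """Returns (main thread's result, index of first detected mismatch or None)."""
    q = queue.Queue()  # unbounded FIFO standing in for the inter-core channel
    out = {}
    status = {"detected_at": None}
    threads = [
        threading.Thread(target=main_thread, args=(data, q, out, fault_at)),
        threading.Thread(target=checker_thread, args=(data, q, status)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out["result"], status["detected_at"]


if __name__ == "__main__":
    print(run([1, 2, 3, 4]))              # (30, None): no fault, no mismatch
    print(run([1, 2, 3, 4], fault_at=2))  # (31, 2): fault caught at the third value
```

Because each `put`/`get` pair synchronises the two threads, the per-value communication cost grows with the number of checked values. This is a small-scale analogue of the overhead that the paper targets with compiler-level optimisations.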

Keywords

Error Detection, Soft Errors, Communication Optimisations, Code Generation

Journal Title

Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems, CASES 2016

Conference Name

International Conference on Compilers, Architectures and Synthesis for Embedded Systems, CASES

Journal ISSN

2381-1560

Publisher