ParaDox: Eliminating Voltage Margins via Heterogeneous Fault Tolerance.

Accepted version
Peer-reviewed

Type

Conference Object

Authors

Ainsworth, Sam 
Zoubritzky, Lionel 

Abstract

Providing reliability is becoming a challenge for chip manufacturers, who must simultaneously improve miniaturization, performance and energy efficiency. The result is very large margins on voltage and frequency, designed to avoid errors even in the worst case, along with significant hardware expenditure on eliminating voltage spikes and other forms of transient error. Both cause considerable inefficiency in power consumption and performance. We invert traditional ideas about reliability and performance by exploring the use of error resilience for power and performance gains. ParaMedic is a recent architecture that provides reliability with low overheads via automatic hardware error recovery. It works by splitting checking across many small cores in a heterogeneous multicore system with hardware logging support. However, its design assumes that errors are exceptional. We transform ParaMedic into ParaDox, which shows high performance in both error-intensive and scarce-error scenarios, thus allowing correct execution even when undervolted and overclocked. Evaluation within error-intensive simulation environments confirms the error resilience of ParaDox and its low recovery cost. We estimate that, compared to a non-resilient system with margins, ParaDox can reduce energy-delay product by 15% through undervolting, while completely recovering from any induced errors.

Keywords

fault tolerance, microarchitecture, error detection, voltage margins

Journal Title

2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA)

Conference Name

2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA)

Journal ISSN

2378-203X

Publisher

IEEE

Rights

All rights reserved

Sponsorship

Engineering and Physical Sciences Research Council (EP/K026399/1)
Engineering and Physical Sciences Research Council (EP/P020011/1)

Relationships
Is supplemented by: