Parallel error detection using heterogeneous cores

Accepted version
Peer-reviewed

Type

Conference Object

Authors

Jones, TM 

Abstract

Microprocessor error detection is increasingly important, as the number of transistors in modern systems heightens their vulnerability. In addition, many modern workloads in domains such as the automotive and health industries are increasingly error-intolerant, due to strict safety standards. However, current detection techniques require duplication of all hardware structures, causing a considerable increase in power consumption and chip area. Solutions in the literature involve running the code multiple times on the same hardware, which reduces performance significantly and cannot capture all errors.

We have designed a novel hardware-only solution for error detection that exploits parallelism in checking code which may not exist in the original execution. We pair a high-performance out-of-order core with a set of small low-power cores, each of which checks a portion of the out-of-order core's execution. Our system enables the detection of both hard and soft errors, with low area, power and performance overheads.

Keywords

fault tolerance, microarchitecture, error detection

Journal Title

Proceedings - 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2018

Conference Name

2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Journal ISSN

1530-0889

Publisher

IEEE

Sponsorship

EPSRC (1510365)
Engineering and Physical Sciences Research Council (EP/K026399/1)
Engineering and Physical Sciences Research Council (EP/M506485/1)
Engineering and Physical Sciences Research Council (EP/J016284/1)
This work was supported by the Engineering and Physical Sciences Research Council (EPSRC), through grant references EP/K026399/1 and EP/M506485/1, and Arm Ltd.