Speculative Vectorization with Selective Replay

Sun, Peng

Repository URI

https://www.repository.cam.ac.uk/handle/1810/312445

Repository DOI

https://doi.org/10.17863/CAM.59537

Type

Thesis

Authors

Sun, Peng

Abstract

Vector architectures, once the mainstay of supercomputers, have become popular again since short vector instruction sets were introduced into general-purpose processors in the 1990s and successively extended in later architecture revisions. Single Instruction Multiple Data (SIMD) execution, in which a single instruction operates on multiple data items, offers higher performance and lower energy consumption. Unlike scalar execution, which must fetch, decode and execute one instruction per data operation, SIMD execution needs only a single fetch and decode to process multiple data operations in parallel, amortizing their cost. However, modern processors use SIMD execution to accelerate only a limited set of applications; other, general-purpose applications seldom take advantage of the SIMD functionality that modern processors implement. As processors devote ever more hardware resources to SIMD execution, significant performance is left on the table if those resources remain underutilized.

This thesis first uses a program-analysis tool to investigate what prevents programs from being vectorized. The analysis shows that modern SIMD architectures still rely on the programmer or compiler to transform code into vector form, and only when doing so is provably safe. Because a compiler's memory alias analysis is limited, and because memory data dependences may exist even when they occur only infrequently, whole regions of code cannot be vectorized without risking a change to the application's semantics, which restricts the performance that can be obtained.
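
As an illustration (not code from the thesis), the Python sketch below models why such a loop resists static vectorization. It assumes a hypothetical loop `a[i + k] = a[i] + b[i]` whose distance `k` is known only at run time, and a vector length `VL` of 4; the vector version is modelled as a chunk of loads followed by a chunk of stores. A compiler that cannot rule out 0 < k < VL cannot safely emit the vector version unconditionally.

```python
VL = 4  # assumed vector length (elements per SIMD register); illustrative only


def scalar_loop(a, b, k):
    """Reference semantics of: for i in 0..n-1: a[i + k] = a[i] + b[i].
    Assumes k >= 0 and len(a) >= len(b) + k."""
    a = list(a)
    for i in range(len(b)):
        a[i + k] = a[i] + b[i]
    return a


def naive_vector_loop(a, b, k):
    """The same loop executed VL elements at a time: all loads of a chunk
    happen before any of its stores, like a vector load then a vector store."""
    a = list(a)
    n = len(b)
    for base in range(0, n, VL):
        lanes = range(base, min(base + VL, n))
        loaded = [a[i] for i in lanes]                      # vector load of a[i]
        summed = [x + b[i] for x, i in zip(loaded, lanes)]  # vector add
        for i, v in zip(lanes, summed):                     # vector store to a[i + k]
            a[i + k] = v
    return a
```

With `a = [0] * 12`, `b = [1] * 8` and `k = 4`, both versions give the same result, because no store in a chunk targets an address loaded later in that chunk. With `k = 1`, the scalar loop produces `[0, 1, 2, ..., 8]` while the vector version produces `[0, 1, 1, 1, 1, 2, 1, 1, 1]`.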

Based on these findings, this thesis presents a new SIMD architecture that relies on hardware speculation to detect memory-dependence violations that occur during vector execution. The compiler marks and vectorizes code regions, usually loops, that may contain dependences, and the architecture executes this vectorized code while preserving the program's semantics. The memory-disambiguation mechanism, implemented as an extension of the load-store queue in a conventional superscalar processor, detects and resolves possible dependence violations. The compiler's code-generation process is modified accordingly, so that it vectorizes code regions without first having to guarantee that vectorization preserves their semantics. This hardware-software co-design allows more code regions to be vectorized speculatively, thereby increasing vectorization coverage.
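
A minimal Python model of speculative vectorization for the same hypothetical loop `a[i + k] = a[i] + b[i]` is sketched below. It is an illustration under stated assumptions, not the thesis's hardware: each `VL`-wide chunk performs its loads and additions speculatively, a check standing in for the extended load-store queue flags the oldest lane whose load missed an older lane's store in the same chunk, the lanes before it commit, and execution restarts from the violating lane. Replaying from the first violating lane is one simple recovery policy assumed here; the abstract does not describe the thesis's selective-replay policy in detail.

```python
VL = 4  # assumed vector length; illustrative only


def speculative_vector_loop(a, b, k):
    """Speculatively vectorize a[i + k] = a[i] + b[i] (assumes k >= 0 and
    len(a) >= len(b) + k). Returns (final array, number of replays)."""
    a = list(a)
    n = len(b)
    base = 0
    replays = 0
    while base < n:
        lanes = list(range(base, min(base + VL, n)))
        load_addr = lanes                       # lane i loads a[i]
        store_addr = [i + k for i in lanes]     # lane i stores a[i + k]
        # Speculative vector load and add, ignoring possible dependences.
        values = [a[addr] + b[i] for addr, i in zip(load_addr, lanes)]
        # Dependence check standing in for the extended load-store queue:
        # lane j is violated if an older lane in the chunk stores to the
        # address that lane j loaded (a read-after-write the load missed).
        first_bad = len(lanes)
        for j in range(1, len(lanes)):
            if load_addr[j] in store_addr[:j]:
                first_bad = j
                break
        # Commit the lanes that read correct values, oldest first.
        for j in range(first_bad):
            a[store_addr[j]] = values[j]
        if first_bad < len(lanes):
            replays += 1                        # replay from the violating lane
        # The first lane of a chunk is never violated, so this always advances.
        base += first_bad
    return a, replays
```

With `a = [0] * 12`, `b = [1] * 8` and `k = 4`, the chunks never conflict and no replay occurs. With `k = 1`, every chunk except the last commits only its first lane before replaying, which illustrates why speculation pays off when dependences are infrequent rather than pervasive.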

Finally, to optimize the new SIMD architecture, a new instruction is proposed that augments memory-access instructions with extra tags carrying dependence information from the compiler. With this information, only the instructions that the compiler identifies as possibly causing dependence violations take part in the memory-disambiguation process. This optimization removes unnecessary runtime memory disambiguation wherever the compiler can perform the disambiguation statically.
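
The effect of the compiler-provided tags can be sketched in the same style. The `MemOp` record and its boolean `check` field are hypothetical stand-ins for the proposed tagged memory instructions, whose encoding the abstract does not specify; the sketch assumes that an untagged access is guaranteed by the compiler never to conflict, so it can skip the runtime dependence check. The chunk modelled is one `VL = 4` chunk of a hypothetical loop `a[i + 1] = a[i] + b[i]`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    """One memory access inside a vector chunk (hypothetical model)."""
    lane: int        # position in program order within the chunk
    is_store: bool
    addr: int
    check: bool      # hypothetical compiler tag: True if the access may conflict


def find_violations(ops):
    """Return (sorted violating load lanes, number of address comparisons).

    Only accesses tagged check=True enter disambiguation; untagged accesses
    are assumed, on the compiler's guarantee, never to be part of a conflict.
    A load is violated if an older lane's store in the chunk wrote its address.
    """
    tracked = [op for op in ops if op.check]
    stores = [op for op in tracked if op.is_store]
    loads = [op for op in tracked if not op.is_store]
    violated = set()
    comparisons = 0
    for ld in loads:
        for st in stores:
            if st.lane < ld.lane:          # only older stores can feed this load
                comparisons += 1
                if st.addr == ld.addr:
                    violated.add(ld.lane)
    return sorted(violated), comparisons


def example_chunk(tag_b_loads):
    """Accesses of one VL = 4 chunk of a[i + 1] = a[i] + b[i].

    Hypothetical layout: array a starts at address 0 and b at address 100,
    so the b loads never alias a store.
    """
    ops = []
    for i in range(4):
        ops.append(MemOp(i, False, i, True))               # load a[i]: may alias a store
        ops.append(MemOp(i, False, 100 + i, tag_b_loads))  # load b[i]
        ops.append(MemOp(i, True, i + 1, True))            # store a[i + 1]
    return ops
```

For this chunk, leaving the `b[i]` loads untagged halves the number of address comparisons (from 12 to 6) while still detecting the same violating lanes, 1, 2 and 3.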

Date

2019-09-30

Advisors

Jones, Timothy

Keywords

Vectorization, Speculation

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Collections

Theses - Computer Science and Technology