Randomized Testing of RISC-V CPUs Using Direct Instruction Injection

This article presents a randomized testing framework for RISC-V implementations that uses a technique called direct instruction injection to inject tests.


Introduction
TestRIG (Testing with Random Instruction Generation) is a testing framework for RISC-V implementations. The RISC-V community has standardized a formal model of the architecture in the Sail language [1], giving a human-readable specification that can also be used for simulation and verification. Ideally, a RISC-V implementor could formally prove equivalence between their implementation and the Sail model, but proof tools are not yet sufficiently automated to be routinely used at the whole-processor level. As a pragmatic compromise, we use TestRIG to check equivalence between the model and an implementation by generating random instruction sequences, executing the same sequences on the model and the implementation under test, and comparing execution traces (tandem execution). This approach does not prove equivalence but can demonstrate divergence, and is usable in all stages of development.
TestRIG uses the RISC-V Formal Interface (RVFI) standard to observe the change in state after each instruction of the implementation under test, and uses a novel technique that we call Direct Instruction Injection (DII) for test injection. In normal program execution, the next instruction is fetched from program memory at an address determined by the program counter. With Direct Instruction Injection, the next instruction to be executed is provided by the test harness, regardless of the CPU's program counter.
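The difference between fetched and injected execution can be illustrated with a toy register machine. This is only a sketch: `ToyCore`, its two operations, and the helper names are hypothetical and do not model RISC-V or any real TestRIG component.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCore:
    """Hypothetical toy machine with four registers; not a RISC-V model."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 4)

    def execute(self, instr):
        """Execute one (op, a, b) instruction and update the pc."""
        op, a, b = instr
        if op == "addi":        # regs[a] += b
            self.regs[a] = (self.regs[a] + b) & 0xFFFFFFFF
            self.pc += 4
        elif op == "jump":      # pc-relative jump by b bytes
            self.pc += b
        else:
            raise ValueError(f"unknown op {op!r}")


def run_fetched(core, memory, steps):
    """Conventional execution: the pc selects the next instruction."""
    executed = []
    for _ in range(steps):
        instr = memory[core.pc]
        core.execute(instr)
        executed.append(instr)
    return executed


def run_injected(core, stream):
    """DII-style execution: the harness supplies each instruction in order."""
    executed = []
    for instr in stream:
        core.execute(instr)     # the pc is still updated, but never used to fetch
        executed.append(instr)
    return executed


program = [("addi", 1, 5), ("jump", 0, 8), ("addi", 2, 7), ("addi", 3, 9)]
memory = {4 * i: instr for i, instr in enumerate(program)}

fetched = run_fetched(ToyCore(), memory, steps=3)   # the jump skips program[2]
injected = run_injected(ToyCore(), program)         # every instruction runs
```

In the fetched run the jump causes the third instruction to be skipped, whereas the injected run executes exactly the sequence the harness supplied, in order.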
We are not testing completed, fabricated chips. Rather, we are comparing executable formal models, software ISA simulators and simulated execution of hardware designs. This requires us to instrument the CPU design with an additional interface for Direct Instruction Injection used by the test harness during tandem verification.
We have added the Direct Instruction Injection interface to the Sail RISC-V formal model, and to two high-performance emulators: Spike and QEMU. We have also instrumented four RISC-V processor implementations with RVFI-DII, spanning from embedded to superscalar. We have used TestRIG to test many standard RISC-V extensions, and the experimental CHERI security extension.
We found TestRIG to be easier to use than unit tests, since instructions can be tested as they are implemented without supporting a full testing framework. We also found that TestRIG gave more thorough test coverage, because random generation replaces developer effort in exploring possibilities. It is effective at detecting issues not just in instruction semantics, but also in the pipeline and the data caches. As a result, TestRIG has completely replaced our instruction-set level unit testing for development.

TestRIG framework
TestRIG is designed as a modular ecosystem: an interactive Verification Engine (VEngine) stimulates RISC-V implementations over RVFI-DII sockets. An RVFI-DII compatible RISC-V implementation can reset, consume instruction sequences, and report execution traces via its RVFI-DII interface.
A VEngine can drive one or more RVFI-DII compatible implementations; a VEngine might have an internal RISC-V model, or could drive two independent implementations and compare their RVFI traces, as we have done with QCVEngine. VEngine instruction sequences could be loaded from disk, generated randomly, or produced with interactive architecture-driven state-space exploration.
The RVFI-DII bytestream interface allows models and implementations written in various languages to communicate through widely supported networking sockets. QCVEngine is written in Haskell, and the Sail RISC-V model is written in Sail (offering OCaml and C backends). Spike and QEMU are RISC-V simulators written in C and C++. Hardware implementations that support RVFI-DII, including RVBS, Ibex, Piccolo, Flute, and RiscyOO, are written in either SystemVerilog or Bluespec, although this is not required for TestRIG.
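A language-neutral byte stream of this kind can be sketched as fixed-size packets sent over a socket. The 8-byte layout, the command codes, and all function names below are assumptions made for illustration; they are not the actual RVFI-DII wire format.

```python
import socket
import struct

# Hypothetical packet layout (NOT the real RVFI-DII format):
# 1 command byte, 3 padding bytes, 32-bit little-endian instruction word.
PACKET = struct.Struct("<B3xI")
CMD_RESET, CMD_EXECUTE = 0, 1   # assumed command codes


def encode(cmd, insn=0):
    """Serialize one injection command."""
    return PACKET.pack(cmd, insn)


def decode(data):
    """Parse one injection command into (cmd, insn)."""
    return PACKET.unpack(data)


def recv_exact(sock, size):
    """Read exactly `size` bytes, since a stream socket may return short reads."""
    buf = b""
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def demo():
    """Send a reset and one instruction from a harness end to an implementation end."""
    harness, impl = socket.socketpair()
    with harness, impl:
        harness.sendall(encode(CMD_RESET))
        harness.sendall(encode(CMD_EXECUTE, 0x00500093))   # addi x1, x0, 5
        return [decode(recv_exact(impl, PACKET.size)) for _ in range(2)]
```

Because both ends only agree on bytes, the harness and the implementation can be written in unrelated languages, which is the property the socket interface is meant to provide.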

RVFI-DII
To participate in the TestRIG verification ecosystem, implementations must be extended with RVFI-DII instrumentation. The RISC-V Formal Interface (RVFI), specified by Claire Wolf, is an existing trace format for formal verification using symbolic instructions. RVFI exposes select architecturally significant signals such as the instruction encoding and any memory address or value, as well as the indices and values of the operand and writeback registers.
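As a rough illustration, one trace entry carrying the signals listed above can be modelled as a record, with a helper that reports which fields differ between two implementations. The field names loosely follow RVFI signal names, but the selection of fields and the `differing_fields` helper are assumptions of this sketch.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TraceRecord:
    """One retired instruction as seen in the trace (illustrative subset of fields)."""
    insn: int           # instruction encoding
    rs1_addr: int       # operand register indices
    rs2_addr: int
    rs1_rdata: int      # operand register values
    rs2_rdata: int
    rd_addr: int        # writeback register index
    rd_wdata: int       # writeback register value
    mem_addr: int = 0   # memory address, if the instruction accessed memory
    mem_wdata: int = 0  # value written to memory, if any


def differing_fields(a, b):
    """Return the names of the fields on which two trace records disagree."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


# add x3, x1, x2 with x1 = 2 and x2 = 3, as reported by two implementations.
model = TraceRecord(insn=0x002081B3, rs1_addr=1, rs2_addr=2,
                    rs1_rdata=2, rs2_rdata=3, rd_addr=3, rd_wdata=5)
device = TraceRecord(insn=0x002081B3, rs1_addr=1, rs2_addr=2,
                     rs1_rdata=2, rs2_rdata=3, rd_addr=3, rd_wdata=6)

mismatch = differing_fields(model, device)
```

Reporting the mismatching field names, rather than a bare pass/fail, points directly at the part of the architectural state where the two implementations diverge.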
TestRIG extends RVFI with Direct Instruction Injection (DII). DII is for instruction input, RVFI is for trace output, and RVFI-DII supports full interactive verification. DII directly specifies the instruction sequence expected in the output trace, and does not associate instructions with memory addresses. This requires custom pipeline instrumentation, but enables greatly simplified sequence generation and shrinking, as the program counter does not affect the instruction stream.
Existing RISC-V cores that implement RVFI can be augmented to participate in the TestRIG ecosystem by implementing DII, and conversely RVFI-DII designs may benefit from RVFI formal verification tooling.

QuickCheck VEngine
Our TestRIG Verification Engine, QCVEngine, leverages Haskell's QuickCheck library [2]. Due to the simplicity of DII execution, which decouples the instruction stream from control flow, QCVEngine can use unmodified QuickCheck utilities to generate, compare, and shrink instruction sequences.
QuickCheck receives a function with a pass/fail return value, and generates inputs in search of a failure. To facilitate this, we construct a function that receives a list of instructions, sends these over two DII sockets, collects RVFI traces back from these sockets, asserts they match, and returns the result.
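The shape of such a pass/fail function and the surrounding search can be sketched in plain Python, standing in for QuickCheck. Everything here is hypothetical: the two "implementations" are in-process toy models rather than sockets, and one of them contains a deliberately planted bug.

```python
import random

MASK = 0xFFFFFFFF


def run(instrs, buggy=False):
    """Execute (op, rd, rs, imm) tuples on four registers; return one trace entry each."""
    regs = [0, 0, 0, 0]
    trace = []
    for op, rd, rs, imm in instrs:
        if op == "addi":
            value = (regs[rs] + imm) & MASK
        elif op == "sub":
            value = (regs[rd] - regs[rs]) & MASK
            if buggy:
                value = (regs[rs] - regs[rd]) & MASK   # planted bug: operands swapped
        else:
            raise ValueError(f"unknown op {op!r}")
        regs[rd] = value
        trace.append((op, rd, value))
    return trace


def prop_traces_match(instrs):
    """The pass/fail function: run both models on the same sequence, compare traces."""
    return run(instrs) == run(instrs, buggy=True)


def random_sequence(rng, length):
    """Generate a random instruction sequence for the toy machine."""
    return [(rng.choice(["addi", "sub"]), rng.randrange(4), rng.randrange(4),
             rng.randint(-8, 8)) for _ in range(length)]


def find_counterexample(prop, rng, tests=200, length=6):
    """Generate inputs until the property fails; return the failing input or None."""
    for _ in range(tests):
        candidate = random_sequence(rng, length)
        if not prop(candidate):
            return candidate
    return None


counterexample = find_counterexample(prop_traces_match, random.Random(0))
```

The search itself knows nothing about processors: it only needs a generator and a boolean function, which is why an off-the-shelf property-testing library can drive it.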
We then provide a set of generators of arbitrary instruction sequences that are used by QuickCheck to produce inputs to this function. We use convenience functions to define instructions in a syntax closely resembling the RISC-V ISA manual, and provide tailored generators for each instruction field to promote register reuse. QuickCheck automatically uses these generators to construct arbitrary instruction sequences. We also provide targeted generators for simple subsets of the instruction set, as well as generators that leverage templates of varying complexity to reach deeper states, including virtual memory mappings and cache conflicts.
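A minimal sketch of this idea follows, using the standard RV32I R-type and I-type encodings for ADD, SUB, and ADDI. The helper names, the four-register pool, and the choice of boundary immediates are assumptions for illustration and are not taken from QCVEngine.

```python
import random


def r_type(funct7, rs2, rs1, funct3, rd, opcode):
    """Assemble an RV32I R-type instruction word."""
    return ((funct7 & 0x7F) << 25 | (rs2 & 0x1F) << 20 | (rs1 & 0x1F) << 15
            | (funct3 & 0x7) << 12 | (rd & 0x1F) << 7 | (opcode & 0x7F))


def i_type(imm, rs1, funct3, rd, opcode):
    """Assemble an RV32I I-type instruction word with a 12-bit immediate."""
    return ((imm & 0xFFF) << 20 | (rs1 & 0x1F) << 15
            | (funct3 & 0x7) << 12 | (rd & 0x1F) << 7 | (opcode & 0x7F))


# Convenience constructors with operands in ISA-manual order.
def add(rd, rs1, rs2):
    return r_type(0x00, rs2, rs1, 0b000, rd, 0x33)


def sub(rd, rs1, rs2):
    return r_type(0x20, rs2, rs1, 0b000, rd, 0x33)


def addi(rd, rs1, imm):
    return i_type(imm, rs1, 0b000, rd, 0x13)


REG_POOL = [0, 1, 2, 3]   # assumed small pool, so results are likely to be reused


def gen_reg(rng):
    """Field generator for register indices."""
    return rng.choice(REG_POOL)


def gen_imm(rng):
    """Field generator for 12-bit immediates, biased toward boundary values."""
    return rng.choice([0, 1, -1, 2047, -2048, rng.randint(-2048, 2047)])


def gen_instr(rng):
    """Pick one instruction and fill its fields from the field generators."""
    rd, rs1, rs2 = gen_reg(rng), gen_reg(rng), gen_reg(rng)
    builders = [
        lambda: add(rd, rs1, rs2),
        lambda: sub(rd, rs1, rs2),
        lambda: addi(rd, rs1, gen_imm(rng)),
    ]
    return rng.choice(builders)()


def gen_sequence(rng, length):
    """Generate a list of encoded instruction words."""
    return [gen_instr(rng) for _ in range(length)]
```

Drawing registers from a small pool makes it likely that one instruction reads a value written by an earlier one, so data dependencies and forwarding paths are exercised far more often than with registers drawn uniformly from all 32.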
We also develop a mechanism to allow semantic shrinking of counterexample traces, beyond QuickCheck's default of deleting instructions.
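The two kinds of shrinking can be contrasted on a toy example. Deletion-based shrinking mirrors the default behaviour described above; the immediate-halving step is only one hypothetical example of a semantic shrink and is not claimed to be QCVEngine's mechanism. The toy model and its planted bug are likewise assumptions.

```python
MASK = 0xFFFFFFFF


def run(instrs, buggy=False):
    """Toy model: (op, rd, rs, imm) tuples on four registers, returning a value trace."""
    regs = [0, 0, 0, 0]
    trace = []
    for op, rd, rs, imm in instrs:
        if op == "addi":
            regs[rd] = (regs[rs] + imm) & MASK
        elif buggy:                                # planted bug in "sub"
            regs[rd] = (regs[rs] - regs[rd]) & MASK
        else:
            regs[rd] = (regs[rd] - regs[rs]) & MASK
        trace.append(regs[rd])
    return trace


def fails(instrs):
    """True when the two models produce different traces."""
    return run(instrs) != run(instrs, buggy=True)


def shrink_by_deletion(seq, fails):
    """Drop single instructions for as long as the failure persists."""
    changed = True
    while changed:
        changed = False
        for i in range(len(seq)):
            candidate = seq[:i] + seq[i + 1:]
            if fails(candidate):
                seq, changed = candidate, True
                break
    return seq


def shrink_immediates(seq, fails):
    """Hypothetical semantic step: halve each immediate toward zero while still failing."""
    seq = list(seq)
    for i in range(len(seq)):
        op, rd, rs, imm = seq[i]
        while imm != 0:
            smaller = int(imm / 2)                 # truncates toward zero
            candidate = seq[:i] + [(op, rd, rs, smaller)] + seq[i + 1:]
            if not fails(candidate):
                break
            seq, imm = candidate, smaller
    return seq


failing = [("addi", 2, 2, 7), ("addi", 1, 1, 40), ("addi", 3, 3, 1),
           ("sub", 1, 2, 0), ("addi", 2, 2, 9)]
shrunk = shrink_immediates(shrink_by_deletion(failing, fails), fails)
```

Here deletion reduces five instructions to the two that matter, and the semantic step then simplifies the remaining immediate, leaving a counterexample that isolates the faulty subtraction.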

Evaluation
We have measured functional coverage of TestRIG over the Sail model compared to the RISC-V test suite and RISCV-DV, finding the coverage broadly comparable. Further work is needed to develop templates that cover the architecture more thoroughly. Due to trace shrinking, the counterexamples produced are orders of magnitude shorter than those produced by the other methods, significantly speeding up debugging cycles.
Several significant bugs have been discovered using QCVEngine and TestRIG, spanning architectural and microarchitectural errors in codebases of varying maturities. We will also discuss its application to debugging the CHERI security extension [3] and to measuring transient execution vulnerabilities.