Acceleration of Core Post-quantum Cryptography Primitive on Open-Source Silicon Platform Through Hardware/Software Co-design.
Accepted version
Peer-reviewed
Abstract
Post-Quantum Cryptography (PQC) algorithms are currently being standardised, and their early implementations are not as efficient as the well-established public key cryptography (PKC) algorithms that have benefited from decades of optimisation. We report on our efforts to accelerate the Number Theoretic Transform (NTT), the most computationally expensive primitive in the Kyber (ML-KEM) and Dilithium (ML-DSA) PQC algorithms selected by NIST for standardisation. Our target platform is the OpenTitan Big Number Accelerator (OTBN), part of the first open-source silicon root-of-trust chip. We implemented the Kyber NTT in OTBN assembly, using only the existing instructions, and identified its bottlenecks. We then restructured the code to exploit parallelism and defined additional assembly instructions for the open-source co-processor that would enable execution of our vectorised implementation. Our hardware/software co-design approach yielded a significant performance improvement: the NTT ran 21.1 times faster than the baseline implementation, which used only OTBN's existing instructions. Our approach fully leverages the potential for parallelism and maximally exploits the existing capabilities of OTBN. Some of our optimisations are fairly general and might be successfully applied in other contexts, including accelerating other algorithms on other platforms.
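For readers unfamiliar with the primitive, the following is a minimal scalar sketch of the Kyber NTT in Python, assuming the standard ML-KEM parameters (q = 3329, 256 coefficients, root of unity 17) and the layered butterfly structure of the FIPS 203 reference algorithms. It is not the OTBN assembly described in the abstract, and the function names (`bitrev7`, `ntt`, `inv_ntt`) are chosen here purely for illustration. Within each layer the butterflies in the inner loop are independent of one another, which is the kind of parallelism a vectorised implementation can exploit.

```python
Q = 3329      # ML-KEM prime modulus
N = 256       # number of polynomial coefficients
ZETA = 17     # primitive 256th root of unity modulo Q


def bitrev7(i):
    """Reverse the 7-bit binary representation of i."""
    return int(format(i, "07b")[::-1], 2)


def ntt(f):
    """Forward NTT: seven layers of Cooley-Tukey butterflies."""
    f = list(f)
    i = 1
    length = 128
    while length >= 2:
        for start in range(0, N, 2 * length):
            zeta = pow(ZETA, bitrev7(i), Q)
            i += 1
            # Each butterfly below touches a distinct pair (j, j + length),
            # so all butterflies in this block are independent.
            for j in range(start, start + length):
                t = zeta * f[j + length] % Q
                f[j + length] = (f[j] - t) % Q
                f[j] = (f[j] + t) % Q
        length //= 2
    return f


def inv_ntt(f):
    """Inverse NTT: Gentleman-Sande butterflies, then scaling by 128^-1."""
    f = list(f)
    i = 127
    length = 2
    while length <= 128:
        for start in range(0, N, 2 * length):
            zeta = pow(ZETA, bitrev7(i), Q)
            i -= 1
            for j in range(start, start + length):
                t = f[j]
                f[j] = (t + f[j + length]) % Q
                f[j + length] = zeta * (f[j + length] - t) % Q
        length *= 2
    return [x * 3303 % Q for x in f]  # 3303 is 128^-1 mod 3329
```

Applying `inv_ntt(ntt(f))` to any list of 256 coefficients in the range 0 to 3328 returns the original list, which is a convenient sanity check for an optimised implementation.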