General hardware multicasting for fine-grained message-passing architectures

Conference Object
Naylor, M 
Moore, SW 
Thomas, D 
Beaumont, JR 
Fleming, S 

Manycore architectures are increasingly favouring message-passing or partitioned global address spaces (PGAS) over cache coherency for reasons of power efficiency and scalability. However, in the absence of cache coherency, there can be a lack of hardware support for one-to-many communication patterns, which are prevalent in some application domains. To address this, we present new hardware primitives for multicast communication in rack-scale manycore systems. These primitives guarantee delivery to both colocated and distributed destinations, and can capture large unstructured communication patterns precisely. As a result, reliable multicast transfers among any number of software tasks, connected in any topology, can be fully offloaded to hardware. We implement the new primitives in a research platform consisting of 50K RISC-V threads distributed over 48 FPGAs, and demonstrate significant performance benefits on a range of applications expressed using a high-level vertex-centric programming model.

46 Information and Computing Sciences, 3301 Architecture, 33 Built Environment and Design
Journal Title
Proceedings - 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2021
Conference Name
2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)
All rights reserved
Engineering and Physical Sciences Research Council (EP/N031768/1)