Lynx: Using OS and Hardware Support for Fast Fine-Grained Inter-Core Communication
Proceedings of the 30th ACM on International Conference on Supercomputing
Association for Computing Machinery
Mitropoulou, K., Porpodas, V., Zhang, X., & Jones, T. M. (2016). Lynx: Using OS and Hardware Support for Fast Fine-Grained Inter-Core Communication. Proceedings of the 30th ACM on International Conference on Supercomputing, (18), 1-12. https://doi.org/10.1145/2925426.2926274
Designing high-performance software queues for fast inter-core communication is challenging, but critical for maximising software parallelism. State-of-the-art single-producer / single-consumer (SP/SC) queues for streaming applications are divided into multiple sections, so that the producer and the consumer can each work independently on a different section. While these queues perform well for coarse-grained data transfers, they perform poorly in the fine-grained case. This paper proposes Lynx, a novel SP/SC queue specifically tuned for fine-grained communication. Lynx is built from the ground up, reducing the generated code on the critical path to just two operations per enqueue and dequeue. To achieve this, it relies on existing commodity processor hardware and operating system exception handling support to deal with infrequent queue maintenance operations. Lynx outperforms the state of the art by up to 1.57× in total 64-bit throughput, reaching a peak throughput of 15.7GB/s on a common desktop system. Real applications using Lynx get a performance improvement of up to 1.4×.
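For readers unfamiliar with SP/SC queues, the following is a minimal sketch of a conventional ring-buffer queue in C11. It is not the Lynx implementation and is not taken from the paper; it only illustrates the kind of per-operation bookkeeping (index loads, full/empty checks, wrap-around masking) that the abstract says Lynx moves off the critical path. All names (spsc_queue, spsc_enqueue, spsc_dequeue, QUEUE_CAPACITY) are hypothetical and chosen for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity for illustration; must be a power of two so that
 * masking with (QUEUE_CAPACITY - 1) wraps indices correctly. */
#define QUEUE_CAPACITY 1024u

typedef struct {
    uint64_t slots[QUEUE_CAPACITY];
    _Atomic size_t head; /* next slot to read; written only by the consumer */
    _Atomic size_t tail; /* next slot to write; written only by the producer */
} spsc_queue;

static void spsc_init(spsc_queue *q) {
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side. Returns false if the queue is full.
 * Every call pays for two index loads, a bounds check and a mask
 * in addition to the store of the value itself. */
static bool spsc_enqueue(spsc_queue *q, uint64_t value) {
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == QUEUE_CAPACITY) {
        return false; /* full */
    }
    q->slots[tail & (QUEUE_CAPACITY - 1)] = value;
    /* Release so the consumer sees the slot contents before the new tail. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false if the queue is empty. */
static bool spsc_dequeue(spsc_queue *q, uint64_t *out) {
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail) {
        return false; /* empty */
    }
    *out = q->slots[head & (QUEUE_CAPACITY - 1)];
    /* Release so the producer can safely reuse the slot afterwards. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}
```

In this baseline, one thread calls spsc_enqueue and another calls spsc_dequeue, each retrying when the call returns false. When each transfer is a single 64-bit value, the checks and shared-index updates around the data store dominate the cost, which is the fine-grained overhead the abstract refers to.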
single-producer / single-consumer software queue, fine-grained communication, hardware exceptions
This work was supported by the Engineering and Physical Sciences Research Council (EPSRC), through grant reference EP/K026399/1.
External DOI: https://doi.org/10.1145/2925426.2926274
This record's URL: https://www.repository.cam.ac.uk/handle/1810/255384