Repository logo

Graph prefetching using data structure knowledge


Conference Object

Change log


Jones, TM 


Searches on large graphs are heavily memory latency bound, as a result of many high latency DRAM accesses. Due to the highly irregular nature of the access patterns involved, caches and prefetchers, both hardware and software, perform poorly on graph workloads. This leads to CPU stalling for the majority of the time. However, in many cases the data access pattern is well defined and predictable in advance, many falling into a small set of simple patterns. Although existing implicit prefetchers cannot bring significant benefit, a prefetcher armed with knowledge of the data structures and access patterns could accurately anticipate applications' traversals to bring in the appropriate data. This paper presents a design of an explicitly configured prefetcher to improve performance for breadth-first searches and sequential iteration on the efficient and commonly-used compressed sparse row graph format. By snooping L1 cache accesses from the core and reacting to data returned from its own prefetches, the prefetcher can schedule timely loads of data in advance of the application needing it. For a range of applications and graph sizes, our prefetcher achieves average speedups of 2.3x, and up to 3.3x, with little impact on memory bandwidth requirements.



33 Built Environment and Design, 3301 Architecture

Journal Title

Proceedings of the International Conference on Supercomputing

Conference Name

ICS '16: 2016 International Conference on Supercomputing

Journal ISSN

Volume Title


Engineering and Physical Sciences Research Council (EP/K026399/1)
EPSRC (1510365)
Engineering and Physical Sciences Research Council (EP/M506485/1)
This work was supported by the Engineering and Physical Sciences Research Council (EPSRC), through grant references EP/K026399/1 and EP/M506485/1, and ARM Ltd.