Accelerating Mobile Audio Sensing Algorithms through On-Chip GPU Offloading
MobiSys 2017 - Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services
MobiSys '17 - 15th Annual International Conference on Mobile Systems, Applications, and Services
Association for Computing Machinery
Georgiev, P., Laney, N., Mascolo, C., & Chu, D. (2017). Accelerating Mobile Audio Sensing Algorithms through On-Chip GPU Offloading. MobiSys 2017 - Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, 306-318. https://doi.org/10.1145/3081333.3081358
GPUs have recently enjoyed increased popularity as general-purpose software accelerators in multiple application domains, including computer vision and natural language processing. However, there has been little exploration into the performance and energy trade-offs mobile GPUs can deliver for the increasingly popular workload of deep-inference audio sensing tasks, such as spoken keyword spotting, in energy-constrained smartphones and wearables. In this paper, we study these trade-offs and introduce an optimization engine that leverages a series of structural and memory access optimization techniques that allow audio algorithm performance to be automatically tuned as a function of GPU device specifications and model semantics. We find that parameter-optimized audio routines obtain inferences an order of magnitude faster than sequential CPU implementations, and up to 6.5x faster than cloud offloading with good connectivity, while critically consuming 3-4x less energy than the CPU. With our optimized GPU implementation, conventional wisdom about how to use the cloud and low-power chips no longer holds. Unless the network has a throughput of at least 20 Mbps (and an RTT of 25 ms or less), the optimized GPU audio sensing apps begin to consume less energy than cloud offloading once only about 10 to 20 seconds of audio data are buffered for batched execution. Under such conditions we find the optimized GPU can provide energy benefits comparable to low-power reference DSP implementations with some preliminary level of optimization; in addition, the GPU always wins on latency.
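The network and buffering thresholds quoted in the abstract can be read as a simple rule of thumb. The Python sketch below is only an illustration of that reading, not the paper's optimization engine: the function name, parameter names, and the choice of 10 seconds (the lower end of the quoted 10 to 20 second range) as the buffering cutoff are all assumptions made here.

```python
# Illustrative sketch only. The numeric thresholds are the figures quoted in
# the abstract; the function and constant names are hypothetical and do not
# come from the paper.
MIN_CLOUD_THROUGHPUT_MBPS = 20.0  # "throughput of at least 20 Mbps"
MAX_CLOUD_RTT_MS = 25.0           # "RTT of 25 ms or less"
MIN_BUFFERED_AUDIO_S = 10.0       # assumed: lower end of "10 to 20 seconds"


def gpu_likely_cheaper_than_cloud(throughput_mbps: float,
                                  rtt_ms: float,
                                  buffered_audio_s: float) -> bool:
    """Return True when the abstract's stated condition for the optimized GPU
    consuming less energy than cloud offloading is met.

    A False result only means the stated condition is not met; it does not
    by itself imply that cloud offloading is the cheaper option.
    """
    network_is_good = (throughput_mbps >= MIN_CLOUD_THROUGHPUT_MBPS
                       and rtt_ms <= MAX_CLOUD_RTT_MS)
    enough_audio_batched = buffered_audio_s >= MIN_BUFFERED_AUDIO_S
    return (not network_is_good) and enough_audio_batched


if __name__ == "__main__":
    # Slow, high-latency link with 15 s of buffered audio.
    print(gpu_likely_cheaper_than_cloud(5.0, 80.0, 15.0))
    # Fast, low-latency link with the same amount of buffered audio.
    print(gpu_likely_cheaper_than_cloud(50.0, 10.0, 15.0))
```

A real deployment would presumably measure energy on the target device rather than rely on fixed cutoffs; the constants here simply restate the abstract's figures.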
This work was supported by Microsoft Research through its PhD Scholarship Program.
External DOI: https://doi.org/10.1145/3081333.3081358
This record's URL: https://www.repository.cam.ac.uk/handle/1810/267271