Low-resource Multi-task Audio Sensing for Mobile and Embedded Devices via Shared Deep Neural Network Representations

Change log
Georgiev, P 
Bhattacharya, S 
Lane, N 

Continuous audio analysis from embedded and mobile devices is an increasingly important application domain. More and more, appliances like the Amazon Echo, along with smartphones and watches, and even research prototypes seek to perform multiple discriminative tasks simultaneously from ambient audio; for example, monitoring background sound classes (e.g., music or conversation), recognizing certain keywords (‘Hey Siri’ or ‘Alexa’), or identifying the user and her emotion from speech. The use of deep learning algorithms typically provides state-of-the-art model performances for such general audio tasks. However, the large computational demands of deep learning models are at odds with the limited processing, energy and memory resources of mobile, embedded and IoT devices. In this paper, we propose and evaluate a novel deep learning modeling and optimization framework that speci cally targets this category of embedded audio sensing tasks. Although the supported tasks are simpler than the task of speech recognition, this framework aims at maintaining accuracies in predictions while minimizing the overall processor resource footprint. The proposed model is grounded in multi-task learning principles to train shared deep layers and exploits, as input layer, only statistical summaries of audio lter banks to further lower computations. We nd that for embedded audio sensing tasks our framework is able to maintain similar accuracies, which are observed in comparable deep architectures that use single-task learning and typically more complex input layers. Most importantly, on an average, this approach provides almost a 2.1⇥ reduction in runtime, energy, and memory for four separate audio sensing tasks, assuming a variety of task combinations.

46 Information and Computing Sciences, 4603 Computer Vision and Multimedia Computation, 4605 Data Management and Data Science, 4608 Human-Centred Computing, 4611 Machine Learning, Behavioral and Social Science, Clinical Research
Journal Title
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT)
Conference Name
Journal ISSN
Volume Title
Association for Computing Machinery
Microsoft Research