Encoding parameter and structural efficiency in deep learning


Type

Thesis

Authors

Spasov, Simeon 

Abstract

The development of deep learning has led to significant performance gains on a variety of tasks across a diverse set of application areas, spanning computer vision, natural language processing, reinforcement learning, generative modelling and, more recently, relational learning from graph-structured data. The main reason for this success is an increase in the availability of computational power, which allows for deep, highly parameterized neural network architectures that can learn complex feature transformations from raw data. The high representational power of deep neural networks, however, often comes at the cost of high model complexity, that is, heavy parameterization and the associated memory and computational burden. In this thesis, I rely on parameter-efficient neural operators, appropriate modelling assumptions about the data, and inductive biases on network structure to propose simpler neural network models in several application areas. In each application area, I use a combination of these principles of efficiency to design novel approaches. First, within the context of medical image processing, I observe that spatially aligned neuroimages exhibit fewer degrees of freedom than natural images, which justifies the use of lower-capacity convolutional operators; I exploit this by applying parameter-efficient convolutional variants. I demonstrate state-of-the-art results on early Alzheimer's prediction while using up to 125 times fewer parameters and over 17 times fewer multiply–accumulate operations. Similar conclusions are reached for an unsupervised method designed to identify subject subtypes from neuroimages. Second, I set out to alleviate the challenge of training parameter-efficient deep models from scratch, which can make it feasible to train deep models on resource-constrained "edge" devices. The proposed method is based on a simplifying assumption about the network architecture, namely parameter independence, which allows the problem to be modelled as a combinatorial multi-armed bandit. The method can dynamically, that is, during training, identify a high-performing compact subnetwork within an overparameterized model while adhering to a predefined memory utilization budget. This is achieved by associating a saliency metric with each neuron, which then drives parameter activation akin to a gating mechanism, while the parameters themselves are learned simultaneously. As a result, the computational and memory burden of deep neural networks during both training and inference is significantly reduced. Finally, I present a deep probabilistic model for learning unsupervised node and community embeddings in dynamic graphs. I introduce structural inductive biases about the edge-formation mechanism based on the inherent community structure of networks, and I assume smooth temporal evolution of both nodes and communities, motivated by the lack of disruptive events in the data. I present a parameter-efficient implementation of the method which outperforms state-of-the-art graph convolutional networks on a variety of dynamic predictive tasks.
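To make the saliency-driven gating idea in the abstract concrete, the following is a minimal illustrative sketch, not the thesis's exact combinatorial-bandit algorithm: it assumes a magnitude-based saliency, a hypothetical `GatedLinear` module, and a top-k gate that keeps only a budgeted fraction of a layer's neurons active while the surviving parameters keep training.

```python
# Minimal sketch (assumptions: L2-norm saliency, fixed budget fraction,
# a single fully connected layer). The thesis frames subnetwork selection
# as a combinatorial multi-armed bandit; that machinery is omitted here.
import torch
import torch.nn as nn


class GatedLinear(nn.Module):
    def __init__(self, in_features, out_features, budget_fraction=0.25):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Number of output neurons allowed to be active under the memory budget.
        self.k = max(1, int(budget_fraction * out_features))

    def forward(self, x):
        # Saliency per output neuron: norm of its incoming weights (an assumption).
        saliency = self.linear.weight.detach().norm(dim=1)
        active = torch.topk(saliency, self.k).indices
        gate = torch.zeros_like(saliency)
        gate[active] = 1.0
        # Only the selected subnetwork contributes to the output; gradients
        # still flow to the selected neurons' parameters during training.
        return self.linear(x) * gate


# Usage: trains like a normal module, but only k of 64 neurons are active.
layer = GatedLinear(128, 64, budget_fraction=0.25)
out = layer(torch.randn(8, 128))
```

In the thesis's setting the gate is updated dynamically as training progresses, so the active subnetwork can change over time; the fixed per-forward top-k above is only the simplest stand-in for that behaviour.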

Date

2021-05-21

Advisors

Lio, Pietro
Passamonti, Luca

Keywords

Machine learning, Deep learning, Parameter efficiency

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

EPSRC (1620072)