
Data and Computation Efficient Meta-Learning


Type

Thesis

Authors

Bronskill, John

Abstract

In order to make predictions with high accuracy, conventional deep learning systems require large training datasets consisting of thousands or millions of examples and long training times measured in hours or days, consuming large amounts of electricity with a negative impact on our environment. It is desirable to have machine learning systems that can emulate human behavior and quickly learn new concepts from only a few examples. This is especially true when we need to quickly customize or personalize machine learning models to specific scenarios where it would be impractical to acquire a large amount of training data and where a mobile device is the means for computation. We define a data-efficient machine learning system as one that can learn a new concept from only a few examples (or shots), and a computation-efficient machine learning system as one that can learn a new concept rapidly, without retraining, on an everyday computing device such as a smartphone.

In this work, we design, develop, analyze, and extend the theory of machine learning systems that are both data efficient and computation efficient. We present systems that are trained on multiple tasks such that they "learn how to learn" to solve new tasks from only a few examples. These systems can efficiently solve new, unseen tasks drawn from a broad range of data distributions, in both the low- and high-data regimes, without the need for costly retraining. Adapting to a new task requires only a forward pass of the example task data through the trained network, making the learning of new tasks possible on mobile devices. In particular, we focus on few-shot image classification systems, i.e., machine learning systems that can distinguish between numerous classes of objects depicted in digital images given only a few examples of each class to learn from.
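
To make the episodic few-shot setting concrete, the following minimal Python/NumPy sketch (the function and variable names are illustrative, not from the thesis) samples a single n-way, k-shot task: a small support set used to adapt the model and a query set used to evaluate the adapted model.

    import numpy as np

    def sample_few_shot_task(images, labels, way=5, shot=1, query_size=15, rng=None):
        # Sample one episode: 'way' classes, 'shot' support examples per class,
        # plus a held-out query set for evaluating the adapted model.
        if rng is None:
            rng = np.random.default_rng()
        classes = rng.choice(np.unique(labels), size=way, replace=False)
        support_x, support_y, query_x, query_y = [], [], [], []
        for new_label, c in enumerate(classes):
            idx = rng.permutation(np.flatnonzero(labels == c))
            support_x.append(images[idx[:shot]])
            support_y += [new_label] * shot
            query_x.append(images[idx[shot:shot + query_size]])
            query_y += [new_label] * query_size
        return (np.concatenate(support_x), np.array(support_y),
                np.concatenate(query_x), np.array(query_y))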

To accomplish this, we first develop ML-PIP, a general framework for Meta-Learning approximate Probabilistic Inference for Prediction. ML-PIP extends existing probabilistic interpretations of meta-learning to cover a broad class of methods. We then introduce Versa, an instance of the framework employing a fast, flexible, and versatile amortization network that takes few-shot learning datasets as inputs, with arbitrary numbers of training examples, and outputs a distribution over task-specific parameters in a single forward pass of the network. We evaluate Versa on benchmark datasets, where, at the time, the method achieved state-of-the-art results compared to meta-learning approaches using similar training regimes and feature extractor capacity.
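
The PyTorch sketch below illustrates the idea of amortized inference behind Versa under simplifying assumptions (the class and layer names are hypothetical, and the real amortization network is more elaborate): a shared network maps the pooled, embedded support examples of each class to the mean and log-variance of a Gaussian over that class's classifier weights, so adaptation is a single forward pass rather than an optimization loop.

    import torch
    import torch.nn as nn

    class AmortizedClassifierPosterior(nn.Module):
        # Maps the embedded support set to a Gaussian over per-class classifier
        # weights in a single forward pass; no per-task gradient steps required.
        def __init__(self, feature_dim):
            super().__init__()
            self.amortizer = nn.Sequential(
                nn.Linear(feature_dim, feature_dim), nn.ReLU(),
                nn.Linear(feature_dim, 2 * feature_dim),  # -> [mean, log-variance]
            )

        def forward(self, support_features, support_labels, n_way):
            means, logvars = [], []
            for c in range(n_way):
                # Permutation-invariant pooling over the class's support examples.
                class_repr = support_features[support_labels == c].mean(dim=0)
                mean, logvar = self.amortizer(class_repr).chunk(2, dim=-1)
                means.append(mean)
                logvars.append(logvar)
            return torch.stack(means), torch.stack(logvars)

At prediction time, one would draw weight samples via the reparameterization trick (w = mean + exp(0.5 * logvar) * noise) and average the resulting class probabilities over samples.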

Next, we build on Versa and add a second amortized network that adapts key parameters in the feature extractor to the current task. To accomplish this, we introduce CNAPs, a conditional neural process-based approach to multi-task classification. We demonstrate that, at the time, CNAPs achieved state-of-the-art results on the challenging Meta-Dataset benchmark, indicating high-quality transfer learning. Timing experiments reveal that CNAPs is computationally efficient when adapting to an unseen task, as adaptation does not involve gradient backpropagation. We show that trained models are immediately deployable to continual learning and active learning, where they can outperform existing approaches that do not leverage transfer learning.
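
One common way to realize this kind of conditional feature-extractor adaptation is FiLM-style modulation, sketched here under simplifying assumptions (hypothetical names, not the thesis codebase): a small adaptation network, conditioned on a task embedding computed from the support set, emits per-channel scale and shift parameters that modulate the activations of a fixed feature extractor, so the extractor is adapted to a new task by a forward pass alone.

    import torch
    import torch.nn as nn

    class FiLMAdapter(nn.Module):
        # Emits per-channel scale (gamma) and shift (beta) parameters from a
        # task embedding and applies them to a feature map.
        def __init__(self, task_dim, num_channels):
            super().__init__()
            self.gamma_net = nn.Linear(task_dim, num_channels)
            self.beta_net = nn.Linear(task_dim, num_channels)

        def forward(self, feature_maps, task_embedding):
            # feature_maps: (batch, channels, height, width); task_embedding: (task_dim,)
            gamma = 1.0 + self.gamma_net(task_embedding)  # centred near the identity
            beta = self.beta_net(task_embedding)
            return feature_maps * gamma.view(1, -1, 1, 1) + beta.view(1, -1, 1, 1)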

Finally, we investigate the effects of different methods of batch normalization on meta-learning systems. Batch normalization has become an essential component of deep learning systems, as it significantly accelerates the training of neural networks by allowing the use of higher learning rates and decreasing sensitivity to network initialization. We show that the hierarchical nature of the meta-learning setting presents several challenges that can render conventional batch normalization ineffective. We evaluate a range of approaches to batch normalization for few-shot learning scenarios and develop a novel approach that we call TaskNorm. Experiments demonstrate that the choice of batch normalization has a dramatic effect on both classification accuracy and training time for both gradient-based and gradient-free meta-learning approaches, and that TaskNorm consistently improves performance.
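
The sketch below conveys the flavor of TaskNorm under one simple, assumed parameterization (the exact form used in the thesis may differ): moments computed from the task's context (support) set are blended with per-example instance moments, with a learned blend weight that grows with context-set size, so that small contexts lean more on instance statistics.

    import torch

    def task_norm(x, context_mean, context_var, context_size, scale, offset, eps=1e-5):
        # x: (batch, channels, height, width). context_mean/context_var are moments
        # computed once from the task's context set, shaped (1, channels, 1, 1).
        # scale and offset are learned scalars; the affine (gamma/beta) parameters
        # of a standard normalization layer are omitted for brevity.
        inst_mean = x.mean(dim=(2, 3), keepdim=True)
        inst_var = x.var(dim=(2, 3), keepdim=True, unbiased=False)
        # Blend weight as a function of context-set size (assumed parameterization).
        alpha = torch.sigmoid(torch.as_tensor(scale * context_size + offset, dtype=x.dtype))
        mean = alpha * context_mean + (1 - alpha) * inst_mean
        # Exact pooled variance of the two-component moment mixture.
        var = (alpha * (context_var + (context_mean - mean) ** 2)
               + (1 - alpha) * (inst_var + (inst_mean - mean) ** 2))
        return (x - mean) / torch.sqrt(var + eps)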

Date

2020-07-14

Advisors

Turner, Richard E

Keywords

meta-learning, few-shot learning, computer vision, image classification

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge