Deep Structured Multi-Task Learning for Computer Vision in Autonomous Driving

Teichmann, Marvin

Repository URI

https://www.repository.cam.ac.uk/handle/1810/309112

Repository DOI

https://doi.org/10.17863/CAM.56207

Files

Thesis (46.7 MB)

Type

Thesis

Authors

Teichmann, Marvin

Abstract

The field of computer vision is currently dominated by deep learning advances. Convolutional Neural Networks (CNNs) have become the predominant tool for solving almost any computer vision task, and state-of-the-art systems are built on their predictive capabilities. Many of those systems use a simple encoder–decoder design, in which an off-the-shelf CNN architecture is combined with a task-specific decoder and loss function to create an end-to-end trainable model. This raises the question of whether these kinds of models are the future of computer vision. In this thesis we argue that this is not the case. We start by discussing three limitations of simple end-to-end training. We then show how those limitations can be overcome by an approach that we call structured modelling. The idea is to use CNNs to compute a rich semantic intermediate representation, which is then used to solve the actual problem by applying geometric and task-related structure. In this work we solve the localization, segmentation and landmark recognition tasks using structured modelling, and we show that this approach can improve generalization, interpretability and robustness. We also discuss how this approach is particularly useful for real-time applications such as autonomous driving. Visual perception is a multi-module problem that requires several different computer vision tasks to be solved. We discuss how sharing computations can improve not only inference speed but also, by exploiting the structural relationship between the tasks, prediction performance. Lastly, we demonstrate that structured modelling is able to achieve state-of-the-art performance, making it a very relevant approach for solving current and future computer vision problems.

Date

2020-07-28

Advisors

Cipolla, Roberto

Keywords

Computer Vision, Deep Learning, Machine Learning, Autonomous Driving

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights

Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

Sponsorship

Trinity College, EPSRC, Qualcomm

Collections

Theses - Engineering