SceneNet: Understanding Real World Indoor Scenes With Synthetic Data
Handa, A., Patraucean, V., Badrinarayanan, V., Stent, S., & Cipolla, R. SceneNet: Understanding Real World Indoor Scenes With Synthetic Data. https://doi.org/10.17863/CAM.26486
Scene understanding is a prerequisite to many high-level tasks for any automated intelligent machine operating in real-world environments. Recent attempts with supervised learning have shown promise in this direction but have also highlighted the need for enormous quantities of supervised data: performance increases in proportion to the amount of data used. However, this quickly becomes prohibitive when considering the manual labour needed to collect such data. In this work, we focus on depth-based semantic per-pixel labelling as a scene understanding problem and show the potential of computer graphics to generate virtually unlimited labelled data from synthetic 3D scenes. By carefully synthesizing training data with appropriate noise models, we show performance comparable to state-of-the-art RGBD systems on the NYUv2 dataset despite using only depth data as input, and we set a benchmark for depth-based segmentation on the SUN RGB-D dataset.
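The idea of "synthesizing training data with appropriate noise models" can be sketched as corrupting a clean rendered depth map with sensor-like noise before training. The sketch below is a hypothetical illustration, not the paper's actual pipeline: the depth-dependent axial-noise formula follows published Kinect noise characterisations, and the dropout rate is an assumed placeholder.

```python
import numpy as np

def add_depth_noise(depth, rng=None):
    """Corrupt a clean synthetic depth map (metres) with sensor-like noise.

    Hypothetical sketch: axial noise whose standard deviation grows
    roughly quadratically with depth, as in published Kinect noise
    characterisations, plus random invalid-pixel dropout. The paper's
    actual noise model may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Axial noise: sigma(z) = 0.0012 + 0.0019 * (z - 0.4)^2 (metres)
    sigma = 0.0012 + 0.0019 * (depth - 0.4) ** 2
    noisy = depth + rng.normal(0.0, 1.0, depth.shape) * sigma
    # Simulate missing returns: zero out a small random fraction of
    # pixels, as real depth sensors do (0.5% is an assumed rate).
    dropout = rng.random(depth.shape) < 0.005
    noisy[dropout] = 0.0
    return noisy

# Example: a flat synthetic wall at 2 m in a VGA-resolution depth map
clean = np.full((480, 640), 2.0)
noisy = add_depth_noise(clean, rng=np.random.default_rng(0))
```

Training a segmentation network on such corrupted renders, rather than on pristine depth, is what lets a synthetically trained model transfer to real captured data.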
This record's DOI: https://doi.org/10.17863/CAM.26486
This record's URL: https://www.repository.cam.ac.uk/handle/1810/279106