Real-time factored ConvNets: Extracting the x factor in human parsing

Charles, J; Budvytis, I; Cipolla, R

Accepted version

Peer-reviewed

Repository URI

https://www.repository.cam.ac.uk/handle/1810/274294

Repository DOI

https://doi.org/10.17863/CAM.21419

Files

Accepted version (1.07 MB)

Type

Conference Object

Authors

Charles, J

Budvytis, I

Cipolla, Roberto

https://orcid.org/0000-0002-8999-2151

Abstract

© 2017. The copyright of this document resides with its authors. We propose a real-time, lightweight, multi-task-style ConvNet (termed a Factored ConvNet) for human body parsing in images or video. Factored ConvNets have isolated areas which perform known sub-tasks, such as object localization or edge detection. We call each such area and sub-task pair an X factor. Unlike multi-task ConvNets, which have independent tasks, the Factored ConvNet's sub-task has a direct effect on the main task outcome. In this paper we show how to isolate the X factor of foreground/background (f/b) subtraction from the main task of segmenting human body images into 31 different body part types. Knowledge of this X factor leads to a number of benefits for the Factored ConvNet: 1) ease of network transfer to other image domains, 2) the ability to personalize to humans in video, and 3) easy model performance boosts. All of these are achieved by either an efficient network update or replacement of the X factor, whilst avoiding catastrophic forgetting of previously learnt body part dependencies and structure. We show these benefits on a large dataset of images and also on YouTube videos.

Journal Title

British Machine Vision Conference 2017, BMVC 2017

Conference Name

British Machine Vision Conference 2017, BMVC 2017

Publisher

British Machine Vision Association

Publisher DOI

https://doi.org/10.5244/c.31.24

Rights

http://www.rioxx.net/licenses/all-rights-reserved

Sponsorship

SeeQuestor

Collections

Scholarly Works - Engineering