Probabilistic Methods for Monocular 3D Human Reconstruction
Abstract
Remarkably, humans can infer the approximate 3D surface geometry and appearance of complex articulated objects, such as people and animals, given but a single 2D RGB image. In fact, we can reason about the whole range of 3D poses and body shapes, surface colours and textures that can plausibly explain a given image. Thus, we instinctively acknowledge and account for the ill-posed nature of the problem of monocular 3D reconstruction, where a single 2D input image gives rise to multiple reasonable 3D solutions.
In recent years, great progress has been made towards computer vision algorithms that can imitate a human's ability to reconstruct in 3D from partial 2D observations. These 3D reconstruction methods facilitate impactful applications in healthcare, robotics, virtual retail and entertainment. However, most approaches in the contemporary research literature are deterministic, and estimate a single 3D ``best-guess'' solution given an input image -- ignoring the inherent ambiguity in monocular reconstruction. Blindly assuming that a single 3D estimate matches the true 3D geometry of the subject -- an assumption that cannot hold in general in such an ill-posed setting -- can result in failures in reconstruction-reliant downstream applications. Moreover, a deterministic approach may also curtail the quality of monocular 3D human geometry and appearance estimates, resulting in blurry colours and over-smooth surfaces in ambiguous regions of the body.
This thesis develops probabilistic approaches to 3D human reconstruction, which predict probability distributions over 3D reconstructions conditioned on a single 2D RGB image. This enables us to sample any number of plausible 3D hypotheses during inference, and quantify and visualise prediction uncertainty, indicating the level of confidence our methods have in different reconstructed regions of the body. Specifically, Chapter 5 presents a selection of model-based probabilistic reconstruction methods, which involve predicting distributions over the parameters of a statistical body model. The extra information present in a predicted 3D distribution, beyond a single 3D point estimate, is valuable in downstream tasks. For example, it facilitates probabilistic fusion of 3D solutions from multiple images, or model fitting with an image-conditioned prior probability distribution -- both of which are demonstrated in Chapter 5. Moreover, Chapter 6 introduces a model-free probabilistic reconstruction method that yields photorealistic 3D samples with sharp colours and fine geometric details, even in unseen regions of the body.
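The probabilistic fusion of single-image estimates mentioned above can be illustrated with a minimal sketch. Assuming, purely for illustration, that each network prediction is an independent Gaussian over a body-model parameter (the specific values and the diagonal-Gaussian form here are hypothetical, not taken from the thesis), hypotheses can be sampled from one image-conditioned distribution, and estimates from two images can be fused as a product of Gaussians:

```python
import random

random.seed(0)

# Hypothetical image-conditioned predictions: each is a Gaussian over one
# body-model parameter (illustrative numbers, not from the thesis).
mu_a, var_a = 0.2, 0.10   # estimate from image A
mu_b, var_b = 0.3, 0.20   # estimate from image B

# Sampling plausible 3D hypotheses from one predicted distribution:
hypotheses = [random.gauss(mu_a, var_a ** 0.5) for _ in range(5)]

# Probabilistic fusion as a product of Gaussians: precisions (inverse
# variances) add, and the fused mean is the precision-weighted average.
prec_a, prec_b = 1.0 / var_a, 1.0 / var_b
var_fused = 1.0 / (prec_a + prec_b)
mu_fused = var_fused * (prec_a * mu_a + prec_b * mu_b)

print(round(mu_fused, 4), round(var_fused, 4))
```

Note that the fused variance is smaller than either input variance, and the fused mean is pulled towards the more confident (lower-variance) estimate -- the property that makes multi-image fusion of probabilistic predictions attractive.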
We predict probability distributions using deep neural networks that are trained via supervised learning. This requires suitable training data -- i.e. images of humans with diverse poses, body shapes and scene conditions that are accurately labelled with the subject's ground-truth 3D geometry. Before detailing our probabilistic reconstruction methods, we present a synthetic training data generation pipeline for 3D pose and shape regression in Chapter 3, which overcomes a trade-off between 3D label accuracy and data diversity exhibited by contemporary real training datasets. In a similar vein, contemporary evaluation datasets for 3D human reconstruction also feature limited diversity of body shapes. To address this, Chapter 4 introduces an evaluation dataset for parametric body shape estimation -- Sports Shape and Pose 3D (SSP-3D) -- which contains 311 RGB images of 62 sportspeople with a wide range of body shapes. These works are used throughout this thesis to train and evaluate our probabilistic reconstruction methods.
