Gaze Estimation with Graphics

Wood, Erroll William 


Gaze estimation systems determine where someone is looking. Gaze estimation is used in a wide range of applications, including market research, usability studies, and gaze-based interfaces. Traditional equipment relies on specialized hardware. To bring gaze estimation into the mainstream, researchers are exploring approaches that use commodity hardware alone. My work addresses two outstanding problems in this field: 1) it is hard to collect good ground truth eye images for machine learning, and 2) gaze estimation systems do not generalize well: once they are trained with images from one scenario, they do not work in another scenario.

In this dissertation I address these problems in two different ways: learning-by-synthesis and analysis-by-synthesis. Learning-by-synthesis is the process of training a machine learning system with synthetic data, i.e. data that has been rendered with graphics rather than collected by hand. Analysis-by-synthesis is a computer vision strategy that couples a generative model of image formation (synthesis) with a perceptive model of scene comparison (analysis). The goal is to synthesize an image that best matches an observed image.
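The analysis-by-synthesis loop described above can be illustrated with a toy sketch. This is not the dissertation's method: the renderer here is a hypothetical stand-in that draws a dark Gaussian "pupil" blob, the comparison is a plain sum of squared pixel differences, and the search is an exhaustive grid over two position parameters. All function names and parameters (`render`, `image_error`, `fit`, `size`, `sigma`, `step`) are assumptions made for illustration only.

```python
import math


def render(cx, cy, size=16, sigma=2.0):
    """Toy generative model (synthesis): a dark Gaussian blob centred
    at (cx, cy) on a bright background, returned as rows of floats."""
    return [
        [1.0 - math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]


def image_error(a, b):
    """Toy comparison model (analysis): sum of squared pixel differences."""
    return sum((pa - pb) ** 2
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def fit(observed, size=16, step=0.5):
    """Grid-search the blob centre whose rendering best matches `observed`.
    Exhaustive search is a simplification chosen to keep the sketch short."""
    best = None
    steps = int((size - 1) / step) + 1
    for i in range(steps):
        for j in range(steps):
            cx, cy = i * step, j * step
            err = image_error(render(cx, cy, size), observed)
            if best is None or err < best[0]:
                best = (err, cx, cy)
    return best[1], best[2]


# Pretend this image was observed; here it is synthesized with known parameters
# so that the recovered parameters can be checked.
observed = render(5.5, 9.0)
cx, cy = fit(observed)
print(cx, cy)
```

In this sketch the recovered parameters play the role that gaze direction would play in a real system: they are read off from the synthesized image that best explains the observation.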

In this dissertation I present three main contributions. First, I present a new method for training gaze estimation systems that use machine learning: learning-by-synthesis using 3D head scans and photorealistic rendering. Second, I present a new morphable model of the eye region. I show how this model can be used to generate large amounts of varied data for learning-by-synthesis. Third, I present a new method for gaze estimation: analysis-by-synthesis. I demonstrate how analysis-by-synthesis can generalize to different scenarios, estimating gaze in a device- and person-independent manner.

Robinson, Peter
Graphics, Computer Vision, Eye Tracking, Gaze Estimation
Doctor of Philosophy (PhD)
Awarding Institution
University of Cambridge
EPSRC Doctoral Training Grant studentship for Erroll Wood (RG71269)