
Methods for handling missing data in cohort studies where outcomes are truncated by death





This dissertation addresses problems that arise in observational cohort studies where the repeated outcomes of interest are truncated by death and are also missing because of dropout. In particular, we consider methods that make inference about the population of survivors at each time point, known as 'partly conditional inference'. Partly conditional inference distinguishes between the two reasons for missingness; failing to make this distinction causes inference to be based not only on pre-death outcomes, which exist, but also on post-death outcomes, which fundamentally do not exist. Such inference is called 'immortal cohort inference'.
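The distinction between the two targets of inference can be illustrated with a minimal sketch. The data, the variable names and the extrapolated post-death values below are all hypothetical and are not taken from the dissertation; they only show how averaging over everyone, including values implicitly assigned after death, differs from averaging over survivors.

```python
# Hypothetical toy data for a single visit (not from the dissertation).
alive = [True, True, True, False, False]

# Outcomes exist only for subjects who are alive at the visit.
outcome = [20.0, 24.0, 22.0, None, None]

# Values that a model might implicitly extrapolate past death (assumed here).
extrapolated = [20.0, 24.0, 22.0, 10.0, 14.0]


def partly_conditional_mean(outcome, alive):
    """Mean outcome among subjects who are alive at the visit."""
    values = [y for y, a in zip(outcome, alive) if a]
    return sum(values) / len(values)


def immortal_cohort_mean(values):
    """Mean over all subjects, treating post-death values as if they existed."""
    return sum(values) / len(values)


survivor_mean = partly_conditional_mean(outcome, alive)
immortal_mean = immortal_cohort_mean(extrapolated)
print(survivor_mean, immortal_mean)
```

In this toy example the two averages differ because the second one includes values for subjects who have died.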

We investigate health and cognitive outcomes in two studies: the 'Origins of Variance in the Old Old' and the 'Health and Retirement Study'. Analysis of both studies is complicated by outcomes of interest being missing because of death and dropout. We show, first, that linear mixed models and joint models (which model both the outcome and survival processes) produce immortal cohort inference. This makes the parameters in the longitudinal (sub-)model difficult to interpret.

Second, we make a thorough comparison of well-known methods for handling missing outcomes (inverse probability weighting, multiple imputation and linear increments), focusing particularly on the setting where outcomes are missing due to both dropout and death. We show that, when the dropout models are correctly specified for inverse probability weighting and the imputation models are correctly specified for multiple imputation or linear increments, the assumptions of multiple imputation and linear increments are the same as those of inverse probability weighting only if the time of death is included in the dropout and imputation models; without it, the assumptions may differ. Simulation studies show that each of these methods gives negligibly biased estimates of the partly conditional mean when its assumptions are met, but potentially biased estimates when they are not. In addition, we develop new augmented inverse probability weighted estimating equations for making partly conditional inference, which offer double protection against model misspecification: as long as either the dropout model or the imputation model is correctly specified, the partly conditional inference is valid.
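As a rough illustration of weighting and augmentation for a survivor mean, the sketch below uses the standard textbook forms of an inverse probability weighted mean and an augmented inverse probability weighted mean, restricted to subjects alive at the visit. It is not the dissertation's estimating equations. The records, the field names `p_obs` (assumed fitted probability of being observed given survival) and `m_pred` (assumed imputation-model prediction) are hypothetical.

```python
# Hypothetical records for one visit (not from the dissertation).
records = [
    {"alive": True, "observed": True, "y": 10.0, "p_obs": 0.5, "m_pred": 11.0},
    {"alive": True, "observed": False, "y": None, "p_obs": 0.5, "m_pred": 11.0},
    {"alive": True, "observed": True, "y": 20.0, "p_obs": 1.0, "m_pred": 20.0},
    {"alive": False, "observed": False, "y": None, "p_obs": None, "m_pred": None},
]


def ipw_mean(records):
    """Weighted mean among survivors: observed subjects are up-weighted by 1/p_obs."""
    survivors = [r for r in records if r["alive"]]
    num = sum(r["y"] / r["p_obs"] for r in survivors if r["observed"])
    den = sum(1.0 / r["p_obs"] for r in survivors if r["observed"])
    return num / den


def aipw_mean(records):
    """Augmented version: combines the weights with imputation-model predictions."""
    survivors = [r for r in records if r["alive"]]
    total = 0.0
    for r in survivors:
        if r["observed"]:
            w = 1.0 / r["p_obs"]
            total += w * r["y"] - (w - 1.0) * r["m_pred"]
        else:
            # Unobserved survivors contribute only their predicted outcome.
            total += r["m_pred"]
    return total / len(survivors)


ipw_estimate = ipw_mean(records)
aipw_estimate = aipw_mean(records)
print(ipw_estimate, aipw_estimate)
```

Subjects who have died are excluded from both sums, so neither estimator assigns them an outcome.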

Third, we describe methods that can be used to make partly conditional inference for non-ignorable missing data. Both monotone and non-monotone missing data are considered. We propose three methods that use a tilt function to relate the distribution of an outcome at visit j among those who were last observed at some time before j to the distribution among those who were observed at visit j. The three methods are: i) an inverse probability weighted method that up-weights observed subjects to represent subjects who are still alive but are not observed; ii) an imputation method that replaces missing outcomes of subjects who are alive with their conditional mean outcomes given past observed data; and iii) a new augmented inverse probability weighted method that combines the previous two methods and is doubly robust against model misspecification. Sensitivity analyses to departures from ignorable missingness assumptions are conducted on simulated and real datasets.
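One way to picture a tilt function is the sketch below, which assumes an exponential tilt: the outcome distribution among unobserved survivors is taken to be proportional to the distribution among observed survivors multiplied by exp(gamma * y). The exponential form, the sensitivity parameter `gamma` and the sample values are assumptions made for illustration; the abstract does not state which tilt function the dissertation uses.

```python
import math


def tilted_mean(observed_outcomes, gamma):
    """Mean outcome implied for unobserved survivors under an assumed exponential tilt.

    gamma = 0 corresponds to no tilt, so the result equals the observed mean.
    """
    weights = [math.exp(gamma * y) for y in observed_outcomes]
    weighted_sum = sum(w * y for w, y in zip(weights, observed_outcomes))
    return weighted_sum / sum(weights)


# Hypothetical outcomes among survivors observed at visit j.
observed_outcomes = [1.0, 2.0, 3.0]

# A small sensitivity analysis over assumed values of gamma.
for gamma in (-0.5, 0.0, 0.5):
    print(gamma, tilted_mean(observed_outcomes, gamma))
```

Varying `gamma` over a plausible range and re-estimating is one simple form of sensitivity analysis to departures from ignorable missingness.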





Seaman, Shaun
Muniz Terrera, Graciela


Dropout, Generalized estimating equation, Imputation, Longitudinal data, Missing at random, Partly conditional inference


Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge