Estimating physical activity from self-reported behaviours in large-scale population studies using network harmonisation: findings from UK Biobank and associations with disease outcomes

Published version


Pearce, Matthew 
Strain, Tessa 
Kim, Youngwon 
Sharp, Stephen J. 
Westgate, Kate 


Abstract:

Background: UK Biobank is a large prospective cohort study containing accelerometer-based physical activity data with strong validity collected from 100,000 participants approximately 5 years after baseline. In contrast, the main cohort has multiple self-reported physical behaviours from > 500,000 participants with longer follow-up time, offering several epidemiological advantages. However, questionnaire methods typically suffer from greater measurement error, and at present there is no tested method for combining these diverse self-reported data to more comprehensively assess the overall dose of physical activity. This study aimed to use the accelerometry sub-cohort to calibrate the self-reported behavioural variables to produce a harmonised estimate of physical activity energy expenditure, and subsequently examine its reliability, validity, and associations with disease outcomes.

Methods: We calibrated 14 self-reported behavioural variables from the UK Biobank main cohort using the wrist accelerometry sub-cohort (n = 93,425), and used published equations to estimate physical activity energy expenditure (PAEE_SR). For comparison, we estimated physical activity based on the scoring criteria of the International Physical Activity Questionnaire, and by summing variables for occupational and leisure-time physical activity with no calibration. Test-retest reliability was assessed using data from the UK Biobank repeat assessment (n = 18,905) collected a mean of 4.3 years after baseline. Validity was assessed in an independent validation study (n = 98) with estimates based on doubly labelled water (PAEE_DLW). In the main UK Biobank cohort (n = 374,352), Cox regression was used to estimate associations between PAEE_SR and fatal and non-fatal outcomes including all-cause mortality, cardiovascular diseases, respiratory diseases, and cancers.

Results: PAEE_SR explained 27% of the variance in gold-standard PAEE_DLW estimates, with no mean bias. However, error was strongly correlated with PAEE_DLW (r = −.98; p < 0.001), and PAEE_SR had a narrower range than the criterion. Test-retest reliability (Λ = .67) and relative validity (Spearman = .52) of PAEE_SR outperformed two common approaches for processing self-report data with no calibration. Predictive validity was demonstrated by associations with morbidity and mortality, e.g. 14% (95% CI: 11–17%) lower mortality for individuals meeting lower physical activity guidelines.

Conclusions: The PAEE_SR variable has good reliability and validity for ranking individuals, with no mean bias but correlated error at the individual level. PAEE_SR outperformed uncalibrated estimates and showed stronger inverse associations with disease outcomes.
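The Results report no mean bias alongside an error that is strongly negatively correlated with the criterion and an estimate with a narrower range. The sketch below illustrates, on purely synthetic data, how those three properties can coexist when an estimate is compressed toward the group mean. It does not use UK Biobank data or the study's method; the function names (`pearson`, `simulate`) and every numeric parameter (mean, spread, slope, noise) are assumptions chosen only for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=5000, seed=0):
    """Generate a synthetic criterion, a compressed estimate, and their error.

    All parameters are hypothetical: the criterion is drawn from a normal
    distribution (mean 50, SD 20, arbitrary units), and the estimate keeps
    the same mean but shrinks individual deviations (slope 0.1) and adds
    independent noise (SD 3).
    """
    rng = random.Random(seed)
    criterion = [rng.gauss(50.0, 20.0) for _ in range(n)]
    estimate = [50.0 + 0.1 * (c - 50.0) + rng.gauss(0.0, 3.0) for c in criterion]
    # Error defined as estimate minus criterion.
    error = [e - c for e, c in zip(estimate, criterion)]
    return criterion, estimate, error


if __name__ == "__main__":
    criterion, estimate, error = simulate()
    mean_bias = sum(error) / len(error)
    r_est = pearson(estimate, criterion)
    r_err = pearson(error, criterion)
    print(f"mean bias: {mean_bias:.2f}")
    print(f"variance explained (r^2): {r_est ** 2:.2f}")
    print(f"correlation of error with criterion: {r_err:.2f}")
    print(f"estimate range: {max(estimate) - min(estimate):.1f}")
    print(f"criterion range: {max(criterion) - min(criterion):.1f}")
```

Because the synthetic estimate shares the criterion's mean, the average error stays near zero; because it tracks individual deviations only weakly, people with high criterion values are under-estimated and those with low values are over-estimated, which produces the strong negative correlation between error and criterion. Such an estimate can still rank individuals reasonably well while being unsuitable for judging absolute individual values.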



Research, Accelerometer, Physical activity energy expenditure, Questionnaire, Calibration, Doubly labelled water

Journal Title

International Journal of Behavioral Nutrition and Physical Activity


BioMed Central
UK Medical Research Council (MC_UU_12015/3)
NIHR Biomedical Research Centre in Cambridge (IS-BRC-1215-20014)
UK Medical Research Council (MC_UU_12015/1)