
Optimising Cardiovascular Disease Risk Assessment: Application of Dynamic Prediction Tools and Risk Stratification Strategies Using Electronic Health Records





Cardiovascular diseases (CVDs) remain the leading cause of morbidity and mortality worldwide. Identifying individuals who are at higher risk of CVD is fundamental for effectively implementing prevention strategies with limited health care resources and subsequently reducing the burden of CVD. For this purpose, numerous prognostic cardiovascular risk prediction models have been developed in populations from different regions over the past two decades.

However, existing risk prediction models have limitations. First, they are mostly based on single measurements of risk factors, and there is limited evidence quantifying the value of longitudinal risk predictor measures. Therefore, the first aim of this thesis is to evaluate the role of repeated risk factor measures in CVD risk prediction, with a focus on people with type 2 diabetes, who are regularly monitored and have more measurements. Second, few models have considered the effect of post-baseline statin initiation, and ignoring it may lead to an underestimation of an individual’s future risk of disease. Thus, the second aim is to explore novel approaches to account for post-baseline statin initiation in CVD risk prediction models. Third, most guidelines recommend a single fixed risk threshold for treatment initiation; however, such a strategy does not account for the large impact of age and sex on CVD risk. Consequently, the third aim is to investigate age- and sex-specific thresholds for CVD risk stratification.

These questions are addressed using electronic health records (EHRs) from approximately two million individuals from the UK Clinical Practice Research Datalink (CPRD), together with the linked data from Hospital Episode Statistics (HES) and the Office for National Statistics (ONS).

Key findings 1: By applying landmark modelling to EHRs for people with type 2 diabetes, models incorporating trajectories and variability of risk predictors demonstrated significant improvement in risk discrimination (C-index=0.659, 95% Confidence Interval: 0.654-0.663) as compared to using last observed values (0.651, 0.646-0.656) or means (0.650, 0.645-0.655). Inclusion of standard deviations (SDs) of systolic blood pressure yielded the greatest improvement in discrimination (C-index increase=0.005, 95% Confidence Interval: 0.004-0.007) in comparison to incorporating SDs of total cholesterol (0.002, 0.000-0.003), HbA1c (0.002, 0.000-0.003), or high-density lipoprotein cholesterol (0.003, 0.002-0.005). Given that repeat measures are readily available in EHRs especially for regularly monitored patients with diabetes, this improvement could easily be achieved.
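As an illustration of the kind of predictor summaries compared above (last observed value, mean, and SD of repeated measures up to a landmark time), the following is a minimal sketch. The function name, the data layout, and the example blood pressure readings are hypothetical and are not taken from the thesis; the actual landmark models are more elaborate.

```python
from statistics import mean, stdev


def summarise_before_landmark(measurements, landmark_time):
    """Summarise repeated measures recorded at or before a landmark time.

    `measurements` is a list of (time, value) tuples (hypothetical layout).
    Returns the last observed value, the mean, and the SD (None if fewer
    than two measurements are available), or None if there is no history.
    """
    history = sorted((t, v) for t, v in measurements if t <= landmark_time)
    if not history:
        return None
    values = [v for _, v in history]
    return {
        "last": values[-1],
        "mean": mean(values),
        "sd": stdev(values) if len(values) >= 2 else None,
        "n": len(values),
    }


# Hypothetical systolic blood pressure readings: (years since entry, mmHg).
sbp = [(0.0, 138.0), (1.0, 150.0), (2.0, 144.0), (3.5, 160.0)]
summary = summarise_before_landmark(sbp, landmark_time=3.0)
```

Only measurements taken at or before the landmark contribute, so the reading at 3.5 years is excluded from the summaries.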

Key findings 2: To account for statin initiation in CVD risk prediction, I incorporated a time-dependent effect of statin initiation, constrained to a 25% relative risk reduction (from trial results), into the risk prediction models. In models accounting for (versus ignoring) statin initiation, 10-year CVD risk predictions were slightly higher and predictive performance was moderately improved. However, few individuals were reclassified above the high-risk threshold, resulting in negligible improvements in the number needed to screen to prevent one CVD event. In conclusion, incorporating statin effects from trial results into risk prediction models enabled statin-naïve CVD risk estimation and provided moderate gains in predictive ability, but had a limited impact on treatment decision-making under current guidelines in this population.
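To make the constrained statin effect concrete, the sketch below shows how a fixed hazard ratio links statin-naïve and on-statin risk under a proportional hazards assumption. It is a simplification: it treats the 25% relative risk reduction as a hazard ratio of 0.75 and assumes statin use from time zero, whereas the thesis models statin initiation as a time-dependent effect. The function and constant names are hypothetical.

```python
import math

# Assumption: the 25% relative risk reduction is read as a hazard ratio of 0.75.
STATIN_HAZARD_RATIO = 0.75
# Log hazard ratio, i.e. the fixed coefficient a constrained model would use.
LOG_HAZARD_RATIO = math.log(STATIN_HAZARD_RATIO)


def risk_on_statin(statin_naive_risk, hazard_ratio=STATIN_HAZARD_RATIO):
    """Convert a statin-naive risk to the risk under statin use from time zero.

    Under proportional hazards, survival on treatment equals the untreated
    survival raised to the power of the hazard ratio.
    """
    if not 0.0 <= statin_naive_risk <= 1.0:
        raise ValueError("risk must lie between 0 and 1")
    return 1.0 - (1.0 - statin_naive_risk) ** hazard_ratio


# Hypothetical individual with a 10% statin-naive 10-year risk.
treated_risk = risk_on_statin(0.10)
```

Because the hazard ratio acts on the hazard rather than on the cumulative risk, the absolute risk on treatment is slightly above 75% of the statin-naïve risk.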

Key findings 3: Age- and sex-specific risk thresholds were specified as the minimum of 10% and the 90th percentile of the estimated risk distribution in the respective population. Compared with the single threshold of 10%, using age- and sex-specific thresholds significantly improved the discriminatory ability to identify high-risk men and women at younger ages. The number needed to screen to prevent one CVD event was reduced by 58% and 89% for men and women aged 40 to 49. The gain in CVD-free life expectancy by age and sex was slightly higher when the strategy identified more people as high-risk in younger age groups, with a maximum increase of 0.16 years. In conclusion, the results suggest that using age- and sex-specific thresholds can modestly enhance CVD risk stratification for allocation of statin therapy among younger people.
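The threshold rule described above (the smaller of 10% and the 90th percentile of predicted risk within each age and sex group) can be sketched as follows. The group labels, the predicted risks, and the choice of percentile interpolation method are illustrative assumptions, not details taken from the thesis.

```python
from statistics import quantiles

FIXED_THRESHOLD = 0.10  # single guideline-style threshold (10% 10-year risk)


def group_threshold(risks, fixed=FIXED_THRESHOLD):
    """Return min(fixed threshold, 90th percentile of the group's risks)."""
    # quantiles(..., n=10) returns the nine decile cut points; the last one
    # is the 90th percentile. The "inclusive" method is an assumed choice.
    p90 = quantiles(risks, n=10, method="inclusive")[-1]
    return min(fixed, p90)


# Hypothetical predicted 10-year risks by (sex, age band).
predicted = {
    ("F", "40-49"): [0.01 * k for k in range(1, 11)],  # low-risk group
    ("M", "60-69"): [0.05 * k for k in range(1, 11)],  # higher-risk group
}
thresholds = {group: group_threshold(r) for group, r in predicted.items()}
```

In this toy example the younger group receives a threshold below 10% because its 90th percentile falls below the fixed value, while the older group keeps the 10% threshold.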

Overall, these findings identify achievable and pragmatic approaches to improve CVD risk prediction and risk stratification for allocating statin therapy by harnessing information from electronic health records.





Wood, Angela


cardiovascular disease, risk prediction, risk stratification, repeated measurements, electronic health records


Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge
Medical Research Council (MR/L003120/1)
British Heart Foundation (RG/18/13/33946)
China Scholarship Council; UK Medical Research Council (MR/L003120/1); British Heart Foundation (RG/13/13/30194; RG/18/13/33946); BHF Cambridge Centre for Research Excellence (RE/13/6/30180); NIHR Cambridge Biomedical Research Centre (BRC-1215-20014)