Understanding COVID-19 trajectories from a nationwide linked electronic health record cohort of 57 million people: phenotypes, severity, waves & vaccination
The Lancet Digital Health
Wood, A. Understanding COVID-19 trajectories from a nationwide linked electronic health record cohort of 57 million people: phenotypes, severity, waves & vaccination. The Lancet Digital Health https://doi.org/10.17863/CAM.85091
Background: An updatable understanding of the onset and progression of individuals' COVID-19 trajectories underpins pandemic mitigation efforts. To identify and characterise individual trajectories, we defined and validated ten COVID-19 phenotypes from linked electronic health records (EHR) on a nationwide scale using an extensible framework.
Methods: Cohort study of 57 million people in England alive on 23/01/2020, followed until 30/11/2021, using eight linked national datasets spanning COVID-19 testing, vaccination, primary and secondary care, and death registration data. We defined ten COVID-19 phenotypes reflecting clinically relevant stages of disease severity using a combination of international clinical terminologies (e.g. SNOMED-CT, ICD-10) and bespoke data fields: positive test, primary care diagnosis, hospitalisation, ventilatory support (four phenotypes), and death (three phenotypes). Using these phenotypes, we constructed patient trajectories illustrating the transition frequency and duration between phenotypes. Analyses were stratified by pandemic wave and vaccination status.
Findings: We identified 7,244,925 infected individuals (12.7%) with 13,990,423 recorded COVID-19 phenotypes. Of these individuals, 460,737 (6.4%) were hospitalised and 158,020 (2.2%) died. Of those hospitalised, 48,847 (11%) were admitted to intensive care (ICU), 69,090 (15%) received non-invasive ventilation and 25,928 (5.6%) received invasive ventilation. Amongst hospitalised patients not receiving ventilatory support, mortality was higher in the first wave (30%) than in the second (23%), but it remained unchanged for ICU patients. The highest mortality was for patients receiving ventilatory support outside of ICU in wave 1 (51%). A total of 15,486 (10%) COVID-19-related deaths occurred without diagnoses on the death certificate but within 28 days of a first COVID-19 event, while 10,884 (7% of deaths) were identified from mortality data alone, with no prior phenotypes recorded. We observed longer patient trajectories in the second pandemic wave than in the first.
Interpretation: Our analyses illustrate the wide spectrum of severity that COVID-19 displays and significant differences in incidence, survival and pathways across pandemic waves. We provide an adaptable framework to answer questions of clinical and policy relevance, such as new variant impact and booster dose efficacy, and a way of maximising existing data to understand individuals' progression through disease states.
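As an illustration only, the trajectory construction described in the Methods (ordering each person's phenotype events in time, then summarising transition frequency and duration between consecutive phenotypes) might be sketched as follows. This is not the authors' code: the event records, phenotype labels and function names are hypothetical, and the study's actual phenotype definitions and linkage rules are not reproduced here.

```python
from collections import Counter
from datetime import date

# Hypothetical event records: (person_id, event_date, phenotype).
# Labels are illustrative stand-ins, not the study's phenotype names.
events = [
    ("p1", date(2020, 4, 1), "positive_test"),
    ("p1", date(2020, 4, 5), "hospitalisation"),
    ("p1", date(2020, 4, 9), "death"),
    ("p2", date(2021, 1, 10), "positive_test"),
    ("p2", date(2021, 1, 12), "primary_care_diagnosis"),
    ("p3", date(2021, 1, 3), "positive_test"),
    ("p3", date(2021, 1, 20), "hospitalisation"),
]


def build_trajectories(events):
    """Group events by person, ordered by date (ties keep input order)."""
    by_person = {}
    for person_id, event_date, phenotype in sorted(events, key=lambda e: (e[0], e[1])):
        by_person.setdefault(person_id, []).append((event_date, phenotype))
    return by_person


def transition_summary(trajectories):
    """Count each consecutive phenotype pair and collect the days between them."""
    counts = Counter()
    durations = {}
    for steps in trajectories.values():
        for (d0, a), (d1, b) in zip(steps, steps[1:]):
            counts[(a, b)] += 1
            durations.setdefault((a, b), []).append((d1 - d0).days)
    return counts, durations


trajectories = build_trajectories(events)
counts, durations = transition_summary(trajectories)

for pair, n in counts.items():
    print(pair, "n =", n, "days =", durations[pair])
```

Stratifying by pandemic wave or vaccination status, as the abstract describes, would amount to running the same summary on subsets of the event records; how those subsets are defined is left open in this sketch.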
The British Heart Foundation Data Science Centre (grant No SP/19/3/34678, awarded to Health Data Research (HDR) UK) funded co-development (with NHS Digital) of the trusted research environment, provision of linked datasets, data access, user software licences, computational usage, and data management and wrangling support, with additional contributions from the HDR UK Data and Connectivity component of the UK Government Chief Scientific Adviser’s National Core Studies programme to coordinate national COVID-19 priority research. Consortium partner organisations funded the time of contributing data analysts, biostatisticians, epidemiologists, and clinicians. This work was funded by the Longitudinal Health and Wellbeing COVID-19 National Core Study, which was established by the UK Chief Scientific Officer in October 2020 and funded by UK Research and Innovation (grant references MC_PC_20030 and MC_PC_20059), by the Data and Connectivity National Core Study, led by Health Data Research UK in partnership with the Office for National Statistics and funded by UK Research and Innovation (grant reference MC_PC_20058), and by the CONVALESCENCE study of long COVID, which is funded by NIHR/UKRI. This work was supported by Health Data Research UK, which receives its funding from HDR UK Ltd (HDR-9006) funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation (BHF) and the Wellcome Trust.
AA is supported by Health Data Research UK (HDR-9006), which receives its funding from the UK Medical Research Council (MRC), Engineering and Physical Sciences Research Council (EPSRC), Economic and Social Research Council (ESRC), Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh government), Public Health Agency (Northern Ireland), British Heart Foundation (BHF), and Wellcome Trust; and Administrative Data Research UK, which is funded by the ESRC (grant ES/S007393/1). AB is supported by research funding from the National Institute for Health Research (NIHR), British Medical Association, AstraZeneca, and UK Research and Innovation. AGL is supported by funding from the Wellcome Trust (204841/Z/16/Z), National Institute for Health Research (NIHR) University College London Hospitals Biomedical Research Centre (BRC714/HI/RW/101440), NIHR Great Ormond Street Hospital Biomedical Research Centre (19RX02) and Academy of Medical Sciences (SBF006\1084). MAM is supported by research funding from AstraZeneca. AH is supported by research funding from the HDR UK text analytics implementation project. AB, AW, HH, and SD are part of the BigData@Heart Consortium, funded by the Innovative Medicines Initiative-2 Joint Undertaking under grant agreement No 116074. AW and SI are supported by the BHF-Turing Cardiovascular Data Science Award (BCDSA\100005) and by core funding from UK MRC (MR/L003120/1), BHF (RG/13/13/30194; RG/18/13/33946), and NIHR Cambridge Biomedical Research Centre (BRC-1215-20014). JAC and JS are supported by the Health Data Research (HDR) UK South West Better Care Partnership and the NIHR Bristol Biomedical Research Centre at University Hospitals Bristol, and Weston NHS Foundation Trust and the University of Bristol. JS is additionally supported by the UKRI MRC.
SD and HH are supported by HDR UK London, which receives its funding from HDR UK funded by the UK MRC, EPSRC, ESRC, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh government), Public Health Agency (Northern Ireland), BHF, and Wellcome Trust; HH and SD are supported by the NIHR Biomedical Research Centre at University College London Hospital NHS Trust. SD is supported by an Alan Turing Fellowship (EP/N510129/1), the BHF Data Science Centre and the NIHR-UKRI CONVALESCENCE study. HH is an NIHR Senior Investigator. SD and HH are supported by the BHF Accelerator Award AA/18/6/24223. CT is supported by a UCL UKRI Centre for Doctoral Training in AI-enabled Healthcare studentship (EP/S021612/1), MRC Clinical Top-Up and a studentship from the NIHR Biomedical Research Centre at University College London Hospital NHS Trust. WW is supported by a Scottish senior clinical fellowship, CSO (SCAF/17/01) and the Stroke Association (SA CV 20\100018), has received consulting fees from Bayer, has received payment for expert testimony from UK courts, participates on the data safety monitoring/advisory board for PROTECT-U, CATIS, INTERACT-4, MOSES and Bayer, has leadership or fiduciary roles with BIASP Scientific Committee and is associate editor of Stroke. HW is supported by the Medical Research Council (grant no. MR/S004149/2), the National Institute for Health Research (grant no. NIHR202639), and the Advanced Care Research Centre Programme at the University of Edinburgh. KL is supported by University College London (UCL) & Rosetrees Trust (UCL-IHE-2020\102); National Institute for Health Research (NIHR) & NHS (AI_AWARD01786); NIHR UCLH Biomedical Research Centre (BRC713/HI/RW/101440); UCL Higher Education Innovation Fund (KEI2021-03-16). BAM is an employee of the Wellcome Trust and supported by research funding from HDR UK, MRC and Diabetes UK.
NS is supported by grants from AstraZeneca, Boehringer Ingelheim, Novartis and Roche Diagnostics and has received consulting fees from Afimmune, Amgen, AstraZeneca, Boehringer Ingelheim, Eli Lilly, Hanmi Pharmaceuticals, Merck Sharp & Dohme, Novartis, Novo Nordisk, Pfizer and Sanofi.
This record's DOI: https://doi.org/10.17863/CAM.85091
This record's URL: https://www.repository.cam.ac.uk/handle/1810/337685
Attribution 4.0 International
Licence URL: https://creativecommons.org/licenses/by/4.0/