Multimorbidity is associated with mortality and service use, with specific types of multimorbidity having differential effects. Additionally, multimorbidity is often negatively associated with participation in research cohorts. Therefore, we set out to identify clusters of multimorbid patients and to examine how they are differentially associated with mortality and service use across age groups in a population-representative sample.
Linked primary and secondary care electronic health records contributed by 382 general practices in England to the Clinical Practice Research Datalink (CPRD) were used. The study included a representative set of multimorbid adults (18 years old or more,
We identified 20 patient clusters across four age strata. The clusters with the highest mortality comprised psychoactive substance and alcohol misuse (aged 18–64); coronary heart disease, depression and pain (aged 65–84); and coronary heart disease, heart failure and atrial fibrillation (aged 85+). The clusters with the highest service use coincided with those with the highest mortality for people aged 65 and over. For people aged 18–64, the cluster with the highest service use comprised depression, anxiety and pain. The majority of multimorbid patients aged 85+ belonged to the cluster with the lowest service use and mortality for that age range. Pain featured in 13 clusters.
This work has highlighted patterns of multimorbidity that have implications for health services. These include the importance of psychoactive substance and alcohol misuse in people under the age of 65, of co-morbid depression and coronary heart disease in people aged 65–84 and of cardiovascular disease in people aged 85+.
As a result of improved life expectancy and ageing populations, a growing number of individuals are living with multimorbidity, i.e. more than one long-term condition [
Patients with multimorbidity have a diverse range of diseases, needs and outcomes [
This study aims to identify, validate and study the outcomes of age-stratified clusters of multimorbid adult patients in a large representative sample of UK patients. Towards this end, we used a comprehensive list of 38 long-term conditions [
Our analysis used the Clinical Practice Research Datalink (CPRD)-GOLD database where anonymised and longitudinal primary care clinical data are contributed by UK general (family) practices (GP) who use the Vision health record system [
Data on a random selection of individuals were acquired from CPRD (the same individuals studied in Cassell et al. [
There was no patient or public involvement in this study.
Data analysis was performed in R 3.4.4. R package names are given in the following sections where appropriate (
Morbidities in this study were defined as binary variables (present or not) based on the classification of LTCs in primary care developed by Barnett et al. [
Two sets of outcome variables related to service use and mortality were defined. NHS service utilisation or treatment burden was measured by three variables over the 12-month period after January 2012: primary care consultations (consultations with any clinician in the primary care team), the number of all-type hospitalisation spells (defined by discharge dates) and the count of regular medications (at least four prescriptions in a year by counting the unique British National Formulary (BNF) codes). All-cause mortality at 2 and 5 years was extracted from ONS data.
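As an illustration of the regular-medication measure described above, the following minimal Python sketch counts, for each patient, the unique BNF codes prescribed at least four times within the 12-month window. The record layout, the function name `count_regular_medications` and the example codes are hypothetical and are not taken from the study's own analysis code; the prescriptions are assumed to have been restricted to the observation window beforehand.

```python
from collections import Counter, defaultdict


def count_regular_medications(prescriptions, min_prescriptions=4):
    """Count, per patient, the unique BNF codes issued at least `min_prescriptions` times.

    `prescriptions` is an iterable of (patient_id, bnf_code) pairs, one per
    prescription issued in the 12-month window (hypothetical layout).
    """
    issued = defaultdict(Counter)
    for patient_id, bnf_code in prescriptions:
        issued[patient_id][bnf_code] += 1
    return {
        patient_id: sum(1 for n in codes.values() if n >= min_prescriptions)
        for patient_id, codes in issued.items()
    }


# Made-up records: patient "p1" receives code "A" four times and "B" twice;
# patient "p2" receives code "C" five times and "D" four times.
example = (
    [("p1", "A")] * 4 + [("p1", "B")] * 2
    + [("p2", "C")] * 5 + [("p2", "D")] * 4
)
regular_counts = count_regular_medications(example)
print(regular_counts)
```

In this sketch, patients with no prescriptions in the window do not appear in the result and would need to be assigned a count of zero before analysis.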
Patient characteristics that were considered in this study include gender, age groups (stratified into 18–44, 45–64, 65–84 and 85+ years) in 2012, last recorded pre-2012 body mass index (BMI), last recorded pre-2012 smoking status (current, never and ex-smokers) and socioeconomic deprivation measured by quintiles of IMD across the UK (1 for the least socioeconomically deprived quintile of areas and 5 for the most). Gender and age were determined in a straightforward manner from the CPRD-GOLD patient table. BMI and smoking status were extracted from the CPRD-GOLD clinical and additional tables using CPRD entity type 13 (BMI), CPRD entity type 4 (smoking status) and a smoking status Read code list from Jennifer Quint (Imperial College London) which is available at
This study aims to identify clusters of multimorbid patients using patterns of co-existing long-term conditions. We used latent class analysis (LCA) (
Guided by simulation studies [
To account for the different nature of multimorbidity clusters at different ages, four age strata (18–44, 45–64, 65–84, 85+ years) were chosen. We derived the cluster solution and performed post hoc statistical tests in a stratified (by age stratum) random sample of the multimorbid population that contained 80% of the patients (i.e. the training set). Separate LCAs were performed for each stratum, and each patient was allocated to a single multimorbidity cluster. For ease of interpretation, clusters were labelled by their three most distinctive conditions, i.e. those for which the difference in prevalence between the cluster and its age stratum was the highest (see Additional file
To assess the stability of age-stratified multimorbidity clusters, LCAs were repeated in the remaining 20% of the population (i.e. test set), fixing the number of clusters to match that learned from the training set [
A total of 391,669 patients were included in the study, of which 49% and 22% had none or only one long-term condition respectively (see Table

Demographic characteristics of the whole population

| Demographics | n (%) | No. of morbidities, median [Q1–Q3] | Multimorbid patients (%) |
| --- | --- | --- | --- |
| All patients | 391,669 (100) | 1 [0–2] | 28.9 |
| Gender | | | |
| Male | 192,929 (49.3) | 0 [0–2] | 26.0 |
| Female | 198,740 (50.7) | 1 [0–2] | 31.7 |
| Age group (years) | | | |
| 18–24 | 32,007 (8.2) | 0 [0–0] | 4.7 |
| 25–34 | 60,501 (15.4) | 0 [0–1] | 7.9 |
| 35–44 | 68,688 (17.5) | 0 [0–1] | 13.1 |
| 45–54 | 74,734 (19.1) | 0 [0–1] | 20.5 |
| 55–64 | 60,323 (15.4) | 1 [0–2] | 34.4 |
| 65–74 | 49,427 (12.6) | 2 [1–3] | 53.6 |
| 75–84 | 31,262 (8.0) | 3 [1–4] | 73.6 |
| 85+ | 14,727 (3.8) | 4 [2–5] | 83.6 |
| Socioeconomic status | | | |
| 1 (least deprivation) | 90,730 (23.2) | 1 [0–2] | 27.1 |
| 2 | 87,734 (22.4) | 1 [0–2] | 28.5 |
| 3 | 81,569 (20.8) | 1 [0–2] | 29.1 |
| 4 | 71,424 (18.2) | 1 [0–2] | 29.3 |
| 5 (greatest deprivation) | 60,212 (15.4) | 1 [0–2] | 31.4 |

Independence test for demographics and multimorbidity. Gender vs multimorbidity: Age group vs multimorbidity: IMD vs multimorbidity:
Among the multimorbid patients (i.e. those with more than one long-term condition,

Characteristics of multimorbid patients

| Demographic characteristics | Across age strata | Age 18–44 years | Age 45–64 years | Age 65–84 years | Age 85+ years |
| --- | --- | --- | --- | --- | --- |
| Female (%) | 63,072 (56) | 9422 (62) | 19,690 (55) | 25,952 (52) | 8008 (65) |
| No. of morbidities, median [Q1–Q3] | 3 [2–4] | 2 [2–3] | 2 [2–3] | 3 [2–4] | 4 [3–6] |
| Age, median [Q1–Q3] | 66 [53–77] | 37 [30–41] | 56 [51–61] | 74 [69–79] | 89 [86–91] |
| BMI | | | | | |
| Median [Q1–Q3] | 27 [24–31] | 26 [23–31] | 28 [25–33] | 27 [24–31] | 25 [22–28] |
| Missing (%) | 5788 (5) | 1571 (10) | 1462 (4) | 1540 (3) | 1215 (10) |
| Smoking status (%) | | | | | |
| Current smoker | 21,586 (19) | 5641 (37) | 9140 (25) | 6160 (12) | 645 (5) |
| Never smoker | 55,145 (49) | 6789 (45) | 17,040 (47) | 24,042 (49) | 7274 (59) |
| Ex-smoker | 36,265 (32) | 2821 (18) | 9869 (27) | 19,251 (39) | 4324 (35) |
| Missing | 215 (0.19) | 55 (0.36) | 48 (0.13) | 41 (0.08) | 71 (0.58) |
| Index of multiple deprivation in quintiles (%) | | | | | |
| 1 (least deprivation) | 24,624 (22) | 2647 (17) | 7333 (20) | 11,722 (24) | 2922 (24) |
| 2 | 25,027 (22) | 2738 (18) | 7668 (21) | 11,667 (24) | 2954 (24) |
| 3 | 23,700 (21) | 2969 (19) | 7352 (20) | 10,612 (21) | 2767 (22) |
| 4 | 20,934 (18) | 3316 (22) | 6862 (19) | 8645 (17) | 2111 (17) |
| 5 (greatest deprivation) | 18,926 (17) | 3636 (24) | 6882 (19) | 6848 (14) | 1560 (13) |
For ease of reference, we refer to each cluster by its lead or key conditions (i.e. one or three conditions, respectively, whose cluster-specific prevalence is highest, and higher than their overall prevalence in their respective age group).
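The labelling rule can be sketched in a few lines of Python. This is a minimal illustration that assumes the "most distinctive" definition given in the methods, i.e. ranking conditions by the difference between within-cluster prevalence and prevalence in the age stratum; the function name `key_conditions` and all prevalence values below are invented for the example and are not study results.

```python
def key_conditions(cluster_prevalence, stratum_prevalence, n_key=3):
    """Return the `n_key` conditions with the largest excess prevalence.

    Excess prevalence is the within-cluster prevalence minus the prevalence
    in the whole age stratum (both given as proportions between 0 and 1).
    """
    excess = {
        condition: cluster_prevalence[condition] - stratum_prevalence[condition]
        for condition in cluster_prevalence
    }
    ranked = sorted(excess, key=lambda condition: excess[condition], reverse=True)
    return ranked[:n_key]


# Invented prevalences for one cluster and its age stratum.
cluster = {"depression": 0.90, "anxiety": 0.45, "pain": 0.35,
           "asthma": 0.05, "hypertension": 0.05}
stratum = {"depression": 0.40, "anxiety": 0.20, "pain": 0.25,
           "asthma": 0.30, "hypertension": 0.12}

labels = key_conditions(cluster, stratum)
print(labels)
```

Ranking by the difference rather than by raw prevalence avoids labelling a cluster with a condition that is simply common throughout the age stratum.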
These clusters differ across age strata, both in terms of the number of clusters per strata and main components within each cluster (Table

Descriptions of the derived clusters of multimorbid patients for each age strata. Clusters are ordered by sizes from the largest to the smallest. Key conditions are the three estimated to be most distinctive in the cluster (where the difference between within-cluster prevalence and prevalence in age strata is the largest). For the number of morbidities, median and first (Q1) and third (Q3) quartiles are reported. For other categorical variables, percentages are reported. Greater deprivation denotes top 40% of IMD (categories 4 and 5)

| Key conditions: lead condition (prevalence) | Key conditions: subsidiary conditions (prevalence) | Patients (%) | No. of morbidities (median [Q1–Q3]) | Female (%) | Greater deprivation (%) | Current smokers (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Age 18–44 years | | | | | | |
| Depression (100%) | Anxiety (41%), pain (31%) | 32 | 2 [2–3] | 66 | 50 | 46 |
| Pain (36%) | Hearing loss (30%), hypertension (23%) | 23 | 2 [2–3] | 52 | 46 | 27 |
| Asthma (100%) | IBS (26%), depression (20%) | 20 | 2 [2–3] | 63 | 41 | 29 |
| IBS (100%) | Depression (29%), hearing loss (21%) | 18 | 2 [2–3] | 77 | 37 | 28 |
| PSM (75%) | Alcohol (42%), depression (24%) | 7 | 2 [2–3] | 28 | 63 | 76 |
| Age 45–64 years | | | | | | |
| Hypertension (76%) | Diabetes (37%), pain (25%) | 37 | 2 [2–3] | 42 | 38 | 20 |
| IBS (40%) | Hearing loss (29%), pain (28%) | 24 | 2 [2–3] | 64 | 29 | 20 |
| Depression (93%) | Pain (53%), anxiety (31%) | 22 | 3 [2–5] | 68 | 46 | 35 |
| Asthma (100%) | Pain (24%), COPD (16%) | 12 | 2 [2–3] | 61 | 35 | 20 |
| Alcohol (62%) | PSM (42%), pain (28%) | 4 | 3 [2–4] | 31 | 57 | 63 |
| Age 65–84 years | | | | | | |
| Hypertension (100%) | Diabetes (31%), pain (27%) | 41 | 3 [2–4] | 54 | 30 | 10 |
| Hearing loss (40%) | Prostate disorder (21%), IBS (3%) | 22 | 3 [2–4] | 48 | 25 | 9 |
| Depression (56%) | Pain (56%), anxiety (23%) | 14 | 4 [3–5] | 72 | 33 | 15 |
| CHD (54%) | Diabetes (32%), atrial fibrillation (29%) | 11 | 4 [3–5] | 30 | 33 | 12 |
| COPD (57%) | Asthma (49%), pain (33%) | 8 | 3 [2–5] | 50 | 40 | 24 |
| Pain (81%) | CHD (53%), depression (45%) | 5 | 7 [7–9] | 54 | 43 | 16 |
| Age 85+ years | | | | | | |
| Hypertension (72%) | Hearing loss (39%), diabetes (18%) | 58 | 3 [2–4] | 61 | 30 | 5 |
| Pain (64%) | Depression (41%), constipation (24%) | 23 | 5 [4–6] | 80 | 30 | 5 |
| CHD (61%) | Atrial fibrillation (53%), heart failure (49%) | 11 | 7 [6–8] | 60 | 30 | 4 |
| Asthma (48%) | COPD (48%), pain (44%) | 8 | 5 [4–6] | 59 | 30 | 8 |

Mortality and health service utilisation by patient clusters in each age stratum. Clusters are ordered by the highest to the lowest mortality. The non-multimorbid cluster contains patients with zero or only one long-term condition. The number of GP consultations, hospitalisations and repeat prescriptions (by counting the number of unique BNF codes that were in repeated prescriptions at least four times) are measured in 1 year after January 2012. Both mean and median are reported because they highlight different aspects of skewed distributions, especially in relation to hospitalisations and prescriptions

| Key conditions: lead condition (prevalence) | Key conditions: subsidiary conditions (prevalence) | 2-year mortality (%) | 5-year mortality (%) | No. of GP contacts in a year (mean, median [Q1–Q3]) | No. of hospitalisations in a year (mean, median [Q1–Q3]) | No. of unique medicines on repeat prescriptions in a year (mean, median [Q1–Q3]) |
| --- | --- | --- | --- | --- | --- | --- |
| Age 18–44 years | | | | | | |
| PSM (75%) | Alcohol (42%), depression (24%) | 1.8 | 3.9 | 10.7, 7 [1–16] | 0.4, 0 [0–0] | 1.3, 0 [0–2] |
| Pain (36%) | Hearing loss (30%), hypertension (23%) | 1.0 | 2.7 | 11.9, 9 [3–17] | 0.6, 0 [0–0] | 2.5, 1 [0–4] |
| Depression (100%) | Anxiety (41%), pain (31%) | 0.9 | 1.8 | 14.5, 12 [5–20] | 0.4, 0 [0–0] | 2.4, 2 [0–3] |
| Asthma (100%) | IBS (26%), depression (20%) | 0.2 | 0.6 | 11.3, 9 [4–16] | 0.4, 0 [0–0] | 1.9, 1 [0–3] |
| IBS (100%) | Depression (29%), hearing loss (21%) | 0.2 | 0.4 | 10.6, 8 [3–15] | 0.4, 0 [0–0] | 1.2, 0 [0–2] |
| Non-multimorbid | – | 0.1 | 0.2 | 3.7, 1 [0–5] | 0.2, 0 [0–0] | 0.2, 0 [0–0] |
| Age 45–64 years | | | | | | |
| Alcohol (62%) | PSM (42%), pain (28%) | 4.5 | 12.5 | 10.7, 8 [2–16] | 0.5, 0 [0–1] | 2.4, 1 [0–4] |
| Depression (93%) | Pain (53%), anxiety (31%) | 2.4 | 5.8 | 16.7, 14 [7–23] | 0.6, 0 [0–1] | 5.1, 4 [2–7] |
| Hypertension (76%) | Diabetes (37%), pain (25%) | 1.6 | 4.4 | 11.5, 9 [4–16] | 0.5, 0 [0–0] | 4.1, 4 [1–6] |
| IBS (40%) | Hearing loss (29%), pain (28%) | 1.3 | 3.0 | 10.5, 8 [3–15] | 0.5, 0 [0–0] | 2.0, 1 [0–3] |
| Asthma (100%) | Pain (24%), COPD (16%) | 1.0 | 2.7 | 12.6, 10 [5–17] | 0.4, 0 [0–0] | 3.4, 3 [1–5] |
| Non-multimorbid | – | 0.4 | 1.3 | 4.4, 2 [0–6] | 0.2, 0 [0–0] | 0.5, 0 [0–0] |
| Age 65–84 years | | | | | | |
| Pain (81%) | CHD (53%), depression (45%) | 16.2 | 39.2 | 26.3, 23 [14–35] | 1.6, 1 [0–2] | 10.7, 11 [8–14] |
| CHD (54%) | Diabetes (32%), atrial fibrillation (29%) | 11.3 | 28.8 | 16.8, 14 [6–24] | 1.1, 0 [0–1] | 5.9, 6 [3–8] |
| COPD (57%) | Asthma (49%), pain (33%) | 9.2 | 25.5 | 15.7, 13 [6–22] | 0.8, 0 [0–1] | 5.5, 5 [2–8] |
| Depression (56%) | Pain (56%), anxiety (23%) | 8.4 | 20.9 | 17.3, 14 [8–23] | 0.8, 0 [0–1] | 5.9, 6 [3–8] |
| Hypertension (100%) | Diabetes (31%), pain (27%) | 4.7 | 13.2 | 12.4, 10 [4–17] | 0.6, 0 [0–1] | 4.5, 4 [2–7] |
| Hearing loss (40%) | Prostate disorder (21%), IBS (3%) | 4.4 | 11.1 | 13.2, 11 [5–19] | 0.7, 0 [0–1] | 3.3, 3 [1–5] |
| Non-multimorbid | – | 1.9 | 6.6 | 6.4, 4 [0–9] | 0.3, 0 [0–0] | 1.2, 0 [0–2] |
| Age 85+ years | | | | | | |
| CHD (61%) | Atrial fibrillation (53%), heart failure (49%) | 37.7 | 70.8 | 21.9, 18 [9–31] | 1.5, 1 [0–2] | 8.0, 8 [5–11] |
| Pain (64%) | Depression (41%), constipation (24%) | 31.1 | 62.9 | 17.3, 15 [7–24] | 0.8, 0 [0–1] | 6.7, 7 [4–10] |
| Asthma (48%) | COPD (48%), pain (44%) | 28.0 | 56.5 | 19.6, 17 [9–27] | 1.1, 0 [0–2] | 6.9, 7 [4–10] |
| Hypertension (72%) | Hearing loss (39%), diabetes (18%) | 20.9 | 49.5 | 13.0, 10 [2–19] | 0.8, 0 [0–1] | 4.1, 4 [0–6] |
| Non-multimorbid | – | 13.4 | 36.0 | 6.7, 3 [0–10] | 0.4, 0 [0–0] | 1.3, 0 [0–2] |
Five clusters were uncovered in the 18–44 age stratum (Additional file
Those in the cluster whose three key conditions were pain (36%), hearing loss (30%) and hypertension (23%) were found to have the highest hospital admission rates (an average of 0.6 visits in a year) and the highest count of regular medicines (median 1 [IQR 0–4] unique drug classes in a year). This corresponded to an aIRR for hospitalisations of 1.04 [95% CI 0.90–1.20] and an aIRR for regular medicines of 1.87 [95% CI 1.74–2.02] relative to the cluster with the lowest service use and mortality.
The highest mortality in this age range was found in the least prevalent (7%) multimorbidity cluster, whose three key conditions were psychoactive substance misuse (75%), alcohol problems (42%) and depression (24%) (3.9% mortality in 5 years). This level of mortality was 18 times higher than that of individuals in the same age range without multimorbidity (0.2%). This cluster was predominantly male (72%), was drawn largely from socioeconomically deprived areas (63% from the most deprived 40% of UK areas) and had high smoking rates (76% current smokers).
In the 45–64 age stratum, LCA revealed five clusters (Additional file
Six clusters were found in the 65–84 age stratum (Additional file
The 85+ age stratum was composed of four clusters (Additional file
As well as validating the clusters by their association with patient characteristics and outcomes, the similarity of multimorbidity clusters was compared between the training set (80% of patients,
As the training set contained more disease patterns, the derived clusters were more comprehensive. The test set (with fewer patients) contained fewer disease patterns, and therefore, we expected the derived clusters to be a subset of those in the training set. Indeed, validation of cluster profiles showed that every cluster in the test set found a match in the training set. Some clusters were particularly robust (had the smallest JSD and the highest Pearson’s correlation coefficient), for instance, those in the largest age stratum (65–84 age stratum,
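To make the two similarity measures concrete, the following Python sketch compares a test-set cluster profile with training-set profiles using the Jensen-Shannon divergence (JSD) and Pearson's correlation coefficient. It is a minimal illustration under stated assumptions: each profile is taken to be a vector of condition prevalences, the vector is rescaled to sum to one before the JSD is computed, and the best match is the training cluster with the smallest JSD. The function names and all profile values are hypothetical and are not taken from the study.

```python
import math


def normalise(values):
    """Rescale non-negative values so that they sum to one."""
    total = sum(values)
    return [v / total for v in values]


def jsd(p, q):
    """Jensen-Shannon divergence (base 2, so bounded by 0 and 1)."""
    p, q = normalise(p), normalise(q)
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def best_match(test_profile, training_profiles):
    """Return the name of the training cluster with the smallest JSD."""
    return min(training_profiles,
               key=lambda name: jsd(test_profile, training_profiles[name]))


# Invented prevalence profiles over four conditions.
training = {"A": [0.90, 0.40, 0.30, 0.05], "B": [0.10, 0.20, 0.80, 0.60]}
test_cluster = [0.85, 0.45, 0.25, 0.10]

matched = best_match(test_cluster, training)
print(matched, jsd(test_cluster, training[matched]),
      pearson(test_cluster, training[matched]))
```

A small JSD together with a correlation close to one indicates that the two clusters have similar condition profiles; identical profiles give a JSD of zero.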
This study identified and validated clusters of multimorbid patients using a novel patient-centred approach. In summary, we identified 20 patient clusters across four age strata. In the younger age strata (18–44; 45–64), the clusters with the highest mortality (18 times higher than the non-multimorbid group in 18–44-year olds) comprised psychoactive substance misuse in combination with alcohol problems. The clusters with the most contact with general practice in people aged under 65 comprised depression, anxiety and pain. In 65–84-year olds, the cluster with the highest mortality and highest health service use (GP contact, hospitalisations, repeat prescriptions) comprised pain, coronary heart disease and depression, and in people aged 85 or over, it comprised heart failure, coronary heart disease and atrial fibrillation. The most common cluster in 18–44-year olds was centred on depression, but in all other age groups, it was centred on hypertension. In the oldest age group, this hypertension-centred cluster was associated with the best survival and lowest health service use among multimorbid patients. Pain featured in 13 of the clusters.
In this study, unlike most previous analyses of multimorbidity, we have defined novel clusters in terms of patients rather than diseases [
In terms of relative importance of single conditions within multimorbid clusters, the predominance of mental health conditions and hypertension has been identified in previous work [
The robust identification of such clusters would not have been possible without the novel use of representative data reflecting real-world patterns of multimorbidity, age stratification, patient-level clustering (not requiring all patients to have identical lists of conditions) and validation with held-out data. This is the largest-scale application of age-stratified latent class analysis to multimorbidity, both in patient numbers (above 100,000) and the number of conditions (38) [
This study shares the typical limitations of electronic health record research, which relies on routine coding within the healthcare system; these include residual confounding and variable CPRD data quality. Wherever practically feasible, we have taken steps to address these, e.g. the careful design of codelists, relying on variables with low missingness and adjusting for key variables. Some relevant information, such as disease severity, was not available for the majority of diseases and so was not modelled. This may affect the association of disease with characteristics and outcomes. Given the observational nature of these data, some residual confounding such as this is inevitable, and so we caution that the relationship between clusters, patient demographics and outcomes should not be interpreted causally.
While the clustering approach used (LCA) is a robust probabilistic approach, results may differ subtly if other approaches are used. Validation of latent clusters also requires further research in which a larger test set, perhaps from another database or country, could strengthen the validation. We note that in every age stratum, there was a cluster whose lead condition (pain, irritable bowel syndrome, hearing loss and asthma, respectively) had a within-cluster prevalence of less than 50%, suggesting that these clusters are less distinctive than the others. It is also interesting that they are often the clusters with the lowest mortality. While these were validated in the test set, it may be that bigger datasets are required to split these into more distinct and interpretable clusters. Despite this, given the large and representative sample and the consistency of results internally, across age strata and with existing literature, we are confident in our main results. Finally, multimorbidity evolves over time, but we used longitudinal data only to extract conditions in 2012 and the subsequent service use and mortality outcomes.
These multimorbidity clusters highlight major targets for public health and healthcare, giving a more nuanced understanding of multimorbidity than the work of Barnett et al. [
While patients with multimorbidity account for an ever-increasing proportion of healthcare need and provision [
We acknowledge CPRD at Cambridge for developing and sharing disease definitions and Dr. Jennifer Quint (Imperial College London) for the permission to use and share a codelist for smoking status. We also acknowledge the valuable statistical discussions with Dr. Robert Goudie and Dr. Paul Kirk at MRC Biostatistics Unit, University of Cambridge. This study is based in part on data from the Clinical Practice Research Datalink obtained under licence from the UK Medicines and Healthcare Products Regulatory Agency. The data is provided by patients and collected by the NHS as part of their care and support. ONS is the provider of ONS mortality data used in this study. ONS and HES data copyright© (2018) was re-used with the permission of The Health & Social Care Information Centre. All rights reserved. The interpretation and conclusions contained in this study are those of the authors alone.
SJK, YZ and DE conceived and designed the study. DE drafted the protocol, which authors (YZ, DE, RAP, SJK) contributed to and revised critically. SJK and YZ were responsible for the data management. YZ did the statistical analysis and drafted the manuscript. DE, RAP and JM contributed to the data presentation and interpretation. All authors read and approved the final manuscript.
SJK and YZ are supported by SJK’s MRC Career Development Award (MR/P021573/1). JM is an NIHR Senior Investigator. The funder was not involved in the study design, data collection, analysis, interpretation, report writing or submission.
The Clinical Practice Research Datalink (CPRD) is an electronic healthcare record database open to all researchers. Researchers can apply to access CPRD data and, if successful, can access the data of their choosing. The CPRD charges researchers and other organisations to access this data.
The data that support the findings of this study are available from the Clinical Practice Research Datalink (CPRD), but restrictions apply to the availability of these data, which were used under licence for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of Clinical Practice Research Datalink.
This study was approved by the CPRD Independent Scientific Advisory Committee (ISAC), and so is covered by their ethics approval.
Not applicable
SJK has previously received research funding from EPSRC, BBSRC, MRC, NIHR, Alzheimer’s Society, Eli Lilly and Janssen for other projects, and funding from Roche Diagnostics for advisory board participation and travel, and consulting fees for DIADEM. None of this relates to this work. The other authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.