DGG and MT-MH are joint senior authors.
To explore the genetics of four Parkinson’s disease (PD) subtypes that have been previously described in two large cohorts of patients with recently diagnosed PD. These subtypes came from a data-driven cluster analysis of phenotypic variables.
We looked at the frequency of genetic mutations in glucocerebrosidase (GBA) and leucine-rich repeat kinase 2 against our subtypes. Then we calculated Genetic Risk Scores (GRS) for PD, multiple system atrophy, progressive supranuclear palsy, Lewy body dementia, and Alzheimer’s disease. These GRSs were regressed against the probability of belonging to a subtype in the two independent cohorts and we calculated q-values as an adjustment for multiple testing across four subtypes. We also carried out a Genome-Wide Association Study (GWAS) of belonging to a subtype.
A severe disease subtype had the highest rates of patients carrying GBA mutations while the mild disease subtype had the lowest rates (p=0.009). Using the GRS, we found a severe disease subtype had a reduced genetic risk of PD (p=0.004 and q=0.015). In our GWAS no individual variants met genome-wide significance (p<5×10⁻⁸), although four variants meeting a threshold of p<1×10⁻⁶ require further follow-up.
We have found that four previously defined PD subtypes have different genetic determinants which will help to inform future studies looking at underlying disease mechanisms and pathogenesis in these different subtypes of disease.
Data-driven approaches have been used to generate Parkinson’s disease subtypes in many studies but little is known about the genetics of these subtypes.
We found in previously developed Parkinson’s subtypes that a severe disease subtype had the highest rates of glucocerebrosidase mutation carriers and the lowest genetic risk within a Parkinson’s Genetic Risk Score.
These results provide some biological validity to our data-driven subtyping approach and will assist in future studies looking at underlying disease mechanisms and pathogenesis.
Parkinson’s disease (PD) is a common and progressive neurodegenerative disorder encompassing a wide range of motor and non-motor features. There is considerable heterogeneity within these features in terms of presentation and progression, which has led many to believe there are different clinically relevant subtypes of the disease. Data-driven approaches have been applied to many PD cohorts to try to delineate these subtypes; the first was in 1999.
We previously derived Parkinson’s clinical subtypes in over 2500 patients with early PD recruited from two large cohorts: Oxford Discovery and Tracking Parkinson’s.
Considering differences in genetics might help determine any difference in the aetiology of the subtypes while also providing a biological confirmation of data-driven clustering approaches. Here, we report on the genetics of our validated PD subtypes using data from the Oxford Discovery and Tracking Parkinson’s cohorts combined. To calculate the genetic risk of PD and related conditions including the atypical parkinsonian disorders and Alzheimer’s disease (AD), we identified a Genome-Wide Association Study (GWAS) of disease status (an analysis of case/control status) for each of the diseases. We then looked at whether the genetic risk of PD and related disorders was associated with belonging to a particular disease subtype. We also considered two of the most important mutations in PD, glucocerebrosidase (GBA) and leucine-rich repeat kinase 2 (LRRK2), against our subtypes. Finally, we carried out a GWAS to see whether any individual genetic variants are associated with belonging to a subtype.
A recent GWAS based on the tremor-dominant (TD) and postural instability and gait difficulty (PIGD) motor subtypes found multiple PD risk alleles that might influence the motor subtype.
We used data from two large prospective early PD cohorts. The Tracking Parkinson’s cohort includes UK-wide centres, recruited between February 2012 and May 2014. Full details of this cohort along with inclusion/exclusion criteria have been published previously.
Our data-derived PD subtypes were determined using variables from motor, non-motor and cognitive domains at baseline. Our clustering approach used a factor analysis followed by a k-means cluster analysis where we considered two to five clusters. Individuals were excluded from the cluster analysis if they had been rediagnosed with another condition during follow-up or if they had been given a probability of a diagnosis of PD of <90% at the latest visit as rated by a research neurologist or movement disorder specialist. This was an attempt to exclude those incorrectly diagnosed with PD.
Our first paper on this subject was based on only the Oxford Discovery cohort (with 769 patients) and we found five clusters gave us the optimal solution.
In the Tracking Parkinson’s cohort, individuals were genotyped using the Illumina HumanCore Exome array with custom content.
In a principal components (PCs) analysis, 20 genetic PCs were generated from a linkage-pruned SNP set (removing SNPs with an r2 >0.02 in a 1000 kb sliding window shifting 10 SNPs at a time). If an individual was >6 SDs from the mean of one of the first 5 PCs or a clear outlier in a scatter plot they were excluded and then the PCs recalculated and repeated until there were no outliers. The first five PCs were then retained to be included as covariates within the GWAS.
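The iterative outlier exclusion described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes a samples × SNPs dosage matrix that has already been linkage-pruned, computes PCs by a plain singular value decomposition, and automates only the 6 SD rule (the visual scatter-plot check is not reproduced). The function names `principal_components` and `remove_pc_outliers` are hypothetical.

```python
import numpy as np

def principal_components(genotypes, n_pcs=20):
    """Return PC scores (samples x n_pcs) from a samples x SNPs dosage matrix."""
    centred = genotypes - genotypes.mean(axis=0)
    # SVD of the centred matrix: U * S gives the PC scores for each sample.
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]

def remove_pc_outliers(genotypes, n_pcs=20, n_check=5, sd_limit=6.0):
    """Drop samples more than sd_limit SDs from the mean on any of the first
    n_check PCs, recompute the PCs, and repeat until no outliers remain.

    Returns the indices of retained samples and their final PC scores.
    """
    keep = np.arange(genotypes.shape[0])
    while True:
        pcs = principal_components(genotypes[keep], n_pcs)
        head = pcs[:, :n_check]
        z = (head - head.mean(axis=0)) / head.std(axis=0)
        outlier = (np.abs(z) > sd_limit).any(axis=1)
        if not outlier.any():
            return keep, pcs
        keep = keep[~outlier]
```

In this sketch the first five columns of the returned scores would be the covariates carried forward into the GWAS.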
Our main focus was to look at genetic risk of PD but we also wanted to explore whether there might be shared genetic pathways between other neurodegenerative disorders (progressive supranuclear palsy (PSP), multiple system atrophy (MSA), Lewy body dementia (LBD) and AD) and each subtype while also exploring the potential for selection bias where atypical parkinsonian disorders might have been incorrectly diagnosed as PD. To calculate the genetic risk of each condition we identified an external GWAS of disease status (an analysis of case/control status) applied to separate PD, MSA, PSP, LBD and AD cohorts.
GBA mutations were split into those that are recognised as causing Gaucher’s disease (GD) (the most common being L444P and N370S) and those that are not (E326K and T369M) as previously reported from Tracking.
We tabulated the clusters against LRRK2 and GBA status using a Fisher’s exact test (since the frequencies are very small in some cells due to the rarity of these mutations) to determine the strength of any association.
We calculated the probability of belonging to a cluster from the discriminant analysis model from our validated subtypes paper.
In an attempt to assess the potential for selection bias we compared age (t-test), gender and cluster assignment (χ2 test) for those who did and did not have genetic data from the SNP arrays after quality control.
We calculated GRS for PD, PSP, MSA, LBD and AD by multiplying the genome-wide significant SNPs (p<5×10⁻⁸) by their beta coefficients taken from each external GWAS and then standardising the score. This GRS can be interpreted as an estimate of the contribution of genetics to developing one of these diseases.
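As an illustration of the score described above, the sketch below assumes the usual construction in which each individual's effect-allele dosages are weighted by the external GWAS beta coefficients and summed across SNPs before standardising. The summation step and the function name `genetic_risk_score` are assumptions, not details stated in the text.

```python
import numpy as np

def genetic_risk_score(dosages, betas):
    """Standardised GRS from effect-allele dosages and external GWAS betas.

    dosages: samples x SNPs array of effect-allele counts (0 to 2).
    betas:   per-SNP beta coefficients, coded for the same effect allele.
    """
    raw = dosages @ betas                   # weighted sum per individual
    return (raw - raw.mean()) / raw.std()   # rescale to mean 0, SD 1

# Hypothetical toy data: four individuals, two SNPs.
dosages = np.array([[0, 1], [1, 2], [2, 0], [1, 1]], dtype=float)
betas = np.array([0.2, -0.1])
grs = genetic_risk_score(dosages, betas)
```

Standardising means a regression coefficient for the GRS is read per SD increase in genetic risk.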
We carried out a GWAS with linear regression using the log odds of belonging to a cluster as the outcome. The first five genetic PCs were used as covariates for each regression. Only SNPs with a minor allele frequency (MAF) >0.05 were included. The data were combined using a fixed effects meta-analysis. We also computed the expected power for our sample size.
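Two pieces of this analysis can be shown compactly: the log odds transformation of a cluster probability, and the pooling of per-cohort SNP estimates. The sketch assumes the standard inverse-variance weighted form of a fixed effects meta-analysis; the text does not state which software or weighting was used, and the function names are hypothetical.

```python
import math

def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def fixed_effects_meta(estimates):
    """Inverse-variance weighted pooling of (beta, se) pairs, one per cohort.

    Returns the pooled beta, its standard error and the z statistic.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se

# Hypothetical per-cohort estimates for one SNP: (beta, standard error).
pooled_beta, pooled_se, pooled_z = fixed_effects_meta([(0.6, 0.2), (0.4, 0.2)])
```

With equal standard errors the pooled beta is the simple average of the two cohort estimates, and the pooled standard error shrinks by a factor of √2.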
Palindromic SNPs (where the alleles are nucleotides that pair to each other making it difficult to determine the direction of effect) that had an MAF >0.45 were excluded when calculating the GRS and also from the GWAS.
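The exclusion rule above can be expressed as a small filter. This is a sketch under an assumed record layout (dictionaries with hypothetical keys `a1`, `a2` and `maf`), and the marker names in the example are invented.

```python
# Watson-Crick pairing: a SNP is palindromic when its two alleles pair with each other.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def is_palindromic(a1, a2):
    """True for A/T, T/A, C/G and G/C SNPs; indels and other pairs are False."""
    return COMPLEMENT.get(a1) == a2

def exclude_ambiguous(snps, maf_limit=0.45):
    """Drop palindromic SNPs whose MAF exceeds maf_limit; keep everything else."""
    return [
        s for s in snps
        if not (is_palindromic(s["a1"], s["a2"]) and s["maf"] > maf_limit)
    ]

# Hypothetical records for illustration only.
example = [
    {"marker": "snp_a", "a1": "A", "a2": "T", "maf": 0.48},  # palindromic, near 0.5: dropped
    {"marker": "snp_b", "a1": "A", "a2": "T", "maf": 0.20},  # palindromic, strand inferable: kept
    {"marker": "snp_c", "a1": "G", "a2": "A", "maf": 0.49},  # not palindromic: kept
    {"marker": "snp_d", "a1": "CT", "a2": "C", "maf": 0.47}, # indel: kept
]
kept = [s["marker"] for s in exclude_ambiguous(example)]
```

Palindromic SNPs with a lower MAF are retained because the strand can usually be inferred by comparing allele frequencies between datasets.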
After all the quality control procedures, we had genetic data on 1467 of the 1601 (91.6%) individuals from the original Tracking cluster analysis. Average age (67.2 vs 68.0 with p=0.31) and gender rates (34.2% vs 34.3% female with p=0.97) were similar in those with and without genetic data (respectively). Within clusters, the proportion of individuals included varied from 96.8% (cluster 4) to 88.6% (cluster 1), with p=0.001. For those with genetic data there were 437, 423, 304 and 303 individuals in clusters 1–4, respectively.
In the Oxford Discovery cohort, we had genetic data on 807 of the 944 (85.5%) individuals from the cluster analysis. Within Oxford Discovery average age (67.4 vs 66.1 with p=0.15) and gender rates (34.3% vs 41.6% female with p=0.099) were similar in those with and without genetic data (respectively). Within clusters, the proportion of individuals included varied from 87.5% (cluster 4) to 83.0% (cluster 3), with p=0.53. For those with genetic data there were 261, 145, 185 and 216 individuals in clusters 1–4, respectively.
Data-derived clusters compared with LRRK2 and GBA mutation status
Cohort | Cluster | LRRK2 non-carriers | LRRK2 carriers | GBA non-carriers | GBA E326K and T369M carriers | GBA GD-causing variants |
Tracking Parkinson’s | Cluster 1 | 469 (99.8%) | 1 (0.2%) | 437 (91.8%) | 29 (6.1%) | 10 (2.1%) |
Tracking Parkinson’s | Cluster 2 | 432 (99.1%) | 4 (0.9%) | 413 (93.7%) | 20 (4.5%) | 8 (1.8%) |
Tracking Parkinson’s | Cluster 3 | 314 (98.1%) | 6 (1.9%) | 282 (87.0%) | 27 (8.3%) | 15 (4.6%) |
Tracking Parkinson’s | Cluster 4 | 304 (99.7%) | 1 (0.3%) | 280 (90.9%) | 20 (6.5%) | 8 (2.6%) |
Tracking Parkinson’s | P value | P=0.059 | | P=0.080; GBA variants combined P=0.018 | | |
Oxford Discovery | Cluster 1 | 280 (99.3%) | 2 (0.7%) | 231 (90.9%) | 15 (5.9%) | 8 (3.2%) |
Oxford Discovery | Cluster 2 | 150 (98.7%) | 2 (1.3%) | 127 (93.4%) | 8 (5.9%) | 1 (0.7%) |
Oxford Discovery | Cluster 3 | 204 (100%) | 0 | 158 (88.8%) | 14 (7.9%) | 6 (3.4%) |
Oxford Discovery | Cluster 4 | 221 (99.1%) | 2 (0.9%) | 185 (90.7%) | 16 (7.8%) | 3 (1.5%) |
Oxford Discovery | P value | P=0.45 | | P=0.57; GBA variants combined P=0.59 | | |
Combined cohort | P value | P=0.35 | | P=0.036; GBA variants combined P=0.009 | | |
Note the numbers in this table are slightly different to the numbers in the other analyses since the mutation status did not come from the imputed array data.
GBA, glucocerebrosidase; GD, Gaucher’s disease; LRRK2, leucine-rich repeat kinase 2.
Within the Tracking cohort the third disease cluster (severe motor disease and poor psychological well-being) had the greatest proportion of GBA carriers (12.9% across both carrier groups) and the second disease cluster (mild motor and non-motor disease) had the lowest proportion of GBA carriers (6.3%). This trend was also seen in Oxford Discovery cohort (11.3% in cluster 3 vs 6.6% in cluster 2). In the combined cohorts a p value for a difference in GBA carrier rates across the clusters was p=0.036, and when combining the two GBA carrier groups the p value was smaller at p=0.009.
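The carrier proportions quoted above can be recomputed from the GBA counts in the table. The sketch below copies those counts and pools the two carrier groups; the dictionary and function names are hypothetical. Percentages computed directly from counts can differ in the last decimal place from those obtained by adding the two rounded percentages in the table (for example 42/324 for Tracking cluster 3).

```python
# GBA counts per cluster copied from the table:
# (non-carriers, E326K/T369M carriers, GD-causing variant carriers)
TRACKING = {1: (437, 29, 10), 2: (413, 20, 8), 3: (282, 27, 15), 4: (280, 20, 8)}
DISCOVERY = {1: (231, 15, 8), 2: (127, 8, 1), 3: (158, 14, 6), 4: (185, 16, 3)}

def carrier_percentage(counts):
    """Percentage of individuals carrying any GBA variant (both groups pooled)."""
    non_carriers, common, gd = counts
    return 100.0 * (common + gd) / (non_carriers + common + gd)

for name, cohort in (("Tracking", TRACKING), ("Oxford Discovery", DISCOVERY)):
    for cluster, counts in cohort.items():
        print(f"{name} cluster {cluster}: {carrier_percentage(counts):.1f}%")
```

In both cohorts cluster 3 has the highest pooled carrier percentage and cluster 2 the lowest, matching the pattern described in the text.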
Genetic PD risk (see
Genetic risk of Parkinson’s disease (PD) versus likelihood of belonging to a cluster.
We can see in
Genetic risk of atypical Parkinson’s: progressive supranuclear palsy (PSP) and multiple system atrophy (MSA).
In
Genetic risk of dementia: Alzheimer’s disease and Lewy body dementia (LBD). PD, Parkinson’s disease.
There was little evidence of population stratification since within the four GWAS analyses from Tracking, the genomic inflation factor lambda varied from 1.001 to 1.008, while within Oxford Discovery they were all 1.0.
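For readers unfamiliar with the genomic inflation factor, a common definition is the median of the observed 1 df chi-square association statistics divided by the median expected under the null (about 0.455). The sketch below assumes that definition and takes per-SNP z statistics (beta divided by its standard error) as input; the function name is hypothetical and the simulated z scores are for illustration only.

```python
import random
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df

def genomic_inflation(z_scores):
    """Lambda: median observed chi-square statistic over its null expectation."""
    return statistics.median(z * z for z in z_scores) / CHI2_1DF_MEDIAN

# Simulated null z scores: lambda should be close to 1 when there is no inflation.
random.seed(1)
null_z = [random.gauss(0.0, 1.0) for _ in range(100_000)]
lam = genomic_inflation(null_z)
```

Values close to 1, such as those reported here, indicate that the test statistics are not systematically inflated.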
We highlight the power we have to detect a genome wide significant variant given our sample size in
SNPs meeting a threshold of p<1×10⁻⁶ from the genome-wide association study meta-analysis for each data-driven cluster
Chr | Position (GRCh37) | Marker | A1 | A2 | Nearest gene | Beta | SE | P value |
1 | 237 734 615 | rs151043031 | CT | C | RYR2 | 0.59 | 0.12 | 9.986e-07 |
6 | 160 698 177 | rs316037 | G | A | SLC22A2 | 0.60 | 0.12 | 9.867e-07 |
6 | 160 699 605 | rs5881357 | AT | A | SLC22A2 | 0.60 | 0.12 | 6.337e-07 |
1 | 214 449 747 | rs116258323 | T | C | SMYD2 | 1.62 | 0.33 | 6.715e-07 |
A1, effect allele; A2, other allele; Chr, chromosome; SE, standard error; SNPs, single-nucleotide polymorphisms.
The associations between GBA and the phenotypic clusters, with a severe disease cluster having the greatest proportion of carriers and a mild disease cluster having the smallest proportion, are what would be expected given the observational evidence that GBA mutations are associated with higher Hoehn and Yahr stage and worse cognition.
There is also heterogeneity of clinical phenotype within LRRK2 carriers, which would make it difficult to correlate them with clusters. One study showed that mutations of the LRRK2 gene are associated with less cognitive impairment compared with idiopathic PD (iPD).
There are several possible explanations for the negative association between genetic risk of PD and the third, severe disease cluster. The first is that the individuals in this cluster have a more environmental and less genetically driven disease aetiology. The second is that this cluster is enriched with non-PD cases although the MSA and PSP genetic risk pooled associations do not support this, and it would also require that the PD GWAS studies had no enrichment of other similar conditions. The third is one of selection bias, in that these severe disease cases are less likely to participate in the PD cohorts that supply cases to the PD GWAS study we used, as compared with Oxford Discovery and Tracking cohorts which offered local clinical review for the majority of research participants. This PD GWAS study used data from 17 different datasets.
We have data on other monogenic forms of Parkinson’s (SNCA, PRKN and PINK1) and have published these data from the Tracking cohort.
The negative association between genetic risk of PSP and cluster 2 and the positive association with cluster 3 in the Oxford Discovery cohort is what we would expect to see if there was enrichment of PSP cases. That is, PSP cases are more likely to belong to a severe motor disease cluster than a mild motor and non-motor disease cluster. However, this is not backed up by the associations within the larger Tracking cohort. This could represent a chance finding in Oxford Discovery. Alternatively, it could reflect the procedure we used to exclude patients from the analysis, that is dropping those with probability of diagnosis of PD of <90% at the latest clinic visit. In Tracking 367/1975 (18.6%) were dropped, while in Oxford Discovery only 76/1022 (7.4%) were dropped using this criterion (see
In previous research, we found cluster 3 was associated with a higher proinflammatory baseline profile (raised CRP, reduced apolipoprotein A1). This is interesting, as it suggests that in patients with PD subtype 3, who have greater rates of cognitive dysfunction, early immune modulation might improve clinical outcomes, for example by reducing future dementia risk if commenced early enough in the disease process. The lower overall genetic risk of PD and the higher proinflammatory profile in this cluster are consistent with the hypothesis that its aetiology is driven more by environmental than by genetic risk factors.
Although none of our individual variants met the GWAS p value significance threshold the ones that we highlight might be interesting for future follow-up and research. It could be that the variants, or closest genes to these variants, are a reason that a person develops a particular subtype of Parkinson’s.
In previous research, we used multinomial logistic regression to look at how blood biomarkers are associated with an individual belonging to one of the clusters.
The strengths of this study are that we used two large, well-phenotyped cohorts of patients with PD early in the disease course. Our subtypes were created using large amounts of phenotypic data incorporating 21 variables across 12 important domains, and these subtypes were developed and validated in over 2500 subjects. These subtypes were shown to be associated with both motor progression and medication response in a levodopa challenge. The main limitation is that the study is still too small to find individual genetic variants that reach genome-wide significance, assuming that such variants exist. There is also the possibility of selection bias, as rates of those with genetic data varied by cluster within the Tracking cohort. The frequency of PD subtypes in our cohorts may be different to that in the general PD population if belonging to a subtype was related to agreeing to take part in our cohorts or if our cohorts failed to identify specific individuals during recruitment. However, to bias our estimates of genetics versus the clusters, selection into our cohorts would also have to be related to an individual’s genetics. Diagnosis of PD will not be perfect and some patients will turn out to have other parkinsonian disorders, although we have attempted to mitigate this by excluding individuals with a diagnostic probability of PD <90% at the latest visit.
There are other subtypes that have been defined by a data-driven cluster analysis on motor and non-motor symptomatic data. Currently, it is difficult to determine whether the cluster definition we have used is more robust or superior to other definitions. However, in a recent systematic review our paper was rated (among 25 other data-driven studies) along with two others as having the highest methodological quality and clinical applicability.
Future work is now ongoing to understand the underlying disease pathophysiology driving these different clinical clusters in early PD, and their subsequent progression. This will use a mechanistic approach comparing lysosomal, mitochondrial, inflammatory function, α-synuclein (α-syn) seeding amplification
The differences in genetics between these clusters lend biological validity to our data-driven clustering approach while also providing evidence that the different subtypes can inform on underlying disease mechanisms and pathogenesis, as well as informing individual disease trajectories in PD.
The Oxford Discovery study was funded by the Monument Trust Discovery Award from Parkinson’s UK and supported by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre based at Oxford University Hospitals NHS Trust and University of Oxford, and the NIHR Clinical Research Network: Thames Valley and South Midlands
The Tracking Parkinson’s study was funded by Parkinson’s UK and supported by the National Institute for Health Research (NIHR) DeNDRoN network, the NIHR Newcastle Biomedical Research Unit based at Newcastle upon Tyne Hospitals NHS Foundation Trust and Newcastle University, and the NIHR funded Biomedical Research Centre in Cambridge (Grant number:146281).
DGG and MT-MH contributed equally.
ML: analysis and interpretation of the data, writing of the manuscript. ML acts as guarantor and accepts full responsibility for the finished work and/or the conduct of the study, had access to the data, and controlled the decision to publish. MT: analysis and interpretation of the data, revision of the manuscript. YB-S: study concept and design, analysis and interpretation of the data, revision of the manuscript. FB: acquisition of data, revision of the manuscript. TB: acquisition of data, revision of the manuscript. JCK: acquisition of data, revision of the manuscript. SGE: analysis and interpretation of data, revision of the manuscript. SM: analysis and interpretation of data, revision of the manuscript. NM: acquisition of data, revision of the manuscript. KG: study concept and design, acquisition of data, revision of the manuscript. RAB: study concept and design, acquisition of data, revision of the manuscript. NW: study concept and design, revision of the manuscript. DJB: study concept and design, acquisition of data, revision of the manuscript. TF: study concept and design, acquisition of data, revision of the manuscript. HRM: study concept and design, acquisition of data, revision of the manuscript. NW: study concept and design, revision of the manuscript. DGG: study concept and design, acquisition of data, revision of the manuscript. MT-MH: study concept and design, acquisition of data, revision of the manuscript.
Both the Oxford Discovery (grant reference J-1403) and Tracking Parkinson’s cohorts (grant reference J-1101) were funded by Parkinson’s UK
The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.
ML: received fees for advising on a secondary analysis of an RCT sponsored by North Bristol NHS trust. MMXT: reports no disclosures. YB-S: reports no disclosures. FB: reports no disclosures. TB: reports no disclosures. JCK: reports no disclosures. SGE: reports no disclosures. SM: reports no disclosures. NM: reports no disclosures. KG: reports no disclosures. RAB: received grants from Parkinson’s UK, NIHR, Cure Parkinson’s Trust, Evelyn Trust, Rosetrees Trust, MRC and EU along with payment for advisory board attendance from Oxford Biomedica, Aspen Neuroscience, UCB, BlueRock Therapeutics, Novo Nordisk and LCT, and honoraria from Wiley and Springer. NW: reports no disclosures. DJB: received grants from NIHR, Wellcome Trust, GlaxoSmithKline, Parkinson’s UK, and Michael J Fox Foundation. TF: grants from National Institute of Health Research, Michael J Fox Foundation, John Black Charitable Foundation, Cure Parkinson’s Trust, Innovate UK, Janet Owens Research Fellowship, Van Andel Research Institute and Defeat MSA. Advisory Boards for Peptron, Voyager Therapeutics, Handl therapeutics, Living Cell Technologies, Bial grants from Welsh Assembly Government, personal fees from Teva, personal fees from Abbvie, personal fees from Teva, personal fees from UCB, personal fees from Boehringer-Ingelheim, personal fees from GSK, non-financial support from Teva, grants from Ipsen Fund, non-financial support from Medtronic, grants from MNDA, grants from PSP Association, grants from CBD Solutions, grants from Drake Foundation, personal fees from Acorda, outside the submitted work; In addition, HRM has a patent. HRM is a coapplicant on a patent application related to C9ORF72 - Method for diagnosing a neurodegenerative disease (PCT/GB2012/052140) pending. NW: reports Funding from Aligning science against Parkinsons (ASAP). He has also received consultancy fees from GSK. 
DGG: received payment for advisory board attendance from Merz Pharma, Vectura plc, and consultancy fees from the GM clinic. Grant support from Parkinson’s UK, the Neurosciences Foundation, and Michael’s Movers. MT-MH: received payment for Advisory Board attendance/consultancy for Biogen, Roche, CuraSen Therapeutics, Evidera, Manus Neurodynamica. She received funding/grant support from Parkinson’s UK, Oxford NIHR BRC, University of Oxford, CPT, Lab10X, NIHR, Michael J Fox Foundation, H2020 European Union, GE Healthcare and the PSP Association. MT-MH is a coapplicant on a patent application related to smartphone predictions in Parkinson’s (PCT/GB2019/052522) pending.
Not commissioned; externally peer reviewed.
Data are available on reasonable request. Data from the Oxford Discovery cohort is available on request from
Consent obtained directly from patient(s)
This study involves human participants. The Oxford Discovery cohort was approved by the NRES Committee, South Central Oxford A Research Ethics Committee (reference number 16/SC/0108). The Tracking Parkinson’s cohort was approved by the West of Scotland Research Ethics Service (WoSRES; reference 11/AL/0163). Participants gave informed consent to participate in the study before taking part.