RC and JC are joint first authors.
JEM and MI are joint senior authors.
Juvenile idiopathic arthritis (JIA) is an autoimmune disease and a common cause of chronic disability in children. Diagnosis of JIA is based purely on clinical symptoms, which can be variable, leading to diagnosis and treatment delays. Despite JIA having substantial heritability, the construction of genomic risk scores (GRSs) to aid or expedite diagnosis has not been assessed. Here, we generate GRSs for JIA and its subtypes and evaluate their performance.
We examined three case/control cohorts (UK, US-based and Australian) with genome-wide single nucleotide polymorphism (SNP) genotypes. We trained GRSs for JIA and its subtypes using lasso-penalised linear models in cross-validation on the UK cohort, and externally tested them in the other cohorts.
The JIA GRS alone achieved a cross-validated area under the receiver operating characteristic curve (AUC) of 0.670 in the UK cohort and externally validated AUCs of 0.657 and 0.671 in the US-based and Australian cohorts, respectively. In logistic regression of case/control status, the corresponding odds ratios (ORs) per standard deviation (SD) of GRS in the US-based and Australian cohorts were 1.831 (1.685 to 1.991) and 2.008 (1.731 to 2.345), respectively, and were unattenuated by adjustment for sex or the top 10 genetic principal components. Extending our analysis to JIA subtypes revealed that enthesitis-related JIA had both the longest time-to-referral and the subtype GRS with the strongest predictive capacity overall across data sets: AUCs of 0.82 in the UK, 0.84 in the Australian and 0.70 in the US-based cohorts. The particularly common oligoarthritis JIA also had a GRS that outperformed those for JIA overall, with AUCs of 0.72, 0.74 and 0.77, respectively.
A GRS for JIA has potential to augment clinical JIA diagnosis protocols, prioritising higher-risk individuals for follow-up and treatment. Consistent with JIA heterogeneity, subtype-specific GRSs showed particularly high performance for enthesitis-related and oligoarthritis JIA.
The diagnosis of juvenile idiopathic arthritis (JIA) is made purely using history and physical examination.
No sensitive or specific tests are available to assist clinicians in making the diagnosis.
JIA has similar genetic architecture to other autoimmune diseases and a strong association with the human leukocyte antigen (HLA) locus.
This study demonstrates that genomic machine learning can yield predictive genomic risk scores (GRSs) for JIA.
Subtype-specific GRSs better capture risk of each subtype separately.
Subtypes that take the longest to identify or are most common may benefit most from GRSs.
These GRSs have the potential to augment current JIA diagnosis protocols, prioritising higher-risk individuals for follow-up and treatment and reducing delays.
Subtype-specific analyses highlight the potential for genetic studies to better understand heterogeneous diseases such as JIA, potentially paving the way for better disease subtype prediction in general.
Juvenile idiopathic arthritis (JIA) is an autoimmune disease that comprises all forms of arthritis arising before the age of 16 years and persisting for more than 6 weeks.
Early diagnosis and treatment of JIA is critical as delays increase the risk of prolonged and uncontrolled disease, with consequent poorer long-term outcomes.
Schematic of a typical clinical path from first symptoms to JIA diagnosis and treatment. Potential informative points are included at which JIA genomic risk scores could prioritise higher-risk individuals for referral, follow-up and treatment. GRS, genomic risk score; JIA, juvenile idiopathic arthritis.
JIA is a complex disease
Genetics is increasingly used to aid risk prediction, diagnosis and earlier treatment of human diseases, with HLA testing for various immune disorders being an example. More recently, the clinical utility of genetic and polygenic risk scores for diverse aetiologies, from coeliac disease to cardiovascular diseases, has come under intense investigation.
This study aims to create a GRS that could, in principle, be used to support current clinical JIA diagnosis practice. We used three large-scale independent cohorts of European ancestry to develop and test a GRS for JIA. Furthermore, we extended the GRS approach to design JIA subtype-specific GRSs, which we externally tested to quantify their potential relative clinical value in supporting each JIA subtype's time to diagnosis.
The ILAR classification system
In the US-based cohort, from the Children’s Hospital of Philadelphia (CHOP),
Finally, in the Australian cohort, from the ChiLdhood Arthritis Risk factor Identification sTudY (CLARITY),
All genotypes included in each cohort were aligned to the GRCh37/hg19 assembly build and passed stringent quality control (QC) measures. Additionally, the QC cohorts were imputed to harmonise and maximise the genetic information across them. All the individuals considered were of European descent and outliers from each cohort were removed to achieve more homogeneous samples.
The initial UK cohort consisted of 2758 cases and 5187 controls. Controls were obtained from the Wellcome Trust Case Control Consortium (WTCCC) and have been demonstrated to be well matched to the UK JIA cases,
We applied consistent QC procedures across all the genotyped cohorts. The CLARITY cohort was genotyped in three batches and we performed QC separately in each. The QC was performed using plink1.9
For genotype imputation of our QC cohorts, we used the Michigan Imputation Server
Next, we performed principal component analysis (PCA) using FlashPCA2
Cohort characteristics after imputation and quality control
Cohort | Total individuals | SNPs | Number of cases | Number of controls | Number of males | Number of females |
UK | 7505 | 6 029 891 | 2324 | 5181 | 3433 | 4072 |
CHOP | 3513 | 6 338 131 | 559 | 2954 | 1671 | 1842 |
CLARITY | 940 | 5 743 016 | 362 | 578 | 460 | 480 |
CHOP, Children’s Hospital of Philadelphia; CLARITY, ChiLdhood Arthritis Risk factor Identification sTudY; SNP, single nucleotide polymorphism.
The UK cohort was used to train our models as it was the most homogeneous cohort with the largest case sample size (2324 cases, 5181 controls). To account for potential confounding by the case/control genotyping batch in the UK cohort, we used logistic regression of case/control status on sex and the first 10 genetic PCs. The PCs were computed over a subset of the UK cohort's SNPs, excluding the HLA region as well as known or putative JIA risk loci
To create the GRS we used SparSNP,
SparSNP considers all post-QC SNPs in the training cohort for the construction of the model, but the final number of SNPs receiving a non-zero weight varies depending on the value of the penalties used, which were tuned via 10 repeats of 10-fold cross-validation. The optimal number of SNPs selected in the chosen model was decided based on the model with the highest average area under the receiver operating characteristic curve (AUC) across the replications (
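The model-selection step described above can be sketched as follows. This is a minimal illustration, not SparSNP itself: it assumes the held-out AUC of each candidate penalty has already been recorded for every repeat and fold, and it picks the penalty with the highest mean AUC. The function names, the fold-assignment scheme and the toy AUC values are all hypothetical.

```python
import random
from statistics import mean


def repeated_kfold_indices(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Each repeat reshuffles the samples, then deals them into n_folds folds.
    """
    rng = random.Random(seed)
    for rep in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(n_folds):
            test_idx = order[fold::n_folds]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield rep, fold, train_idx, test_idx


def select_penalty(auc_by_penalty):
    """Return (best_penalty, mean_auc), averaging AUC over all repeats and folds."""
    mean_auc = {pen: mean(aucs) for pen, aucs in auc_by_penalty.items()}
    best = max(mean_auc, key=mean_auc.get)
    return best, mean_auc


# Hypothetical held-out AUCs per candidate penalty (illustrative values only,
# not results from the study). A stronger penalty keeps fewer SNPs.
toy_aucs = {
    0.10: [0.61, 0.63, 0.62],
    0.05: [0.66, 0.68, 0.67],
    0.01: [0.64, 0.65, 0.63],
}
best_penalty, mean_auc = select_penalty(toy_aucs)
```

In a full pipeline, each `(train_idx, test_idx)` pair would be used to fit the penalised model on the training folds and score the held-out fold; that fitting step is deliberately omitted here.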
Once a model was chosen, we computed the GRSs for each of the test cohorts (CHOP and CLARITY). Assuming that the number of SNPs is
where
An overview of our study design is given in
Outline of the study design followed in this work. AUC, area under the receiver operating characteristic curve; CHOP, Children’s Hospital of Philadelphia; CLARITY, ChiLdhood Arthritis Risk factor Identification sTudY; GRS, genomic risk score; JIA, juvenile idiopathic arthritis; PCs, principal components; SNP, single nucleotide polymorphism.
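As a rough illustration of how a GRS is applied to a test cohort once the SNP weights are fixed, the sketch below computes a score as the weighted sum of effect-allele dosages over the SNPs with non-zero weight, then standardises the scores so that downstream effect sizes can be read per SD. It assumes dosages coded 0 to 2 and standardisation within the cohort being scored; the SNP identifiers, weights and genotypes are hypothetical, and the exact scaling used in the study is not shown in the text above.

```python
from statistics import mean, pstdev


def genomic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages over the SNPs carrying a weight."""
    return sum(weight * dosages[snp] for snp, weight in weights.items())


def standardise(scores):
    """Centre and scale scores to zero mean and unit (population) SD."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]


# Hypothetical SNP weights from a trained sparse model (illustrative only).
weights = {"rsA": 0.30, "rsB": -0.10, "rsC": 0.05}

# Hypothetical effect-allele dosages (0, 1 or 2) for three individuals.
cohort = [
    {"rsA": 2, "rsB": 0, "rsC": 1},
    {"rsA": 0, "rsB": 2, "rsC": 0},
    {"rsA": 1, "rsB": 1, "rsC": 2},
]

raw_scores = [genomic_risk_score(person, weights) for person in cohort]
z_scores = standardise(raw_scores)
```

Because only SNPs with a non-zero weight enter the sum, a sparse model keeps the scoring step cheap even when millions of SNPs were considered during training.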
Before computing the GRSs, we estimated the SNP heritability of JIA in our cohorts using GCTA V.1.91.7
We computed the GRS for CLARITY and CHOP, and evaluated the model in terms of AUC and ORs (
The predictive power of the GRS in the validation sets. Based on logistic regression on the test sets, optionally adjusted for sex and top 10 genetic PCs. Effect sizes are per SD of the GRS
Cohort and model | AUC (95% CI) | OR (95% CI) |
CHOP | ||
Sex+PCs | 0.677 (0.654 to 0.701) | |
GRS | 0.657 (0.631 to 0.683) | 1.831 (1.685 to 1.991) |
GRS+sex+PCs | 0.735 (0.712 to 0.758) | 1.838 (1.686 to 2.007) |
CLARITY | ||
Sex+PCs | 0.671 (0.636 to 0.706) | |
GRS | 0.671 (0.635 to 0.706) | 2.008 (1.731 to 2.345) |
GRS+sex+PCs | 0.738 (0.705 to 0.770) | 2.085 (1.773 to 2.471) |
AUC, area under the receiver operating characteristic curve; CHOP, Children’s Hospital of Philadelphia; CLARITY, ChiLdhood Arthritis Risk factor Identification sTudY; GRS, genomic risk score; PCs, principal components.
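The two metrics reported in the table can be illustrated with a short sketch. It computes the AUC as the probability that a randomly chosen case has a higher score than a randomly chosen control, and the OR per SD as the exponentiated slope of an unadjusted logistic regression of case/control status on a standardised score. This is a plain Newton-Raphson fit written for illustration under those assumptions; it omits the sex and PC covariates and the confidence intervals, and the toy scores and labels are hypothetical.

```python
import math


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def odds_ratio_per_sd(z_scores, labels, n_iter=25):
    """Fit logit P(case) = b0 + b1 * z by Newton-Raphson and return exp(b1)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for z, y in zip(z_scores, labels):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * z)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient with respect to the intercept
            g1 += (y - p) * z      # gradient with respect to the slope
            h00 += w               # entries of the 2x2 information matrix
            h01 += w * z
            h11 += w * z * z
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return math.exp(b1)


# Hypothetical scores (stand-ins for a standardised GRS) and case (1) / control (0) labels.
toy_z = [-2, -1, -1, 0, 0, 1, 1, 2]
toy_y = [0, 0, 1, 0, 1, 0, 1, 1]

toy_auc = auc(toy_z, toy_y)
toy_or = odds_ratio_per_sd(toy_z, toy_y)
```

The pairwise AUC is quadratic in cohort size, which is acceptable for a sketch; a rank-based formulation would be preferable for cohorts of several thousand individuals.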
Recent works have shown that a metaGRS approach can substantially improve genomic prediction of common diseases.
We extended our analysis to consider subtypes of JIA and construct subtype-specific GRSs thereof. The ILAR recognises seven subtypes of JIA: systemic arthritis, oligoarthritis, rheumatoid-factor-positive polyarthritis (RF-positive), rheumatoid-factor-negative polyarthritis (RF-negative), enthesitis-related arthritis (ERA), psoriatic arthritis and undifferentiated arthritis.
Cross-validated AUC achieved by training the seven JIA subtype-specific models (top), and median time taken for an individual with JIA to be first referred to a paediatric rheumatologist (in months; bottom).
Characteristics of JIA subtypes across cohorts, including rate (%) of each subtype among cases of each cohort. Cases with no known subtype classification were excluded (n=29 from CLARITY and n=25 from the UK)
Subtype | UK (2324 cases): rate (%) | UK: males | UK: females | CHOP (559 cases): rate (%) | CHOP: males | CHOP: females | CLARITY (333 cases): rate (%) | CLARITY: males | CLARITY: females |
Enthesitis-related | 7.4 | 136 | 37 | 11.8 | 40 | 26 | 4.4 | 13 | 3 |
Oligoarthritis | 41.1 | 299 | 657 | 36.3 | 39 | 164 | 43.9 | 42 | 117 |
RF-negative | 23.8 | 144 | 408 | 24.2 | 34 | 101 | 20.7 | 23 | 52 |
RF-positive | 5.5 | 13 | 115 | 5.2 | 1 | 28 | 3.0 | 1 | 10 |
Psoriatic | 5.9 | 50 | 86 | 7.2 | 11 | 29 | 5.0 | 12 | 6 |
Undifferentiated | 2.1 | 21 | 28 | 4.7 | 8 | 18 | 7.5 | 13 | 14 |
Systemic | 13.2 | 142 | 164 | 10.6 | 24 | 36 | 7.5 | 12 | 15 |
CHOP, Children’s Hospital of Philadelphia; CLARITY, ChiLdhood Arthritis Risk factor Identification sTudY; JIA, juvenile idiopathic arthritis; RF-negative, rheumatoid-factor-negative polyarthritis; RF-positive, rheumatoid-factor-positive polyarthritis.
For each JIA subtype, we used the UK cohort to train subtype-specific GRSs, employing an approach similar to that used for the JIA GRS above (
There was a high degree of variability in discrimination between subtype GRSs, with some subtypes displaying cross-validated AUCs greater than that of the JIA GRS and others not discriminating significantly better than random chance (AUC=0.5;
The weakest subtype GRSs were for the undifferentiated (AUC=0.542 with 1487 SNPs) and systemic (AUC=0.528 with 826 SNPs) subtypes. This was not unexpected as these subtypes are somewhat different to the other five subtypes. Children are diagnosed with the undifferentiated subtype when their symptoms do not fit within other subtypes, or meet the criteria for multiple subtypes. Systemic JIA is considered an autoinflammatory disease with little genetic overlap with the other JIA subtypes.
In general, external validation of the subtype-specific GRSs in CLARITY yielded AUC estimates highly consistent with cross-validation performance in the UK, whereas external validation in the CHOP cohort was somewhat less consistent (
External validation of the subtype-specific GRSs in CLARITY and CHOP. Based on logistic regression on the test sets, optionally adjusted for sex and top 10 genetic principal components. Effect sizes are per SD of the GRS
Subtype and model | AUC (95% CI), CHOP | AUC (95% CI), CLARITY | OR (95% CI), CHOP | OR (95% CI), CLARITY |
Enthesitis-related | ||||
GRS | 0.70 (0.63 to 0.77) | 0.84 (0.71 to 0.97) | 1.84 (1.60 to 2.17) | 2.99 (2.11 to 4.54) |
GRS+sex+PCs | 0.75 (0.68 to 0.82) | 0.93 (0.86 to 0.99) | 1.86 (1.61 to 2.14) | 3.09 (2.07 to 5.04) |
Oligoarthritis | ||||
GRS | 0.77 (0.73 to 0.80) | 0.74 (0.70 to 0.79) | 1.93 (1.76 to 2.11) | 2.24 (1.88 to 2.71) |
GRS+sex+PCs | 0.80 (0.77 to 0.84) | 0.79 (0.76 to 0.83) | 1.93 (1.75 to 2.13) | 2.19 (1.81 to 2.71) |
RF-negative | ||||
GRS | 0.64 (0.59 to 0.69) | 0.66 (0.59 to 0.73) | 1.48 (1.33 to 1.64) | 1.69 (1.42 to 2.02) |
GRS+sex+PCs | 0.76 (0.72 to 0.80) | 0.74 (0.68 to 0.80) | 1.51 (1.35 to 1.68) | 1.71 (1.42 to 2.07) |
RF-positive | ||||
GRS | 0.57 (0.47 to 0.67) | 0.59 (0.40 to 0.78) | 0.73 (0.44 to 1.11) | 1.42 (0.85 to 2.17) |
GRS+sex+PCs | 0.79 (0.73 to 0.86) | 0.97 (0.94 to 0.99) | 0.74 (0.44 to 1.13) | 1.27 (0.60 to 2.52) |
Psoriatic | ||||
GRS | 0.56 (0.47 to 0.65) | 0.58 (0.44 to 0.73) | 0.77 (0.52 to 1.08) | 1.33 (0.87 to 1.91) |
GRS+sex+PCs | 0.70 (0.62 to 0.78) | 0.76 (0.66 to 0.85) | 0.77 (0.52 to 1.08) | 1.32 (0.85 to 1.96) |
Undifferentiated | ||||
GRS | 0.48 (0.35 to 0.61) | 0.52 (0.42 to 0.62) | 0.89 (0.60 to 1.31) | 0.89 (0.59 to 1.31) |
GRS+sex+PCs | 0.69 (0.58 to 0.80) | 0.75 (0.66 to 0.83) | 0.90 (0.60 to 1.33) | 0.82 (0.51 to 1.28) |
Systemic | ||||
GRS | 0.50 (0.43 to 0.58) | 0.52 (0.41 to 0.62) | 1.01 (0.78 to 1.30) | 1.07 (0.73 to 1.56) |
GRS+sex+PCs | 0.69 (0.62 to 0.76) | 0.75 (0.66 to 0.84) | 1.01 (0.78 to 1.30) | 1.13 (0.73 to 1.72) |
AUC, area under the receiver operating characteristic curve; CHOP, Children’s Hospital of Philadelphia; CLARITY, ChiLdhood Arthritis Risk factor Identification sTudY; GRS, genomic risk score; PCs, principal components.
The accurate and timely diagnosis of JIA is a currently unmet clinical need. In this study, we aimed to address the paucity of molecular tools to aid the entirely clinical diagnosis of JIA, by leveraging the wealth of human genomic data gathered over the last decade, and developing a series of GRSs for JIA. We have shown that genomic machine learning can yield predictive GRSs for JIA as a composite diagnosis as well as subtype-specific GRSs, including the most common clinically reported subtype (oligoarthritic JIA),
A strength of this study is that the JIA GRS was developed on a UK data set and externally validated in two independent studies in Australia and the USA, indicating the robustness of the score. Although we used the largest JIA cohorts currently available, the scores developed here only partially explained the genetic variability in JIA. Future improvements in predictive power will likely come with larger cohorts, particularly for less-common subtypes. In the case of the ERA subtype, we found that the GRS AUC was greater than that of the HLA haplotype in the UK, Australian and US-based cohorts. However, we caution that larger cohorts will be necessary for powerful statistical testing and assessment of clinical utility of GRS as compared with HLA typing for both ERA and systemic JIA. Furthermore, given the genetic heterogeneity of JIA subtypes, our study demonstrates that adding genomics to the ILAR classification has potential to increase the efficiency of classification, and may in turn inform the refinement or even redefinition of JIA subtype classification. However, we also caution that a limitation of the current study is that the participants in our cohorts were of European descent and we were unable to assess the performance of the JIA GRS in individuals of non-European ancestries,
In both primary and tertiary healthcare settings, it is often challenging to recognise and diagnose JIA in children, as many non-inflammatory conditions that are common in children present with musculoskeletal pain that mimics JIA. Difficulty in discriminating between these cases delays access to vital care, owing to the multitude of investigations and assessments that need to be done first. Moreover, accessing paediatric rheumatology specialist services is difficult, as waiting lists are usually lengthy and access to care is problematic due to workforce shortages worldwide.
The authors would like to thank Howard Tang and Elizabeth Hateley for their diligent proofreading and comments on this paper.
Josef S Smolen
MI, JM, JE, GA and RC conceived and designed the study. RC, JC, JM, MB, JB, YRL, SLS, HH, WT and JE contributed data. RC and GA performed the statistical analysis. RC, MI, GA, JC and JM wrote the manuscript with contributions by all co-authors. All authors approved of the final version.
This study was supported in part by the Victorian Government’s OIS Program, the Australian National Health and Medical Research Council (NHMRC Project no. 1122744), the Murdoch Children’s Research Institute and the Royal Children’s Hospital Foundation (grant no. 2017–896). This work was supported by core funding from the UK Medical Research Council (MR/L003120/1), the British Heart Foundation (RG/13/13/30194; RG/18/13/33946) and the National Institute for Health Research (Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust)*. It was also supported by Health Data Research UK, which is funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and Wellcome. GA was supported by an NHMRC Early Career Fellowship (no. 1090462). MI was supported by the Munz Chair of Cardiovascular Prediction and Prevention. This study acknowledges the use of the following UK JIA cohort collections: The Biologics for Children with Rheumatic Diseases (BCRD) study (funded by Arthritis Research UK Grant 20747). The British Society for Paediatric and Adolescent Rheumatology Etanercept Cohort Study (BSPAR-ETN) (funded by a research grant from the British Society for Rheumatology (BSR). BSR has previously also received restricted income from Pfizer to fund this project). Childhood Arthritis Prospective Study (CAPS) (funded by Versus Arthritis, grant reference number 20542), Childhood Arthritis Response to Medication Study (CHARMS) (funded by Sparks UK, reference 08ICH09, and the Medical Research Council, reference MR/M004600/1), United Kingdom Juvenile Idiopathic Arthritis Genetics Consortium (UKJIAGC). 
Genotyping of the UK JIA case samples was supported by the Versus Arthritis grants reference numbers 20385 and 21754. This research was funded by the NIHR Manchester Biomedical Research Centre and supported by the Manchester Academic Health Sciences Centre (MAHSC). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. We would like to acknowledge the assistance given by IT Services and the use of the Computational Shared Facility at The University of Manchester. Finally, the CHOP data used were funded by an Institute Development Fund to the CAG centre from The Children’s Hospital of Philadelphia and by NIH grant, U01-HG006830, from the NHGRI-sponsored eMERGE Network.
None declared.
Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Not required.
All participants gave informed consent and the study protocols were approved by the relevant institutional or national ethics committees. The Australian CLARITY cohort collection was approved by the Royal Children’s Hospital Human Research Ethics Committee; UK ethical approval was obtained from the North West Multicentre for Research Ethics Committee (MREC:02/8/104 and MREC:99/8/84), West Midlands Multicentre Research Ethics Committee (MREC:02/7/106), North West Research Ethics Committee (REC:09/H1008/137) and the NHS Research Ethics Committee (REC:05/Q0508/95); and the US CHOP cohort collection was approved by the institutional review boards of the Texas Scottish Rite Hospital for Children, the Children’s Mercy Hospitals and Clinics and the Children's Hospital of Philadelphia.
Not commissioned; externally peer reviewed.
Data are available in a public, open access repository. Data are available upon reasonable request. For individual-level data, CHOP is available through the eMERGE Network dbGaP and the WTCCC controls are available through the Wellcome Trust Case Control Consortium webpage (