Identifying and correcting epigenetics measurements for systematic sources of variation.
Authors
Perrier, Flavie
Novoloaca, Alexei
Ambatipudi, Srikant
Baglietto, Laura
Ghantous, Akram
Perduca, Vittorio
Barrdahl, Myrto
Harlid, Sophia
Polidoro, Silvia
Nøst, Therese Haugdahl
Overvad, Kim
Omichessan, Hanane
Dollé, Martijn
Bamia, Christina
Huerta, José Marìa
Vineis, Paolo
Herceg, Zdenko
Romieu, Isabelle
Ferrari, Pietro
Publication Date
2018
Journal Title
Clin Epigenetics
ISSN
1868-7075
Publisher
Springer Science and Business Media LLC
Volume
10
Pages
38
Language
eng
Type
Article
This Version
VoR
Physical Medium
Electronic-eCollection
Citation
Perrier, F., Novoloaca, A., Ambatipudi, S., Baglietto, L., Ghantous, A., Perduca, V., Barrdahl, M., et al. (2018). Identifying and correcting epigenetics measurements for systematic sources of variation. Clin Epigenetics, 10, 38. https://doi.org/10.1186/s13148-018-0471-6
Abstract
Background: Methylation measures quantified by microarray techniques can be affected by systematic variation due to the technical processing of samples, which may compromise the accuracy of the measurement process and bias the estimate of the association under investigation. Quantifying the contribution of systematic sources of variation is challenging in datasets characterized by hundreds of thousands of features. In this study, we introduce a method previously developed for the analysis of metabolomics data to evaluate the performance of existing normalizing techniques to correct for unwanted variation. Illumina Infinium HumanMethylation450K was used to acquire methylation levels at over 421,000 CpG sites for 902 participants in a case-control study on breast cancer nested within the EPIC cohort. The principal component partial R-square (PC-PR2) analysis was used to identify and quantify the variability attributable to potential systematic sources of variation. Three correction techniques were applied: ComBat, surrogate variable analysis (SVA), and a linear regression model to compute residuals. The impact of each correction method on the association between smoking status and DNA methylation levels was evaluated, and results were compared with findings from a large meta-analysis.
Results: A sizeable proportion of systematic variability due to variables expressing 'batch' and 'sample position' within 'chip' was identified, with partial R2 statistics equal to 9.5% and 11.4% of total variation, respectively. After application of ComBat or the residuals method, the contribution was 1.3% and 0.2%, respectively. The SVA technique reduced the variability due to 'batch' (1.3%) and 'sample position' (0.6%), and diminished the variability attributable to 'chip' within a batch (0.9%). After the ComBat or residuals corrections, a larger number of significant sites (k = 600 and k = 427, respectively) were associated with smoking status than after the SVA correction (k = 96).
Conclusions: The three correction methods removed systematic variation in DNA methylation data, as assessed by the PC-PR2, which proved a useful tool to explore variability in high-dimensional data. SVA produced more conservative findings than ComBat in the association between smoking and DNA methylation.
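The residuals correction named in the abstract can be illustrated with a minimal sketch. It assumes that 'batch' is the only technical variable in the regression, which the abstract does not state, so the covariates here are illustrative rather than the authors' model. The function names `residual_correct` and `batch_r2` are hypothetical, and `batch_r2` is a plain per-feature R2, not the PC-PR2 statistic.

```python
import numpy as np


def residual_correct(meth, batch):
    """Remove batch mean shifts from every feature via least-squares residuals.

    meth  : (n_samples, n_features) array of methylation values
    batch : length-n_samples sequence of batch labels (assumed sole covariate)
    """
    meth = np.asarray(meth, dtype=float)
    labels, codes = np.unique(np.asarray(batch).ravel(), return_inverse=True)
    # One indicator column per batch; together they span the intercept.
    design = np.zeros((meth.shape[0], labels.size))
    design[np.arange(meth.shape[0]), codes] = 1.0
    # Fit all features in one call; coef has shape (n_batches, n_features).
    coef, *_ = np.linalg.lstsq(design, meth, rcond=None)
    residuals = meth - design @ coef
    # Add back the overall feature means so values stay on the original scale.
    return residuals + meth.mean(axis=0)


def batch_r2(meth, batch):
    """Fraction of each feature's variance explained by batch (plain R2)."""
    meth = np.asarray(meth, dtype=float)
    feature_means = meth.mean(axis=0)
    ss_total = ((meth - feature_means) ** 2).sum(axis=0)
    within = residual_correct(meth, batch) - feature_means
    ss_within = (within ** 2).sum(axis=0)
    return 1.0 - ss_within / ss_total


if __name__ == "__main__":
    # Simulated data only: three batches with different mean shifts.
    rng = np.random.default_rng(0)
    batch = np.repeat(["b1", "b2", "b3"], 20)
    shift = np.repeat([0.0, 2.0, -1.0], 20)[:, None]
    meth = rng.normal(size=(60, 5)) + shift
    corrected = residual_correct(meth, batch)
    print("R2 due to batch before:", batch_r2(meth, batch).round(3))
    print("R2 due to batch after: ", batch_r2(corrected, batch).round(3))
```

On the simulated input, the share of variance explained by batch drops to numerically zero after correction, which is the kind of before-and-after comparison the abstract reports for the real data using PC-PR2.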
Keywords
Humans; Breast Neoplasms; Oligonucleotide Array Sequence Analysis; Case-Control Studies; Computational Biology; DNA Methylation; Epigenesis, Genetic; CpG Islands; Principal Component Analysis; Female
Sponsorship
Medical Research Council (MC_UU_12015/2)
Medical Research Council (MC_UU_12015/1)
European Commission (260791)
Embargo Lift Date
2100-01-01
Identifiers
External DOI: https://doi.org/10.1186/s13148-018-0471-6
This record's URL: https://www.repository.cam.ac.uk/handle/1810/278406