
simGWAS: a fast method for simulation of large scale case-control GWAS summary statistics

cam.issuedOnline: 2018-10-29
dc.contributor.author: Fortune, Mary
dc.contributor.author: Wallace, Chris
dc.contributor.orcid: Fortune, Mary [0000-0002-6006-4343]
dc.contributor.orcid: Wallace, Chris [0000-0001-9755-1703]
dc.date.accessioned: 2018-12-11T00:30:27Z
dc.date.available: 2018-12-11T00:30:27Z
dc.date.issued: 2019
dc.description.abstract: Methods for analysis of GWAS summary statistics have encouraged data sharing and democratised the analysis of different diseases. Ideal validation for such methods is application to simulated data, where some "truth" is known. As GWAS increase in size, so does the computational complexity of such evaluations; standard practice repeatedly simulates and analyses genotype data for all individuals in an example study. We have developed a novel method based on an alternative approach, directly simulating GWAS summary data, without individual data as an intermediate step. We mathematically derive the expected statistics for any set of causal variants and their effect sizes, conditional upon control haplotype frequencies (available from public reference datasets). Simulation of GWAS summary output can be conducted independently of sample size by simulating random variates about these expected values. Across a range of scenarios, our method, available as an open source R package, produces very similar output to that from simulating individual genotypes with a substantial gain in speed even for modest sample sizes. Fast simulation of GWAS summary statistics will enable more complete and rapid evaluation of summary statistic methods as well as opening new potential avenues of research in fine mapping and gene set enrichment analysis.
dc.description.sponsorship: MF and CW are funded by the Wellcome Trust (WT099772, WT107881) and CW by the MRC (MC_UU_00002/4). MF is currently funded by Dementia Platforms UK.
dc.identifier.doi: 10.17863/CAM.33915
dc.identifier.eissn: 1460-2059
dc.identifier.issn: 1367-4811
dc.identifier.uri: https://www.repository.cam.ac.uk/handle/1810/286603
dc.language.iso: eng
dc.publisher: Oxford University Press
dc.publisher.url: http://dx.doi.org/10.1101/313023
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: 31 Biological Sciences
dc.subject: 3105 Genetics
dc.subject: 4202 Epidemiology
dc.subject: 42 Health Sciences
dc.subject: 4905 Statistics
dc.subject: 49 Mathematical Sciences
dc.subject: Bioengineering
dc.subject: Genetics
dc.subject: Human Genome
dc.title: simGWAS: a fast method for simulation of large scale case-control GWAS summary statistics
dc.type: Article
dcterms.dateAccepted: 2018-10-21
prism.publicationName: Bioinformatics
pubs.funder-project-id: Wellcome Trust (107881/Z/15/Z)
pubs.funder-project-id: Medical Research Council (MC_UU_00002/4)
pubs.funder-project-id: Wellcome Trust (099772/Z/12/Z)
rioxxterms.licenseref.startdate: 2018-10-21
rioxxterms.licenseref.uri: http://creativecommons.org/licenses/by/4.0/
rioxxterms.type: Journal Article/Review
rioxxterms.version: VoR
rioxxterms.versionofrecord: 10.1093/bioinformatics/bty898

Files

Original bundle
Name: bty898.pdf
Size: 618.23 KB
Format: Adobe Portable Document Format
Description: Published version
Licence: https://creativecommons.org/licenses/by/4.0/
License bundle
Name: DepositLicenceAgreementv2.1.pdf
Size: 150.9 KB
Format: Adobe Portable Document Format