
Direct inference and control of genetic population structure from RNA sequencing data.

Published version
Peer-reviewed

Repository DOI


Authors

Karkey, Abhilasha 
Shakya, Mila 
Judd, Louise M 

Abstract

RNAseq data can be used to infer genetic variants, yet its use for estimating genetic population structure remains underexplored. Here, we construct a freely available computational tool (RGStraP) to estimate RNAseq-based genetic principal components (RG-PCs) and assess whether RG-PCs can be used to control for population structure in gene expression analyses. Using whole blood samples from understudied Nepalese populations and from the Geuvadis study, we show that RNAseq-based genotypes and RG-PCs give results comparable to those from paired array-based genotypes, with high genotype concordance and high correlations between genetic principal components, and that RG-PCs capture subpopulations within the dataset. In differential gene expression analysis, including RG-PCs as covariates reduced test statistic inflation. Our paper demonstrates that genetic population structure can be directly inferred and controlled for using RNAseq data, thus facilitating improved retrospective and future analyses of transcriptomic data.

Description

Acknowledgements: We acknowledge the contributions of individuals and organizations who have arranged and taken part in the studies as well as the laboratory and field teams at the site, including the STRATAA Study Group and the Nepal Family Development Foundation team. We thank the Sanger sequencing teams. This research was funded in whole, or in part, by the Wellcome Trust [STRATAA, 106158/Z/14/Z and Sanger, 098051]. For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. This research was also funded by NHMRC [project grant APP1101728] and supported by core funding from the British Heart Foundation (RG/18/13/33946) and the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014; NIHR203312)[*]. M.I. is supported by the Munz Chair of Cardiovascular Prediction and Prevention and the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014; NIHR203312)[*]. M.I. was also supported by the UK Economic and Social Research Council (ES/T013192/1). M.F. was supported by a Melbourne Research Scholarship from The University of Melbourne jointly funded by the Baker Heart and Diabetes Institute. This work was supported by Health Data Research UK, which is funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and Wellcome. This study was also supported by the Victorian Government’s Operational Infrastructure Support (OIS) program.
*The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


Funder: Scottish Government Health and Social Care Directorate (SGHSC); doi: https://doi.org/10.13039/100011529


Funder: State Government of Victoria (Victorian Government); doi: https://doi.org/10.13039/501100004752

Keywords

Humans; Retrospective Studies; Genetics, Population; Genotype; Base Sequence; Sequence Analysis, RNA

Journal Title

Communications Biology (Commun Biol)

Journal ISSN

2399-3642

Volume Title

6

Publisher

Springer Science and Business Media LLC

Sponsorship

British Heart Foundation (RG/18/13/33946)
Wellcome Trust (106158/Z/14/Z)
ESRC (ES/T013192/1)
National Institute for Health and Care Research (IS-BRC-1215-20014)