ProxECAT: Proxy External Controls Association Test. A new case-control gene region association test using allele frequencies from public controls.

Published version
Peer-reviewed

Type

Article

Authors

Hendricks, Audrey E (https://orcid.org/0000-0002-7152-0287)
Pike, Hamish NC
Zeggini, Eleftheria

Abstract

A primary goal of the recent investment in sequencing is to detect novel genetic associations in health and disease, improving the development of treatments and playing a critical role in precision medicine. While this investment has resulted in an enormous total number of sequenced genomes, individual studies of complex traits and diseases are often smaller and underpowered to detect rare variant genetic associations. Existing genetic resources such as the Exome Aggregation Consortium (ExAC; >60,000 exomes) and the Genome Aggregation Database (~140,000 sequenced samples) have the potential to be used as controls in these studies. Fully utilizing these and other existing sequencing resources may increase power and could be especially useful in studies where resources to sequence additional samples are limited. However, to date, these large, publicly available genetic resources remain underutilized, or even misused, in large part due to the lack of statistical methods that can appropriately use these summary-level data. Here, we present a new method to incorporate external controls in case-control analysis called ProxECAT (Proxy External Controls Association Test). ProxECAT estimates enrichment of rare variants within a gene region using internally sequenced cases and external controls. We evaluated ProxECAT in simulations and in empirical analyses of obesity cases, using both low depth-of-coverage (7x) whole-genome sequenced controls and ExAC as controls. We find that ProxECAT maintains the expected type I error rate, with power increasing as the number of external controls increases. With an accompanying R package, ProxECAT enables the use of publicly available allele frequencies as external controls in case-control analysis.
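
The abstract does not give the test statistic, so the following is only a rough sketch of the general idea of testing for rare-variant enrichment in a gene region from summary allele counts. It assumes, purely for illustration, that rare alleles in the region are split into two classes (here called "functional" and "proxy"), that the four counts are independent Poisson variables, and that enrichment is assessed with a generic likelihood-ratio test on the resulting 2x2 table. The function name, the argument names, and the two-class split are hypothetical and are not taken from this record or from the accompanying R package.

```python
import math


def poisson_lrt(case_func, case_proxy, ctrl_func, ctrl_proxy):
    """Generic Poisson likelihood-ratio test on a 2x2 table of allele counts.

    Null hypothesis (an assumption of this sketch): the ratio of
    functional to proxy rare-allele counts is the same in cases and
    in external controls. Returns (statistic, p_value), with the
    statistic referred to a chi-square distribution with 1 df.
    """
    observed = [[case_func, case_proxy], [ctrl_func, ctrl_proxy]]
    total = sum(sum(row) for row in observed)
    if total == 0:
        raise ValueError("at least one allele count must be positive")

    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]

    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Fitted mean under the null: row total * column total / total.
            expected = row_totals[i] * col_totals[j] / total
            if observed[i][j] > 0:  # a zero count contributes nothing
                stat += 2.0 * observed[i][j] * math.log(observed[i][j] / expected)

    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

Because only four summary counts enter the calculation, a test of this shape can in principle be run against published allele frequencies without individual-level control genotypes. For real analyses, the published method and its R package should be used rather than this sketch.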

Keywords

Algorithms; Case-Control Studies; Computer Simulation; Gene Frequency; Genetic Variation; Genome-Wide Association Study; Genotype; High-Throughput Nucleotide Sequencing; Humans; Models, Genetic; Poisson Distribution; Polymorphism, Single Nucleotide

Journal Title

PLoS Genet

Journal ISSN

1553-7390
1553-7404

Volume Title

14

Publisher

Public Library of Science (PLoS)