BBmix: a Bayesian Beta-Binomial mixture model for accurate genotyping from RNA-sequencing


Type

Preprint

Authors

Pitzalis, Costantino 
Lewis, Myles 
Wallace, Chris 

Abstract

Motivation

Many pipelines have been developed for calling genotypes from RNA-sequencing data, but they all adapt DNA genotype callers that do not model biases specific to RNA-sequencing, such as reference panel bias or allele-specific expression.

Results

Here, we present BBmix, a Bayesian Beta-Binomial mixture model that first learns the expected distribution of read counts for each genotype, and then deploys those learned parameters to call genotypes probabilistically. We benchmarked our model on a wide variety of datasets and showed that our method generally performed better than competitors, mainly due to an increase of up to 1.4% in the accuracy of heterozygous calls. Moreover, BBmix can be easily incorporated into standard pipelines for calling genotypes. We further show that parameters are generally transferable within datasets, such that a single learning run of less than one hour is sufficient to call genotypes in a large number of samples.
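To make the calling step concrete, the following is a minimal sketch of how a three-component Beta-Binomial mixture can turn read counts at a site into genotype probabilities. It is not the BBmix implementation: BBmix learns its parameters by Bayesian inference, whereas here the shape parameters in `COMPONENTS`, the mixture weights in `WEIGHTS`, the 0.99 threshold, and all function names are hypothetical values chosen only for illustration.

```python
import math


def log_beta(x, y):
    """Log of the Beta function, computed via log-gamma for numerical stability."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def betabinom_logpmf(k, n, a, b):
    """Log-probability of k alternative-allele reads out of n under Beta-Binomial(n, a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


# Hypothetical (a, b) shape parameters per genotype. In a real analysis these
# would be learned from data; the values below are assumptions.
COMPONENTS = {
    "hom_ref": (1.0, 50.0),   # alternative-allele fraction concentrated near 0
    "het": (10.0, 10.0),      # alternative-allele fraction centred on 0.5
    "hom_alt": (50.0, 1.0),   # alternative-allele fraction concentrated near 1
}

# Hypothetical mixture weights (prior genotype proportions); also assumptions.
WEIGHTS = {"hom_ref": 0.6, "het": 0.3, "hom_alt": 0.1}


def genotype_posteriors(alt_reads, total_reads, components=COMPONENTS, weights=WEIGHTS):
    """Posterior probability of each genotype given the read counts at one site."""
    log_joint = {
        g: math.log(weights[g]) + betabinom_logpmf(alt_reads, total_reads, *components[g])
        for g in components
    }
    # Normalise with the log-sum-exp trick to avoid underflow.
    peak = max(log_joint.values())
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in log_joint.values()))
    return {g: math.exp(v - log_norm) for g, v in log_joint.items()}


def call_genotype(alt_reads, total_reads, threshold=0.99):
    """Return the most probable genotype, or None if its posterior is below threshold."""
    post = genotype_posteriors(alt_reads, total_reads)
    best = max(post, key=post.get)
    return (best if post[best] >= threshold else None), post


if __name__ == "__main__":
    for alt, total in [(0, 20), (10, 20), (20, 20)]:
        call, post = call_genotype(alt, total)
        print(alt, total, call, {g: round(p, 4) for g, p in post.items()})
```

Because the parameters are passed in rather than fitted inside the calling function, the same learned values could in principle be reused across many samples, which mirrors the transferability described above.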

Availability

We implemented BBmix as an R package, freely available under a GPL-2 licence at https://gitlab.com/evigorito/bbmix. An accompanying pipeline is available at https://gitlab.com/evigorito/bbmix_pipeline.

Keywords

31 Biological Sciences, 3102 Bioinformatics and Computational Biology, 3105 Genetics, Genetics, Human Genome

Sponsorship
Medical Research Council (MC_UU_00002/4)
Wellcome Trust (220788/Z/20/Z)
National Institute for Health and Care Research (IS-BRC-1215-20014)