Adversarial generation of gene expression data.

Accepted version
Peer-reviewed

Type

Article

Authors

Viñas, Ramon 
Andrés-Terré, Helena 
Liò, Pietro 
Bryson, Kevin 

Abstract

High-throughput gene expression can be used to address a wide range of fundamental biological problems, but datasets of an appropriate size are often unavailable. Moreover, existing transcriptomics simulators have been criticised because they fail to emulate key properties of gene expression data. In this paper, we develop a method based on a conditional generative adversarial network to generate realistic transcriptomics data for E. coli and humans. We assess the performance of our approach across several tissues and cancer types. We show that our model preserves several gene expression properties significantly better than widely used simulators such as SynTReN or GeneNetWeaver. The synthetic data preserves tissue- and cancer-specific properties of transcriptomics data. Moreover, it exhibits real gene clusters and ontologies both at local and global scales, suggesting that the model learns to approximate the gene expression manifold in a biologically meaningful way.

Keywords

Humans, Gene Expression Profiling, Escherichia coli, Gene Expression

Journal Title

Bioinformatics

Journal ISSN

1367-4803
1367-4811

Publisher

Oxford University Press (OUP)

Rights

All rights reserved

Sponsorship

The project leading to these results has received funding from “la Caixa” Foundation (ID 100010434), under the agreement LCF/BQ/EU19/11710059.