Detection and removal of barcode swapping in single-cell RNA-seq data

Published version
Peer-reviewed

Type

Article

Authors

Griffiths, Jonathan 
Richard, Arianne 
Bach, Karsten 

Abstract

Barcode swapping results in the mislabeling of sequencing reads between multiplexed samples on the new Illumina sequencing machines that use patterned flow cells. This may compromise the validity of numerous genomic assays, especially single-cell studies, in which many samples are routinely multiplexed together. The severity and consequences of barcode swapping for single-cell transcriptomic studies remain poorly understood. We have used two statistical approaches to robustly quantify the fraction of swapped reads in each of two plate-based single-cell RNA sequencing datasets. We found that approximately 2.5% of reads were mislabeled between samples on the HiSeq 4000 machine, which is lower than previously reported. We observed no correlation between the swapped fraction of reads and the concentration of free barcode across plates. Furthermore, we have demonstrated that barcode swapping may generate complex but artefactual cell libraries in droplet-based single-cell RNA sequencing studies. To eliminate these artefacts, we have developed an algorithm that excludes individual molecules that have swapped between samples in 10X Genomics experiments, exploiting the combinatorial complexity present in the data. This permits the continued use of cutting-edge sequencing machines for droplet-based experiments while avoiding the confounding effects of barcode swapping.
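The abstract only outlines the molecule-removal algorithm. Below is a minimal Python sketch of one way such a filter could work. It assumes that each molecule is identified by its combination of cell barcode, UMI and gene, which is rarely shared by chance between independent samples. Under that assumption, a molecule found in several samples most likely arose from swapping. The sketch keeps such a molecule only in the sample that contributes a dominant share of its reads and discards it everywhere else. The function name `remove_swapped_molecules`, the input layout and the `min_frac` threshold of 0.8 are hypothetical choices for illustration, not the published implementation.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical key: a molecule is identified by (cell barcode, UMI, gene).
MoleculeKey = Tuple[str, str, str]


def remove_swapped_molecules(
    samples: Dict[str, Dict[MoleculeKey, int]],
    min_frac: float = 0.8,  # assumed threshold, not taken from the paper
) -> Dict[str, Dict[MoleculeKey, int]]:
    """Assign each molecule to at most one sample.

    `samples` maps sample name -> {molecule key -> read count}.
    A molecule is kept only in the sample holding at least `min_frac`
    of its total reads; otherwise it is dropped from every sample.
    """
    # Collect, for every molecule, its read count in each sample.
    per_molecule: Dict[MoleculeKey, Dict[str, int]] = defaultdict(dict)
    for sample, molecules in samples.items():
        for key, reads in molecules.items():
            per_molecule[key][sample] = reads

    cleaned: Dict[str, Dict[MoleculeKey, int]] = {s: {} for s in samples}
    for key, counts in per_molecule.items():
        total = sum(counts.values())
        if total == 0:
            continue  # no reads support this molecule anywhere
        # The sample contributing the most reads is the likely true origin.
        best_sample = max(counts, key=counts.get)
        if counts[best_sample] / total >= min_frac:
            cleaned[best_sample][key] = counts[best_sample]
    return cleaned
```

In this sketch, a molecule whose reads are split evenly between two samples is removed from both, because neither sample can be identified as its true origin.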

Keywords

31 Biological Sciences, 3102 Bioinformatics and Computational Biology, 3105 Genetics, Genetics, Human Genome, Biotechnology

Journal Title

Nature Communications

Journal ISSN

2041-1723

Publisher

Nature Research

Sponsorship
Wellcome Trust (100140/Z/12/Z)
Wellcome Trust (103930/Z/14/Z)
Wellcome Trust (109081/Z/15/Z)
Medical Research Council (MR/P014178/1)