On the effective depth of viral sequence data.

Published version
Peer-reviewed

Authors

Illingworth, Christopher JR 
Roy, Sunando 
Tutill, Helena 
Williams, Rachel 

Abstract

Genome sequence data are of great value in describing evolutionary processes in viral populations. In such studies, however, the extent to which the data accurately describe the viral population is important. Multiple factors may influence the accuracy of a dataset, including the quantity and nature of the sample collected and the subsequent steps in viral processing. To investigate this, we sequenced replicate datasets spanning a range of viruses, in which the point at which samples were split differed in each case: at one extreme, independent samples were collected from a single patient; at the other, all processing steps up to sequencing were applied to a single sample, which was then split, with each replicate sequenced separately. We conclude that neither a high read depth nor a high template number in a sample guarantees the precision of a dataset. Measures of consistency calculated from within a single biological sample may also be insufficient: distortion of the composition of a population by the experimental procedure, or genuine within-host diversity between samples, may each affect the results. Where possible, data from replicate samples should be collected to validate the consistency of short-read sequence data.

Keywords

evolutionary modelling, population genetics, sequence data

Journal Title

Virus Evol

Journal ISSN

2057-1577

Volume Title

3

Publisher

Oxford University Press (OUP)

Sponsorship

Wellcome Trust (101239/Z/13/Z)