On the effective depth of viral sequence data.

Genome sequence data are of great value in describing evolutionary processes in viral populations. In such studies, however, the extent to which the data accurately describe the viral population is a matter of importance. Multiple factors may influence the accuracy of a dataset, including the quantity and nature of the sample collected and the subsequent steps in viral processing. To investigate this, we sequenced replicate datasets spanning a range of viruses, in which the point at which samples were split differed in each case. These ranged from a dataset in which independent samples were collected from a single patient to one in which all processing steps up to sequencing were applied to a single sample, which was then split and each replicate sequenced. We conclude that neither a high read depth nor a high template number in a sample guarantees the precision of a dataset. Measures of consistency calculated from within a single biological sample may also be insufficient: distortion of the composition of a population by the experimental procedure, or genuine within-host diversity between samples, may each affect the results. Where possible, data from replicate samples should be collected to validate the consistency of short-read sequence data.

Keywords

evolutionary modelling, population genetics, sequence data

Journal Title

Virus Evol

Journal ISSN

2057-1577

Volume Title

3

Publisher

Oxford University Press (OUP)

Publisher DOI

https://doi.org/10.1093/ve/vex030

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution 4.0 International

Sponsorship

Wellcome Trust (101239/Z/13/Z)

Collections

Scholarly Works - Genetics