
Sampling bias and incorrect rooting make phylogenetic network tracing of SARS-CoV-2 infections unreliable.

Accepted version
Peer-reviewed

Type

Article

Authors

Pond, Sergei Kosakovsky
Marini, Simone
Magalis, Brittany Rife (ORCID: https://orcid.org/0000-0001-6088-4651)
Vandamme, Anne-Mieke

Abstract

There is obvious interest in gaining insights into the epidemiology and evolution of the virus that has recently emerged in humans as the cause of the coronavirus disease 2019 (COVID-19) pandemic. The recent paper by Forster et al. (1) analyzed 160 full SARS-CoV-2 genomes available from GISAID (https://www.gisaid.org/) in early March 2020. Its central claim is the identification of three main SARS-CoV-2 types, named A, B, and C, circulating in different proportions among Europeans and Americans (types A and C) and East Asians (type B). According to a median-joining network analysis, variant A is proposed to be the ancestral type because it links to the sequence of a coronavirus from bats, used as an outgroup to trace the ancestral origin of the human strains. The authors further suggest that the “ancestral Wuhan B-type virus is immunologically or environmentally adapted to a large section of the East Asian population, and may need to mutate to overcome resistance outside East Asia”. There are several serious flaws in their findings and interpretation. First, and most obviously, the sequence identity between SARS-CoV-2 and the bat virus is only 96.2%, implying that these viral genomes (which are nearly 30,000 nucleotides long) differ by more than 1,000 mutations. Such a distant outgroup is unlikely to provide a reliable root for the network. Yet, strangely, the branch to the bat virus in Figure 1 of the paper is only 16 or 17 mutations long. Indeed, the network appears to be misrooted: in Supplementary Figure 4, a virus from Wuhan from week 0 (24 December 2019) is portrayed as a descendant of a clade of viruses collected in weeks 1-9 (presumably from many places outside China), which makes neither evolutionary (2) nor epidemiological (3) sense.
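
The arithmetic behind the first objection can be checked with a short sketch. It assumes a round genome length of 30,000 nucleotides (the abstract says "nearly 30,000") and treats every non-identical site as one difference, ignoring indels and alignment gaps. The function name and constants are illustrative and not taken from the paper.

```python
def expected_differences(genome_length: int, identity: float) -> float:
    """Approximate number of differing sites between two aligned genomes.

    Assumption: identity is the fraction of identical sites over the full
    genome length; indels and alignment gaps are ignored.
    """
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be a fraction between 0 and 1")
    return genome_length * (1.0 - identity)


GENOME_LENGTH = 30_000   # assumed round figure; the abstract says "nearly 30,000"
IDENTITY = 0.962         # 96.2% identity between SARS-CoV-2 and the bat virus
DRAWN_BRANCH = 17        # upper value of the 16-17 mutation branch in Figure 1

diffs = expected_differences(GENOME_LENGTH, IDENTITY)
print(f"Expected differences: about {round(diffs)}")
print(f"Ratio to the drawn branch: about {diffs / DRAWN_BRANCH:.0f}x")
```

Under these assumptions the two genomes differ at roughly 1,140 sites, well over the 16 or 17 mutations shown on the outgroup branch, which is the discrepancy the abstract points to.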

Keywords

Betacoronavirus, COVID-19, Coronavirus Infections, Humans, Pandemics, Phylogeny, Pneumonia, Viral, Severe acute respiratory syndrome-related coronavirus, SARS-CoV-2, Selection Bias

Journal Title

Proc Natl Acad Sci U S A

Journal ISSN

0027-8424
1091-6490

Volume

117

Publisher

Proceedings of the National Academy of Sciences

Rights

All rights reserved

Sponsorship

Wellcome Trust (via University College London (UCL)) (Ref 17/0008 539724)