
Long-read-sequenced reference genomes of the seven major lineages of enterotoxigenic Escherichia coli (ETEC) circulating in modern time.

Published version





von Mentzer, Astrid 
Blackwell, Grace A 
Pickard, Derek 
Boinett, Christine J 
Joffré, Enrique 


Enterotoxigenic Escherichia coli (ETEC) is an enteric pathogen responsible for the majority of diarrhoeal cases worldwide. ETEC infections are estimated to cause 80,000 deaths annually, with the highest burden, approximately 75 million cases per year, amongst children under 5 years of age in resource-poor countries. ETEC is also the leading cause of diarrhoea in travellers. Previous large-scale sequencing studies have found seven major ETEC lineages currently in circulation worldwide. We used PacBio long-read sequencing combined with Illumina sequencing to create high-quality complete reference genomes for each of the major lineages, with manually curated chromosomes and plasmids. We confirm that the major ETEC lineages all harbour conserved plasmids that have been associated with their respective background genomes for decades, suggesting that the plasmids and chromosomes of ETEC are both crucial for ETEC virulence and success as pathogens. The in-depth analysis of gene content, synteny and correct annotations of plasmids will elucidate other plasmids with and without virulence factors in related bacterial species. These reference genomes allow for fast and accurate comparison between different ETEC strains, and these data will form the foundation of ETEC genomics research for years to come.



Antineoplastic Agents; Diarrhea; Drug Resistance, Bacterial; Enterotoxigenic Escherichia coli; Escherichia coli Infections; Escherichia coli Proteins; Genome, Bacterial; Genomics; Humans; Phylogeny; Reference Standards; Virulence; Virulence Factors

Journal Title

Sci Rep

Springer Science and Business Media LLC