University of Cambridge, Board of Graduate Studies, 4 Mill Lane, Cambridge CB2 1RZ · Tel: 01223 338389 · email: student.registry@admin.cam.ac.uk · http://www.admin.cam.ac.uk/students/studentregistry/exams/submission/phd/submitting.html

Deposit & Copying of Dissertation Declaration

Board of Graduate Studies

Please note that you will also need to bind a copy of this Declaration into your final, hardbound copy of thesis - this has to be the very first page of the hardbound thesis.

1 Surname (Family Name) Forenames(s) Title

2 Title of Dissertation as approved by the Degree Committee

In accordance with the University Regulations in Statutes and Ordinances for the PhD, MSc and MLitt Degrees, I agree to deposit one print copy of my dissertation entitled above and one print copy of the summary with the Secretary of the Board of Graduate Studies who shall deposit the dissertation and summary in the University Library under the following terms and conditions:

1. Dissertation Author Declaration

I am the author of this dissertation and hereby give the University the right to make my dissertation available in print form as described in 2. below.

My dissertation is my original work and a product of my own research endeavours and includes nothing which is the outcome of work done in collaboration with others except as declared in the Preface and specified in the text. I hereby assert my moral right to be identified as the author of the dissertation.

The deposit and dissemination of my dissertation by the University does not constitute a breach of any other agreement, publishing or otherwise, including any confidentiality or publication restriction provisions in sponsorship or collaboration agreements governing my research or work at the University or elsewhere.

2. Access to Dissertation

I understand that one print copy of my dissertation will be deposited in the University Library for archival and preservation purposes, and that, unless upon my application restricted access to my dissertation for a specified period of time has been granted by the Board of Graduate Studies prior to this deposit, the dissertation will be made available by the University Library for consultation by readers in accordance with University Library Regulations and copies of my dissertation may be provided to readers in accordance with applicable legislation.

3 Signature Date

Corresponding Regulation

Before being admitted to a degree, a student shall deposit with the Secretary of the Board one copy of his or her hardbound dissertation and one copy of the summary (bearing student’s name and thesis title), both the dissertation and the summary in a form approved by the Board. The Secretary shall deposit the copy of the dissertation together with the copy of the summary in the University Library where, subject to restricted access to the dissertation for a specified period of time having been granted by the Board of Graduate Studies, they shall be made available for consultation by readers in accordance with University Library Regulations and copies of the dissertation provided to readers in accordance with applicable legislation.

Antimicrobial resistance gene monitoring in aquatic environments

By Will Rowe

September 2015

This dissertation is submitted for the degree of Doctor of Philosophy

Magdalene College
University of Cambridge

For the three girls I met during my PhD

Preface and declaration

The work described in this dissertation has been carried out as part of a BBSRC Industrial CASE PhD studentship, sponsored by GlaxoSmithKline and the Centre for Environment, Fisheries and Aquaculture Science.
This work was primarily carried out in the Bacterial Infection Group laboratory at the University of Cambridge; however, several months were spent in the collaborating laboratories of the Centre for Environment, Fisheries and Aquaculture Science in Weymouth, UK, and at the Wellcome Trust Sanger Institute, UK.

This dissertation is the result of my own work and includes nothing that is the outcome of work done in collaboration except as declared in the preface and specified in the formal acknowledgements.

It is not substantially the same as any that I have submitted, or is being concurrently submitted, for a degree or diploma or other qualification at the University of Cambridge or any other University or similar institution except as declared in the preface and specified in the text. I further state that no substantial part of my dissertation has already been submitted, or is being concurrently submitted, for any such degree, diploma or other qualification at the University of Cambridge or any other University or similar institution except as declared in the preface and specified in the text.

The word count of this dissertation does not exceed the prescribed word limit set by the Department of Veterinary Medicine, University of Cambridge.

Will Rowe ________________________________

Formal acknowledgements

This table shows the contributions of others to the work in this dissertation.
Function: Person, Institute

Doctoral supervisors - "guidance, editing and laboratory resources":
    Gareth Pearce, University of Cambridge
    Duncan Maskell, University of Cambridge
    David Verner-Jeffreys, Centre for Environment, Fisheries and Aquaculture Science
    Craig Baker-Austin, Centre for Environment, Fisheries and Aquaculture Science
    Jim Ryan, GlaxoSmithKline

Publication co-authors:
    Kate Baker, Wellcome Trust Sanger Institute
    David Verner-Jeffreys, Centre for Environment, Fisheries and Aquaculture Science
    Craig Baker-Austin, Centre for Environment, Fisheries and Aquaculture Science
    Jim Ryan, GlaxoSmithKline
    Duncan Maskell, University of Cambridge
    Gareth Pearce, University of Cambridge

Facilitated sample collection:
    Hedley Skelton, Anglian Water
    Gavin Hughes, Cambridge University dairy farm
    Gordon Cullingford, Writtle College Pig Farm
    Nick Brown, Addenbrooke's Hospital

Provided computing support:
    Jenny Barna, University of Cambridge

Provided laboratory and training:
    Trevor Lawley, Wellcome Trust Sanger Institute

Assisted with method development:
    Chris Coward, University of Cambridge
    Mark Stares, Wellcome Trust Sanger Institute

Provided antimicrobial usage data:
    Christianne Micallef, Addenbrooke's Hospital

Provided sequencing services:
    Sophie Broughton, Eastern Sequence and Informatics Hub
    Konrad Paszkiewicz, Exeter Sequencing Service

Performed LCMS analysis of effluent samples:
    David Ratcliffe, RPS Mountainheath

Performed ribotyping of C. difficile isolates:
    Cornelis Knetsch, Leiden University

Abstract

This dissertation documents the development of an environmental framework for monitoring antimicrobial resistance gene (ARG) dissemination in the aquatic environment. The work opens with a review of the relevant literature and outlines the importance of an environmental framework for monitoring ARG dissemination as part of antimicrobial resistance risk assessments.
The ability to interrogate sequencing data quickly and easily for the presence of ARGs is crucial to monitoring them in the environment. As current laboratory methods for the detection and surveillance of antimicrobial resistant bacteria in the environment were limited in their effectiveness and scope, the dissertation begins by describing the design and implementation of a Search Engine for Antimicrobial Resistance (SEAR), a pipeline and web interface for detection of horizontally-acquired ARGs in raw sequencing data. The suitability of metagenomic methods for monitoring the ARG content of effluents from faecal sources was then assessed via a pilot study of a river catchment. Novel metagenomes generated from effluents entering the catchment were interrogated for ARGs. The relative abundance of ARGs was determined to be higher in effluents than in the background environment, as was that of sequences relating to human and animal pathogens and mobile genetic elements. Thus, effluents were implicated in the dissemination of ARGs throughout the aquatic environment. To determine whether ARGs were potentially in use in the environment, the expression of ARGs within effluents was then evaluated across a series of longitudinal samples through the use of metatranscriptomics, and the presence of potential environmental antimicrobial selection pressures was examined. This demonstrated that the abundance of ARGs, as well as antimicrobial usage at the effluent source, was correlated with the transcription of ARGs in aquatic environments. The work described in this dissertation has also found that horizontally transmitted ARGs were present in pathogenic endospore-forming bacteria commonly found across the aquatic environment, potentially providing a mechanism for ARG persistence in the environment.
Finally, these findings were integrated into a universal framework for monitoring ARG dissemination in aquatic environments and used to highlight the developments required to incorporate this framework into future environmental ARG research and to facilitate antimicrobial resistance risk assessments.

Table of contents

Preface and declaration
Formal acknowledgements
Abstract
Table of contents
List of tables
List of figures
List of appendices
List of abbreviations
Chapter 1. General introduction
  1.1. Preface
  1.2. Antimicrobial resistance
    1.2.1. Antimicrobial resistance genes
      1.2.1.1. Target alteration
      1.2.1.1. Target bypass
      1.2.1.2. Barrier to antimicrobial entry
      1.2.1.3. Antimicrobial efflux
      1.2.1.4. Enzyme-based modification
      1.2.1.5. Non-enzymatic protection proteins
    1.2.2. Transfer of ARGs
      1.2.2.1. Plasmids and other conjugative transfer elements
      1.2.2.2. Transposons, integrons and phages
  1.3. The resistome
    1.3.1. Monitoring ARGs in the environment
      1.3.1.1. Bacterial cell culture
      1.3.1.2. Metagenomics
    1.3.2. Dissemination of ARGs in the environment
  1.4. Antimicrobial resistance risk assessment
    1.4.1. Monitoring ARG dissemination in the aquatic environment
Chapter 2. Search Engine for Antimicrobial Resistance
  2.1. Preface
  2.2. Introduction
  2.3. Materials and methods
    2.3.1. SEAR requirements
      2.3.1.1. Reference databases
      2.3.1.2. Hardware
    2.3.2. SEAR
      2.3.2.1. The pipeline
      2.3.2.2. Pipeline outputs
    2.3.3. Demonstrating SEAR utility
      2.3.3.1. Data sets and parameters used in this study
      2.3.3.2. Novel environmental metagenomes
      2.3.3.3. Pre-existing metagenomic and clinical isolate data
  2.4. Results
    2.4.1. Discrimination of ARG presence and abundance between environmental metagenomes
    2.4.2. Efficacy of SEAR for detecting ARGs in human faecal microbiomes
    2.4.3. Accuracy of SEAR ARG detection using clinical isolate sequencing data
  2.5. Discussion
  2.6. Conclusion
Chapter 3. Comparative metagenomics reveals a diverse range of antimicrobial resistance genes in effluents entering a river catchment
  3.1. Preface
  3.2. Introduction
  3.3. Methods
    3.3.1. Sample collection and DNA sequencing
    3.3.2. Bioinformatic analyses
      3.3.2.1. Identification of ARGs
      3.3.2.2. Identification of mobile genetic elements
      3.3.2.3. Abundance analysis
      3.3.2.4. Taxonomic profiling and pathogen detection
  3.4. Results
    3.4.1. Metagenome analysis
    3.4.2. Identification of antimicrobial resistance genes
    3.4.3. Identification of mobile genetic elements
    3.4.4. Taxonomic profiling and pathogen detection
  3.5. Discussion
  3.6. Conclusion
Chapter 4. Expression of ARGs in effluents
  4.1. Preface
  4.2. Introduction
  4.3. Methods
    4.3.1. Sample collection, DNA and RNA sequencing
      4.3.1.1. Antimicrobial residue testing
    4.3.2. Bioinformatic analyses
      4.3.2.1. Identification of ARGs, MGEs and abundance analysis
      4.3.2.2. Antimicrobial usage and statistics
      4.3.2.3. Taxonomic profiling and pathogen detection
  4.4. Results
    4.4.1. Metagenome and metatranscriptome analysis
    4.4.2. Correlation of ARG abundance to ARG transcript abundance
    4.4.3. Antimicrobial residues
    4.4.4. Expression of ARGs in response to hospital antimicrobial usage
    4.4.5. Taxonomic profiling and pathogen detection
    4.4.6. MGEs
    4.4.1. Abundance of ARGs over time
  4.5. Discussion
  4.6. Conclusion
Chapter 5. Bacterial endospores present in the environment harbour antimicrobial resistance genes
  5.1. Preface
  5.2. Introduction
    5.2.1. Endospore-forming bacteria
      5.2.1.1. Clostridium difficile
      5.2.1.2. Clostridium perfringens
    5.2.2. Chapter hypothesis
  5.3. Materials and methods
    5.3.1. Sample collection
    5.3.2. Isolate culture
    5.3.3. DNA extraction and 16S rRNA sequencing
    5.3.4. DNA extraction for whole genome sequencing
    5.3.5. Ribotyping
    5.3.6. Metagenome sequencing
    5.3.7. Construction of high-quality draft genomes
    5.3.8. Bioinformatic analysis
  5.4. Results
    5.4.1. Isolation and identification of endospore-forming bacteria
    5.4.1. Unclassified environmental isolates
    5.4.1. Clostridium difficile isolates
      5.4.1.1. Phylogenetic analysis
      5.4.1.2. Antimicrobial resistance genes
        5.4.1.2.1. Aminoglycoside resistance genes
        5.4.1.2.2. Tetracycline resistance genes
        5.4.1.2.3. Other types of antimicrobial resistance genes
    5.4.2. Clostridium perfringens isolates
      5.4.2.1. Phylogenetic analysis
      5.4.2.2. Antimicrobial resistance genes
        5.4.2.2.1. Macrolide resistance genes
        5.4.2.2.2. Tetracycline resistance genes
    5.4.3. Annotating endospore-formers from metagenomes
  5.5. Discussion
  5.6. Conclusion
Chapter 6. General discussion
  6.1. Preface
  6.2. Summary of findings and dissertation outcomes
    6.2.1. Establishing an environmental framework for monitoring ARG dissemination in the aquatic environment
      6.2.1.1. Summary of findings
      6.2.1.2. The environmental framework
    6.2.2. Future work on an environmental framework for monitoring ARG dissemination
      6.2.2.1. Assigning significance to antimicrobial resistance potential
      6.2.2.2. Understanding ARG dynamics
      6.2.2.3. Incorporating additional factors and technologies into the framework
      6.2.2.4. Problems remaining
  6.3. Overall conclusions of this work
References

List of tables

Table 2.1 List of parameters and default settings for SEAR.
Table 2.2 Example runtimes for SEAR.
Table 2.3 SEAR detection of ARGs in human faecal microbiomes.
Table 2.4 Accuracy of SEAR ARG detection using clinical isolate sequencing data
Table 3.1 Summary of the metagenomes used in the work of this chapter.
Table 3.2 Antimicrobial resistance gene analysis.
Table 3.3 Mobile genetic element analysis.
Table 3.4 Pathogen analysis of effluents.
Table 4.1 Summary of metagenomes used in this work.
Table 4.2 Summary of metatranscriptomes used in this work
Table 4.3 Correlation of ARG and transcript abundance for all hospital effluent samples, according to antimicrobial class
Table 4.4 Correlation of ARG and transcript abundance for all farm effluent samples, according to antimicrobial class.
Table 4.5 LCMS results for hospital and farm effluent samples
Table 5.1 All endospore-forming bacteria successfully isolated from environmental samples and processed for whole genome sequencing.
Table 5.2 Unclassified environmental isolates and their identified ARGs.
Table 5.3 Clostridium difficile isolates and their identified ARGs.
Table 5.4 Clostridium perfringens isolates and their identified ARGs.

List of figures

Figure 1.1 Graphical overview of potential routes of ARG dissemination
Figure 2.1 Screen shot of SEAR web interface.
Figure 2.2 SEAR results for environmental metagenomes.
Figure 3.1 Abundance of ARGs found in each effluent sample
Figure 3.2 Abundance of MGEs found in each effluent sample
Figure 3.3 Metagenomic phylogenetic analysis and annotation of potential bacterial pathogens.
Figure 4.1 Linear regression analysis of ARG abundance against corresponding transcript abundance, for ARGs detected in all hospital effluent samples
Figure 4.2 Linear regression analysis of ARG abundance against corresponding transcript abundance, for ARGs detected in all farm effluent samples
Figure 4.3 Linear regression analysis of ARG transcript abundance against hospital antimicrobial usage
Figure 4.4 Scree plot showing the variance observed for the three principal components.
Figure 4.5 Biplot showing how the initial variables contribute to principal components.
Figure 4.6 Graph showing the number of pathogenic bacteria species and the total species number, for both hospital and farm effluents, over time.
Figure 4.7 A. Column chart showing the ten most commonly identified pathogen species in the hospital and farm effluent metagenomes. B. Venn diagram depicting the number pathogen species found in the combined hospital and farm effluent datasets.
Figure 4.8 A. Graph showing the abundance of ARGs and MGEs, in relation to total species number, for hospital effluent over time. B. Graph showing the abundance of ARGs and MGEs, in relation to total species number, for farm effluent over time.
Figure 4.9 A. Graph showing the abundance of ARGs, grouped by antimicrobial class, in hospital effluent metagenomes. B. Graph showing the abundance of ARGs, grouped by antimicrobial class, in farm effluent metagenomes.
Figure 5.1 Genome synteny of potentially novel bacterial isolates, related to C. butyricum.
Figure 5.2 Whole genome phylogenetic tree of Clostridium difficile isolates.
Figure 5.3 Rarefaction curves for C. difficile core genome analysis
Figure 5.4 Tanglegram comparing two phylogenetic trees for the C. difficile isolates
Figure 5.5 Genome synteny between the C. difficile reference 630 genome and contig 5 of isolate #26
Figure 5.6 A. Genome synteny among the C. difficile reference 630 genome, the conjugative transposon Tn5397, and contig 18 of isolate #26. B. Genome synteny between the C. difficile reference 630 genome and contig 18 of isolate #26.
Figure 5.7 Phylogenetic tree of Clostridium perfringens isolates.
Figure 5.8 Rarefaction curves for C. perfringens core genome analysis.
Figure 5.9 Tanglegram comparing two phylogenetic trees for the C. perfringens isolates
Figure 5.10 Comparison between the C. perfringens reference ATCC13124 genome and contig 9 of isolate #15.
Figure 5.11 Genome synteny between the C. perfringens reference ATCC13124 genome, a construct of contigs 42, 31 and 28 of isolate #15 and contig 31 of isolate #15.
Figure 5.12 Comparison between the C. perfringens reference ATCC13124 genome and the tetPA-containing contig of isolate #23.
Figure 5.13 A. Genome synteny between the C. perfringens plasmid pCW3 and the tetPB and tetPA containing contig of isolate #14. B. Circularised representation of tetPB-containing plasmid.
Figure 5.14 Genome synteny between the Clostridium sticklandii reference genome (GI: 310657316) and the metagenome contigs of the pig farm lagoon effluent.
Figure 6.1 Proposed environmental framework for monitoring ARG dissemination and assigning antimicrobial resistance potential to environmental samples.

List of appendices

Appendix 1. Manuscript and source code for SEAR.
Appendix 2. River Cam catchment area.
Appendix 3. List of sequencing data generated and used in these studies.
Appendix 4. Accession numbers of sequences used in these studies.
Appendix 5. SEAR ARG detection using clinical isolate sequence data.
Appendix 6. Evaluation of metagenome preparation methodologies for aquatic samples.

List of abbreviations

ABACAS    Algorithm-Based Automatic Contiguation of Assembled Sequences
AMD       Advanced Molecular Detection
ARDB      Antibiotic Resistance Genes Database
ARG       Antimicrobial Resistance Gene
ARP       Antimicrobial Resistance Potential
BHI       Brain-Heart Infusion
BLAST     Basic Local Alignment Search Tool
BWA       Burrows-Wheeler Aligner
CARD      Comprehensive Antimicrobial Resistance Database
CDAD      Clostridium difficile Associated Disease
CDI       Clostridium difficile Infection
contig    contiguous sequence
ENA       European Nucleotide Archive
ESBL      Extended Spectrum Beta Lactamase
Gbp       Giga base-pairs
HMP       Human Microbiome Project
MGE       Mobile Genetic Element
MIC       Minimum Inhibitory Concentration
MLS       Macrolide / Lincosamide / Streptogramin
PBS       Phosphate Buffered Saline
PCA       Principal Component Analysis
PCR       Polymerase Chain Reaction
RAC       Repository of Antibiotic resistance Cassettes
RDP       Ribosomal Database Project
RPKM      Reads Per Kilobase per Million
SEAR      Search Engine for Antimicrobial Resistance
SNP       Single Nucleotide Polymorphism
WWTP      Wastewater Treatment Plant

Chapter 1. General introduction

1.1. Preface

This dissertation documents the development of an environmental framework for monitoring antimicrobial resistance gene (ARG) dissemination in the aquatic environment.
The work described within includes the development of methodology and analysis tools, the identification of factors that may influence ARG dissemination and the mechanisms by which ARGs may persist in the environment. This chapter reviews the current literature on ARGs in the environment: it identifies appropriate technologies to facilitate ARG monitoring, assesses the methodology and analysis tools that require development and determines factors that may influence ARG dissemination in the aquatic environment.

1.2. Antimicrobial resistance

Since antimicrobial compounds were first applied as therapeutic and prophylactic agents in human and veterinary medicine, antimicrobial resistance has remained a persistent issue in the effective treatment of bacterial infections and the sustainability of antimicrobial drugs as therapeutic agents. The ever increasing demand for antimicrobials, coupled with their overuse, underuse and misuse, has resulted in bacteria evolving a plethora of mechanisms for rendering antimicrobial compounds ineffective and has propagated the occurrence of resistance within bacterial communities (Davies and Davies, 2010b). Multiple antimicrobial resistances present major health and economic concerns, as the decreasing efficacy of antimicrobial compounds requires treatment regimens to utilise alternative drugs and increased dosages, which have the potential for more costly or unsuccessful treatment of bacterial infections (Eliopoulos et al., 2003). There has been considerable research effort to address the clinical ramifications of antimicrobial resistance and its mitigation. Current preventative measures against antimicrobial resistance development are based on infection control, altered or restricted use of specific antimicrobials and ensuring optimum clinical practice (i.e. correct dosages, durations and drug choice according to sensitivity testing).
However, there remains a significant lack of information concerning the ecological risks from antimicrobial resistance present in the environment and the dissemination of causative agents, with limited regulations in place to mitigate these risks.

This review of the literature will focus on evaluating the causative agents of antimicrobial resistance, i.e. antimicrobial resistance genes. The various resistance mechanisms of ARGs will be considered, along with details on their propagation and the concept and importance of the reservoir of ARGs in the environment - the resistome. The review will then progress to consider the methodology available to detect ARGs and how this can be used to determine the extent of ARG dissemination in the environment. Finally, the literature review will examine the impact of ARGs in the environment, highlighting the risks to human and animal health from their dissemination and the shortfalls in current monitoring approaches that this dissertation primarily aims to address.

1.2.1. Antimicrobial resistance genes

ARGs encompass genes that confer a resistant phenotype in bacteria to one or more antimicrobial compounds. There are several mechanisms by which an ARG can confer resistance:

1.2.1.1. Target alteration

An ARG can be a mutated version of a gene that encodes the protein target of an antimicrobial, thus decreasing the efficacy of the antimicrobial through a reduction in its ability to bind to the protein target. An example of this mechanism is the single nucleotide polymorphism (SNP) mutation of the DNA gyrase gene (gyrA), which alters the gyrase protein that is the target of fluoroquinolone antimicrobials. Consequently, the mutated gyrase confers resistance through reduced fluoroquinolone binding (Hooper and Jacoby, 2015).

1.2.1.2. Target bypass

In addition to target alteration, an ARG can also confer resistance by bypassing the antimicrobial target completely. This is exemplified by a cluster of glycopeptide ARGs (vanHAX) that result in a different bacterial peptidoglycan cell wall structure to which glycopeptides bind with approximately 1000-fold lower affinity (Henson et al., 2015).

1.2.1.3. Barrier to antimicrobial entry

Antimicrobial compounds can be prevented from entering a bacterial cell entirely via ARGs that encode outer membrane proteins. This is a form of intrinsic resistance (Cox and Wright, 2013).

1.2.1.4. Antimicrobial efflux

Antimicrobial efflux from a cell, via transporters such as multiple trans-membrane spanning proteins and resistance-nodulation-cell division efflux systems, can prevent certain antimicrobials from reaching their targets and result in antimicrobial resistance (Li et al., 2015b).

1.2.1.5. Enzyme-based modification

ARGs can encode enzymes that modify the antimicrobial target (e.g. via methylation) or the antimicrobial itself. A common example of an ARG that encodes an antimicrobial-modifying enzyme is the beta-lactamase group of hydrolases that target beta-lactam antimicrobials (Tang et al., 2014).

1.2.1.6. Non-enzymatic protection proteins

ARGs that encode target-binding protection proteins account for major sources of antimicrobial resistance. An example of this mechanism is the class of tetracycline ARGs that encode ribosomal protection proteins, which bind to the ribosome and prevent tetracycline antimicrobials from inhibiting protein synthesis (Nguyen et al., 2014a).

1.2.2. Transfer of ARGs

Bacteria can obtain ARGs spontaneously via gene mutations or they can acquire them, either vertically or horizontally. Vertical inheritance via cell division allows bacteria to transfer ARGs to their progeny. In addition to this, and of greater concern due to the rapid emergence of antimicrobial resistant pathogens, horizontal gene transfer can facilitate the acquisition of ARGs by unrelated bacteria.
Mobile genetic elements (MGEs) are a group of genetic constructs that facilitate the movement of DNA within genomes or between bacterial cells (horizontal gene transfer). Horizontal gene transfer can occur via several processes (transformation, transduction or conjugation), depending on the type of MGE (Frost et al., 2005). The major types of MGE that are implicated in the mobilisation of ARGs are:

1.2.2.1. Plasmids and other conjugative transfer elements

Conjugative plasmids are stable, mobile elements that are able to replicate with autonomy. Plasmids can be grouped on the basis of their replication machinery; plasmids with the same type of machinery are unable to co-habit the same bacterial cell and this “incompatibility type” is the basis of a major form of plasmid classification (Frost et al., 2005). Due to the replication autonomy of plasmids, their ability to transfer to new hosts via conjugation and their ability to carry a wide array of genes, plasmids are key factors in the horizontal transfer of ARGs (Svara and Rankin, 2011). Non-conjugative plasmids can also harbour ARGs but require a conjugative plasmid to enable their transfer (O'Brien et al., 2015).

1.2.2.2. Transposons, integrons and phages

Transposons are MGEs that contain gene cassettes, a transposase enzyme and terminal recombination regions that allow transposons to incorporate into and out of host genomes (Alekshun and Levy, 2007). Integrons are similar in structure to transposons and are genetic elements that can capture and rearrange gene cassettes (including ARGs) through the use of a site-specific tyrosine recombinase (integrase). Integrons are also able to promote transcription of their gene cassettes and they acquire their mobility through association with conjugative plasmids and transposons (Stalder et al., 2012, Escudero et al., 2015).
Phages (bacteriophages) are MGEs that facilitate horizontal gene transfer via transduction; phages can infect a bacterial cell and incorporate their genetic material into the host genome, then either replicate as part of the host (lysogenic cycle) or multiply within the cell (lytic cycle) (Balcazar, 2014). ARGs have been found in the virome of environmental samples and phages have been implicated in the transfer of ARGs among bacteria (Colomer-Lluch et al., 2011).

There is concern that MGEs are a driving force behind the acquisition of antimicrobial resistances by pathogenic bacteria in a clinical setting (Canica et al., 2015). Analyses of isolates that pre-date the “antibiotic era” reveal a paucity of ARGs (Hughes and Datta, 1983); however, analyses of modern day clinical isolates reveal a high incidence of ARGs and MGEs (Sunde et al., 2015). Indeed, the risks associated with MGEs and the horizontal acquisition of antimicrobial resistance are illustrated by the frequent finding that a pathogenic species of bacteria has incorporated one or more ARGs into its genome. For example, an isolate of the pathogen Yersinia pestis (the causative agent of plague) was found to have acquired an MGE that conferred resistance to six classes of antimicrobial drugs, including the two drugs that are frequently used to treat Y. pestis infection (tetracycline and chloramphenicol) (Galimand et al., 1997, Pan et al., 2008). Another pathogen whose acquisition of antimicrobial resistance through MGEs has caused great clinical concern is Klebsiella pneumoniae, a member of the Enterobacteriaceae. K. pneumoniae has been known to carry MGE-associated ARGs and has recently been found to have acquired carbapenemase ARGs via horizontal gene transfer (Robilotti and Deresinski, 2014).

1.3. The resistome

The concept of the “antimicrobial resistome” was proposed by D’Costa et al.
(2006) and is designed to encompass all ARGs existing in pathogenic bacteria and non-pathogenic bacteria present in both the clinical setting and environment, as well as the genes that can give rise to antimicrobial resistance (precursor elements) (D'Costa et al., 2006). The resistome is an important concept as it does not focus solely on resistant pathogens or instances of clinical resistance, due in part to the growing evidence that environmental bacteria are the origin of many of the ARGs that are emerging in the clinical setting (Allen et al., 2010). The concept of the resistome provides a useful way of linking a reservoir of ARGs to the clinical incidence of antimicrobial resistance.

Research into the resistome, as well as the role of environmental bacteria in the emergence of antimicrobial resistance, has shown instances of antimicrobial resistance in non-pathogenic bacteria, offering an explanation as to the origin of certain ARGs. Studies of environmental bacteria found in soil (which are the originators of many clinically used antimicrobials) have isolated bacteria that can subsist on clinically relevant antimicrobials (some of these resistant bacteria were also closely related to human pathogens) (Dantas et al., 2008). It is understood that many antimicrobial-producing bacteria may have evolved ARGs to enable self-resistance to the antimicrobial compounds that they produce. For instance, the glycopeptide resistance cluster vanHAX, a mechanism that plays a role in the widespread clinical cases of vancomycin resistant enterococci (Courvalin, 2006), appears to have originated in glycopeptide-producing bacteria for self-resistance. The vanHAX ARGs result in altered cell wall biochemistry, causing glycopeptide antimicrobials to have a low affinity for the peptidoglycan chain (Marshall and Wright, 1997, Marshall et al., 1999).
It should be stated that the production of antimicrobial compounds by bacteria, and consequently the possession of ARGs for self- resistance, is thought to allow the bacteria to be more competitive for resources in a given environment. However, it has also been postulated that antimicrobial secretion by bacteria is used as an inter-cellular signalling mechanism (Yim et al., 2007). Given the wealth of ARGs in the resistome of a particular environment, as well as the presence of MGEs that carry ARGs, the environment is postulated to serve as a reservoir for ARGs that could be acquired by clinically relevant pathogens. An example of the interplay between environmental reservoirs of ARGs and antimicrobial resistant pathogenic bacteria in the clinical setting is that of the extended spectrum beta-lactamase (ESBL) blaCTX-M family of ARGs (Lahlaoui et al., 2014). The blaCTX-M family of ARGs has emerged as a major clinical problem as they confer resistance to third-generation cephalosporins. The blaCTX-M ARGs share low identity to other beta-lactamase ARGs and it was proposed that blaCTX-M ARGs emerging in the clinical setting were a new class of beta-lactamase. However, later studies found that the blaCTX-M family of beta-lactamases share a high level of similarity to a beta- lactamase ARG identified in a species of environmental bacteria closely related to the Kluyvera genus (Decousser et al., 2001, Olson et al., 2005). Importantly, the flanking regions of the blaCTX-M ARGs that have been found in pathogens appear to originate from Kluyvera chromosomes, suggesting the transfer of a clinically relevant ARG from environmental bacteria to clinically relevant pathogenic bacteria (Canton and Coque, 2006). 
Despite a shared resistome between pathogens and environmental bacteria, as well as evidence that indicates transfer of ARGs between these two groups, far more research is needed to explore the link between these groups and assess the role of ARG dissemination in the environment.

1.3.1. Monitoring ARGs in the environment

With an increasing interest in the role of environmental sources of ARGs in the development of clinically relevant antimicrobial resistance, the methodology to facilitate the monitoring of ARGs in environmental samples has seen increased development over recent years.

1.3.1.1. Bacterial cell culture

Bacterial cell culture can be used to identify bacteria within a given sample through use of phenotypic testing (e.g. Gram staining, other biochemical testing, and colony morphology) and downstream molecular testing such as polymerase chain reaction (PCR) testing and whole genome sequencing. Successfully cultured bacterial isolates can then be interrogated for ARG carriage through techniques such as antimicrobial susceptibility testing and PCR testing. However, there are several issues concerning the use of bacterial cell culture for bacterial identification or the downstream testing of ARGs. Importantly, only a relatively small proportion of bacteria can be cultured using a given type of laboratory media; therefore, culturing is not suitable for large-scale monitoring of ARGs in diverse bacterial communities (Gasc et al., 2015). In addition to this, phenotypic testing of cultured bacterial isolates may reveal unknown combinations of characteristics that mean cultured bacteria cannot be identified (Stewart, 2012). Finally, in terms of ARG carriage, phenotypic testing cannot determine the exact ARG responsible for any identified resistance and relies on the ARG being expressed.
Also, the use of PCR to identify ARGs in isolates can only test for a pre-determined ARG panel, is limited in its multiplexing capability and cannot generally identify novel ARGs. Despite the disadvantages of culture-based approaches, specifically designed experimental protocols such as tailored growth media (Rasmussen et al., 2008) allow bacterial cell culture to be applied to test specific hypotheses, or to be used in conjunction with downstream testing such as antimicrobial sensitivity testing to form part of a wider assessment (Caplin et al., 2008).

1.3.1.2. Metagenomics

A major advancement in the fields of microbiology and molecular ecology came with the advent of next generation sequencing technologies. This technology has allowed for the study of whole bacterial communities in a given environment using a technique termed metagenomics (National Center for Biotechnology Information, 2007). Metagenomics involves the sequencing of the collective DNA isolated from an environmental sample. The term originally encompassed 16S phylogenetic community profiling, whole genome shotgun metagenomics and functional metagenomics. 16S community profiling involves sequencing amplicons of the small ribosomal subunit rRNA gene (16S) of the collective bacteria in a sample to assess phylogenetic diversity (Degnan and Ochman, 2012). Whole genome shotgun metagenomics instead uses the total DNA from a given sample and can be used to, amongst other applications, identify and compare specific genes in microbial communities (Tringe et al., 2005). Functional metagenomics, which involves the cloning of metagenomic DNA into vectors that are expressed in a host (such as E. coli) and subsequent selective testing, is a powerful tool for the identification of novel ARGs in environmental samples, as it does not require prior knowledge of ARGs thought to be present and can be applied to un-cultivable bacteria (Mullany, 2014).
Novel ARGs are identified by sequencing the DNA insert of any clone that grows on the selective media; this DNA sequence is then associated with resistance to the antibiotic in the selection media and is interrogated for the gene or genes that may be responsible for the observed phenotype (either by homology searching or de novo gene prediction). However, functional metagenomics is a resource-intensive tool that requires a large amount of time to create appropriate cloning libraries, test susceptibility to a wide range of antimicrobials and characterise any identified ARGs (Perron et al., 2015).

There has been a gradual reduction in the number of 16S community profiling metagenomic studies and a transition to whole genome shotgun metagenomics (hereafter referred to as just metagenomics) (Davenport and Tummler, 2013). The development of metagenomic analysis methodology, such as species-specific biomarkers to facilitate the taxonomic binning of metagenomic sequencing (Segata et al., 2011), has resulted in metagenomics becoming a powerful tool in the analysis of environmental samples. The constant augmentation of sequence databases and the development of rapid metagenomic analysis pipelines such as MG-RAST (Meyer et al., 2008) and QIIME (Caporaso et al., 2010) have resulted in metagenomics becoming a viable tool for microbial monitoring applications. Indeed, metagenomics is being used to help answer specific questions that are being asked of microbial communities in the environment (Mason et al., 2012).

1.3.2. Dissemination of ARGs in the environment

Antimicrobial resistant bacteria, as well as un-metabolised antimicrobials and their residues, have been widely reported in the terrestrial and aquatic environments and are likely to have been distributed via a variety of sources.
Effluents from wastewater treatment plants (WWTPs), as well as from hospital, farm, aquaculture and pharmaceutical manufacturing waste, have been shown to harbour significant levels of antimicrobial compounds (Hirsch et al., 1999, Golet et al., 2001, Dolliver and Gupta, 2008, Charpentier et al., 2011, Kristiansson et al., 2011, Kümmerer and Henninger, 2003) and antimicrobial resistant bacteria (Schwartz et al., 2003, Fuentefria et al., 2011, Heuer et al., 2002, Caplin et al., 2008) that may be a direct result of anthropogenic activity. Also, the agricultural application of effluents from medicated animals (Jørgensen and Halling-Sørensen, 2000) and the use of antimicrobials in crop protection (McManus et al., 2002) have been shown to lead to the terrestrial accumulation of antimicrobial compounds and antimicrobial resistant bacteria (Forsberg et al., 2012). These in turn can enter watercourses subsequent to rainfall, drainage and leaching (Dolliver and Gupta, 2008, Kemper, 2008). This widespread distribution of antimicrobial resistant bacteria and antimicrobial selection pressures in the environment is likely to have resulted in the augmentation of environmental reservoirs of ARGs and the resistome. Indeed, several studies have demonstrated the presence of ARGs in a wide-ranging selection of environmental samples using different techniques. For example, many experiments have used PCR to identify clinically relevant ARGs in environmental samples, such as glycopeptide ARGs in agricultural and garden soil samples (Guardabassi and Agerso, 2006) and tetracycline ARGs in effluents of urban residential areas (Li et al., 2015a). Other studies have utilised metagenomics to interrogate environments such as polluted lakes for ARGs that may have arisen due to the identified pollution sources (such as pharmaceutical manufacturing waste effluents) (Bengtsson-Palme et al., 2014).
There is also an ever-increasing body of scientific evidence that ARGs are present in a plethora of aquatic environments and there are numerous sources that are likely to contribute to these environmental reservoirs (Zhang et al., 2009). However, there are no studies that have assessed the factors contributing to ARG dissemination in a single defined aquatic environment over a period of time.

Environmental reservoirs of ARGs, in conjunction with the global contamination of the biosphere with antimicrobial products, may allow for the propagation of antimicrobial resistances that could potentially impact on human and animal health. Several studies have speculated on transmission routes of ARGs from the environment to humans and animals, with ARGs found to be present in treated drinking water and food sources (Schwartz et al., 2003, Pruden et al., 2006, Rodriguez et al., 2006), suggesting that the fate of ARGs in the environment has important clinical relevance. The impact of the environmental release of ARGs on human health and the evolution of environmental bacteria is beginning to be highlighted (Martinez, 2009). As a result, there is an increased awareness that the characterisation of ARGs in the environment is required to enable a detailed understanding of antimicrobial resistance. With the recent recognition of ARGs as environmental contaminants (Pruden et al., 2006, Auerbach et al., 2007), it is vital that the dissemination of ARGs is monitored in order to assess the potential risk regarding the emergence and transmission of antimicrobial resistance.

1.4. Antimicrobial resistance risk assessment

Recent reviews of the emergence and spread of antimicrobial resistance have highlighted current research aims that can be summarised into three broad components: i. the emergence of resistance in the environment, ii. antimicrobial resistance dissemination and iii. antimicrobial resistance transmission to humans (Berendonk et al., 2015).
There are increasing calls to incorporate these components into overarching antimicrobial resistance risk assessments that detail potential routes of dissemination, such as those represented in Figure 1.1. The purpose of a risk assessment is to identify and mitigate the factors that are contributing to the emergence and spread of antimicrobial resistance.

Figure 1.1 Graphical overview of potential routes of ARG dissemination. The figure comprises three components, which are linked by ARG transfer. The coloured arrows indicate transfer of ARGs: purple represents transfer within human populations, green represents transfer of ARGs within the environment and non-human animal species and orange represents ARG dissemination.

A core principle behind a unified antimicrobial resistance risk assessment is that the three components can be linked by ARG transfer, depicted in the figure by coloured arrows that correspond to the ARG source and direction of ARG transfer (Figure 1.1). As illustrated by the figure, the dissemination of ARGs plays a pivotal role in linking the emergence of antimicrobial resistance in the environment and the transmission of antimicrobial resistance to humans.

As outlined above, ARGs have been found extensively in effluents that link faecal sources (both human and animal) to the wider aquatic environment. However, these studies have not monitored the dissemination of ARGs over multiple time points or used a single aquatic catchment to investigate the relative importance of multiple effluent inputs contributing to a single aquatic environment. It has been proposed that an environmental framework to monitor ARG dissemination is required in order to achieve antimicrobial resistance risk assessments (Port et al., 2014). With this in mind, the aim of the current programme of research is to develop an environmental framework that will allow the dissemination of ARGs to be monitored in effluents that are entering the aquatic environment.

1.4.1. Monitoring ARG dissemination in the aquatic environment

In relation to the proposal that an environmental framework to monitor ARG dissemination is required, it has been suggested that sequencing technologies such as metagenomics are good candidates for facilitating quantitative microbial risk assessment as they can target a large array of ARGs both quickly and semi-quantitatively (Brul et al., 2012). The present review has highlighted that metagenomics is an ideal tool for the study of bacterial communities in environmental samples due to non-reliance on bacterial cell culture and the availability of reference databases and analysis tools. Metagenomics therefore offers considerable potential for facilitating the monitoring of ARG dissemination and, with the development of ARG-specific metagenomic methodology and analysis tools, it is likely to form the basis of the environmental framework that the work described in this dissertation will help to develop. In addition to developing a metagenomics-based environmental framework to monitor ARG dissemination, the work in this dissertation investigates additional approaches that can be utilised to facilitate monitoring and examines factors that may contribute to ARG dissemination and the emergence and persistence of resistance.

Chapter 2. Search Engine for Antimicrobial Resistance

2.1. Preface

As outlined in Chapter 1, current laboratory methods for the detection and surveillance of antimicrobial resistant bacteria in the environment are limited in their effectiveness and scope. As an alternative to culture and polymerase chain reaction approaches, metagenomics is able to identify known genes within the genetic content of environmental samples. Therefore, metagenomics has been proposed as a viable technology to assess the resistome.
However, the ability to interrogate sequencing data quickly and easily for the presence of antimicrobial resistance genes (ARGs) is crucial in order to utilise metagenomics as part of an environmental framework to monitor ARG dissemination. This chapter presents the Search Engine for Antimicrobial Resistance (SEAR), a pipeline and web interface for detection of horizontally acquired antimicrobial resistance genes in raw sequencing data. This study has since been published in PLoS ONE (manuscript and source code available in Appendix 1).

2.2. Introduction

The global threat of antimicrobial resistance is growing at an alarming rate; infections that were once easily treatable now constitute public health crises (Sack et al., 2001). This has led to the consensus that more must be done to monitor and combat the occurrence and spread of antimicrobial resistance (World Health Organisation, 2012, Laxminarayan et al., 2013). Current diagnostic laboratory practice for the detection of antimicrobial resistance relies on isolate culturing, followed by growth inhibition assays for the identification of resistant phenotypes and determination of Minimum Inhibitory Concentrations (MICs) against a range of antimicrobials (Public Health England, 2013). Alternatively, ARGs can be identified using polymerase chain reaction (PCR) and quantified using real-time PCR, requiring specific primers for the amplification of target sequences (Espy et al., 2006). These approaches take time, consume resources, and have limitations that may result in clinically relevant resistances being undetected, e.g. phenotypic testing will miss non-culturable bacteria and non-expressed ARGs, whereas limitations of multiplex composition and size in molecular testing complicate the detection of ARGs (Diekema and Pfaller, 2013, Heiman et al., 2014).
Perhaps not surprisingly, the Centers for Disease Control and Prevention (CDC) identified one of the current downfalls in the approach to combatting antimicrobial resistance as the poor use of advanced molecular detection (AMD) technologies (Centers for Disease Control and Prevention, 2013). AMD technologies, such as the whole genome sequencing of bacterial isolates as well as uncultured bacteria (metagenomic sequencing), have the potential to identify antimicrobial resistance more quickly and effectively than conventional laboratory assays (Centers for Disease Control and Prevention, 2013). In addition to these well-understood advantages, AMD technologies can also be applied to circumvent the requirement of prior knowledge of causative agents and provide clinically relevant information for the treatment and surveillance of pathogens as well as antimicrobial resistance (Miller et al., 2013). Upon receipt of a metagenomic (e.g. environmental or faecal microbiome) or isolate sample, DNA can be extracted, compiled into a library and sequenced within hours (Illumina, 2013). Indeed, AMD approaches to pathogen detection are currently being developed and seek to identify pathogens directly from metagenomic samples within clinically relevant timeframes (Naccache et al., 2014). Recent studies have also shown AMD to be effective in the epidemiological tracking of pathogens, as well as the detection of ARGs present in their genomes (Koser et al., 2012, Harrison et al., 2013). AMD offers an alternative screening tool that may be quicker than traditional culture-based techniques. For example, the detection of Legionella species in clinical pneumonia cases requires the inoculated isolation media to be incubated for several days in order to diagnose infection and characterise antimicrobial resistance, as there is no commercially available molecular assay for Legionella species (Reller et al., 2003). 
This highlights the potential for developing more efficient diagnostic tests and the utilisation of AMD technologies to create more rapid alternatives for ARG detection. In addition to these direct clinical applications, AMD technologies are also becoming a common tool in the detection of ARGs in the environment, which is vital for identifying reservoirs of ARGs (Lewin et al., 2013, Mason et al., 2012, Oh et al., 2013). However, there is a need to establish a metagenomic framework for use in the monitoring of ARGs within the environment in order to influence public health decisions and respond to the growing concern over antimicrobial resistance (Port et al., 2014). This must include the development of reliable surveillance methods and tools for risk assessment (Berendonk et al., 2015). When designing metagenomic tools for the environmental monitoring of ARGs, it is therefore necessary to provide context in terms of the relative abundance of ARGs, so that these can be correlated with environmental variables (e.g. antimicrobial concentrations), as well as to obtain information on the mobile genetic elements (MGEs) and pathogens that they are associated with. Currently published resources available for ARG detection are online databases that use the Basic Local Alignment Search Tool (BLAST) algorithm to find possible matches between the database and query sequences (e.g. ARDB, CARD, ResFinder) (Liu and Pop, 2009, McArthur et al., 2013, Zankari et al., 2012, Altschul et al., 1990). To the author’s knowledge, no existing tools give an ARG abundance measure or simultaneously provide MGE information. The targeting of full-length gene matches using BLAST requires a sequence assembly step, adding time, infrastructure requirements, and complexity to the analysis. Furthermore, full-length gene assembly is often difficult to achieve in metagenomic samples where coverage is frequently low and uneven across the sample.
Ideally, raw sequencing data would be used directly to rapidly identify and quantify ARGs of interest. Although mapping-based approaches have been used for individual studies (Hu et al., 2013, Forslund et al., 2013) and tools that work directly with reads (though on non-ARG databases), such as the SEED subsystems and SRST2, can be applied towards this aim (Inouye et al., 2014), there is as yet no such ARG-detection algorithm. This chapter documents an automated pipeline, the Search Engine for Antimicrobial Resistance (SEAR), which quickly and accurately identifies antimicrobial resistance information from biological samples. Furthermore, it provides abundance estimates and returns the full-length gene sequence as reconstructed from the sample itself. To demonstrate efficacy, the pipeline was applied to a range of sequencing data types including novel environmental metagenomes, human faecal metagenomes and clinical isolates of pathogenic enteric bacteria (Shigella sonnei).

2.3. Materials and methods

2.3.1. SEAR requirements

2.3.1.1. Reference databases

SEAR requires reference databases for read subtraction (optional) and read clustering. The user can supply a read subtraction reference database in the form of a BWA index, such as the human genome (HG19 build (Genome Reference Consortium, 2009)) or the Escherichia coli K12 genome (these are not supplied in the SEAR package to reduce file size but can be readily downloaded (https://genome.ucsc.edu/)). SEAR requires a reference database for the read-clustering step; the default supplied is the ARGannot database (Gupta et al., 2014), but other options are available (such as CARD (McArthur et al., 2013)) and the user can supply any multiFASTA file as a database. The supplied ARGannot database was customised as follows: ARGannot ARGs were clustered at 97% identity using USEARCH (Edgar, 2010) and the representative sequence for each cluster was added to the pipeline’s ARG database.
Each cluster and representative sequence is annotated with gene type and the class of antimicrobial to which the gene confers resistance. The Shigella reference database for the benchmarking test was made by downloading the FASTA files for each ARG tested (n=19) in the Holt et al. study, removing duplicate entries and creating a multiFASTA file (n=16).

2.3.1.2. Hardware

Minimum hardware requirements for SEAR comprise a Unix server (tested using Ubuntu 10.04) with ~2 GB of disk space for reference data and software dependencies. Whilst running, SEAR requires up to twice the input FASTQ file size (in bytes) in both RAM and disk space for temporary file storage.

2.3.2. SEAR

2.3.2.1. The pipeline

SEAR is a pipeline consisting of Perl, Shell and R scripts that call on several pieces of open source software and utilise a customisable reference database to annotate ARGs directly from short-read sequencing data. SEAR is downloadable from http://computing.bio.cam.ac.uk/sear/SEAR_WEB_PAGE/SEAR.html, in stand-alone command-line and web-based versions (Figure 2.1).

Figure 2.1 Screen shot of SEAR web interface including homepage (A) and quick start settings (B).

The pipeline follows five main steps in the annotation of ARGs: (1) processing of input files, (2) clustering of sequence reads to known ARGs in user-defined (or pre-loaded) database, (3) mapping of reads to reference sequences, (4) ARG annotation and calculation of relative abundance and (5) local alignment of annotated ARGs to online databases.

(1) Processing of input files

The pipeline accepts raw or compressed (.gz) FASTQ files (either 33 or 64 ASCII encoding) from metagenomic, metatranscriptomic or isolate sequencing. Where more than one input file (e.g. paired-end data) is provided, these files are merged to give a single input file (paired-end information is not currently utilised in the pipeline).
The pipeline has the optional step of pre-filtering reads, by removing those that map against a user-defined reference, such as the human genome or a bacterial strain. FASTQ files are quality checked using user-defined cut-offs and converted to FASTA formatted reads.

(2) Clustering of sequence reads to ARG database

The pipeline is supplied with a custom ARG database that has been built by clustering and annotating the ARGs held in the ARGannot database (Gupta et al., 2014). However, other ARG databases can be used, or the user can supply a custom FASTA file. Reads are clustered to the ARG database by global alignment with USEARCH (version 7.0.959) using a default identity cut-off of 99% (Edgar, 2010). Where multiple matches occur, the read is clustered with the highest identity match. SEAR parses the clusters by grouping reads to each matched reference gene and retrieving corresponding FASTQ information for each matched read.

(3) Mapping of clustered sequence reads to ARG references

The Burrows-Wheeler Aligner (BWA-mem version 0.7.8) (Li and Durbin, 2009) is used for mapping each cluster of FASTQ reads to the corresponding reference gene. Samtools is then used to analyse the BWA alignment and generate a consensus sequence using mpileup (Li, 2011).

(4) ARG annotation and relative abundance

The consensus sequences are used to annotate ARGs and calculate relative abundance values; an ARG is considered present in the sample if sequence reads can be mapped to the ARG reference sequence above the defined coverage cut-off (coverage is the percentage of the reference ARG length covered by mapped reads). For relative abundance calculation, SEAR uses a method similar to the reads per kilobase per million reads (RPKM) method that is commonly used in transcriptome studies (Mortazavi et al., 2008). When calculating relative abundance, the total number (n) of ARGs that have been annotated is used to calculate a relative abundance (RA) percentage for each ARG.
Firstly, an abundance value (A) is calculated for each gene according to:

$$A = \frac{X / Y}{L},$$

where X denotes the number of reads that successfully mapped, Y denotes the total number of reads in the input file/s and L denotes the length of the reference gene (in bases). Relative abundance is then calculated using:

$$RA = \left( \frac{A}{\sum_{i=1}^{n} A_i} \right) \times 100$$

In this way, the relative abundance measure describes the proportion of sequence reads that have built the consensus sequence of each annotated ARG from a single pipeline run.

(5) Local alignment

The consensus sequences for annotated ARGs are aligned to the NCBI nucleotide and protein databases using command line BLAST (Altschul et al., 1990) (using the -remote BLAST service by default, see documentation to utilise local database versions). In addition, sequences are also aligned to the current Repository of Antimicrobial Resistance Cassettes (RAC) (Tsafnat et al., 2011) and Antibiotic Resistance Database (ARDB) (Liu and Pop, 2009) databases using BLAST (though ARDB has not recently been curated).

2.3.2.2. Pipeline outputs

In both command-line and web versions of SEAR, output includes: graphical overview, ARG annotations, relative abundance scores, consensus sequences, flat files (html, csv, blast files) and links to further gene information and homologues found in online databases (such as the repository of antimicrobial resistance cassettes, NCBI non-redundant nucleotide and protein databases).

2.3.3. Demonstrating SEAR utility

2.3.3.1. Data sets and parameters used in this study

Several datasets were used to demonstrate the utility of this pipeline across broad data categories. All datasets were analysed using a UNIX server (Ubuntu 10.04) running SEAR with default parameters (99% clustering identity and 90% coverage cut-off for ARG annotation, full default parameter list found in Table 2.1).

Table 2.1 List of parameters and default settings for SEAR.
| Parameter | Default | Explanation |
|---|---|---|
| --fqformat (-ff) | 33 | ASCII offset for the input fastq files. Accepts either 33 or 64. |
| --lengthcutoff (-lc) | 70 | Discard sequences with length < lc. |
| --qualitycutoff (-qc) | 20 | Quality score cutoff for input fastq files. |
| --filter (-f) | N | Filter reads by mapping to a reference database (default db = Human Genome) and discarding mapped reads. Accepts either Y or N. |
| --coveragecutoff (-cc) | 90% | The coverage cut-off parameter dictates what proportion of the reference sequence must be covered by reads for a successful annotation. In this way, the annotation stringency is controlled and customisable. |
| --clusteringident (-ci) | 0.99 | Identity value for usearch clustering. |
| --references (-r) | arg_annot_database.fa | The reference gene dataset to use. |
| --threads (-t) | 1 | The number of threads to use in steps that allow multi-threading. |

2.3.3.2. Novel environmental metagenomes

Samples were collected from two effluent sources within the River Cam Catchment, Cambridge, UK on 21st June 2012. A map of the river catchment is included in Appendix 2. The waste effluent of the University of Cambridge dairy farm (latitude: 52.22259, longitude: 0.02603) was sampled prior to it being applied to the surrounding fields as fertiliser, where it subsequently enters drainage ditches that drain into the River Cam. The effluent of the municipal wastewater treatment plant (WWTP) (latitude: 52.234469, longitude: 0.154614) was collected from the effluent discharge pipe that enters the River Cam. Samples were collected in 10 L sterile polypropylene containers. Sample volumes were based on the microbial abundances, as previously determined for these sites using a DNA extraction series (data not shown). Samples were transported at 4°C to the laboratory and processed within 2 hours. As in Dancer et al. (2014), samples were filtered under pressure at approximately 2 bar using a pressure vessel system (10 L SM 1753, Sartorius).
Samples were first pre-filtered through 3.0 µm membranes (Millipore) at 2 bar to remove eukaryotic cells and debris. The filtrate was subsequently filtered through 0.22 µm membranes (Millipore) to capture the prokaryotic cells. Metagenomic DNA was then extracted by washing and vortexing the membranes in phosphate buffered saline with Tween20 (2%) before enzymatic lysis (Meta-G-Nome DNA isolation kit; Epicentre). Assessment of DNA quality and concentration was made by TBE agarose (2%) gel electrophoresis and spectrophotometry (Nanodrop ND-1000; ThermoScientific). For each sample, 2 µg of DNA was sequenced by the Eastern Sequence and Informatics Hub, Cambridge, UK. Two libraries (seventy-five base pair, paired-end reads) were prepared from the samples and were sequenced using an Illumina HiSeq2000. The FASTQ files for the WWTP and farm effluent metagenomes are available via the European Nucleotide Archive (ENA) (study: ERP003955). Sample accession numbers are as follows: farm effluent (ERS786322), WWTP effluent (ERS781558) (Appendix 3).

2.3.3.3. Pre-existing metagenomic and clinical isolate data

Human Microbiome Project (HMP) data for 32 Spanish human faecal microbiomes (for which the ARGs have previously been characterised in an in silico study by Forslund et al. (Forslund et al., 2013)) was downloaded from the ENA website (study: PRJEB1220) (accessed: 02.03.2015) (Human Microbiome Project, 2012). Additionally, SEAR was used to detect ARGs in a global dataset of 126 clinical isolates of the pathogenic bacterium Shigella sonnei (SRA Study ERP000182) (Holt et al., 2012). The FASTQ files for the 126 isolates were downloaded from the Sanger FTP site (study: PRJEB2128) (accessed: 02.03.2015). In the case of the clinical isolates, SEAR ARG detection was compared with the published ARG content of the isolates, with SEAR being run with default parameters on a custom reference database of ARGs originally detected by 100% mapping (Holt et al., 2012).
Further details on sequencing datasets are provided in Appendix 4.

2.4. Results

To test the utility of SEAR, the pipeline was run on a variety of sample types (environmental metagenomes, human faecal microbiome and bacterial clinical isolate). Pipeline run times were recorded (Table 2.2) and the presence and abundance of ARGs were then investigated in all samples.

Table 2.2 Example runtimes for SEAR using default parameters and server settings.

| Name | Sample ID | Type | File size (MB) | Time (mins) |
|---|---|---|---|---|
| ShIB1976 | ERR025684 | Clinical isolate | 434 | 6.1 |
| O2.UC1-0 | ERR209529 | HMP metagenome | 3200 | 36 |
| WWTP effluent | ERS781558 | Environmental metagenome | 14000 | 194 |

2.4.1. Discrimination of ARG presence and abundance between environmental metagenomes

A total of 28 ARGs (15 in each metagenome) were identified among the environmental metagenomes from WWTP effluent and farm waste effluent (Figure 2.2). Only two genes, strA and strB (both conferring aminoglycoside resistance), were common between the metagenomes, and each gene found in both sets was five times more abundant in the WWTP effluent compared to the farm effluent when using the normalised abundance values for the combined datasets. The WWTP effluent had ARGs conferring resistance to a total of four antimicrobial classes, with the most diverse (i.e. greatest number of ARGs) being the aminoglycoside class and the greatest abundance being ARGs conferring tetracycline resistance. In contrast, the farm effluent had ARGs conferring resistance to five antimicrobial classes, with the most diverse being the beta lactam class and the most abundant also being tetracycline resistance (Figure 2.2). The most abundant ARGs in the metagenome datasets were tetracycline resistance genes: tetC (41.6%) in the farm effluent and tet39 (15.3%) in the WWTP effluent.
A subset of ARGs identified by SEAR (tetA, qnrB and bla-ACT; chosen to encompass clinically relevant resistances, drugs with both a long and short history of resistance and chemically diverse antimicrobials) was confirmed in the original farm effluent DNA sample using PCR. Briefly, primers were designed using Primer3 (Rozen and Skaletsky, 1988) and targets were amplified using GoTaq DNA polymerase (Promega) (data not shown).

Figure 2.2 SEAR results for environmental metagenomes. The column chart in A shows the breakdown of the number of ARGs in each effluent, grouped by antimicrobial class. The column chart in B shows the relative abundance of ARGs found in each metagenome (coloured according to the key). The MLS class of antimicrobial represents macrolides, lincosamides and streptogramins.

2.4.2. Efficacy of SEAR for detecting ARGs in human faecal microbiomes

To assess the efficacy of SEAR for detecting ARGs in microbiome data, SEAR was tested on 32 faecal microbiome samples (Appendix 4). ARGs were detected in 31 of the samples and a total of 295 genes conferring resistance to 6 classes of antimicrobials were identified across the samples (Table 2.3). Genes conferring resistance to tetracyclines were again the most common ARGs identified (39% of total ARGs detected).

Table 2.3 SEAR detection of ARGs across antimicrobial classes in human faecal microbiomes. The table shows the number of genes identified in each antimicrobial class for the combined dataset of HMP samples.

| Antimicrobial class | Number of ARGs |
|---|---|
| Aminoglycosides | 54 |
| Beta lactams | 38 |
| Quinolones | 0 |
| Glycopeptides | 0 |
| MLS | 82 |
| Phenicols | 1 |
| Rifampicin | 0 |
| Sulfonamides | 5 |
| Tetracyclines | 115 |
| Trimethoprims | 0 |

2.4.3. Accuracy of SEAR ARG detection using clinical isolate sequencing data

To evaluate SEAR’s efficacy in detecting ARGs in clinical isolate sequencing data, SEAR was run on sequencing data from 126 isolates of the enteric pathogen Shigella sonnei.
To evaluate SEAR’s performance, the results were compared to the ARG detection data presented in the original publication (Holt et al., 2012). Of the 231 detection events (see methods for criteria) originally presented in the publication, SEAR identified 221 of these, and a further 20 ARGs (Table 2.4, full results shown in Appendix 5).

Table 2.4 Accuracy of SEAR ARG detection using clinical isolate sequencing data. The contingency table compares the detection and non-detection of ARGs by SEAR relative to the published ARG detection data for 126 S. sonnei isolates.

| SEAR results | Reported in Holt et al. (Holt et al., 2012): detected | Reported in Holt et al. (Holt et al., 2012): not-detected | Total |
|---|---|---|---|
| detected | 221 | 20 | 241 |
| not-detected | 10 | 0 | 10 |
| Total | 231 | 20 | |

2.5. Discussion

SEAR is an ARG annotation tool that is freely available and may be downloaded as a cloud compatible web interface or a stand-alone command line program. It offers advantages over currently available ARG annotation tools as it provides ARG annotations, relative abundance values, gene sequence and gene information from raw sequencing data without requiring any sequence assembly. In contrast to tools based on BLAST comparison of de novo assemblies, the clustering and mapping approach used by SEAR, combined with the customisable database and annotation parameters, allows the user to detect putative ARGs in incomplete or low coverage sequencing data that is common in metagenomic analyses. SEAR successfully identified ARGs in sequencing datasets that were generated from novel environmental metagenomic samples, human microbiomes and clinical isolates of Shigella sonnei. SEAR was able to detect the ARGs present in two novel environmental metagenomes allowing direct comparison between two different wastewater effluent samples. SEAR identified meaningful differences among ARGs of clinical interest, for example the presence of quinolone resistance genes (qnrB and qnrS) exclusively in the wastewater effluent from the farm source.
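The agreement summarised in Table 2.4 can be expressed as two simple proportions. The following is a minimal Python sketch and is not part of SEAR; the function name and the choice of summary statistics are illustrative assumptions. The 20 additional SEAR detections are not necessarily false positives, so the second proportion should be read only as the level of agreement with the published calls.

```python
def agreement_with_published(both, sear_only, published_only):
    """Summarise a 2x2 detection table against a published call set.

    both           -- detection events reported by SEAR and by the publication
    sear_only      -- detection events reported by SEAR only
    published_only -- detection events reported by the publication only
    """
    published_total = both + published_only
    sear_total = both + sear_only
    return {
        # fraction of published detection events that SEAR also reported
        "recovered": both / published_total,
        # fraction of SEAR detections that also appear in the publication
        "shared": both / sear_total,
    }


# Counts taken from Table 2.4 (126 S. sonnei isolates).
summary = agreement_with_published(both=221, sear_only=20, published_only=10)
print({name: round(value, 3) for name, value in summary.items()})
```

With the counts from Table 2.4, roughly 96% of the published detection events were also reported by SEAR.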
SEAR also showed that the two sources had different qualitative ARG characteristics (with either aminoglycosides or beta lactams being the most diverse antimicrobial resistance class) and that in both sources tetracycline resistance genes were present in the greatest abundance. In addition to detecting important differences among these sample types, the confirmation of a subset of identified ARGs by PCR demonstrated the robustness of the pipeline. Similarly, SEAR was effective for identifying ARGs from clinical samples. ARGs were detected in human microbiomes, demonstrating the potential of using metagenomic analyses for the surveillance and management of antimicrobial resistance. Additionally, SEAR successfully identified ARGs in a global dataset of 126 clinical isolates of an important enteric pathogen. There were a few discrepancies, each associated with a particular isolate or gene family; however, the results were overwhelmingly consistent. Furthermore, the congruence of ARG detection results from SEAR with the published ARG content of the isolates further highlighted the effectiveness of the pipeline, providing a further compelling argument for the application of high-throughput AMD in clinical microbiology. SEAR offers increased functionality over existing bioinformatic tools by providing a consensus sequence of annotated ARGs, links to online resources containing information on the ARGs (and gene homologs) and a relative abundance estimate for each ARG detected. Each ARG consensus sequence is generated using reads that clustered to a reference sequence and consequently any variability in the consensus sequence in a metagenomic sample may be due to either sequencing noise or the presence of multiple bona fide sequence variants. The relative abundance estimate is relative within an individual sample; however, the SEAR output features the information required to calculate relative abundance across multiple samples.
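To make the within-sample nature of the estimate concrete, the calculation described in 2.3.2.1 can be sketched in Python. This is a minimal illustration and not SEAR's own code: it assumes the abundance value is read as mapped reads divided by total reads, divided by reference gene length (an RPKM-like interpretation), and the gene names and read counts are hypothetical.

```python
def abundance(mapped_reads, total_reads, gene_length):
    """Assumed abundance value A = (X / Y) / L for one annotated gene."""
    return (mapped_reads / total_reads) / gene_length


def relative_abundance(mapped, lengths, total_reads):
    """Return RA (%) per gene: each A divided by the sum of A over all genes."""
    a_values = {
        gene: abundance(mapped[gene], total_reads, lengths[gene])
        for gene in mapped
    }
    total_a = sum(a_values.values())
    return {gene: 100.0 * a / total_a for gene, a in a_values.items()}


# Hypothetical counts for illustration only (not data from this study).
mapped = {"geneA": 300, "geneB": 300, "geneC": 150}     # X per gene
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}  # L per gene, in bases
ra = relative_abundance(mapped, lengths, total_reads=1_000_000)
for gene, value in ra.items():
    print(f"{gene}: {value:.1f}%")
```

Under this reading, the total read count Y cancels when RA is formed within one sample, so RA values always sum to 100% for that sample. Comparing samples of different sequencing depth would instead rely on the per-gene abundance values A.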
Due to possible large variations in user file size and upload speed, the SEAR interface and command line tool are available for use as downloadable packages. SEAR is designed for detecting ARGs that are horizontally acquired, not antimicrobial resistance that is caused (or inactivated) by single nucleotide polymorphisms (SNPs), e.g. SNPs in the gyrA gyrase gene that result in quinolone resistance. SNPs are not currently tested for because the annotation parameters are calibrated for detecting partial ARG matches to compensate for low sequencing coverage. Hence, such SNPs may be missed by SEAR due to the number of mismatches permitted or by a low coverage cut-off (though these are both customisable settings). For these reasons, it is not recommended to include SNP-based resistances in reference databases used with SEAR as they may lead to false positives. The detection of SNP-based resistances in metagenomic samples represents a significant future challenge that needs to be addressed. It should also be stressed that the default SEAR parameters, which are based on high-stringency read clustering and mapping, result in an analysis that finds ARGs that are known in the reference data; it is not suited to the discovery of emergent ARGs. The high-stringency settings are designed to exclude the possibility of non-competitive read mapping causing false positive results by ensuring that annotated ARGs have a high sequence identity compared to the reference database.

2.6. Conclusion

This chapter has presented a bioinformatic pipeline that is highly effective at detecting ARGs directly from raw sequencing reads; in addition, it provides relative abundance estimation and sequences of identified genes. The work in this chapter has illustrated the application of SEAR on sequence data from metagenomic datasets and bacterial isolates.
The work in this chapter has also demonstrated the application of SEAR in potential clinical and environmental monitoring applications, highlighting the advantages of automated interpretation of sequencing data for generating timely and informative reports to inform public health and, potentially, clinical decision-making. With the increasing drive to integrate AMD technology and existing laboratory assays in order to combat antimicrobial resistance, this pipeline is presented as a valuable step towards this important goal.

Chapter 3. Comparative metagenomics reveals a diverse range of antimicrobial resistance genes in effluents entering a river catchment

3.1. Preface

The work in this chapter has been designed as a pilot study to test the suitability of metagenomics for monitoring antimicrobial resistance genes (ARGs) and to address the underlying assumption of this dissertation: that effluents entering the aquatic environment are disseminating ARGs. Chapter 2 described the design and implementation of SEAR, a bioinformatic pipeline to detect horizontally acquired ARGs in sequencing data. This study uses SEAR to interrogate novel metagenomes generated from effluents entering a single river catchment, determining the relative abundance of ARGs in relation to the environment they are entering. This study has been accepted for publication in Water Science and Technology.

3.2. Introduction

Antimicrobial resistance remains a significant and growing concern for both human and veterinary clinical practice (Levy and Marshall, 2004, Davies and Davies, 2010a), with infections that were once readily treated now being resilient to antimicrobial therapy (World Health Organisation, 2012). The use of antimicrobial compounds exerts selection pressures on bacteria, leading to the fixation of gene mutations, selection of resistant precursors and the up-regulation and lateral transfer of antimicrobial resistance genes (ARGs) within prokaryotic communities (Gillings, 2013).
The maintenance and transfer of ARGs is responsible in part for the rising threat of antimicrobial resistance (Laxminarayan et al., 2013). The collective pool of ARGs in a given environment is termed the resistome (D'Costa et al., 2006, Wright, 2007). Although a proportion of these ARGs are genes that have evolved to utilise antimicrobial compounds for functions other than defence, such as signalling molecules or constituents of metabolic pathways (Linares et al., 2006, Dantas et al., 2008), the resistome may also serve as a reservoir for ARGs that can be transferred to clinically significant pathogens (Forsberg et al., 2012, Wellington et al., 2013). Indeed, ARGs are commonly associated with mobile genetic elements (MGEs) that facilitate the transfer of ARGs between bacteria and enable their entry into the accessory genome of pathogenic bacteria (Gaze et al., 2013). There is growing evidence showing that aquatic environments harbour ARGs, MGEs and pathogenic bacteria (Chen et al., 2013, Lu et al., 2015, Devarajan et al., 2015). It is also likely that these environments may host many uncharacterised and novel ARGs that may be selected for under sufficient selection pressures (Bengtsson-Palme et al., 2014). Effluents that feed into the aquatic environment have also been shown to contain ARGs, such as the effluents of urban residential areas and hospitals (Li et al., 2015a), as well as other wastewater and faecal sources (Li et al., 2012, Pruden et al., 2006, Zhang et al., 2009), but the abundance and diversity of these genes relative to background samples needs to be clarified. It is therefore crucial to establish whether effluents entering the aquatic environment are carrying ARGs, along with MGEs and pathogenic bacteria, thus contributing to the reservoirs of resistance genes that may be utilised by pathogenic bacteria and subsequently re-enter human and animal populations (Berendonk et al., 2015).
Previous studies into the presence of ARGs within the aquatic environment have utilised techniques such as bacterial culture and polymerase chain reaction (PCR) (Tao et al., 2010, Zhang and Zhang, 2011, Lu et al., 2015). These techniques offer the ability to detect phenotypic resistance (culture), or a panel of ARGs (PCR), but they are limited by culturing bias or inadequate detection panels. Next generation sequencing techniques, such as metagenomics, offer the ability to circumvent these limitations and identify all known ARGs within a sample (if suitable reference sequences are available), providing a new approach for the environmental monitoring of antibiotic resistance (Port et al., 2014). In this chapter, two distinct effluents that enter a single river catchment were identified. Both effluents originated from faecal sources and were sampled several times, immediately prior to them entering the environment. Using a comparative metagenomic approach, the work in this chapter described the ARG content of these effluents, characterised the MGEs and pathogenic bacteria present, and related the abundance of these features to a background sample of the river source water, taken from upstream of the effluent entry points.

3.3. Methods

3.3.1. Sample collection and DNA sequencing

Water samples were collected from three sources within the River Cam Catchment, Cambridge, UK by grab sampling. A pilot collection was made on 21st June 2012 and is described in 2.3.3.2. Further collections from the municipal wastewater treatment plant (WWTP) and the University of Cambridge dairy farm (both detailed in 2.3.3.2) were made on 2nd May 2013 and 4th August 2014. The river source water of the River Cam was collected at Ashwell Spring (latitude: 52.0421, longitude: 0.1497) once, on 2nd May 2013. Samples were collected in 10 L sterile polypropylene containers, transported at 4°C to the laboratory and processed using the same methodology described in 2.3.3.2.
For each sample, 2 µg of DNA was used to generate Illumina paired-end libraries that were sequenced using an Illumina HiSeq2500 (Exeter Sequencing Service, UK). A map of the River Cam catchment is included as Appendix 2. A full description of the metagenomic samples used in this chapter is available in Appendix 3.

3.3.2. Bioinformatic analyses

3.3.2.1. Identification of ARGs

ARGs were identified using the Search Engine for Antimicrobial Resistance (SEAR) with default parameters (Rowe et al., 2015). A full description of SEAR is available in Chapter 2. In brief, SEAR quality checked and filtered metagenomic reads, clustered the filtered reads to the ARG-annot (Gupta et al., 2014) database of horizontally acquired ARGs, and used the resulting clusters to map the reads and generate a consensus sequence for each ARG in the query metagenome. Consensus sequences were then aligned to online databases (NCBI GenBank, RAC, ARDB), annotated and given an abundance value based on the Reads Per Kilobase per Million (RPKM) value from the read-mapping stage.

3.3.2.2. Identification of mobile genetic elements

Identification of MGEs was performed by mapping metagenomic reads to a custom MGE database using BWA-mem (default options) (Li and Durbin, 2009). The MGE database was built from the NCBI Refseq plasmid genomes dataset, combined with the representative sequences generated from clustering the Integrall dataset (Moura et al., 2009) at 97% identity using USEARCH (Edgar, 2010). MGE mapping results with less than 90% coverage of the reference sequence were discarded from the analysis. Successfully mapped sequences were then binned into class I and class II integrons, transposons and mobilisable plasmids.

3.3.2.3. Abundance analysis

The ARG and MGE abundance data was normalised to the number of 16S rRNA sequences as in Bengtsson-Palme et al. (Bengtsson-Palme et al., 2014).
In brief, bacterial 16S rRNA sequences were extracted from each metagenome using Metaxa 2.0 (Bengtsson-Palme et al., 2015) with default settings, then grafted to sequences from the SILVA RNA database using Megraft (Bengtsson et al., 2012) and subsequently clustered using USEARCH (Edgar, 2010). ARG abundance values were normalised to 16S sequences by dividing the number of extracted 16S sequences by the length of the 16S gene (Bengtsson-Palme et al., 2014).

3.3.2.4. Taxonomic profiling and pathogen detection

Taxonomic profiling of metagenomes was carried out by mapping sequencing reads to clade-specific marker genes using the Metaphlan package (Segata et al., 2012) (default parameters). Metaphlan output was then cross-referenced to the PATRIC database of pathogenic bacteria (Gillespie et al., 2011) to annotate potential human-specific bacterial pathogens. Biomarker discovery and identification of differentially abundant features between metagenomes from 2012, 2013 and 2014 was performed using LEfSe (Segata et al., 2011). Taxonomic profiling and pathogen data were then combined and presented using the Graphlan package (Segata, 2014).

3.4. Results

3.4.1. Metagenome analysis

In this work, 29.52 Giga base-pairs (Gbp) of data was generated across all samples, with the number of reads produced from the total farm effluent samples being approximately double that produced from the total WWTP effluent samples (Table 3.1).

Table 3.1 Summary of the metagenomes used in the work of this chapter.

| Sample | Total reads | Gbp | Total ARG reads | % ARGs |
|---|---|---|---|---|
| Farm effluent 2012 | 88674294 | 4.4337 | 7715 | 0.0087 |
| Farm effluent 2013 | 66120642 | 3.3060 | 2317 | 0.0035 |
| Farm effluent 2014 | 184149408 | 9.2075 | 13094 | 0.0071 |
| WWTP effluent 2012 | 57392478 | 2.8696 | 4205 | 0.0073 |
| WWTP effluent 2013 | 65960602 | 3.2980 | 250 | 0.0004 |
| WWTP effluent 2014 | 73273516 | 3.6637 | 3767 | 0.0051 |
| River source 2013 | 54799282 | 2.7400 | 181 | 0.0003 |

3.4.2.
Identification of antimicrobial resistance genes In the effluent from the dairy farm an average of 7709 reads (0.007%) matching ARGs were found across the three samples. An average of 2740 reads (0.004%) matching ARGs were found across the three WWTP effluent samples. Only 181 reads (0.0003%) were found to match ARGs from the river source water. A significant diversity of ARGs was observed across the samples, with 53 different ARGs found in total, conferring resistance to seven antimicrobial classes (Figure 3.1, Table 3.2). There were 18 ARGs common between the farm and the WWTP effluent samples. The river source water contained the lowest diversity of ARGs (five ARGs, conferring resistance to two antimicrobial classes). When normalised to the number of 16S sequences in each sample, the most abundant ARG across all the samples was found to be sul2 (sulfonamide resistance) in the WWTP effluent 2014 (0.097 copies per 16S sequence) and the least abundant ARG was catB4 (phenicol resistance), found in the farm effluent 2014 (0.0001 copies per 16S sequence). When looking at the effluents individually, tetracycline resistance genes tetC (farm effluent 2012) and tetW (farm effluent 2013 and 2014) were the most abundant genes within the farm effluent samples. In comparison, the aminoglycoside resistance genes strA/strB (WWTP effluent 2012) and the sulfonamide resistance genes sul1/sul2 (WWTP effluent 2013 and 2014) were the most 33 abundant ARGs within the WWTP effluent samples. On average, the abundance of ARGs in the farm effluents was three times that of the river source water. Similarly, the average abundance of ARGs in the WWTP effluents was double that found in the river source water. In terms of the diversity of ARGs relative to the river source water, the farm effluent had an average of five different ARGs for each ARG found in the river source water, whereas the WWTP effluent had two different ARGs for each ARG present in the source water. 
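The read-level figures quoted at the start of this section can be checked against the counts in Table 3.1. The following is a minimal Python sketch of that arithmetic. It assumes that the quoted percentages are pooled across the samples from each source (total ARG reads divided by total reads) rather than averaged per sample; that is an assumption about how the figures were derived and is not stated in the text. The function names are illustrative only.

```python
# Read counts copied from Table 3.1: (total reads, reads matching ARGs).
TABLE_3_1 = {
    "Farm effluent 2012": (88674294, 7715),
    "Farm effluent 2013": (66120642, 2317),
    "Farm effluent 2014": (184149408, 13094),
    "WWTP effluent 2012": (57392478, 4205),
    "WWTP effluent 2013": (65960602, 250),
    "WWTP effluent 2014": (73273516, 3767),
    "River source 2013": (54799282, 181),
}


def rows_for(source):
    """Return the (total reads, ARG reads) pairs for one effluent source."""
    return [counts for name, counts in TABLE_3_1.items() if name.startswith(source)]


def mean_arg_reads(source):
    """Mean number of ARG-matching reads per sample for one source."""
    rows = rows_for(source)
    return sum(arg for _, arg in rows) / len(rows)


def pooled_arg_percentage(source):
    """Percentage of all reads from one source that match ARGs (pooled; an assumption)."""
    rows = rows_for(source)
    total_reads = sum(total for total, _ in rows)
    arg_reads = sum(arg for _, arg in rows)
    return 100.0 * arg_reads / total_reads


if __name__ == "__main__":
    for source in ("Farm effluent", "WWTP effluent", "River source"):
        print(
            f"{source}: mean ARG reads = {mean_arg_reads(source):.1f}, "
            f"pooled % ARGs = {pooled_arg_percentage(source):.4f}"
        )
```

Under this pooling assumption the results agree with the quoted averages to within rounding.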
When comparing samples over the three years, the abundance of ARGs was found to decrease year on year in the WWTP effluent for all but sulfonamide resistance genes, which were found to increase over time (11% average change in abundance of sulfonamide resistance genes over three years). The largest change over time for the farm effluent was the 10% increase in the abundance of aminoglycoside resistance genes observed between 2012 and 2013.

Figure 3.1 Abundance of ARGs found in each effluent sample, binned by antimicrobial class. Abundance of antimicrobial resistance genes is normalised to the number of 16S sequences per sample. The MLS class of antimicrobial represents macrolides, lincosamides and streptogramins.

Table 3.2 Antimicrobial resistance gene analysis. The table lists the antimicrobial resistance genes found in the effluents entering the river catchments.

Sample              Gene name   Antimicrobial class  Relative ARG abundance for sample
Farm effluent 2012  tetC        tetracycline         73.4753
Farm effluent 2012  sulI        sulfonamide          6.0742
Farm effluent 2012  tetA        tetracycline         3.7590
Farm effluent 2012  tetR        tetracycline         3.1282
Farm effluent 2012  qnrS2       quinolone            2.4774
Farm effluent 2012  blaOXA-320  beta lactam          1.9940
Farm effluent 2012  aph3-Ia     aminoglycoside       1.6270
Farm effluent 2012  ACT-12      beta lactam          1.4771
Farm effluent 2012  blaOXY6-4   beta lactam          1.3378
Farm effluent 2012  strA        aminoglycoside       1.0920
Farm effluent 2012  strB        aminoglycoside       1.0209
Farm effluent 2012  qnrB7       quinolone            0.8548
Farm effluent 2012  blaOXY5-2   beta lactam          0.6711
Farm effluent 2012  blaCMY-95   beta lactam          0.5512
Farm effluent 2012  ampC1       beta lactam          0.4600
Farm effluent 2013  tetW        tetracycline         17.1873
Farm effluent 2013  blaCFX-A5   beta lactam          12.6614
Farm effluent 2013  Str         aminoglycoside       9.5407
Farm effluent 2013  lnuB        MLS                  7.2596
Farm effluent 2013  ermB        MLS                  6.8475
Farm effluent 2013  ant6-Ia     aminoglycoside       6.4809
Farm effluent 2013  tetM        tetracycline         5.8033
Farm effluent 2013  blaGES-22   beta lactam          3.9702
Farm effluent 2013  strB        aminoglycoside       3.4575
Farm effluent 2013  Spc         aminoglycoside       3.3259
Farm effluent 2013  sulI        sulfonamide          3.0942
Farm effluent 2013  ermG        MLS                  2.8150
Farm effluent 2013  strA        aminoglycoside       2.6841
Farm effluent 2013  tetO        tetracycline         2.6719
Farm effluent 2013  sat4A       aminoglycoside       2.5109
Farm effluent 2013  aph3-III    aminoglycoside       2.1123
Farm effluent 2013  ermF        MLS                  2.1098
Farm effluent 2013  lnuA        MLS                  2.0399
Farm effluent 2013  tet-36      tetracycline         1.9703
Farm effluent 2013  blaOXA-210  beta lactam          1.4571
Farm effluent 2014  tetW        tetracycline         24.3143
Farm effluent 2014  lnuB        MLS                  6.8709
Farm effluent 2014  tetM        tetracycline         4.8201
Farm effluent 2014  ermF        MLS                  4.5820
Farm effluent 2014  tet-44      tetracycline         4.2108
Farm effluent 2014  ant6-Ia     aminoglycoside       4.0593
Farm effluent 2014  ant6-Ib     aminoglycoside       3.8291
Farm effluent 2014  tet-39      tetracycline         3.5128
Farm effluent 2014  aph3-III    aminoglycoside       3.3706
Farm effluent 2014  strB        aminoglycoside       3.3693
Farm effluent 2014  tetO        tetracycline         3.2634
Farm effluent 2014  ermB        MLS                  3.1951
Farm effluent 2014  strA        aminoglycoside       2.8502
Farm effluent 2014  Str         aminoglycoside       2.5190
Farm effluent 2014  ermG        MLS                  2.4419
Farm effluent 2014  tet-36      tetracycline         2.4248
Farm effluent 2014  blaGES-22   beta lactam          2.1196
Farm effluent 2014  sulI        sulfonamide          2.0911
Farm effluent 2014  sat4A       aminoglycoside       2.0250
Farm effluent 2014  erm35       MLS                  1.5781
Farm effluent 2014  tet-32      tetracycline         1.4578
Farm effluent 2014  sulII       sulfonamide          1.4248
Farm effluent 2014  blaCFX-A5   beta lactam          1.3575
Farm effluent 2014  Spc         aminoglycoside       1.3163
Farm effluent 2014  blaOXA-320  beta lactam          1.2794
Farm effluent 2014  tetT        tetracycline         0.9209
Farm effluent 2014  tet-40      tetracycline         0.6100
Farm effluent 2014  aadA8b      aminoglycoside       0.5367
Farm effluent 2014  blaTEM-201  beta lactam          0.4977
Farm effluent 2014  lnuF        MLS                  0.4964
Farm effluent 2014  ermA        MLS                  0.4632
Farm effluent 2014  blaOXA-210  beta lactam          0.4465
Farm effluent 2014  aph3-Ia     aminoglycoside       0.3546
Farm effluent 2014  catA1       phenicol             0.3505
Farm effluent 2014  Cmr         phenicol             0.3430
Farm effluent 2014  msrE        MLS                  0.3213
Farm effluent 2014  tetZ        tetracycline         0.2206
Farm effluent 2014  catB4       phenicol             0.1553
WWTP effluent 2012  strB        aminoglycoside       18.3772
WWTP effluent 2012  strA        aminoglycoside       16.2801
WWTP effluent 2012  tetW        tetracycline         14.9175
WWTP effluent 2012  tet-44      tetracycline         8.9014
WWTP effluent 2012  ant6-Ib     aminoglycoside       7.2859
WWTP effluent 2012  ermB        MLS                  7.0075
WWTP effluent 2012  tetM        tetracycline         5.6956
WWTP effluent 2012  lnuB        MLS                  5.6885
WWTP effluent 2012  tet-39      tetracycline         5.2060
WWTP effluent 2012  Str         aminoglycoside       4.0791
WWTP effluent 2012  tetO        tetracycline         1.8454
WWTP effluent 2012  sulII       sulfonamide          1.4551
WWTP effluent 2012  lnuC        MLS                  1.3073
WWTP effluent 2012  Spc         aminoglycoside       1.2387
WWTP effluent 2012  ermG        MLS                  0.7147
WWTP effluent 2013  sulI        sulfonamide          29.0255
WWTP effluent 2013  ereA        MLS                  24.7971
WWTP effluent 2013  strB        aminoglycoside       18.3611
WWTP effluent 2013  tetC        tetracycline         17.0953
WWTP effluent 2013  strA        aminoglycoside       10.7210
WWTP effluent 2014  sulII       sulfonamide          86.4888
WWTP effluent 2014  blaCARB-10  beta lactam          5.0967
WWTP effluent 2014  blaGES-22   beta lactam          3.1151
WWTP effluent 2014  sulI        sulfonamide          1.4774
WWTP effluent 2014  strB        aminoglycoside       0.9286
WWTP effluent 2014  strA        aminoglycoside       0.9062
WWTP effluent 2014  msrE        MLS                  0.8102
WWTP effluent 2014  tetW        tetracycline         0.6260
WWTP effluent 2014  blaOXA-183  beta lactam          0.5512
River source 2013   aph3-III    aminoglycoside       29.7367
River source 2013   strB        aminoglycoside       25.7637
River source 2013   strA        aminoglycoside       22.3138
River source 2013   catA1       phenicol             12.5585
River source 2013   aph3-Ia     aminoglycoside       9.6274

3.4.3. Identification of mobile genetic elements
In conjunction with determining the abundance and diversity of ARGs, the effluents were also interrogated for MGEs (Figure 3.2, Table 3.3). No MGEs were found to be present in the river source water.
Mobilisable plasmids were the most abundant class of MGE found across the combined metagenomic datasets, although no mobilisable plasmids were identified in the WWTP effluent 2012 or farm effluent 2014 samples. Class I and class II integrons, as well as transposon sequences, were found in all effluent samples. Class I integrons were more abundant in the collective farm effluent samples, whereas class II integrons were more abundant in the collective WWTP effluent samples.

Figure 3.2 Abundance of MGEs found in each effluent sample, binned by MGE type. Plasmids were binned as mobilisable plasmids if they contained conjugation genes (tra, mob etc.) and integrons were binned as class I or II depending on the Integrall annotation. Relative abundance of MGEs is normalised to the number of 16S sequences per sample.

Table 3.3 Mobile genetic element analysis. The table lists the mobile genetic elements found in the effluents entering the river catchments.

Sample              MGE reference ID  MGE description                                               Database
Farm effluent 2012  gi|15983520       Aeromonas salmonicida plasmid pRAS3.2                         REFSEQ
Farm effluent 2012  gi|15983531       salmonicida plasmid pRAS3.1                                   REFSEQ
Farm effluent 2012  gi|198286625      Aeromonas hydrophila plasmid pBRST7.6                         REFSEQ
Farm effluent 2012  gi|20514397       KCL-2 plasmid pMGD2                                           REFSEQ
Farm effluent 2012  gi|209947514      Klebsiella pneumoniae plasmid pIGMS31                         REFSEQ
Farm effluent 2012  gi|209947788      Escherichia coli plasmid pEC278                               REFSEQ
Farm effluent 2012  gi|255929160      Endophytic bacterium LOB-07 plasmid pLK39                     REFSEQ
Farm effluent 2012  gi|305678726      Klebsiella pneumoniae plasmid unnamed                         REFSEQ
Farm effluent 2012  gi|435855445      Enterobacter cloacae strain BB1092 plasmid pB1023             REFSEQ
Farm effluent 2012  gi|435855463      Klebsiella pneumoniae strain BB1088 plasmid pB1019            REFSEQ
Farm effluent 2012  gi|482907348      U288 plasmid pSTU288-3                                        REFSEQ
Farm effluent 2012  gi|507579660      Cronobacter sakazakii strain ATCC 29544 plasmid pCSA2         REFSEQ
Farm effluent 2012  gi|690630974      Escherichia coli strain K317 plasmid ColE7-K317               REFSEQ
Farm effluent 2012  gi|746219889      Enterobacter cloacae strain 34998 plasmid p34998-4.921kb      REFSEQ
Farm effluent 2012  gi|749202706      Enterobacter cloacae strain 34983 plasmid p34983-328.905kb    REFSEQ
Farm effluent 2012  gi|749293681      Klebsiella oxytoca strain M1 plasmid pKOXM1D                  REFSEQ
Farm effluent 2012  gi|749296055      pneumoniae Kp13 plasmid pKP13b                                REFSEQ
Farm effluent 2012  gi|765030385      Enterobacter cloacae strain 34978 plasmid p34978-4.938kb      REFSEQ
Farm effluent 2012  gi|817657570      Cronobacter sakazakii strain ATCC 29544 plasmid CSK29544_2p   REFSEQ
Farm effluent 2012  gi|57635337       plasmid pSEM integron                                         INTEGRALL
Farm effluent 2012  gi|20530945       integron-derived beta-lactamase (ampC)                        INTEGRALL
Farm effluent 2012  gi|94442253       class 1 integron IntI1                                        INTEGRALL
Farm effluent 2013  gi|763126141      Lactobacillus salivarius strain JCM 1046 plasmid pCTN1046     REFSEQ
Farm effluent 2013  gi|587656492      class 1 integron IntI1                                        INTEGRALL
Farm effluent 2014  gi|13345249       class 2 integron IntI2                                        INTEGRALL
Farm effluent 2014  gi|788265642      class 2 integron IntI2                                        INTEGRALL
Farm effluent 2014  gi|215397925      ICE integron putative                                         INTEGRALL
WWTP effluent 2012  gi|13345249       class 2 integron IntI2                                        INTEGRALL
WWTP effluent 2012  gi|563324892      class 2 integron IntI2                                        INTEGRALL
WWTP effluent 2012  gi|710572         mercury resistance transposon                                 INTEGRALL
WWTP effluent 2012  gi|215397925      ICE integron putative                                         INTEGRALL
WWTP effluent 2013  gi|194442162      Bacteroides fragilis plasmid pBFP35                           REFSEQ
WWTP effluent 2013  gi|661525289      Bacteroides cellulosilyticus WH2 plasmid pBWH2A               REFSEQ
WWTP effluent 2013  gi|659224469      class 1 integron IntI1                                        INTEGRALL
WWTP effluent 2014  gi|194442162      Bacteroides fragilis plasmid pBFP35                           REFSEQ
WWTP effluent 2014  gi|294057975      Sphingobium japonicum UT26S plasmid pUT2 DNA                  REFSEQ
WWTP effluent 2014  gi|661525289      Bacteroides cellulosilyticus WH2 plasmid pBWH2A               REFSEQ
WWTP effluent 2014  gi|766626985      Aeromonas hydrophila strain AL06-06 plasmid pAH06-06-2        REFSEQ
WWTP effluent 2014  gi|242876676      plasmid class 1 integron                                      INTEGRALL
WWTP effluent 2014  gi|215397925      ICE integron putative                                         INTEGRALL

3.4.4. Taxonomic profiling and pathogen detection
Finally, the effluent metagenomes were subjected to taxonomic profiling. At genus level, the most abundant prokaryotes in the farm samples were Pseudomonas (farm effluent 2012) and Butyrivibrio (farm effluent 2013 and 2014). The most abundant prokaryotes at genus level in the WWTP samples were Acinetobacter (WWTP effluent 2012), Thiomonas (WWTP effluent 2013) and Proteus (WWTP effluent 2014). For the river source water, the most abundant prokaryotic genus was Sphingobium. After cross-referencing the identified species-level, clade-specific marker genes for all the metagenomes to the PATRIC pathogen database (Gillespie et al., 2011), a total of 35 species of potential bacterial pathogens were identified (Figure 3.3, Table 3.4). The most commonly identified species were E. coli, A. butzleri, E. rectale, R. bromii and S. enterica. The WWTP effluent 2014 contained the greatest diversity of potential bacterial pathogens, whereas the river source water and the WWTP effluent 2012 were found to contain the lowest diversity.

Figure 3.3 Metagenomic phylogenetic analysis and annotation of potential bacterial pathogens. The phylogenetic tree was built using Graphlan from the merged Metaphlan and LEfSe output for the effluent metagenomes. The PATRIC pathogens are highlighted as red stars and the external rings denote species prevalence in each metagenome.

Table 3.4 Pathogen analysis of effluents. The table lists the potential bacterial pathogens (according to Metaphlan and PATRIC analysis) found in the effluents entering the river catchments.
Sample              Bacterial pathogen
Farm effluent 2012  Bifidobacterium adolescentis
Farm effluent 2012  Enterobacter cloacae
Farm effluent 2012  Escherichia coli
Farm effluent 2012  Klebsiella oxytoca
Farm effluent 2012  Klebsiella pneumoniae
Farm effluent 2012  Pseudomonas aeruginosa
Farm effluent 2012  Pseudomonas putida
Farm effluent 2012  Ruminococcus bromii
Farm effluent 2012  Salmonella enterica
Farm effluent 2013  Arcobacter butzleri
Farm effluent 2013  Bifidobacterium adolescentis
Farm effluent 2013  Bifidobacterium longum
Farm effluent 2013  Enterococcus faecium
Farm effluent 2013  Escherichia coli
Farm effluent 2013  Eubacterium rectale
Farm effluent 2013  Faecalibacterium prausnitzii
Farm effluent 2013  Lactobacillus fermentum
Farm effluent 2013  Lactobacillus gasseri
Farm effluent 2013  Lactobacillus plantarum
Farm effluent 2013  Lactobacillus rhamnosus
Farm effluent 2013  Ruminococcus bromii
Farm effluent 2013  Salmonella enterica
Farm effluent 2014  Arcobacter butzleri
Farm effluent 2014  Bacteroides fragilis
Farm effluent 2014  Bifidobacterium adolescentis
Farm effluent 2014  Bifidobacterium longum
Farm effluent 2014  Bifidobacterium pseudolongum
Farm effluent 2014  Butyrivibrio fibrisolvens
Farm effluent 2014  Campylobacter jejuni
Farm effluent 2014  Corynebacterium aurimucosum
Farm effluent 2014  Escherichia coli
Farm effluent 2014  Eubacterium rectale
Farm effluent 2014  Faecalibacterium prausnitzii
Farm effluent 2014  Propionibacterium acnes
Farm effluent 2014  Ruminococcus bromii
Farm effluent 2014  Ruminococcus obeum
Farm effluent 2014  Ruminococcus torques
Farm effluent 2014  Salmonella enterica
River source 2013   Enterococcus faecium
River source 2013   Eubacterium rectale
River source 2013   Lactobacillus gasseri
River source 2013   Lactobacillus rhamnosus
River source 2013   Propionibacterium acnes
River source 2013   Salmonella enterica
WWTP effluent 2012  Arcobacter butzleri
WWTP effluent 2012  Bifidobacterium adolescentis
WWTP effluent 2012  Escherichia coli
WWTP effluent 2012  Propionibacterium acnes
WWTP effluent 2012  Ruminococcus torques
WWTP effluent 2012  Salmonella enterica
WWTP effluent 2013  Arcobacter butzleri
WWTP effluent 2013  Bifidobacterium adolescentis
WWTP effluent 2013  Bifidobacterium longum
WWTP effluent 2013  Escherichia coli
WWTP effluent 2013  Eubacterium rectale
WWTP effluent 2013  Faecalibacterium prausnitzii
WWTP effluent 2013  Gordonibacter pamelaeae
WWTP effluent 2013  Ruminococcus bromii
WWTP effluent 2013  Ruminococcus obeum
WWTP effluent 2013  Ruminococcus torques
WWTP effluent 2013  Streptococcus thermophilus
WWTP effluent 2014  Arcobacter butzleri
WWTP effluent 2014  Bacteroides fragilis
WWTP effluent 2014  Bacteroides xylanisolvens
WWTP effluent 2014  Bifidobacterium adolescentis
WWTP effluent 2014  Bifidobacterium longum
WWTP effluent 2014  Eggerthella lenta
WWTP effluent 2014  Enterobacter cloacae
WWTP effluent 2014  Enterococcus faecalis
WWTP effluent 2014  Escherichia coli
WWTP effluent 2014  Eubacterium cylindroides
WWTP effluent 2014  Eubacterium rectale
WWTP effluent 2014  Eubacterium siraeum
WWTP effluent 2014  Faecalibacterium prausnitzii
WWTP effluent 2014  Gordonibacter pamelaeae
WWTP effluent 2014  Klebsiella pneumoniae
WWTP effluent 2014  Proteus mirabilis
WWTP effluent 2014  Pseudomonas aeruginosa
WWTP effluent 2014  Ruminococcus bromii
WWTP effluent 2014  Ruminococcus obeum
WWTP effluent 2014  Ruminococcus torques
WWTP effluent 2014  Streptococcus thermophilus
WWTP effluent 2014  Vibrio parahaemolyticus

3.5. Discussion
The comparative metagenomic approach used in this chapter has shown that two types of effluent entering a shared river catchment contain ARGs and MGEs at higher average abundances than were present in a background sample of the river source water. This would suggest that effluents such as these are likely to serve as sources of ARGs and thus contribute to the resistome of river catchments and other aquatic environments.
Because of this, it may be appropriate to routinely monitor such effluents as sources of ARGs, particularly when considering the current view of ARGs as environmental contaminants (Pruden et al., 2006) and the call for an environmental framework to tackle antimicrobial resistance (Berendonk et al., 2015).

One reason for the high abundance of ARGs in effluents may be the presence of antimicrobial compounds that could provide a selective pressure for the maintenance of ARGs. Several studies have documented the presence of antimicrobial compounds, from both human and veterinary medicine, in the environment (Kemper, 2008, Hu et al., 2010). Although these compounds are often present at relatively low concentrations, some studies have shown therapeutic concentrations of antimicrobials being discharged into the environment, such as the effluent from Indian drug manufacturers containing toxic concentrations of antimicrobial compounds (Larsson et al., 2007). Subsequent studies by Larsson et al. found a high abundance of ARGs downstream of the effluent discharge point relative to upstream of the manufacturers and when compared to a Swedish WWTP (Kristiansson et al., 2011). While the environmental release of antimicrobial compounds at therapeutic concentrations is largely prevented in the UK, Europe and US through proper wastewater management and controls, clinically important antimicrobials can be found in the environment at sub-inhibitory concentrations, and it is possible that these very low antimicrobial concentrations could enrich for resistant bacteria and promote increased persistence of ARGs (Gullberg et al., 2011). Thus, it may be pertinent to couple future environmental ARG monitoring frameworks and antimicrobial resistance risk assessments with information on antimicrobial usage and the antimicrobial concentrations in the effluents being investigated.
Interestingly, the average abundance of ARGs was found to be greater in the farm effluents than in the WWTP effluents (Figure 3.1). Although these two effluents are from differently treated faecal sources, one being a treated effluent (sedimentation treatment) from a municipal WWTP (i.e. predominantly human faecal source) and the other being an untreated effluent from a farm (predominantly bovine faecal source), this finding does offer some insight into the debate surrounding the relative impact of human and animal contributions to the development of antimicrobial resistance (Phillips et al., 2004, Mather et al., 2013). The fact that WWTP effluent had undergone a form of water treatment prior to being released into the river catchment, whereas the farm effluent did not, may suggest that some form of water treatment could reduce the abundance or diversity of ARGs. A comparison of WWTP crude influent to the effluent could elaborate on the effectiveness of sedimentation treatment on the abundance of ARGs. Studies have shown that wastewater treatment processes do not completely remove ARGs (Wang et al., 2015) and that some WWTP processing can result in an increase in the proportion of antimicrobial resistant bacteria in WWTP effluents (Harris et al., 2012). Considering that effluents may also disseminate antimicrobial compounds, it raises the question as to whether the combination of ARGs and antimicrobial compounds within effluents is resulting in the expression of ARGs and the occurrence of phenotypic antimicrobial resistance. This will be addressed in the work of the next chapter with the aim of identifying additional factors that contribute to the dissemination of ARGs.

In terms of the mobility of genes within the effluents, an array of mobilisable plasmids, integrons and transposons were present in the metagenomes (Figure 3.2) and many of the ARGs identified aligned to the Repository of Antibiotic resistance Cassettes (RAC) (Tsafnat et al., 2011).
This raises the possibility that the ARGs within the effluents could be readily mobilised into other bacteria, both directly into pathogens also discharged into the environment and into environmental bacteria. These environmental bacteria in turn could pose a risk as potential bacterial intermediaries, harbouring these ARGs in the environment prior to transferring them into other pathogens. Indeed, the taxonomic analysis revealed a diverse array of bacterial species present across all of the effluent samples (Figure 3.3), including several pathogenic species (Table 3.4). Based on the observations in this study, it is recommended that future ARG monitoring frameworks and antimicrobial resistance risk assessments should incorporate direct MGE and pathogen detection with metagenomic assessments of effluents entering river catchments, especially considering the absence of MGEs and the lower diversity of pathogens found in the river source water.

However, this study did identify five resistance genes in the river source water conferring resistance to two classes of antimicrobials. When normalised to 16S sequences, the river source water was found to account for the most abundant phenicol resistance gene and the third most abundant aminoglycoside resistance gene out of all the metagenome libraries examined. However, when using the raw SEAR abundance metric, which does not include normalisation to the 16S sequences within the sample, the relative abundance of ARGs from the river source water was reduced relative to the other effluent samples. This raises the question as to whether 16S normalisation is the most appropriate approach to metagenomic abundance estimates, as factors such as variation in 16S copy number can skew the data generated as well as its interpretation (Case et al., 2007). An alternative could be to use the RPKM value generated as part of the SEAR analysis and featured in Table 3.2.
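The difference between the two abundance metrics discussed above can be illustrated with a short Python sketch. The RPKM function follows the standard definition (reads per kilobase of gene per million reads). The per-16S function is one common length-corrected form and is an assumption here; it is not necessarily the exact calculation used by SEAR or by Bengtsson-Palme et al. (2014). The nominal 16S gene length of 1500 bp and all read counts in the example are hypothetical values chosen only for illustration.

```python
def rpkm(mapped_reads, gene_length_bp, total_reads):
    """Reads Per Kilobase per Million: mapped reads scaled by gene length and library size."""
    return mapped_reads * 1e9 / (gene_length_bp * total_reads)


def copies_per_16s(arg_reads, arg_length_bp, reads_16s, length_16s_bp=1500):
    """ARG copies per 16S sequence (assumed form): length-corrected ARG reads
    divided by length-corrected 16S reads. The 1500 bp default is a nominal value."""
    arg_coverage = arg_reads / arg_length_bp
    coverage_16s = reads_16s / length_16s_bp
    return arg_coverage / coverage_16s


if __name__ == "__main__":
    # Hypothetical samples: identical ARG reads and library size, but sample B
    # yields four times fewer 16S reads (e.g. lower bacterial content or
    # taxa with fewer 16S copies per genome).
    samples = {
        "sample A": {"arg_reads": 100, "total_reads": 50_000_000, "reads_16s": 2000},
        "sample B": {"arg_reads": 100, "total_reads": 50_000_000, "reads_16s": 500},
    }
    arg_length_bp = 1000  # hypothetical ARG length
    for name, s in samples.items():
        print(
            name,
            "RPKM =", rpkm(s["arg_reads"], arg_length_bp, s["total_reads"]),
            "copies per 16S =", copies_per_16s(s["arg_reads"], arg_length_bp, s["reads_16s"]),
        )
```

In this illustration the two samples have the same RPKM value but differ four-fold in copies per 16S sequence, which is the kind of shift in relative ranking that the choice of normalisation can introduce.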
The metagenomic approach used was less sensitive than more directly targeted measures of known ARG abundance (e.g. qPCR-based detection (LaPara et al., 2011)). However, it had the advantages that it was relatively unbiased and semi-quantitative, giving a good estimation of the relative abundance and diversity of key ARGs and MGEs across bacterial populations. It was also potentially able to detect novel ARGs that would otherwise not be found using these more targeted approaches.

3.6. Conclusion
The work in this chapter delivers a detailed metagenomic analysis of effluents entering a river catchment. Effluents were found to contain an array of ARGs, MGEs and pathogenic bacteria that, when compared to a background sample, were found to be more diverse and abundant than in the river source water. This study has shown that the discharge of effluents into river catchments contributes to the dissemination of ARGs, MGEs and pathogenic bacteria, and that they may play an important role in the propagation of environmental reservoirs of ARGs. This work has demonstrated the suitability of metagenomics as a key component of an environmental framework to monitor ARG dissemination. When used in conjunction with SEAR, metagenomics is able to quickly and efficiently detect horizontally acquired ARGs in samples from the aquatic environment.

Chapter 4. Expression of ARGs in effluents

4.1. Preface
The work described in the previous chapters has begun to develop a framework for monitoring antimicrobial resistance genes (ARGs) in effluents that are entering the aquatic environment. Using this framework, effluents entering a single river catchment were shown to contain ARGs in greater abundance and diversity than the environment they were entering.
This finding raises the question as to whether or not the ARGs in effluents are being actively maintained and used in the environment, thus providing a potential explanation for the higher abundance and diversity of ARGs in effluents compared to background levels. The work of this chapter assesses whether ARGs within effluents are being transcribed across a series of longitudinal samples. In addition to this, the presence of potential antimicrobial selection pressures was explored and the occurrence of pathogenic species of bacteria in effluents was investigated.

4.2. Introduction
The continued scientific interest in the effects of anthropogenic activities on environmental microbial communities has led to questions about the impact of these activities on the development of antimicrobial resistance in the environment and, consequently, to debate about the possible implications for public and animal health (Martinez, 2009, Wellington et al., 2013, Port et al., 2014, Berendonk et al., 2015) (see Chapter 6). Previous chapters have described a diverse array of ARGs in effluents entering an aquatic environment, adding to the wealth of research that has demonstrated the global dissemination of ARGs within a variety of environmental biomes (Pruden et al., 2006, Zhang and Zhang, 2011, Devarajan et al., 2015, Li et al., 2015a). The fact that the diversity (and abundance) of ARGs was found to be greater in effluents than in the receiving environment (see Chapter 3) may be due to a greater concentration of ARG-containing bacteria present in effluents from faecal sources. Additionally, it may be the case that there are selective forces impacting the maintenance of ARGs in these effluents that are either not present or diluted in the wider environment. To explore these possibilities, effluents should be tested for anthropogenic factors that may constitute selective forces for ARG uptake and maintenance.
One such anthropogenic factor likely to act as a selective force, impacting ARG abundance and diversity, is the presence of antimicrobial compounds that may select for specific ARGs in bacterial communities. Several recent studies have documented the presence of pharmaceutical residues in the aquatic environment, usually as a result of the manufacturing process and wastewater treatment plant (WWTP) treatment (Khan et al., 2013, McEneff et al., 2014). However, farming practices, particularly in aquaculture and fishery management, also lead to the release of antimicrobial compounds into the environment (Kemper, 2008, Cabello et al., 2013), as do human sources such as hospital effluents (Martinez, 2009, Kleywegt et al., 2015) and WWTP effluents (Li et al., 2008). Studies have also identified antimicrobial compounds in river and estuary environments that are not in direct receipt of industrial pharmaceutical run-off or domestic waste (Lu et al., 2015, McEneff et al., 2014). As antimicrobial compounds are designed to kill susceptible bacteria or inhibit their growth, and there are reports of sub-inhibitory concentrations of sulfonamide antimicrobials in aquatic environments associated with expression of the sulfonamide ARG sul1 (Bruchmann et al., 2013), it is reasonable to assume that the environmental release of antimicrobials might select for the presence and maintenance of ARGs. Indeed, studies have interrogated environmental samples for both ARGs and antimicrobial compounds; however, these studies have been limited to specific ARGs, have not been performed using multiple effluent sites over multiple time points, or have not taken into account the antimicrobial usage at the effluent source (Luo et al., 2010, Li et al., 2012, Lu et al., 2015).
In this chapter, metagenomics was combined with liquid chromatography mass spectrometry (LCMS) and antimicrobial usage data to determine whether antimicrobial usage at effluent source points resulted in an enrichment of the ARGs (and pathogenic species of bacteria) entering a river catchment. The work of this chapter aimed to establish whether the ARGs present in the environment are being expressed, thereby offering insight into whether ARGs are actively utilised by the members of the bacterial communities within the effluents and exploring their maintenance in the environment. To assess both the ARGs and the ARG transcripts of bacterial communities within effluent samples, a combination of metagenomic and metatranscriptomic approaches was utilised. This study also aimed to relate the expression of ARGs to factors such as the abundance of ARGs, as well as selective pressures such as antimicrobial presence and usage. Indeed, studies that have investigated both antimicrobial residue concentration and ARG abundance in effluents (of a pharmaceutical WWTP) have shown significant correlations between antimicrobial concentrations and the associated relative ARG abundance (Luo et al., 2010, Wang et al., 2015). However, no research has been done that aims to correlate ARG abundance and antimicrobial resistance selection pressures to the abundance of ARG transcripts in effluents and thus show that ARGs are expressed and utilised by active members of bacterial communities in the environment. Finally, through the collection of a series of longitudinal samples, this work documented the continued dissemination and dynamic trends in ARGs and pathogenic species of bacteria entering the environment via effluents.

4.3. Methods

4.3.1. Sample collection, DNA and RNA sequencing
Samples were collected from three sources within the River Cam Catchment, Cambridge, UK on 02.05.2013 and over a five-month period (August 2014 to December 2014), an average of five weeks apart (n=6 sampling times, Appendix 3). Collections were made from the combined wastewater effluent of the main wards of Addenbrooke’s hospital, Cambridge, UK, via a combined sewage pit/drain access (latitude: 52.174343, longitude: 0.139346) prior to the effluent entering the municipal sewers. The other two sources from which collections were made were the effluent lagoon of the University of Cambridge dairy farm (prior to the effluent being distributed to surrounding fields as fertiliser, detailed in 2.3.3.2) and the River Cam source water (collected at Ashwell Spring, detailed in 3.3.1). The river source water served as a background sample for the environment that both effluents were entering.

Samples for antimicrobial residue testing were collected in 1 L sterile glass containers and transported at 4°C to the laboratory. Samples for metagenome and metatranscriptome preparation were collected in 10 L sterile polypropylene containers and transported at 4°C to the laboratory, where prokaryotic cells were isolated as described in 2.3.3.2. Each sample of prokaryotic cells was then split in two, for separate DNA and RNA extractions to generate a metagenome and metatranscriptome per sample. The metagenome preparation followed the same DNA extraction methodology described in 2.3.3.2. For each metagenome, 2 µg of DNA was used to generate Illumina paired-end libraries (100 bp). For each metatranscriptome, the prokaryotic cells were washed in phosphate buffered saline solution before being treated with Max Bacterial Enhancement reagent (ThermoFisher Scientific, UK) to denature bacterial proteins and deactivate RNases. Bacterial cell lysis and RNA extraction were then performed using TRIzol reagent (ThermoFisher Scientific, UK).
For each metatranscriptome, 2 µg of RNA was subjected to ribosomal RNA depletion (Ribo-Zero Gold, Epicentre, UK), quality checked using a BioAnalyzer (Agilent Technologies, US) and used to generate Illumina TruSeq RNA libraries (100 bp). All metagenome and metatranscriptome libraries were sequenced using an Illumina HiSeq2500 (Exeter Sequencing Service, UK). A full description of the metagenomic and metatranscriptomic samples used in this chapter is available in Appendix 3.

4.3.1.1. Antimicrobial residue testing

Antimicrobial residues were quantified in effluent samples using LCMS. Quantification standards were created for the three most used compounds in each class of antimicrobials prescribed at Addenbrooke’s hospital in 2013. No standards could be generated for aminoglycosides or trimethoprims. LCMS antimicrobial residue testing was performed by RPS Mountainheath, Hertfordshire, UK.

4.3.2. Bioinformatic analyses

4.3.2.1. Identification of ARGs, MGEs and abundance analysis

ARGs were identified in metagenomes and metatranscriptomes using the Search Engine for Antimicrobial Resistance (SEAR) with default parameters (Rowe et al., 2015). A full description of SEAR is available in Chapter 2. MGE detection was carried out on all metagenomes using the methodology described in Chapter 3. For each metagenome, ARG and MGE data were normalised to the number of 16S rRNA sequences, as in 3.3.2.3.

4.3.2.2. Antimicrobial usage and statistics

Monthly antimicrobial usage data for periods overlapping sample collection were obtained for Addenbrooke’s hospital. Usage data were generated via the EPIC health record system for each class of antimicrobial used by the hospital that month. Hospital antimicrobial usage is recorded as Defined Daily Dose (DDD), which is the assumed average maintenance dose per day for a given drug and is used as a statistical measure of drug consumption.

4.3.2.3.
Taxonomic profiling and pathogen detection

Taxonomic profiling and pathogen detection were performed on all metagenomes according to the methodology described in 3.3.2.4.

4.4. Results

4.4.1. Metagenome and metatranscriptome analysis

In this chapter, 102 Giga base-pairs (Gbp) of sequencing data were generated across all samples. Metagenomes from three samples of the river source water (AS:M:4, AS:M:5 and AS:M:6) failed sequencing library quality checking and were removed from the study. A total of 15 metagenomes were successfully sequenced, passed quality checking and were found to contain ARGs at varying levels (Table 4.1). The percentage of reads matching ARGs was, on average, 10-fold greater in the hospital effluent samples than in the farm effluent samples, and approximately 70-fold greater than in the background samples of river source water. The percentage of reads matching ARGs was, on average, 8-fold greater in the farm effluent than in the background samples of river source water; however, one metagenome from the background sample (AS:M:2) was found to have a greater percentage of ARG reads than a metagenome from the farm effluent (DF:M:2).

Table 4.1 Summary of metagenomes used in this work (metadata available in Appendix 3).
Sample    Total reads    Gbp        Total ARG reads    % ARG reads
AH:M:1    64659230       3.2330     97698              0.1511
AH:M:2    52355416       2.6178     122164             0.2333
AH:M:3    109795652      5.4898     207767             0.1892
AH:M:4    61573380       3.0787     125019             0.2030
AH:M:5    50845128       2.5423     25987              0.0511
AH:M:6    53928494       2.6964     28629              0.0531
DF:M:2    66120642       3.3060     2317               0.0035
DF:M:3    184149408      9.2075     13094              0.0071
DF:M:4    262823622      13.1412    29006              0.0110
DF:M:5    58179398       2.9090     31518              0.0542
DF:M:6    53192154       2.6596     6999               0.0132
DF:M:7    49516248       2.4758     4072               0.0082
AS:M:1    54799282       2.7400     181                0.0003
AS:M:2    150787198      7.5394     7226               0.0048
AS:M:3    128125534      6.4063     1199               0.0009

AH = hospital effluent (Addenbrooke’s hospital)
DF = farm effluent (dairy farm)
AS = river source water (Ashwell spring)
Numbers in sample name represent sequential sampling

Of the RNA samples, all six samples of the river Cam source water (AS:T:1 - AS:T:6), two samples of hospital effluent (AH:T:2 and AH:T:3) and two samples of farm effluent (DF:T:2 and DF:T:3) did not yield sufficient RNA for metatranscriptome sequencing and were consequently removed from the analysis. A total of eight metatranscriptomes were successfully sequenced, passed quality checking and were found to contain transcripts of ARGs (Table 4.2).

Table 4.2 Summary of metatranscriptomes used in this work (metadata available in Appendix 3).

Sample    Associated metagenome    Total reads    Total ARG reads    % ARG reads
AH:T:1    AH:M:1                   152298536      308848             0.2028
AH:T:4    AH:M:4                   74411930       948890             1.2752
AH:T:5    AH:M:5                   61143518       23765              0.0389
AH:T:6    AH:M:6                   51640378       40379              0.0782
DF:T:1    DF:M:2                   123559962      8017               0.0065
DF:T:4    DF:M:5                   49293728       4447               0.0090
DF:T:5    DF:M:6                   64102402       7057               0.0110
DF:T:6    DF:M:7                   64850756       1022               0.0016

4.4.2. Correlation of ARG abundance to ARG transcript abundance

A Pearson product-moment correlation was performed to determine the relationship between the abundance of ARGs in the metagenome and the abundance of ARG transcripts in the metatranscriptome detected in each type of environmental effluent (hospital and farm).
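The correlation test described above can be sketched in a few lines. This is a minimal illustration rather than the code used in this work: the function names `pearson_r` and `t_statistic` and the example abundance values are hypothetical, and the t statistic shown is the standard conversion of r into a test statistic with n - 2 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length with at least 3 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for the null hypothesis of no correlation (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical normalised abundances for five ARGs (not data from this study).
arg_abundance = [0.8, 2.1, 3.5, 5.0, 7.2]
transcript_abundance = [1.1, 2.0, 4.2, 4.9, 8.0]

r = pearson_r(arg_abundance, transcript_abundance)
t = t_statistic(r, len(arg_abundance))
print(f"r = {r:.4f}, r^2 = {r * r:.4f}, t = {t:.2f} (df = {len(arg_abundance) - 2})")
```

The resulting t value would be compared against the critical value of the t distribution for the chosen significance level, which is why a moderate r from a small number of ARGs can fail to reach significance while a weak r from several hundred ARGs can pass.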
A strong, positive correlation was observed between ARG abundance and corresponding transcript abundance across all hospital effluent samples, which was statistically significant (r=0.8550, p<0.0005, n=648 ARGs) (Figure 4.1). However, only a weak, positive correlation was observed between ARG abundance and corresponding transcript abundance across all farm effluent samples, although this was also statistically significant (r=0.1467, p<0.005, n=368 ARGs) (Figure 4.2).

Figure 4.1 Linear regression analysis of ARG abundance against corresponding transcript abundance, for ARGs detected in all hospital effluent samples. The coefficient of determination (r²) is included as a measure of how well the data fits the linear regression model.

Figure 4.2 Linear regression analysis of ARG abundance against corresponding transcript abundance, for ARGs detected in all farm effluent samples.

Pearson product-moment correlations were also performed for ARG abundance and ARG transcript abundance for ARGs grouped by antimicrobial class. For the hospital effluent samples, a strong, positive correlation between ARG abundance and corresponding transcript abundance was observed for nine of ten classes of antimicrobial (Table 4.3). Sulfonamides were the only antimicrobial class with a weak correlation between ARG and transcript abundance, for which the null hypothesis (ARG transcript abundance is not associated with ARG abundance at significance level 0.05) was accepted. For the farm effluent samples, a weak positive correlation was observed for tetracycline ARGs and transcripts; however, no significant correlation was found for any other antimicrobial class (Table 4.4).

Table 4.3 Correlation of ARG and transcript abundance for all hospital effluent samples, according to antimicrobial class. The null hypothesis, that ARG transcript abundance is not associated with ARG abundance, was rejected for all classes of antimicrobials except sulfonamides (for which the p value was >0.05).
The MLS class of antimicrobial represents macrolides, lincosamides and streptogramins.

Antimicrobial class    r value    p value    Number of ARGs    H0 (5%)
aminoglycosides        0.8444     <0.0005    144               rejected
beta lactams           0.9165     <0.0005    188               rejected
glycopeptides          0.7611     <0.0005    52                rejected
MLS                    0.6802     <0.0005    68                rejected
phenicols              0.9900     <0.0005    32                rejected
quinolones             0.9891     <0.0005    16                rejected
rifamycins             0.9842     <0.05      4                 rejected
sulfonamides           0.5290     <0.5       12                accepted
tetracyclines          0.7207     <0.0005    88                rejected
trimethoprims          0.9592     <0.0005    44                rejected

Table 4.4 Correlation of ARG and transcript abundance for all farm effluent samples, according to antimicrobial class. Correlations could not be performed for glycopeptides, phenicols, quinolones or rifamycins due to insufficient data points. The null hypothesis, that ARG transcript abundance is not associated with ARG abundance, was only rejected for tetracycline ARGs and transcripts. NA denotes not analysed as insufficient data points available.

Antimicrobial class    r value     p value    Number of ARGs    H0 (5%)
aminoglycosides        0.0187      >0.5       96                accepted
beta lactams           0.14419     <0.5       71                accepted
glycopeptides          NA          NA         1                 NA
MLS                    0.09823     <0.5       59                accepted
phenicols              NA          NA         6                 NA
quinolones             NA          NA         1                 NA
rifamycins             NA          NA         1                 NA
sulfonamides           0.1989      >0.5       11                accepted
tetracyclines          0.48158     <0.0005    79                rejected
trimethoprims          -0.11174    >0.5       11                accepted

4.4.3. Antimicrobial residues

Concentrations of antimicrobial residues in all samples were determined by LCMS (Table 4.5). The reporting limit for LCMS had to be raised from 0.1 µg/L on several samples due to matrix interference, low sensitivity of target analyte, or poor recovery from matrix spikes. For the hospital effluent samples, eight antimicrobial compounds spanning six classes of antimicrobials were identified at least once in the samples. The antibiotics vancomycin (glycopeptide), clarithromycin (MLS) and ciprofloxacin (quinolone) were the only antimicrobial compounds to be identified in every hospital effluent sample.
Of the antimicrobial compounds tested for, the only classes of antimicrobial not identified in any hospital effluent samples were phenicols and tetracyclines. The only antibiotic found in the farm effluent was the sulfonamide antibiotic sulfadiazine, which was not found in any hospital effluent samples. No traces of antimicrobial residues were found in the river source water samples (not shown).

Table 4.5 LCMS results for hospital and farm effluent samples (associated with metagenome samples), given as the concentration of antimicrobial present in each sample (µg/litre). A ‘<’ denotes a result of less than the reporting limit (listed after ‘<’); values without a ‘<’ are detected concentrations.

Class         Drug              AH:M:2    AH:M:3    AH:M:4    AH:M:5    AH:M:6    DF:M:3    DF:M:4    DF:M:5    DF:M:6    DF:M:7
beta lactam   amoxicillin       <10.0     <10000.0  <10000.0  failed    <10000.0  <1000.0   <50.0     <1000.0   failed    <1000.0
beta lactam   flucloxacillin    <0.1      <1.0      <0.1      <0.5      24.5      <100.0    <1.0      <2.0      <5.0      <100.0
beta lactam   piperacillin      <200.0    <100.0    <100.0    <200.0    <1000.0   <100.0    <10.0     <5.0      <10.0     <100.0
glycopeptide  teicoplanin       <500.0    <5000.0   <100.0    <1000.0   <100.0    <10000.0  <5000.0   <5000.0   <10000.0  <10000.0
glycopeptide  vancomycin        18.2      3160      29.1      58.1      90.2      <1000.0   <500.0    <100.0    <1000.0   <1000.0
MLS           azithromycin      57.4      <0.1      failed    <0.5      <10.0     <1000.0   <5.0      failed    <50.0     <1000.0
MLS           clarithromycin    13.4      23.6      17.4      31.1      7.3       <10.0     <5.0      <2.0      <10.0     <10.0
MLS           erythromycin      <0.1      <1.0      <1.0      <1.0      <0.1      <10.0     <1.0      <5.0      <10.0     <10.0
phenicol      chloramphenicol   <0.1      <1.0      <0.1      <1.0      <1.0      <100.0    <10.0     <10.0     <100.0    <100.0
quinolone     ciprofloxacin     59.3      10        82.9      10.1      6.5       <200.0    <100.0    <100.0    <100.0    <200.0
quinolone     moxifloxacin      2         <2.0      <10.0     <1.0      <20.0     <200.0    <20.0     <100.0    <100.0    <200.0
quinolone     ofloxacin         <0.5      <2.0      <5.0      <2.0      <0.2      <200.0    <20.0     <100.0    <50.0     <200.0
rifamycin     rifabutin         <0.1      <1.0      <1.0      <5.0      <0.1      <10.0     <10.0     <1.0      <50.0     <10.0
rifamycin     rifampicin        <0.5      <2.0      <50.0     <5.0      1.3       <10.0     <2.0      <50.0     <10.0     <10.0
sulfonamide   sulfadiazine      <0.1      <0.1      <0.1      <0.1      <0.1      <10.0     1.7       2.7       <10.0     <10.0
sulfonamide   sulfamethoxazole  257       1220      <1.0      173       15.4      <10.0     <10.0     <5.0      <10.0     <10.0
tetracycline  demeclocycline    <0.5      <100.0    <100.0    <10.0     <10.0     <100.0    <100.0    <1000.0   <100.0    <100.0
tetracycline  doxycycline       <1.0      <5.0      <1000.0   <5.0      <5.0      <100.0    <500.0    <1000.0   <50.0     <100.0
tetracycline  tigecycline       <1.0      <10000.0  <1000.0   <10.0     <100.0    <1000.0   failed    <1000.0   <100.0    <1000.0

4.4.4. Expression of ARGs in response to hospital antimicrobial usage

Based on the strong, positive correlation observed between ARG abundance and ARG transcript abundance in hospital effluent across the four sampling dates, in addition to the presence of antimicrobial residues in hospital effluent samples, the hospital was the focus of further analysis. The abundance of ARG transcripts in each hospital effluent metatranscriptome was correlated to the monthly hospital antimicrobial usage records to test whether antimicrobial usage was associated with ARG transcript abundance (Figure 4.3). As no significant correlation was found between sulfonamide ARGs and ARG transcripts (4.4.2), sulfonamides were not included in this analysis. In addition, due to hospitals recording sulfonamide usage together with trimethoprim usage, trimethoprim ARG transcripts were also not included in this analysis. Overall, a weak, positive correlation was observed between ARG transcript abundance and antimicrobial usage in all hospital effluent samples, which was statistically significant (r=0.5051, p<0.005, n=32). However, as noted in the figure key (Figure 4.3), the association between transcript abundance and antimicrobial usage varied greatly between antimicrobial classes. The strongest positive correlation was observed for the tetracycline class of antimicrobials (r=0.9163, r²=0.8397). Rifamycins also showed a positive correlation (r=0.5527, r²=0.3054) between transcript abundance and rifamycin usage.
Negative correlations were observed for the glycopeptide and phenicol classes of antimicrobials.

Figure 4.3 Linear regression analysis of ARG transcript abundance against hospital antimicrobial usage, grouped by antimicrobial class for each hospital effluent sample. The Pearson product-moment correlation (r) and coefficient of determination (r²) values for each antimicrobial class are included in the figure key.

In order to investigate the effect of both ARG abundance and antimicrobial usage on the transcript abundance of ARGs in hospital effluent, and to test how much the data for these three variables differ, a multiple linear regression and an ANOVA test were performed. The multiple correlation coefficient indicated a strong positive relationship among the three variables, with ARG abundance and antimicrobial usage together accounting for 79% of the variation in ARG expression (r=0.8903, adjusted r²=0.7857, p<0.0005, n=32). Although the overall model was significant at significance level 0.05 (p=5.6×10⁻¹¹), the two independent variables differed greatly: ARG abundance (coefficient=8.06, p<0.0005) was a significant variable, whereas antimicrobial usage (coefficient=0.00, p<0.5) was not, and thus the null hypothesis (antimicrobial usage is not associated with ARG expression at significance level 0.05) was accepted for the antimicrobial usage variable.

Finally, to further explore and better visualise the data (ARG abundance, ARG transcript abundance and antimicrobial usage), a Principal Component Analysis (PCA) was performed on the three variables. Of the three resulting principal components (PC), PC1 and PC2 were able to explain the most variance between the variables (Figure 4.4).

Figure 4.4 Scree plot showing the variance observed for the three principal components.
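As an illustration of the multiple linear regression described in this section, the sketch below fits transcript abundance against two predictors by ordinary least squares and reports the coefficient of determination. It is a minimal example under stated assumptions rather than the analysis pipeline used here: the function names and all input values are hypothetical, and the fit is obtained by solving the normal equations directly.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2; returns (coefficients, r_squared)."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return beta, 1.0 - ss_res / ss_tot


# Hypothetical values (not data from this study).
arg_abundance = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
antimicrobial_usage = [10.0, 30.0, 20.0, 50.0, 40.0, 60.0]
transcript_abundance = [2.1, 4.3, 5.8, 8.4, 9.9, 12.5]

coefficients, r_squared = fit_two_predictors(
    arg_abundance, antimicrobial_usage, transcript_abundance
)
print("intercept, ARG abundance, usage:", [round(c, 4) for c in coefficients])
print("r^2:", round(r_squared, 4))
```

A statistics package would additionally report a p value per coefficient, which is the quantity used above to judge whether each predictor contributes significantly to the model.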
When using a biplot to see how the initial variables contribute to these two principal components (and grouping observations by antimicrobial class), the least variation was found within the tetracycline, rifamycin, phenicol and glycopeptide groups (Figure 4.5). ARG transcript abundance, ARG abundance and antimicrobial usage were all negatively correlated with PC1 (r=-0.95, -0.96 and -0.64, respectively). ARG transcript abundance and ARG abundance were positively correlated with PC2 (r=0.28 and 0.23, respectively), whereas antimicrobial usage was negatively correlated with PC2 (r=-0.77).

Figure 4.5 Biplot showing how the initial variables (ARG abundance, transcript abundance and antimicrobial usage) contribute to principal components 1 and 2.

4.4.5. Taxonomic profiling and pathogen detection

The number of different pathogenic bacterial species, in relation to the overall number of different species, was determined for each effluent metagenome (Figure 4.6). The hospital effluent sample from sampling point 2 (AH:M:2) contained the greatest number of pathogen species of all samples, whereas the farm effluent sample from sampling point 3 (DF:M:4) contained the lowest number of pathogen species of all samples. Sampling point 4 (AH:M:4, DF:M:5) was found to have the greatest number of pathogen species of all farm effluent samples and the lowest number of pathogen species of all hospital effluent samples. Overall, the number of pathogen species identified in the hospital and farm effluent samples over the six sampling points did not follow the same trend observed for the total species number in the same samples.

Figure 4.6 Graph showing the number of pathogenic bacteria species and the total species number, for both hospital and farm effluents, over time.

In terms of the frequency of isolation of particular pathogen species, only Escherichia coli was identified in all hospital and farm effluent samples (Figure 4.7 A).
A total of 68 different species of bacterial pathogen were identified across all effluent samples, 27 of which were unique to hospital effluent and 5 of which were unique to farm effluent (Figure 4.7 B). Of the 36 pathogen species shared by hospital and farm effluents, 47% (n=17) belonged to the phylum Firmicutes (predominantly Gram-positive, endospore-forming bacteria).

Figure 4.7 A. Column chart showing the ten most commonly identified pathogen species (according to MetaPhlAn and PATRIC analysis) in the hospital and farm effluent metagenomes. B. Venn diagram depicting the number of pathogen species found in the combined hospital and farm effluent datasets. There were 36 species of pathogen that were common between hospital and farm effluent.

4.4.6. MGEs

The abundance of ARGs and MGEs was plotted for each effluent metagenome and compared against species number (Figure 4.8). The number of MGEs was consistently higher than the number of ARGs in all metagenomes. The hospital effluent metagenomes had a greater abundance of MGEs than the farm effluent metagenomes. The hospital effluent metagenomes also had a greater abundance of ARGs than all but one of the farm effluent metagenomes; DF:M:5 had a greater abundance of ARGs than the hospital effluent samples AH:M:5 and AH:M:6. In terms of species number, all hospital effluent metagenomes had a greater number of bacterial species present than the farm effluent metagenomes. The greatest number of species was observed in the AH:M:6 hospital effluent metagenome, which also featured the lowest number of ARGs and MGEs.

Figure 4.8 A. Graph showing the abundance of ARGs and MGEs, in relation to total species number, for hospital effluent over time. B. Graph showing the abundance of ARGs and MGEs, in relation to total species number, for farm effluent over time.

4.4.7.
Abundance of ARGs over time

Finally, to investigate the abundance of ARGs over time, all ARGs identified in the effluent samples were grouped by antimicrobial class and plotted by metagenome (Figure 4.9). The beta lactam class contained the highest abundance of ARGs in both hospital and farm effluent metagenomes overall, and in every individual farm effluent metagenome. Out of the farm effluent metagenomes, one sample (DF:M:4) had the highest abundance of ARGs across all classes of antimicrobials. For both hospital and farm effluent metagenomes, no trend was observed in ARG abundance across antimicrobial classes over sampling time.

Figure 4.9 A. Graph showing the abundance of ARGs, grouped by antimicrobial class, in hospital effluent metagenomes. B. Graph showing the abundance of ARGs, grouped by antimicrobial class, in farm effluent metagenomes.

4.5. Discussion

This chapter has confirmed that ARGs are expressed in effluents that are entering the environment. In addition, strong correlations between transcript abundance and antimicrobial usage for several classes of antimicrobial have been observed. Finally, pathogenic species of bacteria, as well as MGEs and ARGs, have been found in all effluent samples, taken over a period of several months.

The initial analysis of all the metagenomes in this chapter revealed that the hospital effluent samples had a far greater abundance of ARGs than both the farm effluent samples (10-fold more ARG reads) and the background samples of river source water (70-fold more ARG reads) (Table 4.1). There are several factors that may contribute to this difference. Firstly, the hospital has a greater number of individuals contributing to the hospital effluent (1000 patient beds in the main hospital, in addition to staff members and visitors) than the individuals contributing to the farm effluent (an approximate herd size of 200).
This difference in population size may result in a greater number and diversity of bacteria in the hospital effluent (as reflected in the MetaPhlAn species analysis, Figure 4.6), leading to increased competition within the effluent biome and thus a greater number of ARGs. Secondly, the monthly antimicrobial usage of the hospital is much greater than that of the farm (usage data not shown for farm), as reflected by the comparative antimicrobial residues detected in the farm and hospital effluents (Table 4.5). It is likely that proximate antimicrobial usage and the presence of antimicrobial residues exert a selective force on the ARGs in the effluents, and that the variation in this selective force may explain the differences in ARG abundance exhibited in the two effluent types.

In terms of the expression of ARG transcripts in relation to the abundance of ARGs, it was found that the hospital effluent had a far stronger positive correlation between ARG transcript abundance and ARG abundance than was observed for the farm effluent (Figure 4.1). On closer inspection, it was noted that certain classes of ARGs might be mainly responsible for the overall correlation observed between ARGs and ARG transcripts (Table 4.3, Table 4.4). For the hospital effluent, only one class (sulfonamide ARGs) was found to have a correlation between ARGs and ARG transcripts that was deemed to be insignificant (at significance level 0.05); all other classes of ARGs were found to have a significant positive correlation. However, for the farm effluent, only one class (tetracycline ARGs) was found to have a significant correlation between ARGs and ARG transcripts (at significance level 0.05).
The fact that a significant, strong correlation between ARG transcript and ARG abundance was found for the majority of ARG classes in the hospital effluent but not for the farm may be due to the far greater abundance of ARGs in the hospital effluent than the farm effluent (10-fold greater) that was discussed earlier. Another explanation may be the respective presence of selection pressures in the effluents, causing ARGs to be expressed at a greater level than in the absence of selection pressures (e.g. the ARG abundance observed in the farm effluent may have predominantly been related to background expression, while the hospital effluent data more truly represents the response to environmental pressures). The presence of possible selection pressures was confirmed by LCMS, which showed eight antimicrobial compounds (spanning six classes of antimicrobials) to be present in the combined hospital effluent samples and only one antimicrobial compound to be present in the combined farm effluent (Table 4.5). As the antimicrobial compounds tested for were selected based on hospital usage data, not all of these compounds will have been used on the farm, and this may account for the fact that fewer compounds were detected in the farm effluent. Additionally, the half-life or the mode of excretion of each of these antimicrobial compounds may have impacted their detection (e.g. bile excretion may have a different impact on the amount of compound that can be detected, compared to urinary excretion). It should be noted that the LCMS experiments failed for aminoglycosides and trimethoprims, due to matrix suppression and the inability to optimise the mass spectrometry for these antimicrobial standards. Another caveat that should be noted for the LCMS results is that the effluent samples were very impure water samples; consequently, it was difficult to detect antimicrobial residues due to the solid matter causing very poor chromatography results.
The difficulties experienced whilst using LCMS to detect antimicrobial residues in effluent samples suggest that it may not be the most appropriate methodology to use in future studies requiring antimicrobial residue testing.

For the hospital effluent, it was hypothesised that the antimicrobial usage of the hospital for the month of sample collection would serve as a positive ARG selection pressure and impact the expression of ARGs in the effluent. The data supported this hypothesis; overall, a positive correlation was observed between ARG transcript abundance and antimicrobial usage in all hospital effluent samples, which was statistically significant (Figure 4.3). However, this overall correlation was weak and may have been largely due to a strong correlation for tetracycline ARG transcripts and tetracycline usage (in addition to correlations for rifamycins, glycopeptides and phenicols).

In light of finding that both ARGs and antimicrobial usage could be correlated to ARG transcripts in the hospital effluent, a multiple linear regression, ANOVA and principal components analysis were performed to determine the relative contribution of these two variables to ARG transcript abundance in the hospital data. A strong positive correlation between the three variables was found, for which ARG abundance and antimicrobial usage could account for 79% of the variation in ARG expression. Because not all variation could be accounted for by ARG abundance and antimicrobial usage, and because of the large variation found within certain classes of antimicrobial (Figure 4.5), there are likely to be one or more unaccounted-for variables that help to explain the expression of ARGs in effluents that are entering the environment (e.g. genetic context factors such as ARG promoters, or drug factors such as the half-life of antimicrobial compounds, etc.).
Another factor that should be taken into account when discussing the correlations found in this study is the small number of data points used when looking at expression according to antimicrobial class. Due to two metatranscriptomes failing sequencing for each of the hospital and farm effluent sample sets, only four metatranscriptomes were available for analysis for each sample site. Four ARG transcript data points per antimicrobial class may be insufficient to draw meaningful correlations, and additional work should be carried out to augment these datasets.

In addition to the effect of selection pressures (antimicrobial residues and antimicrobial usage) and ARG abundance, the function of the protein that the ARG encodes may play a role in ARG expression. For example, an ARG that encodes a tetracycline efflux pump may be expressed more readily than a non-efflux ARG, particularly if the efflux pump conferred an advantage other than just tetracycline resistance (for instance, multidrug efflux pumps, which may be particularly true in an environmental context) (Van Bambeke et al., 2003). Another case where gene type or protein function could play a role in ARG expression is that of linked or co-inherited genes, such as the sulfonamide resistance gene, sul1, which is typically co-inherited as a core component of class 1 integrons and thus might be expressed for reasons other than antimicrobial selection pressures (Mazel, 2006). With this in mind, future work to determine factors that contribute to ARG expression in the environment needs to take into account ARG mechanism (and possibly gene mobility or genome location), rather than just antimicrobial resistance class. It may be necessary to weight each ARG with values that account for how readily expressed it is, the level of gene expression required to evoke a phenotypic response, or the specificity of the resistance mechanism.
It should be noted that an in silico analysis limitation that may have impacted the findings in this chapter is the inability of the bioinformatics methodology to detect antimicrobial resistances that are caused by single nucleotide polymorphisms (SNPs), such as gyrase-based quinolone resistance (Jacoby, 2005). This could explain instances where a low abundance of quinolone ARGs was detected, yet a high level of quinolone usage, in addition to the presence of quinolone drug compounds, was recorded. In addition to the inability to detect SNP-based resistances, the methodology may also be missing novel ARGs that the ARG database cannot account for.

A possible caveat of the wet-laboratory component of the metatranscriptomic methodology used in this study (that may contribute to differences between ARG and corresponding ARG transcript abundance) is the nucleic acid extraction approach. This approach involved dividing the isolated cells from each effluent sample in two and performing separate extractions, resulting in the RNA and DNA from each sample not being isolated from the same individual cells but from the same bacterial community as a whole. Consequently, an ARG detected in a metagenome could have been expressed by the bacteria of that metagenome, but the same ARG may not have been expressed by the bacteria in the half of the sample that constitutes the corresponding metatranscriptome. To circumvent this caveat, an alternative option would be to use the acidic phenol phase (containing DNA) of the RNA extraction to extract DNA, thus allowing RNA and DNA to be extracted from the same group of cells sampled. However, this single phenol-chloroform extraction approach would require considerable calibration and refinement in order to successfully isolate high-quality RNA and DNA suitable for high-throughput sequencing.

The presence of pathogenic species of bacteria, as well as the number of MGEs and ARGs, was determined in effluents over time.
A total of 68 different species of pathogenic bacteria were identified across all effluent samples and 36 pathogen species were present in both hospital and farm effluents. Not only does this illustrate that both effluents have clinical significance in terms of the pathogenic species of bacteria that they are contributing to the environment, but it also offers possible mechanisms by which ARGs could be maintained and disseminated in the environment. One such mechanism could be via endospore-forming bacteria that could harbour ARGs and protect them from degradation by the formation of stable endospores (Galperin, 2013). As 47% of the shared pathogen species identified in the effluents belonged to the phylum Firmicutes, known for their ability to form endospores, the hypothesis that endospore-forming bacteria can harbour and disseminate ARGs appears to be a valid one and is explored in the next chapter (Chapter 5).

No discernible trend was observed between pathogen species number and total species number over the sampling time points, nor was there any trend observed in ARG or MGE abundance over time. Despite no obvious trend, there was a large amount of temporal variation in pathogen species number and total species number, as well as in ARG and MGE abundance. This may be due to the nature of random sampling, or it could be a result of seasonal variations in pathogen abundance (e.g. disease outbreaks) and environmental variables (e.g. rainfall) across the sampling dates. It is possible that the species diversity across the metagenomes may affect the correlations that have been reported between ARG transcript abundance, ARG abundance and antimicrobial usage. An important direction for future study would be to investigate the effect of such seasonal variation on the abundance of ARGs, in addition to pathogens and MGEs, in order to determine possible temporal points that may lead to increased release of ARGs.
This information would greatly contribute to models of ARG dissemination that are beginning to be utilised in surveillance and risk management programmes.

4.6. Conclusion

The work of this chapter has shown that both ARGs and ARG transcripts are present in effluents entering the aquatic environment. A strong positive correlation was found between the abundance of ARGs and ARG transcripts across the samples. A correlation was also observed between ARG transcript abundance and antimicrobial usage in hospital effluent. Although correlation does not demonstrate causation, this study shows that ARGs are being expressed in effluents and that antimicrobial usage may influence the fate of ARGs in effluents entering the environment, thus suggesting a microbial response to the anthropogenic release of antimicrobials into the aquatic environment. In terms of monitoring ARG dissemination, selection pressures that are known to elicit a microbial response and affect ARG load are important factors that should be incorporated into environmental frameworks.

Chapter 5. Bacterial endospores present in the environment harbour antimicrobial resistance genes

5.1. Preface

The metagenomic analyses documented in previous chapters have identified pathogenic bacteria and antimicrobial resistance genes (ARGs) in environmental effluents. I have shown that the genes of these pathogens and ARGs are expressed in the environment, apparently in response to the presence of antimicrobial selection pressures. However, I have yet to definitively link the presence of ARGs with pathogenic species, and to demonstrate possible mechanisms of persistence of these ARGs in the environment. Validating the finding that ARGs are persisting in the environment would give significant insight into the fate of ARGs in the environment, particularly in the context of pathogenic bacteria.
Taxonomic analysis of the environmental metagenomes in previous chapters has shown the phylum Firmicutes, in particular the genus Clostridium (several species of which are known pathogens), to be consistently present across the sample sites. Many Firmicutes produce endospores, which are able to survive extreme conditions. In this way, Firmicutes could facilitate the persistence of ARGs in the environment. In the present study I explore this possibility through the isolation and genomic analysis of endospore-forming bacteria from environmental samples.

5.2. Introduction

Thus far, metagenomics has been used to identify ARGs, mobile genetic elements (MGEs) and pathogenic bacteria within effluents that are entering a river catchment. Metagenomics has allowed the relative abundance of ARGs to be estimated for each sample and has been used to correlate ARG abundance with antimicrobial usage data. However, there are several challenges to microbial community analysis that metagenomics cannot address when used in isolation. One such challenge is the inability to assemble and model the whole genome of individual bacteria, which is often a result of insufficient metagenomic sequencing depth (Kuczynski et al., 2012, Zengler and Palsson, 2012). This means that potential functions identified within microbial communities, such as antimicrobial resistance, cannot be linked with phylogeny using a solely metagenomic approach (Clingenpeel et al., 2015).

The advent of multi-omic strategies for microbial community profiling has allowed researchers to apply multiple sequencing techniques to single environmental samples in order to address key questions, such as the use of metagenomics, metatranscriptomics and single-cell sequencing to determine microbial response to an oil spill (Mason et al., 2012).
These approaches can be augmented by the use of whole genome sequencing (WGS) data from isolates of specific environmental bacteria and used in conjunction with metagenomics to offer insight into the dynamics of ARG carriage and the relationship to ARG diversity in the wider microbial community. In conjunction with identifying bacterial isolates that carry ARGs, culture of environmental bacteria could be used to target specific pathogens, thus testing whether clinically important bacteria are those harbouring ARGs and, on a larger scale, observing any trends or relationships among isolates from different geographically proximate sample sites (i.e. within the river catchment in this study). Finally, to address whether ARGs are persisting in the environment (as opposed to metagenomic approaches capturing the transient occurrence of ARGs), candidate bacterial species that are able to preserve their genetic content whilst withstanding the changes in environment often associated with effluent biomes should be targeted.

5.2.1. Endospore-forming bacteria

Certain species of bacteria that belong to the phylum Firmicutes (mainly Gram-positive, low GC-content bacteria) feature a set of core sporulation proteins, allowing them to form highly resistant endospores (Galperin, 2013). Sporulation enables the bacteria to survive adverse environmental conditions by transitioning from a vegetative form, which allows maintenance of metabolic and reproductive activities, to a dormant endospore form that remains metabolically inactive (whilst preserving genetic content) until the environment becomes more favourable (Talukdar et al., 2015). Endospores have been found to withstand extreme environmental conditions, such as those found in the Arctic permafrost (Suetin et al., 2009) and outer space (Horneck et al., 2012), and endospore-forming bacteria are known to inhabit most aquatic and terrestrial environments (Galperin, 2013).
Thus endospore-forming bacteria represent an important bacterial group to target in order to confirm the persistence of ARGs in effluents. Several species of endospore-forming bacteria, in particular of the Clostridium genus, are pathogenic and are known to acquire ARGs (Huang et al., 2009, Kouassi et al., 2014). Members of the Clostridium genus have been identified in metagenomes from wastewater treatment plant (WWTP), dairy farm and hospital effluent sample sites, as described in Chapters 2, 3 and 4. Therefore, the Clostridium genus was chosen for investigating the persistence of ARGs in pathogenic species present in effluents.

5.2.1.1. Clostridium difficile

Clostridium difficile (C. difficile) is an obligate anaerobe that is a major pathogen and frequently implicated in hospital-acquired infections (Rupnik et al., 2009). C. difficile infection (CDI) requires the disruption of the normal gut flora (which often results from antimicrobial therapy) and occurs once the bacterium is acquired exogenously. CDI results in Clostridium difficile-associated disease (CDAD). CDAD ranges in its severity, from diarrhoea, dehydration and metabolic changes through to pseudomembranous colitis and haemorrhaging. A number of factors, including the virulence of the C. difficile strain and the antimicrobial therapy a patient receives, influence the development of CDAD (Johnson and Gerding, 1998, Brouwer et al., 2013, Tonna and Welsby, 2005). C. difficile (along with other members of the phylum Firmicutes) produces highly resistant endospores that are capable of withstanding extremes in environmental conditions and facilitate disease transmission (Lawley et al., 2009). Asymptomatic carriage of C. difficile causes low-level shedding of endospores; however, antimicrobial therapy can result in a contagious super-shedder state in which the normal gut flora is disrupted, the reduction in microbiota diversity leading to an overgrowth of C.
difficile and the shedding of high levels of endospores (Lawley et al., 2009). Owing to its significance as a major nosocomial pathogen, mandatory surveillance has been established in the UK that requires all cases of CDI in patients over two years old to be reported to Public Health England (Public Health England, 2013). In addition, antimicrobial usage and resistance are monitored as part of the England Stewardship of Antimicrobial Utilization and Resistance program (Ashiru-Oredope and Hopkins, 2013).

Although considered a nosocomial pathogen, C. difficile can also be ‘community-acquired’, i.e. arise in individuals who have not been hospitalised or exposed to antimicrobial therapy (Wilcox et al., 2008). Such cases have raised questions about other transmission routes for C. difficile and the associated risk factors. In particular, animals have been suggested as possible sources for food-borne transmission of C. difficile (Gould and Limbago, 2010), as C. difficile is also a pathogen of several domestic and food animals, including neonatal pigs, horses, cattle and companion animals (Hammitt et al., 2008). Recently, several studies have found cases of CDI in animals and humans that share the same C. difficile strains, suggesting that the strains may have originated from a common source, or have been involved in zoonotic transmission (Jhung et al., 2008, Debast et al., 2009). C. difficile can also be isolated from environmental sources, including soil, river and sea water samples (Saif and Brazier, 1996).

C. difficile is known to harbour antimicrobial resistance genes, and strains have been found to be phenotypically resistant to beta-lactams, lincosamides, quinolones and tetracyclines (Keessen et al., 2013). However, the role of antimicrobial resistance in the epidemiology of CDI is complex, due in part to antimicrobial treatment resulting in favourable conditions for CDI development and also because the ARGs harboured by C.
difficile often confer resistance to antimicrobials that are not used to treat CDI (Coia, 2009). The high number of MGEs associated with C. difficile means that the C. difficile genome has the potential to be highly plastic (Brouwer et al., 2011). It is hoped that surveillance schemes, antimicrobial resistance data and WGS can be used to offer insight into the global spread of C. difficile (He et al., 2013).

5.2.1.2. Clostridium perfringens

Clostridium perfringens (C. perfringens) is an obligate anaerobe that is found in many environments, including faeces, sewage, soils and food (Li et al., 2013), and is the causative agent of several diseases such as food poisoning and gas gangrene (Rood and Cole, 1991). Strains of C. perfringens are known to vary in their virulence and phenotypic characteristics due in part to the incorporation of MGEs, containing genes encoding toxins, sporulation factors and other secreted enzymes, into the genome (Myers et al., 2006). Strains of C. perfringens have also been found to carry conjugative plasmids containing ARGs and toxin-encoding genes (Bannam et al., 2011).

5.2.2. Chapter hypothesis

The phylum Firmicutes, in particular the genus Clostridium, contains bacterial species that are clinically relevant pathogens in both human and veterinary medicine. These species are readily transmissible and can exhibit resistance to antimicrobials. Endospore formation and faecal shedding aid their widespread dissemination in the environment, and the multitude of effluents that are entering the aquatic environment are likely to be key sources in the dispersal of endospore-forming bacteria. Therefore endospore-forming bacteria, as exemplified by Clostridium sp., were hypothesised to be contributors of ARGs to the environmental resistome and responsible for their persistence.
The work in this chapter explored the link between endospore-forming, pathogenic bacteria and antimicrobial resistance in the environment by reactivating endospores from environmental samples and subjecting them to WGS in order to establish their phylogenetic and genomic relationships with respect to ARGs.

5.3. Materials and methods

5.3.1. Sample collection

Environmental samples of effluents and faeces were collected over a four-month time period from all the sampling sites used in the previous chapters (hospital effluent, WWTP influent and effluent, dairy farm). In addition, effluent from a pig farm (latitude: 51.736053, longitude: 0.408824) was sampled on 15.07.2014 and samples were also taken from the River Cam on 01.10.2014. Samples were collected in 50 mL sterile polypropylene containers and transported at 4°C to the laboratory. Metagenomic sampling from the pig farm was also undertaken according to methods described in Chapter 2.

5.3.2. Isolate culture

Environmental samples were subjected to ethanol (70%) shock for 24 hours to kill vegetative microorganisms before being spun down and re-suspended in sterile Phosphate Buffered Saline (PBS) solution. An aliquot of 1 mL was added to pre-reduced Brain-Heart Infusion (BHI) broth (containing 0.025 g sodium taurocholate hydrate) and grown for 24 hours at 37°C under anaerobic conditions to reactivate endospores. Aliquots of the 24-hour culture were serially diluted (by factors of 10^0, 10^1 and 10^2) in sterile PBS, streaked onto pre-reduced Brazier’s cefoxitin cycloserine egg yolk (CCEY) agar plates (containing 0.05 g sodium taurocholate hydrate) and grown for 48 hours at 37°C under anaerobic conditions.

5.3.3. DNA extraction and 16S rRNA sequencing

Isolates of interest were identified using a 16S rRNA sequencing methodology similar to that described by Lawley et al. (Lawley et al., 2012).
Briefly, colonies that had distinct morphology were isolated, sub-cultured on pre-reduced CCEY agar plates and grown overnight at 37°C under anaerobic conditions. From each plate, a single colony was picked, added to 1 mL of PBS, subjected to bead beating for 30 seconds and centrifuged (13000 rpm, 10 minutes) to obtain the supernatant. The supernatant (containing isolate DNA) was used as template in the 16S PCR reaction with broad-range primers. The 16S product was sequenced using capillary sequencing (Sanger Institute, UK). To identify the bacterial species of each isolate, the 16S rRNA sequence was compared against the Ribosomal Database Project (RDP) and GenBank databases.

5.3.4. DNA extraction for whole genome sequencing

Isolates for WGS were picked and grown overnight in pre-reduced BHI broth at 37°C under anaerobic conditions. For each isolate, 500 µL of culture was added to pre-reduced glycerol and stored at -80°C. The remaining culture was spun down for 10 minutes at 6500 rpm. The resulting pellet was washed with 10 mL of PBS solution, re-spun at 6500 rpm and incubated at -20°C for 24 hours. Genomic DNA was extracted from each pellet by cell lysis, phenol-chloroform extraction and ethanol precipitation. Briefly, each pellet was re-suspended in 2 mL of sucrose (25%) in TE buffer (10 mM Tris pH 8 and 1 mM EDTA pH 8) and incubated at 37°C for 1 hour with 50 µL Lysozyme (100 mg/mL; Sigma-Aldrich). To complete lysis, 100 µL of Proteinase K (18 mg/mL; Sigma-Aldrich), 30 µL RNase A (10 mg/µL; Invitrogen), 400 µL EDTA (0.5 M) and 250 µL 10% Sarkosyl solution were added to the cell suspension and left on ice for 2 hours before being incubated at 50°C for 12 hours. Genomic DNA was extracted by phenol:chloroform:isoamyl alcohol (Sigma-Aldrich) washes and chloroform:isoamyl alcohol (Sigma-Aldrich) washes. Genomic DNA was then precipitated using 100% ethanol and purified with a wash of 70% ethanol.
The purity of the DNA was assessed and the DNA quantified using Qubit Fluorometric Quantitation (Life Technologies).

5.3.5. Ribotyping

Ribotyping was performed on potential C. difficile isolates by Wilco Knetsch at the Leiden University Medical Center, Netherlands. The methodology used was as described by Knetsch et al. (Knetsch et al., 2012).

5.3.6. Metagenome sequencing

At the same time as isolate sample collection, a sample of the pig farm effluent from the lagoon was collected in a 10 L sterile polypropylene container, transported at 4°C to the laboratory and processed using the same methodology described in 2.3.3.2. A total of 2 µg of DNA was used to generate an Illumina sequencing library (100 bp, paired-end) that was sequenced using an Illumina HiSeq2500 (Exeter Sequencing Service, UK). A full description of the metagenomic samples used in this chapter is available in Appendix 3.

5.3.7. Construction of high-quality draft genomes

Illumina sequencing libraries (150 bp, paired-end) were generated for all isolates with sufficient DNA for WGS and sequenced (Illumina MiSeq, San Diego, CA, USA) according to in-house protocols at the Wellcome Trust Sanger Institute (Quail et al., 2012). Assembly of each isolate was performed using the VelvetOptimiser script (Zerbino and Birney, 2008). Provisional classification of isolates was performed using Kraken (Wood and Salzberg, 2014); isolates that were unclassified by Kraken were classified using Metaphlan2 (Segata et al., 2012) and then aligned to the top species hit using BLAST (Altschul et al., 1990) to confirm classification. Isolates were then separated into species groups (C. difficile, C. perfringens, etc.) for downstream analysis, according to classification (as above) and features of the genome assembly (total draft genome size and number of contiguous sequences (contigs)). Comparative genomics of assembled genome data was facilitated by comparison with reference genomes of C. difficile and C. perfringens (Appendix 4).
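The assembly features used to group isolates (total draft genome size, number of contigs and average contig length) are simple summaries of the contig lengths in each assembly. The following is a minimal sketch, assuming the assembly is available as FASTA-formatted text; the function names `read_contig_lengths` and `assembly_stats` are illustrative and are not part of VelvetOptimiser or any other tool used in this chapter.

```python
def read_contig_lengths(fasta_text):
    """Return the length (bp) of every record in FASTA-formatted text."""
    lengths = []
    current = None  # running length of the record being read
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header closes the previous record (if any) and opens a new one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(contig_lengths):
    """Summarise a draft assembly from its contig lengths (bp)."""
    if not contig_lengths:
        raise ValueError("assembly contains no contigs")
    total = sum(contig_lengths)
    n_contigs = len(contig_lengths)
    return {
        "total_length": total,
        "n_contigs": n_contigs,
        "avg_contig_length": total / n_contigs,
    }


if __name__ == "__main__":
    # Toy two-contig assembly (hypothetical sequence data).
    example = ">contig_1\nACGTAC\nGT\n>contig_2\nGGCC\n"
    print(assembly_stats(read_contig_lengths(example)))
```

The average contig length is the total length divided by the number of contigs; for isolate 1 in Table 5.1, for example, 7859847 bp across 979 contigs gives approximately 8028 bp.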
Draft genomes from isolates in this study were annotated using Prokka (Seemann, 2014), which was also used to re-annotate the reference genomes included in core genome analyses; annotations were used to explore genome features, as well as to define core and accessory genomes.

5.3.8. Bioinformatic analysis

For phylogenetic analysis of isolates, a multiple sequence alignment was generated by mapping isolate FASTQ data (and in silico generated FASTQ files of reference isolates) to a suitable reference genome for each species (C. difficile 630 and C. perfringens ATCC13124) using SMALT (Ponstingl, 2014). For core genome analysis of each species group, core genes (genes present once in every isolate) were identified using the Roary pipeline (Page et al., 2015) and aligned using SMALT (Ponstingl, 2014). For each alignment, the variable sites were used to construct a maximum likelihood tree using RAxML 7.8.6 (Stamatakis, 2006). Phylogenetic trees were visualised using Figtree (Rambaut, 2007) and Dendroscope (Huson and Scornavacca, 2012).

Antimicrobial resistance gene annotation was performed for each isolate by comparing predicted protein coding sequences to the Antibiotic Resistance Database (ARDB) using the annotation tool ARDBanno.pl (Liu and Pop, 2009). For comparative genomics work, ABACAS (Algorithm-Based Automatic Contiguation of Assembled Sequences), Artemis, the Artemis Comparison Tool and DNAPlotter were used to identify coordinates and generate image files (Carver et al., 2005). Where re-arrangements (i.e. insertions or deletions) were identified, de novo assemblies were checked by mapping reads to a concatenated construct of the contiguous sequences (self-mapping) using BWA-mem (Li, 2013).

Taxonomic profiling of the metagenomic reads was performed using Metaphlan2 (Segata et al., 2012) and the ARG content of the metagenome was assessed using SEAR (Rowe et al., 2015) with default settings.
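The step of reducing an alignment to its variable sites before tree building can be illustrated with a short sketch. This is not the code used in this study; it assumes the alignment is held as a dictionary of equal-length sequences, and it assumes (as one possible convention) that columns containing gaps or ambiguous bases are ignored. The names `variable_sites` and `snp_distance` are hypothetical.

```python
UNAMBIGUOUS = set("ACGT")


def variable_sites(alignment):
    """Return the column indices at which the aligned sequences differ.

    alignment maps isolate name -> aligned sequence (all the same length).
    Columns containing anything other than A, C, G or T are skipped
    (an assumption made for this sketch).
    """
    seqs = [s.upper() for s in alignment.values()]
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences in an alignment must have equal length")
    sites = []
    for i in range(length):
        column = {s[i] for s in seqs}
        if column - UNAMBIGUOUS:
            continue  # gap or ambiguous base present in this column
        if len(column) > 1:
            sites.append(i)
    return sites


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences carry different bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a in UNAMBIGUOUS and b in UNAMBIGUOUS
    )


if __name__ == "__main__":
    # Toy alignment (hypothetical data).
    toy = {"ref": "ACGTAC", "iso1": "ACGTAT", "iso2": "AC-TAC"}
    print(variable_sites(toy))
    print(snp_distance(toy["ref"], toy["iso1"]))
```

Restricting the alignment to variable columns reduces the input to the tree-building step, and the same pairwise comparison underlies SNP distances between isolates.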
Metagenomes were assembled using the VelvetOptimiser script and Metavelvet (Zerbino and Birney, 2008) and compared to isolate contigs using ABACAS.

5.4. Results

5.4.1. Isolation and identification of endospore-forming bacteria

Over a period of 4 months a total of 100 samples were collected from various effluents in the environment, including hospital, WWTP, pig farm and dairy farm effluents (Appendix 3). Each sample was enriched for endospores and plated in a three-dilution series to give 300 plates of environmental endospore cultures. Colony numbers varied (from one colony to entire lawns) on each plate. A total of 113 colonies were successfully sub-cultured and subjected to 16S sequencing. Based on sample location, colony morphology and 16S annotation, 68 of the 113 colonies were selected for isolation (aiming to select diverse isolates that included several different species of Clostridia), DNA extraction and WGS. Of the 68 isolates of environmental, endospore-forming bacteria prepared for WGS, a total of 59 isolates were successfully sequenced (Table 5.1).

Table 5.1 All endospore-forming bacteria successfully isolated from environmental samples and processed for whole genome sequencing.

Isolate number | Location | Date | Sample type | 16S RNA sequencing | Conc. DNA (ng/µL) | Total WGS yield (kb) | Total length | No. contigs | Avg. contig length (bp) | Kraken top hit | Kraken: unclassified | Kraken: C. difficile | Kraken: C. perfringens | Kraken: C. botulinum
1 | Pig farm | 15.07.2014 | Combined farrow house effluent | Clostridium perfringens | 60 | 396967 | 7859847 | 979 | 8028 | Unclassified | 80.66 | 0.04 | 17.45 | 0.01
2 | Pig farm | 15.07.2014 | Combined farrow house effluent | Clostridium perfringens | >120 | 384744 | 6919008 | 296 | 23375 | C. perfringens | 10.73 | 24.83 | 63.89 | 0.03
3 | Pig farm | 15.07.2014 | Piglet crate effluent | Clostridium difficile | 91.2 | 513766 | 9196469 | 214 | 42974 | C. difficile | 45.75 | 52.85 | 0.03 | 0.01
4 | Pig farm | 15.07.2014 | Piglet crate effluent | Clostridium difficile | 118 | 583950 | 3828920 | 56 | 68374 | C. difficile | 15.69 | 83.86 | 0.02 | 0.01
5 | Pig farm | 15.07.2014 | Combined farrow house effluent | Clostridium difficile | 120 | 613668 | 3826452 | 57 | 67131 | C. difficile | 14.81 | 84.69 | 0.08 | 0.01
6 | Pig farm | 15.07.2014 | Piglet crate effluent | Clostridium difficile | >120 | 553922 | 4054739 | 57 | 71136 | C. difficile | 5.23 | 93.57 | 0.02 | 0
7 | Pig farm | 15.07.2014 | Piglet crate effluent | Clostridium perfringens | 120 | 622796 | 4760721 | 2576 | 1848 | C. difficile | 28.4 | 70.16 | 0.74 | 0.01
8 | Pig farm | 15.07.2014 | Sow faeces | Clostridium difficile | 68.8 | 667938 | 3830384 | 45 | 85120 | C. difficile | 15.33 | 84.16 | 0.02 | 0.01
9 | Pig farm | 15.07.2014 | Combined farrow house effluent | Clostridium perfringens | 112 | 477872 | 6806631 | 1474 | 4618 | C. perfringens | 10.75 | 13.44 | 75.13 | 0.03
10 | Pig farm | 15.07.2014 | Fertiliser/lagoon effluent* | Clostridium colicanis | >120 | 453049 | 3394496 | 44 | 77148 | Unclassified | 95.17 | 0.18 | 0.7 | 0.65
12 | Pig farm | 15.07.2014 | Fertiliser/lagoon effluent* | Clostridium butyricum | 94.6 | 433858 | 4534048 | 71 | 63860 | Unclassified | 92.76 | 0.06 | 0.18 | 1.05
13 | Pig farm | 15.07.2014 | Fertiliser/lagoon effluent* | Clostridium colicanis | >120 | 521535 | 3455264 | 69 | 50076 | Unclassified | 94.46 | 0.08 | 0.82 | 0.71
14 | Pig farm | 15.07.2014 | Manure heap | Clostridium perfringens | 112 | 447642 | 3394466 | 19 | 178656 | C. perfringens | 8.57 | 0.03 | 90.92 | 0.01
15 | Pig farm | 15.07.2014 | Manure heap | Clostridium perfringens | 68.6 | 457313 | 3141691 | 84 | 37401 | C. perfringens | 9.96 | 0.03 | 89.38 | 0.04
16 | Pig farm | 15.07.2014 | Combined weaner shed effluent | Clostridium butyricum | >120 | 528191 | 4453401 | 76 | 58597 | Unclassified | 92.55 | 0.04 | 0.16 | 1.04
17 | Pig farm | 15.07.2014 | Combined weaner shed effluent | Clostridium difficile | >120 | 431443 | 3834726 | 56 | 68477 | C. difficile | 15.01 | 84.54 | 0.05 | 0.01
18 | Pig farm | 15.07.2014 | Soil (amended with effluent) | Clostridium perfringens | 106 | 443344 | 4093716 | 1820 | 2249 | C. perfringens | 9.58 | 0.02 | 89.83 | 0.04
19 | Pig farm | 15.07.2014 | Soil (amended with effluent) | Clostridium perfringens | 68 | 398784 | 3101644 | 112 | 27693 | C. perfringens | 8.84 | 0.04 | 90.38 | 0.06
20 | Pig farm | 15.07.2014 | Soil (amended with effluent) | Clostridium perfringens | 87 | 353803 | 3107330 | 93 | 33412 | C. perfringens | 9.94 | 0.03 | 89.43 | 0.02
21 | Pig farm | 15.07.2014 | Soil (amended with effluent) | Clostridium perfringens | >120 | 494861 | 3116792 | 93 | 33514 | C. perfringens | 10.79 | 0.04 | 88.53 | 0.02
22 | WWTP | 04.08.2014 | WWTP influent | Clostridium butyricum | 85 | 391122 | 4516539 | 59 | 76552 | Unclassified | 92.62 | 0.04 | 0.18 | 1.01
23 | WWTP | 04.08.2014 | WWTP influent | Clostridium perfringens | 104 | 476564 | 3307749 | 25 | 132310 | C. perfringens | 10.97 | 0.02 | 88.54 | 0.01
24 | WWTP | 04.08.2014 | WWTP influent | Clostridium difficile | 114 | 502071 | 4339582 | 43 | 100921 | C. difficile | 9.11 | 88.83 | 0.02 | 0
25 | WWTP | 04.08.2014 | WWTP influent | Clostridium perfringens | 81.6 | 527463 | 3266305 | 54 | 60487 | C. perfringens | 10.53 | 0.04 | 88.64 | 0.03
26 | WWTP | 04.08.2014 | WWTP effluent | Clostridium difficile | 83.4 | 534890 | 4303354 | 55 | 78243 | C. difficile | 4.66 | 91.69 | 0.03 | 0
27 | WWTP | 04.08.2014 | WWTP effluent | Clostridium difficile | 59.4 | 457216 | 4296951 | 42 | 102308 | C. difficile | 4.64 | 91.62 | 0.02 | 0
28 | Hospital | 04.08.2014 | Hospital wastewater effluent | Clostridium difficile | >120 | 498694 | 4323406 | 60 | 72057 | C. difficile | 6.23 | 89.35 | 0.03 | 0.01
29 | Hospital | 04.08.2014 | Hospital wastewater effluent | Clostridium difficile | 79 | 413581 | 4144436 | 40 | 103611 | C. difficile | 4.94 | 93.56 | 0.03 | 0
30 | Hospital | 04.08.2014 | Hospital wastewater effluent | Clostridium difficile | 78.6 | 384058 | 4323683 | 58 | 74546 | C. difficile | 6.25 | 89.57 | 0.02 | 0
31 | Hospital | 04.08.2014 | Hospital wastewater effluent | Clostridium difficile | 102 | 475056 | 4064553 | 39 | 104219 | C. difficile | 4.95 | 94.52 | 0.02 | 0
32 | Hospital | 04.08.2014 | Hospital wastewater effluent | Clostridium difficile | 73.6 | 524438 | 4068536 | 41 | 99233 | C. difficile | 5.15 | 94.23 | 0.02 | 0
33 | Hospital | 04.08.2014 | Hospital wastewater effluent | Clostridium difficile | 58 | 550368 | 4141364 | 40 | 103534 | C. difficile | 3.64 | 92.62 | 0.04 | 0
34 | Hospital | 04.08.2014 | Hospital wastewater effluent | Clostridium difficile | 102 | 429473 | 4066105 | 33 | 123215 | C. difficile | 5.13 | 94.3 | 0.02 | 0
35 | Hospital | 04.08.2014 | Hospital wastewater effluent | Clostridium difficile | 89.2 | 426528 | 4574826 | 297 | 15403 | C. difficile | 3.66 | 95.1 | 0.03 | 0
36 | Dairy farm | 06.08.2014 | Combined milking shed effluent | Terrisporobacter glycolicus | 97.6 | 419652 | 4112310 | 54 | 76154 | Unclassified | 97.11 | 1.15 | 0.1 | 0.07
37 | Dairy farm | 06.08.2014 | Combined milking shed effluent | Terrisporobacter glycolicus | 88.2 | 438254 | 4106488 | 46 | 89271 | Unclassified | 96.82 | 1.29 | 0.1 | 0.07
38 | Dairy farm | 06.08.2014 | Calf shed effluent | Terrisporobacter glycolicus | 49.8 | 421671 | 4132142 | 48 | 86086 | Unclassified | 96.59 | 1.38 | 0.12 | 0.08
39 | Dairy farm | 06.08.2014 | Calf shed effluent | Terrisporobacter glycolicus | 80.8 | 498836 | 4129645 | 43 | 96038 | Unclassified | 97.2 | 1.14 | 0.09 | 0.07
40 | Dairy farm | 06.08.2014 | Calf shed effluent | Clostridium perfringens | 116 | 556516 | 3214184 | 14 | 229585 | C. perfringens | 5.66 | 0.03 | 93.89 | 0.01
41 | Dairy farm | 06.08.2014 | Calf shed effluent | Clostridium perfringens | 98 | 627832 | 3215299 | 14 | 229664 | C. perfringens | 5.8 | 0.03 | 93.76 | 0.01
42 | Dairy farm | 06.08.2014 | Calf shed effluent | Clostridium perfringens | 102 | 539625 | 3972568 | 1075 | 3695 | C. perfringens | 23.4 | 0.67 | 74.6 | 0.01
43 | Hospital | 15.09.2014 | Hospital wastewater effluent | Clostridium difficile | 83.8 | 466178 | 4098800 | 37 | 110778 | C. difficile | 2.79 | 95.05 | 0.02 | 0
44 | Hospital | 15.09.2014 | Hospital wastewater effluent | Clostridium difficile | 29.8 | 434080 | 4102890 | 36 | 113969 | C. difficile | 2.68 | 95.39 | 0.02 | 0
45 | Hospital | 15.09.2014 | Hospital wastewater effluent | Clostridium difficile | 30 | 414952 | 4100726 | 35 | 117164 | C. difficile | 2.52 | 95.56 | 0.02 | 0
46 | Hospital | 29.09.2014 | Hospital wastewater effluent | Clostridium perfringens | 20.4 | 466087 | 3272482 | 50 | 65450 | C. perfringens | 8.75 | 0.03 | 90.65 | 0
47 | Hospital | 29.09.2014 | Hospital wastewater effluent | Clostridium perfringens | 28.4 | 432151 | 3272685 | 32 | 102271 | C. perfringens | 8.46 | 0.03 | 91 | 0.01
48 | Hospital | 27.10.2014 | Hospital wastewater effluent | Clostridium difficile | 20.6 | 460378 | 6424279 | 47 | 136687 | Unclassified | 94.51 | 0.14 | 0.07 | 0
49 | Hospital | 27.10.2014 | Hospital wastewater effluent | Streptococcus lutetiensis | 86.4 | 503362 | 4537607 | 30 | 151254 | Unclassified | 96.15 | 0.31 | 0.06 | 0
50 | Dairy farm | 15.09.2014 | Fertiliser/lagoon effluent | Clostridium beijerinckii | 81.2 | 534596 | 4553648 | 57 | 79889 | Unclassified | 93.04 | 0.07 | 0.2 | 0.9
51 | Dairy farm | 29.09.2014 | Fertiliser/lagoon effluent | Clostridium beijerinckii | 39.8 | 495111 | 4665555 | 87 | 53627 | Unclassified | 93.26 | 0.09 | 0.19 | 1.05
52 | River | 01.10.2014 | River water | Clostridium beijerinckii | 19.1 | 388996 | 6542271 | 55 | 118950 | Unclassified | 96.63 | 0.21 | 0.05 | 0
53 | River | 01.10.2014 | River water | Clostridium difficile | 77.4 | 497315 | 3737181 | 30 | 124573 | C. botulinum | 11.41 | 0.03 | 0.02 | 87.34
54 | River | 01.10.2014 | River water | Clostridium beijerinckii | 80.6 | 484115 | 3572217 | 22 | 162374 | C. botulinum | 8.12 | 0.03 | 0.02 | 90.72
55 | Dairy farm | 29.09.2014 | Fertiliser/lagoon effluent | Clostridium perfringens | 59 | 489987 | 3524380 | 17 | 207316 | Unclassified | 76.6 | 0.05 | 22.01 | 0.23
56 | Hospital | 29.09.2014 | Hospital wastewater effluent | Clostridium perfringens | 15.7 | 506820 | 3276555 | 30 | 109219 | C. perfringens | 9.69 | 0.14 | 89.69 | 0.01
57 | Hospital | 29.09.2014 | Hospital wastewater effluent | Clostridium perfringens | >120 | 582916 | 3280735 | 25 | 131229 | C. perfringens | 8.52 | 0.03 | 91.01 | 0.01
58 | Hospital | 29.09.2014 | Hospital wastewater effluent | Clostridium perfringens | 16.6 | 587255 | 7502072 | 129 | 58156 | C. perfringens | 9.21 | 21.14 | 68.79 | 0
59 | Hospital | 29.09.2014 | Hospital wastewater effluent | Clostridium perfringens | >120 | 523965 | 3281160 | 22 | 149144 | C. perfringens | 8.8 | 0.06 | 90.69 | 0.01

* fertiliser/lagoon effluent sample split and used for pig farm metagenome

The Kraken annotation of WGS data classified all isolates as being either a Clostridium species (43 isolates) or unclassified (16 isolates).

5.4.1.
Unclassified environmental isolates

Subsequent attempts to classify the unclassified isolates using Metaphlan2 resulted in the taxonomic assignment of reads to the phylum Firmicutes (mainly to the families Peptostreptococcaceae and Erysipelotrichaceae or to the genus Clostridium). Based on the assembled draft genome of each isolate (including size and number of contigs) and the annotation of reads matching Metaphlan2 taxonomic marker genes, 15 of the 16 unclassified isolates were putatively classified as Firmicutes (Table 5.2). Draft genomes of the isolates were compared to the NCBI Refseq database using BLAST; each isolate matched the corresponding ‘WGS species classification’ reference sequence with an average of 80% coverage (at 99% identity, data not shown), potentially indicating novel bacterial species.

Table 5.2 Unclassified environmental isolates and their identified ARGs. ARGs identified across these isolates (ARDB): ant6ia, bacA, mefA, mphB, tetO, tetPA, tetPB and vatB.

Isolate | Effluent type | WGS species classification (Metaphlan2) | 16S RNA | Total length | No. contigs | Avg. contig length | No. ARGs identified (ARDB)
#10 | Pig farm | Clostridium colicanis | Clostridium colicanis | 3394496 | 44 | 77147.64 | 2
#12 | Pig farm | Clostridium butyricum | Clostridium butyricum | 4534048 | 71 | 63859.83 | 1
#13 | Pig farm | Clostridium colicanis | Clostridium colicanis | 3455264 | 69 | 50076.29 | 3
#16 | Pig farm | Clostridium butyricum | Clostridium butyricum | 4453401 | 76 | 58597.38 | 3
#22 | WWTP* | Clostridium butyricum | Clostridium butyricum | 4516539 | 59 | 76551.51 | 1
#36 | Dairy farm | Peptostreptococcaceae sp. | Terrisporobacter glycolicus | 4112310 | 54 | 76153.89 | 3
#37 | Dairy farm | Peptostreptococcaceae sp. | Terrisporobacter glycolicus | 4106488 | 46 | 89271.48 | 3
#38 | Dairy farm | Peptostreptococcaceae sp. | Terrisporobacter glycolicus | 4132142 | 48 | 86086.29 | 3
#39 | Dairy farm | Peptostreptococcaceae sp. | Terrisporobacter glycolicus | 4129645 | 43 | 96038.26 | 3
#49 | Hospital | Erysipelotrichaceae bacterium | Streptococcus lutetiensis | 4537607 | 30 | 151253.57 | 3
#50 | Dairy farm | Clostridium butyricum | Clostridium beijerinckii | 4553648 | 57 | 79888.56 | 1
#51 | Dairy farm | Clostridium butyricum | Clostridium beijerinckii | 4665555 | 87 | 53627.07 | 1
#52 | River | Clostridium bolteae | Clostridium beijerinckii | 6542271 | 55 | 118950.38 | 2
#53 | River | Clostridium botulinum | Clostridium difficile | 3737181 | 30 | 124572.7 | 2
#54 | River | Clostridium botulinum | Clostridium beijerinckii | 3572217 | 22 | 162373.5 | 2

* denotes WWTP influent

The draft genome of each potentially novel bacterial isolate was ordered against the corresponding ‘WGS species classification’ reference sequence (as retrieved from the Metaphlan2 database). The ordered genomes were then compared among themselves, revealing incomplete but high-identity synteny in some areas (C. butyricum isolates shown as representative in Figure 5.1), potentially indicating some evolutionary relationship among the novel bacterial isolates.

Figure 5.1 Genome synteny (displayed using ACT) of potentially novel bacterial isolates, related to C. butyricum.

Analysis of the ARG content of each of these potentially novel bacterial isolates revealed a total of 33 ARGs. All Peptostreptococcaceae-like isolates (dairy farm) contained the tetracycline resistance genes tetPA and tetPB. All Clostridium botulinum-like isolates (river) contained the streptogramin resistance gene vatB. The Erysipelotrichaceae-like isolate (hospital) was the only non-difficile/non-perfringens isolate to contain an aminoglycoside resistance gene (ant6ia).

5.4.1. Clostridium difficile isolates

Based on the assembled draft genome and the percentage of reads matching taxonomic marker genes (Kraken annotations, Table 5.1), a total of 19 isolates had WGS data of sufficient quantity and quality to putatively annotate the isolate as C. difficile.

5.4.1.1.
Phylogenetic analysis

To assess the relatedness of the 19 C. difficile isolates obtained from the three different environmental effluents (hospital, WWTP and pig farm), a whole genome phylogeny was constructed alongside a set of references. The phylogeny showed that the majority of environmental C. difficile isolates fell into four distinct clades (Clades A-D, Figure 5.2). All 11 hospital C. difficile isolates fell into clades A, B and C. The two isolates from the WWTP effluent fell into clade C, whilst the isolate from the WWTP influent (isolate #24) was most closely related to the C. difficile M68 reference. For the pig farm, four of the isolates fell into clade D (clustering with the C. difficile M120 reference) and the remaining isolate (#6) fell outside any of the defined clades. Notably, clade B had isolates from two collection dates, a month and a half apart. Isolate #29 from hospital wastewater effluent (collected 04.08.2014) was 3295, 3283 and 3278 SNPs apart from isolates #43, #44 and #45 respectively, all of which were collected from hospital wastewater effluent on 15.09.2014.

Figure 5.2 Whole genome phylogenetic tree of Clostridium difficile isolates. The phylogenetic tree shows the evolutionary relationships among the C. difficile isolates and reference genomes. Colour coding represents the origin of the isolates. Bootstrap values less than 90 are shown. Clades containing multiple novel samples are highlighted by lettered orange lines. The scale bar is in expected number of nucleotide substitutions per site (n = 188278).

To ensure the robustness of the results from the whole genome phylogeny of the isolates, and to explore the genetic content of the isolates, a core genome analysis of the C. difficile isolates was performed. The core genome was found to consist of 2521 genes (aligned to 2400468 sites), the total number of accessory genes was 4798 and the average number of genes per isolate was 3703 (Figure 5.3).
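The core and accessory gene counts reported above can be illustrated with a minimal Python sketch. It assumes a strict definition of the core genome (genes present in every isolate); that definition is an assumption made here for illustration, since the threshold and software used for the analysis are not restated in this section. The isolate labels and gene names are hypothetical toy data.

```python
from typing import Dict, Set, Tuple


def summarise_pan_genome(gene_sets: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str], float]:
    """Return (core genes, accessory genes, mean genes per isolate).

    Assumption: a gene is 'core' only if it is present in every isolate.
    """
    if not gene_sets:
        raise ValueError("at least one isolate is required")
    per_isolate = list(gene_sets.values())
    pan_genome = set().union(*per_isolate)      # every gene seen in any isolate
    core = set.intersection(*per_isolate)       # genes shared by all isolates
    accessory = pan_genome - core               # genes missing from at least one isolate
    mean_genes = sum(len(genes) for genes in per_isolate) / len(per_isolate)
    return core, accessory, mean_genes


# Hypothetical presence/absence data for three toy isolates.
toy_isolates = {
    "isolate_1": {"gyrA", "rpoB", "tetM"},
    "isolate_2": {"gyrA", "rpoB", "ermB"},
    "isolate_3": {"gyrA", "rpoB"},
}

core, accessory, mean_genes = summarise_pan_genome(toy_isolates)
print(sorted(core), sorted(accessory), round(mean_genes, 2))
```

In this toy case the two shared genes form the core and the two resistance genes are accessory, which mirrors how horizontally acquired ARGs would be expected to sit in the accessory genome.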
Figure 5.3 Rarefaction curves for Clostridium difficile core genome analysis. A. Rarefaction curve showing the number of conserved genes for the isolates, i.e. the core genome. B. Rarefaction curve showing the number of genes in the pan-genome, i.e. the combined core and accessory genes.

The core genes were aligned and the SNPs were used to build a core genome-based phylogeny. When the core genome-based tree was compared to the earlier mapping-based tree, the phylogenetic trees had a consistent topology (Figure 5.4). The consistent topology suggests that the previously defined clades (see Figure 5.2) are robust and can be used to group the C. difficile isolates for subsequent ARG analysis.

Figure 5.4 Tanglegram comparing two phylogenetic trees for the Clostridium difficile isolates (generated using different approaches). The left phylogeny (mapping-based tree) from Figure 5.2 is compared with the right phylogeny (core genome-based tree), which was generated using 93953 SNPs present in the core genome of the isolates. Horizontal lines between trees connect cognate isolates.

5.4.1.2. Antimicrobial resistance genes

For the 19 isolates that were identified as C. difficile, a total of 48 ARGs were identified using the ARDB isolate annotation tool (ARDBanno.pl; Liu and Pop, 2009) (Table 5.3). The 48 ARGs encompassed resistance to five classes of antimicrobial (aminoglycoside, bacitracin, macrolide, tetracycline and glycopeptide). In relation to the clades identified in the phylogenetic analysis (Figure 5.2), the aminoglycoside, macrolide and tetracycline ARGs were only found in clades A and C.

Table 5.3. Clostridium difficile isolates and their identified ARGs. Kraken results are percentages of identified reads. ant6ia, bacA, ermB, tetM and vanRG are identified ARGs (ARDB); tcdA and tcdB are toxin genes (core genome). 1 denotes presence.

| Isolate | Effluent type | Total length | No. contigs | Avg. contig length | Unclassified (%) | C. difficile (%) | C. perfringens (%) | C. botulinum (%) | Ribotype | Clade | ant6ia | bacA | ermB | tetM | vanRG | tcdA | tcdB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| #28 | Hospital | 4323406 | 60 | 72057 | 6.23 | 89.35 | 0.03 | 0.01 | 39 | C | 1 | 1 | 1 | 1 | 1 | | |
| #29 | Hospital | 4144436 | 40 | 103611 | 4.94 | 93.56 | 0.03 | 0 | 14 | B | | 1 | | | 1 | 1 | 1 |
| #30 | Hospital | 4323683 | 58 | 74546 | 6.25 | 89.57 | 0.02 | 0 | 39 | C | 1 | 1 | 1 | 1 | 1 | | |
| #31 | Hospital | 4064553 | 39 | 104219 | 4.95 | 94.52 | 0.02 | 0 | 11 | A | | 1 | | | 1 | 1 | 1 |
| #32 | Hospital | 4068536 | 41 | 99233 | 5.15 | 94.23 | 0.02 | 0 | 11 | A | | 1 | | | 1 | 1 | 1 |
| #33 | Hospital | 4141364 | 40 | 103534 | 3.64 | 92.62 | 0.04 | 0 | 10 | C | 1 | 1 | 1 | | 1 | | |
| #34 | Hospital | 4066105 | 33 | 123215 | 5.13 | 94.3 | 0.02 | 0 | 11 | A | | 1 | | | 1 | 1 | 1 |
| #35 | Hospital | 4574826 | 297 | 15403 | 3.66 | 95.1 | 0.03 | 0 | unknown | A | 1 | 1 | 1 | 1 | 1 | | 1 |
| #43 | Hospital | 4098800 | 37 | 110778 | 2.79 | 95.05 | 0.02 | 0 | untested | B | | 1 | | | 1 | 1 | 1 |
| #44 | Hospital | 4102890 | 36 | 113969 | 2.68 | 95.39 | 0.02 | 0 | untested | B | | 1 | | | 1 | | 1 |
| #45 | Hospital | 4100726 | 35 | 117164 | 2.52 | 95.56 | 0.02 | 0 | untested | B | | 1 | | | 1 | 1 | 1 |
| #17 | Pig farm | 3834726 | 56 | 68477 | 15.01 | 84.54 | 0.05 | 0.01 | 78 | D | | 1 | | | | 1 | 1 |
| #4 | Pig farm | 3828920 | 56 | 68374 | 15.69 | 83.86 | 0.02 | 0.01 | 78 | D | | 1 | | | | | 1 |
| #5 | Pig farm | 3826452 | 57 | 67131 | 14.81 | 84.69 | 0.08 | 0.01 | 78 | D | | 1 | | | | | 1 |
| #6 | Pig farm | 4054739 | 57 | 71136 | 5.23 | 93.57 | 0.02 | 0 | 26 | - | | 1 | | | 1 | 1 | 1 |
| #8 | Pig farm | 3830384 | 45 | 85120 | 15.33 | 84.16 | 0.02 | 0.01 | 78 | D | | 1 | | | | | 1 |
| #24 | WWTP* | 4339582 | 43 | 100921 | 9.11 | 88.83 | 0.02 | 0 | unknown | - | | 1 | | | | | |
| #26 | WWTP | 4303354 | 55 | 78243 | 4.66 | 91.69 | 0.03 | 0 | 39 | C | 1 | 1 | | 1 | 1 | | |
| #27 | WWTP | 4296951 | 42 | 102308 | 4.64 | 91.62 | 0.02 | 0 | 140 | C | 1 | 1 | | 1 | 1 | | |

* denotes WWTP influent

5.4.1.2.1. Aminoglycoside resistance genes

All five isolates in clade C and one isolate (#35) in clade A had an identical copy of the aminoglycoside resistance gene ant6ia, which was not present in any of the reference genomes. Comparative genomics of these six isolates to the C. difficile 630 reference genome (the most closely related reference genome in the phylogeny) revealed that ant6ia was contained within an insertion at base 1821844 of the reference genome (shown for isolate #26, Figure 5.5). The ant6ia-containing insertion was 8090 bases long and had an average GC content of 47.53%, 1.6 times the average GC content of C. difficile (29.06%; Figure 5.5; Sebaihia et al., 2006).
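The GC-content comparison above is simple enough to reproduce by hand. The Python sketch below shows one way to compute GC content for a sequence and checks the reported 1.6-fold ratio from the two percentages given in the text. The function name and the toy sequence are illustrative assumptions, not part of the original analysis.

```python
def gc_content(sequence: str) -> float:
    """Return the GC content (%) of a DNA sequence, ignoring non-ACGT symbols."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / (gc + at)


# Toy sequence (hypothetical), half G/C and half A/T.
toy_gc = gc_content("GGCCAATT")

# Percentages reported in the text for the insertion and for C. difficile.
insertion_gc = 47.53
genome_gc = 29.06
fold_difference = insertion_gc / genome_gc

print(toy_gc, round(fold_difference, 1))
```

A GC content that departs this far from the host genome average is one of the signals used later in the chapter to argue that an element was horizontally acquired.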
Notably, the ant6ia-containing insertion was located adjacent to another insertion containing phage-related genes (Figure 5.5, phage-like insertion).

Figure 5.5 Genome synteny between the Clostridium difficile reference 630 genome and contig 5 of isolate #26 (featuring ant6ia-containing insertion and a phage-like insertion). Aqua-coloured boxes represent putative coding sequences of each reading frame.

5.4.1.2.2. Tetracycline resistance genes

In addition to ant6ia, four of the C. difficile isolates from clade C were found to contain the tetracycline resistance gene tetM. Comparative genomics of the identified tetM genes showed that they were 100% identical among all four isolates and, when compared to the non-redundant nucleotide collection, the tetM gene was found to share 100% identity with the tetM gene of conjugative transposon Tn5397 (GI: 669664115). Comparison of the tetM-containing contigs of the clade C isolates showed that they were identical to each other, and had collinear synteny with the C. difficile 630 reference genome and the conjugative transposon Tn5397 (isolate #26 shown as representative in Figure 5.6, A). The reference genome and Tn5397 shared 100% identity over the length of the transposon; however, the four isolates from clade C all featured a deletion of 2795 bases from the transposon (henceforth referred to as the Tn5397-like element, Figure 5.6). Alignment of tetM-containing contigs to the C. difficile 630 reference genome revealed that the Tn5397-like element has changed location within the clade C isolates, relative to the reference genome (isolate #26 shown as representative in Figure 5.6, B). This was not attributable to a mis-assembly, as read mapping was consistent across contigs (data not shown). Inspection of the Tn5397-like element in each clade C isolate showed an average GC content of 38.61%.
In addition to the transposition of the tetM gene in the clade C isolates, the isolates were also found to be missing the nearby toxin-encoding genes tcdA and tcdB (confirmed in the core genome analysis, Figure 5.3).

Figure 5.6. A. Genome synteny among the Clostridium difficile reference 630 genome, the conjugative transposon Tn5397, and contig 18 of isolate #26. B. Genome synteny between the C. difficile reference 630 genome and contig 18 of isolate #26, showing varied location of the Tn5397 transposon and the Tn5397-like element. In addition, there has been a deletion of tcdA and tcdB, downstream of the element, in isolate #26 (as for all tetM-containing clade C isolates, not shown).

Isolate #35 (clade A) had a tetM gene divergent from the tetM gene found in the clade C isolates (henceforth referred to as tetM-type II), with which it shared 89% identity. When compared to the non-redundant nucleotide collection, the tetM-type II gene was found to share 100% identity with a Tn5801-like tetM gene (GI: 220898661).

5.4.1.2.3. Other types of antimicrobial resistance genes

For the 19 C. difficile isolates, ARG annotation with ARDB identified 14 isolates that contained the glycopeptide resistance gene vanRG. On closer inspection, these 14 isolates were found to contain the entire vanG-type glycopeptide resistance locus described by Sebaihia et al. (2006). The five isolates that did not feature the vanG-type glycopeptide resistance locus were the four clade D isolates (pig farm) and the clade-less isolate #24 (WWTP influent). Three isolates from clade C (#28, #30, #33) and one isolate from clade A (#35) were found to contain the macrolide resistance gene ermB, encoding an adenine methyltransferase. ARG annotation with ARDB also revealed all isolates to contain the bacitracin resistance gene bacA. However, bacA is only known to confer antimicrobial resistance if overexpressed (Ghachi et al., 2005).

5.4.2.
Clostridium perfringens isolates

Based on the assembled draft genome and the percentage of reads matching taxonomic marker genes (Table 5.1), a total of 14 isolates had WGS data of sufficient quantity and quality to putatively annotate the isolate as C. perfringens.

5.4.2.1. Phylogenetic analysis

To assess the relatedness of the 14 C. perfringens isolates from the four different environmental effluents (dairy farm, hospital, pig farm and WWTP), a whole genome phylogeny was constructed alongside a set of references. The phylogeny showed that the majority of environmental C. perfringens isolates fell into four distinct clades (Clades A-D, Figure 5.7).

Figure 5.7 Phylogenetic tree of Clostridium perfringens isolates. The phylogenetic tree shows the evolutionary relationships between the C. perfringens isolates and reference genomes. Colour coding represents the origin of the isolates. Bootstrap values less than 90 are shown. Clades containing multiple novel isolates are highlighted by lettered orange lines (A-D). The scale bar is in expected number of nucleotide substitutions per site (n = 139845).

To ensure the robustness of the results from the whole genome phylogeny of the isolates, and to explore the gene diversity of the isolates, a core genome analysis of the C. perfringens isolates was performed. The core genome was found to consist of 1842 genes (aligned to 1774527 sites), the total number of accessory genes was 4982 and the average number of genes per isolate was 2850 (Figure 5.8).

Figure 5.8 Rarefaction curves for Clostridium perfringens core genome analysis. A. Rarefaction curve showing the number of conserved genes for the isolates, i.e. the core genome. B. Rarefaction curve showing the number of genes in the pan-genome, i.e. the combined core and accessory genes.

The core genes were aligned and the SNPs were used to build a core genome-based phylogeny.
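As a rough illustration of how SNPs can be taken from a core gene alignment, the following Python sketch lists the variable columns of a toy alignment and counts pairwise SNP differences. The sequences and isolate names are hypothetical, gaps and ambiguous bases are simply ignored, and this is not necessarily how the alignments were processed for the trees in this chapter.

```python
from typing import Dict, List

# Symbols that are not treated as evidence of a SNP (assumption of this sketch).
IGNORED = {"-", "N"}


def snp_sites(alignment: Dict[str, str]) -> List[int]:
    """Return 0-based alignment columns where at least two different bases occur."""
    seqs = [s.upper() for s in alignment.values()]
    if not seqs:
        raise ValueError("alignment is empty")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    sites = []
    for i in range(length):
        bases = {s[i] for s in seqs} - IGNORED
        if len(bases) > 1:
            sites.append(i)
    return sites


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count columns where two aligned sequences carry different bases."""
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in IGNORED and b not in IGNORED
    )


# Hypothetical toy alignment of three isolates.
toy_alignment = {
    "isolate_a": "ATGCCGTA",
    "isolate_b": "ATGACGTA",
    "isolate_c": "ATGACGTT",
}

variable_sites = snp_sites(toy_alignment)
distance_ab = snp_distance(toy_alignment["isolate_a"], toy_alignment["isolate_b"])
distance_ac = snp_distance(toy_alignment["isolate_a"], toy_alignment["isolate_c"])
print(variable_sites, distance_ab, distance_ac)
```

Only the variable columns carry phylogenetic signal, which is why the tanglegram captions report the number of SNPs rather than the full alignment length.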
When the core genome-based tree was compared to the earlier mapping-based tree, the phylogenetic trees had a consistent topology (Figure 5.9). The consistent topology illustrated by the tanglegram suggests that the previously defined clades (see Figure 5.7) are robust and could be used to group the C. perfringens isolates for subsequent ARG analysis.

Figure 5.9 Tanglegram comparing two phylogenetic trees for the Clostridium perfringens isolates (generated using different approaches). The left phylogeny (mapping-based tree) from Figure 5.7 is compared with the right phylogeny (core genome-based tree), which was generated using 70982 SNPs present in the core genome of the isolates. Horizontal lines between trees connect the isolates.

5.4.2.2. Antimicrobial resistance genes

For the 14 isolates that were identified as C. perfringens, a total of 33 ARGs were identified using the ARDB isolate annotation tool (ARDBanno; Liu and Pop, 2009) (Table 5.4). The 33 ARGs encompassed resistance to three classes of antimicrobial (bacitracin, macrolide and tetracycline). In relation to the clades identified in the phylogenetic analysis (Figure 5.7), the macrolide ARGs were only found in clade D and the tetracycline ARGs were only found in clades A, C and D. The two dairy farm isolates (clade B) were the only C. perfringens isolates that did not contain tetracycline resistance genes.

Table 5.4 Clostridium perfringens isolates and their identified ARGs. Kraken results are percentages of identified reads. bacA, ermQ, mphB, tetPA and tetPB are identified ARGs (ARDB). 1 denotes presence.

| Isolate | Effluent type | Total length | No. contigs | Avg. contig length | Unclassified (%) | C. difficile (%) | C. perfringens (%) | C. botulinum (%) | Clade | bacA | ermQ | mphB | tetPA | tetPB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| #40 | Dairy farm | 3214184 | 14 | 229585 | 5.66 | 0.03 | 93.89 | 0.01 | B | 1 | | | | |
| #41 | Dairy farm | 3215299 | 14 | 229664 | 5.8 | 0.03 | 93.76 | 0.01 | B | 1 | | | | |
| #47 | Hospital | 3272685 | 32 | 102271 | 8.46 | 0.03 | 91 | 0.01 | C | 1 | | | 1 | |
| #57 | Hospital | 3280735 | 25 | 131229 | 8.52 | 0.03 | 91.01 | 0.01 | C | 1 | | | 1 | |
| #46 | Hospital | 3272482 | 50 | 65450 | 8.75 | 0.03 | 90.65 | 0 | C | 1 | | | 1 | |
| #59 | Hospital | 3281160 | 22 | 149144 | 8.8 | 0.06 | 90.69 | 0.01 | C | 1 | | | 1 | |
| #56 | Hospital | 3276555 | 30 | 109219 | 9.69 | 0.14 | 89.69 | 0.01 | C | 1 | | | 1 | |
| #14 | Pig farm | 3394466 | 19 | 178656 | 8.57 | 0.03 | 90.92 | 0.01 | A | 1 | | | 1 | 1 |
| #19 | Pig farm | 3101644 | 112 | 27693 | 8.84 | 0.04 | 90.38 | 0.06 | D | 1 | | 1 | 1 | |
| #20 | Pig farm | 3107330 | 93 | 33412 | 9.94 | 0.03 | 89.43 | 0.02 | D | 1 | | | 1 | 1 |
| #15 | Pig farm | 3141691 | 84 | 37401 | 9.96 | 0.03 | 89.38 | 0.04 | D | 1 | 1 | 1 | 1 | |
| #21 | Pig farm | 3116792 | 93 | 33514 | 10.79 | 0.04 | 88.53 | 0.02 | D | 1 | | | 1 | 1 |
| #25 | WWTP* | 3266305 | 54 | 60487 | 10.53 | 0.04 | 88.64 | 0.03 | A | 1 | | | 1 | 1 |
| #23 | WWTP* | 3307749 | 25 | 132310 | 10.97 | 0.02 | 88.54 | 0.01 | C | 1 | | | 1 | |

* denotes WWTP influent

5.4.2.2.1. Macrolide resistance genes

Two of the C. perfringens isolates contained ARGs conferring resistance to macrolides. Isolate #19 contained the macrolide resistance gene mphB and isolate #15 contained both ermQ and mphB. Comparative genomics of isolate #15 to the C. perfringens ATCC13124 reference genome (the most closely related reference genome in the phylogenetic tree) revealed that ermQ was contained within an insertion at base 621786 of the reference genome (Figure 5.10). The ermQ-containing insertion was 4438 bases long and had an average GC content of 26.89% (consistent with the broader C. perfringens genome).

Figure 5.10 Comparison (displayed using ACT) between the Clostridium perfringens reference ATCC13124 genome and contig 9 of isolate #15 (featuring ermQ-containing insertion).

The mphB gene present in both isolate #15 and isolate #19 was found to be identical. In both isolates, the mphB gene was located at the end of a contig; therefore, a construct was made with adjacent contigs (adjacent relative to the reference genome) for each isolate.
When the mphB construct was compared to the ATCC13124 reference genome, it was found to contain an insertion of 4246 bases at base 428411 of the reference genome (Figure 5.11). The average GC content of the mphB-containing insertion was 35.81%.

Figure 5.11 Genome synteny between the Clostridium perfringens reference ATCC13124 genome, a construct of contigs 42, 31 and 28 of isolate #15 and contig 31 (containing mphB) of isolate #15.

5.4.2.2.2. Tetracycline resistance genes

Of the 14 isolates, 12 contained the tetracycline resistance gene tetPA (encoding a tetracycline efflux pump). The tetPA gene shared an average of 98% identity among the 12 isolates; however, it was present in different genomic locations amongst isolates. Where isolates were closely related, the tetPA gene was found in the same genomic context; for example, in clade C the tetPA gene was present in an insertion at base 3203264 of the reference (isolate #23 shown as representative in Figure 5.12). The tetPA insertion was 10904 bp long, had a GC content of 26.90% and also contained phage-like proteins on a single strand.

Figure 5.12 Comparison (displayed using ACT) between the Clostridium perfringens reference ATCC13124 genome and the tetPA-containing contig of isolate #23.

In addition to tetPA, four isolates also contained the tetracycline resistance gene tetPB (encoding a ribosomal protection protein) adjacent to a second copy of tetPA. The tetPB gene was conserved between isolates but was not present in the reference genome. Using isolate #14 as a representative, the contig containing tetPB was compared to the non-redundant nucleotide database using NCBI BLAST and was found to be similar to the C. perfringens plasmid pCW3 (GenBank: DQ366035.1). The plasmid pCW3 covered 63% of the contig with 99% identity. Comparative genomics showed that the tetPB-containing contig had a large (21777 bases) insertion relative to the pCW3 plasmid reference (Figure 5.13).

Figure 5.13 A.
Genome synteny between the Clostridium perfringens plasmid pCW3 and the tetPB and tetPA containing contig of isolate #14. B. Circularised representation of tetPB-containing plasmid (isolate #14 tetPB-containing plasmid shown).

5.4.3. Annotating endospore-formers from metagenomes

To compare the taxonomic results of the metagenomic analysis and bacterial isolation from environmental samples, a split sample was used. A sample taken from the pig farm was used to prepare a metagenomic analysis (using methods described in Chapters 2-4), which was compared to three isolates obtained from that same sample (see footnote in Table 5.1). Metagenomic analysis binned 2% of reads into the Firmicutes phylum, 1% into the Clostridia class and 0.3% into Clostridium sticklandii (C. sticklandii). No metagenomic reads were annotated as C. colicanis or C. butyricum, the species most closely related to the bacteria isolated from the sample (see Table 5.1, section 5.4.1). Further attempts were made to identify sequences related to C. sticklandii (GenBank: NC_014614) and the draft genome of one of the unknown isolates (#10, C. colicanis-like) from the metagenomic data. BLAST comparison of the assembled metagenome (a total of 14689361 contigs with an N50 of 153 bases) yielded an alignment that covered 6% of the C. sticklandii reference (Figure 5.14), whereas only 2% of the draft genome of isolate #10 was covered. However, mapping coverage of metagenomic reads against both genomes was comparable (26% for C. sticklandii and 25% for isolate #10).

Figure 5.14 Genome synteny between the Clostridium sticklandii reference genome (GI: 310657316) and the metagenome contigs of the pig farm lagoon effluent.

5.5. Discussion

This chapter has shown that endospores of bacteria can be successfully reactivated to culture clinically relevant pathogens from all effluent sampling locations.
Furthermore, the endospore-forming bacteria have been found to contain ARGs, and these ARGs show evidence of having been horizontally transferred.

Samples of various environmental effluents were collected over a period of four months and the endospores in each sample were selected, reactivated and used to culture anaerobic bacteria. Of the cultured bacteria, those with Clostridium sp. colony morphology were subsampled and subjected to 16S sequencing for preliminary identification. A total of 59 isolates were then subjected to WGS. Although all isolates received preliminary identification by 16S sequencing, some isolates were unclassified after Kraken classification of the WGS data. Some WGS data also produced poor-quality assemblies. This was addressed by quality checking the WGS data and removing isolates from further analysis if the number of contigs exceeded 200, or if the assembled draft genome size was unusually large, and then re-classifying the remaining isolates with Metaphlan2. This approach left 48 isolates, all successfully classified as Firmicutes, which were used in the next stages of the analysis.

Metaphlan2 may have classified isolates that Kraken could not because, unlike Kraken, Metaphlan2 does not attempt to assign every read to a taxonomic level (Segata et al., 2012), but uses taxonomic markers to bin reads and identify taxa. Consequently, the isolates that Kraken was unable to classify, but which passed WGS data quality checking and Metaphlan2 classification, are likely to be divergent or novel species of bacteria that are related to endospore-forming Firmicutes. These isolates could therefore be hitherto unknown clinically relevant pathogens. Considering that 15 of the 48 sequenced and quality-checked isolates were unclassified environmental isolates (Table 5.2), the issue of database limitations in the identification of bacterial species from WGS data is an important one that should be addressed.
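The quality-checking rule described above (discard an isolate if its assembly exceeds 200 contigs or if its draft genome is unusually large) can be sketched as a simple filter. The contig limit comes from the text; the genome size cut-off of 7 Mb, the class name and the example assemblies are hypothetical, because no numeric size threshold is stated in this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Assembly:
    isolate: str
    n_contigs: int
    total_length: int  # assembled draft genome size in bases


def passes_qc(
    assembly: Assembly,
    max_contigs: int = 200,              # limit stated in the text
    max_total_length: int = 7_000_000,   # hypothetical cut-off for 'unusually large'
) -> bool:
    """Return True if the assembly satisfies both quality criteria."""
    return assembly.n_contigs <= max_contigs and assembly.total_length <= max_total_length


# Hypothetical assemblies, not taken from the tables in this chapter.
candidates: List[Assembly] = [
    Assembly("toy_1", n_contigs=44, total_length=3_400_000),
    Assembly("toy_2", n_contigs=450, total_length=4_100_000),   # too fragmented
    Assembly("toy_3", n_contigs=60, total_length=9_800_000),    # unusually large
]

kept = [a.isolate for a in candidates if passes_qc(a)]
print(kept)
```

A very large or highly fragmented assembly often points to a mixed culture or contamination, which is one plausible reason for excluding such isolates before classification.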
Further work is required to annotate the genomes of unclassified environmental isolates, in order to augment the available databases and facilitate in silico identification of bacterial isolates. More work is also needed to characterise the nature and pathogenic potential of these apparently novel bacterial species.

Of the 48 isolates that were sequenced and passed quality checking, 19 isolates were classified as Clostridium difficile and 14 were classified as Clostridium perfringens, both of which are pathogenic bacteria that cause significant clinical disease in both human and veterinary medicine. Indeed, for the months that C. difficile was isolated from the hospital effluent, cases of C. difficile infection were recorded by Public Health England’s data capture system for the hospital being sampled (Public Health England, 2013). Clade B of the C. difficile phylogenetic tree contained isolates from two collection dates, a month and a half apart. Additionally, clade C of the same phylogeny featured C. difficile isolates that were from different sampling locations (hospital and WWTP effluent samples) (Figure 5.2). Further work is required to see if identical isolates can be recovered over prolonged periods of time, thus potentially demonstrating the prolonged persistence of specific pathogenic bacteria in the environment.

These isolates of pathogenic bacteria were found to harbour ARGs that conferred resistance to several classes of commonly used antimicrobials. The similarity of ARG-containing contigs to known MGEs and the abnormal GC content of ARG-containing elements, in addition to the transposase and recombinase genes found in close proximity to the ARGs, suggest that these ARGs were likely to be horizontal acquisitions (Baran and Ko, 2008). This indicates the potential for bacterial endospores that are present in the environment to harbour mobilisable ARGs. In addition to ARGs, some of the C. difficile and C.
perfringens isolates also contained toxin-encoding genes. In the case of C. difficile, eight isolates had both C. difficile toxin-encoding genes (tcdA and tcdB) and five isolates had the toxin-encoding gene tcdB only. It has been reported that only toxin B is required for C. difficile virulence (Lyras et al., 2009), meaning that 13 of the 19 C. difficile isolates (68%) were likely to be virulent. The presence of toxin-encoding genes, in addition to ARGs, in the genomes of these isolates of pathogenic bacteria suggests that environmental instances of these bacteria may have clinical significance if they were to re-enter human or animal populations. This is also suggested by the phylogenetic relatedness of these isolates to known pathogenic reference genomes. More extensive monitoring of effluents and effluent destinations may help elucidate the fate of pathogenic bacteria that are entering the environment.

Because all sampled effluents on the pig farm entered a single effluent lagoon (from where the lagoon effluent is subsequently used as a fertiliser), a metagenome was generated from the lagoon effluent for use as a background sample. Although both C. difficile and C. perfringens were cultured from endospores collected from various pig farm effluents (but not the lagoon itself), no trace of C. difficile or C. perfringens was found in the metagenome, either by taxonomic assignment of reads or by the alignment of metagenomic contigs to reference databases. It is possible that no C. difficile or C. perfringens were present in the lagoon; however, C. perfringens was cultured from lagoon effluent that was being applied as fertiliser to a neighbouring field. It is likely that the metagenome sample was not sequenced to a sufficient depth, as exemplified by the fact that 473120 (0.3%) reads were classified as C. sticklandii yet only 6% of the C. sticklandii genome was assembled (26% of the genome was covered by mapping metagenomic reads).
Alternatively, it may be the case that C. difficile and C. perfringens were only present in the metagenome sample as endospores (awaiting reactivation to a vegetative state when environmental conditions were appropriate), from which the metagenomic methodology may have been unable to extract DNA. In terms of isolates that were cultured from the split metagenome sample, isolate #10 (C. colicanis-like) was cultured yet only 2% of the C. colicanis genome was covered in the metagenome assembly (25% of the genome was covered by mapping metagenomic reads).

While metagenomics is a powerful tool that has allowed for the characterisation of ARGs in environmental samples, it is as yet unable to facilitate the linking of function with taxonomy, specifically the identification of the species of bacteria that are carrying ARGs. This chapter has shown that WGS of environmental isolates can be used to address this limitation of metagenomic analysis. At the same time, it has also shown that the metagenomic identification of ARGs in the environment is corroborated by the genomic analysis of environmental isolates of clinically relevant pathogens, revealing them to harbour ARGs often carried on horizontally transmissible MGEs. The fact that ARGs were also present in potentially novel environmental bacteria suggests that more research is required, both in terms of metagenomics and environmental culture, in order to characterise environmental bacteria and monitor ARGs from the environment.

Finally, this chapter has revealed that endospores of environmental bacteria are harbouring ARGs. Considering that endospores can withstand extremes in environmental conditions and can remain dormant for many years until appropriate conditions for reactivation are encountered (Cano and Borucki, 1995), this finding suggests that endospore-forming bacteria are facilitating the persistence of ARGs in the environment.
Taken in conjunction with the fact that clinically relevant pathogens can form endospores and the fact that the ARGs identified in this chapter appear to be mobile, this work offers insight into the dynamics of ARGs in the environment and suggests that the monitoring of bacterial endospores would be a valuable addition to the environmental framework to tackle antimicrobial resistance (Berendonk et al., 2015).

5.6. Conclusion

The work in this chapter has contributed to the development of an environmental framework for monitoring ARG dissemination by showing that ARGs are harboured by bacterial endospores isolated from effluents, thus presenting a possible mechanism through which ARGs could persist in the environment. Furthermore, many of the bacterial endospores that have been reactivated and cultured in this study are known pathogenic species that cause significant human and animal disease. In summary, the presence of endospores in the environment that can be reactivated to yield viable pathogenic bacteria harbouring apparently mobile ARGs suggests that environmental reservoirs of ARGs pose a risk to human and animal health and the longevity of antimicrobials.

Chapter 6. General discussion

6.1. Preface

The purpose of the work described in this dissertation has been to help establish an environmental framework to monitor antimicrobial resistance gene (ARG) dissemination in the aquatic environment. This framework offers the potential to quantify the risk of antimicrobial resistance in environmental samples and contribute toward antimicrobial resistance risk assessments. The establishment of this environmental framework has required the design and development of appropriate experimental methodology and analytical tools, as well as the identification of factors likely to influence ARG dissemination in the environment. In addition, this dissertation has described the persistence of ARGs in the environment through endospore-forming bacteria.
This discussion examines the research findings of the doctoral research detailed in this dissertation and summarises the framework that has been developed. In addition, it outlines the work that would be required to incorporate this framework into future ARG research projects and environmental ARG monitoring programmes.

6.2. Summary of findings and dissertation outcomes

In order to develop the basis for an environmental framework to monitor ARG dissemination in the aquatic environment, the work outlined in this dissertation involved the development of a bioinformatics tool to identify ARGs in sequencing data, the implementation of a pilot study to confirm that effluents entering the aquatic environment contain ARGs, and the demonstration that a single river catchment contains a diverse array of bacterial endospores that can be re-activated and interrogated to reveal ARG presence.

6.2.1. Establishing an environmental framework for monitoring ARG dissemination in the aquatic environment

6.2.1.1. Summary of findings

The first step in establishing an environmental framework for monitoring ARG dissemination was the development of suitable experimental methodology to detect and quantify ARG presence in environmental samples. A review of the literature suggested that metagenomics was likely to be the most appropriate technology to form the basis of the framework. However, at the time of planning the work, there were insufficient analytical tools available that could identify ARGs within metagenomic sequencing data in a quick, efficient and quantitative manner and that were capable of taking into account the frequently changing ARG databases available. Therefore the first results chapter (Chapter 2) described the necessary development of the Search Engine for Antimicrobial Resistance (SEAR), a bioinformatic tool for the detection of horizontally acquired ARGs in raw sequencing data (Rowe et al., 2015).
Firstly, a wet-laboratory methodology for preparing metagenome-quality DNA from aquatic samples was developed. This involved determining the suitability of centrifugation and vacuum filtration for the isolation of bacteria from aquatic samples, as well as the assessment of DNA extraction protocols to determine the most appropriate method for preparing high-quality DNA suitable for metagenomic sequencing. The results of this methodology development are detailed in Appendix 6 and the finalised wet-laboratory methodology for metagenome sequencing was implemented in Chapter 2. SEAR was then designed to utilise sequencing data straight from the sequencing machine; it begins by quality checking the data and subtracting unnecessary reads before reconstructing full-length ARG sequences. These full-length ARG sequences are subsequently annotated, compared to online databases and presented in a concise report. SEAR therefore enabled metagenomics to be used as a viable approach for rapidly detecting all known horizontally acquired ARGs in a given sample (according to available databases and not including SNP-based ARGs), whilst giving important information such as the relative abundance of each ARG and associated mobile genetic elements (MGEs). SEAR also served as a proof of principle, demonstrating that sequencing applications such as metagenomics can be developed for rapid diagnostic testing; the diagnostic application of such technologies is currently being proposed in the scientific literature (Miller et al., 2013). Finally, in addition to using SEAR to analyse metagenomes for ARG content, additional software and databases to facilitate the interrogation of metagenomes for pathogenic bacteria, mobile genetic elements and taxonomic composition (such as the PATRIC pathogen database) were trialled and subsequently implemented in the methodology described in results chapters 3 and 4.
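To make the idea of a relative abundance value for each ARG concrete, the sketch below normalises hypothetical read counts by gene length and sequencing depth (reads per kilobase per million reads). This is a generic normalisation chosen for illustration; it is not taken from SEAR and may differ from how SEAR reports abundance. The function name, read counts, gene lengths and total read count are all assumed values.

```python
from typing import Dict, Tuple


def reads_per_kb_per_million(mapped_reads: int, gene_length_bp: int, total_reads: int) -> float:
    """Normalise a read count by gene length (kb) and sequencing depth (millions of reads)."""
    if gene_length_bp <= 0 or total_reads <= 0:
        raise ValueError("gene length and total read count must be positive")
    return mapped_reads * 1e9 / (gene_length_bp * total_reads)


# Hypothetical (mapped reads, gene length in bp) for two ARGs in one metagenome.
toy_counts: Dict[str, Tuple[int, int]] = {
    "tetM": (120, 1920),
    "ermB": (30, 738),
}
total_reads = 2_000_000  # hypothetical sequencing depth

abundance = {
    gene: reads_per_kb_per_million(n_reads, length, total_reads)
    for gene, (n_reads, length) in toy_counts.items()
}
print(abundance)
```

Normalising in this way allows ARG levels to be compared between genes of different lengths and between samples sequenced to different depths, which is the kind of comparison the effluent and background samples require.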
Once appropriate methodology had been developed and the bioinformatic tools for analysing metagenomic data were in place, the next step in establishing the environmental framework for monitoring ARG dissemination was to confirm the underlying assumption of this dissertation, namely that effluents entering the aquatic environment were disseminating ARGs. Chapter 3 served as a pilot study that used a set of longitudinal samples from two effluents entering a single river catchment and compared them to a background sample of river source water in order to confirm this underlying assumption. The work reported in Chapter 3 found that not only did effluents entering the aquatic environment from both human and animal faecal sources contain a diverse range of ARGs, but that the abundance of these genes was consistently higher than in the background sample of river water. In addition to abundance, the diversity of pathogens and MGEs was also greater in the effluent samples than in the river water. Consequently, the underlying assumption that effluents disseminate ARGs was confirmed and the methodology used in the Chapter 3 work was deemed suitable to serve as the foundation of analysis within the environmental framework to be developed. Specifically, this entailed the use of metagenomics and SEAR to assess the ARG load of effluent samples entering the aquatic environment, in addition to the use of background sampling to provide context for ARG load. The next step in establishing the environmental framework was to determine additional factors that may affect ARG dissemination in the effluents being monitored, identifying key factors that may need to be incorporated into the framework. To address this, the work detailed in Chapter 4 aimed to assess the impact of ARG load and potential selection pressures on the expression of ARGs in environmental samples.
The results reported in Chapter 4 showed that, in addition to hospital and farm effluents having a consistently high load of ARGs in relation to background samples across a series of monthly samples, many different ARGs were also being expressed and a strong positive correlation was observed between the abundance of ARGs and corresponding ARG transcripts. Chapter 4 also demonstrated a positive correlation between hospital antimicrobial usage and ARG transcript abundance in hospital effluent, as well as the presence of antimicrobial compounds in both hospital and farm effluents. Collectively, these results suggested that ARG load and potential antimicrobial selection pressures were impacting on the level of expression of ARGs in environmental samples. Thus, the work reported in Chapter 4 developed the environmental framework through the application of metatranscriptomics, antimicrobial usage data and LCMS measurement of antimicrobial compounds to identify additional factors likely to be playing an important role in the dynamics of ARG abundance within effluents. In addition to the metabolically active bacteria that were assessed through use of metatranscriptomics in the work reported in Chapter 4, the potential impact of non-metabolically active bacteria on ARG dissemination in the environment was investigated in the work reported in Chapter 5. Chapter 5 described the identification of ARGs within bacterial endospores that were isolated and re-activated from a variety of environmental samples, including the effluents that had been studied in the previous chapters and other environmental sources. The majority of these endospores were dormant Clostridium difficile and Clostridium perfringens bacteria, which are clinically relevant pathogens (Tonna and Welsby, 2005, Rood and Cole, 1991).
Several of the ARGs identified within the genomes of these bacteria showed evidence of mobility and recent horizontal transfer, highlighting that bacterial endospores present a mechanism through which ARGs could persist in the environment and contribute to the resistome.

6.2.1.2. The environmental framework

In summary, the results presented within this dissertation describe work that forms a basis for an environmental framework to monitor the dissemination of ARGs via effluents in the aquatic environment (Figure 6.1). By placing strategic effluent monitoring points in the antimicrobial resistance risk assessment model introduced in Chapter 1, the environmental framework for monitoring ARG dissemination would generate an antimicrobial resistance potential (ARP) categorisation for a given effluent that would contribute to the ongoing overall antimicrobial resistance risk assessment for a given catchment.

Figure 6.1 Proposed environmental framework for monitoring ARG dissemination and assigning antimicrobial resistance potential to environmental samples. Effluent samples are taken within a catchment that is being subjected to an antimicrobial resistance risk assessment, which takes into account transmission of resistance to humans, ARG dissemination and emergence of resistance in the environment (upper half of figure, augmented from Chapter 1 to show points of effluent monitoring used in this work). Effluent samples are run through the environmental framework to monitor ARG dissemination (lower half of figure), giving an antimicrobial resistance potential categorisation that feeds into the overall antimicrobial resistance risk assessment. Flowchart key: dark blue represents an experimental process, green represents an analysis, aqua represents data handling and storage, orange represents a categorisation.
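As a rough illustration of the primary categorisation step in Figure 6.1, the decision logic set out in the text accompanying the figure can be sketched in Python. This is a minimal sketch and not an implementation used in this work: the class `EffluentProfile`, the functions `primary_arp` and `follow_up_processes`, and the `fold_threshold` value are all hypothetical, and the framework itself does not yet define numerical cut-offs.

```python
from dataclasses import dataclass


@dataclass
class EffluentProfile:
    """Normalised measurements for one sample (hypothetical structure)."""
    arg_load: float            # normalised ARG abundance
    pathogen_load: float       # normalised pathogen abundance
    selection_pressure: bool   # high usage at source or residues detected in situ


def primary_arp(sample: EffluentProfile,
                background: EffluentProfile,
                fold_threshold: float = 2.0) -> str:
    """Assign a primary ARP category relative to a background sample.

    fold_threshold is an assumed, illustrative cut-off only.
    """
    arg_elevated = sample.arg_load >= fold_threshold * background.arg_load
    pathogen_elevated = (sample.pathogen_load
                         >= fold_threshold * background.pathogen_load)
    if arg_elevated and pathogen_elevated and sample.selection_pressure:
        return "high"
    return "low"


def follow_up_processes(arp: str) -> list:
    """Only a high primary ARP triggers the slower isolate-based work."""
    if arp == "high":
        return ["bacterial culture of stored sample",
                "susceptibility testing",
                "whole genome sequencing of isolates"]
    return []


if __name__ == "__main__":
    background = EffluentProfile(arg_load=1.0, pathogen_load=1.0,
                                 selection_pressure=False)
    effluent = EffluentProfile(arg_load=5.0, pathogen_load=3.0,
                               selection_pressure=True)
    category = primary_arp(effluent, background)
    print(category, follow_up_processes(category))
```

The two-tier structure mirrors the intent of the framework: the rapid comparison against background is always performed, whereas the time-consuming isolate work is reserved for samples with a high primary ARP.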
In the proposed environmental framework represented in Figure 6.1, an effluent sample is split and subjected to several experimental processes in order to determine the ARG and pathogen load of the sample, as well as the presence of selection pressures, virulence factors and metabolic activity. After normalising data and comparing to a suitable background sample, a primary ARP categorisation is made. Effluent samples with high ARG and pathogen load, in addition to the presence of potential antimicrobial selection pressures such as high antimicrobial usage at the effluent origin or the presence of antimicrobial residues in situ, would be deemed to have a high ARP and would be subjected to additional experimental processes, such as bacterial culture and whole genome sequencing of isolates. These additional experimental processes (including culture, susceptibility testing, WGS and analysis of stored effluent samples) take considerably more time and effort than the initial experimental process and analysis components of the framework, so the use of a stored sample for isolate work is only introduced if the primary ARP outcome is high. This ensures that the monitoring of ARG dissemination can be performed rapidly, yet additional risk factors can be identified in the event of effluent samples with high ARP. The use of culturing and susceptibility testing also allows for the discovery of novel bacteria and ARGs, which in turn can be used to augment the existing monitoring databases. The ARP categorisation is designed to reflect the level of ARG dissemination that an effluent could be responsible for. It is envisaged that this environmental framework can be incorporated into antimicrobial risk assessments and facilitate a standardised approach for the regular monitoring of ARG dissemination in the aquatic environment.

6.2.2. Future work on an environmental framework for monitoring ARG dissemination

6.2.2.1.
Assigning significance to antimicrobial resistance potential

As part of the future work to develop the environmental framework, it is crucial that a measure of significance can be applied to the ARP categorisation, or to the factors that contribute to this categorisation. Indeed, in parallel to the work described in this thesis, efforts to develop an ARP categorisation for antimicrobial resistance determinants based on metagenomic analysis of ARGs, MGEs and pathogens in existing environmental samples have been reported (Port et al., 2014). In this study, the authors used their ARP (termed an antimicrobial resistance determinants index) to evaluate the differences between aquatic environments with respect to their proximity to likely human impact. However, both the ARP categorisation used by Port et al. (2014) and the expanded ARP categorisation outlined in the framework in Figure 6.1 use an arbitrary system that does not take into account the potential difference of effect among individual genes, pathogens or antimicrobial classes, and that restricts the ARP categorisation to simplified discrete descriptive values (high, low etc.). It is therefore crucial that future work on the framework formalises ARP categorisation, possibly by incorporating the continuous values generated from quantitative experimental analyses of the framework into the ARP categorisation and by further exploring the differential impacts of the qualitative analysis of contributory data. To elaborate further on the value of assigning significance to ARP categorisation, it is currently possible that, despite a high ARP, an effluent may actually have a relatively low risk of contributing to the emergence of antimicrobial resistance in the environment or to the transmission of antimicrobial resistance to humans.
This is because the categorisation system described here looks at each analysis result as a whole, for example the total ARG load of the effluent, and does not take into account whether an individual ARG is driving this, or whether the highly abundant ARGs are frequently implicated in clinical cases of antimicrobial resistance. This is demonstrated by the results described in Chapter 4, where particular classes of ARGs had high ARG abundance but did not correlate well with ARG transcript abundance or antimicrobial usage (e.g. sulfonamide ARGs in the hospital effluent samples). It is also clear that other factors may have a varied effect depending on ARG type. For instance, it was found that tetracycline ARG abundance had a strong positive correlation with tetracycline ARG transcript abundance, and that tetracycline ARG transcript abundance in turn had a strong positive correlation with tetracycline usage. However, despite a strong positive correlation between beta lactam ARG abundance and ARG transcript abundance, very poor correlation was observed between beta lactam ARG transcript abundance and beta lactam usage. This illustrates the point that, despite an overall trend such as total ARG abundance correlating to ARG transcript abundance, there are likely to be different key forces driving this trend. These key forces are most likely at the level of an antimicrobial class, ARG mechanism or even individual ARGs. The differential impact discussed here in relation to antimicrobial class is equally applicable to the great diversity of MGEs (e.g. different plasmid incompatibility groups) and pathogenic bacterial species in a sample, and much further work is needed to fully understand these differential effects. In conclusion, in order to assign significance to ARP categorisation it is likely that it will be necessary to look further than just the analysis scores (e.g.
ARG load or pathogen load) and give appropriate weighting to specific classes, types or individual components (e.g. a high abundance of tetracycline ARGs may result in a more severe ARP than an equivalent abundance of sulfonamide ARGs). However, an argument against the categorisation of ARP through weighting each component is that there may not be enough information available to determine the effect of a particular component on the overall ARP. In light of this, a viable alternative would be to create a selection of sentinel ARGs and antimicrobial resistant bacteria that could be tested for within effluents and used to determine the ARP category, as suggested recently by Berendonk et al. (2015) in their opinion piece on tackling antimicrobial resistance. This approach would rely on the creation of a set of maximum admissible levels for the sentinel ARGs and bacteria. However, although offering considerable advantages, this approach would require a large amount of validation work to determine suitable sentinel candidates before it could be implemented.

6.2.2.2. Understanding ARG dynamics

Following on from the previous points that it is necessary to go beyond ARG load as a whole and weight individual ARGs according to their own ARP, it is vital to the development of the framework that a greater understanding of ARG dynamics is achieved, particularly within the environment. A good first step towards this goal within the scientific community has been to view ARGs as emerging environmental contaminants in studies that examine the occurrence of ARGs in different environments (Pruden et al., 2006, Gillings, 2013).
This has allowed for greater emphasis on the importance of the environment in the emergence of antimicrobial resistance, resulting in a great volume of studies in recent years that document the widespread dissemination of ARGs in the environment from human and animal sources (Berglund et al., 2015, Bengtsson-Palme et al., 2014). Consequently, environmental ARG research now incorporates additional factors that may contribute to the dissemination and persistence of ARGs that have been released into the environment, such as heavy metals and antibiotics (Lu et al., 2015, Khan et al., 2013). A key area of research concerning ARG dynamics, highlighted by the work in this dissertation and the current body of literature as requiring further understanding, is the effect of clinically used antimicrobials on ARGs in the environment. The work in this dissertation has shown that clinically used antimicrobials are present in effluents that also contain a high load of ARGs, MGEs and bacterial species, and it has also shown that antimicrobial usage at the effluent source influences the occurrence of ARGs within the effluent. However, the work described in this dissertation has not determined whether the presence (or concentration) of antimicrobial compounds in effluents plays a direct role in the abundance of specific ARGs (or their transcripts). Studies have shown that even sub-inhibitory concentrations of antimicrobials can influence bacterial behaviours in the environment (Bruchmann et al., 2013) and another recent study into environmental antimicrobial concentrations has shown that sub-inhibitory concentrations of antimicrobials can select for resistant bacteria (Gullberg et al., 2011).
The fact that hospital effluents contain detectable levels of antimicrobial compounds that are in very high use at the effluent source may result in increased abundance of ARGs, as was observed in a study of pharmaceutical manufacturer effluents that found high concentrations of antimicrobials impacting ARG levels in the aquatic environment (Larsson et al., 2007, Kristiansson et al., 2011). With an improved understanding of the effect of clinically used antimicrobials on bacteria in the environment, the contribution of the presence of antimicrobial compounds in the environment to the dissemination of ARGs and the emergence of antimicrobial resistance can be assessed. Additional factors that may affect ARG dynamics include the genetic context of ARGs (e.g. ARGs located within MGEs or downstream of constitutively-active promoters) and the impact of non-antimicrobial selection pressures. The work presented in Chapter 4 describes the possible effect of antimicrobial selection pressures on ARG abundance; however, non-antimicrobial selection pressures may also impact ARG abundance. For example, the presence of heavy metal contamination in the environment may result in the increased abundance of ARGs due to the co-selection that can occur if ARGs exist on the same MGE as heavy metal resistance genes (Baker-Austin et al., 2006, Knapp et al., 2011). ARGs may also be located on MGEs with varying horizontal transfer potential, with certain MGEs responding to specific environmental stimuli with an increase in abundance and relative gene transfer potential (Wright et al., 2008). Further work is required into the effects of genetic linkage and co-selection on the abundance of ARGs and the role that these might play in their dissemination before the importance of such influences can be fully understood.
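The genetic linkage underlying co-selection, as described above, lends itself to a simple screening step: flagging any MGE that carries both an ARG and a heavy metal resistance gene. The following Python sketch illustrates the idea only. The function name `co_selection_candidates`, the `(mge_id, gene, category)` input format and the example annotations are hypothetical and do not correspond to SEAR output or to data from this work.

```python
from collections import defaultdict


def co_selection_candidates(annotations):
    """Return MGEs that carry at least one ARG and one metal resistance gene.

    annotations: iterable of (mge_id, gene_name, category) tuples, where
    category is "ARG" or "metal" (an assumed, illustrative input format).
    """
    genes_by_mge = defaultdict(lambda: {"ARG": set(), "metal": set()})
    for mge_id, gene_name, category in annotations:
        if category in ("ARG", "metal"):
            genes_by_mge[mge_id][category].add(gene_name)
    # Keep only elements where both gene categories are present.
    return {
        mge_id: genes
        for mge_id, genes in genes_by_mge.items()
        if genes["ARG"] and genes["metal"]
    }


if __name__ == "__main__":
    # Toy annotations for illustration only; not results from this work.
    example = [
        ("plasmid_1", "tetA", "ARG"),
        ("plasmid_1", "merA", "metal"),
        ("plasmid_2", "sul1", "ARG"),
        ("transposon_1", "czcA", "metal"),
    ]
    for mge_id, genes in co_selection_candidates(example).items():
        print(mge_id, sorted(genes["ARG"]), sorted(genes["metal"]))
```

A screen of this kind identifies only physical linkage; whether a flagged element is actually enriched under metal exposure would still need to be established experimentally.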
A final note on the areas of ARG dynamics to be addressed concerns the interplay between the environmental resistome and that found in the clinical setting. Specifically, which ARGs are present in both the environment and in pathogenic bacteria that are found in the clinical setting? Additionally, can these core ARGs be used as sentinel ARGs and form part of antimicrobial resistance risk assessments? There is a plethora of studies documenting antimicrobial resistant pathogenic bacteria isolated from clinical samples and there are also numerous studies documenting the environmental release of similar bacteria from human and animal sources (Schwartz et al., 2003, Caplin et al., 2008, Fuentefria et al., 2011, Harris et al., 2012, Li et al., 2015a). Studies are now increasingly documenting antimicrobial resistant bacteria that are also present in the wider environment. Environments such as soil have been found to contain environmental bacteria that can subsist on clinically relevant antibiotics, some of these resistant bacteria being closely related to human pathogens (Dantas et al., 2008). However, a recent study of soil bacteria has shown that the transfer of ARGs between environmental bacteria may not occur as readily as between pathogenic bacteria and that the composition of bacterial communities may play a greater role in ARG load than horizontal gene transfer (Forsberg et al., 2014). Despite this, it is conceivable that a common core set of ARGs (and also pathogens and MGEs) exists in both the environment and the clinical setting and that these ARGs are likely to be able to be transferred between taxonomic groups. Indeed, the work reported in this dissertation has demonstrated that there is definite overlap between the ARGs and pathogens identified in numerous different environmental samples.
If it is the case that a core set of ARGs exists, it would be extremely beneficial to the monitoring of ARG dissemination if these core ARGs were incorporated into environmental frameworks as a set of sentinel ARGs. For example, the wide range of ARGs identified within the human gut microbiota may contain good candidates for sentinel ARGs (Hu et al., 2013), which could be cross-referenced to those identified in environmental studies and used to create a set of sentinel ARGs that are likely to pose a risk to the emergence of resistance.

6.2.2.3. Incorporating additional factors and technologies into the framework

There are several key aspects of the environmental framework for monitoring ARG dissemination in the environment that will require further development in future iterations of the framework. Firstly, the sequencing and bioinformatics technologies utilised in the framework are advancing rapidly, resulting in faster and more sophisticated analysis techniques. For example, single molecule sequencing technologies are being considered for isolate diagnostic testing and ARG detection due to the relatively high speed and low cost, although they are currently unsuitable for detecting SNP based resistances (Judge et al., 2015). New technologies will need to be continually assessed for their relevance to ARG monitoring and incorporated as necessary. This increase in technological power will also result in an increase in the quantity of data being generated; this too will have to be addressed, and appropriate data handling and storage capabilities will be vital. With these changes in mind, the bioinformatic ARG detection tool (SEAR) that was developed in this work was designed to be scalable in terms of its server requirement and the ARG databases it utilises; the initial clustering database is static but can be upgraded, whereas the ARG cross-referencing databases are regularly updated and SEAR utilises the current versions every search instance.
Building on previous discussion points in this chapter and Chapter 2, future developments of SEAR that would improve the monitoring of ARGs include the inclusion of subtraction databases and regular database curation. Subtraction databases could be used to remove ARGs that do not have clinical importance or to separate ARGs based on their associated MGEs. This would contribute to assigning significance to the ARP of samples. The current drawbacks to the implementation of SEAR are centred on the restriction to only identify horizontally acquired ARGs. The inability to identify SNP based ARGs may have a large impact on the ARP categorisation of samples due to significant causes of antimicrobial resistance being omitted. For instance, the levels of quinolone ARGs and ARG transcripts in the hospital effluent of Chapter 4 do not seem to reflect the high quinolone usage and the presence of quinolone antimicrobials; the fact that a significant source of quinolone resistance is a result of SNP-based gyrase mutations (Jacoby, 2005) that are not detected by SEAR may thus account for this discrepancy. It is possible for SEAR to be upgraded to search for SNP-based ARGs through the use of variant calling and incorporation of a SNP-based ARG database, but this would require sufficient sequencing depth to ensure that SNPs are genuine and not a result of sequencing errors or poor coverage, which is particularly difficult to achieve within metagenomic samples. SEAR also does not identify novel ARGs; it can identify genes that are divergent from known ARGs through lower annotation stringency settings, but these may not be bona fide ARGs. This raises the point that environmental samples may contain novel ARGs with clinical significance and thus should be monitored.
To investigate the occurrence of novel ARGs, techniques such as functional metagenomics could be used to identify novel ARGs that could subsequently be incorporated into existing databases (Perron et al., 2015). It is also hoped that the culture and sensitivity testing proposed in the framework might add to this possibility. Metagenomic techniques could also be augmented with other technologies such as proteomics in order to characterise ARGs and determine if the encoded proteins actually confer phenotypic resistance (Fouhy et al., 2015).

6.2.2.4. Problems remaining

Despite the massive advancements that sequencing technologies allow for in the field of ARG dissemination and tracking, there are some questions that cannot be investigated with these technologies alone. As exemplified in the work of this dissertation, metatranscriptomics is a powerful technique to identify ARGs that are expressed in the environment. However, metabolically inactive bacteria would not be accounted for by metatranscriptomics, and to investigate what role inactive bacteria may have in ARG dissemination, techniques such as the re-activation and culturing of bacterial endospores are appropriate. Thus, a combination of technologies and experimental approaches is required to form a clear idea of what is contributing to ARG dissemination. This is especially relevant when metagenomics is used as the sole technology in ARG monitoring. Metagenomics is a field that has seen massive advancements in recent years (National Center for Biotechnology Information, 2007, Davenport and Tummler, 2013), moving from 16S amplicon-based community profiling to being able to extract whole genomes from metagenomic sequencing data (Albertsen et al., 2013). However, there are also shortfalls to metagenomics that are yet to be addressed.
For instance, metagenomes may underrepresent certain groups of bacteria, such as the under-detection of Firmicutes observed in a recent study that attributes the low retrieval of endospore-forming bacteria to methodological biases (Filippidou et al., 2015). This is an important finding in light of the potential role of endospore-forming bacteria as persistent reservoirs of horizontally mobilisable ARGs in these sample types, as shown in the results reported in Chapter 5. In addition to methodological biases, other metagenomic studies have postulated database limitations as a possible cause for the underrepresentation of endospore-forming bacteria (for example, sporulation genes may be evolutionarily distinct and not accounted for, or strategies for long-term cell survival may differ from known endospore pathways) (Kawai et al., 2015). The main advantage of the environmental framework proposed in this dissertation and shown in Figure 6.1, as well as its improvement on other proposed environmental frameworks (Port et al., 2014), is that it does not rely on just one technology or experimental technique to assign ARP to samples.

Following on from the suggested inclusion of additional technologies in environmental frameworks for monitoring ARG dissemination, another valuable inclusion for the future development of this framework is the use of mathematical modelling to understand antimicrobial resistance (Opatowski et al., 2011). For instance, mathematical modelling can be used to assess the impact of antimicrobial exposure and treatment duration on the excretion level of antimicrobial resistant bacteria (Nguyen et al., 2014b); this research could be used to provide insight into ARG dissemination and be scaled to apply to effluents as a whole.

6.3.
Overall conclusions of this work

To conclude, this dissertation has established the basis for an environmental framework to monitor ARG dissemination and suggested the means to facilitate the quantification of antimicrobial resistance potential in environmental samples. This work offers an important step toward the routine monitoring of effluents entering the environment and the provision of antimicrobial resistance risk assessments. On a more specific level, this work has contributed a freely available analysis tool for the detection of ARGs to the scientific community. The work has also contributed several high quality linked metagenomes and metatranscriptomes to publicly available databases, complete with extensive metadata and ARG annotations, thus providing robust data for the scientific community to utilise and contributing to further understanding of microbial communities in the environment. In terms of the scientific findings produced by the work described in this dissertation, it has been shown that ARGs are present in several different effluents that are entering a single aquatic environment. These ARGs are more abundant than in background samples of the aquatic environment and there is apparent seasonal variation in the abundance of ARGs across the multiple sampling points. In addition to ARGs, the effluents have also been shown to contain MGEs and pathogenic bacteria as well as antimicrobial compounds. Furthermore, the ARGs were shown to be expressed in the effluent samples and a possible factor driving this expression in the effluent of a hospital was demonstrated to be the antimicrobial usage of the hospital. Finally, the work also found ARGs to be persisting in the environment as a result of carriage by endospore-forming bacteria. The work described in this dissertation has contributed to the potential for the development of an environmental framework to monitor ARG dissemination in the environment.
This framework utilises the experimental techniques employed in the results chapters to determine the antimicrobial resistance potential of effluent samples with a view to contributing to antimicrobial resistance risk assessments. With the current drive to incorporate environmental research findings into applications with clinical benefit, it is hoped that this dissertation provides a useful step toward achieving this goal.

References

ALBERTSEN, M., HUGENHOLTZ, P., SKARSHEWSKI, A., NIELSEN, K. L., TYSON, G. W. & NIELSEN, P. H. 2013. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nature biotechnology, 31, 533-8.
ALEKSHUN, M. N. & LEVY, S. B. 2007. Molecular Mechanisms of Antibacterial Multidrug Resistance. Cell, 128, 1037-1050.
ALLEN, H. K., DONATO, J., WANG, H. H., CLOUD-HANSEN, K. A., DAVIES, J. & HANDELSMAN, J. 2010. Call of the wild: antibiotic resistance genes in natural environments. Nature reviews. Microbiology, 8, 251-9.
ALTSCHUL, S. F., GISH, W., MILLER, W., MYERS, E. W. & LIPMAN, D. J. 1990. Basic local alignment search tool. Journal of Molecular Biology, 215, 403-410.
ASHIRU-OREDOPE, D. & HOPKINS, S. 2013. Antimicrobial stewardship: English Surveillance Programme for Antimicrobial Utilization and Resistance (ESPAUR). J Antimicrob Chemother, 68, 2421-3.
AUERBACH, E. A., SEYFRIED, E. E. & MCMAHON, K. D. 2007. Tetracycline resistance genes in activated sludge wastewater treatment plants. Water research, 41, 1143-51.
BAKER-AUSTIN, C., WRIGHT, M. S., STEPANAUSKAS, R. & MCARTHUR, J. V. 2006. Co-selection of antibiotic and metal resistance. Trends in microbiology, 14, 176-82.
BALCAZAR, J. L. 2014. Bacteriophages as Vehicles for Antibiotic Resistance Genes in the Environment. PLoS Pathogens, 10, e1004219.
BANNAM, T. L., YAN, X.-X., HARRISON, P. F., SEEMANN, T., KEYBURN, A. L., STUBENRAUCH, C., WEERAMANTRI, L. H., CHEUNG, J. K., MCCLANE, B. A., BOYCE, J. D., MOORE, R. J. & ROOD, J. I. 2011.
Necrotic Enteritis-Derived Clostridium perfringens Strain with Three Closely Related Independently Conjugative Toxin and Antibiotic Resistance Plasmids. mBio, 2, e00190-11.
BARAN, R. H. & KO, H. 2008. Detecting Horizontally Transferred and Essential Genes Based on Dinucleotide Relative Abundance. DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes, 15, 267-276.
BENGTSSON, J., HARTMANN, M., UNTERSEHER, M., VAISHAMPAYAN, P., ABARENKOV, K., DURSO, L., BIK, E. M., GAREY, J. R., ERIKSSON, K. M. & NILSSON, R. H. 2012. Megraft: a software package to graft ribosomal small subunit (16S/18S) fragments onto full-length sequences for accurate species richness and sequencing depth analysis in pyrosequencing-length metagenomes and similar environmental datasets. Res Microbiol, 163, 407-12.
BENGTSSON-PALME, J., BOULUND, F., FICK, J., KRISTIANSSON, E. & LARSSON, J. 2014. Shotgun metagenomics reveals a wide array of antibiotic resistance genes and mobile elements in a polluted lake in India. Frontiers in Microbiology, 5.
BENGTSSON-PALME, J., HARTMANN, M., ERIKSSON, K. M., PAL, C., THORELL, K., LARSSON, D. G. & NILSSON, R. H. 2015. Metaxa2: improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Mol Ecol Resour.
BERENDONK, T. U., MANAIA, C. M., MERLIN, C., FATTA-KASSINOS, D., CYTRYN, E., WALSH, F., BURGMANN, H., SORUM, H., NORSTROM, M., PONS, M.-N., KREUZINGER, N., HUOVINEN, P., STEFANI, S., SCHWARTZ, T., KISAND, V., BAQUERO, F. & MARTINEZ, J. L. 2015. Tackling antibiotic resistance: the environmental framework. Nat Rev Micro, 13, 310-317.
BERGLUND, B., FICK, J. & LINDGREN, P. E. 2015. Urban wastewater effluent increases antibiotic resistance gene concentrations in a receiving northern European river. Environ Toxicol Chem, 34, 192-6.
BROUWER, M. S. M., ROBERTS, A. P., HUSSAIN, H., WILLIAMS, R. J., ALLAN, E. & MULLANY, P. 2013.
Horizontal gene transfer converts non-toxigenic Clostridium difficile strains into toxin producers. Nat Commun, 4.
BROUWER, M. S. M., WARBURTON, P. J., ROBERTS, A. P., MULLANY, P. & ALLAN, E. 2011. Genetic Organisation, Mobility and Predicted Functions of Genes on Integrated, Mobile Genetic Elements in Sequenced Strains of Clostridium difficile. PLoS ONE, 6, e23014.
BRUCHMANN, J., KIRCHEN, S. & SCHWARTZ, T. 2013. Sub-inhibitory concentrations of antibiotics and wastewater influencing biofilm formation and gene expression of multi-resistant Pseudomonas aeruginosa wastewater isolates. Environ Sci Pollut Res Int, 20, 3539-49.
BRUL, S., BASSETT, J., COOK, P., KATHARIOU, S., MCCLURE, P., JASTI, P. R. & BETTS, R. 2012. ‘Omics’ technologies in quantitative microbial risk assessment. Trends in Food Science & Technology.
CABELLO, F. C., GODFREY, H. P., TOMOVA, A., IVANOVA, L., DOLZ, H., MILLANAO, A. & BUSCHMANN, A. H. 2013. Antimicrobial use in aquaculture re-examined: its relevance to antimicrobial resistance and to animal and human health. Environmental microbiology, 15, 1917-42.
CANICA, M., MANAGEIRO, V., JONES-DIAS, D., CLEMENTE, L., GOMES-NEVES, E., POETA, P., DIAS, E. & FERREIRA, E. 2015. Current perspectives on the dynamics of antibiotic resistance in different reservoirs. Res Microbiol, 166, 594-600.
CANO, R. & BORUCKI, M. 1995. Revival and identification of bacterial spores in 25- to 40-million-year-old Dominican amber. Science, 268, 1060-1064.
CANTON, R. & COQUE, T. M. 2006. The CTX-M beta-lactamase pandemic. Curr Opin Microbiol, 9, 466-75.
CAPLIN, J. L., HANLON, G. W. & TAYLOR, H. D. 2008. Presence of vancomycin and ampicillin-resistant Enterococcus faecium of epidemic clonal complex-17 in wastewaters from the south coast of England. Environmental microbiology, 10, 885-92.
CAPORASO, J. G., KUCZYNSKI, J., STOMBAUGH, J., BITTINGER, K., BUSHMAN, F. D., COSTELLO, E. K., FIERER, N., PENA, A. G., GOODRICH, J. K., GORDON, J. I., HUTTLEY, G. A., KELLEY, S.
T., KNIGHTS, D., KOENIG, J. E., LEY, R. E., LOZUPONE, C. A., MCDONALD, D., MUEGGE, B. D., PIRRUNG, M., REEDER, J., SEVINSKY, J. R., TURNBAUGH, P. J., WALTERS, W. A., WIDMANN, J., YATSUNENKO, T., ZANEVELD, J. & KNIGHT, R. 2010. QIIME allows analysis of high-throughput community sequencing data. Nat Methods, 7, 335-6. CARVER, T. J., RUTHERFORD, K. M., BERRIMAN, M., RAJANDREAM, M. A., BARRELL, B. G. & PARKHILL, J. 2005. ACT: the Artemis Comparison Tool. Bioinformatics, 21, 3422-3. CASE, R. J., BOUCHER, Y., DAHLLÖF, I., HOLMSTRÖM, C., DOOLITTLE, W. F. & KJELLEBERG, S. 2007. Use of 16S rRNA and rpoB Genes as Molecular Markers for Microbial Ecology Studies. Applied and Environmental Microbiology, 73, 278-288. CENTERS FOR DISEASE CONTROL AND PREVENTION. 2013. Antibiotic Resistance Threats in the United States, 2013. CHARPENTIER, X., KAY, E., SCHNEIDER, D. & SHUMAN, H. A. 2011. Antibiotics and UV radiation induce competence for natural transformation in Legionella pneumophila. Journal of bacteriology, 193, 1114-21. CHEN, B., YANG, Y., LIANG, X., YU, K., ZHANG, T. & LI, X. 2013. Metagenomic profiles of antibiotic resistance genes (ARGs) between human impacted estuary and deep ocean sediments. Environ Sci Technol, 47, 12753-60. CLINGENPEEL, S., CLUM, A., SCHWIENTEK, P., RINKE, C. & WOYKE, T. 2015. Reconstructing each cell’s genome within complex microbial communities - dream or reality? Frontiers in Microbiology, 5. 130 COIA, J. E. 2009. What is the role of antimicrobial resistance in the new epidemic of Clostridium difficile? Int J Antimicrob Agents, 33 Suppl 1, S9-12. COLOMER-LLUCH, M., JOFRE, J. & MUNIESA, M. 2011. Antibiotic Resistance Genes in the Bacteriophage DNA Fraction of Environmental Samples. PLoS ONE, 6, e17549. COURVALIN, P. 2006. Vancomycin resistance in gram-positive cocci. Clin Infect Dis, 42 Suppl 1, S25-34. COX, G. & WRIGHT, G. D. 2013. Intrinsic antibiotic resistance: mechanisms, origins, challenges and solutions. Int J Med Microbiol, 303, 287-92. 
D'COSTA, V. M., MCGRANN, K. M., HUGHES, D. W. & WRIGHT, G. D. 2006. Sampling the antibiotic resistome. Science, 311, 374-7. DANCER, D., BAKER-AUSTIN, C., LOWTHER, J. A., HARTNELL, R. E., LEES, D. N. & ROBERTS, L. O. 2014. Development and Integration of Quantitative Real-Time PCR Methods for Detection of Mitochondrial DNA and Methanobrevibacter smithii nifH Gene as Novel Microbial Source Tracking Tools. Environmental Forensics, 15, 256-264. DANTAS, G., SOMMER, M. O., OLUWASEGUN, R. D. & CHURCH, G. M. 2008. Bacteria subsisting on antibiotics. Science, 320, 100-3. DAVENPORT, C. F. & TUMMLER, B. 2013. Advances in computational analysis of metagenome sequences. Environmental microbiology, 15, 1-5. DAVIES, J. & DAVIES, D. 2010a. Origins and evolution of antibiotic resistance. Microbiology and molecular biology reviews : MMBR, 74, 417-33. DAVIES, J. & DAVIES, D. 2010b. Origins and evolution of antibiotic resistance. Microbiology and molecular biology reviews, 74, 417-433. DEBAST, S. B., VAN LEENGOED, L. A., GOORHUIS, A., HARMANUS, C., KUIJPER, E. J. & BERGWERFF, A. A. 2009. Clostridium difficile PCR ribotype 078 toxinotype V found in diarrhoeal pigs identical to isolates from affected humans. Environ Microbiol, 11, 505-11. DECOUSSER, J. W., POIREL, L. & NORDMANN, P. 2001. Characterization of a chromosomally encoded extended-spectrum class A beta-lactamase from Kluyvera cryocrescens. Antimicrob Agents Chemother, 45, 3595-8. DEGNAN, P. H. & OCHMAN, H. 2012. Illumina-based analysis of microbial community diversity. ISME J, 6, 183-194. DEVARAJAN, N., LAFFITE, A., GRAHAM, N. D., MEIJER, M., PRABAKAR, K., MUBEDI, J. I., ELONGO, V., MPIANA, P. T., IBELINGS, B. W., WILDI, W. & POTE, J. 2015. Accumulation of clinically relevant antibiotic-resistance genes, bacterial load, and metals in freshwater lake sediments in central europe. Environ Sci Technol, 49, 6528-37. 131 DIEKEMA, D. J. & PFALLER, M. A. 2013. 
Rapid Detection of Antibiotic-Resistant Organism Carriage for Infection Prevention. Clinical Infectious Diseases, 56, 1614-1620. DOLLIVER, H. & GUPTA, S. 2008. Antibiotic Losses in Leaching and Surface Runoff from Manure-Amended Agricultural Land All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. J. Environ. Qual., 37, 1227-1237. EDGAR, R. C. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26, 2460-1. ELIOPOULOS, G. M., COSGROVE, S. E. & CARMELI, Y. 2003. The Impact of Antimicrobial Resistance on Health and Economic Outcomes. Clinical Infectious Diseases, 36, 1433-1437. ESCUDERO, J. A., LOOT, C., NIVINA, A. & MAZEL, D. 2015. The Integron: Adaptation On Demand. Microbiol Spectr, 3, Mdna3-0019-2014. ESPY, M. J., UHL, J. R., SLOAN, L. M., BUCKWALTER, S. P., JONES, M. F., VETTER, E. A., YAO, J. D., WENGENACK, N. L., ROSENBLATT, J. E., COCKERILL, F. R., 3RD & SMITH, T. F. 2006. Real-time PCR in clinical microbiology: applications for routine laboratory testing. Clinical microbiology reviews, 19, 165-256. FILIPPIDOU, S., JUNIER, T., WUNDERLIN, T., LO, C.-C., LI, P.-E., CHAIN, P. S. & JUNIER, P. 2015. Under-detection of endospore-forming Firmicutes in metagenomic data. Computational and Structural Biotechnology Journal, 13, 299-306. FORSBERG, K. J., PATEL, S., GIBSON, M. K., LAUBER, C. L., KNIGHT, R., FIERER, N. & DANTAS, G. 2014. Bacterial phylogeny structures soil resistomes across habitats. Nature, 509, 612-616. FORSBERG, K. J., REYES, A., WANG, B., SELLECK, E. M., SOMMER, M. O. A. & DANTAS, G. 2012. The shared antibiotic resistome of soil bacteria and human pathogens. Science, 337, 1107-1111. FORSLUND, K., SUNAGAWA, S., KULTIMA, J. R., MENDE, D. R., ARUMUGAM, M., TYPAS, A. & BORK, P. 2013. 
Country-specific antibiotic use practices impact the human gut resistome. Genome Research, 23, 1163-1169. FOUHY, F., STANTON, C., COTTER, P. D., HILL, C. & WALSH, F. 2015. Proteomics as the final step in the functional metagenomics study of antimicrobial resistance. Front Microbiol, 6, 172. FROST, L. S., LEPLAE, R., SUMMERS, A. O. & TOUSSAINT, A. 2005. Mobile genetic elements: the agents of open source evolution. Nat Rev Micro, 3, 722-732. 132 FUENTEFRIA, D. B., FERREIRA, A. E. & COR√S√£O, G. 2011. Antibiotic-resistant Pseudomonas aeruginosa from hospital wastewater and superficial water: Are they genetically related? Journal of environmental management, 92, 250-255. GALIMAND, M., GUIYOULE, A., GERBAUD, G., RASOAMANANA, B., CHANTEAU, S., CARNIEL, E. & COURVALIN, P. 1997. Multidrug resistance in Yersinia pestis mediated by a transferable plasmid. N Engl J Med, 337, 677-80. GALPERIN, M. Y. 2013. Genome Diversity of Spore-Forming Firmicutes. Microbiology spectrum, 1, TBS-0015-2012. GASC, C., RIBIERE, C., PARISOT, N., BEUGNOT, R., DEFOIS, C., PETIT-BIDERRE, C., BOUCHER, D., PEYRETAILLADE, E. & PEYRET, P. 2015. Capturing prokaryotic dark matter genomes. Res Microbiol. GAZE, W. H., KRONE, S. M., LARSSON, J. D. G., LI, X.-Z., ROBINSON, J. A., SIMONET, P., SMALLA, K., TIMINOUNI, M., TOPP, E., WELLINGTON, E. M. H., WRIGHT, G. D. & ZHU, Y.-G. 2013. Influence of Humans on Evolution and Mobilization of Environmental Antibiotic Resistome. Emerging Infectious Disease journal, 19. GENOME REFERENCE CONSORTIUM 2009. hg19. In: GENOME REFERENCE CONSORTIUM (ed.). UCSC. GHACHI, M. E., DERBISE, A., BOUHSS, A. & MENGIN-LECREULX, D. 2005. Identification of Multiple Genes Encoding Membrane Proteins with Undecaprenyl Pyrophosphate Phosphatase (UppP) Activity in Escherichia coli. Journal of Biological Chemistry, 280, 18689- 18695. GILLESPIE, J. J., WATTAM, A. R., CAMMER, S. A., GABBARD, J. L., SHUKLA, M. P., DALAY, O., DRISCOLL, T., HIX, D., MANE, S. P., MAO, C., NORDBERG, E. 
K., SCOTT, M., SCHULMAN, J. R., SNYDER, E. E., SULLIVAN, D. E., WANG, C., WARREN, A., WILLIAMS, K. P., XUE, T., SEUNG YOO, H., ZHANG, C., ZHANG, Y., WILL, R., KENYON, R. W. & SOBRAL, B. W. 2011. PATRIC: the Comprehensive Bacterial Bioinformatics Resource with a Focus on Human Pathogenic Species. Infection and Immunity, 79, 4286-4298. GILLINGS, M. R. 2013. Evolutionary consequences of antibiotic use for the resistome, mobilome and microbial pangenome. Frontiers in Microbiology, 4. GOLET, E. M., ALDER, A. C., HARTMANN, A., TERNES, T. A. & GIGER, W. 2001. Trace Determination of Fluoroquinolone Antibacterial Agents in Urban Wastewater by Solid-Phase Extraction and Liquid Chromatography with Fluorescence Detection. Analytical Chemistry, 73, 3632-3638. 133 GOULD, L. H. & LIMBAGO, B. 2010. Clostridium difficile in food and domestic animals: a new foodborne pathogen? Clin Infect Dis, 51, 577-82. GUARDABASSI, L. & AGERSO, Y. 2006. Genes homologous to glycopeptide resistance vanA are widespread in soil microbial communities. FEMS microbiology letters, 259, 221-5. GULLBERG, E., CAO, S., BERG, O. G., ILBÄCK, C., SANDEGREN, L., HUGHES, D. & ANDERSSON, D. I. 2011. Selection of resistant bacteria at very low antibiotic concentrations. PLoS Pathog, 7, e1002158. GUPTA, S. K., PADMANABHAN, B. R., DIENE, S. M., LOPEZ-ROJAS, R., KEMPF, M., LANDRAUD, L. & ROLAIN, J.-M. 2014. ARG-ANNOT, a New Bioinformatic Tool To Discover Antibiotic Resistance Genes in Bacterial Genomes. Antimicrobial Agents and Chemotherapy, 58, 212-220. HAMMITT, M. C., BUESCHEL, D. M., KEEL, M. K., GLOCK, R. D., CUNEO, P., DEYOUNG, D. W., REGGIARDO, C., TRINH, H. T. & SONGER, J. G. 2008. A possible role for Clostridium difficile in the etiology of calf enteritis. Vet Microbiol, 127, 343-52. HARRIS, S., CORMICAN, M. & CUMMINS, E. 2012. The effect of conventional wastewater treatment on the levels of antimicrobial-resistant bacteria in effluent: a meta-analysis of current studies. 
Environmental geochemistry and health, 34, 749-62. HARRISON, E. M., PATERSON, G. K., HOLDEN, M. T. G., LARSEN, J., STEGGER, M., LARSEN, A. R., PETERSEN, A., SKOV, R. L., CHRISTENSEN, J. M., BAK ZEUTHEN, A., HELTBERG, O., HARRIS, S. R., ZADOKS, R. N., PARKHILL, J., PEACOCK, S. J. & HOLMES, M. A. 2013. Whole genome sequencing identifies zoonotic transmission of MRSA isolates with the novel mecA homologue mecC. EMBO molecular medicine, 5, 509-515. HE, M., MIYAJIMA, F., ROBERTS, P., ELLISON, L., PICKARD, D. J., MARTIN, M. J., CONNOR, T. R., HARRIS, S. R., FAIRLEY, D., BAMFORD, K. B., D'ARC, S., BRAZIER, J., BROWN, D., COIA, J. E., DOUCE, G., GERDING, D., KIM, H. J., KOH, T. H., KATO, H., SENOH, M., LOUIE, T., MICHELL, S., BUTT, E., PEACOCK, S. J., BROWN, N. M., RILEY, T., SONGER, G., WILCOX, M., PIRMOHAMED, M., KUIJPER, E., HAWKEY, P., WREN, B. W., DOUGAN, G., PARKHILL, J. & LAWLEY, T. D. 2013. Emergence and global spread of epidemic healthcare-associated Clostridium difficile. Nat Genet, 45, 109-13. HEIMAN, K. E., KARLSSON, M., GRASS, J., HOWIE, B., KIRKCALDY, R. D., MAHON, B., BROOKS, J. T. & BOWEN, A. 2014. Shigella with Decreased Susceptibility to Azithromycin Among Men Who Have Sex with Men — United States, 2002–2013. Morbidity and Mortality Weekly Report - CDC, 63, 132. 134 HENSON, K. E., LEVINE, M. T., WONG, E. A. & LEVINE, D. P. 2015. Glycopeptide antibiotics: evolving resistance, pharmacology and adverse event profile. Expert Rev Anti Infect Ther, 13, 1265-78. HEUER, H., KROGERRECKLENFORT, E., WELLINGTON, E. M. H., EGAN, S., ELSAS, J. D., OVERBEEK, L., COLLARD, J.-M., GUILLAUME, G., KARAGOUNI, A. D., NIKOLAKOPOULOU, T. L. & SMALLA, K. 2002. Gentamicin resistance genes in environmental bacteria: Prevalence and transfer. Microbiology Ecology, 42, 289-302. HIRSCH, R., TERNES, T., HABERER, K. & KRATZ, K.-L. 1999. Occurrence of antibiotics in the aquatic environment. Science of The Total Environment, 225, 109-118. HOLT, K. E., BAKER, S., WEILL, F. X., HOLMES, E. 
C., KITCHEN, A., YU, J., SANGAL, V., BROWN, D. J., COIA, J. E., KIM, D. W., CHOI, S. Y., KIM, S. H., DA SILVEIRA, W. D., PICKARD, D. J., FARRAR, J. J., PARKHILL, J., DOUGAN, G. & THOMSON, N. R. 2012. Shigella sonnei genome sequencing and phylogenetic analysis indicate recent global dissemination from Europe. Nature genetics, 44, 1056-9. HOOPER, D. C. & JACOBY, G. A. 2015. Mechanisms of drug resistance: quinolone resistance. Ann N Y Acad Sci. HORNECK, G., MOELLER, R., CADET, J., DOUKI, T., MANCINELLI, R. L., NICHOLSON, W. L., PANITZ, C., RABBOW, E., RETTBERG, P., SPRY, A., STACKEBRANDT, E., VAISHAMPAYAN, P. & VENKATESWARAN, K. J. 2012. Resistance of Bacterial Endospores to Outer Space for Planetary Protection Purposes—Experiment PROTECT of the EXPOSE-E Mission. Astrobiology, 12, 445-456. HU, X., ZHOU, Q. & LUO, Y. 2010. Occurrence and source analysis of typical veterinary antibiotics in manure, soil, vegetables and groundwater from organic vegetable bases, northern China. Environmental pollution, 158, 2992-8. HU, Y., YANG, X., QIN, J., LU, N., CHENG, G., WU, N., PAN, Y., LI, J., ZHU, L., WANG, X., MENG, Z., ZHAO, F., LIU, D., MA, J., QIN, N., XIANG, C., XIAO, Y., LI, L., YANG, H., WANG, J., YANG, R., GAO, G. F. & ZHU, B. 2013. Metagenome-wide analysis of antibiotic resistance genes in a large cohort of human gut microbiota. Nature communications, 4, 2151. HUANG, H., WEINTRAUB, A., FANG, H. & NORD, C. E. 2009. Antimicrobial resistance in Clostridium difficile. International Journal of Antimicrobial Agents, 34, 516-522. HUGHES, V. M. & DATTA, N. 1983. Conjugative plasmids in bacteria of the 'pre-antibiotic' era. Nature, 302, 725-6. HUMAN MICROBIOME PROJECT 2012. Structure, function and diversity of the healthy human microbiome. Nature, 486, 207-14. 135 HUSON, D. H. & SCORNAVACCA, C. 2012. Dendroscope 3: An Interactive Tool for Rooted Phylogenetic Trees and Networks. Systematic Biology. ILLUMINA. 2013. Illumina MySeq Benchtop Sequencer [Online]. 
Available: http://www.illumina.com. INOUYE, M., DASHNOW, H., RAVEN, L.-A., SCHULTZ, M., POPE, B., TOMITA, T., ZOBEL, J. & HOLT, K. 2014. SRST2: Rapid genomic surveillance for public health and hospital microbiology labs. Genome Medicine, 6, 90. JACOBY, G. A. 2005. Mechanisms of Resistance to Quinolones. Clinical Infectious Diseases, 41, S120-S126. JHUNG, M. A., THOMPSON, A. D., KILLGORE, G. E., ZUKOWSKI, W. E., SONGER, G., WARNY, M., JOHNSON, S., GERDING, D. N., MCDONALD, L. C. & LIMBAGO, B. M. 2008. Toxinotype V Clostridium difficile in humans and food animals. Emerg Infect Dis, 14, 1039- 45. JOHNSON, S. & GERDING, D. N. 1998. Clostridium difficile-Associated Diarrhea. Clin Infect Dis., 26, 1027-1034. JØRGENSEN, S. E. & HALLING-SØRENSEN, B. 2000. Drugs in the environment. Chemosphere, 40, 691-699. JUDGE, K., HARRIS, S. R., REUTER, S., PARKHILL, J. & PEACOCK, S. J. 2015. Early insights into the potential of the Oxford Nanopore MinION for the detection of antimicrobial resistance genes. J Antimicrob Chemother, 70, 2775-8. KAWAI, M., UCHIYAMA, I., TAKAMI, H. & INAGAKI, F. 2015. Low frequency of endospore- specific genes in subseafloor sedimentary metagenomes. Environmental Microbiology Reports, 7, 341-350. KEESSEN, E. C., HENSGENS, M. P., SPIGAGLIA, P., BARBANTI, F., SANDERS, I. M., KUIJPER, E. J. & LIPMAN, L. J. 2013. Antimicrobial susceptibility profiles of human and piglet Clostridium difficile PCR-ribotype 078. Antimicrob Resist Infect Control, 2, 14. KEMPER, N. 2008. Veterinary antibiotics in the aquatic and terrestrial environment. Ecological Indicators, 8, 1-13. KHAN, G. A., BERGLUND, B., KHAN, K. M., LINDGREN, P. E. & FICK, J. 2013. Occurrence and abundance of antibiotics and resistance genes in rivers, canal and near drug formulation facilities--a study in Pakistan. PLoS One, 8, e62712. KLEYWEGT, S., PILEGGI, V., LAM, Y. M., ELISES, A., PUDDICOMB, A., PURBA, G., DI CARO, J. & FLETCHER, T. 2015. 
The contribution of pharmaceutically-active compounds 136 from healthcare facilities to a receiving sewage treatment plant in Canada. Environ Toxicol Chem. KNAPP, C. W., MCCLUSKEY, S. N. N. M., SINGH, B. K., CAMPBELL, C. D., HUDSON, G. & GRAHAM, D. W. 2011. Antibiotic Resistance Gene Abundances Correlate with Metal and Geochemical Conditions in Archived Scottish Soils. PloS one, 6, e27300. KNETSCH, C. W., TERVEER, E. M., LAUBER, C., GORBALENYA, A. E., HARMANUS, C., KUIJPER, E. J., CORVER, J. & VAN LEEUWEN, H. C. 2012. Comparative analysis of an expanded Clostridium difficile reference strain collection reveals genetic diversity and evolution through six lineages. Infection, Genetics and Evolution, 12, 1577-1585. KOSER, C. U., HOLDEN, M. T., ELLINGTON, M. J., CARTWRIGHT, E. J., BROWN, N. M., OGILVY-STUART, A. L., HSU, L. Y., CHEWAPREECHA, C., CROUCHER, N. J., HARRIS, S. R., SANDERS, M., ENRIGHT, M. C., DOUGAN, G., BENTLEY, S. D., PARKHILL, J., FRASER, L. J., BETLEY, J. R., SCHULZ-TRIEGLAFF, O. B., SMITH, G. P. & PEACOCK, S. J. 2012. Rapid whole-genome sequencing for investigation of a neonatal MRSA outbreak. The New England journal of medicine, 366, 2267-75. KOUASSI, K. A., DADIE, A. T., N'GUESSAN, K. F., DJE, K. M. & LOUKOU, Y. G. 2014. Clostridium perfringens and Clostridium difficile in cooked beef sold in Côte d'Ivoire and their antimicrobial susceptibility. Anaerobe, 28, 90-94. KRISTIANSSON, E., FICK, J., JANZON, A., GRABIC, R., RUTGERSSON, C., WEIJDEGÅRD, B., SÖDERSTRÖM, H. & LARSSON, D. G. J. 2011. Pyrosequencing of Antibiotic-Contaminated River Sediments Reveals High Levels of Resistance and Gene Transfer Elements. PloS one, 6, e17038. KUCZYNSKI, J., LAUBER, C. L., WALTERS, W. A., PARFREY, L. W., CLEMENTE, J. C., GEVERS, D. & KNIGHT, R. 2012. Experimental and analytical tools for studying the human microbiome. Nat Rev Genet, 13, 47-58. KÜMMERER, K. & HENNINGER, A. 2003. 
Promoting resistance by the emission of antibiotics from hospitals and households into effluent. Clinical Microbiology and Infection, 9, 1203-1214. LAHLAOUI, H., BEN HAJ KHALIFA, A. & BEN MOUSSA, M. 2014. Epidemiology of Enterobacteriaceae producing CTX-M type extended spectrum beta-lactamase (ESBL). Med Mal Infect, 44, 400-4. LAPARA, T. M., BURCH, T. R., MCNAMARA, P. J., TAN, D. T., YAN, M. & EICHMILLER, J. J. 2011. Tertiary-Treated Municipal Wastewater is a Significant Point Source of Antibiotic 137 Resistance Genes into Duluth-Superior Harbor. Environmental science & technology, 45, 9543-9549. LARSSON, D. G., DE PEDRO, C. & PAXEUS, N. 2007. Effluent from drug manufactures contains extremely high levels of pharmaceuticals. J Hazard Mater, 148, 751-5. LAWLEY, T. D., CLARE, S., WALKER, A. W., GOULDING, D., STABLER, R. A., CROUCHER, N., MASTROENI, P., SCOTT, P., RAISEN, C., MOTTRAM, L., FAIRWEATHER, N. F., WREN, B. W., PARKHILL, J. & DOUGAN, G. 2009. Antibiotic Treatment of Clostridium difficile Carrier Mice Triggers a Supershedder State, Spore- Mediated Transmission, and Severe Disease in Immunocompromised Hosts. Infection and Immunity, 77, 3661-3669. LAWLEY, T. D., CLARE, S., WALKER, A. W., STARES, M. D., CONNOR, T. R., RAISEN, C., GOULDING, D., RAD, R., SCHREIBER, F., BRANDT, C., DEAKIN, L. J., PICKARD, D. J., DUNCAN, S. H., FLINT, H. J., CLARK, T. G., PARKHILL, J. & DOUGAN, G. 2012. Targeted Restoration of the Intestinal Microbiota with a Simple, Defined Bacteriotherapy Resolves Relapsing Clostridium difficile Disease in Mice. PLoS Pathog, 8, e1002995. LAXMINARAYAN, R., DUSE, A., WATTAL, C., ZAIDI, A. K. M., WERTHEIM, H. F. L., SUMPRADIT, N., VLIEGHE, E., HARA, G. L., GOULD, I. M., GOOSSENS, H., GREKO, C., SO, A. D., BIGDELI, M., TOMSON, G., WOODHOUSE, W., OMBAKA, E., PERALTA, A. Q., QAMAR, F. N., MIR, F., KARIUKI, S., BHUTTA, Z. A., COATES, A., BERGSTROM, R., WRIGHT, G. D., BROWN, E. D. & CARS, O. 2013. Antibiotic resistance—the need for global solutions. 
The Lancet Infectious Diseases, 13, 1057-1098. LEVY, S. B. & MARSHALL, B. 2004. Antibacterial resistance worldwide: causes, challenges and responses. Nature medicine, 10, S122-9. LEWIN, A., JOHANSEN, J., WENTZEL, A., KOTLAR, H. K., DRABLOS, F. & VALLA, S. 2013. The microbial communities in two apparently physically separated deep subsurface oil reservoirs show extensive DNA sequence similarities. Environmental microbiology. LI, D., YANG, M., HU, J., REN, L., ZHANG, Y. & LI, K. 2008. Determination and fate of oxytetracycline and related compounds in oxytetracycline production wastewater and the receiving river. Environ Toxicol Chem, 27, 80-6. LI, H. 2011. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics, 27, 2987-93. LI, H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA- MEM. arXiv. 138 LI, H. & DURBIN, R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754-60. LI, J., ADAMS, V., BANNAM, T. L., MIYAMOTO, K., GARCIA, J. P., UZAL, F. A., ROOD, J. I. & MCCLANE, B. A. 2013. Toxin Plasmids of Clostridium perfringens. Microbiology and Molecular Biology Reviews : MMBR, 77, 208-233. LI, J., CHENG, W., XU, L., STRONG, P. J. & CHEN, H. 2015a. Antibiotic-resistant genes and antibiotic-resistant bacteria in the effluent of urban residential areas, hospitals, and a municipal wastewater treatment plant system. Environmental Science and Pollution Research, 22, 4587-4596. LI, J., WANG, T., SHAO, B., SHEN, J., WANG, S. & WU, Y. 2012. Plasmid-mediated quinolone resistance genes and antibiotic residues in wastewater and soil adjacent to swine feedlots: potential transfer to agricultural lands. Environmental health perspectives, 120, 1144-9. LI, X. Z., PLESIAT, P. & NIKAIDO, H. 2015b. The challenge of efflux-mediated antibiotic resistance in Gram-negative bacteria. 
Clin Microbiol Rev, 28, 337-418. LINARES, J. F., GUSTAFSSON, I., BAQUERO, F. & MARTINEZ, J. L. 2006. Antibiotics as intermicrobial signaling agents instead of weapons. Proceedings of the National Academy of Sciences of the United States of America, 103, 19484-9. LIU, B. & POP, M. 2009. ARDB‚ Antibiotic Resistance Genes Database. Nucleic Acids Research, 37, D443-D447. LU, Z., NA, G., GAO, H., WANG, L., BAO, C. & YAO, Z. 2015. Fate of sulfonamide resistance genes in estuary environment and effect of anthropogenic activities. Sci Total Environ, 527-528C, 429-438. LUO, Y., MAO, D., RYSZ, M., ZHOU, Q., ZHANG, H., XU, L. & J. J. ALVAREZ, P. 2010. Trends in Antibiotic Resistance Genes Occurrence in the Haihe River, China. Environmental science & technology, 44, 7220-7225. LYRAS, D., O/'CONNOR, J. R., HOWARTH, P. M., SAMBOL, S. P., CARTER, G. P., PHUMOONNA, T., POON, R., ADAMS, V., VEDANTAM, G., JOHNSON, S., GERDING, D. N. & ROOD, J. I. 2009. Toxin B is essential for virulence of Clostridium difficile. Nature, 458, 1176-1179. MARSHALL, C. G. & WRIGHT, G. D. 1997. The glycopeptide antibiotic producer Streptomyces toyocaensis NRRL 15009 has both D-alanyl-D-alanine and D-alanyl-D-lactate ligases. FEMS Microbiol Lett, 157, 295-9. 139 MARSHALL, C. G., ZOLLI, M. & WRIGHT, G. D. 1999. Molecular mechanism of VanHst, an alpha-ketoacid dehydrogenase required for glycopeptide antibiotic resistance from a glycopeptide producing organism. Biochemistry, 38, 8485-91. MARTINEZ, J. L. 2009. Environmental pollution by antibiotics and by antibiotic resistance determinants. Environmental pollution, 157, 2893-902. MASON, O. U., HAZEN, T. C., BORGLIN, S., CHAIN, P. S., DUBINSKY, E. A., FORTNEY, J. L., HAN, J., HOLMAN, H. Y., HULTMAN, J., LAMENDELLA, R., MACKELPRANG, R., MALFATTI, S., TOM, L. M., TRINGE, S. G., WOYKE, T., ZHOU, J., RUBIN, E. M. & JANSSON, J. K. 2012. Metagenome, metatranscriptome and single-cell sequencing reveal microbial response to Deepwater Horizon oil spill. 
The ISME journal, 6, 1715-27. MATHER, A. E., REID, S. W. J., MASKELL, D. J., PARKHILL, J., FOOKES, M. C., HARRIS, S. R., BROWN, D. J., COIA, J. E., MULVEY, M. R., GILMOUR, M. W., PETROVSKA, L., DE PINNA, E., KURODA, M., AKIBA, M., IZUMIYA, H., CONNOR, T. R., SUCHARD, M. A., LEMEY, P., MELLOR, D. J., HAYDON, D. T. & THOMSON, N. R. 2013. Distinguishable Epidemics of Multidrug-Resistant Salmonella Typhimurium DT104 in Different Hosts. Science, 341, 1514-1517. MAZEL, D. 2006. Integrons: agents of bacterial evolution. Nature reviews. Microbiology, 4, 608-20. MCARTHUR, A. G., WAGLECHNER, N., NIZAM, F., YAN, A., AZAD, M. A., BAYLAY, A. J., BHULLAR, K., CANOVA, M. J., DE PASCALE, G., EJIM, L., KALAN, L., KING, A. M., KOTEVA, K., MORAR, M., MULVEY, M. R., O'BRIEN, J. S., PAWLOWSKI, A. C., PIDDOCK, L. J., SPANOGIANNOPOULOS, P., SUTHERLAND, A. D., TANG, I., TAYLOR, P. L., THAKER, M., WANG, W., YAN, M., YU, T. & WRIGHT, G. D. 2013. The comprehensive antibiotic resistance database. Antimicrobial agents and chemotherapy, 57, 3348-57. MCENEFF, G., BARRON, L., KELLEHER, B., PAULL, B. & QUINN, B. 2014. A year-long study of the spatial occurrence and relative distribution of pharmaceutical residues in sewage effluent, receiving marine waters and marine bivalves. Sci Total Environ, 476-477, 317-26. MCMANUS, P. S., STOCKWELL, V. O., SUNDIN, G. W. & JONES, A. L. 2002. Antibiotic use in plant agriculture. Annual review of phytopathology, 40, 443-65. MEYER, F., PAARMANN, D., D'SOUZA, M., OLSON, R., GLASS, E. M., KUBAL, M., PACZIAN, T., RODRIGUEZ, A., STEVENS, R., WILKE, A., WILKENING, J. & EDWARDS, R. A. 2008. The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC bioinformatics, 9, 386. 140 MILLER, R., MONTOYA, V., GARDY, J., PATRICK, D. & TANG, P. 2013. Metagenomics for pathogen detection in public health. Genome Medicine, 5, 81. MORTAZAVI, A., WILLIAMS, B. A., MCCUE, K., SCHAEFFER, L. & WOLD, B. 2008. 
Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature methods, 5, 621-8. MOURA, A., SOARES, M., PEREIRA, C., LEITAO, N., HENRIQUES, I. & CORREIA, A. N. 2009. INTEGRALL: a database and search engine for integrons, integrases and gene cassettes. Bioinformatics, 25, 1096-1098. MULLANY, P. 2014. Functional metagenomics for the investigation of antibiotic resistance. Virulence, 5, 443-7. MYERS, G. S. A., RASKO, D. A., CHEUNG, J. K., RAVEL, J., SESHADRI, R., DEBOY, R. T., REN, Q., VARGA, J., AWAD, M. M., BRINKAC, L. M., DAUGHERTY, S. C., HAFT, D. H., DODSON, R. J., MADUPU, R., NELSON, W. C., ROSOVITZ, M. J., SULLIVAN, S. A., KHOURI, H., DIMITROV, G. I., WATKINS, K. L., MULLIGAN, S., BENTON, J., RADUNE, D., FISHER, D. J., ATKINS, H. S., HISCOX, T., JOST, B. H., BILLINGTON, S. J., SONGER, J. G., MCCLANE, B. A., TITBALL, R. W., ROOD, J. I., MELVILLE, S. B. & PAULSEN, I. T. 2006. Skewed genomic variability in strains of the toxigenic bacterial pathogen, Clostridium perfringens. Genome Research, 16, 1031-1040. NACCACHE, S. N., FEDERMAN, S., VEERARAGHAVAN, N., ZAHARIA, M., LEE, D., SAMAYOA, E., BOUQUET, J., GRENINGER, A. L., LUK, K.-C., ENGE, B., WADFORD, D. A., MESSENGER, S. L., GENRICH, G. L., PELLEGRINO, K., GRARD, G., LEROY, E., SCHNEIDER, B. S., FAIR, J. N., MARTÍNEZ, M. A., ISA, P., CRUMP, J. A., DERISI, J. L., SITTLER, T., HACKETT, J., MILLER, S. & CHIU, C. Y. 2014. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Research, 24, 1180-1192. NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION 2007. The New Science of Metagenomics. NCBI. NGUYEN, F., STAROSTA, A. L., ARENZ, S., SOHMEN, D., DONHOFER, A. & WILSON, D. N. 2014a. Tetracycline antibiotics and resistance mechanisms. Biol Chem, 395, 559-75. NGUYEN, T. T., GUEDJ, J., CHACHATY, E., DE GUNZBURG, J., ANDREMONT, A. & MENTRÉ, F. 2014b. 
Mathematical Modeling of Bacterial Kinetics to Predict the Impact of Antibiotic Colonic Exposure and Treatment Duration on the Amount of Resistant Enterobacteria Excreted. PLoS Comput Biol, 10, e1003840. O'BRIEN, F. G., YUI ETO, K., MURPHY, RILEY J. T., FAIRHURST, HEATHER M., COOMBS, G. W., GRUBB, W. B. & RAMSAY, J. P. 2015. Origin-of-transfer sequences 141 facilitate mobilisation of non-conjugative antimicrobial-resistance plasmids in Staphylococcus aureus. Nucleic Acids Research. OH, S., TANDUKAR, M., PAVLOSTATHIS, S. G., CHAIN, P. S. & KONSTANTINIDIS, K. T. 2013. Microbial community adaptation to quaternary ammonium biocides as revealed by metagenomics. Environmental microbiology. OLSON, A. B., SILVERMAN, M., BOYD, D. A., MCGEER, A., WILLEY, B. M., PONG- PORTER, V., DANEMAN, N. & MULVEY, M. R. 2005. Identification of a progenitor of the CTX-M-9 group of extended-spectrum beta-lactamases from Kluyvera georgiana isolated in Guyana. Antimicrob Agents Chemother, 49, 2112-5. OPATOWSKI, L., GUILLEMOT, D., BOELLE, P. Y. & TEMIME, L. 2011. Contribution of mathematical modeling to the fight against bacterial antibiotic resistance. Curr Opin Infect Dis, 24, 279-87. PAGE, A. J., CUMMINS, C. A., HUNT, M., WONG, V. K., REUTER, S., HOLDEN, M. T. G., FOOKES, M., FALUSH, D., KEANE, J. A. & PARKHILL, J. 2015. Roary: Rapid large-scale prokaryote pan genome analysis. Bioinformatics. PAN, J. C., YE, R., WANG, H. Q., XIANG, H. Q., ZHANG, W., YU, X. F., MENG, D. M. & HE, Z. S. 2008. Vibrio cholerae O139 multiple-drug resistance mediated by Yersinia pestis pIP1202-like conjugative plasmids. Antimicrob Agents Chemother, 52, 3829-36. PERRON, G. G., WHYTE, L., TURNBAUGH, P. J., GOORDIAL, J., HANAGE, W. P., DANTAS, G. & DESAI, M. M. 2015. Functional characterization of bacteria isolated from ancient arctic soil exposes diverse resistance mechanisms to modern antibiotics. PLoS One, 10, e0069533. 
PHILLIPS, I., CASEWELL, M., COX, T., DE GROOT, B., FRIIS, C., JONES, R., NIGHTINGALE, C., PRESTON, R. & WADDELL, J. 2004. Does the use of antibiotics in food animals pose a risk to human health? A critical review of published data. The Journal of antimicrobial chemotherapy, 53, 28-52. PONSTINGL, H. 2014. SMALT. http://www.sanger.ac.uk/resources/software/smalt/ Sanger Institute, WelcomeTrust. PORT, J. A., CULLEN, A. C., WALLACE, J. C., SMITH, M. N. & FAUSTMAN, E. M. 2014. Metagenomic Frameworks for Monitoring Antibiotic Resistance in Aquatic Environments. Environmental Health Perspectives, 122, 222-228. PRUDEN, A., PEI, R., STORTEBOOM, H. & CARLSON, K. H. 2006. Antibiotic resistance genes as emerging contaminants: Studies in northern Colorado. Environmental Science and Technology, 40, 7445-7450. 142 PUBLIC HEALTH ENGLAND. 2013. Antibiotic Resistance Monitoring & Reference Laboratory (ARMRL) [Online]. Available: http://www.hpa.org.uk/. QUAIL, M. A., OTTO, T. D., GU, Y., HARRIS, S. R., SKELLY, T. F., MCQUILLAN, J. A., SWERDLOW, H. P. & OYOLA, S. O. 2012. Optimal enzymes for amplifying sequencing libraries. Nat Meth, 9, 10-11. RAMBAUT, A. 2007. FigTree. http://tree.bio.ed.ac.uk/software/figtree/. RASMUSSEN, L. D., ZAWADSKY, C., BINNERUP, S. J., ØREGAARD, G., SØRENSEN, S. J. & KROER, N. 2008. Cultivation of Hard-To-Culture Subsurface Mercury-Resistant Bacteria and Discovery of New merA Gene Sequences. Applied and Environmental Microbiology, 74, 3795-3803. RELLER, L. B., WEINSTEIN, M. P. & MURDOCH, D. R. 2003. Diagnosis of Legionella Infection. Clinical Infectious Diseases, 36, 64-69. ROBILOTTI, E. & DERESINSKI, S. 2014. Carbapenemase-producing Klebsiella pneumoniae. F1000Prime Rep, 6, 80. RODRIGUEZ, C., LANG, L., WANG, A., ALTENDORF, K., GARCIA, F. & LIPSKI, A. 2006. Lettuce for human consumption collected in Costa Rica contains complex communities of culturable oxytetracycline- and gentamicin-resistant bacteria. Applied and environmental microbiology, 72, 5870-6. 
Appendix 1. Manuscript and source code for SEAR.
ROWE, W., BAKER, K. S., VERNER-JEFFREYS, D., BAKER-AUSTIN, C., RYAN, J. J., MASKELL, D. & PEARCE, G. 2015. Search Engine for Antimicrobial Resistance: A Cloud Compatible Pipeline and Web Interface for Rapidly Detecting Antimicrobial Resistance Genes Directly from Sequence Data. PLoS ONE, 10, e0133492.
RESEARCH ARTICLE
Search Engine for Antimicrobial Resistance: A Cloud Compatible Pipeline and Web Interface for Rapidly Detecting Antimicrobial Resistance Genes Directly from Sequence Data
Will Rowe1*, Kate S.
Baker2, David Verner-Jeffreys3, Craig Baker-Austin3, Jim J. Ryan4, Duncan Maskell1, Gareth Pearce1
1 Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom, 2 Wellcome Trust Sanger Institute, Cambridge, United Kingdom, 3 Centre for Environment, Fisheries and Aquaculture Science, Weymouth, United Kingdom, 4 Environment, Health and Safety, GlaxoSmithKline, Ware, United Kingdom
* wpmr2@cam.ac.uk

Abstract

Background
Antimicrobial resistance remains a growing and significant concern in human and veterinary medicine. Current laboratory methods for the detection and surveillance of antimicrobial resistant bacteria are limited in their effectiveness and scope. With the rapidly developing field of whole genome sequencing beginning to be utilised in clinical practice, the ability to interrogate sequencing data quickly and easily for the presence of antimicrobial resistance genes will become increasingly important and useful for informing clinical decisions. Additionally, use of such tools will provide insight into the dynamics of antimicrobial resistance genes in metagenomic samples such as those used in environmental monitoring.

Results
Here we present the Search Engine for Antimicrobial Resistance (SEAR), a pipeline and web interface for detection of horizontally acquired antimicrobial resistance genes in raw sequencing data. The pipeline provides gene information, abundance estimation and the reconstructed sequence of antimicrobial resistance genes; it also provides web links to additional information on each gene. The pipeline utilises clustering and read mapping to annotate full-length genes relative to a user-defined database. It also uses local alignment of annotated genes to a range of online databases to provide additional information.
We demonstrate SEAR’s application in the detection and abundance estimation of antimicrobial resistance genes in two novel environmental metagenomes, 32 human faecal microbiome datasets and 126 clinical isolates of Shigella sonnei.

Conclusions
We have developed a pipeline that contributes to the improved capacity for antimicrobial resistance detection afforded by next generation sequencing technologies, allowing for rapid detection of antimicrobial resistance genes directly from sequencing data. SEAR uses raw sequencing data via an intuitive interface so can be run rapidly without requiring advanced bioinformatic skills or resources. Finally, we show that SEAR is effective in detecting antimicrobial resistance genes in sequencing data from both environmental metagenomes and clinical isolates.

PLOS ONE | DOI:10.1371/journal.pone.0133492 July 21, 2015
OPEN ACCESS
Citation: Rowe W, Baker KS, Verner-Jeffreys D, Baker-Austin C, Ryan JJ, Maskell D, et al. (2015) Search Engine for Antimicrobial Resistance: A Cloud Compatible Pipeline and Web Interface for Rapidly Detecting Antimicrobial Resistance Genes Directly from Sequence Data. PLoS ONE 10(7): e0133492. doi:10.1371/journal.pone.0133492
Editor: Willem van Schaik, University Medical Center Utrecht, NETHERLANDS
Received: May 13, 2015
Accepted: June 27, 2015
Published: July 21, 2015
Copyright: © 2015 Rowe et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability Statement: All novel environmental metagenomes are available from the European Nucleotide Archive (Metagenomics) database. ENA project numbers and dataset accession numbers are available in supplemental methods, S3 and S4 files.
Funding: This research was funded by GlaxoSmithKline, the Centre for Environment, Fisheries and Aquaculture Science and the Biotechnology and Biological Sciences Research Council under an industrial CASE studentship. The funder Centre for Environment, Fisheries and Aquaculture Science provided support in the form of salaries, research materials and facilities for authors DVJ and CBA, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The funder GlaxoSmithKline provided support in the form of salaries for author JR, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of the authors are articulated in the ‘author contributions’ section.
Competing Interests: The authors would like to declare a commercial affiliation with GSK, who partly funded the research and employs JR. This does not alter the authors' adherence to PLOS ONE policies on sharing data and materials.

Introduction
The global threat of antimicrobial resistance is growing at an alarming rate; infections that were once easily treatable now constitute public health crises [1]. This has led to the consensus that more must be done to monitor and combat the occurrence and spread of antimicrobial resistance [2, 3]. Current diagnostic laboratory practice for the detection of antimicrobial resistance relies on isolate culturing, followed by growth inhibition assays for the identification of resistant phenotypes and determination of Minimum Inhibitory Concentrations (MICs) against a range of antimicrobials [4]. Alternatively, antimicrobial resistance genes (ARGs) can be identified using polymerase chain reaction (PCR) and quantified using real-time PCR, requiring specific primers for the amplification of target sequences [5]. These approaches take time, consume resources, and have limitations that may result in clinically relevant resistances going undetected; for example, phenotypic testing will miss non-culturable bacteria and non-expressed ARGs, whereas limitations of multiplex composition and size in molecular testing complicate the detection of ARGs [6, 7].

Perhaps not surprisingly, the Centers for Disease Control and Prevention (CDC) identified one of the current shortcomings in the approach to combatting antimicrobial resistance as the poor use of advanced molecular detection (AMD) technologies [8]. AMD technologies, such as the whole genome sequencing of bacterial isolates as well as uncultured bacteria (metagenomic sequencing), have the potential to identify antimicrobial resistance more quickly and effectively than conventional laboratory assays [8]. In addition to these well-understood advantages, AMD technologies can also be applied to circumvent the requirement of prior knowledge of causative agents and provide clinically relevant information for the treatment and surveillance of pathogens as well as antimicrobial resistance [9]. Upon receipt of a metagenomic (e.g. environmental or faecal microbiome) or isolate sample, DNA can be extracted, compiled into a library and sequenced within hours [10]. Indeed, AMD approaches to pathogen detection are currently being developed and seek to identify pathogens directly from metagenomic samples within clinically relevant timeframes [11]. Recent studies have also shown AMD to be effective in the epidemiological tracking of pathogens, as well as the detection of ARGs present in their genomes [12, 13]. AMD offers an alternative screening tool that may be quicker than traditional culture-based techniques. For example, the detection of Mycobacterium tuberculosis requires inoculated isolation media to be incubated for several days in order to diagnose infection and additional time for phenotypic characterisation of antimicrobial resistance [14]. This highlights the potential for developing more efficient diagnostic tests and the utilisation of AMD technologies to create more rapid alternatives for ARG detection.

In addition to these direct clinical applications, AMD technologies are also becoming a common tool in the detection of ARGs in the environment, which is vital for identifying reservoirs of ARGs [15–17]. However, there is a need to establish a metagenomic framework for use in the monitoring of ARGs within the environment in order to inform public health decisions and respond to the growing concern over antimicrobial resistance [18]. This must include the development of reliable surveillance methods and tools for risk assessment [19]. When designing metagenomic tools for the environmental monitoring of ARGs, it is therefore necessary to provide context in terms of the relative abundance of ARGs, so that these can be correlated with environmental variables (e.g. antimicrobial concentrations), as well as to obtain information on the mobile genetic elements (MGE) and pathogens that they are associated with.

Currently published resources available for ARG detection are online databases that use the Basic Local Alignment Search Tool (BLAST) algorithm to find possible matches between the database and query sequences (e.g. ARDB, CARD, ResFinder) [20–23]. To our knowledge, no existing tools give an ARG abundance measure or simultaneously provide MGE information. The targeting of full-length gene matches using BLAST requires a sequence assembly step, adding time, infrastructure requirements, and complexity to the analysis. Furthermore, full-length gene assembly is often difficult to achieve in metagenomic samples where coverage is frequently low and uneven across the sample.
Ideally, raw sequencing data would be used directly to rapidly identify and quantify ARGs of interest. Although mapping-based approaches have been used for individual studies [24, 25] and tools that work directly with reads (though on non-ARG databases) such as the SEED subsystems and SRST2 can be applied to this aim [26], there is as yet no such ARG-detection algorithm. Here, we present an automated pipeline, the Search Engine for Antimicrobial Resistance (SEAR), which quickly and accurately identifies antimicrobial resistance information from biological samples. It also provides abundance estimates and returns the full-length gene sequence as reconstructed from the sample. To demonstrate efficacy, we present the application of the pipeline to a range of sequencing data types including novel environmental metagenomes, human faecal metagenomes and clinical isolates of pathogenic enteric bacteria (Shigella sonnei).

Materials and Methods

SEAR requirements
Reference databases. SEAR requires reference databases for read subtraction and read clustering. Details of the supplied databases and how the user can supply their own custom databases are given in supplemental methods (Supplemental methods A in S1 File). The default databases supplied for read subtraction and read clustering are the human genome (HG19 build) and the ARG-annot database [27].
Hardware. Minimum hardware requirements for SEAR comprise a Unix server (tested using Ubuntu 10.04) with ~2 GB of disk space for reference data and software dependencies (see S1 Table). Whilst running, SEAR requires up to twice the input FASTQ file size (in bytes) in both RAM and disk space for temporary file storage.

SEAR
The pipeline. SEAR is a pipeline consisting of Perl, Shell and R scripts that call on several pieces of open source software and utilise a customisable reference database to annotate ARGs directly from short-read sequencing data. SEAR is downloadable from http://computing.bio.cam.ac.uk/sear/SEAR_WEB_PAGE/SEAR.html, in stand-alone command-line and web-based versions (Fig 1).

Fig 1. Screenshot of the SEAR web interface, including homepage (A) and quick start settings (B). doi:10.1371/journal.pone.0133492.g001

The pipeline follows five main steps in the annotation of ARGs: (1) processing of input files, (2) clustering of sequence reads to known ARGs in a user-defined (or pre-loaded) database, (3) mapping of reads to reference sequences, (4) ARG annotation and calculation of relative abundance and (5) local alignment of annotated ARGs to online databases.

(1) Processing of input files
The pipeline accepts raw or compressed (.gz) FASTQ files (either 33 or 64 ASCII encoding) from metagenomic, metatranscriptomic or isolate sequencing. Where more than one input file (e.g. paired-end data) is provided, these files are merged to give a single input file (paired-end information is not currently utilised in the pipeline). The pipeline has the optional step of pre-filtering reads, by removing those that map against a user-defined reference, such as the human genome or a bacterial strain. FASTQ files are quality checked using user-defined cut-offs and converted to FASTA formatted reads.

(2) Clustering of sequence reads to ARG database
The pipeline is supplied with a custom ARG database that has been built by clustering and annotating the ARGs held in the ARG-annot database [27]. Other ARG databases can also be used, or the user can supply a custom FASTA file (Supplemental methods B in S1 File). Reads are clustered to the ARG database by global alignment with USEARCH (version 7.0.959) using a default identity cut-off of 99% [28]. Where multiple matches occur, the read is clustered with the highest identity match.
SEAR parses the clusters by grouping reads to each matched reference gene and retrieving corresponding FASTQ information for each matched read.

(3) Mapping of clustered sequence reads to ARG references
The Burrows-Wheeler Aligner (BWA-mem version 0.7.8) [29] is used to map each cluster of FASTQ reads to the corresponding reference gene. Samtools is then used to analyse the BWA alignment and generate a consensus sequence using mpileup [30].

(4) ARG annotation and relative abundance
The consensus sequences are used to annotate ARGs and calculate relative abundance values; an ARG is considered present in the sample if sequence reads can be mapped to the ARG reference sequence above the defined coverage cut-off (coverage is the percentage of the reference ARG length with mapped reads). For relative abundance calculation, SEAR uses a method similar to the reads per kilobase/million reads (RPKM) method that is commonly used in transcriptome studies [31]. Full details on cut-off values and abundance calculation are given in supplemental methods (Supplemental methods C in S1 File).

(5) Local alignment
The consensus sequences for annotated ARGs are aligned to the NCBI nucleotide and protein databases using command-line BLAST [23] (using the -remote BLAST service by default; see the documentation to use local database versions). Sequences are also aligned to the current Repository of Antibiotic resistance Cassettes (RAC) [32] and Antibiotic Resistance Database (ARDB) [20] databases using BLAST (though ARDB has not recently been curated).

Pipeline outputs.
In both command-line and web versions of SEAR, output includes: a graphical overview, ARG annotations, relative abundance scores, consensus sequences, flat files (html, csv, blast files) and links to further gene information and homologues found in online databases (such as the Repository of Antibiotic resistance Cassettes and the NCBI non-redundant nucleotide and protein databases).

Demonstrating SEAR utility
Data sets used in this study. Several datasets were used to demonstrate the utility of this pipeline across broad data categories. All datasets were analysed using a UNIX server (Ubuntu 10.04) running SEAR with default parameters (99% clustering identity and 90% coverage cut-off for ARG annotation; the full default parameter list is given in S2 Table).
Novel environmental metagenomes. Information on metagenome sample collection, library construction and sequencing is provided in supplemental methods (Supplemental methods D, E and F in S1 File). Briefly, faecal wastewater effluent samples were taken from a dairy farm (latitude: 52.22259, longitude: 0.02603) and a metropolitan (human) wastewater treatment works (WWTW) (latitude: 52.234469, longitude: 0.154614). Samples were vacuum filtered through 0.22 μm membranes, and DNA was extracted and sequenced using the Illumina HiSeq 2000 platform.
Pre-existing metagenomic and clinical isolate data. Human Microbiome Project (HMP) data for 32 Spanish human faecal microbiomes (for which the ARGs have previously been characterised in an in silico study by Forslund et al. [25]) were used (SRA Study ERP002061). Additionally, SEAR was used to detect ARGs in a global dataset of 126 clinical isolates of the pathogenic bacterium Shigella sonnei (SRA Study ERP000182) [33].
In the case of the clinical isolates, SEAR ARG detection was compared with the published ARG content of the isolates, with SEAR being run with default parameters on a custom reference database of ARGs originally detected by 100% mapping [33]. Further details on datasets are provided in S3 Table.

Results
To test the utility of SEAR we ran the pipeline using a variety of sample types (environmental metagenomes, human faecal microbiome and bacterial clinical isolate), recorded pipeline run times (S4 Table) and then investigated the presence and abundance of ARGs in all samples.

Discrimination of ARG presence and abundance between environmental metagenomes
A total of 28 distinct ARGs (15 in each metagenome, two of them shared) were identified among the environmental metagenomes from WWTW effluent and farm waste effluent (Fig 2). Only two genes, strA and strB (both conferring aminoglycoside resistance), were common between the metagenomes, and each gene found in both sets was five times more abundant in the WWTW effluent compared to the farm effluent when using the normalised abundance values for the combined datasets. The WWTW effluent had ARGs spanning four antimicrobial resistance profiles, with the most diverse (i.e. greatest number of ARGs) being the aminoglycoside resistance profile and the greatest abundance being ARGs conferring tetracycline resistance. In contrast, the farm effluent had ARGs spanning five resistance profiles, with the most diverse being the beta lactam resistance profile and the most abundant also being tetracycline resistance (Fig 2). The most abundant ARGs in the metagenome datasets were tetracycline resistance genes: tetC (41.6%) in the farm effluent, and tet39 (15.3%) in the WWTW effluent.
A subset of ARGs identified by SEAR (tetA, qnrB and bla-ACT; chosen to encompass clinically relevant resistances, drugs with both a long and short history of resistance and chemically diverse antimicrobials) was confirmed in the original farm effluent DNA sample using PCR. Briefly, primers were designed using Primer3 [34] and targets were amplified using GoTaq DNA polymerase (Promega) (data not shown).

Fig 2. SEAR results for environmental metagenomes. The column chart in A shows the breakdown of the number of ARGs in each effluent, grouped by antimicrobial resistance profile. The column chart in B shows the relative abundance of ARGs found in each metagenome (coloured according to the key). doi:10.1371/journal.pone.0133492.g002

Efficacy of SEAR for detecting ARGs in human faecal microbiomes
To assess the efficacy of SEAR for detecting ARGs in microbiome data, SEAR was tested on 32 faecal microbiome samples (S5 Table). ARGs were detected in 31 of the samples and a total of 295 genes conferring resistance to 6 classes of antimicrobials were identified across the samples (Table 1). Genes conferring resistance to tetracyclines were again the most common ARGs identified (39% of total ARGs detected).

Accuracy of SEAR ARG detection using clinical isolate sequencing data
To evaluate SEAR’s efficacy in detecting ARGs in clinical isolate sequencing data, SEAR was run on sequencing data from 126 isolates of the enteric pathogen Shigella sonnei. The results were then compared to the ARG detection data presented in the original publication [33]. Of the 231 detection events (see methods for criteria) originally presented in the publication, SEAR identified 221, and a further 20 ARGs (Table 2, full results shown in S6 Table).
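The agreement figures reported for the S. sonnei isolates reduce to simple arithmetic on the counts given in Table 2. The Python sketch below is illustrative only: it treats the published calls as the reference set, and the names `agreement_summary`, `recall` and `fraction_confirmed` are labels chosen here, not terms used by SEAR or by the original publication. The 20 additional SEAR detections are not necessarily false positives, so the second ratio is not presented as a precision estimate.

```python
# Counts taken from the comparison with the published S. sonnei ARG calls.
REPORTED_TOTAL = 231   # detection events in the original publication
FOUND_BY_SEAR = 221    # of those, also detected by SEAR
EXTRA_BY_SEAR = 20     # SEAR detections not reported in the publication


def agreement_summary(reported_total, found, extra):
    """Summarise agreement between SEAR calls and a published reference set."""
    if reported_total <= 0 or found < 0 or extra < 0 or found > reported_total:
        raise ValueError("counts are inconsistent")
    sear_total = found + extra
    return {
        "missed": reported_total - found,        # reported but not found by SEAR
        "sear_total": sear_total,                # all SEAR detection events
        "recall": found / reported_total,        # share of reported events recovered
        "fraction_confirmed": found / sear_total,  # share of SEAR calls also published
    }


summary = agreement_summary(REPORTED_TOTAL, FOUND_BY_SEAR, EXTRA_BY_SEAR)
print(f"missed: {summary['missed']}, SEAR total: {summary['sear_total']}")
print(f"recall: {summary['recall']:.1%}")
print(f"fraction confirmed: {summary['fraction_confirmed']:.1%}")
```

Under these assumptions SEAR recovers 221 of 231 published detection events (about 95.7%), and about 91.7% of its 241 calls coincide with a published call.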
Table 1. SEAR detection of ARGs across antimicrobial profile/classes in human faecal microbiomes.
Antimicrobial resistance profile | Number of ARGs
Aminoglycosides | 54
Beta lactams | 38
Quinolones | 0
Glycopeptides | 0
Macrolides/Lincosamides/Streptogramins | 82
Phenicols | 1
Rifampicin | 0
Sulfonamides | 5
Tetracyclines | 115
Trimethoprims | 0
The table shows the number of genes identified in each antimicrobial resistance profile for the combined dataset of HMP samples. doi:10.1371/journal.pone.0133492.t001

Table 2. Accuracy of SEAR ARG detection using clinical isolate sequencing data.
SEAR results | Reported in Holt et al. [33]: detected | Reported in Holt et al. [33]: not-detected | TOTAL
detected | 221 | 20 | 241
not-detected | 10 | 0 | 10
TOTAL | 231 | 20 |
The contingency table compares the detection and non-detection of ARGs by SEAR relative to the published ARG detection data for 126 S. sonnei isolates. doi:10.1371/journal.pone.0133492.t002

Discussion
SEAR is an ARG annotation tool that is freely available and may be downloaded as a cloud compatible web interface or a stand-alone command line program. It offers advantages over currently available ARG annotation tools as it provides ARG annotations, relative abundance values, gene sequence and gene information from raw sequencing data without requiring any sequence assembly. In contrast to tools based on BLAST comparison of de novo assemblies, the clustering and mapping approach used by SEAR, combined with the customisable database and annotation parameters, allows the user to detect putative ARGs in the incomplete or low coverage sequencing data that are common in metagenomic analyses. SEAR successfully identified ARGs in sequencing datasets that were generated from novel environmental metagenomic samples, human microbiomes and clinical isolates of Shigella sonnei.
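As a rough illustration of the annotation parameters mentioned above, the Python sketch below applies a coverage cut-off and then computes a standard RPKM value. It is a minimal sketch under stated assumptions, not SEAR's implementation: the paper describes the abundance measure only as similar to RPKM and defers the exact formula to its supplemental methods, the 90% cut-off is the reported default, coverage exactly at the cut-off is treated here as passing, and the function names and the per-base depth input are hypothetical.

```python
def coverage_percent(depth_per_base):
    """Percentage of reference positions covered by at least one mapped read."""
    if not depth_per_base:
        return 0.0
    covered = sum(1 for depth in depth_per_base if depth > 0)
    return 100.0 * covered / len(depth_per_base)


def rpkm(mapped_reads, gene_length_bp, total_reads):
    """Standard reads per kilobase of reference per million reads."""
    if gene_length_bp <= 0 or total_reads <= 0:
        raise ValueError("gene length and total read count must be positive")
    return mapped_reads / ((gene_length_bp / 1_000) * (total_reads / 1_000_000))


def annotate(depth_per_base, mapped_reads, total_reads, coverage_cutoff=90.0):
    """Return coverage and abundance for one reference gene, or None if the
    gene falls below the coverage cut-off (assumed rule, see lead-in)."""
    coverage = coverage_percent(depth_per_base)
    if coverage < coverage_cutoff:
        return None
    return {
        "coverage_percent": coverage,
        "rpkm": rpkm(mapped_reads, len(depth_per_base), total_reads),
    }


# Hypothetical 1,000 bp reference with 950 covered positions,
# 200 mapped reads out of 2,000,000 reads in the sample.
depth = [3] * 950 + [0] * 50
result = annotate(depth, mapped_reads=200, total_reads=2_000_000)
print(result)
```

Normalising by both gene length and sequencing depth is what allows abundance values to be compared between genes of different sizes; lowering `coverage_cutoff` trades specificity for sensitivity in low-coverage samples.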
SEAR was able to detect the ARGs present in two novel environmental metagenomes, allowing direct comparison between two different wastewater effluent samples. SEAR identified meaningful differences among ARGs of clinical interest, for example the presence of quinolone resistance genes (qnrB and qnrS) exclusively in the wastewater effluent from the farm source. It also showed that, while the two sources had different qualitative ARG characteristics (with either aminoglycosides or beta-lactams being the most diverse resistance profiles), tetracycline resistance genes were present in the greatest abundance in both. In addition to detecting important differences among these sample types, the confirmation of a subset of identified ARGs by PCR demonstrated the robustness of the pipeline.

Similarly, SEAR was effective for identifying ARGs from clinical samples. ARGs were detected in human microbiomes, demonstrating the potential of using metagenomic analyses for the surveillance and management of antimicrobial resistance. Additionally, SEAR successfully identified ARGs in a global dataset of 126 clinical isolates of an important enteric pathogen. There were a few discrepancies, which were consistent with a given isolate or gene family; however, the results were overwhelmingly consistent. Furthermore, the congruence of ARG detection results from SEAR with the published ARG content of the isolates further highlighted the effectiveness of the pipeline, providing a further compelling argument for the application of high-throughput AMD in clinical microbiology.

Limitations and future improvements
SEAR offers increased functionality over existing bioinformatic tools by providing a consensus sequence of annotated ARGs, links to online resources containing information on the ARGs (and gene homologs) and a relative abundance estimate for each ARG detected.
Each ARG consensus sequence is generated using reads that clustered to a reference sequence and consequently any variability in the consensus sequence in a metagenomic sample may be due to either sequencing noise or the presence of multiple bona fide sequence variants. The relative abundance estimate is relative within an individual sample; however, the SEAR output features the information required to calculate relative abundance across multiple samples. Due to possible large variations in user file size and upload speed, the SEAR interface and command line tool are available for use as downloadable packages.

SEAR is designed for detecting ARGs that are horizontally acquired, not antimicrobial resistance that is caused (or inactivated) by single nucleotide polymorphisms (SNPs), e.g. SNPs in the gyrA gyrase gene that result in quinolone resistance. SNPs are not currently tested for because the annotation parameters are calibrated to detect partial ARG matches, to compensate for low sequencing coverage. Hence, such SNPs may be missed by SEAR due to the number of mismatches permitted or by a low coverage cut-off (though these are both customisable settings). For these reasons, it is not recommended to include SNP-based resistances in reference databases used with SEAR as they may lead to false positives. The detection of SNP-based resistances in metagenomic samples represents a significant future challenge that needs to be addressed. It should also be stressed that the default SEAR parameters, which are based on high-stringency read clustering and mapping, result in an analysis that finds ARGs that are known in the reference data and it is not suited for discovery of emergent ARGs.
The high-stringency settings are designed to exclude the possibility of non-competitive read mapping causing false positive results, by ensuring that annotated ARGs have a high sequence identity to sequences in the reference database.

Conclusion

We have presented a bioinformatic pipeline that is highly effective for detecting ARGs directly from raw sequencing reads and that also provides relative abundance estimates and the sequences of identified genes. We have shown its application to sequence data from metagenomic datasets and bacterial isolates. We have demonstrated the use of SEAR in potential clinical and environmental monitoring applications, highlighting the advantages of automated interpretation of sequencing data for generating timely and informative reports to inform public health and, potentially, clinical decision-making. With the increasing drive to integrate AMD technology and existing laboratory assays in order to combat antimicrobial resistance, we present this pipeline as a valuable step towards this important goal.

Availability and requirements

Project name: Search Engine for Antimicrobial Resistance (SEAR)
Project home page: http://computing.bio.cam.ac.uk/sear/SEAR_WEB_PAGE/SEAR.html
Operating system(s): UNIX
Programming language: Perl
Other requirements: Usearch (v.7), BWA, samtools, R
License: GNU GPL (version 3)
Any restrictions to use by non-academics: n/a

Supporting Information

S1 File. Supplemental methods. (PDF)
S1 Table. A list of dependencies required by SEAR. (PDF)
S2 Table. SEAR parameters. (PDF)
S3 Table. NGS datasets. (PDF)
S4 Table. Example runtimes for SEAR. (PDF)
S5 Table. SEAR ARG detection for HMP sequence data. (PDF)
S6 Table. SEAR ARG detection using clinical isolate sequence data. (PDF)

Acknowledgments

The authors would like to thank Dr. Jenny Barna for computing support.
Author Contributions

Conceived and designed the experiments: WR GP. Performed the experiments: WR KB. Analyzed the data: WR. Contributed reagents/materials/analysis tools: WR KB. Wrote the paper: WR KB DVJ CBA JR DM GP.

References

1. Sack D., Lyke C., McLaughlin C., Suwanvanichkij V. Antimicrobial resistance in shigellosis, cholera and campylobacteriosis. World Health Organization: Department of Communicable Disease Surveillance and Response, 2001.
2. WHO. The evolving threat of antimicrobial resistance - options for action. Geneva; 2012.
3. Laxminarayan R, Duse A, Wattal C, Zaidi AKM, Wertheim HFL, Sumpradit N, et al. Antibiotic resistance - the need for global solutions. The Lancet Infectious Diseases. 2013; 13(12):1057–98. doi: 10.1016/S1473-3099(13)70318-9 PMID: 24252483
4. PHE. Antibiotic Resistance Monitoring & Reference Laboratory (ARMRL) 2013. Available from: http://www.hpa.org.uk/.
5. Espy MJ, Uhl JR, Sloan LM, Buckwalter SP, Jones MF, Vetter EA, et al. Real-time PCR in clinical microbiology: applications for routine laboratory testing. Clin Microbiol Rev. 2006; 19(1):165–256. Epub 2006/01/19. doi: 10.1128/CMR.19.1.165-256.2006 PMID: 16418529; PubMed Central PMCID: PMC1360278.
6. Diekema DJ, Pfaller MA. Rapid Detection of Antibiotic-Resistant Organism Carriage for Infection Prevention. Clinical Infectious Diseases. 2013; 56(11):1614–20. doi: 10.1093/cid/cit038 PMID: 23362298
7. Katherine E. Heiman, Karlsson Maria, Julian Grass, Becca Howie, Robert D. Kirkcaldy, Barbara Mahon, et al. Shigella with Decreased Susceptibility to Azithromycin Among Men Who Have Sex with Men - United States, 2002–2013. Morbidity and Mortality Weekly Report - CDC. 2014; 63(6):132.
8. CDC CfDCaP. Antibiotic Resistance Threats in the United States, 2013. 2013.
9. Miller R, Montoya V, Gardy J, Patrick D, Tang P. Metagenomics for pathogen detection in public health. Genome Medicine. 2013; 5(9):81. doi: 10.1186/gm485 PMID: 24050114
10. Illumina. Illumina MiSeq Benchtop Sequencer 2013. Available from: http://www.illumina.com.
11. Naccache SN, Federman S, Veeraraghavan N, Zaharia M, Lee D, Samayoa E, et al. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Research. 2014; 24(7):1180–92. doi: 10.1101/gr.171934.113 PMC4079973. PMID: 24899342
12. Koser CU, Holden MT, Ellington MJ, Cartwright EJ, Brown NM, Ogilvy-Stuart AL, et al. Rapid whole-genome sequencing for investigation of a neonatal MRSA outbreak. N Engl J Med. 2012; 366(24):2267–75. Epub 2012/06/15. doi: 10.1056/NEJMoa1109910 PMID: 22693998; PubMed Central PMCID: PMC3715836.
13. Harrison EM, Paterson GK, Holden MTG, Larsen J, Stegger M, Larsen AR, et al. Whole genome sequencing identifies zoonotic transmission of MRSA isolates with the novel mecA homologue mecC. EMBO Mol Med. 2013; 5(4):509–15. doi: 10.1002/emmm.201202413 PMID: 23526809
14. Kidenya BR, Kabangila R, Peck RN, Mshana SE, Webster LE, Koenig SP, et al. Early and Efficient Detection of Mycobacterium tuberculosis in Sputum by Microscopic Observation of Broth Cultures. PLoS One. 2013; 8(2):e57527. doi: 10.1371/journal.pone.0057527 PMID: 23469014
15. Lewin A, Johansen J, Wentzel A, Kotlar HK, Drablos F, Valla S. The microbial communities in two apparently physically separated deep subsurface oil reservoirs show extensive DNA sequence similarities. Environ Microbiol. 2013. Epub 2013/07/06. doi: 10.1111/1462-2920.12181 PMID: 23827055.
16. Mason OU, Hazen TC, Borglin S, Chain PS, Dubinsky EA, Fortney JL, et al. Metagenome, metatranscriptome and single-cell sequencing reveal microbial response to Deepwater Horizon oil spill. ISME J. 2012; 6(9):1715–27. Epub 2012/06/22. doi: 10.1038/ismej.2012.59 PMID: 22717885; PubMed Central PMCID: PMC3498917.
17. Oh S, Tandukar M, Pavlostathis SG, Chain PS, Konstantinidis KT. Microbial community adaptation to quaternary ammonium biocides as revealed by metagenomics. Environ Microbiol. 2013. Epub 2013/06/05. doi: 10.1111/1462-2920.12154 PMID: 23731340.
18. Port JA, Cullen AC, Wallace JC, Smith MN, Faustman EM. Metagenomic Frameworks for Monitoring Antibiotic Resistance in Aquatic Environments. Environ Health Perspect. 2014; 122(3):222–8. doi: 10.1289/ehp.1307009 PMC3948035. PMID: 24334622
19. Berendonk TU, Manaia CM, Merlin C, Fatta-Kassinos D, Cytryn E, Walsh F, et al. Tackling antibiotic resistance: the environmental framework. Nat Rev Micro. 2015; advance online publication. doi: 10.1038/nrmicro3439
20. Liu B, Pop M. ARDB, Antibiotic Resistance Genes Database. Nucleic Acids Research. 2009; 37(suppl 1):D443–D7. doi: 10.1093/nar/gkn656
21. McArthur AG, Waglechner N, Nizam F, Yan A, Azad MA, Baylay AJ, et al. The comprehensive antibiotic resistance database. Antimicrob Agents Chemother. 2013; 57(7):3348–57. Epub 2013/05/08. doi: 10.1128/AAC.00419-13 PMID: 23650175; PubMed Central PMCID: PMC3697360.
22. Zankari E, Hasman H, Cosentino S, Vestergaard M, Rasmussen S, Lund O, et al. Identification of acquired antimicrobial resistance genes. J Antimicrob Chemother. 2012; 67(11):2640–4. Epub 2012/07/12. doi: 10.1093/jac/dks261 PMID: 22782487; PubMed Central PMCID: PMC3468078.
23. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. Journal of Molecular Biology. 1990; 215(3):403–10. doi: 10.1016/S0022-2836(05)80360-2 PMID: 2231712
24. Hu Y, Yang X, Qin J, Lu N, Cheng G, Wu N, et al. Metagenome-wide analysis of antibiotic resistance genes in a large cohort of human gut microbiota. Nat Commun. 2013; 4:2151. Epub 2013/07/24. doi: 10.1038/ncomms3151 PMID: 23877117.
25. Forslund K, Sunagawa S, Kultima JR, Mende DR, Arumugam M, Typas A, et al. Country-specific antibiotic use practices impact the human gut resistome. Genome Research. 2013; 23(7):1163–9. doi: 10.1101/gr.155465.113 PMID: 23568836
26. Inouye M, Dashnow H, Raven L-A, Schultz M, Pope B, Tomita T, et al. SRST2: Rapid genomic surveillance for public health and hospital microbiology labs. Genome Medicine. 2014; 6(11):90. doi: 10.1186/s13073-014-0090-6 PMID: 25422674
27. Gupta SK, Padmanabhan BR, Diene SM, Lopez-Rojas R, Kempf M, Landraud L, et al. ARG-ANNOT, a New Bioinformatic Tool To Discover Antibiotic Resistance Genes in Bacterial Genomes. Antimicrob Agents Chemother. 2014; 58(1):212–20. doi: 10.1128/aac.01310-13 PMID: 24145532
28. Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010; 26(19):2460–1. Epub 2010/08/17. doi: 10.1093/bioinformatics/btq461 PMID: 20709691.
29. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009; 25(14):1754–60. Epub 2009/05/20. doi: 10.1093/bioinformatics/btp324 PMID: 19451168; PubMed Central PMCID: PMC2705234.
30. Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011; 27(21):2987–93. Epub 2011/09/10. doi: 10.1093/bioinformatics/btr509 PMID: 21903627; PubMed Central PMCID: PMC3198575.
31. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008; 5(7):621–8. Epub 2008/06/03. doi: 10.1038/nmeth.1226 PMID: 18516045.
32. Tsafnat G, Copty J, Partridge SR. RAC: Repository of Antibiotic resistance Cassettes. Database. 2011; 2011. doi: 10.1093/database/bar054
33. Holt KE, Baker S, Weill FX, Holmes EC, Kitchen A, Yu J, et al. Shigella sonnei genome sequencing and phylogenetic analysis indicate recent global dissemination from Europe. Nat Genet. 2012; 44(9):1056–9. Epub 2012/08/07.
doi: 10.1038/ng.2369 PMID: 22863732; PubMed Central PMCID: PMC3442231.
34. Rozen S, Skaletsky HJ. Primer3. 1988.

Appendix 2. River Cam catchment area.

Appendix 3. List of sequencing data generated and used in these studies.

Sample ID; Location collected; Data type; Biome; Date collected

Section: Chapter 2
DF:M:1; Cambridge University dairy farm; Metagenome; Fertiliser/lagoon effluent; 22.06.2012
WWTP:M:1; City of Cambridge Wastewater Treatment Plant; Metagenome; Treated wastewater effluent; 21.06.2012

Section: Chapter 3
AS:M:1; Ashwell spring; Metagenome; River Cam source water; 02.05.2013
DF:M:1; Cambridge University dairy farm; Metagenome; Fertiliser/lagoon effluent; 22.06.2012
DF:M:2; Cambridge University dairy farm; Metagenome; Fertiliser/lagoon effluent; 02.05.2013
DF:M:3; Cambridge University dairy farm; Metagenome; Fertiliser/lagoon effluent; 06.08.2014
WWTP:M:1; City of Cambridge Wastewater Treatment Plant; Metagenome; Treated wastewater effluent; 21.06.2012
WWTP:M:2; City of Cambridge Wastewater Treatment Plant; Metagenome; Treated wastewater effluent; 02.05.2013
WWTP:M:3; City of Cambridge Wastewater Treatment Plant; Metagenome; Treated wastewater effluent; 04.08.2014

Section: Chapter 4
AH:M:1; Addenbrooke's hospital; Metagenome; Hospital wastewater effluent; 02.05.2013
AH:M:2; Addenbrooke's hospital; Metagenome; Hospital wastewater effluent; 04.08.2014
AH:M:3; Addenbrooke's hospital; Metagenome; Hospital wastewater effluent; 15.09.2014
AH:M:4; Addenbrooke's hospital; Metagenome; Hospital wastewater effluent; 29.09.2014
AH:M:5; Addenbrooke's hospital; Metagenome; Hospital wastewater effluent; 27.10.2014
AH:M:6; Addenbrooke's hospital; Metagenome; Hospital wastewater effluent; 24.11.2014
AS:M:1; Ashwell spring; Metagenome; River Cam source water; 02.05.2013
AS:M:2; Ashwell spring; Metagenome; River Cam source water; 04.08.2014
AS:M:3; Ashwell spring; Metagenome; River Cam source water; 15.09.2014
AS:M:4; Ashwell spring; Metagenome; River Cam source water; 29.09.2014
AS:M:5; Ashwell spring; Metagenome; River Cam source water; 27.10.2014
AS:M:6; Ashwell spring; Metagenome; River Cam source water; 24.11.2014
DF:M:2; Cambridge University dairy farm; Metagenome; Fertiliser/lagoon effluent; 02.05.2013
DF:M:3; Cambridge University dairy farm; Metagenome; Fertiliser/lagoon effluent; 06.08.2014
DF:M:4; Cambridge University dairy farm; Metagenome; Fertiliser/lagoon effluent; 15.09.2014
DF:M:5; Cambridge University dairy farm; Metagenome; Fertiliser/lagoon effluent; 29.09.2014
DF:M:6; Cambridge University dairy farm; Metagenome; Fertiliser/lagoon effluent; 27.10.2014
DF:M:7; Cambridge University dairy farm; Metagenome; Fertiliser/lagoon effluent; 24.11.2014
AH:T:1; Addenbrooke's hospital; Metatranscriptome; Hospital wastewater effluent; 02.05.2013
AH:T:2; Addenbrooke's hospital; Metatranscriptome; Hospital wastewater effluent; 04.08.2014
AH:T:3; Addenbrooke's hospital; Metatranscriptome; Hospital wastewater effluent; 15.09.2014
AH:T:4; Addenbrooke's hospital; Metatranscriptome; Hospital wastewater effluent; 29.09.2014
AH:T:5; Addenbrooke's hospital; Metatranscriptome; Hospital wastewater effluent; 27.10.2014
AH:T:6; Addenbrooke's hospital; Metatranscriptome; Hospital wastewater effluent; 24.11.2014
AS:T:1; Ashwell spring; Metatranscriptome; River source water; 02.05.2013
AS:T:2; Ashwell spring; Metatranscriptome; River source water; 04.08.2014
AS:T:3; Ashwell spring; Metatranscriptome; River source water; 15.09.2014
AS:T:4; Ashwell spring; Metatranscriptome; River source water; 29.09.2014
AS:T:5; Ashwell spring; Metatranscriptome; River source water; 27.10.2014
AS:T:6; Ashwell spring; Metatranscriptome; River source water; 24.11.2014
DF:T:1; Cambridge University dairy farm; Metatranscriptome; Fertiliser/lagoon effluent; 02.05.2013
DF:T:2; Cambridge University dairy farm; Metatranscriptome; Fertiliser/lagoon effluent; 06.08.2014
DF:T:3; Cambridge University dairy farm; Metatranscriptome; Fertiliser/lagoon effluent; 15.09.2014
DF:T:4; Cambridge University dairy farm; Metatranscriptome; Fertiliser/lagoon effluent; 29.09.2014
DF:T:5; Cambridge University dairy farm; Metatranscriptome; Fertiliser/lagoon effluent; 27.10.2014
DF:T:6; Cambridge University dairy farm; Metatranscriptome; Fertiliser/lagoon effluent; 24.11.2014

Section: Chapter 5
PF:M:1; Writtle College pig farm; Metagenome; Fertiliser/lagoon effluent; 15.07.2014
#1; Writtle College pig farm; Isolate WGS; Combined farrow house effluent; 15.07.2014
#2; Writtle College pig farm; Isolate WGS; Combined farrow house effluent; 15.07.2014
#3; Writtle College pig farm; Isolate WGS; Piglet crate effluent; 15.07.2014
#4; Writtle College pig farm; Isolate WGS; Piglet crate effluent; 15.07.2014
#5; Writtle College pig farm; Isolate WGS; Combined farrow house effluent; 15.07.2014
#6; Writtle College pig farm; Isolate WGS; Piglet crate effluent; 15.07.2014
#7; Writtle College pig farm; Isolate WGS; Piglet crate effluent; 15.07.2014
#8; Writtle College pig farm; Isolate WGS; Sow faeces; 15.07.2014
#9; Writtle College pig farm; Isolate WGS; Combined farrow house effluent; 15.07.2014
#10; Writtle College pig farm; Isolate WGS; Fertiliser/lagoon effluent; 15.07.2014
#12; Writtle College pig farm; Isolate WGS; Fertiliser/lagoon effluent; 15.07.2014
#13; Writtle College pig farm; Isolate WGS; Fertiliser/lagoon effluent; 15.07.2014
#14; Writtle College pig farm; Isolate WGS; Manure heap; 15.07.2014
#15; Writtle College pig farm; Isolate WGS; Manure heap; 15.07.2014
#16; Writtle College pig farm; Isolate WGS; Combined weaner shed effluent; 15.07.2014
#17; Writtle College pig farm; Isolate WGS; Combined weaner shed effluent; 15.07.2014
#18; Writtle College pig farm; Isolate WGS; Soil (amended with lagoon effluent); 15.07.2014
#19; Writtle College pig farm; Isolate WGS; Soil (amended with lagoon effluent); 15.07.2014
#20; Writtle College pig farm; Isolate WGS; Soil (amended with lagoon effluent); 15.07.2014
#21; Writtle College pig farm; Isolate WGS; Soil (amended with lagoon effluent); 15.07.2014
#22; City of Cambridge Wastewater Treatment Plant; Isolate WGS; Untreated wastewater influent; 04.08.2014
#23; City of Cambridge Wastewater Treatment Plant; Isolate WGS; Untreated wastewater influent; 04.08.2014
#24; City of Cambridge Wastewater Treatment Plant; Isolate WGS; Untreated wastewater influent; 04.08.2014
#25; City of Cambridge Wastewater Treatment Plant; Isolate WGS; Untreated wastewater influent; 04.08.2014
#26; City of Cambridge Wastewater Treatment Plant; Isolate WGS; Treated wastewater effluent; 04.08.2014
#27; City of Cambridge Wastewater Treatment Plant; Isolate WGS; Treated wastewater effluent; 04.08.2014
#28; Addenbrooke's hospital; Isolate WGS; Hospital wastewater effluent; 04.08.2014
#29; Addenbrooke's hospital; Isolate WGS; Hospital wastewater effluent; 04.08.2014
#30; Addenbrooke's hospital; Isolate WGS; Hospital wastewater effluent; 04.08.2014
#31; Addenbrooke's hospital; Isolate WGS; Hospital wastewater effluent; 04.08.2014
#32; Addenbrooke's hospital; Isolate WGS; Hospital wastewater effluent; 04.08.2014
#33; Addenbrooke's hospital; Isolate WGS; Hospital wastewater effluent; 04.08.2014
#34; Addenbrooke's hospital; Isolate WGS; Hospital wastewater effluent; 04.08.2014
#35; Addenbrooke's hospital; Isolate WGS; Hospital wastewater effluent; 04.08.2014
#36; Cambridge University dairy farm; Isolate WGS; Combined milking shed effluent; 06.08.2014
#37; Cambridge University dairy farm; Isolate WGS; Combined milking shed effluent; 06.08.2014
#38; Cambridge University dairy farm; Isolate WGS; Calf shed effluent; 06.08.2014
#39; Cambridge University dairy farm; Isolate WGS; Calf shed effluent; 06.08.2014
#40; Cambridge University dairy farm; Isolate WGS; Calf shed effluent; 06.08.2014
#41; Cambridge University dairy farm; Isolate WGS; Calf shed effluent; 06.08.2014
#42; Cambridge University dairy farm; Isolate WGS; Calf shed effluent; 06.08.2014
#43; Addenbrooke's hospital; Isolate WGS; Hospital wastewater effluent; 15.09.2014
#44; Addenbrooke's hospital; Isolate WGS; Hospital wastewater effluent; 15.09.2014
#45; Addenbrooke's hospital; Isolate WGS; Hospital wastewater effluent; 15.09.2014
#46; Addenbrooke's hospital; Isolate WGS; Hospital wastewater effluent; 29.09.2014
#47; Addenbrooke's hospital; Isolate WGS; Hospital wastewater effluent; 29.09.2014
#48; Addenbrooke's hospital; Isolate WGS; Hospital wastewater effluent; 27.10.2014
#49; Addenbrooke's hospital; Isolate WGS; Hospital wastewater effluent; 27.10.2014
#50; Cambridge University dairy farm; Isolate WGS; Fertiliser/lagoon effluent; 15.09.2014
#51; Cambridge University dairy farm; Isolate WGS; Fertiliser/lagoon effluent; 29.09.2014
#52; River Cam; Isolate WGS; River water; 01.10.2014
#53; River Cam; Isolate WGS; River water; 01.10.2014
#54; River Cam; Isolate WGS; River water; 01.10.2014
#55; Cambridge University dairy farm; Isolate WGS; Fertiliser/lagoon effluent; 29.09.2014
#56; Addenbrooke's hospital; Isolate WGS; Hospital wastewater effluent; 29.09.2014
#57; Addenbrooke's hospital; Isolate WGS; Hospital wastewater effluent; 29.09.2014
#58; Addenbrooke's hospital; Isolate WGS; Hospital wastewater effluent; 29.09.2014
#59; Addenbrooke's hospital; Isolate WGS; Hospital wastewater effluent; 29.09.2014

Appendix 4. Accession numbers of sequences used in these studies.

Sequence reference ID; Source; Accession

Section: Chapter 2
Sequence source: Faecal microbiomes used to test SEAR efficacy
O2.UC1-0; ENA; ERR209529
O2.UC4-0; ENA; ERR209614
O2.UC11-0; ENA; ERR209531
O2.UC12-0; ENA; ERR209534
O2.UC13-0; ENA; ERR209538
O2.UC16-0; ENA; ERR209544
O2.UC17-0; ENA; ERR209547
O2.UC18-0; ENA; ERR209550
O2.UC19-0; ENA; ERR209554
O2.UC21-0; ENA; ERR209564
O2.UC22-0; ENA; ERR209567
O2.UC23-0; ENA; ERR209572
O2.UC24-0; ENA; ERR209576
V1.CD2-0; ENA; ERR209704
V1.CD3-0; ENA; ERR209723
V1.CD6-0; ENA; ERR209743
V1.CD8-0; ENA; ERR209750
V1.CD9-0; ENA; ERR209752
V1.CD11-0; ENA; ERR209682
V1.CD12-0; ENA; ERR209684
V1.CD13-0; ENA; ERR209686
V1.CD14-0; ENA; ERR209688
V1.UC7-0; ENA; ERR209901
V1.UC8-0; ENA; ERR209903
V1.UC9-0; ENA; ERR209905
V1.UC10-0; ENA; ERR209754
V1.UC13-0; ENA; ERR209766
V1.UC14-0; ENA; ERR209770
V1.UC15-0; ENA; ERR209774
V1.UC17-0; ENA; ERR209780
V1.UC19-0; ENA; ERR209786
V1.UC21-0; ENA; ERR209789

Sequence source: S. sonnei reference sequences used to test SEAR accuracy
Sh41191; ENA; ERR024604
Sh66470; ENA; ERR024605
Sh74369; ENA; ERR024606
Sh55623; ENA; ERR024607
Sh60108; ENA; ERR024608
Sh62542; ENA; ERR024609
Sh65179; ENA; ERR024610
Sh65387; ENA; ERR024611
Sh65623; ENA; ERR024612
ShIB1; ENA; ERR024616
ShIB690; ENA; ERR024617
ShIB691; ENA; ERR024618
ShIB694; ENA; ERR024619
ShIB2; ENA; ERR024620
ShIB3; ENA; ERR024621
ShIB10; ENA; ERR024622
ShIB681; ENA; ERR024625
ShIB683; ENA; ERR024626
ShIB687; ENA; ERR024627
ShIB695; ENA; ERR025682
ShIB1970; ENA; ERR025683
ShIB1976; ENA; ERR025684
ShIB1980; ENA; ERR025685
ShIB696; ENA; ERR025686
ShIB697; ENA; ERR025687
ShIB698; ENA; ERR025688
ShIB713; ENA; ERR025689
ShIB716; ENA; ERR025690
ShIB717; ENA; ERR025691
ShIB739; ENA; ERR025692
ShIB748; ENA; ERR025693
ShIB1985; ENA; ERR025695
ShIB2009; ENA; ERR025696
ShIB2012; ENA; ERR025697
ShIB2013; ENA; ERR025698
ShIB1987; ENA; ERR025699
ShIB1990; ENA; ERR025700
ShIB1993; ENA; ERR025701
ShIB1995; ENA; ERR025702
ShIB1997; ENA; ERR025703
ShIB2000; ENA; ERR025704
ShIB2004; ENA; ERR025705
ShIB2008; ENA; ERR025706
ShIB2015; ENA; ERR025708
ShIB3488; ENA; ERR025709
ShIB3507; ENA; ERR025710
ShIB3580; ENA; ERR025711
ShIB2018; ENA; ERR025712
ShIB2024; ENA; ERR025713
ShIB2026; ENA; ERR025714
ShIB2493; ENA; ERR025715
ShIB48279; ENA; ERR025716
ShIB3277; ENA; ERR025717
ShIB3300; ENA; ERR025718
ShIB3374; ENA; ERR025719
ShIB3599; ENA; ERR025721
Sh54213; ENA; ERR025722
Sh54228; ENA; ERR025724
PWR105; ENA; ERR025725
Sh54178; ENA; ERR025726
Sh54179; ENA; ERR025727
Sh54184; ENA; ERR025729
Sh54185; ENA; ERR025730
Sh54190; ENA; ERR025731
Sh54210; ENA; ERR025732
Sh658; ENA; ERR025734
Sh1267; ENA; ERR025735
Sh1567; ENA; ERR025736
Sh259; ENA; ERR025737
Sh1460; ENA; ERR025738
Sh1461; ENA; ERR025739
Sh1263; ENA; ERR025741
Sh1265; ENA; ERR025742
Sh1166; ENA; ERR025743
Sh1167; ENA; ERR025744
Sh1173; ENA; ERR025746
Sh8883; ENA; ERR025747
Sh970044; ENA; ERR025748
Sh2073; ENA; ERR025749
Sh273; ENA; ERR025750
Sh373; ENA; ERR025751
Sh1274; ENA; ERR025752
Sh2574; ENA; ERR025753
Sh4374; ENA; ERR025754
Sh4474; ENA; ERR025755
Sh476; ENA; ERR025756
Sh988743; ENA; ERR025758
Sh36224; ENA; ERR025759
Sh989560; ENA; ERR025761
Sh9810267; ENA; ERR025762
Sh998911; ENA; ERR025763
Sh2225; ENA; ERR025764
Sh5827; ENA; ERR025765
Sh31382; ENA; ERR025767
Sh32222; ENA; ERR025768
20051272; ENA; ERR028671
20061758; ENA; ERR028672
19911483; ENA; ERR028673
19920319; ENA; ERR028674
19910761; ENA; ERR028675
20041367; ENA; ERR028676
20060018; ENA; ERR028677
20081885; ENA; ERR028678
20040880; ENA; ERR028679
CS2; ENA; ERR028680
20051541; ENA; ERR028681
20061309; ENA; ERR028684
CS20; ENA; ERR028685
CS6; ENA; ERR028686
CS14; ENA; ERR028687
20003593; ENA; ERR028688
20021122; ENA; ERR028689
20031275; ENA; ERR028690
19984123; ENA; ERR028691
20040489; ENA; ERR028692
20052631; ENA; ERR028693
19904011; ENA; ERR028694
20011685; ENA; ERR028695
20040924; ENA; ERR028697
20062087; ENA; ERR028699
20071599; ENA; ERR028700
20010007; ENA; ERR028702
20062313; ENA; ERR028703
CS7; ENA; ERR028704
CS8; ENA; ERR028705
CS1; ENA; ERR028706

Section: Chapter 3
Sequence source: Mobile genetic element reference sequences detected in metagenomes
Aeromonas salmonicida plasmid pRAS3.2; REFSEQ; gi|15983520
salmonicida plasmid pRAS3.1; REFSEQ; gi|15983531
Aeromonas hydrophila plasmid pBRST7.6; REFSEQ; gi|198286625
KCL-2 plasmid pMGD2; REFSEQ; gi|20514397
Klebsiella pneumoniae plasmid pIGMS31; REFSEQ; gi|209947514
Escherichia coli plasmid pEC278; REFSEQ; gi|209947788
Endophytic bacterium LOB-07 plasmid pLK39; REFSEQ; gi|255929160
Klebsiella pneumoniae plasmid unnamed; REFSEQ; gi|305678726
Enterobacter cloacae strain BB1092 plasmid pB1023; REFSEQ; gi|435855445
Klebsiella pneumoniae strain BB1088 plasmid pB1019; REFSEQ; gi|435855463
U288 plasmid pSTU288-3; REFSEQ; gi|482907348
Cronobacter sakazakii strain ATCC 29544 plasmid pCSA2; REFSEQ; gi|507579660
Escherichia coli strain K317 plasmid ColE7-K317; REFSEQ; gi|690630974
Enterobacter cloacae strain 34998 plasmid p34998-4.921kb; REFSEQ; gi|746219889
Enterobacter cloacae strain 34983 plasmid; REFSEQ; gi|749202706
Klebsiella oxytoca strain M1 plasmid pKOXM1D; REFSEQ; gi|749293681
pneumoniae Kp13 plasmid pKP13b; REFSEQ; gi|749296055
Enterobacter cloacae strain 34978 plasmid p34978-4.938kb; REFSEQ; gi|765030385
Cronobacter sakazakii strain ATCC 29544 plasmid; REFSEQ; gi|817657570
plasmid pSEM integron; INTEGRALL; gi|57635337
integron-derived beta-lactamase (ampC); INTEGRALL; gi|20530945
class 1 integron IntI1; INTEGRALL; gi|94442253
Lactobacillus salivarius strain JCM 1046 plasmid pCTN1046; REFSEQ; gi|763126141
class 1 integron IntI1; INTEGRALL; gi|587656492
class 2 integron IntI2; INTEGRALL; gi|13345249
class 2 integron IntI2; INTEGRALL; gi|788265642
ICE integron putative; INTEGRALL; gi|215397925
class 2 integron IntI2; INTEGRALL; gi|13345249
class 2 integron IntI2; INTEGRALL; gi|563324892
mercury resistance transposon; INTEGRALL; gi|710572
ICE integron putative; INTEGRALL; gi|215397925
Bacteroides fragilis plasmid pBFP35; REFSEQ; gi|194442162
Bacteroides cellulosilyticus WH2 plasmid pBWH2A; REFSEQ; gi|661525289
class 1 integron IntI1; INTEGRALL; gi|659224469
Bacteroides fragilis plasmid pBFP35; REFSEQ; gi|194442162
Sphingobium japonicum UT26S plasmid pUT2 DNA; REFSEQ; gi|294057975
Bacteroides cellulosilyticus WH2 plasmid pBWH2A; REFSEQ; gi|661525289
Aeromonas hydrophila strain AL06-06 plasmid pAH06-06-2; REFSEQ; gi|766626985
plasmid class 1 integron; INTEGRALL; gi|242876676
ICE integron putative; INTEGRALL; gi|215397925

Section: Chapter 5
Sequence source: Reference sequences used in C. difficile phylogenetic tree
C. difficile 630; GENBANK; AM180355
C. difficile M68; GENBANK; FN668375
C. difficile CF5; GENBANK; FN665652
C. difficile M120; GENBANK; FN665653
C. difficile BI-9; GENBANK; FN668944
C. difficile BI-1; GENBANK; FN668941
C. difficile 2007855; GENBANK; FN665654
C. difficile CD196; GENBANK; FN538970
C. difficile R20291; GENBANK; FN545816

Sequence source: Reference sequences used in C. perfringens phylogenetic tree
C. perfringens 13124; GENBANK; CP000246
C. perfringens SM101; GENBANK; CP000312
C. perfringens str. 13; GENBANK; BA000016

Appendix 5. SEAR ARG detection using clinical isolate sequence data.
This table shows the ARGs detected in clinical isolate data using SEAR; a value of 1 indicates gene presence using the defined cut-offs and a value of 0 indicates gene absence.

Accession  Method                  aadA1 strB tetA tetR TEM sul1 dhfr3b catB3 dfrA5 dfrA8 dfrA12 dfrA14 aadA2 CTXM-15 OXA-1 OXA-10
ERR024604  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR024604  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR024605  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR024605  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR024606  Holt et al., published  0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
ERR024606  SEAR                    0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
ERR024607  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR024607  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR024608  Holt et al., published  1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR024608  SEAR                    1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR024609  Holt et al., published  1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR024609  SEAR                    1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR024610  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR024610  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR024611  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR024611  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR024612  Holt et al., published  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR024612  SEAR                    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR024616  Holt et al., published  0 1 1 1 1 1 0 0 0 0 0 1 0 0 0 0
ERR024616  SEAR                    1 1 1 1 1 1 0 0 0 0 1 0 1 0 0 0
ERR024617  Holt et al., published  1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR024617  SEAR                    1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR024618  Holt et al., published  1 1 1 1 0 1 0 0 0 1 0 1 0 0 0 0
ERR024618  SEAR                    1 1 1 1 0 1 0 0 0 0 1 0 0 0 0 0
ERR024619  Holt et al., published  1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
ERR024619  SEAR                    1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
ERR024620  Holt et al., published  1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0
ERR024620  SEAR                    1 1 1 1 0 1 0 0 1 0 0 0 0 0 0 0
ERR024621  Holt et al., published  1 0 0 0 0 1 0 1 0 0 0 0 0 0 1 1
ERR024621  SEAR                    1 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0
ERR024622  Holt et al., published  0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR024622  SEAR                    0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR024625  Holt et al., published  1 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0
ERR024625  SEAR                    1 0 0 0 1 1 0 0 0 0 1 0 1 0 0 0
ERR024626  Holt et al., published  1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR024626  SEAR                    1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR024627  Holt et al., published  1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR024627  SEAR                    1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025682  Holt et al., published  1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
ERR025682  SEAR                    0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
ERR025683  Holt et al., published  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025683  SEAR                    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025684  Holt et al., published  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025684  SEAR                    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025685  Holt et al., published  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025685  SEAR                    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025686  Holt et al., published  1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
ERR025686  SEAR                    1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
ERR025687  Holt et al., published  1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
ERR025687  SEAR                    1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
ERR025688  Holt et al., published  1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
ERR025688  SEAR                    1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
ERR025689  Holt et al., published  1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
ERR025689  SEAR                    1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
ERR025690  Holt et al., published  1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
ERR025690  SEAR                    1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
ERR025691  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025691  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025692  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025692  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025693  Holt et al., published  1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
ERR025693  SEAR                    1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
ERR025695  Holt et al., published  1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025695  SEAR                    1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025696  Holt et al., published  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025696  SEAR                    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025697  Holt et al., published  1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025697  SEAR                    1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025698  Holt et al., published  1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025698  SEAR                    1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025699  Holt et al., published  1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025699  SEAR                    1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025700  Holt et al., published  1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
ERR025700  SEAR                    1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
ERR025701  Holt et al., published  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025701  SEAR                    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025702  Holt et al., published  1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025702  SEAR                    1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025703  Holt et al., published  1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025703  SEAR                    1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025704  Holt et al., published  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025704  SEAR                    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025705  Holt et al., published  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025705  SEAR                    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025706  Holt et al., published  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025706  SEAR                    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025708  Holt et al., published  1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025708  SEAR                    1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025709  Holt et al., published  0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025709  SEAR                    0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025710  Holt et al., published  0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025710  SEAR                    0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025711  Holt et al., published  0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025711  SEAR                    0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025712  Holt et al., published  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025712  SEAR                    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025713  Holt et al., published  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025713  SEAR                    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025714  Holt et al., published  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025714  SEAR                    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025715  Holt et al., published  0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025715  SEAR                    0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025716  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025716  SEAR                    0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025717  Holt et al., published  0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025717  SEAR                    0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025718  Holt et al., published  1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025718  SEAR                    1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025719  Holt et al., published  1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025719  SEAR                    1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025721  Holt et al., published  0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025721  SEAR                    0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025722  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025722  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025724  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025724  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025725  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025725  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025726  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025726  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025727  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025727  SEAR                    0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
ERR025729  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025729  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025730  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025730  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025731  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025731  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025732  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025732  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025734  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025734  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025735  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025735  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025736  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025736  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025737  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025737  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025738  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025738  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025739  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025739  SEAR                    0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025741  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025741  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025742  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025742  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025743  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025743  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025744  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025744  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025746  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025746  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025747  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025747  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025748  Holt et al., published  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025748  SEAR                    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025749  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025749  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025750  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025750  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025751  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025751  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025752  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025752  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025753  Holt et al., published  0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025753  SEAR                    0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025754  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025754  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025755  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025755  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025756  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025756  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025758  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025758  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025759  Holt et al., published  1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025759  SEAR                    1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025761  Holt et al., published  1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
ERR025761  SEAR                    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025762  Holt et al., published  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025762  SEAR                    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025763  Holt et al., published  0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
ERR025763  SEAR                    0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
ERR025764  Holt et al., published  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025764  SEAR                    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ERR025765  Holt et al., published  0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
ERR025765  SEAR                    0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0
ERR025767  Holt et al., published  1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
ERR025767  SEAR                    1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
ERR025768  Holt et al., published  1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ERR025768  SEAR                    1 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 ERR028671 Holt et al., published 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 SEAR 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 ERR028672 Holt et al., published 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 SEAR 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 ERR028673 Holt et al., published 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 SEAR 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 ERR028674 Holt et al., published 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 SEAR 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 ERR028675 Holt et al., published 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 SEAR 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 ERR028676 Holt et al., published 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 SEAR 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 ERR028677 Holt et al., published 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 SEAR 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ERR028678 Holt et al., published 1 1 1 1 1 0 0 0 0 0 0 0 1 0 0 0 SEAR 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 ERR028679 Holt et al., published 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 SEAR 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 ERR028680 Holt et al., published 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 SEAR 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 ERR028681 Holt et al., published 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 SEAR 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 ERR028684 Holt et al., published 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 SEAR 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 ERR028685 Holt et al., published 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 SEAR 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ERR028686 Holt et al., published 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 SEAR 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ERR028687 Holt et al., published 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 SEAR 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 169 ERR028688 Holt et al., published 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 SEAR 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ERR028689 Holt et al., published 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 SEAR 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 ERR028690 Holt et al., published 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 SEAR 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 ERR028691 Holt et al., published 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 SEAR 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 ERR028692 Holt et al., published 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 SEAR 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ERR028693 Holt et al., published 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 SEAR 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ERR028694 Holt et al., published 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 SEAR 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 ERR028695 Holt et al., published 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 SEAR 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 ERR028697 Holt et al., published 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 SEAR 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 ERR028699 Holt et al., published 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 SEAR 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ERR028700 Holt et al., published 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 SEAR 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 ERR028702 Holt et al., published 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 SEAR 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ERR028703 Holt et al., published 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 SEAR 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ERR028704 Holt et al., published 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 SEAR 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ERR028705 Holt et al., published 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 SEAR 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 ERR028706 Holt et al., published 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 SEAR 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 170 Appendix 6. Evaluation of metagenome preparation methodologies for aquatic samples. Assessment of filtering methodology: To confirm that vacuum filtration of aquatic samples would capture bacteria that could then be subjected to DNA extraction, a 100mL suspension of an overnight culture of Salmonella typhimurium was vacuum filtered (approximately 2 bar pressure) using a 0.22 µm filter membrane. Aliquots of culture (100μL), as well as the filter membrane, were plated on agar pre- and post-filtration and incubated at 37oC for 24 hours. Plate comparison showed the unfiltered culture and filter membrane had grown a bacterial lawn; the filtrate plate produced no colonies, demonstrating the successful retention of bacterial cells from liquid suspension using vacuum filtration. 
The Meta-G-Nome DNA extraction kit was then used to examine whether DNA could be extracted from the bacterial cells retained on the filter membrane. Successful extraction of DNA was confirmed by spectrophotometry (DNA concentration = 788.5 ng/μL, A260:A280 = 1.98).

Comparison of DNA extraction protocols: DNA extraction protocols were compared to determine DNA yield and type (plasmid and genomic). Manufacturer protocols were followed and the extracted DNA was eluted in the volumes and buffers specified. DNA concentrations were measured by spectrophotometry, recorded in triplicate and averaged (Appendix table 1). For the MiniPrep kit, the extraction of pUC19 was confirmed by restriction digest and gel electrophoresis.

Appendix table 1 DNA concentrations from Escherichia coli DH5α DNA extractions.

Extraction kit   Manufacturer   DNA concentration (ng/μL)   Total DNA quantity (ng)
Meta-G-Nome      Epicentre      48.42                       2421
PowerWater       MoBio          12.02                       1202
DNeasy           Qiagen         12.16                       2432
MiniPrep         Qiagen         4.10                        820

Relative yields of genomic and plasmid DNA from the extraction protocols were determined using qPCR (Appendix table 2). Genomic primers amplified a region of leuA on DH5α chromosomal DNA; plasmid primers amplified a region of bla on pUC19. Standard curves generated from genomic DNA and plasmid copy numbers were used to calculate the ratio of genomic to plasmid DNA produced by each extraction protocol.

Appendix table 2 Ratio of genomic to plasmid DNA copy number for DNA extractions.

Extraction kit   Genomic DNA concentration (copies/μL)   Plasmid DNA concentration (copies/μL)   Ratio genomic:plasmid DNA copy number
Meta-G-Nome      3.26 × 10⁴                              5.82 × 10⁶                              1:178
PowerWater       4.73 × 10³                              1.59 × 10⁶                              1:336
DNeasy           2.68 × 10⁵                              6.85 × 10⁶                              1:26

The Meta-G-Nome and DNeasy DNA extraction kits were concluded to be suitable for metagenome DNA extraction because they yielded sufficient genomic and plasmid DNA, the kits were cost effective and the protocols were quick and efficient.
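The ratios in Appendix table 2 can be checked directly from the tabulated copy-number concentrations. The following is a minimal sketch of that arithmetic only; the dictionary layout and the function name `plasmid_per_genome` are illustrative assumptions and are not part of the original analysis.

```python
# Copy-number concentrations (copies/uL) transcribed from Appendix table 2.
# Each entry is (genomic DNA, plasmid DNA).
copies_per_ul = {
    "Meta-G-Nome": (3.26e4, 5.82e6),
    "PowerWater": (4.73e3, 1.59e6),
    "DNeasy": (2.68e5, 6.85e6),
}


def plasmid_per_genome(genomic, plasmid):
    """Return the number of plasmid copies detected per genomic copy."""
    return plasmid / genomic


# A genomic:plasmid ratio of 1:N corresponds to N plasmid copies per genome copy.
ratios = {
    kit: plasmid_per_genome(genomic, plasmid)
    for kit, (genomic, plasmid) in copies_per_ul.items()
}

for kit, n in ratios.items():
    print(f"{kit}: 1:{n:.1f}")
```

The computed values are approximately 178.5, 336.2 and 25.6 plasmid copies per genomic copy, consistent with the tabulated ratios of 1:178, 1:336 and 1:26.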
These kits were used on subsequent environmental samples.

Determination of sampling requirements: Because the bacterial load in aquatic samples was unknown, several approaches were trialled to estimate the sample volume, filtering methodology and extraction kit required to extract sufficient DNA for commercial sequencing (>2 μg).

Appendix table 3 Amount of DNA extracted using different sample preparation methods.

Sample   Volume (mL)   Filter membrane size (μm)   Extraction kit   Concentration of extracted DNA (ng/μL)   Total DNA quantity (μg)
1        100           0.22                        Meta-G-Nome      25.87                                    1.294
2        100           0.22                        DNeasy           0.21                                     0.042
3        1000          0.22                        Meta-G-Nome      139.42                                   6.971
4        1000          0.22                        DNeasy           18.53                                    3.706
5        1000          3.0, 0.22                   Meta-G-Nome      36.24                                    1.812
6        1000          3.0, 0.22                   DNeasy           3.05                                     0.610

The results obtained from a set of samples collected on 16.04.2012 are shown in Appendix table 3. The Meta-G-Nome protocol consistently produced a greater yield than the DNeasy protocol. The use of a 3.0 μm pre-filter reduced the DNA yield; however, gel electrophoresis and A260:A280 indicated that the extracted DNA was cleaner (less degradation and larger fragment size), suggesting that pre-filtration was preferable for metagenome preparation. Subsequent sample processing was performed using a sample volume of 10000 mL (not shown).
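As a cross-check of Appendix table 3, the short sketch below flags which preparations exceed the 2 μg sequencing threshold stated above and back-calculates the elution volume implied by each row. The elution volumes are not stated in the text; they are inferred here purely from total quantity divided by concentration, and the names `SEQUENCING_MIN_UG` and `implied_elution_ul` are illustrative assumptions.

```python
SEQUENCING_MIN_UG = 2.0  # threshold stated in the text (>2 ug)

# Rows transcribed from Appendix table 3:
# (sample, volume mL, filter sizes um, kit, concentration ng/uL, total DNA ug)
samples = [
    (1, 100, "0.22", "Meta-G-Nome", 25.87, 1.294),
    (2, 100, "0.22", "DNeasy", 0.21, 0.042),
    (3, 1000, "0.22", "Meta-G-Nome", 139.42, 6.971),
    (4, 1000, "0.22", "DNeasy", 18.53, 3.706),
    (5, 1000, "3.0, 0.22", "Meta-G-Nome", 36.24, 1.812),
    (6, 1000, "3.0, 0.22", "DNeasy", 3.05, 0.610),
]


def implied_elution_ul(conc_ng_per_ul, total_ug):
    """Infer the elution volume (uL) from total DNA (ug) and concentration (ng/uL)."""
    return total_ug * 1000.0 / conc_ng_per_ul


# Samples whose total yield exceeds the sequencing threshold.
sufficient = [row[0] for row in samples if row[5] > SEQUENCING_MIN_UG]
print("Samples above threshold:", sufficient)

for sample, _, _, kit, conc, total in samples:
    print(f"Sample {sample} ({kit}): ~{implied_elution_ul(conc, total):.0f} uL implied elution")
```

On these figures only samples 3 and 4 (1000 mL, 0.22 μm membrane alone) exceed 2 μg, and the rows imply elution volumes of roughly 50 μL for Meta-G-Nome and 200 μL for DNeasy. The pre-filtered 1000 mL samples fall below the threshold, which is consistent with the larger sample volume used for subsequent processing.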