Author manuscript; available in PMC: 2023 Jun 19.
Published in final edited form as: Curr Opin Microbiol. 2020 Apr 2;53:61–70. doi: 10.1016/j.mib.2020.02.009

Comparative Genomics in Infectious Disease

Ahmed M Moustafa 1, Arnav Lal 2, Paul J Planet 1,3,4,*
PMCID: PMC10278558  NIHMSID: NIHMS1724236  PMID: 32248056

Abstract

With more than one million bacterial genome sequences uploaded to public databases in the last 25 years, genomics has become a powerful tool for studying bacterial biology. Here, we review recent approaches that leverage large numbers of whole genome sequences to decipher the spread and pathogenesis of bacterial infectious diseases.

Introduction

It has been 25 years since the first genome of a free-living organism, H. influenzae Rd, was sequenced in its entirety [1]. With the significant reduction in sequencing costs, there are currently 1,067,277 bacterial genome sequences in the Sequence Read Archive (SRA) (https://www.ncbi.nlm.nih.gov/sra/) [2]. However, the archive is highly biased and incomplete. For instance, more than 72% of these genomes are from only 10 species, and only 41% of them have been assembled and submitted to GenBank [3] (Table 1; accessed 01/27/2020). The dramatic increase in the number of genomes has already spurred the development of bioinformatic tools that can process huge amounts of data and, most importantly, extract salient biological information. Of specific interest has been the utility of these massive genomic databases for understanding the spread of infectious diseases. Here we review the approaches and methods that have emerged to use whole genome sequencing (WGS) in understanding pathogenesis, outbreaks, and transmission of bacterial infections. Phylogenetic trees still occupy a central place in defining how genomes are related, but a growing set of comparative methodologies, both tree-based and tree-free, allows sophisticated genome comparison and functional prediction (see Box 1 for a non-exhaustive list of WGS tools). We highlight some recent instances in which WGS techniques have addressed real-world problems in infection spread and pathogenesis. Although comparative approaches have been used to study all kinds of infectious agents, including viruses, parasites, protozoa, and fungi, we will focus here on bacteria.

Table 1.

Bacterial Species with the most genomes in the SRA.

Organism SRA GenBank
Salmonella enterica 289,439 186,968
Escherichia coli 131,451 22,635
Streptococcus pneumoniae 75,553 21,481
Mycobacterium tuberculosis 64,006 6,695
Staphylococcus aureus 63,372 12,413
Campylobacter jejuni 43,202 29,839
Listeria monocytogenes 36,209 27,136
Streptococcus pyogenes 27,336 2,717
Klebsiella pneumoniae 25,294 9,756
Neisseria meningitidis 21,557 1,974
Neisseria gonorrhoeae 19,509 657
Enterococcus faecium 15,950 1,871
Campylobacter coli 14,206 11,935
Streptococcus agalactiae 13,131 1,253
Clostridioides difficile 13,092 2,472
Shigella sonnei 12,333 1,822
Pseudomonas aeruginosa 10,931 5,314
Campylobacter sp. 8,075 150
Acinetobacter baumannii 7,740 4,197
Shigella flexneri 7,564 659
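
The "more than 72%" figure quoted in the Introduction can be checked directly against Table 1. The following is a minimal Python sketch that uses only the SRA counts listed above and the SRA total stated in the text; the variable names are illustrative.

```python
# SRA genome counts for the ten most-sequenced species in Table 1.
top10_sra = {
    "Salmonella enterica": 289_439,
    "Escherichia coli": 131_451,
    "Streptococcus pneumoniae": 75_553,
    "Mycobacterium tuberculosis": 64_006,
    "Staphylococcus aureus": 63_372,
    "Campylobacter jejuni": 43_202,
    "Listeria monocytogenes": 36_209,
    "Streptococcus pyogenes": 27_336,
    "Klebsiella pneumoniae": 25_294,
    "Neisseria meningitidis": 21_557,
}

# Total bacterial genomes in the SRA, as stated in the Introduction.
total_sra = 1_067_277

top10_total = sum(top10_sra.values())
top10_share = top10_total / total_sra

print(f"Top 10 species: {top10_total:,} genomes ({top10_share:.1%} of the SRA)")
```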

Box 1. Bioinformatic Techniques Used in Comparative Genomics.

  • Reference mapping and variant calling, annotation and visualization

    BWA [4], Minimap2 [5], SAMtools [6], FreeBayes [7], GATK [8], Snippy [9], SnpEff [10], IGV [11]

  • Phylogenetic Inference

    BEAST [12,13], RAxML [14], PhyML [15], MrBayes [16], TNT [17]

  • Recombination Detection

    ClonalFrameML [18], Gubbins [19], Clusterflock [20]

  • Tree Visualization

    GraPhlAn [21], Figtree [22], SplitsTree [23], Dendroscope [24]

  • Integrated Data Online Visualization

    iTOL [25], panX [26], Phandango [27], PATRIC [28], Microreact [29], Nextstrain [30]

  • Assembly

    SPAdes [31], HGAP [32], Unicycler [33]

  • Annotation

    Prokka [34], RAST [35]

  • Pangenome, GWAS and Ancestral Reconstruction

    Roary [36], bugwas [37], Scoary [38], pyseer [39], Mesquite [40]

  • Panallelome

    WhatsGNU [41]

  • Detection of Selection

    HyPhy [42], SNPGenie [43]

  • Phage identification

    PHASTER [44], PhiSpy [45]

  • Antimicrobial Resistance & Virulence

    ResFinder [46], CARD [47], Mykrobe [48], ABRicate [49], VFDB [50]

  • Plasmid Analysis

    PLACNETw [51], plasmidSPAdes [52], PlasmidFinder [53]

  • Graph Genomes

    vg [54], Sequence Tube Maps [55]

  • Sketching for Pairwise Distances

    Mash [56], Dashing [57]

Origin, Spread & Detection of Epidemics and Outbreaks

Perhaps the most straightforward application of WGSs has been in detecting the origin and spread of clonal outbreaks. Indeed, WGS techniques have emerged as a new gold standard that offers superior resolution and power for molecular epidemiology.

A recent study by Copin et al. used high-resolution WGS to investigate an apparent outbreak of MRSA in an Orthodox Jewish community in Brooklyn, New York [58]. Molecular typing showed that the isolates belonged to the widespread, epidemic community-associated (CA)-MRSA USA300 clone, but could not distinguish these isolates from the ubiquitous background of this circulating strain. WGS phylogenetic comparison with other USA300 isolates from northern Manhattan and the Bronx showed that 93% of the isolates from patients living in the Orthodox community in Brooklyn formed their own clade within USA300, strongly supporting the hypothesis of a new disseminating subclone (USA300-BKV) and a potential emerging public health threat [58].

In a nosocomial carbapenem-resistant K. pneumoniae (CRKP) outbreak in 2011 at the U.S. National Institutes of Health Clinical Center, Snitkin et al. [59] used WGS to investigate the spread among 17 patients beginning in the 3 weeks after the discharge of the index patient. A combined genomic and epidemiological approach linked the outbreak to three independent transmission events from the index patient [59]. A more recent study investigating the 2008 US regional CRKP outbreak that affected 26 health care facilities in 4 adjacent counties in Indiana and Illinois, showed the important role that interfacility patient sharing played in dissemination. WGS-based phylogenetic analysis enabled differentiation of intra- and interfacility transmission events showing that one of the facilities had three independent importation events with two subsequent intrafacility transmission events, rather than one importation and 4 intrafacility events [60].

Another recent study underlined the importance of environmental sampling for identifying the source of a community outbreak of S. enterica associated with a restaurant buffet, in which WGS testing of raw food, fresh water, and food suppliers did not identify a clear source [61]. Thorough environmental sampling, however, showed that isolates from swabs of the sewer system had the same genomic profile as the outbreak isolates and grouped in the same phylogenetic clade. This study went on to identify an ineffective drainage system that acted as a bacterial reservoir for contaminated bio-aerosols. When the drainage system was remediated, the outbreak ended [61].

Phylogenetic analysis also offers well-established techniques for inferring the geographic origins of outbreaks. A particularly clear example is the elucidation of the introduction of V. cholerae into Haiti by Nepalese U.N. aid workers, where WGS-based phylogenetic analysis of isolates from the Haitian outbreak suggested that they were more closely related to isolates from Nepal than to other Western Hemisphere V. cholerae isolates [62,63]. On the flip side, phylogenetic analysis of USA300 isolates from South America, which had been thought to be an extension of the North American epidemic of CA-MRSA, showed that isolates from the two regions diverged prior to the current epidemic and represent two separate, parallel epidemics [64].

When there is not enough phylogenetic signal or robust enough sampling to establish the origin or direction of epidemic spread, other population genetic methods may offer alternatives to phylogeny. For instance, the concept of range expansion was recently used to infer the founding location of the USA300 clone in the US. This technique relies on the serial founder effects at the edge of an expanding pathogen front, which produce a pattern in which diversity is lowest farthest from the origin. By searching for the geographic location that maximized the slope of the diversity gradient, Challagundla et al. identified Pennsylvania as the most probable origin of USA300 [65].
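
The intuition behind the range-expansion search can be illustrated with a toy example. The sketch below is not the method of Challagundla et al.; it assumes planar coordinates, a single diversity value per sampling site, and an ordinary least-squares fit, and it simply picks the candidate origin from which diversity declines most steeply with distance. The function names `slope` and `most_likely_origin` are hypothetical.

```python
import math


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def most_likely_origin(candidates, sites):
    """Return the candidate with the steepest decline of diversity over distance.

    sites is a list of ((x, y), diversity) tuples, one per sampling location.
    Straight-line distance is a simplifying assumption of this sketch.
    """
    def gradient(origin):
        distances = [math.dist(origin, location) for location, _ in sites]
        diversities = [diversity for _, diversity in sites]
        return slope(distances, diversities)

    # The most negative slope means diversity falls fastest away from this point.
    return min(candidates, key=gradient)


# Toy data: diversity drops by 0.1 per unit of distance from (0, 0).
sites = [((0, 0), 1.0), ((1, 0), 0.9), ((2, 0), 0.8), ((3, 0), 0.7), ((4, 0), 0.6)]
candidates = [(0, 0), (2, 0), (4, 0)]
best = most_likely_origin(candidates, sites)
print(best)
```

Real analyses would need geographic distances, a diversity statistic estimated from the genomes, and some measure of uncertainty around the inferred origin.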

While many outbreak WGS studies are retrospective out of necessity, WGS could offer a prospective strategy for detecting new emerging clones [66–68]. Real-time, prospective WGS of Listeria monocytogenes isolates collected from patients, food, and food processing environments has recently been used to uncover the origins of listeriosis outbreaks [66]. The Listeria whole genome sequencing project leveraged raw sequence data and metadata collected by multiple collaborating agencies (CDC, FDA, USDA-FSIS, amongst others) [66], and used whole genome MLST (wgMLST) phylogenies in several outbreak investigations [69–73]. In one outbreak, despite having different pulsed-field gel electrophoresis (PFGE) patterns, wgMLST of patient isolates showed they were highly similar and traceable to isolates from an ice cream producer. Overall, the initiative showed that the number of clusters detected and number of outbreaks “solved” increased by 1.5 and 4.5 times, respectively, by the second year of using WGS compared to pre-WGS technologies [66].
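
To make the wgMLST comparison concrete, the following sketch counts allele differences between isolate profiles. It assumes each isolate has already been reduced to a mapping from locus name to allele identifier; the locus names, allele numbers, and isolate labels are invented for illustration, and `allele_distance` is a hypothetical helper rather than part of any wgMLST pipeline.

```python
def allele_distance(profile_a, profile_b):
    """Count loci called in both isolates whose allele identifiers differ."""
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in shared_loci if profile_a[locus] != profile_b[locus])


# Invented allele profiles (locus name -> allele identifier).
profiles = {
    "patient_1": {"locus_1": 4, "locus_2": 7, "locus_3": 1, "locus_4": 2},
    "patient_2": {"locus_1": 4, "locus_2": 7, "locus_3": 1, "locus_4": 2},
    "food_isolate": {"locus_1": 4, "locus_2": 7, "locus_3": 1, "locus_4": 3},
    "unrelated": {"locus_1": 9, "locus_2": 2, "locus_3": 5, "locus_4": 2},
}

for name in ("patient_2", "food_isolate", "unrelated"):
    distance = allele_distance(profiles["patient_1"], profiles[name])
    print(f"patient_1 vs {name}: {distance} allele difference(s)")
```

Restricting the comparison to loci called in both isolates is one possible design choice; missing loci could also be treated as differences, which would change the distances.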

Transmission

WGS analysis can also allow for tracking of epidemics at the most granular level, that is, in specific transmission events between patients. Differences shared by the donor and recipient, but not present in other circulating strains, provide strong evidence for transmission, so it is critical that the rates of evolution are fast enough to provide new genetic changes that “mark” donor/recipient pairs and are discoverable using phylogenies or other network approaches. Such transmission networks have been used in the tuberculosis field [74], among others [59,60], to determine the direction of transmission.

Patient-to-patient transmission has recently become controversial in the nontuberculous mycobacterium field, in which WGS reports of M. abscessus infection in cystic fibrosis patients suggested episodes of patient-to-patient transmission and worldwide dissemination of a single clone [75–78]. Subsequent reports have challenged the importance of person-to-person spread using WGS data [79–81]. While this controversy is unresolved, the analytical and theoretical hurdles have been instructive. First, it seems that there is unlikely to be a single cut-off value (e.g., SNP differences) that can clearly identify which isolates are the same and which are different. A second critical issue is that comprehensive sampling of the environment and other patients, perhaps in other locations, is critical to “ruling in” transmission. The more genomes that do not cluster tightly with the putative transmitted isolate genomes, the more robust the study.
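
The sensitivity of transmission calls to the chosen cut-off can be seen in a small example. The sketch below assumes a pre-computed alignment of equal-length sequences and counts only positions where both isolates have an unambiguous base; the sequences, isolate names, and the helpers `snp_distance` and `linked_pairs` are hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count aligned positions where both sequences have different A/C/G/T bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    unambiguous = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a in unambiguous and b in unambiguous and a != b
    )


def linked_pairs(alignment, threshold):
    """Return isolate pairs whose SNP distance is at or below the threshold."""
    return [
        (first, second)
        for first, second in combinations(sorted(alignment), 2)
        if snp_distance(alignment[first], alignment[second]) <= threshold
    ]


# Invented toy alignment of three isolates.
alignment = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGTAT",
    "isolate_C": "ACGAACGTTT",
}

for threshold in (1, 2, 3):
    print(threshold, linked_pairs(alignment, threshold))
```

In this toy data set, each increase in the threshold links another pair of isolates, which is why a fixed cut-off without epidemiological context and background sampling can be misleading.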

Another key parameter in studies of transmission is the genomic diversity and heterogeneity of the pathogen in each host; more than one isolate per case may need to be sequenced to avoid reconstruction of inaccurate events [82,83]. For instance, in the aforementioned study by Snitkin et al., WGS of isolates from different body sites of the index patient was valuable for reconstructing three independent transmission events to 17 colonized patients [59]. Another case in point is a recent study [84] that re-examined isolates from a 2011–2012 TB outbreak in the Canadian Arctic [85]. The initial study had identified two subgroups of isolates that could be differentiated from each other by a single polymorphism, pointing to two distinct source patients [85]. Deeper sequencing of mixed communities (sweeps of multiple colonies from media plates) from a single patient sample identified that one source patient harbored genomes with both polymorphisms, leading to the possibility that this individual was a super-spreader who likely transmitted to a third of the patients during the outbreak [84]. Another recent study used WGS to establish a direct link between probiotic use and Lactobacillus bacteremia in ICU patients. In this study, genomic heterogeneity of L. rhamnosus in blood isolates reflected the genomic heterogeneity in the probiotic capsules [86].

Inferring Function & Pathogenesis: Targeted Searches, Ancestral Reconstruction, and GWAS

The above techniques focus on defining relatedness, but WGS becomes even more powerful when we can predict biological function. For instance, WGS can characterize the antibiotic “resistome” of a strain by detecting known genes or variants associated with drug resistance. This is especially helpful in pathogens that are difficult to grow in a timely manner, which has been demonstrated in same-day tuberculosis diagnosis and antibiotic susceptibility prediction from culture-free respiratory samples [87]. Likewise, a “virulome”, or the set of all genes or variants encoding known virulence factors, might be used to determine the pathogenic potential of an organism [49,50]. For instance, isolates of Salmonella serovars from different sources (animal, human and environment) that share the same virulence genes might suggest the capacity of the animal and environmental isolates to cause infection in humans when transmitted [88]. However, preliminary studies of virulomes suggest that reliable interpretations may be complicated by diverse pathogen virulence profiles [89], and at least one study examining S. aureus bacteremia has shown no correlation between the virulome and clinical outcomes [90]. More clinical, longitudinal studies are needed to address the utility of this approach [91].

In a less directed way, WGSs can be used to highlight specific genes that may have contributed to the evolution of a disease. One goal is to find genes or genetic changes that are associated with critical evolutionary events, which is most commonly done within the framework of ancestral reconstruction on a phylogenetic tree [64]. Genomic features that were acquired on a branch that represents an emerging epidemic, and were maintained in many of the descendant genomes, are strong candidate loci that could encode critical biological traits that made an outbreak strain more fit. For an outbreak, fitness might be multifactorial, including enhanced transmission, persistence, virulence, or immune evasion, so it is worthwhile considering genes with multiple functions. For instance, in the case of the clone USA300-BKV, an inactivating SNP in the transcriptional repressor of the pyrimidine nucleotide biosynthetic operon (pyrR) was likely important for commensal metabolic fitness [58]. In the parallel USA300 epidemics in North and South America, independent acquisition of copper detoxification loci may have led to increased survival upon copper challenge in the environment and in macrophages [92–94].
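
As an illustration of how a feature can be placed on a branch, the sketch below applies Fitch parsimony, one simple form of ancestral reconstruction, to gene presence (1) or absence (0) on a small rooted binary tree. The studies cited above may have used other reconstruction methods; the tree, tip names, and function names here are hypothetical, and ties at the root are resolved toward absence as an arbitrary assumption.

```python
def fitch_sets(tree, states):
    """Bottom-up Fitch pass: return (possible_states, payload) for each node.

    A tip is a string name; an internal node is a 2-tuple of subtrees.
    """
    if isinstance(tree, str):
        return ({states[tree]}, tree)
    left = fitch_sets(tree[0], states)
    right = fitch_sets(tree[1], states)
    common = left[0] & right[0]
    return (common or (left[0] | right[0]), (left, right))


def tips(node):
    """List the tip names below a node produced by fitch_sets."""
    _, payload = node
    if isinstance(payload, str):
        return [payload]
    return tips(payload[0]) + tips(payload[1])


def gain_branches(node, parent_state=None, found=None):
    """Top-down pass: report clades whose stem branch changes from 0 to 1."""
    found = [] if found is None else found
    possible, payload = node
    # Keep the parent's state when allowed; otherwise take the smallest option.
    state = parent_state if parent_state in possible else min(possible)
    if parent_state == 0 and state == 1:
        found.append(sorted(tips(node)))
    if not isinstance(payload, str):
        for child in payload:
            gain_branches(child, state, found)
    return found


# Toy tree (A,(B,(C,D))) in which only tips C and D carry the gene.
tree = ("A", ("B", ("C", "D")))
states = {"A": 0, "B": 0, "C": 1, "D": 1}
gained = gain_branches(fitch_sets(tree, states))
print(gained)
```

Here the gene is inferred to have been acquired once, on the branch leading to the clade containing C and D, which is the kind of candidate locus the paragraph above describes.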

A key advantage of using WGSs in identifying genes involved in the emergence of new disease-causing strains is the ability to detect new additions to the genome. Horizontal gene transfer and the acquisition of mobile genetic elements appear to play a critical role. In the USA300-BKV clade, a prophage variant of ϕ11 was present in almost half of the isolates, and the presence of this phage was shown to enhance virulence in a murine skin model [58]. A recent study that investigated 109 isolates of outbreak-associated Clostridium perfringens from England and Wales over a 7-year period showed that a specific enterotoxin-producing clade (CPE) was responsible for 9 different food-poisoning outbreaks [95]. Surprisingly, although most C. perfringens enterotoxin had been thought to be chromosomally encoded, 83% of the outbreak strains carried enterotoxin-encoding (cpe) plasmids that had previously been thought to be relatively uncommon. Interestingly, the presence of the plasmid in phylogenetically distinct strains may indicate horizontal transfer by conjugation [95].

Another approach to ascertain function from WGSs is to use genome-wide association studies (GWAS). GWAS allows the identification of genetic features that are associated with some phenotype or clinical outcome. Sheppard et al. recently used GWAS in Campylobacter isolates to identify a genetic region for vitamin B5 biosynthesis that likely represents an adaptation to a diet of grasses in cattle [96]. Another large study by Levy et al. showed that genes for proteins involved in carbohydrate metabolism and transport are enriched in plant-associated bacteria compared to non-plant-associated genomes, and that a novel T6SS effector operon, involved in direct bacterial competition, was associated with the phytopathogenic bacteria of the genus Acidovorax [97]. In the S. aureus field, GWAS has been used to identify genomic loci associated with poor outcomes in bacteremia, but, importantly, the associations only held in some subdivisions (clonal complexes) of the species, suggesting that specific genetic backgrounds have a large impact on pathogenicity, and arguing that phylogeny is critical for interpreting GWAS [98]. Indeed, a problem that has dogged GWAS studies in bacterial infectious disease is that phylogenetic relatedness of strains can strongly confound statistical inference of associations; however, several techniques have recently become available that account for underlying phylogeny [99,100].
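
The simplest form of such an association test can be sketched as a one-sided Fisher's exact test on a 2×2 table of gene presence against phenotype. This is a deliberately naive illustration: it ignores population structure, which is exactly the confounder described above, and it is not the approach of the tools cited in Box 1. The function name and the counts are hypothetical.

```python
from math import comb


def enrichment_p_value(cases_with, cases_without, controls_with, controls_without):
    """One-sided Fisher's exact test for enrichment of a gene among cases.

    Returns the hypergeometric probability of seeing at least `cases_with`
    gene carriers among the cases, given the table margins.
    """
    n_cases = cases_with + cases_without
    n_carriers = cases_with + controls_with
    n_total = n_cases + controls_with + controls_without
    upper = min(n_cases, n_carriers)
    favourable = sum(
        comb(n_carriers, x) * comb(n_total - n_carriers, n_cases - x)
        for x in range(cases_with, upper + 1)
    )
    return favourable / comb(n_total, n_cases)


# Invented counts: the gene is present in all 5 cases and in none of 5 controls.
p_perfect = enrichment_p_value(5, 0, 0, 5)
# Invented counts: the gene is present only in controls, so there is no enrichment.
p_none = enrichment_p_value(0, 5, 5, 0)
print(p_perfect, p_none)
```

Because closely related isolates share many genes by descent, a small p-value from this kind of test may reflect clonal structure rather than a causal locus, and testing thousands of genes also requires correction for multiple comparisons.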

In-host Adaptation

Another way to detect functional properties important for pathogenesis or in-host persistence is to identify adaptive changes that happen in each host. Evolution in situ provides strong evidence for the involvement of specific genes in host adaptation, especially when seen repeatedly in different patients. A well-studied example of in-host adaptation is the development of “mucoidy”, or overproduction of alginate, which impacts biofilm formation and antibiotic susceptibilities in strains of P. aeruginosa in cystic fibrosis [101]. While the development of mucoidy in different patients was described well before WGSs became widely available, whole genomes have clearly shown that this is an evolutionary step that often occurs by mutation and not by strain replacement. WGSs have also elucidated other parallel, predictable changes such as the loss of surface appendages and key regulatory networks [102]. A recent study by Riquelme et al. recapitulated many of these same observations, and also identified metabolic reprogramming as a major adaptational change for long-term P. aeruginosa infection [103]. In-host evolution has also been used to identify key genetic changes in other organisms, such as S. aureus, that often colonize for long periods of time before causing more acute infections [104,105].

An exciting new area of investigation afforded by WGSs is obtaining a better understanding of the “arms race” between the pathogen and the host, by using an integrative genomics approach that sequences both the host and the pathogen [106,107]. A recent study used an experimental host–pathogen model of the nematode Caenorhabditis elegans and its pathogen Bacillus thuringiensis to understand coadaptation. Interestingly, the study showed that the phenotypic coadaptation was explained by complex modifications in both the host and the pathogen [108]. A recent joint GWAS study that sequenced both human and pneumococcal genomes suggested that 70% of invasiveness could be accounted for by bacterial genomic variation, but there seemed to be no effect on severity. Human genetic variations, on the other hand, explained half of the variation in meningitis susceptibility and a third of meningitis severity [109].

Data gained and data lost: scalability and database driven technologies

Because of the sheer volume of data, and the fact that our comparative algorithms do not scale well, many of our analytical techniques require data reduction and loss. For instance, most of the phylogenetic examples above use a comparative reference composed of either core genes or a single reference genome. Any sequence data not in the reference is lost. This missing data decreases the discriminatory power for WGS analyses, and may lead to inaccurate statements about relatedness or critical gene/SNP content. Using multiple references might help gauge the sensitivity of the analytical outcomes to reference choice, but the choice of which references to try is problematic, and also does not scale well.

Ideally, we would have ways to leverage all of the data and make full-scale comparisons between all of the genomes in the database. One approach is to reduce sequences to representative “sketches”, which could be used for efficiently calculating whole genome nucleotide similarity between sequences [56]. Tools such as Mash can very quickly identify the closest genomes from large databases and cluster large numbers of genomes [110,111]. However, by reducing the data, critical data about the basis of that similarity is lost. One promising new approach for retaining individual genomic data is using a graph structure representation of the genetic variations from a population of genomes such as in the tool variation graph (vg) [54]. The use of variation graphs is still in its infancy, but a recent tool “Sequence Tube Map” by Beyer et al. makes the visualization of variation graphs easier and more intuitive [55]. Pangenomic techniques such as PIRATE [112], MetaPGN [113], and Roary [36] may also offer ways to more fully estimate WGS diversity in a computationally tractable way.
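
The sketching idea can be illustrated with a minimal bottom-s MinHash in Python. This is a simplified sketch rather than the Mash implementation: it assumes a generic hash from `hashlib` instead of the hash function Mash uses, ignores reverse complements, and uses very small values of `k` and sketch size so that the toy sequences work. The distance formula, D = -(1/k) ln(2j / (1 + j)), is the Mash distance defined in [56]; all function names are illustrative.

```python
import hashlib
import math


def kmer_hashes(seq, k):
    """Hash every k-mer of seq to a 64-bit integer (stable across runs)."""
    return {
        int.from_bytes(
            hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big"
        )
        for i in range(len(seq) - k + 1)
    }


def sketch(seq, k=5, size=16):
    """Bottom-s MinHash sketch: keep only the `size` smallest k-mer hashes."""
    return sorted(kmer_hashes(seq, k))[:size]


def jaccard_estimate(sketch_a, sketch_b, size=16):
    """Estimate Jaccard similarity from the smallest hashes of the merged sketches."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(jaccard, k=5):
    """Convert a Jaccard estimate into the Mash distance."""
    if jaccard == 0:
        return 1.0
    return max(0.0, -math.log(2 * jaccard / (1 + jaccard)) / k)


# Invented toy sequences.
genome_a = "ATGGCGTACGTTAGCCGATAGGCTAACGTTAGC"
genome_b = "ATGGCGTACGTTAGCCGATAGGCTAACGTTAGC"
genome_c = "CCCCCCCCCCCCCCCCCCCC"

d_same = mash_distance(jaccard_estimate(sketch(genome_a), sketch(genome_b)))
d_diff = mash_distance(jaccard_estimate(sketch(genome_a), sketch(genome_c)))
print(d_same, d_diff)
```

The sketch keeps only a fixed number of hashes per genome, which is what makes comparisons fast and also why the specific variants underlying a given distance cannot be recovered from it.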

Another source of data loss comes in the necessity to exclude genomes from phylogenetic analyses because of the computational cost with increasing numbers of taxa. Including publicly available genomes is generally a good practice because it provides context, enhances reproducibility, and may also fill in gaps in temporal data. However, the choice of which genomes to use is not trivial, especially in well sampled groups.

One way to access the full potential information of WGS databases is to make the database the object of query in comparative analyses. Database-driven methods like BLAST [114] have been absolutely indispensable, but become cumbersome when comparing whole genomes. We recently used a data compression approach to create a comparative tool that leverages all the diversity in the database. The tool, “WhatsGNU” (https://github.com/ahmedmagds/WhatsGNU), uses an exact-match, proteomic compression technique to remove redundant sequences, keeping one copy of each protein allele while preserving the genomes’ identifiers and all associated metadata (such as geographical location and clonal complex). This approach allows very rapid comparison of newly sequenced genomes to the compressed database to identify novel protein sequences and relate these to metadata. Recently we used this tool to detect novelty in skin-adapted S. aureus genomes [115].
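
The exact-match compression idea can be sketched with a dictionary keyed by protein sequence. This is a conceptual illustration only and does not reproduce the WhatsGNU code or file formats; the protein strings, genome identifiers, metadata, and function names are invented.

```python
from collections import defaultdict


def build_index(genomes):
    """Store each distinct protein sequence once, with the genomes that carry it."""
    index = defaultdict(set)
    for genome_id, proteins in genomes.items():
        for protein in proteins:
            index[protein].add(genome_id)
    return index


def query(index, proteins):
    """For each query protein, list the database genomes with an exact match."""
    return {protein: sorted(index.get(protein, ())) for protein in proteins}


# Invented database: genome identifier -> protein sequences.
database = {
    "genome_1": ["MKTAYIAK", "MSLLTEVE"],
    "genome_2": ["MKTAYIAK", "MSLLTEVD"],
    "genome_3": ["MKTAYIAK", "MSLLTEVE"],
}
# Invented metadata kept alongside the compressed sequences.
metadata = {"genome_1": "CC8", "genome_2": "CC5", "genome_3": "CC8"}

index = build_index(database)
hits = query(index, ["MKTAYIAK", "MSLLTEVE", "MNEWALLELE"])

for protein, genome_ids in hits.items():
    complexes = sorted({metadata[g] for g in genome_ids})
    print(protein, len(genome_ids), complexes)
```

A protein with no exact match in the index, such as the last query above, is a candidate novel allele, while the matching genome identifiers let each known allele be related back to metadata such as clonal complex.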

Database-driven techniques are, of course, affected by accessibility, composition, and the quality of the database itself (Box 2). Techniques are needed that can assess bias, error, and measure complexity. For instance, simple collectors’ curves of pan-genomes were instrumental in establishing the “open pan-genome concept” in which continuous horizontal gene transfer adds to the evolution of a species. Likewise, panallelome approaches such as WhatsGNU can evaluate the accumulation of new allelic variants as new genomes are sequenced (Figure 1).

Box 2. Current limitations and potential solutions.

  • Inconsistent annotations and inaccurate metadata

    Specialized species databases such as Staphopia [116], Enterobase [117] and Pseudomonas Genome Database [118] help considerably in curation and quality control, but they take large amounts of dedicated time and funding.

  • Sampling bias

    More than 50% of the 10,000+ assembled S. aureus genomes currently available in NCBI are from clonal complexes 5 and 8. To overcome this problem, unbiased sampling techniques are needed with a continuous assessment of the unsampled sequence diversity (e.g., Figure 1).

  • Data storage

    Generally, data reduction [56] and compression approaches [41] along with graph genomes [54] will help in mitigating storage problems. In addition, increased adoption of cloud-computing and storage will help smaller labs with limited infrastructure.

  • Data shareability

    Data sharing is a crucial pillar in pathogen surveillance, and it is also critical in speeding up outbreak response times [119–121]. Among the issues that face data sharing are concern over loss of control of data and the need to protect Protected Health Information, both of which delay access to data. One potential solution would be decentralized technologies [122,123], like private or permissioned blockchains, which would allow temporarily granted access and anonymity for patient data.

  • Reproducibility

    A framework such as the BioCompute Object [124] that has all the tools’ versions, computational parameters, dependencies, usage and commands for a bioinformatic pipeline should be included as supplementary methods in all publications. In addition, whenever possible, multiple parallel bioinformatic pipelines should be used for the same analyses.

Figure 1.


A collector’s curve expresses the number of exact matches (unique alleles) as a function of the number of genomes sequenced. 1000, 10000, 50000, 75000, 100000, 125000, 150000, 175000 and 200000 genomes from the 216,642 S. enterica genomes available on Enterobase [117] were randomly selected. The random sampling step was done three times independently with replacement. Error bars are shown in green. Note that although the slope of the curve becomes less steep as more genomes are added, the curve does not plateau, representing unsampled diversity and/or continuing generation of allelic diversity through evolution.
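
The sampling procedure behind a collector's curve can be sketched as follows. This is an illustrative reconstruction, not the code used to produce Figure 1: it assumes each genome is represented as a set of allele identifiers, that genomes within one draw are distinct, and that each replicate draws afresh from the full pool. The toy data and the function name `collectors_curve` are hypothetical.

```python
import random


def collectors_curve(genomes, sample_sizes, replicates=3, seed=0):
    """Count unique alleles in random genome subsets of increasing size.

    Returns {sample_size: [unique allele count for each replicate]}.
    """
    rng = random.Random(seed)
    genome_ids = sorted(genomes)
    curve = {}
    for size in sample_sizes:
        counts = []
        for _ in range(replicates):
            chosen = rng.sample(genome_ids, size)
            alleles = set().union(*(genomes[g] for g in chosen))
            counts.append(len(alleles))
        curve[size] = counts
    return curve


# Invented toy data: genome identifier -> set of allele identifiers.
toy_genomes = {
    "g1": {"a1", "a2", "a3"},
    "g2": {"a1", "a2", "a4"},
    "g3": {"a1", "a5", "a6"},
    "g4": {"a1", "a2", "a7"},
}

curve = collectors_curve(toy_genomes, sample_sizes=[1, 2, 4])
for size, counts in curve.items():
    print(size, counts, sum(counts) / len(counts))
```

The spread of the replicate counts at each sample size corresponds to the error bars in the figure, and a curve that keeps rising at the largest sample size indicates that new alleles are still being discovered.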

Conclusions and Future Directions

The use of WGS has caused a leap forward in understanding the spread of bacterial infections with enhanced resolution for tracing transmission and spread. Technologies and infrastructure that can rapidly, and prospectively, sequence whole genomes in clinical settings will change molecular epidemiology, and will likely have a direct impact on treatment, prevention, and other interventions. A powerful aspect of WGSs is that they can be used to make predictions about function. As we go forward, it will be critical to find ways to efficiently test the biological hypotheses generated by WGS analyses at the bench or in the field, moving these studies closer to fundamental mechanisms with possible interventional targets.

In the bioinformatic sphere, WGSs have brought a new appreciation for the problems associated with enormous amounts of data, and it has become clear that our current tools may not be sufficient to extract all of the pertinent information from these ever-expanding data. This will only be exacerbated by new datasets with parallel, integrative genomics (host/pathogen or host/microbiome/pathogen), or with spatial or temporal components. We need to develop new bioinformatic tools that scale well to take advantage of the enormous numbers of genomes being produced, allowing for better and more rapid inference of biologically and epidemiologically important information.

Highlights.

  • 72% of the available 1,067,277 bacterial genomes on NCBI are from only 10 species.

  • WGS is a new gold standard for detection of clonal outbreaks and transmission events.

  • WGS ancestral reconstruction and GWAS allow unbiased functional prediction.

  • WGS of longitudinal infection can be used to detect host adaptations.

  • The data loss common in WGS methods may be tackled with database-driven approaches.

Footnotes

Declarations of interest:

None.

References

  • 1.Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM, et al. : Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 1995, 269:496–512. [DOI] [PubMed] [Google Scholar]
  • 2.Leinonen R, Sugawara H, Shumway M, Collaboration obotINSD: The Sequence Read Archive. Nucleic Acids Research 2010, 39:D19–D21. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Sayers EW, Cavanaugh M, Clark K, Ostell J, Pruitt KD, Karsch-Mizrachi I: GenBank. Nucleic Acids Res 2019, 47:D94–D99. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Li H, Durbin R: Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 2009, 25:1754–1760. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Li H: Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 2018, 34:3094–3100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, Subgroup GPDP: The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, 25:2078–2079. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Garrison E, Marth G: Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:1207.3907 2012.
  • 8.McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. : The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 2010, 20:1297–1303. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Seemann T: https://github.com/tseemann/snippy.
  • 10.Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM: A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly 2012, 6:80–92. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Thorvaldsdóttir H, Robinson JT, Mesirov JP: Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics 2012, 14:178–192. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Bouckaert R, Vaughan TG, Barido-Sottani J, Duchene S, Fourment M, Gavryushkina A, Heled J, Jones G, Kuhnert D, De Maio N, et al. : BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Comput Biol 2019, 15:e1006650. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, Rambaut A: Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol 2018, 4:vey016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Stamatakis A: RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 2014, 30:1312–1313. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O: New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 2010, 59:307–321. [DOI] [PubMed] [Google Scholar]
  • 16.Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Hohna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP: MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol 2012, 61:539–542. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Goloboff PA, Farris JS, Nixon KC: TNT, a free program for phylogenetic analysis. Cladistics 2008, 24:774–786. [Google Scholar]
  • 18.Didelot X, Wilson DJ: ClonalFrameML: efficient inference of recombination in whole bacterial genomes. PLoS Comput Biol 2015, 11:e1004041. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Croucher NJ, Page AJ, Connor TR, Delaney AJ, Keane JA, Bentley SD, Parkhill J, Harris SR: Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Nucleic Acids Research 2014, 43:e15-e15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Narechania A, Baker R, DeSalle R, Mathema B, Kolokotronis S-O, Kreiswirth B, Planet PJ: Clusterflock: a flocking algorithm for isolating congruent phylogenomic datasets. GigaScience 2016, 5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Asnicar F, Weingart G, Tickle TL, Huttenhower C, Segata N: Compact graphical representation of phylogenetic data and metadata with GraPhlAn. PeerJ 2015, 3:e1029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Rambaut A, Drummond A: FigTree version 1.4.0. 2012.
  • 23.Huson DH, Bryant D: Application of Phylogenetic Networks in Evolutionary Studies. Molecular Biology and Evolution 2006, 23:254–267. [DOI] [PubMed] [Google Scholar]
  • 24.Huson DH, Scornavacca C: Dendroscope 3: an interactive tool for rooted phylogenetic trees and networks. Syst Biol 2012, 61:1061–1067. [DOI] [PubMed] [Google Scholar]
  • 25.Letunic I, Bork P: Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Res 2019, 47:W256–W259. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Ding W, Baumdicker F, Neher RA: panX: pan-genome analysis and exploration. Nucleic Acids Res 2018, 46:e5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Hadfield J, Croucher NJ, Goater RJ, Abudahab K, Aanensen DM, Harris SR: Phandango: an interactive viewer for bacterial population genomics. Bioinformatics 2017. [DOI] [PMC free article] [PubMed]
  • 28.Wattam AR, Davis JJ, Assaf R, Boisvert S, Brettin T, Bun C, Conrad N, Dietrich EM, Disz T, Gabbard JL, et al. : Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center. Nucleic Acids Res 2017, 45:D535–D542. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Argimon S, Abudahab K, Goater RJE, Fedosejev A, Bhai J, Glasner C, Feil EJ, Holden MTG, Yeats CA, Grundmann H, et al. : Microreact: visualizing and sharing data for genomic epidemiology and phylogeography. Microb Genom 2016, 2:e000093. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, Sagulenko P, Bedford T, Neher RA: Nextstrain: real-time tracking of pathogen evolution. Bioinformatics 2018, 34:4121–4123. ••An Open Science Prize-winning tool that tracks the spread of global pathogens in real time.
  • 31.Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, et al. : SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology 2012, 19:455–477. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Chin C-S, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, Clum A, Copeland A, Huddleston J, Eichler EE, et al. : Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nature Methods 2013, 10:563–569. [DOI] [PubMed] [Google Scholar]
  • 33.Wick RR, Judd LM, Gorrie CL, Holt KE: Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLOS Computational Biology 2017, 13:e1005595. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Seemann T: Prokka: rapid prokaryotic genome annotation. Bioinformatics 2014. [DOI] [PubMed]
  • 35.Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, et al. : The RAST Server: rapid annotations using subsystems technology. BMC Genomics 2008, 9:75. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Page AJ, Cummins CA, Hunt M, Wong VK, Reuter S, Holden MT, Fookes M, Falush D, Keane JA, Parkhill J: Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics 2015, 31:3691–3693. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Earle SG, Wu CH, Charlesworth J, Stoesser N, Gordon NC, Walker TM, Spencer CCA, Iqbal Z, Clifton DA, Hopkins KL, et al. : Identifying lineage effects when controlling for population structure improves power in bacterial association studies. Nat Microbiol 2016, 1:16041. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Brynildsrud O, Bohlin J, Scheffer L, Eldholm V: Rapid scoring of genes in microbial pan-genome-wide association studies with Scoary. Genome Biol 2016, 17:238. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Lees JA, Galardini M, Bentley SD, Weiser JN, Corander J: pyseer: a comprehensive tool for microbial pangenome-wide association studies. Bioinformatics 2018, 34:4310–4312. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Maddison WP, Maddison DR: Mesquite: a modular system for evolutionary analysis. Version 3.61. http://www.mesquiteproject.org. 2019.
  • 41.Moustafa AM, Planet PJ: WhatsGNU: a tool for identifying proteomic novelty. Genome Biology 2020 (in press). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Kosakovsky Pond SL, Poon AFY, Velazquez R, Weaver S, Hepler NL, Murrell B, Shank SD, Magalis BR, Bouvier D, Nekrutenko A, et al. : HyPhy 2.5 - a customizable platform for evolutionary hypothesis testing using phylogenies. Mol Biol Evol 2019. [DOI] [PMC free article] [PubMed]
  • 43.Nelson CW, Moncla LH, Hughes AL: SNPGenie: estimating evolutionary parameters to detect natural selection using pooled next-generation sequencing data. Bioinformatics 2015, 31:3709–3711. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Arndt D, Grant JR, Marcu A, Sajed T, Pon A, Liang Y, Wishart DS: PHASTER: a better, faster version of the PHAST phage search tool. Nucleic Acids Res 2016, 44:W16–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Akhter S, Aziz RK, Edwards RA: PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Research 2012, 40:e126–e126. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Zankari E, Hasman H, Cosentino S, Vestergaard M, Rasmussen S, Lund O, Aarestrup FM, Larsen MV: Identification of acquired antimicrobial resistance genes. Journal of Antimicrobial Chemotherapy 2012, 67:2640–2644. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Alcock BP, Raphenya AR, Lau TTY, Tsang KK, Bouchard M, Edalatmand A, Huynh W, Nguyen A-LV, Cheng AA, Liu S, et al. : CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic Acids Research 2019, 48:D517–D525. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Bradley P, Gordon NC, Walker TM, Dunn L, Heys S, Huang B, Earle S, Pankhurst LJ, Anson L, de Cesare M, et al. : Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis. Nature Communications 2015, 6:10063. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Seemann T: https://github.com/tseemann/abricate.
  • 50.Liu B, Zheng D, Jin Q, Chen L, Yang J: VFDB 2019: a comparative pathogenomic platform with an interactive web interface. Nucleic Acids Research 2018, 47:D687–D692. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Vielva L, de Toro M, Lanza VF, de la Cruz F: PLACNETw: a web-based tool for plasmid reconstruction from bacterial genomes. Bioinformatics 2017, 33:3796–3798. [DOI] [PubMed] [Google Scholar]
  • 52.Antipov D, Hartwick N, Shen M, Raiko M, Lapidus A, Pevzner PA: plasmidSPAdes: assembling plasmids from whole genome sequencing data. Bioinformatics 2016, 32:3380–3387. [DOI] [PubMed] [Google Scholar]
  • 53.Carattoli A, Zankari E, García-Fernández A, Voldby Larsen M, Lund O, Villa L, Møller Aarestrup F, Hasman H: In Silico Detection and Typing of Plasmids using PlasmidFinder and Plasmid Multilocus Sequence Typing. Antimicrobial Agents and Chemotherapy 2014, 58:3895–3903. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Garrison E, Siren J, Novak AM, Hickey G, Eizenga JM, Dawson ET, Jones W, Garg S, Markello C, Lin MF, et al. : Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol 2018, 36:875–879. ••Similar to sequence assembly graphs, this software library compactly represents genetic variations between multiple genomes as a graph.
  • 55. Beyer W, Novak AM, Hickey G, Chan J, Tan V, Paten B, Zerbino DR: Sequence tube maps: making graph genomes intuitive to commuters. Bioinformatics 2019. •Representing structural variants in graph genomes and sequence alignments as underground maps.
  • 56. Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM: Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol 2016, 17:132. •A pioneering genome-sketching tool that calculates pairwise distances between genomes.
  • 57.Baker DN, Langmead B: Dashing: fast and accurate genomic distances with HyperLogLog. Genome Biology 2019, 20:265. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58. Copin R, Sause WE, Fulmer Y, Balasubramanian D, Dyzenhaus S, Ahmed JM, Kumar K, Lees J, Stachel A, Fisher JC, et al. : Sequential evolution of virulence and resistance during clonal spread of community-acquired methicillin-resistant Staphylococcus aureus. Proc Natl Acad Sci U S A 2019, 116:1745–1754. •Using high-resolution WGS, this paper investigated a USA300 MRSA outbreak and showed the emergence of a disseminating subclone.
  • 59.Snitkin ES, Zelazny AM, Thomas PJ, Stock F, Group NCSP, Henderson DK, Palmore TN, Segre JA: Tracking a hospital outbreak of carbapenem-resistant Klebsiella pneumoniae with whole-genome sequencing. Sci Transl Med 2012, 4:148ra116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Snitkin ES, Won S, Pirani A, Lapp Z, Weinstein RA, Lolans K, Hayden MK: Integrated genomic and interfacility patient-transfer data reveal the transmission pathways of multidrug-resistant Klebsiella pneumoniae in a regional outbreak. Sci Transl Med 2017, 9. [DOI] [PubMed] [Google Scholar]
  • 61.Mair-Jenkins J, Borges-Stewart R, Harbour C, Cox-Rogers J, Dallman T, Ashton P, Johnston R, Modha D, Monk P, Puleston R: Investigation using whole genome sequencing of a prolonged restaurant outbreak of Salmonella Typhimurium linked to the building drainage system, England, February 2015 to March 2016. Euro Surveill 2017, 22. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Hendriksen RS, Price LB, Schupp JM, Gillece JD, Kaas RS, Engelthaler DM, Bortolaia V, Pearson T, Waters AE, Upadhyay BP, et al. : Population genetics of Vibrio cholerae from Nepal in 2010: evidence on the origin of the Haitian outbreak. mBio 2011, 2:e00157–00111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Chin C-S, Sorenson J, Harris JB, Robins WP, Charles RC, Jean-Charles RR, Bullard J, Webster DR, Kasarskis A, Peluso P, et al. : The Origin of the Haitian Cholera Outbreak Strain. New England Journal of Medicine 2010, 364:33–42. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Planet PJ, Diaz L, Kolokotronis SO, Narechania A, Reyes J, Xing G, Rincon S, Smith H, Panesso D, Ryan C, et al. : Parallel Epidemics of Community-Associated Methicillin-Resistant Staphylococcus aureus USA300 Infection in North and South America. J Infect Dis 2015, 212:1874–1882. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65. Challagundla L, Luo X, Tickler IA, Didelot X, Coleman DC, Shore AC, Coombs GW, Sordelli DO, Brown EL, Skov R, et al. : Range Expansion and the Origin of USA300 North American Epidemic Methicillin-Resistant Staphylococcus aureus. MBio 2018, 9. •Using a phylogeny-free approach, this paper used range expansion to identify Pennsylvania as the most probable origin of USA300.
  • 66.Jackson BR, Tarr C, Strain E, Jackson KA, Conrad A, Carleton H, Katz LS, Stroika S, Gould LH, Mody RK, et al. : Implementation of Nationwide Real-time Whole-genome Sequencing to Enhance Listeriosis Outbreak Detection and Investigation. Clin Infect Dis 2016, 63:380–386. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Mellmann A, Bletz S, Böking T, Kipp F, Becker K, Schultes A, Prior K, Harmsen D: Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infection Control in an Institutional Setting. Journal of Clinical Microbiology 2016, 54:2874. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Kwong JC, Mercoulia K, Tomita T, Easton M, Li HY, Bulach DM, Stinear TP, Seemann T, Howden BP: Prospective Whole-Genome Sequencing Enhances National Surveillance of Listeria monocytogenes. J Clin Microbiol 2016, 54:333–342. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Jackson KA, Stroika S, Katz LS, Beal J, Brandt E, Nadon C, Reimer A, Major B, Conrad A, Tarr C, et al. : Use of Whole Genome Sequencing and Patient Interviews To Link a Case of Sporadic Listeriosis to Consumption of Prepackaged Lettuce. J Food Prot 2016, 79:806–809. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Centers for Disease Control and Prevention. Multistate outbreak of listeriosis linked to Blue Bell creameries products. Available at: http://www.cdc.gov/listeria/outbreaks/ice-cream-03-15/. Accessed 12/19/2019.
  • 71.Centers for Disease Control and Prevention. Multistate outbreak of listeriosis linked to soft cheeses distributed by Karoun Dairies, Inc. Available at: http://www.cdc.gov/listeria/outbreaks/soft-cheeses-09-15/. Accessed 12/19/2019.
  • 72.Centers for Disease Control and Prevention. Wholesome Soy Products, Inc. sprouts and investigation of human listeriosis cases. Available at: http://www.cdc.gov/listeria/outbreaks/bean-sprouts-11-14/index.html. Accessed 12/19/2019. [Google Scholar]
  • 73.Centers for Disease Control and Prevention. Multistate outbreak of listeriosis linked to commercially produced, prepackaged caramel apples made from Bidart Bros. apples. Available at: http://www.cdc.gov/listeria/outbreaks/caramel-apples-12-14/index.html. Accessed 12/19/2019.
  • 74.Walker TM, Ip CL, Harrell RH, Evans JT, Kapatai G, Dedicoat MJ, Eyre DW, Wilson DJ, Hawkey PM, Crook DW, et al. : Whole-genome sequencing to delineate Mycobacterium tuberculosis outbreaks: a retrospective observational study. Lancet Infect Dis 2013, 13:137–146. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Bryant JM, Grogono DM, Greaves D, Foweraker J, Roddick I, Inns T, Reacher M, Haworth CS, Curran MD, Harris SR, et al. : Whole-genome sequencing to identify transmission of Mycobacterium abscessus between patients with cystic fibrosis: a retrospective cohort study. Lancet 2013, 381:1551–1560. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Davidson RM, Hasan NA, Reynolds PR, Totten S, Garcia B, Levin A, Ramamoorthy P, Heifets L, Daley CL, Strong M: Genome sequencing of Mycobacterium abscessus isolates from patients in the United States and comparisons to globally diverse clinical strains. J Clin Microbiol 2014, 52:3573–3582. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Davidson RM, Hasan NA, de Moura VC, Duarte RS, Jackson M, Strong M: Phylogenomics of Brazilian epidemic isolates of Mycobacterium abscessus subsp. bolletii reveals relationships of global outbreak strains. Infect Genet Evol 2013, 20:292–297. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Bryant JM, Grogono DM, Rodriguez-Rincon D, Everall I, Brown KP, Moreno P, Verma D, Hill E, Drijkoningen J, Gilligan P, et al. : Emergence and spread of a human-transmissible multidrug-resistant nontuberculous mycobacterium. Science 2016, 354:751–757. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79. Doyle RM, Rubio M, Dixon G, Hartley J, Klein N, Coll P, Harris KA: Cross-transmission is not the source of new Mycobacterium abscessus infections in a multi-centre cohort of cystic fibrosis patients. Clin Infect Dis 2019. •This paper rules out direct patient-to-patient transmission of M. abscessus in CF patients and underlines the importance of environmental samples in establishing transmission networks.
  • 80.Harris KA, Underwood A, Kenna DT, Brooks A, Kavaliunaite E, Kapatai G, Tewolde R, Aurora P, Dixon G: Whole-genome sequencing and epidemiological analysis do not provide evidence for cross-transmission of Mycobacterium abscessus in a cohort of pediatric cystic fibrosis patients. Clin Infect Dis 2015, 60:1007–1016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Tortoli E, Kohl TA, Trovato A, Baldan R, Campana S, Cariani L, Colombo C, Costa D, Cristadoro S, Di Serio MC, et al. : Mycobacterium abscessus in patients with cystic fibrosis: low impact of inter-human transmission in Italy. Eur Respir J 2017, 50. [DOI] [PubMed] [Google Scholar]
  • 82.Worby CJ, Lipsitch M, Hanage WP: Within-host bacterial diversity hinders accurate reconstruction of transmission networks from genomic distance data. PLoS Comput Biol 2014, 10:e1003549. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Didelot X, Walker AS, Peto TE, Crook DW, Wilson DJ: Within-host evolution of bacterial pathogens. Nat Rev Microbiol 2016, 14:150–162. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84. Lee RS, Proulx J-F, McIntosh F, Behr MA, Hanage WP: Previously undetected super-spreading of Mycobacterium tuberculosis revealed by deep sequencing. eLife 2020, 9:e53245. •This paper shows the potential of deep sequencing in public health investigations and for building accurate transmission clusters.
  • 85.Lee RS, Radomski N, Proulx J-F, Manry J, McIntosh F, Desjardins F, Soualhine H, Domenech P, Reed MB, Menzies D, et al. : Reemergence and Amplification of Tuberculosis in the Canadian Arctic. The Journal of Infectious Diseases 2015, 211:1905–1914. [DOI] [PubMed] [Google Scholar]
  • 86. Yelin I, Flett KB, Merakou C, Mehrotra P, Stam J, Snesrud E, Hinkle M, Lesho E, McGann P, McAdam AJ, et al. : Genomic and epidemiological evidence of bacterial transmission from probiotic capsule to blood in ICU patients. Nat Med 2019, 25:1728–1732. •This paper shows how heterogeneity of blood isolates of ICU patients with Lactobacillus bacteremia mirrors heterogeneity in probiotic capsules administered.
  • 87. Votintseva AA, Bradley P, Pankhurst L, del Ojo Elias C, Loose M, Nilgiriwala K, Chatterjee A, Smith EG, Sanderson N, Walker TM, et al. : Same-Day Diagnostic and Surveillance Data for Tuberculosis via Whole-Genome Sequencing of Direct Respiratory Samples. Journal of Clinical Microbiology 2017, 55:1285. •This paper used WGS to diagnose TB and predict the resistome profile of the strains in less than a day.
  • 88.Pornsukarom S, van Vliet AHM, Thakur S: Whole genome sequencing analysis of multiple Salmonella serovars provides insights into phylogenetic relatedness, antimicrobial resistance, and virulence markers across humans, food animals and agriculture environmental sources. BMC Genomics 2018, 19:801. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Manara S, Pasolli E, Dolce D, Ravenni N, Campana S, Armanini F, Asnicar F, Mengoni A, Galli L, Montagnani C, et al. : Whole-genome epidemiology, characterisation, and phylogenetic reconstruction of Staphylococcus aureus strains in a paediatric hospital. Genome Med 2018, 10:82. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Park K-H, Greenwood-Quaintance KE, Cunningham SA, Rajagopalan G, Chia N, Jeraldo PR, Mandrekar J, Patel R: Lack of correlation of virulence gene profiles of Staphylococcus aureus bacteremia isolates with mortality. Microbial Pathogenesis 2019, 133:103543. [DOI] [PubMed] [Google Scholar]
  • 91.Hourigan SK, Subramanian P, Hasan NA, Ta A, Klein E, Chettout N, Huddleston K, Deopujari V, Levy S, Baveja R, et al. : Comparison of Infant Gut and Skin Microbiota, Resistome and Virulome Between Neonatal Intensive Care Unit (NICU) Environments. Frontiers in Microbiology 2018, 9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92. Zapotoczna M, Riboldi GP, Moustafa AM, Dickson E, Narechania A, Morrissey JA, Planet PJ, Holden MTG, Waldron KJ, Geoghegan JA: Mobile-Genetic-Element-Encoded Hypertolerance to Copper Protects Staphylococcus aureus from Killing by Host Phagocytes. mBio 2018, 9. •This paper uses phylogenetic analysis to show the widespread horizontal transfer of copper resistance genes in different clonal complexes of S. aureus and their importance for protection from phagocyte killing.
  • 93.Purves J, Thomas J, Riboldi GP, Zapotoczna M, Tarrant E, Andrew PW, Londoño A, Planet PJ, Geoghegan JA, Waldron KJ, et al. : A horizontally gene transferred copper resistance locus confers hyper-resistance to antibacterial copper toxicity and enables survival of community acquired methicillin resistant Staphylococcus aureus USA300 in macrophages. Environmental Microbiology 2018, 20:1576–1589. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Rosario-Cruz Z, Eletsky A, Daigham NS, Al-Tameemi H, Swapna GVT, Kahn PC, Szyperski T, Montelione GT, Boyd JM: The copBL operon protects Staphylococcus aureus from copper toxicity: CopL is an extracellular membrane-associated copper-binding protein. J Biol Chem 2019, 294:4027–4044. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Kiu R, Caim S, Painset A, Pickard D, Swift C, Dougan G, Mather AE, Amar C, Hall LJ: Phylogenomic analysis of gastroenteritis-associated Clostridium perfringens in England and Wales over a 7-year period indicates distribution of clonal toxigenic strains in multiple outbreaks and extensive involvement of enterotoxin-encoding (CPE) plasmids. Microbial Genomics 2019, 5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96.Sheppard SK, Didelot X, Meric G, Torralbo A, Jolley KA, Kelly DJ, Bentley SD, Maiden MC, Parkhill J, Falush D: Genome-wide association study identifies vitamin B5 biosynthesis as a host specificity factor in Campylobacter. Proc Natl Acad Sci U S A 2013, 110:11923–11927. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97.Levy A, Salas Gonzalez I, Mittelviefhaus M, Clingenpeel S, Herrera Paredes S, Miao J, Wang K, Devescovi G, Stillman K, Monteiro F, et al. : Genomic features of bacterial adaptation to plants. Nat Genet 2017, 50:138–150. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Recker M, Laabei M, Toleman MS, Reuter S, Saunderson RB, Blane B, Torok ME, Ouadi K, Stevens E, Yokoyama M, et al. : Clonal differences in Staphylococcus aureus bacteraemia-associated mortality. Nat Microbiol 2017, 2:1381–1388. [DOI] [PubMed] [Google Scholar]
  • 99.San JE, Baichoo S, Kanzi A, Moosa Y, Lessells R, Fonseca V, Mogaka J, Power R, de Oliveira T: Current Affairs of Microbial Genome-Wide Association Studies: Approaches, Bottlenecks and Analytical Pitfalls. Frontiers in Microbiology 2020, 10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100.Collins C, Didelot X: A phylogenetic method to perform genome-wide association studies in microbes that accounts for population structure and recombination. PLoS Comput Biol 2018, 14:e1005958. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101.Folkesson A, Jelsbak L, Yang L, Johansen HK, Ciofu O, Høiby N, Molin S: Adaptation of Pseudomonas aeruginosa to the cystic fibrosis airway: an evolutionary perspective. Nature Reviews Microbiology 2012, 10:841–851. [DOI] [PubMed] [Google Scholar]
  • 102.Marvig RL, Sommer LM, Molin S, Johansen HK: Convergent evolution and adaptation of Pseudomonas aeruginosa within patients with cystic fibrosis. Nat Genet 2015, 47:57–64. [DOI] [PubMed] [Google Scholar]
  • 103. Riquelme SA, Lozano C, Moustafa AM, Liimatta K, Tomlinson KL, Britto C, Khanal S, Gill SK, Narechania A, Azcona-Gutierrez JM, et al. : CFTR-PTEN-dependent mitochondrial metabolic dysfunction promotes Pseudomonas aeruginosa airway infection. Sci Transl Med 2019, 11. •This paper investigated longitudinal CF isolates of P. aeruginosa, showing their metabolic reprogramming for host persistence.
  • 104.Young BC, Golubchik T, Batty EM, Fung R, Larner-Svensson H, Votintseva AA, Miller RR, Godwin H, Knox K, Everitt RG, et al. : Evolutionary dynamics of Staphylococcus aureus during progression from carriage to disease. Proc Natl Acad Sci U S A 2012, 109:4550–4555. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105.Young BC, Wu CH, Gordon NC, Cole K, Price JR, Liu E, Sheppard AE, Perera S, Charlesworth J, Golubchik T, et al. : Severe infections emerge from commensal bacteria by adaptive evolution. Elife 2017, 6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106.Napflin K, O’Connor EA, Becks L, Bensch S, Ellis VA, Hafer-Hahmann N, Harding KC, Linden SK, Olsen MT, Roved J, et al. : Genomics of host-pathogen interactions: challenges and opportunities across ecological and spatiotemporal scales. PeerJ 2019, 7:e8013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 107.Jean Beltran PM, Federspiel JD, Sheng X, Cristea IM: Proteomics and integrative omic approaches for understanding host-pathogen interactions and infectious diseases. Mol Syst Biol 2017, 13:922. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108.Papkou A, Guzella T, Yang W, Koepper S, Pees B, Schalkowski R, Barg MC, Rosenstiel PC, Teotonio H, Schulenburg H: The genomic basis of Red Queen dynamics during rapid reciprocal host-pathogen coevolution. Proc Natl Acad Sci U S A 2019, 116:923–928. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109. Lees JA, Ferwerda B, Kremer PHC, Wheeler NE, Serón MV, Croucher NJ, Gladstone RA, Bootsma HJ, Rots NY, Wijmega-Monsuur AJ, et al. : Joint sequencing of human and pathogen genomes reveals the genetics of pneumococcal meningitis. Nature Communications 2019, 10:2176. ••A pioneering joint GWAS analysis of genetic variation in host and bacteria.
  • 110.Haidar G, Philips NJ, Shields RK, Snyder D, Cheng S, Potoski BA, Doi Y, Hao B, Press EG, Cooper VS, et al. : Ceftolozane-Tazobactam for the Treatment of Multidrug-Resistant Pseudomonas aeruginosa Infections: Clinical Effectiveness and Evolution of Resistance. Clin Infect Dis 2017, 65:110–120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111.Fifer H, Cole M, Hughes G, Padfield S, Smolarchuk C, Woodford N, Wensley A, Mustafa N, Schaefer U, Myers R, et al. : Sustained transmission of high-level azithromycin-resistant Neisseria gonorrhoeae in England: an observational study. Lancet Infect Dis 2018, 18:573–581. [DOI] [PubMed] [Google Scholar]
  • 112.Bayliss SC, Thorpe HA, Coyle NM, Sheppard SK, Feil EJ: PIRATE: A fast and scalable pangenomics toolbox for clustering diverged orthologues in bacteria. Gigascience 2019, 8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113.Peng Y, Tang S, Wang D, Zhong H, Jia H, Cai X, Zhang Z, Xiao M, Yang H, Wang J, et al. : MetaPGN: a pipeline for construction and graphical visualization of annotated pangenome networks. Gigascience 2018, 7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 114.Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL: BLAST+: architecture and applications. BMC Bioinformatics 2009, 10:421. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 115.Acker KP, Wong Fok Lung T, West E, Craft J, Narechania A, Smith H, O’Brien K, Moustafa AM, Lauren C, Planet PJ, et al. : Strains of Staphylococcus aureus that Colonize and Infect Skin Harbor Mutations in Metabolic Genes. iScience 2019, 19:281–290. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 116. Petit RA, 3rd, Read TD: Staphylococcus aureus viewed from the perspective of 40,000+ genomes. PeerJ 2018, 6:e5261. •This paper discusses the potential uses of the S. aureus database and analysis pipeline “Staphopia” for studying species evolution and for clinical diagnosis.
  • 117. Zhou Z, Alikhan NF, Mohamed K, Fan Y, Agama Study G, Achtman M: The EnteroBase user’s guide, with case studies on Salmonella transmissions, Yersinia pestis phylogeny and Escherichia core genomic diversity. Genome Res 2019. ••This paper shows the power of using database-driven technologies for deciphering local transmission of S. enterica serovar Agama, microevolution of Y. pestis and Escherichia genomic diversity.
  • 118. Winsor GL, Griffiths EJ, Lo R, Dhillon BK, Shay JA, Brinkman FS: Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database. Nucleic Acids Res 2016, 44:D646–653. •A specialized database of P. aeruginosa with community-driven curation of genome annotations.
  • 119.Rohde H, Qin J, Cui Y, Li D, Loman NJ, Hentschke M, Chen W, Pu F, Peng Y, Li J, et al. : Open-source genomic analysis of Shiga-toxin-producing E. coli O104:H4. N Engl J Med 2011, 365:718–724. [DOI] [PubMed] [Google Scholar]
  • 120.Perrin A, Larsonneur E, Nicholson AC, Edwards DJ, Gundlach KM, Whitney AM, Gulvik CA, Bell ME, Rendueles O, Cury J, et al. : Evolutionary dynamics and genomic features of the Elizabethkingia anophelis 2015 to 2016 Wisconsin outbreak strain. Nat Commun 2017, 8:15483. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 121.AMD: An In-Depth Look. https://www.cdc.gov/amd/pdf/amd-indepth-look-P.pdf.
  • 122.Bellod Cisneros JL, Møller Aarestrup F, Lund O: Public Health Surveillance using Decentralized Technologies. Blockchain in Healthcare Today 2018, 1. [Google Scholar]
  • 123. Mackey TK, Kuo T-T, Gummadi B, Clauson KA, Church G, Grishin D, Obbad K, Barkovich R, Palombini M: ‘Fit-for-purpose?’ – challenges and opportunities for applications of blockchain technology in the future of healthcare. BMC Medicine 2019, 17:68. •This paper discusses potential uses of blockchain technology in healthcare; making genomic data more accessible is one of the cases discussed.
  • 124. Simonyan V, Goecks J, Mazumder R: Biocompute Objects-A Step towards Evaluation and Validation of Biomedical Scientific Computations. PDA J Pharm Sci Technol 2017, 71:136–146. ••This framework is similar to a GenBank record but instead of having information regarding a genome sequence, it has all the computational parameters, dependencies, usage and commands for a bioinformatic pipeline.
