Abstract
Background
Cyanobacteria maintain extensive repertoires of regulatory genes that are vital for adaptation to environmental stress. Some cyanobacterial genomes have been noted to encode diversity-generating retroelements (DGRs), which promote protein hypervariation through localized retrohoming and codon rewriting in target genes. Past research has shown DGRs to mainly diversify proteins involved in cell-cell attachment or viral-host attachment within viral, bacterial, and archaeal lineages. However, these elements may be critical in driving variation for proteins involved in other core cellular processes.
Results
Members of 31 cyanobacterial genera encode at least one DGR, and together, their retroelements form a monophyletic clade of closely-related reverse transcriptases. This class of retroelements diversifies target proteins with unique domain architectures: modular ligand-binding domains often paired with a second domain that is linked to signal response or regulation. Comparative analysis indicates recent intragenomic duplication of DGR targets as paralogs, but also apparent intergenomic exchange of DGR components. The prevalence of DGRs and the paralogs of their targets is disproportionately high among colonial and filamentous strains of cyanobacteria.
Conclusion
We find that colonial and filamentous cyanobacteria have recruited DGRs to optimize a ligand-binding module for apparent function in signal response or regulation. These represent a unique class of hypervariable proteins, which might offer cyanobacteria a form of plasticity to adapt to environmental stress. This analysis supports the hypothesis that DGR-driven mutation modulates signaling and regulatory networks in cyanobacteria, suggestive of a new framework for the utility of localized genetic hypervariation.
Background
Cyanobacteria are a remarkably diverse lineage, in terms of metabolisms, morphologies, and habitat distribution. Perhaps most notably, this phylum contains the only prokaryotic organisms known to have evolved the capability for oxygenic photosynthesis; this trait was later acquired by eukaryotes through endosymbiosis with cyanobacteria, resulting in the formation of chloroplasts [1, 2], and driving the modern biosphere. Cyanobacteria have evolved an array of morphologies, including complex multicellular forms [3–6]. Representatives are typically classified into five subsections [7, 8]. Species of subsections I and II consist of single coccoid cells. Subsections III-V represent multicellular species that form filaments of varying complexity. Members of subsection III form reversibly-differentiable filaments of vegetative cells. Among subsections IV and V, cells can carry out terminal cellular differentiation in response to environmental stimuli, forming spore-like cells that are resistant to desiccation (akinetes), micro-oxic cells specialized for N2 fixation (heterocysts), and motile filaments (hormogonia) [9]. This morphological and metabolic complexity has allowed cyanobacteria to inhabit diverse environments.
Certain members of the cyanobacterial phylum possess an extensive capacity to adapt to various environmental pressures through tightly-controlled regulation of complex cellular programs for signal response. This is exemplified by abilities for metabolic switching (i.e. CO2/N2 fixation), maintaining photoreceptors of various wavelength sensitivities for binary programs of circadian rhythm, and forming specialized cells which can sometimes be terminally differentiated and lead to multicellularity [9, 10]. To regulate these complex programs, cyanobacteria have an extensive repertoire of genes governing signal transduction including proteases, kinases, and nucleases. Notably, paralogs of these regulatory proteins are more abundant among the more complex species of cyanobacteria (i.e. those belonging to subsections III-V) [11–15]. However, the mechanisms to diversify and adapt specific functionality in these duplicated genes remain largely unexplored. One mechanism may involve diversity-generating retroelements (DGRs), known to accelerate the evolution of the proteins they target.
Diversity-generating retroelements (DGRs) have been identified in the genomes of several genera of cyanobacteria [16–18]. In experimentally investigated bacterial and viral systems, DGRs drive site-specific hypermutation of a subset of codons in target genes [19, 20], while metagenomic and metatranscriptomic evidence also points to functional DGRs in archaea [21]. These retroelements utilize a uniquely targeted form of retrotransposition. To this end, DGRs insert variants into a flexible coding scaffold, while avoiding non-specific variation in conserved portions of a gene [22]. The essential features of a DGR are most often found within a single genomic locus spanning ~ 5–10 kbp (Fig. 1a), though the synteny and organization of DGR components can vary [17]. Diversification is mechanistically carried out by a reverse transcriptase (RT), which acts upon a non-coding RNA transcribed from the template repeat (TR) region in the locus [23]. This region is nearly identical to a variable region (VR) that typically resides in a nearby gene, which encodes a DGR-variable protein (VP). The TR-RNA intermediate is reverse transcribed into cDNA wherein A ➔ N mutation is highly favored by the error-prone RT. This cDNA then replaces VR, whose sequence commonly corresponds to flexible residues in ligand binding structural domains belonging to the C-type lectin or immunoglobulin-like protein families [19].
The first DGR variable protein was characterized from the bacteriophage BPP-1. In this phage, the DGR diversifies tail fiber tip proteins that recognize and bind to Bordetella host receptors [16, 24]. Other cellular DGRs have been characterized in bacterial pathogens, including Legionella pneumophila [20] and Treponema denticola [25], where DGRs target genes that encode cell surface proteins, presumably involved in cell-cell attachment. The conserved function of cell-cell or viral-cell attachment in these target genes has led to a view of DGRs as broadly used for host recognition in symbiosis or infection. Moreover, several genera of cyanobacteria were identified in recent genomic and metagenomic surveys of DGRs [17, 26]. The essential components of DGRs can be found across most lineages of prokaryotic life [17, 21, 26–29], suggesting broad utility of this form of localized mutation.
Whereas previously characterized DGR target proteins appear to share a functional role in extracellular attachment to ligands displayed on foreign cells, these retroelements could potentially diversify other cellular proteins with entirely distinct functions. The intermediate RNA, which presents a template for DGR mutagenesis, has been shown to be highly expressed in lab isolates of Trichodesmium erythraeum IMS101 [18] and in Nodularia spumigena CCY9414 under light and oxidative stress [30, 31]. Here, a systematic analysis of DGRs and their variable proteins in cyanobacterial genomes leads to a new perspective on the utility of diversification and optimization of modular protein domains in paralogs that appear linked to signaling and transcriptional control.
Results and discussion
A conserved subclass of retroelements in cyanobacteria
Our analysis identified 58 DGRs that include 90 target genes (i.e. encoding VPs) in 52 genomes of cyanobacteria spanning 31 different genera. These include filamentous, colonial, and symbiotic organisms (Fig. 1b and Additional file 1: Table S1). Sequence clustering of the 58 DGRs was performed with RT amino acid sequences (at 95% identity) to generate a non-redundant subset of 49 distinct RT genes for phylogenetic analysis, while the full set of 58 were also examined further. All DGRs were identified by presence of diagnostic and essential components: an RT gene; one or more VP genes with VR regions; and a TR region. Our initial RT search was conducted with the UniprotKB coding sequence database, which is in turn linked to complete and draft genomes in EMBL/GenBank/DDBJ databases. The resulting 52 cyanobacterial genomes represent all sequences where complete DGR cassettes were positively identified. Among the 52 genomes analyzed, four contain duplicate DGR cassettes, based on clustering, while one contains two unique DGR-RTs. Moreover, several individual DGRs have multiple target genes, and some VP genes have VRs with homology to other genes dispersed throughout the genome (paralogs) (Fig. 1c).
To evaluate the diversity of cyanobacteria-encoded DGRs, we first compared these representatives to a recently developed, global metagenomic DGR dataset [26]. Cyanobacterial DGR-RTs were clustered (i.e. at ≥50% AAI) with sequences in the global metagenomic dataset, then linked to a corresponding DGR clade and target protein cluster. All DGRs from our dataset were closely related to DGR Clade-5. The global dataset RTs in DGR Clade-5 are affiliated with target proteins in protein cluster 1 (i.e. PC_00001), which primarily contains cellular proteins that appear to be membrane-bound [26]. Given that the cyanobacterial DGRs appear to cluster tightly together, we next sought to analyze phylogenetic relationships within this set.
Phylogenetic analysis of cyanobacterial DGR-RTs revealed a monophyletic clade, distinct from all other bacterial DGR-RTs (Fig. 2). The cyanobacterial DGR-RT clade comprises sequences that span nearly all major cyanobacterial genera within morphological subsections I, III, IV, and V (Fig. 3a). None of the DGR-containing genomes correspond to genera within subsection II. Strikingly, cyanobacterial reverse transcriptases within the monophyletic cyanobacterial DGR clade share an average global sequence identity of 67% (minimum 55%; amino acid sequence). Whereas members of this DGR-RT subgroup do not appear to be shared with other bacteria or archaea, their phylogenetic relationships suggest a complex evolutionary history punctuated by horizontal exchange within the cyanobacterial phylum (Fig. 3b). Although none of the cyanobacterial DGRs could be definitively assigned to prophage elements, they were identified on plasmids of Anabaena sp. 90 (CP003287) and Fischerella sp. NIES-4106 (AP018301), which may indicate a vehicle for retroelement transfer between closely related populations. Among members of this RT clade, each corresponding DGR-VP contains a ligand-binding C-type lectin-like domain (CLec) with additional functional domains described in detail below.
Intragenomic dispersal of conserved domains with local hypervariable regions
DGR variable proteins often contain multiple distinct structural domains [17, 21, 22]. To investigate the specific functions of cyanobacterial DGR-targeted proteins (i.e. containing the VR scaffold), we first separately analyzed the ligand-binding CLec domains in all DGR-VPs. This approach identified a conserved module (i.e. a putative C-terminal domain) with a localized region of hypervariable residues found in each of the 52 cyanobacterial VP representatives (Additional file 1: Table S1). The entire set of VR-containing modules share sequence homology with 50.5% average identity and, moreover, all of these protein sequences clustered together at >30% pairwise amino acid identity. Structural prediction of the representative C-terminal domain sequence (i.e. obtained from clustering) determined that each module most closely resembles the C-type lectin domain, which is represented by the CLec-like superfamily (InterPro: IPR016187). In each of these proteins, the DGR variable region (VR) occurs within the C-terminal region of the otherwise conserved CLec-like domain. A search for similar proteins in the UniProt database identified sequences from an array of other genomes, among which 92% belong to the cyanobacterial phylum (Additional file 2: Table S2). The similarity between CLec domains found in diverse DGRs may underlie a conserved utility for diversifying this module across different cyanobacterial taxa. The CLec-like superfamily has been linked to a variety of molecular processes in cells and viruses spanning the tree of life, with a common functional role in ligand binding generally predicted for this fold [32–34]. Thus, the modular and dispersed nature of a highly conserved CLec subclass may further point to multifaceted functional significance in cyanobacteria.
We next sought to address whether hypervariable CLec modules might arise from gene duplication and intragenomic dispersal, resulting in recognizable sets of paralogs in cyanobacterial genomes. This search was limited to 21 high-quality genomes of the 52-genome total, such that draft genomes composed of > 50 scaffolds were removed from the analysis. This approach uncovered 21 genomes that have multiple genes encoding CLec domain-containing proteins, with varying degrees of VR/TR homology (Fig. 4 and Additional file 3: Table S3). These paralogs occur both within DGR loci and dispersed throughout the genome and most often consist of either a single CLec domain or the C-terminal CLec grafted to an N-terminal putative serine kinase domain. Taken together, the multi-genome set of 219 cyanobacterial orthologs across 21 genomes share average pairwise identity of 50.5% within their CLec domains. The complete set of 219 orthologs comprises 121 genes that appear to be DGR-diversified based on VR/TR homology, including 45 VP genes encoded within a DGR. The additional 76 remote targets were associated with their respective genome’s DGR(s) using a threshold of TR identity greater than 50%; these matches were exclusively found near the 3′ terminus of CLec-encoding genes. The proximity to 3′-termini suggests that conserved, cis-acting features - such as DNA cruciforms or initiation of mutagenic homing sites required for cDNA integration [35] - may play a role in activating remote targets.
The genome of Nostoc sp. PCC 7120 (formerly Anabaena) contains two DGRs (RT accessions: all3497, all5014) and several dispersed VP paralogs (Fig. 4), providing the opportunity to examine the evolutionary history of these genes in an extensively studied model organism. Within this genome, we identified three highly similar VP homologs (≥ 60% amino acid identity) in dispersed loci, wherein these genes may have proliferated by duplication and transposition from a common ancestral gene. Notably, one of these paralogs (all3226) contains remnant TR-VR homology, despite an absence of proximal RT genes or pseudogenes. Taken together, this suggests a capacity for intragenomic dispersal of DGR-targeted variable proteins, and perhaps removal of diversification components once an optimal variant is selected. In addition to its tractability, the common constellation of DGR VPs that occurs in PCC 7120, as observed in other cyanobacteria, makes this species an ideal representative for further analysis of the physiological, ecological, and evolutionary ramifications of DGR VP functionality and modularity in cyanobacteria.
To assess whether transposable elements were found in proximity to DGRs, we analyzed neighborhoods surrounding each hypervariable protein, including remote VPs with respect to a DGR-RT (i.e. > 5 kbp upstream/downstream). This search uncovered transposase genes belonging to various families in DGR-proximal loci, which may be responsible for VP dispersal throughout the genome (Additional file 4: Table S4 and Additional file 5: Table S5). Within the subset of 21 high-quality genomes, Trichodesmium erythraeum IMS 101 has the greatest number of proximal transposase genes, spanning six different insertion sequence (IS) families. The most widely distributed transposases were those belonging to the IS200/IS605 family, found near 9 VPs from 6 distinct species. Transposases belonging to this family employ a single-stranded DNA intermediate for a “peel-and-paste” mechanism of transposition [36, 37]. The genome of Anabaena sp. 90 contains remnants of a putative degraded DGR cassette (containing only the RT with no other detectable features) and, notably, the RT gene is flanked by proximal transposase genes. This provides a potential mechanism for select components of the DGR to be mobilized within the genome. DGR recruitment to one gene from another would allow favorably diversified genes to become conserved while targeting hypervariation elsewhere in the genome. Selective pressures can then influence the recruitment of DGRs to genes wherein hypervariation of ligand-binding residues offers selective advantages. Through this mechanism of transposition, cyanobacterial DGRs may provide a newly diversified, modular, ligand-binding domain to signaling genes.
Function of multidomain variable proteins
In part, the functional diversity of DGR variable proteins lies in their multidomain complexity. We examined cyanobacterial VPs and their paralogs, which consist of N-terminal domains that are grafted to the C-terminal CLec domain (Fig. 4, Fig. 5). To assess cellular localization, transmembrane and/or signal peptide regions were predicted for 4 DGR-associated VPs and 9 remote VPs, spanning 11 of the 21 high-quality genomes (Additional file 3: Table S3). Most cyanobacterial DGR VPs are predicted to be cytosolic; however, evidence exists for transmembrane (TM) localization and secretion as well.
The most common functional domains of DGR-internal target proteins (VPs) in cyanobacteria have similarity to the protein kinase superfamily (Additional file 1: Table S1). Multidomain DGR-external VPs and paralogs of DGR VPs are also most often predicted to be kinases (Fig. 4). The VP and VP paralog kinase proteins are further predicted to be serine/threonine kinases (STKs) based on the following factors: 1) identification of Hanks and Hunter-type Motifs I through IX [38] (Additional file 6: Fig. S1); 2) common NCBI CDS annotations of “serine/threonine protein kinase CDS”; or 3) identification of an STK in previous literature [14]. STKs are mostly associated with eukaryotic signal transduction pathways. In prokaryotes, two-component regulation controls most phosphorylation pathways with a receptor histidine kinase paired with various response regulators phosphorylated on aspartic acid residues. These kinases often control the expression of certain genes [39]. However, Hanks-type STKs have been found in an array of prokaryotic organisms where their genomic abundance is often correlated with genome size, physiological and ecophysiological complexity, and ability to tolerate complex environments [14, 38, 40]. These STKs are implicated in the regulation of various aspects of bacterial physiology through post-translational modification of proteins, which may themselves be components of phosphorelay and transcriptional regulatory pathways [40–43]. Serine/threonine protein kinases were first associated with the pknA gene of Nostoc sp. PCC 7120, which is involved in growth and differentiation [14], and in other bacteria their activity regulates processes such as cell growth, segregation, virulence, metabolism, stress adaptation, and cell wall/envelope biogenesis [40]. Ser/Thr kinases in cyanobacteria are usually associated with three different processes: developmental regulation, stress response, and pathogenicity [14].
Slight changes, not in function but in the strength of substrate recognition across a variety of phosphorylation targets, may contribute to the ability to finely tune signal transduction networks.
Compared to histidine kinases of two-component systems, which exhibit strong substrate discrimination, STKs have relaxed substrate specificities. This has been linked to a lack of co-evolution between the kinase and its cognate target [44, 45]. Accelerated evolution of the substrate-binding domain of these kinases may have resulted in the further expansion of this class of proteins in the cyanobacterial phylum, contributing to a wide range of adaptability to external stimuli and challenging environments. We hypothesize that the VR-containing CLec domain could be autoinhibitory, and activation of kinase activity would occur upon binding a small molecule or protein ligand. In this case, DGR-mediated diversity could allow rapid recognition of various ligands for activating phosphorylation cascades. Alternatively, the CLec domain could function in substrate recognition (i.e. determining what protein(s) are phosphorylated). In this case, CLec variants could have different substrate specificities. Segregation of phosphorylation targets between paralogous kinases has been shown to exert a strong selective pressure on their evolution [46]. In turn, DGR-driven hypervariation of binding components in signaling proteins may offer additional selective advantages in cyanobacteria by preventing cross-talk, which is characteristic of this kinase class.
We also identified orthocaspase-like peptidase domains in VP N-termini, which are also common among their paralogs (Fig. 4). Caspase proteins are proteases involved in the initiation of programmed cell death in metazoans [47]. The peptidase domains that we identified in many VPs and their paralogs were predicted as orthocaspases, which are the prokaryotic homologs of eukaryotic caspase-type proteases [48]. While these protein types are homologous to metazoan caspases, current evidence supports a broader role in cell homeostasis during normal cellular conditions, programs of cellular differentiation, or ageing as well as potential apoptosis [13, 49, 50]. Previous studies have found orthocaspases to be enriched in morphologically complex filamentous cyanobacteria of subsections III-V (e.g. Trichodesmium erythraeum IMS 101, Anabaena spp., and Nostoc spp.) as well as various strains of the unicellular toxin-producing species, Microcystis aeruginosa. Conversely, orthocaspases are entirely absent from unicellular genera Synechococcus, Prochlorococcus, Cyanobium, and Cyanothece and are underrepresented in the genomes of cyanobacteria belonging to subsections I-II. This suggests their utility in enabling the complex signal response and regulatory programs that exist in cyanobacteria capable of cellular differentiation, toxin production, and diazotrophy [13].
In addition to the serine/threonine kinase and orthocaspase-like peptidase domains, we identified less-common features including repeat motifs, toll/interleukin receptor (TIR)-like, GAF-like, GUN4-like, and CHAT-like domains (Fig. 5). Repeat motifs may have a role in protein-protein interactions (e.g., TPR, WD40 repeat, ARM repeat, and VWA-CoxE) [51–54], while the other domains have been linked to intracellular signal transduction [55–59].
A common feature for nearly all of the N-terminal domains, including the prevalent protein kinase-containing paralogs, is their potential to serve a functional role in signal transduction in response to external stimuli (e.g. light, nutrient deprivation, and general stress response) [9]. A previous study found that genes encoding complex multidomain proteins involved in signal transduction are highly enriched in the filamentous cyanobacterium Anabaena sp. PCC 7120 when compared to the genomes of unicellular Synechocystis sp. PCC 6803 and Pseudomonas aeruginosa [60]. Moreover, regulatory proteins involved in signal transduction could lend to the complex regulation necessary for the physiology of filamentous cyanobacteria. These physiologies include a capacity for cell-differentiation, producing heterocysts during nitrogen deprivation and akinetes under environmental stress, as well as programmed apoptosis [49, 61]. The presence of DGRs in cyanobacteria follows this trend in the abundance of specialized signal transduction proteins – being seemingly enriched in filamentous nitrogen-fixing taxa and absent from genomes of unicellular taxa, Synechococcus spp. and Prochlorococcus spp., though they are present in other unicellular species.
DGR-programmed variation of the ligand-binding domain of receptor-binding proteins in Bordetella bacteriophage has been shown to increase the capacity of these proteins to recognize a vast array of molecules. Moreover, diversification of oligomeric structures appears to confer an amplification of binding affinity, or avidity [19, 62]. Specifically, the existence of 12 DGR-variable target protein trimers in each bacteriophage virion was shown to increase the binding strengths of these proteins to their ligand, pertactin, by relaxing the requirement for optimal binding between the ligand and any single monomer. This multivalent binding was also shown to lead to more distinction in binding events, contributing to enhanced selectivity [19]. These two properties of avidity through multivalency are hypothesized to be characteristic of other DGR systems as a means to provide ligand-recognition flexibility to evolve under constrained conditions, while maintaining selectivity. We hypothesize that, in cyanobacteria, DGR-programmed variation might have a role in providing multimeric avidity in terms of ligand binding for signal response. In the case of autoinhibitory variable proteins attached to a kinase, rather than providing flexibility in host-receptor binding, as in Bordetella bacteriophage, increased avidity may hold a kinetic advantage for substrate binding, whereby flexible activation accelerates signal transduction and regulation. More generally, the available genomic evidence is consistent with a phenomenon of targeted diversification acting to tune cyanobacterial regulatory networks.
Conclusions
The DGR-enabled diversification of proteins involved in host attachment should lead to selective advantages, as this offers an offensive countermeasure to variation by the host cell. By genomic inference, DGR-containing prokaryotes seem to have adopted hypermutation for mechanisms of virulence, and other cell-cell or virus-host binding interactions. By contrast, our findings suggest a selective use of DGRs for purposes of isolated hyper-diversification of a small pocket in the C-terminal binding domains of multidomain proteins broadly involved in signal transduction within cyanobacteria. This class of DGR-target proteins is currently unique to the cyanobacterial phylum. Diversification of the binding site of these proteins, paired with natural selection over iterations of diversity generation and the ability to segregate resulting beneficial mutants via transposition, may contribute to the complexity and adaptability of cellular regulation amongst cyanobacterial taxa. In developing a better grasp on the functional significance of DGR hypervariation, it is clear that the phenomenon adds new layers of complexity in the expansion of bacterial protein networks.
Methods
DGR identification and annotation
First, we identified all cyanobacterial genomes containing a DGR-RT-like coding sequence by comparing a consensus sequence for previously-identified cyanobacterial DGR-RT sequences against protein databases using pHMMER. All matches were linked to corresponding genome or nucleotide sequences, which were then downloaded from NCBI. A set of potential DGR candidates was first developed using a workflow with Python and Geneious Prime v 2019.2.3 (Biomatters) as previously described [21]. Briefly, RT genes were manually inspected for core NTP-binding site motifs, before searching for near-repeats in a 10-kbp proximal region (i.e., RT +/− 5-kbp). Repeats in this region were then aligned and inspected for: i) random mismatches in one sequence (VR), which predominantly occur in 1st and 2nd codon positions of an ORF, and ii) > 80% of mismatches correspond to adenines in the non-coding near-repeat sequence (TR). Next, retroelements were further analyzed using myDGR [63] which is especially effective at identifying putative trans-acting accessory DGR components, and separately, remote VP and VP homolog genes.
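The two mismatch criteria used to inspect candidate repeats can be illustrated with a minimal Python sketch. It assumes a gap-free TR/VR alignment of equal length and a VR that begins at codon position 1 of its ORF; the function names and the default cutoff argument are hypothetical and are not part of the published workflow, which relied on manual inspection in Geneious and on myDGR.

```python
def score_tr_vr(tr: str, vr: str) -> dict:
    """Summarize mismatches between an aligned TR/VR pair (equal length, no gaps)."""
    if len(tr) != len(vr):
        raise ValueError("TR and VR must be aligned to the same length")
    tr, vr = tr.upper(), vr.upper()
    mismatches = [i for i, (t, v) in enumerate(zip(tr, vr)) if t != v]
    n = len(mismatches)
    # Criterion ii: mismatches that fall on adenines of the template repeat.
    at_adenine = sum(1 for i in mismatches if tr[i] == "A")
    # Criterion i (assumption: index 0 is codon position 1 of the VR frame).
    in_pos_1_or_2 = sum(1 for i in mismatches if i % 3 in (0, 1))
    return {
        "mismatches": n,
        "adenine_fraction": at_adenine / n if n else 0.0,
        "codon_pos_1_2_fraction": in_pos_1_or_2 / n if n else 0.0,
    }


def looks_like_dgr_pair(tr: str, vr: str, min_adenine_fraction: float = 0.8) -> bool:
    """True when mismatches exist and more than 80% of them sit on TR adenines."""
    stats = score_tr_vr(tr, vr)
    return stats["mismatches"] > 0 and stats["adenine_fraction"] > min_adenine_fraction
```

In this sketch the codon-position fraction is reported rather than thresholded, since the text describes that criterion qualitatively ("predominantly").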
The entire DGR dataset contains several RT and VP sequences that are near-identical, but shared by distinct genomes (Additional file 1: Table S1). To generate a representative subset of these redundant DGRs, we used CD-HIT [64] to cluster RT amino acid sequences using the following settings: 0.9 global alignment; 95% identity threshold. For comparison with the global metagenomic DGR dataset developed by Roux et al. [26], we conducted pairwise alignments with RT sequences using blastp [65] and identified similar representatives at ≥50% amino acid identity.
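The idea behind dereplication at an identity threshold can be sketched as greedy incremental clustering, the general strategy that CD-HIT follows: sequences are processed from longest to shortest, and each one either joins the first representative it matches or founds a new cluster. The sketch below is illustrative only. It swaps in a naive ungapped, left-anchored identity in place of a real alignment, and all names are hypothetical.

```python
def naive_identity(a: str, b: str) -> float:
    """Ungapped, left-anchored identity over the shorter sequence.

    This is a stand-in for a real pairwise alignment, used only for illustration.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def greedy_cluster(seqs: dict, threshold: float = 0.95, identity=naive_identity) -> dict:
    """Cluster sequences greedily; returns {representative_id: [member_ids]}."""
    clusters = {}
    # Longest sequences are considered first and become representatives.
    for name in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        for rep in clusters:
            if identity(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[name] = [name]
    return clusters
```

The representatives (the dictionary keys) would correspond to the non-redundant subset carried forward for phylogenetic analysis.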
Genes homologous to VPs within the DGR cassette were inspected by aligning the amino acid sequence of the CLec domain of each putative remote VP to the DGR-VP within the same organism using Clustal Omega. Genes with CLec domains having a putative VR with ≥ 50% nucleotide identity to a DGR-VP were designated as remote VPs, while those with < 50% were designated as VP homologs. DGR and remote VP neighborhood regions were defined as regions 10 kbp upstream and downstream from the DGR cassette, remote VP, or VP homolog.
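The remote VP versus VP homolog designation reduces to a threshold on nucleotide identity, as in the following minimal sketch. It assumes the two VR sequences are already aligned to equal length and, as a further assumption not stated in the text, computes identity only over columns where neither sequence has a gap. The function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def classify_candidate(candidate_vr: str, dgr_vr: str, threshold: float = 50.0) -> str:
    """Label a CLec-encoding gene by the identity of its putative VR to a DGR-VP VR."""
    if percent_identity(candidate_vr, dgr_vr) >= threshold:
        return "remote VP"
    return "VP homolog"
```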
Neighborhood analyses
In order to identify potential transposons, we first examined existing genomic annotations in the neighborhood (i.e. +/− 10 kbp) of each VP and remote VP for the following features: transposase, integrase, mobile element. Next, we conducted a transposon search with ISFinder [66] on expanded VP loci (60 kbp) that contain one or more annotations associated with mobile elements.
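The annotation screen described above amounts to a windowed keyword search, sketched here in Python. The `Feature` record, the function name, and the use of the annotated product string are illustrative assumptions; the subsequent ISFinder search of the expanded loci is not modelled.

```python
from dataclasses import dataclass

# Keywords taken from the feature list in the text.
MOBILE_KEYWORDS = ("transposase", "integrase", "mobile element")


@dataclass
class Feature:
    start: int      # 1-based start coordinate on the replicon
    end: int        # 1-based end coordinate (inclusive)
    product: str    # annotated product or note


def mobile_features_near(features, locus_start, locus_end, flank=10_000):
    """Return annotated mobile-element features overlapping the locus +/- flank."""
    lo = max(1, locus_start - flank)
    hi = locus_end + flank
    hits = []
    for f in features:
        overlaps = f.start <= hi and f.end >= lo
        if overlaps and any(k in f.product.lower() for k in MOBILE_KEYWORDS):
            hits.append(f)
    return hits
```

A locus with at least one hit would then be a candidate for the expanded search.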
Phylogenetic analyses
To construct a phylogenetic tree of cyanobacteria (Table S6), we used a set of 16 ribosomal proteins often used for phylogenomic analysis (RpL2, 3, 4, 5, 6, 14, 15, 16, 18, 22, and 24, and RpS3, 8, 10, 17, and 19) [67]. Each ribosomal protein was identified using HMMER [68] and hidden Markov models from the Pfam [69] database (accessed September 2018). Each individual marker gene was aligned using MUSCLE [70], trimmed using TrimAL [71], manually assessed to remove end gaps and ambiguously aligned regions, and concatenated. A maximum likelihood tree was constructed using RAxML v. 8.2.9 [72] with the PROTCATLG model.
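The concatenation step can be sketched as follows. Each marker alignment is represented as a dictionary mapping taxon to aligned sequence. Padding a taxon that lacks a marker with gap characters is an assumption made for this illustration and is not a detail reported in the text; the function name is hypothetical.

```python
def concatenate_alignments(alignments):
    """Join per-marker alignments into one supermatrix.

    alignments: list of dicts {taxon: aligned_sequence}, one dict per marker.
    Taxa missing from a marker are padded with '-' (illustrative assumption).
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("each marker alignment must have a single width")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}
```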
To reconstruct RT phylogeny, putative DGR-RT coding sequences were identified, as described above, then translated. Sequences were de-replicated and non-redundant candidates were chosen using CD-HIT [64] with a global alignment threshold of 99% identity. All DGR-RT sequences and a set of Group-II intron RT sequences from Bacteria, Archaea, plastids, and mitochondria were aligned with a hidden Markov model of the reverse transcriptase protein family (PF00078) using HMMalign [68]. A phylogenetic tree of DGR-RTs was constructed using FastTree2 [73] with the WAG substitution matrix and the CAT approximation to optimize branch lengths. The cyanobacterial DGR-RT representatives were extracted from the complete alignment, realigned using Clustal Omega [74], and used to construct an unrooted phylogenetic tree.
Protein function analysis
VP domain architecture was annotated using InterProScan, pHMMER, and HMMScan tools. CD-HIT analysis was performed on CLec domains for all VPs using the following settings: 0.3 global alignment; 30% identity threshold. Amino acid sequences for the CLec domain of all VPs were aligned using Clustal Omega. The C-terminal sequence of all DGR-VP CLecs was extracted based on the InterProScan feature positions, then further aligned using Clustal Omega, and a consensus sequence was picked at 75% sequence similarity (Additional file 6: Fig. S1). This consensus sequence was used to further identify homologous domains. Using hmmscan with an E-value cutoff of 10^-40, 1579 hits were returned (Additional file 2: Table S2).
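Filtering search hits at the stated E-value cutoff can be sketched as below. The sketch assumes the search results were saved in the whitespace-delimited tabular format that HMMER3 writes with `--tblout`, in which the first, third, and fifth columns hold the target name, query name, and full-sequence E-value; whether the original analysis used this output format is not stated, and the function name is hypothetical.

```python
def filter_tblout(lines, max_evalue=1e-40):
    """Keep (target, query, evalue) for rows at or below the E-value cutoff.

    Assumes HMMER3 --tblout layout: column 1 target name, column 3 query name,
    column 5 full-sequence E-value. Comment lines start with '#'.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, query, evalue = fields[0], fields[2], float(fields[4])
        if evalue <= max_evalue:
            hits.append((target, query, evalue))
    return hits
```

In practice the function would be called on an open file handle, for example `filter_tblout(open("hits.tbl"))`, where the file name is a placeholder.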
Acknowledgements
The Research Mentorship Program at UCSB provided summer research opportunities for KD and SR.
Abbreviations
- ARM: Armadillo repeat
- CHAT: Caspase HetF Associated with Tprs
- CLec: C-type lectin
- Cox: CO oxidizing
- DGR: Diversity-generating retroelement
- GAF: cGMP-specific phosphodiesterases, adenylyl cyclases and FhlA
- IS: Insertion sequence
- RT: Reverse transcriptase
- RVP: Remote variable protein
- STK: Serine/threonine kinase
- S/T: Serine/threonine
- TIR: Toll/interleukin receptor
- TPR: Tetratricopeptide repeat
- TR: Template region
- VP: Variable protein
- VR: Variable region
- VWA: Von Willebrand factor type A
Authors’ contributions
AVE and BP designed the study and carried out comparative genomic analyses of retroelements in cyanobacteria. EA conducted phylogenomic analysis for cyanobacterial genomes. SM, SR, KD, SR participated in clustering and annotation of the retroelements. AVE, JFM, DV, and BP wrote the manuscript. The authors read and approved the manuscript.
Funding
This research was funded by a Challenge Grant from the California NanoSystems Institute (CNSI-UCSB). B.G.P. was supported by the Marine Biological Laboratory, and through the National Science Foundation’s XSEDE computing resource (award DEB170007). AVE is supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. 1650114, and by the NSF California LSAMP Bridge to the Doctorate Fellowship under Grant No. HRD-1701365. DLV is supported by NSF grant OCE-1635562. The work conducted by the U.S. Department of Energy Joint Genome Institute is supported by the Office of Science of the U.S. Department of Energy under contract no. DE-AC02-05CH11231.
Availability of data and materials
Phylogenetic trees and sequence alignments are available on TreeBase at the following URL: http://purl.org/phylo/treebase/phylows/study/TB2:S26861. Additional datasets supporting the conclusions of this article are included within the article and its additional files.
Ethics approval and consent to participate
Not Applicable.
Consent for publication
Not Applicable.
Competing interests
J.F.M. is a cofounder, equity holder and a member of the Board of Directors of Pylum Biosciences, Inc., a biotherapeutics company in South San Francisco, CA, USA.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary information accompanies this paper at 10.1186/s12864-020-07052-5.
References
- 1. Sagan L. On the origin of mitosing cells. J Theor Biol. 1967;14.
- 2. Giovannoni SJ, Turner S, Olsen GJ, Barns S, Lane DJ, Pace NR. Evolutionary relationships among cyanobacteria and green chloroplasts. J Bacteriol. 1988;170:3584–3592. doi: 10.1128/JB.170.8.3584-3592.1988.
- 3. Flores E, Herrero A. Compartmentalized function through cell differentiation in filamentous cyanobacteria. Nat Rev Microbiol. 2010;8:39–50. doi: 10.1038/nrmicro2242.
- 4. Mullineaux CW, Mariscal V, Nenninger A, Khanum H, Herrero A, Flores E, et al. Mechanism of intercellular molecular exchange in heterocyst-forming cyanobacteria. EMBO J. 2008;27:1299–1308. doi: 10.1038/emboj.2008.66.
- 5. Flores E, Herrero A, Wolk CP, Maldener I. Is the periplasm continuous in filamentous multicellular cyanobacteria? Trends Microbiol. 2006;14:439–443. doi: 10.1016/j.tim.2006.08.007.
- 6. Giddings TH, Staehelin LA. Observation of microplasmodesmata in both heterocyst-forming and non-heterocyst forming filamentous cyanobacteria by freeze-fracture electron microscopy. Arch Microbiol. 1981;129:295–298. doi: 10.1007/BF00414700.
- 7. Castenholz RW, Wilmotte A, Herdman M, Rippka R, Waterbury JB, Iteman I, et al. Phylum BX. Cyanobacteria. Bergey’s Manual® of Systematic Bacteriology. New York: Springer New York; 2001. pp. 473–599.
- 8. Stanier RY, Deruelles J, Rippka R, Herdman M, Waterbury JB. Generic assignments, strain histories and properties of pure cultures of cyanobacteria. Microbiology. 1979;111:1–61. doi: 10.1099/00221287-111-1-1.
- 9. Sarma TA. Handbook of cyanobacteria. 2012.
- 10. Wiltbank LB, Kehoe DM. Diverse light responses of cyanobacteria mediated by phytochrome superfamily photoreceptors. Nat Rev Microbiol. 2019:37–50. Nature Publishing Group. [cited 2020 Aug 10]. Available from: www.nature.com/nrmicro.
- 11. Jiang Q, Qin S, Wu Q. Genome-wide comparative analysis of metacaspases in unicellular and filamentous cyanobacteria. BMC Genomics. 2010;11:198. doi: 10.1186/1471-2164-11-198.
- 12. Asplund-Samuelsson J, Sundh J, Dupont CL, Allen AE, McCrow JP, Celepli NA, et al. Diversity and expression of bacterial metacaspases in an aquatic ecosystem. Front Microbiol. 2016;7:1043.
- 13. Klemenčič M, Funk C. Structural and functional diversity of caspase homologues in non-metazoan organisms. Protoplasma. 2018;255:387–97.
- 14. Zhang X, Zhao F, Guan X, Yang Y, Liang C, Qin S. Genome-wide survey of putative serine/threonine protein kinases in cyanobacteria. BMC Genomics. 2007;8:395. [cited 2020 Mar 22]. Available from: http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-8-395.
- 15. Larsson J, Nylander JAA, Bergman B. Genome fluctuations in cyanobacteria reflect evolutionary, developmental and adaptive traits. BMC Evol Biol. 2011;11:187. doi: 10.1186/1471-2148-11-187.
- 16. Doulatov S, Hodes A, Dai L, Mandhana N, Liu M, Deora R, et al. Tropism switching in Bordetella bacteriophage defines a family of diversity-generating retroelements. Nature. 2004;431:476–481. doi: 10.1038/nature02833.
- 17. Wu L, Gingery M, Abebe M, Arambula D, Czornyj E, Handa S, et al. Diversity-generating retroelements: natural variation, classification and evolution inferred from a large-scale genomic survey. Nucleic Acids Res. 2018;46:11–24. doi: 10.1093/nar/gkx1150.
- 18. Pfreundt U, Kopf M, Belkin N, Berman-Frank I, Hess WR. The primary transcriptome of the marine diazotroph Trichodesmium erythraeum IMS101. Sci Rep. 2015;4:6187. doi: 10.1038/srep06187.
- 19. Miller JL, Le CJ, Hodes A, Barbalat R, Miller JF, Ghosh P. Selective ligand recognition by a diversity-generating retroelement variable protein. Bjorkman PJ, editor. PLoS Biol. 2008;6:e131. doi: 10.1371/journal.pbio.0060131.
- 20. Arambula D, Wong W, Medhekar BA, Guo H, Gingery M, Czornyj E, et al. Surface display of a massively variable lipoprotein by a Legionella diversity-generating retroelement. Proc Natl Acad Sci. 2013;110:8212–8217. doi: 10.1073/pnas.1301366110.
- 21. Paul BG, Burstein D, Castelle CJ, Handa S, Arambula D, Czornyj E, et al. Retroelement-guided protein diversification abounds in vast lineages of bacteria and Archaea. Nat Microbiol. 2017;2:17045.
- 22. Guo H, Arambula L, Ghosh P, Miller JF. Diversity-generating retroelements in phage and bacterial genomes. Washington, DC: ASM Press; 2015. pp. 1237–1252.
- 23. Naorem SS, Han J, Wang S, Lee WR, Heng X, Miller JF, et al. DGR mutagenic transposition occurs via hypermutagenic reverse transcription primed by nicked template RNA. Proc Natl Acad Sci U S A. 2017;114:E10187–E10195. doi: 10.1073/pnas.1715952114.
- 24. Liu M, Deora R, Doulatov SR, Gingery M, Eiserling FA, Preston A, et al. Reverse transcriptase-mediated tropism switching in Bordetella bacteriophage. Science. 2002;295:2091–2094. doi: 10.1126/science.1067467.
- 25. Le Coq J, Ghosh P. Conservation of the C-type lectin fold for massive sequence variation in a Treponema diversity-generating retroelement. Proc Natl Acad Sci U S A. 2011;108:14649–14653. doi: 10.1073/pnas.1105613108.
- 26. Roux S, Paul BG, Bagby SC, Allen MA, Attwood G, Cavicchioli R, et al. Ecology and molecular targets of hypermutation in the global microbiome. [cited 2020 May 25]. doi: 10.1101/2020.04.01.020958.
- 27. Schillinger T, Zingler N. The low incidence of diversity-generating retroelements in sequenced genomes. Mob Genet Elem. 2012;2:287–291. doi: 10.4161/mge.23244.
- 28. Yan F, Yu X, Duan Z, Lu J, Jia B, Qiao Y, et al. Discovery and characterization of the evolution, variation and functions of diversity-generating retroelements using thousands of genomes and metagenomes. BMC Genomics. 2019;20:595. doi: 10.1186/s12864-019-5951-3.
- 29. Paul BG, Bagby SC, Czornyj E, Arambula D, Handa S, Sczyrba A, et al. Targeted diversity generation by intraterrestrial archaea and archaeal viruses. Nat Commun. 2015;6:1–8. doi: 10.1038/ncomms7585.
- 30. Kopf M, Möke F, Bauwe H, Hess WR, Hagemann M. Expression profiling of the bloom-forming cyanobacterium Nodularia CCY9414 under light and oxidative stress conditions. ISME J. 2015;9:2139–2152. doi: 10.1038/ismej.2015.16.
- 31. Voß B, Bolhuis H, Fewer DP, Kopf M, Möke F, Haas F, et al. Insights into the physiology and ecology of the brackish-water-adapted cyanobacterium Nodularia spumigena CCY9414 based on a genome-transcriptome analysis. Janssen PJ, editor. PLoS One. 2013;8:e60224. doi: 10.1371/journal.pone.0060224.
- 32. Hoving JC, Wilson GJ, Brown GD. Signalling C-type lectin receptors, microbial recognition and immunity. Cell Microbiol. 2014;16:185–94.
- 33. del Fresno C, Iborra S, Saz-Leal P, Martínez-López M, Sancho D. Flexible signaling of myeloid C-type lectin receptors in immunity and inflammation. Front Immunol. 2018;9:804. doi: 10.3389/fimmu.2018.00804.
- 34. Zelensky AN, Gready JE. The C-type lectin-like domain superfamily. FEBS J. 2005;272:6179–6217. doi: 10.1111/j.1742-4658.2005.05031.x.
- 35. Guo H, Tse LV, Nieh AW, Czornyj E, Williams S, Oukil S, et al. Target site recognition by a diversity-generating retroelement. Burkholder WF, editor. PLoS Genet. 2011;7:e1002414. doi: 10.1371/journal.pgen.1002414.
- 36. Barabas O, Ronning DR, Guynet C, Hickman AB, Ton-Hoang B, Chandler M, et al. Mechanism of IS200/IS605 family DNA transposases: activation and transposon-directed target site selection. Cell. 2008;132:208–220. doi: 10.1016/j.cell.2007.12.029.
- 37. He S, Corneloup A, Guynet C, Lavatine L, Caumont-Sarcos A, Siguier P, et al. The IS200/IS605 family and “Peel and paste” single-strand transposition mechanism. Microbiol Spectr. 2015;3:609–30.
- 38. Hanks SK, Hunter T. Protein kinases 6. The eukaryotic protein kinase superfamily: kinase (catalytic) domain structure and classification. FASEB J. 1995;9:576–596. doi: 10.1096/fasebj.9.8.7768349.
- 39. Stock AM, Robinson VL, Goudreau PN. Two-component signal transduction. Annu Rev Biochem. 2000;69:183–215. doi: 10.1146/annurev.biochem.69.1.183.
- 40. Janczarek M, Vinardell J-M, Lipa P, Karaś M. Hanks-type serine/threonine protein kinases and phosphatases in bacteria: roles in signaling and adaptation to various environments. Int J Mol Sci. 2018;19:2872. doi: 10.3390/ijms19102872.
- 41. Libby EA, Goss LA, Dworkin J. The eukaryotic-like Ser/Thr kinase PrkC regulates the essential WalRK two-component system in Bacillus subtilis. PLoS Genet. 2015;11:e1005275.
- 42. Mijakovic I, Macek B. Impact of phosphoproteomics on studies of bacterial physiology. FEMS Microbiol Rev. 2012;36:877–92.
- 43. Dworkin J. Ser/Thr phosphorylation as a regulatory mechanism in bacteria. Curr Opin Microbiol. 2015;24:47–52.
- 44. Shi L, Pigeonneau N, Ravikumar V, Dobrinic P, Macek B, Franjevic D, et al. Cross-phosphorylation of bacterial serine/threonine and tyrosine protein kinases on key regulatory residues. Front Microbiol. 2014;5:495.
- 45. Shi L, Ji B, Kolar-Znika L, Boskovic A, Jadeau F, Combet C, et al. Evolution of bacterial protein-tyrosine kinases and their relaxed specificity toward substrates. Genome Biol Evol. 2014;6:800–817. doi: 10.1093/gbe/evu056.
- 46. Capra EJ, Perchuk BS, Skerker JM, Laub MT. Adaptive mutations that prevent crosstalk enable the expansion of paralogous signaling protein families. Cell. 2012;150:222–232. doi: 10.1016/j.cell.2012.05.033.
- 47. Aravind L, Dixit VM, Koonin EV. The domains of death: evolution of the apoptosis machinery. Trends Biochem Sci. 1999;24:47–53. doi: 10.1016/S0968-0004(98)01341-3.
- 48. Klemenčič M, Novinec M, Dolinar M. Orthocaspases are proteolytically active prokaryotic caspase homologues: the case of Microcystis aeruginosa. Mol Microbiol. 2015;98:142–50.
- 49. Spungin D, Bidle KD, Berman-Frank I. Metacaspase involvement in programmed cell death of the marine cyanobacterium Trichodesmium. Environ Microbiol. 2018;21:667–81.
- 50. Asplund-Samuelsson J. The art of destruction: revealing the proteolytic capacity of bacterial caspase homologs. Mol Microbiol. 2015;98:1–6. doi: 10.1111/mmi.13111.
- 51. Kenneth Allan R, Ratajczak T. Versatile TPR domains accommodate different modes of target protein recognition and function. Cell Stress Chaperones. 2011;16:353–67.
- 52. van der Voorn L, Ploegh HL. The WD-40 repeat. FEBS Lett. 1992;307:131–134. doi: 10.1016/0014-5793(92)80751-2.
- 53. Tewari R, Bailes E, Bunting KA, Coates JC. Armadillo-repeat protein functions: questions for little creatures. Trends Cell Biol. 2010;20:470–81.
- 54. Colombatti A, Bonaldo P, Doliana R. Type A modules: interacting domains found in several non-fibrillar collagens and in other extracellular matrix proteins. Matrix. 1993;13:297–306. doi: 10.1016/S0934-8832(11)80025-9.
- 55. Swiderski MR, Birker D, Jones JDG. The TIR domain of TIR-NB-LRR resistance proteins is a signaling domain involved in cell death induction. Mol Plant Microbe Interact. 2009;22:157–165. doi: 10.1094/MPMI-22-2-0157.
- 56. Spear AM, Loman NJ, Atkins HS, Pallen MJ. Microbial TIR domains: not necessarily agents of subversion? Trends Microbiol. 2009;17:393–398. doi: 10.1016/j.tim.2009.06.005.
- 57. Ho YS, Burden LM, Hurley JH. Structure of the GAF domain, a ubiquitous signaling motif and a new class of cyclic GMP receptor. EMBO J. 2000;19:5288–5299. doi: 10.1093/emboj/19.20.5288.
- 58. Sobotka R, Dühring U, Komenda J, Peter E, Gardian Z, Tichy M, et al. Importance of the cyanobacterial Gun4 protein for chlorophyll metabolism and assembly of photosynthetic complexes. J Biol Chem. 2008;283:25794–25802. doi: 10.1074/jbc.M803787200.
- 59. Aravind L, Koonin EV. Classification of the caspase-hemoglobinase fold: detection of new families and implications for the origin of the eukaryotic separins. Proteins Struct Funct Genet. 2002;46:355–367. doi: 10.1002/prot.10060.
- 60. Ohmori M, Ikeuchi M, Sato N, Wolk P, Kaneko T, Ogawa T, et al. Characterization of genes encoding multi-domain proteins in the genome of the filamentous nitrogen-fixing cyanobacterium Anabaena sp. strain PCC 7120. DNA Res. 2001;8:271–84.
- 61. Asplund-Samuelsson J, Bergman B, Larsson J. Prokaryotic caspase homologs: phylogenetic patterns and functional characteristics reveal considerable diversity. Driks A, editor. PLoS One. 2012;7:e49888. doi: 10.1371/journal.pone.0049888.
- 62. Mammen M, Choi S-K, Whitesides GM. Polyvalent interactions in biological systems: implications for design and use of multivalent ligands and inhibitors. Angew Chem Int Ed. 1998;37:2754–2794. doi: 10.1002/(SICI)1521-3773(19981102)37:20<2754::AID-ANIE2754>3.0.CO;2-3.
- 63. Sharifi F, Ye Y. MyDGR: a server for identification and characterization of diversity-generating retroelements. Nucleic Acids Res. 2019;47:W289–W294. doi: 10.1093/nar/gkz329.
- 64. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22:1658–1659. doi: 10.1093/bioinformatics/btl158.
- 65. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
- 66. Siguier P. ISfinder: the reference centre for bacterial insertion sequences. Nucleic Acids Res. 2006;34:D32–D36. doi: 10.1093/nar/gkj014.
- 67. Hug LA, Castelle CJ, Wrighton KC, Thomas BC, Sharon I, Frischkorn KR, et al. Community genomic analyses constrain the distribution of metabolic traits across the Chloroflexi phylum and indicate roles in sediment carbon cycling. Microbiome. 2013;1:1–17. doi: 10.1186/2049-2618-1-22.
- 68. Finn RD, Clements J, Eddy SR. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011;39:W29–W37. doi: 10.1093/nar/gkr367.
- 69. Bateman A. The Pfam protein families database. Nucleic Acids Res. 2002;30:276–280. doi: 10.1093/nar/30.1.276.
- 70. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–1797. doi: 10.1093/nar/gkh340.
- 71. Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–1973. doi: 10.1093/bioinformatics/btp348.
- 72. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–1313. doi: 10.1093/bioinformatics/btu033.
- 73. Price MN, Dehal PS, Arkin AP. FastTree 2 - approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5:e9490. doi: 10.1371/journal.pone.0009490.
- 74. Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 2011;7:539. doi: 10.1038/msb.2011.75.
- 75. Chaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. GTDB-Tk: a toolkit to classify genomes with the genome taxonomy database. Bioinformatics. 2019;36:1925–1927. doi: 10.1093/bioinformatics/btz848.
- 76. Parks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil P-A, et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol. 2018;36:996–1004. doi: 10.1038/nbt.4229.