2010 Sep;26(9-3):388–393. doi: 10.1016/j.tig.2010.06.004

Lineage-specific expansion of DNA-binding transcription factor families

Varodom Charoensawan 1, Derek Wilson 1,1, Sarah A Teichmann 1
PMCID: PMC2937223  PMID: 20675012

Abstract

DNA-binding domains (DBDs) are essential components of sequence-specific transcription factors (TFs). We have investigated the distribution of all known DBDs in more than 500 completely sequenced genomes from the three major superkingdoms (Bacteria, Archaea and Eukaryota) and documented conserved and specific DBD occurrence in diverse taxonomic lineages. By combining DBD occurrence in different species with taxonomic information, we have developed an automatic method for inferring the origins of DBD families and their specific combinations with other protein families in TFs. We found only three out of 131 (2%) DBD families shared by the three superkingdoms.

Phylogenetic analysis of DNA-binding transcription factor families

All sequence-specific transcription factors (TFs) contain DNA-binding domains (DBDs), evolutionary units that mediate the specificity of the TF–DNA interaction. Domain-based analysis of TFs is thus effective functionally as well as phylogenetically. TFs and their binding targets have been under intensive study, and previous key publications on TFs and DBDs tended to focus on specific phylogenetic groups [1–4]. Here, we analyze the distribution of all known DBDs in 538 organisms from superkingdoms Bacteria, Archaea and Eukaryota. TF and DBD classifications were obtained from the DBD database [5], a transcription factor resource that annotates TFs based on the presence of DBDs from a manually curated list. The DBD database predicts TFs in all publicly available genomes from diverse phylogenetic lineages using a single platform, and is thus an ideal resource for exploring the phylogenetic distribution of TF families across the tree of life. We provide an overview of conserved and lineage-specific DBD families, using 131 Pfam domains [6] classified as DBDs to illustrate our findings. Note that what we discuss here for Pfam DBDs applies also to 87 SCOP families [7] classified manually as DBDs by the DBD database (see the supplementary material online for a complete list of genomes and DBD families).

TF DBD families are highly lineage-specific

We previously introduced a heatmap representation to visualise the expansion and contraction of DBD families across lineages [5] (Figure 1a). Each column of the heatmap corresponds to a DBD family and each row represents a species. Species are ordered according to the NCBI taxonomic tree, an expertly curated taxonomic hierarchy [8]. For each family, the Z-score of the number of TFs containing that DBD family was calculated across species to highlight the organisms in which the family is expanded relative to other species. Orange indicates positive Z-scores, and thus a relative expansion of the DBD in that particular lineage, and blue indicates negative Z-scores, or a contraction. The distinct expansion patterns of different groups of DBDs in prokaryotes and eukaryotes imply that DBD families are highly specific to the two types of cells: nucleate and anucleate.
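The per-family Z-score behind the heatmap can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `family_z_scores`, the input layout, the toy counts and the choice of the population standard deviation are all assumptions made for the example, and the text does not state whether counts were normalised before scoring.

```python
from statistics import mean, pstdev


def family_z_scores(counts):
    """Return Z-scores of TF counts, computed per DBD family across species.

    counts: {family: {species: number of TFs containing that family}}
    (hypothetical layout chosen for this sketch).
    """
    z_scores = {}
    for family, per_species in counts.items():
        values = list(per_species.values())
        mu = mean(values)
        sigma = pstdev(values)  # assumption: population standard deviation
        z_scores[family] = {
            # A family with identical counts everywhere has no expansion signal.
            species: (n - mu) / sigma if sigma > 0 else 0.0
            for species, n in per_species.items()
        }
    return z_scores


# Toy counts, invented for illustration only.
toy_counts = {"family_X": {"species_A": 12, "species_B": 3, "species_C": 0}}
z = family_z_scores(toy_counts)
# species_A gets a positive score (relative expansion, orange in the heatmap);
# species_C gets a negative score (relative contraction, blue).
```

Scoring each column independently is what makes expansion in one lineage visible even when the absolute number of TFs differs greatly between families.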

Figure 1.


Lineage-specific expansion patterns of DBD families. (a) The heatmap demonstrates the specific expansion patterns of DBD families between eukaryotic and prokaryotic genomes. Columns correspond to DBD families hierarchically clustered by their occurrence patterns in different species. Rows represent species ordered using the NCBI taxonomy. Orange indicates relative DBD expansion and blue represents contraction. The vertical coloured bars to the left of the heatmap indicate superkingdoms, kingdoms, or phyla to which species belong. Eukaryota (red) is divided into three kingdoms: Metazoa (pink), Fungi (orange) and Viridiplantae (yellow). Euryarchaea and Crenarchaea are labelled in pale and dark green, respectively. Bacteria are labelled using shades of blue: Actinobacteria (purple), Firmicutes (navy) and Proteobacteria (pale blue). The white areas in the right-hand coloured bar are species that do not belong to the main kingdoms/phyla mentioned above, e.g. protists and choanoflagellates. Specific patterns of occurrence were observed within the eukaryotic species. The right-hand side shows the detailed expansion patterns of selected eukaryotic lineages: protists including Mycetozoa (Dictyostelid) and Stramenopiles, animals including V for vertebrates and I for invertebrates, MB for M. brevicollis (choanoflagellate), fungi, and plants including S for Streptophyta (land plants) and C for Chlorophyta (green algae). (b) A Venn diagram representing the number of Pfam DBD families that have taxonomic limits belonging to the three main superkingdoms. Only 19 out of 131 (15%) DBDs were found in more than one superkingdom, and most of these are shared by Bacteria and Archaea but not by Eukaryota. Only three DBD families (CSD, HTH_psq, and HTH_3) are shared by all of the superkingdoms.

In addition to the heatmap, we have developed a new simple method for inferring the origin of protein domains. By combining DBD family occurrence with taxonomic information from the NCBI taxonomy tree, we demonstrate that the method is able to estimate when each DBD family emerged. We term this the taxonomic limit. The same method is used to estimate when the combinations of DBDs and other protein families in TFs emerged. We provide the taxonomic conservation density, which is the fraction of species containing the DBD out of the total number of species within taxonomic clades (see Box 1 for an example of the calculation steps and see the supplementary material online for a complete list of taxonomic limits and conservation densities).

Box 1. Taxonomic limits of DBD families.

We have developed an automatic method for inferring the origins of DBD families by combining DBD occurrence in different species with taxonomic information. Although similar methods (e.g. Refs [31–33]) use protein content profiles and species trees to reconstruct evolutionary scenarios, they are not identical to our method and are not used for the same purpose (see the supplementary material online for a detailed discussion).

To obtain a taxonomic limit for a particular DBD family D, we first collected all species that have the DBD family detected in their genomes and computed $T_{D,i}$, the number of TFs containing the family D ($t_{D,i}$), normalised by the number of genes ($G_i$) in species i (Equation I). On the basis of the NCBI taxonomic tree, the last common ancestor (LCA) between each species and all other species that share the DBD of interest was derived. This step was repeated for all possible pairs of species, in an all-against-all fashion (all possible pairs of species i,j that contain family D). For each pair of species i,j, the average number of TFs containing family D ($T_{D,i,j}$) was computed (Equation II).

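$$T_{D,i} = \frac{t_{D,i}}{G_i} \quad \text{(I)}$$

$$T_{D,i,j} = \frac{T_{D,i} + T_{D,j}}{2} \quad \text{(II)}$$

$$F_{D,X} = \frac{\sum_{i,j} T_{D,X,i,j}}{\sum_{i,j} T_{D,i,j}} \quad \text{(III)}$$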

We defined the frequency fraction of a taxonomic node X ($F_{D,X}$) as the ratio of the sum of normalised TF counts for family D over the species pairs whose LCA is node X ($\sum_{i,j} T_{D,X,i,j}$) to the sum of normalised TF counts for family D over all species pairs at all taxonomic nodes ($\sum_{i,j} T_{D,i,j}$) (Equation III). We took the most frequent LCA (the node with the highest frequency fraction) as the taxonomic limit of the DBD family. However, bias due to the different numbers of genomes in different branches (e.g. Proteobacteria dominate Bacteria) might decrease the accuracy of the taxonomic limit estimate. We corrected for this by shifting the taxonomic limit to the parental node if the ratio of the parental node's frequency fraction to the highest frequency fraction was greater than a cut-off of 0.2 (see the supplementary material online for the calibration of the method and cut-off threshold).
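The procedure described in this box can be sketched in a few lines of Python. This is an illustrative reading of Equations I–III and the parental-node correction, not the authors' implementation: the names `lca` and `taxonomic_limit`, the representation of each species as a root-to-clade lineage list, the toy species and the decision to apply the parental shift only once are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def lca(lineage_a, lineage_b):
    """Deepest node shared by two root-to-clade lineages (None if no shared root)."""
    shared = None
    for a, b in zip(lineage_a, lineage_b):
        if a != b:
            break
        shared = a
    return shared


def taxonomic_limit(tf_counts, gene_counts, lineages, cutoff=0.2):
    """Return (taxonomic limit, frequency fractions) for one DBD family."""
    if len(tf_counts) < 2:
        raise ValueError("need at least two species containing the family")
    # Equation I: TF count normalised by the number of genes.
    T = {s: tf_counts[s] / gene_counts[s] for s in tf_counts}
    # Parent of every node, read off the lineages.
    parent = {}
    for lineage in lineages.values():
        for up, down in zip(lineage, lineage[1:]):
            parent[down] = up
    # Equation II, accumulated at the LCA of every species pair.
    weight = defaultdict(float)
    for i, j in combinations(sorted(T), 2):
        weight[lca(lineages[i], lineages[j])] += (T[i] + T[j]) / 2
    # Equation III: frequency fraction of each node.
    total = sum(weight.values())
    F = {node: w / total for node, w in weight.items()}
    best = max(F, key=F.get)
    # Correction: move to the parental node if its fraction is large enough
    # relative to the best node (assumption: a single shift).
    up = parent.get(best)
    if up is not None and F.get(up, 0.0) / F[best] > cutoff:
        best = up
    return best, F


# Toy input, invented for illustration only.
root = "cellular organisms"
toy_lineages = {
    "sp1": [root, "Bacteria", "Proteobacteria"],
    "sp2": [root, "Bacteria", "Proteobacteria"],
    "sp3": [root, "Bacteria", "Firmicutes"],
}
toy_tfs = {"sp1": 10, "sp2": 10, "sp3": 10}
toy_genes = {"sp1": 1000, "sp2": 1000, "sp3": 1000}
limit, fractions = taxonomic_limit(toy_tfs, toy_genes, toy_lineages)
```

In the toy input two of the three species pairs meet at Bacteria, so that node has the highest frequency fraction and no pair reaches the root, which leaves the limit at Bacteria.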

In addition to the taxonomic limit, we calculated the fraction of species containing the DBD over the total number of species under the taxonomic limit. We termed this the taxonomic conservation density. DBD families that emerged from the same speciation event should be detectable in most of the descendant species (taxonomic conservation density close to 1). In contrast, DBDs that are observed sporadically in taxonomically distant lineages (small conservation density) are likely to have been disseminated through horizontal gene transfer or to have gone through multiple gene loss events. Figure I demonstrates how the method operates using a simplified taxonomy tree.
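The taxonomic conservation density is a simple ratio, sketched below. The function name `conservation_density` and the toy species sets are hypothetical and chosen only for this example.

```python
def conservation_density(species_with_dbd, clade_species):
    """Fraction of species under the taxonomic limit that contain the DBD."""
    clade = set(clade_species)
    if not clade:
        raise ValueError("the clade must contain at least one species")
    # Species outside the clade are ignored: only members of the clade count.
    return len(set(species_with_dbd) & clade) / len(clade)


# Toy example: the DBD is present in three of four species in the clade,
# plus one species outside it that does not enter the calculation.
clade = {"sp1", "sp2", "sp3", "sp4"}
carriers = {"sp1", "sp2", "sp3", "outsider"}
density = conservation_density(carriers, clade)  # 3 / 4
```

A value near 1 is consistent with vertical inheritance from a single origin, whereas a low value points to sporadic occurrence within the clade.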

Using our method, we found that 19 out of 131 (15%) DBDs have cellular organisms as their taxonomic limits (shared by more than one superkingdom). Eleven of these DBDs are shared by Archaea and Bacteria but not Eukaryota, and only three (2%) are shared by all three superkingdoms (Figure 1b). When we applied the same method to all Pfam domains, we observed that 33% have cellular organisms as taxonomic limits, suggesting that the repertoires of DBD families are more lineage-specific than those of proteins with other functions. This conclusion is in line with the results of an earlier study that used a different method [9].

Uniform expansion pattern of DBD families in prokaryotes

In prokaryotic genomes, helix-turn-helix is by far the most common DBD structure [1]. The majority of prokaryotic DBDs belong to the winged helix structural class, which might explain the uniform expansion of DBD occurrence observed here. Archaea are thought to be phylogenetically closer to Eukaryota and have more closely related core components of transcription machinery, such as RNA polymerases and basal TFs [1,4]. Interestingly, our heatmap and taxonomic limit assignments suggest that Archaea share a greater number of DBDs with Bacteria than with Eukaryota. Examples of DBDs shared by the two prokaryotic superkingdoms Archaea and Bacteria are Fe_dep_repress (iron-dependent repressor), MarR (antibiotic resistance) and NikR (nickel-responsive regulator). These DBD families regulate specific genes required for adaptation to environmental stress, and might have been established and maintained through multiple horizontal gene transfers [1,10].

The heatmap shown in Figure 1a suggests that the prokaryotic DBD distribution is widespread among the prokaryotic species and there is no clearly distinguishable expansion scheme within the three major bacterial phyla in our dataset: Actinobacteria, Firmicutes and Proteobacteria. Indeed, we find that 30 out of 61 (49%) bacterial DBDs have Bacteria as their taxonomic limit (shared by more than one phylum). These shared DBDs participate in basic carbon source metabolism, e.g. HTH_AraC, LacI and Gnt, as well as in more specific processes, such as FUR (ferric uptake regulator), MerR (mercury resistance) and HTH_8 (virulence gene expression).

Examples of prokaryotic phylum-specific DBDs include WhiB, a DBD specific to Actinobacteria that regulates mycelium formation. The FlhC and FlhD TFs, with Proteobacteria (Gram-negative) as their taxonomic limit, have been shown to be global regulators involved in many cellular processes, including transcriptional activation of flagellar genes [11]. On the basis of their restricted phylogenetic distribution and role in flagella regulation, they might be linked to the four-support-ring flagella of Gram-negative bacteria, as opposed to the two-support-ring flagella of Gram-positive bacteria. Additional discussions on lineage-specific DBD families and the biological processes they are implicated in are available in the supplementary material online.

The eukaryotic DBD repertoire is highly specific at the kingdom level

In contrast to the uniform DBD occurrence in Bacteria, Figure 1a shows more distinct expansion patterns among the three main eukaryotic kingdoms: Metazoa (animals), Fungi and Viridiplantae (plants). Indeed, a relatively small proportion (29%) of eukaryotic DBD families have Eukaryota as their taxonomic limit. These eukaryotic families include the zinc finger families, HLH (helix-loop-helix) and bZIPs (basic leucine zippers). In addition, the homeobox family, well known for its role in morphogenesis and animal body development [12], is found throughout eukaryotic organisms, including fungi and plants.

The most notable difference in the Metazoa is between vertebrates and invertebrates. Although the majority of DBDs found in animals are present in both groups, the expansion tends to be less pronounced in the invertebrates. The DBDs with particularly extensive expansion in vertebrates include STAT (signal transduction), T-box (body plan and organogenesis) and p53 (cell cycle arrest and apoptosis). DBDs such as IRF (interferon regulatory factor) and Churchill (neural development) are absent from invertebrates, which might reflect the more elaborate immune and nervous systems in vertebrates. In contrast, the Runt and GCM families regulate fundamental developmental processes in both vertebrates and invertebrates, and are equally expanded in both groups.

Metazoa and Fungi are phylogenetically closer to each other and share more DBD families with each other than with Viridiplantae (see the supplementary material online). In accordance with earlier work [13], we observed a number of fungal-specific DBDs, including Zn2/Cys6 (Zn cluster) and Copper-fist (copper utilisation). Interestingly, the HTH_AraC (arabinose operon regulator) and FMN (flavin mononucleotide) binding domains are exceptional cases of bacterial DBDs broadly found across Fungi. These families have been shown experimentally to be involved in sugar uptake [14] and sporulation regulation [15] in Bacteria. Their function in Fungi has yet to be investigated. Plants possess a number of mainly plant-specific DBDs, such as AP2 (activation of defence genes) and SBP (flowering development).

Apart from the three major kingdoms, we observe an interesting DBD occurrence in the unicellular eukaryote Monosiga brevicollis, a marine choanoflagellate that is thought to be the closest sequenced unicellular relative of animals [16]. Earlier studies showed that the species has a considerable number of signalling components in common with animals [17]. Besides more elaborate signalling machinery, uni- to multicellular transitions also require a greater number of components that contribute to the more complex genetic regulatory networks in functionally diverse cell types [18]. One possible way to enhance regulatory capacity is by recruiting novel sets of TFs. We observed DBDs common to the fungi/animal group in M. brevicollis, as well as many DBDs specific to animals (MB, Figure 1a). Among these DBDs are families that regulate animal-specific processes, such as STAT (signal transduction), p53 (apoptosis) and Tub (nervous system development), as well as those involved in more general pathways, such as E2F/DP (cell cycle).

In addition, we observed several interesting DBD occurrences in rare protist genomes (Figure 1a). For example, STAT and WRKY were previously detected in Dictyostelid [19,20] and are also detected in our dataset. We also note the occurrence in protists of two DBDs thought to be plant-specific. Apart from AP2, which was detected in Apicomplexa [21], we discovered for the first time a rare presence of the zinc finger LSD1 in many Euglenozoa. Our understanding of transcriptional regulation in these protists, and the number of their sequenced genomes, are, however, still very limited.

Variety in domain architecture adds complexity to TF structures

TFs have DBDs as core components and often contain other protein domains of different functions, which we term partner domains. In Figure 2, we use a network-style representation to provide an overview of the most commonly occurring TF architectures (those occurring in >5% of TFs in each family). Using our taxonomic limit method to infer the origins of the DBD–partner domain combinations, we observed many lineage-specific TF architectures on top of lineage-specific DBDs (different coloured arrows connecting domain nodes in Figure 2).
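The filtering of architectures for the network can be sketched as follows, assuming each TF is described by the ordered tuple of its domains from N- to C-terminus. The function names, the domain names `DBD_X`, `Partner_A` and `Partner_B`, and the toy counts are hypothetical. The 25% single-domain criterion used for node borders in Figure 2 is interpreted here as a fraction of the TFs in the family, which is an assumption.

```python
from collections import Counter


def architecture_fractions(architectures):
    """Fraction of a family's TFs that show each domain architecture."""
    counts = Counter(tuple(a) for a in architectures)
    total = sum(counts.values())
    return {arch: n / total for arch, n in counts.items()}


def common_architectures(architectures, min_fraction=0.05):
    """Keep architectures found in more than min_fraction of the family's TFs."""
    fractions = architecture_fractions(architectures)
    return {arch: f for arch, f in fractions.items() if f > min_fraction}


def is_frequently_single_domain(architectures, dbd, min_fraction=0.25):
    """True if the DBD occurs alone in more than min_fraction of the TFs."""
    return architecture_fractions(architectures).get((dbd,), 0.0) > min_fraction


# Toy family of 100 TFs, invented for illustration only.
toy_family = (
    [("DBD_X", "Partner_A")] * 60
    + [("DBD_X",)] * 37
    + [("Partner_B", "DBD_X")] * 3
)
kept = common_architectures(toy_family)
alone = is_frequently_single_domain(toy_family, "DBD_X")
```

Here the architecture with `Partner_B` covers only 3% of the TFs and is dropped, while the single-domain form covers 37% and would be marked with a coloured border.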

Figure 2.


Network representation of DBD families and partner domains. (a) Examples of network representation of bacterial TF architectures. DBDs are shown as oblongs in protein chains and as circular nodes in our network representation. Partner domains are shown as rectangles in protein chains and as squares in the network representation. DBDs, e.g. HTH_1 and Fe_dep_repress, and their adjacent partner domains, e.g. LysR_substrate and Fe_dep_repr_C, are linked by unbroken arrows, pointing in the N- to C-terminal orientation. Broken arrows connect DBDs and partner domains that occur in the same TF chain but are not adjacent to DBDs, e.g. FeoA. Numbers on the top of each domain indicate its order from N- to C-terminus. Node sizes and arrow thickness are proportional to the abundance of domains and their combination, respectively. Coloured nodes and arrows indicate phylum-specific domain occurrence and domain combination, obtained from the taxonomic limit method described in Box 1 (e.g. the blue arrow linking HTH_1 to LysR_substrate indicates the combination is common to all Bacteria). Colour codes are as described for Figure 1. A white node means that the DBD is shared with other superkingdoms, e.g. HTH_1 and Fe_dep_repress are shared by Archaea. DBDs that occur alone as single-domain TFs in more than 25% of all their architectural patterns have orange borders, e.g. FlhC (see the supplementary material online for a complete bacterial TF architectural network and statistics used to generate the network). (b) Lineage-specific TF architectures in eukaryotes. A eukaryote-specific Tub DBD (represented by red oblongs in protein chains and by a circular node in our network) has distinct domain combinations in animals and plants. 
Although the Tub DBD occurs in single-domain TFs without a partner throughout eukaryotic species (green border), in animals, it occurs also C-terminal to the animal-specific SOCS_box (shown as a pink square node, a pink arrow indicates an animal-specific architecture). In contrast, the Tub DBD co-occurs with the F-box domain in plants, a eukaryote-specific partner domain (shown as a red square). This combination is observed exclusively in land plants (linked by a yellow arrow). (c) A network representing eukaryotic TF architectures. All architectures that occur in more than 5% of TFs in each DBD family are shown. DBDs that occur alone as single-domain TFs in more than 25% of all their architectural patterns have green borders. We observed the repetition of the same DBD within a TF (self-looping arrow) in 29% of eukaryotic DBDs, whereas DBD repeats in prokaryotes were observed in only one bacterial DBD, HTH_AraC.

The combinations between DBDs (circular nodes) and their partner domains (square nodes) in bacterial TFs are mostly (31 out of 44) shared by more than one bacterial phylum (see supplementary material online for a complete bacterial TF architectural network). For instance, HTH_1 (lysR family), the most abundant DBD in prokaryotes, is always located upstream of the LysR substrate-binding domain (Figure 2a). The blue arrow linking the two domains indicates that this architecture is broadly conserved in all Bacteria. A few TF architectures, such as in Fe_dep_repress, are conserved in all Bacteria as well as in Archaea. In agreement with earlier observations [22], we note that bacterial partner domains function predominantly in small molecule binding or two-component signal transduction. Interestingly, we observed that 16 out of 19 phylum-specific DBDs occur in single-domain TFs without a partner, e.g. FlhC. This is possibly because they emerged relatively recently and have not had sufficient time to combine with other domains to form more elaborate architectures.

Specific DBD–partner domain combinations are observed in animals, fungi and plants. The eukaryotic-specific Tub family, for instance, occurs in a single-domain TF in more than 25% of eukaryotic TFs (green border node). It occurs also downstream of the SOCS_box domain only in animals, and co-occurs exclusively with F-box in plants (Figure 2b). This family is absent from Fungi. These findings suggest that some eukaryotic DBDs have gained new regulatory modes by combining with different partner domains in different kingdoms.

Another distinctive feature of eukaryotic TF architectures not found in prokaryotes is the repetition of the same DBD family within a single TF chain (self-looping arrows in Figure 2c). DBD repeats are found in 22 out of 77 (29%) eukaryotic DBDs, mostly in the zinc fingers. Other DBDs in this category include CUT, E2F/DP and Tea. Additionally, AP2, B3 and WRKY are families that exhibit repeats exclusively in plants (yellow self-looping arrows). The function of repeated DBDs in eukaryotic TFs is most likely to boost the specificity and diversity of motif recognition at TF–DNA interfaces by increasing the number of possible DNA-binding sequences from a limited number of DBD families [23].
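Detecting the self-looping arrows amounts to checking whether a DBD appears more than once in a TF's domain architecture. The sketch below assumes architectures are given as ordered tuples of domain names; the function names and the toy families are hypothetical.

```python
def has_repeat(architecture, dbd):
    """True if the DBD occurs at least twice in one TF's domain architecture."""
    return list(architecture).count(dbd) >= 2


def families_with_repeats(family_architectures):
    """Return the DBD families in which at least one TF repeats the DBD.

    family_architectures: {dbd: [architecture, ...]} (hypothetical layout).
    """
    return sorted(
        dbd
        for dbd, architectures in family_architectures.items()
        if any(has_repeat(arch, dbd) for arch in architectures)
    )


# Toy input, invented for illustration only.
toy = {
    "DBD_X": [("DBD_X", "DBD_X", "Partner_A"), ("DBD_X",)],
    "DBD_Y": [("DBD_Y", "Partner_B")],
}
repeated = families_with_repeats(toy)
fraction_repeated = len(repeated) / len(toy)
```

Applied to the counts reported above, the same ratio gives 22 / 77, or about 29% of eukaryotic DBD families.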

The partner domains in eukaryotic TFs have more diverse functions than those in Bacteria, and the commonest function is to mediate protein–protein interaction and dimerisation. This is thought to be important to the formation of composite protein modules, a crucial step towards combinatorial regulation. Examples of these families include BTB, Bromodomain, SAM, ANK and hATC.

Concluding remarks

DBDs are essential to all sequence-specific TFs because they determine the specificity of TF–DNA binding, which in turn governs differential expression and determines physiological diversity in different species across the tree of life. With this analysis of conserved and lineage-specific DBDs and TF architectures, using our new method for inferring taxonomic limits, we contribute new insights into the global picture of the TF repertoire and its evolution. Our findings can facilitate the experimental design of high-throughput studies on transcriptional regulators, e.g. Refs [24–28]. In addition to providing an improved understanding of how different DBD families are related, our taxonomic inference method can be applied to other protein domains apart from DBD families.

We demonstrate a limited conservation of DBD families between prokaryotes and eukaryotes. Only 15% of known DBDs have cellular organisms as their taxonomic limit, as opposed to 33% of all Pfam domains. Lineage-specific DBD repertoires can be seen at the eukaryotic kingdom level: only 29% of eukaryotic families have Eukaryota as their taxonomic limit, i.e. are shared by more than one kingdom. Prokaryotic DBDs are less specific to the major bacterial phyla, with 49% of families being shared. In addition to the DBDs themselves, the variety in DBD and partner domain combinations adds another level of complexity to TF structures. The specific DBD families and TF architectures in different lineages can be used as signatures for the genetic regulatory circuits in diverse phylogenetic groups. Knowledge of the phylogenetic distribution of DBD families and their domain combinations can improve methods for remote homology detection [29,30] and advance the discovery of new TFs in genomes.

Figure I.


Examples of the taxonomic limit and conservation density calculations for a DBD family using a simplified tree of life. Suppose a DBD family is detected in one TF per genome in one out of 20 eukaryotic genomes and 19 out of 20 bacterial genomes. As there are 20 genomes containing the DBD of interest, there are 190 possible ways of picking a pair of these 20 genomes (binomial coefficient of 20C2, i.e. choose 2 from 20). Out of these 190 pairs of genomes, 171 have Bacteria as their LCA. The frequency fraction of Bacteria is 0.9 (171/190), which is higher than that at the cellular organisms node of 0.1 (19/190). The frequency fraction ratio of cellular organisms over Bacteria is 0.11 (0.1/0.9), less than the cut-off threshold of 0.2. The method consequently identifies Bacteria as the taxonomic limit and regards the DBD found in the eukaryotic genome of Xenopus tropicalis as contamination. The taxonomic conservation density at the bacterial node is 0.95 (19/20), suggesting that the DBD emerged from the same speciation event rather than from horizontal gene transfer. The frequency fraction and conservation density at each node are shown in bold and bracketed italics, respectively.
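The arithmetic in this example can be checked with a short script. It assumes, as the caption does, one TF per genome, and additionally assumes equal gene counts so that every species pair carries the same weight; the variable names are chosen for this sketch only.

```python
from math import comb

n_bacteria, n_eukaryota = 19, 1   # genomes containing the DBD
clade_size_bacteria = 20          # bacterial genomes in the simplified tree
cutoff = 0.2

total_pairs = comb(n_bacteria + n_eukaryota, 2)   # 190 pairs of genomes
bacteria_pairs = comb(n_bacteria, 2)              # 171 pairs with LCA = Bacteria
root_pairs = n_bacteria * n_eukaryota             # 19 pairs with LCA = cellular organisms

f_bacteria = bacteria_pairs / total_pairs         # 171/190 = 0.9
f_root = root_pairs / total_pairs                 # 19/190 = 0.1
ratio = f_root / f_bacteria                       # about 0.11, below the cut-off

# The limit moves to the parental node only if the ratio exceeds the cut-off.
limit = "cellular organisms" if ratio > cutoff else "Bacteria"
density = n_bacteria / clade_size_bacteria        # 19/20 = 0.95
```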

Acknowledgements

We thank Cyrus Chothia, Daniel Hebenstreit, Joseph Marsh, Madan Mohan Babu and Benjamin Lang for critical commentary on the manuscript. This work was funded by the Medical Research Council and a Royal Thai Government Scholarship to V.C.

Footnotes

Appendix A

Supplementary data associated with this article can be found in the attached Supplementary Material document and from the authors’ project website. (http://www.mrc-lmb.cam.ac.uk/genomes/varodom/DBD_analysis_supplementary/) doi:10.1016/j.tig.2010.06.004.

Contributor Information

Varodom Charoensawan, Email: varodom@mrc-lmb.cam.ac.uk.

Sarah A. Teichmann, Email: sat@mrc-lmb.cam.ac.uk.


References

  • 1. Aravind L., Koonin E.V. DNA-binding proteins and evolution of transcription regulation in the archaea. Nucleic Acids Res. 1999;27:4658–4670. doi: 10.1093/nar/27.23.4658.
  • 2. Riechmann J.L. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science. 2000;290:2105–2110. doi: 10.1126/science.290.5499.2105.
  • 3. Coulson R.M., Ouzounis C.A. The phylogenetic diversity of eukaryotic transcription. Nucleic Acids Res. 2003;31:653–660. doi: 10.1093/nar/gkg156.
  • 4. Perez-Rueda E. Phylogenetic distribution of DNA-binding transcription factors in bacteria and archaea. Comput. Biol. Chem. 2004;28:341–350. doi: 10.1016/j.compbiolchem.2004.09.004.
  • 5. Wilson D. DBD–taxonomically broad transcription factor predictions: new content and functionality. Nucleic Acids Res. 2008;36:D88–D92. doi: 10.1093/nar/gkm964.
  • 6. Finn R.D. The Pfam protein families database. Nucleic Acids Res. 2010;38:D211–D222. doi: 10.1093/nar/gkp985.
  • 7. Wilson D. SUPERFAMILY–sophisticated comparative genomics, data mining, visualization and phylogeny. Nucleic Acids Res. 2009;37:D380–D386. doi: 10.1093/nar/gkn762.
  • 8. Benson D.A. GenBank. Nucleic Acids Res. 2009;37:D26–D31. doi: 10.1093/nar/gkn723.
  • 9. Wunderlich Z., Mirny L.A. Different gene regulation strategies revealed by analysis of binding motifs. Trends Genet. 2009;25:434–440. doi: 10.1016/j.tig.2009.08.003.
  • 10. Kunin V. The net of life: reconstructing the microbial phylogenetic network. Genome Res. 2005;15:954–959. doi: 10.1101/gr.3666505.
  • 11. Pruss B.M. FlhD/FlhC-regulated promoters analyzed by gene array and lacZ gene fusions. FEMS Microbiol. Lett. 2001;197:91–97. doi: 10.1111/j.1574-6968.2001.tb10588.x.
  • 12. Pavlopoulos A., Akam M. Hox go omics: insights from Drosophila into Hox gene targets. Genome Biol. 2007;8:208. doi: 10.1186/gb-2007-8-3-208.
  • 13. Shelest E. Transcription factors in fungi. FEMS Microbiol. Lett. 2008;286:145–151. doi: 10.1111/j.1574-6968.2008.01293.x.
  • 14. Saviola B. Arm-domain interactions in AraC. J. Mol. Biol. 1998;278:539–548. doi: 10.1006/jmbi.1998.1712.
  • 15. Honjo M. A novel Bacillus subtilis gene involved in negative control of sporulation and degradative-enzyme production. J. Bacteriol. 1990;172:1783–1790. doi: 10.1128/jb.172.4.1783-1790.1990.
  • 16. King N. The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature. 2008;451:783–788. doi: 10.1038/nature06617.
  • 17. Pincus D. Evolution of the phospho-tyrosine signaling machinery in premetazoan lineages. Proc. Natl. Acad. Sci. U. S. A. 2008;105:9680–9684. doi: 10.1073/pnas.0803161105.
  • 18. Rokas A. The molecular origins of multicellular transitions. Curr. Opin. Genet. Dev. 2008;18:472–478. doi: 10.1016/j.gde.2008.09.004.
  • 19. Brown J.M., Firtel R.A. Regulation of cell-fate determination in Dictyostelium. Dev. Biol. 1999;216:426–441. doi: 10.1006/dbio.1999.9485.
  • 20. Babu M.M. The natural history of the WRKY-GCM1 zinc fingers and the relationship between transcription factors and transposons. Nucleic Acids Res. 2006;34:6505–6520. doi: 10.1093/nar/gkl888.
  • 21. Balaji S. Discovery of the principal specific transcription factors of Apicomplexa and their implication for the evolution of the AP2-integrase DNA binding domains. Nucleic Acids Res. 2005;33:3994–4006. doi: 10.1093/nar/gki709.
  • 22. Martinez-Antonio A. Internal-sensing machinery directs the activity of the regulatory network in Escherichia coli. Trends Microbiol. 2006;14:22–27. doi: 10.1016/j.tim.2005.11.002.
  • 23. Itzkovitz S. Coding limits on the number of transcription factors. BMC Genomics. 2006;7:239. doi: 10.1186/1471-2164-7-239.
  • 24. Mukherjee S. Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays. Nat. Genet. 2004;36:1331–1339. doi: 10.1038/ng1473.
  • 25. Hallikas O., Taipale J. High-throughput assay for determining specificity and affinity of protein-DNA binding interactions. Nat. Protoc. 2006;1:215–222. doi: 10.1038/nprot.2006.33.
  • 26. Gilad Y. Expression profiling in primates reveals a rapid evolution of human transcription factors. Nature. 2006;440:242–245. doi: 10.1038/nature04559.
  • 27. Meng X., Wolfe S.A. Identifying DNA sequences recognized by a transcription factor using a bacterial one-hybrid system. Nat. Protoc. 2006;1:30–45. doi: 10.1038/nprot.2006.6.
  • 28. Deplancke B. A gateway-compatible yeast one-hybrid system. Genome Res. 2004;14:2093–2101. doi: 10.1101/gr.2445504.
  • 29. Coin L. Enhanced protein domain discovery by using language modeling techniques from speech recognition. Proc. Natl. Acad. Sci. U. S. A. 2003;100:4516–4520. doi: 10.1073/pnas.0737502100.
  • 30. Coin L. Enhanced protein domain discovery using taxonomy. BMC Bioinformatics. 2004;5:56. doi: 10.1186/1471-2105-5-56.
  • 31. Snel B. Genomes in flux: the evolution of archaeal and proteobacterial gene content. Genome Res. 2002;12:17–25. doi: 10.1101/gr.176501.
  • 32. Kunin V., Ouzounis C.A. GeneTRACE-reconstruction of gene content of ancestral species. Bioinformatics. 2003;19:1412–1416. doi: 10.1093/bioinformatics/btg174.
  • 33. Mirkin B.G. Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes. BMC Evol. Biol. 2003;3:2. doi: 10.1186/1471-2148-3-2.
