(A and B) Protein domain co-occurrence matrices for transcription factor (TF) (A) and extracellular matrix (ECM)-related (B) gene families, inferred at the LCA of Metazoa (≥90% probability). The horizontal and vertical axes of each heatmap represent individual protein domains, and the cells show their mutual co-occurrence frequency. Domains have been clustered according to the number of shared domains (dendrogram based on Spearman correlation distances and the Ward clustering algorithm). Note that, for TFs, most co-occurrence clusters are located along the diagonal, indicating isolated domain communities, whereas ECM genes tend to contain promiscuous domains shared among multiple domain co-occurrence communities. Representative examples of independent and promiscuous domain clusters are highlighted in both heatmaps (orange and pink, respectively). (C) Phylogenetic tree of LIM Homeobox TFs, with mapped protein domain architectures. (D) Phylogenetic tree of CBP/p300 TFs based on the HAT/KAT11 domain, with mapped consensus protein domain architectures. (E) Phylogeny of type IV collagen genes based on the C4 domain. All extant homologs, from Ministeria to animals, have a C4-C4 dual arrangement of filozoan origin (reflected in the phylogeny by two parallel clades representing the first and second domains within each gene). Ministeria (orange) and human (blue) homologs are highlighted. In panels C, D, and E, bold branches represent unicellular holozoan genes and are color-coded by taxonomic assignment. All trees are Bayesian inferences (BI). Protein domain architectures and statistical supports (BPP/UFBS) are shown for selected nodes (see Figure 9—figure supplement 1 for the complete BI and ML trees with statistical supports). Figure 7—source data 1 and 2.
DOI: http://dx.doi.org/10.7554/eLife.26036.034
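The co-occurrence matrices in panels A and B can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `cooccurrence_matrix`, the input format (one list of domain names per protein), and the toy architectures are assumptions made for illustration only. The sketch only counts pairwise domain co-occurrence; the Spearman-distance and Ward clustering steps mentioned in the legend are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_matrix(architectures):
    """Count, for every pair of domains, how many proteins contain both.

    `architectures` is a hypothetical input format: one list of domain
    names per protein. Returns (domains, matrix), where domains is the
    sorted list of domain names and matrix[i][j] is the number of
    proteins containing both domains[i] and domains[j]. The diagonal
    holds the number of proteins containing each domain.
    """
    domain_counts = Counter()
    pair_counts = Counter()
    for arch in architectures:
        # Repeated domains within one protein are counted once.
        present = sorted(set(arch))
        domain_counts.update(present)
        pair_counts.update(combinations(present, 2))

    domains = sorted(domain_counts)
    index = {name: i for i, name in enumerate(domains)}
    size = len(domains)
    matrix = [[0] * size for _ in range(size)]

    for name, count in domain_counts.items():
        matrix[index[name]][index[name]] = count
    for (a, b), count in pair_counts.items():
        # Fill both halves so the matrix is symmetric.
        matrix[index[a]][index[b]] = count
        matrix[index[b]][index[a]] = count
    return domains, matrix


# Toy architectures (illustrative only, not data from the study).
toy_architectures = [
    ["LIM", "LIM", "Homeobox"],
    ["LIM", "Homeobox"],
    ["Collagen", "C4", "C4"],
]
domains, matrix = cooccurrence_matrix(toy_architectures)
```

In this toy input the two groups of domains never co-occur, so the matrix is block-diagonal, which is the pattern the legend describes as isolated domain communities. A domain shared between both groups would add off-diagonal counts, as described for the promiscuous ECM domains.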