(A) Phylogenetic distribution of the NMD molecular toolkit across eukaryotes, as defined in Whelan et al. (2015), with a focus on unicellular holozoans and selected metazoans. The analysis includes 12 gene families: the core regulatory factors Upf1, Upf2 and Upf3 (also Smg2-4); the accessory proteins Smg1, Smg5/6/7 and Smg8/9; the release factors eRF1 and eRF3; and the exon junction complex (EJC) proteins eIF4A3, Y14, Magoh and MLN51. Extant species are color-coded by taxonomic assignment. Black dots indicate reconstructed LCAs within the line of descent between Metazoa and the LECA, whereas gray dots indicate LCAs not affiliated with Metazoa. Red hollow circles indicate that a specific gene is absent from a given animal genome but is nevertheless present in a close relative in the same lineage (e.g., poriferans other than
Amphimedon). See Methods for details on ortholog identification. Complete survey at the species level available as
Figure 5—source data 3. Note that the core NMD toolkit is conserved in most post-LECA LCAs, in both the animal and the ichthyophonid ancestry. Secondary losses affect extant taxa:
Corallochytrium lacks a complete EJC (only eIF4A3 is conserved) as well as homologs of Smg5/6/7 and Smg8/9.
(B) Phylogenetic distribution of the SR splicing factors involved in alternative splicing determination across eukaryotes, as defined in
Plass et al. (2008), with a focus on unicellular holozoans and selected metazoans. The analysis focuses on the following RNA-binding genes: SRP20/9G8 (human paralogs SRSF3/7), ASF (human paralogs SRSF1/9), SRP2 (human paralogs SRSF4/5/6), SRP1 (human paralogs SRSF2/8) and TRA2 (human paralogs TRA2A/B). Extant species are color-coded by taxonomic assignment. Black dots indicate reconstructed LCAs within the line of descent between Metazoa and the LECA, whereas gray dots indicate LCAs not affiliated with Metazoa. See Methods for details on ortholog identification. Complete survey at the species level available as
Figure 5—source data 3. Note that the complement of SR genes is conserved in most post-LECA LCAs, including Opisthokonta, Holozoa and Metazoa. Secondary losses are, however, frequent in some lineages: SRSF1/9 is lost in Fungi, Teretosporea and Filasterea;
Corallochytrium lacks all canonical SR genes except TRA2 and a fragmentary SRSF4/5/6; choanoflagellates have lost SRSF4/5/6, etc.
(C) Counts of RNA-binding protein domains in extant eukaryotic genomes. Unicellular holozoans have a rich complement of RNA-binding proteins (domains per species: average 127.8; median 120), albeit a less abundant one than that of metazoans (domains per species: average 225.3; median 187).