Fig. 1. Variability of upstream AUG (uAUG) prevalence among eukaryotes and evolutionary driving forces.
a Overview of the 216 eukaryotes analyzed in this study. The left panel shows the cladogram of the 216 eukaryotes. The number of species in each clade is shown in brackets. The middle panel shows the total number of protein-coding genes in 35 representative species. Genes with an annotated 5′ untranslated region (5′ UTR+) are colored by clade, and those without 5′ UTR annotation (5′ UTR-) are shown in gray. The absence of annotated 5′ UTRs for many genes in less-studied organisms presumably reflects incomplete annotation. The right panel shows the ratio of the observed number of uAUGs to the expected number of uAUGs (O/E ratio) in the 35 species. The error bars indicate the 95% confidence interval of the O/E ratio. b O/E ratios of uAUGs in sex chromosome (X or Z) genes (Sex, blue) and autosomal genes (Auto, red) in humans, mice, opossums, flies, and chickens. n = 1000 permutation replicates for each category of genes in each species. Center point, median; error bars, 95% confidence intervals. P values were obtained by two-sided Wilcoxon signed-rank tests, and no correction for multiple testing was made. c Relationship between the effective population size (Ne) and the O/E ratio of uORFs among 14 animals. The blue line indicates the local polynomial regression fit of the O/E ratio against Ne, and the gray band indicates the standard error of the fit. Spearman’s correlation (ρ) between Ne and the O/E ratio and the two-sided P value are shown in the plot. d Relationship between the genome-wide median number of nonsynonymous changes per nonsynonymous site over the number of synonymous changes per synonymous site (ω) of coding sequences (CDSs) and the O/E ratio of uORFs among 56 animals. The blue line indicates the local polynomial regression fit of the O/E ratio against ω, and the gray band indicates the standard error of the fit.
Both Spearman’s correlation and the significance of the two-sided phylogenetic independent contrast (PIC) between ω and the O/E ratio (P_PIC) are shown. Source data are provided as a Source Data file.