(A) Histogram of PhyloCSF scores for C-terminal extensions. Blue: phylogenetically predicted extensions that were confirmed in our datasets. Yellow: unpredicted extensions discovered in our datasets. Gray: global distribution of all potential extensions. The distribution of novel extensions is not substantially different from the global distribution, suggesting that many of these extensions are not phylogenetically conserved beyond D. melanogaster. Source data may be found in supplementary table 2 (at Dryad:
Dunn et al., 2013).
(B) A second Z-curve classifier was trained on 81-nucleotide windows of coding regions and 81-nucleotide windows of distal 3′ UTRs, excluding the last 50 bases of the annotated UTR to remove potential effects of polyadenylation signals on classifier scoring. As in Figure 6B, predicted extensions overlay coding regions, and novel extensions display a significant shift in median from distal 3′ UTRs (p = 3.81 × 10⁻²², Mann–Whitney U test), indicating that the shift identified in Figure 6B is not due to polyadenylation signals.
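
The window selection described in (B) can be illustrated with a minimal sketch. It is not the authors' code: the function name `utr_windows`, its parameters, and the choice of non-overlapping windows tiled from the 5′ end of the UTR are assumptions made for illustration, and the caption does not specify how "distal" regions were delimited.

```python
def utr_windows(utr_seq, window=81, trim_3prime=50, step=81):
    """Return fixed-width windows from a 3' UTR sequence.

    Hypothetical helper: the last `trim_3prime` bases are dropped first
    (to avoid polyadenylation signals), then windows of `window`
    nucleotides are taken every `step` bases. Non-overlapping tiling
    (step == window) is an assumption, not stated in the caption.
    """
    # Drop the 3'-most bases; sequences shorter than the trim yield nothing.
    usable = utr_seq[:max(0, len(utr_seq) - trim_3prime)]
    # Keep only complete windows that fit inside the usable region.
    return [
        usable[start:start + window]
        for start in range(0, len(usable) - window + 1, step)
    ]


if __name__ == "__main__":
    example_utr = "ACGT" * 75  # 300-nt toy sequence, not real data
    for w in utr_windows(example_utr):
        print(len(w), w[:12] + "...")
```

Under these assumptions, no window can overlap the final 50 bases of the annotated UTR, so any score shift that persists cannot come from signals located there.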