Figure 4.
Properties of the identified novel genes. (A) Heatmap of the Spearman rank correlation (rho) matrix computed from the expression estimates of the 368 novel genes expressed (i.e., with a log expression estimate >0) in at least one sample. The dendrogram was computed using complete-linkage clustering with distance specified as one minus the correlation coefficient. (B) Violin plots and overlaid box plots of sequence conservation (UCSC phastCons 100) values for known long non-coding (lnc)RNA, novel non-coding genes, novel potentially coding genes and coding genes annotated in Ensembl 75. The phastCons scores were obtained from multiple alignment of the human (hg19) sequences to the sequences of 99 other vertebrate species. (C) Violin plots and overlaid box plots of expression estimates (expressed as log2(μ+1), where μ is the real scale expression estimate) of known lncRNA, novel non-coding genes, novel potentially coding genes and coding genes annotated in Ensembl 75. (D) Violin plots and overlaid box plots of the expression specificity of known lncRNA, novel non-coding genes, novel potentially coding genes and coding genes annotated in Ensembl 75. (B-D) Pairwise comparisons for which the Wilcoxon signed-rank test yielded P<0.05 following Bonferroni adjustment are highlighted. Abbreviations as in Figure 1.
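
The clustering described for panel (A) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used to produce the figure: the caption does not say which software was used or whether the correlation matrix is gene-by-gene or sample-by-sample, so the function name `cluster_by_spearman` and the choice to correlate the rows of the input matrix are hypothetical. The sketch assumes SciPy and NumPy are available and that no row is constant (a constant row has an undefined Spearman correlation).

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr


def cluster_by_spearman(expr):
    """Cluster the rows of `expr` (hypothetical layout: items x observations).

    Returns the Spearman correlation matrix, the complete-linkage tree,
    and the leaf order that could be used to arrange a heatmap.
    Requires at least three rows, so that spearmanr returns a matrix.
    """
    expr = np.asarray(expr, dtype=float)
    # spearmanr treats columns as variables, so transpose to correlate rows.
    rho, _ = spearmanr(expr.T)
    rho = np.asarray(rho)
    # Distance is one minus the correlation coefficient, as in the caption.
    dist = 1.0 - rho
    # Remove floating-point asymmetry and force an exact zero diagonal.
    dist = (dist + dist.T) / 2.0
    np.fill_diagonal(dist, 0.0)
    # linkage expects a condensed (upper-triangle) distance vector.
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method="complete")
    return rho, tree, leaves_list(tree)
```

The returned leaf order can be used to reorder both axes of the correlation matrix before plotting, for example `rho[np.ix_(order, order)]`.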