2021 Apr 19;38(8):3397–3414. doi: 10.1093/molbev/msab111

Fig. 4.

(A) Distribution of feature separation scores for the features used to build the inclusive redundancy (RD9) model. To identify features that may contribute to mis-predictions, feature values were compared among 1) nonredundant gene pairs predicted as nonredundant (NR/NR), 2) nonredundant pairs predicted as redundant (NR/RD9), and 3) redundant pairs predicted as redundant (RD9/RD9). Redundant pairs predicted as nonredundant (RD9/NR) were excluded from this analysis because of the small sample size. Using the median value (Med) in each class/predicted class category, we calculated a normalized feature separation score as follows: $(\mathrm{Med}_{\mathrm{NR/RD9}} - \mathrm{Med}_{\mathrm{NR/NR}}) / (\mathrm{Med}_{\mathrm{RD9/RD9}} - \mathrm{Med}_{\mathrm{NR/NR}})$. For each feature, the feature separation score represents the difference in median feature value between correctly and incorrectly predicted nonredundant gene pairs, scaled by the difference between correctly predicted redundant and nonredundant pairs. A score of 0 means that correctly and incorrectly predicted nonredundant pairs had the same median feature value, and a score of 1 means that incorrectly predicted nonredundant pairs had the same median feature value as redundant gene pairs. Close to 20% of the features had a separation score of 1.
(B) Distribution of values for selected features among the three categories of actual and predicted redundancy described in (A). Horizontal bars indicate the median. “Min. dev. expr.” is the minimum number of tissues and developmental stages in which a gene in the pair is differentially expressed. “Recip. (max. b. expr. down)” is the reciprocal of the maximum number of biotic stress conditions in which one or both genes in the pair are downregulated. “Recip. (min. CpG root)” is the reciprocal of the minimum level of CpG methylation in root tissue for genes in the pair. “Recip. (diff. CpG sperm)” is the reciprocal of the difference in CpG methylation level in sperm cells for genes in the pair. These four features had a feature separation score close to 1 and had feature importance scores in the top 10 for the inclusive redundancy model, implicating them in mis-predictions.
(C) Dimensions 1 and 2 of a PCA performed to identify features that differed between correctly and incorrectly predicted nonredundant pairs. Dimensions 1 and 2 explain 18.1% and 10.0% of the variance, respectively. The top 24 features contributing to Dimension 1 were related to CpG methylation levels (supplementary table S5, Supplementary Material online).
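
As a rough illustration of the feature separation score defined in (A), the following Python sketch computes it for a single feature from three lists of feature values, one per actual/predicted category. The function name, the argument names, the handling of a zero denominator, and the example numbers are all assumptions made for illustration; none of them come from the authors' code or data.

```python
from statistics import median


def feature_separation_score(nr_as_nr, nr_as_rd9, rd9_as_rd9):
    """Normalized feature separation score for one feature.

    Each argument holds that feature's values for the gene pairs in one
    category: NR/NR, NR/RD9, and RD9/RD9 (actual class / predicted class).
    """
    med_nr_nr = median(nr_as_nr)
    med_nr_rd9 = median(nr_as_rd9)
    med_rd9_rd9 = median(rd9_as_rd9)

    denominator = med_rd9_rd9 - med_nr_nr
    if denominator == 0:
        # Assumption: the score is undefined when the two correctly
        # predicted categories share a median, so return NaN here.
        return float("nan")
    return (med_nr_rd9 - med_nr_nr) / denominator


# Made-up values: the mispredicted NR pairs sit halfway between the
# correctly predicted NR pairs and the redundant pairs.
score = feature_separation_score([1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [5.0, 6.0, 7.0])
print(score)  # 0.5
```

With these made-up inputs the medians are 2, 4, and 6, so the score is (4 - 2)/(6 - 2) = 0.5. A feature whose NR/RD9 median equals its RD9/RD9 median would score 1, matching the interpretation given in (A).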