2022 Nov 15;25(12):105594. doi: 10.1016/j.isci.2022.105594

Figure 2.

Recoding the data improves accuracy when the model fails to fit the amino acid alignment

(A) Accuracy obtained with amino acid and recoded data as models that account for increasing across-site compositional heterogeneity are used.

(B) Table summarizing the Total Accuracy (TA) for amino acids and recoded data under each model. TA is calculated (from the values in A) as the percentage of accurate trees (see STAR methods) under both Porifera- and Ctenophora-sister.

(C) Change in the fit of the model to the data (expressed as Z-scores, estimated using PPA-Div) as models that account for increasing across-site compositional heterogeneity are used. Orange: amino acid datasets; blue: recoded datasets.

(D) Correlation between the difference in Z-scores achieved by each considered model on the amino acid and recoded datasets (δPPA-Div) and the difference in TA achieved before and after recoding (δTA). See Figures S4–S7 for sensitivity tests showing that our conclusions would not have changed if we had used Maximum Likelihood instead of Bayesian analyses, run our Bayesian analyses for 8,000 more generations (convergence was achieved before 1,000 cycles), used a more stringent threshold to define success (PP = 0.95 instead of PP = 0.5), or used alternative recoding schemes (SR6 and KGB6; see STAR methods for details) instead of Dayhoff-6.
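
The quantities in panels B to D can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the pooling of both hypotheses into a single percentage, the Z-score sign convention (observed minus simulated mean, divided by the simulated standard deviation), and the use of a Pearson coefficient are all assumptions made here for illustration, and any numbers passed to these functions would be hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def total_accuracy(acc_porifera, n_porifera, acc_ctenophora, n_ctenophora):
    """Percentage of accurate trees pooled over both simulated hypotheses.

    Assumption: TA pools the Porifera-sister and Ctenophora-sister
    simulations into one percentage.
    """
    total = n_porifera + n_ctenophora
    if total == 0:
        raise ValueError("no trees to score")
    return 100.0 * (acc_porifera + acc_ctenophora) / total


def z_score(observed, simulated):
    """Z-score of an observed statistic against posterior predictive values.

    Assumption: Z = (observed - mean(simulated)) / sd(simulated), using the
    sample standard deviation.
    """
    return (observed - mean(simulated)) / stdev(simulated)


def pearson_r(xs, ys):
    """Pearson correlation between paired values, e.g. delta-Z and delta-TA."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Under these assumptions, δTA for a given model would be the difference between `total_accuracy` on the recoded and amino acid results, δPPA-Div the difference between the two corresponding `z_score` values, and panel D would correspond to `pearson_r` applied to those paired differences across models.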