Table 2. Real Data Sets Used for Testing CodonPhyML.
| ID | Protein-Coding Sequence | Organism Range | No. Taxa | Sequence Length (Nucleotides) | Phylogenetic Signal^a | Average Branch Length | RF Distance^b (Codon vs. AA) | RF Distance^b (Codon vs. NT) | RF Distance^b (AA vs. NT) | Best-Fitting Model^c |
|---|---|---|---|---|---|---|---|---|---|---|
| R1 | Caudal-like protein activation region (PF04731) | Metazoa | 8 | 567 | 0.42 | 0.38 | 0.00 | 0.15 | 0.15 | LG + F + Γ |
| R2 | PetM family of cytochrome b6f complex subunit 7 (PF08041) | Bacteria and Eukaryota | 12 | 135 | 0.26 | 0.39 | 0.10 | 0.67 | 0.67 | WAG + F + Γ, LG + F + Γ |
| R3 | DKCLD (NUC011) domain (PF08068) | Archaea and Eukaryota | 18 | 177 | 0.24 | 0.24 | 0.42 | 0.67 | 0.67 | LG + F + Γ |
| R4 | 6-phosphofructo-2-kinase (PF01591) | Eukaryota | 11 | 726 | 0.17 | 0.57 | 0.00 | 0.32 | 0.32 | LG + F + Γ |
| R5 | Intermediate filament head (DNA binding) region (PF04732) | Eukaryota | 30 | 315 | 0.38 | 0.46 | 0.42 | 0.39 | 0.49 | WAG + F + Γ, LG + F + Γ |
| R6 | Zinc finger, ZZ type (PF00569) | Eukaryota | 8 | 138 | 0.24 | 0.62 | 0.15 | 0.77 | 0.77 | WAG + F, WAG + F + Γ |
| R7 | Protein of unknown function (PF08004) | Archaea | 6 | 393 | 0.13 | 1.29 | 0.00 | 0.00 | 0.00 | ECM07 + ωM0 + κ + Γ |
| R8 | Repeated sequence found in lipoprotein LPP (PF04728) | Bacteria | 9 | 33 | 0.41 | 0.32 | 0.53 | 0.13 | 0.53 | ECM07 + ωM0 + κ + Γ, ECM07 + ωM5 + κ + Γ |
| R9 | Myogenic basic domain (PF01586) | Metazoa | 6 | 345 | 0.58 | 0.46 | 0.00 | 0.00 | 0.00 | WAG + F + Γ, LG + F + Γ |
| R10 | 7 transmembrane receptor, rhodopsin family (PF00001) | Metazoa | 64 | 819 | 0.23 | 0.61 | 0.58 | 0.45 | 0.64 | LG + F + Γ |
| R11 | Homeobox domain (PF00046) | Eukaryota | 179 | 174 | 0.34 | 0.24 | 0.62 | 0.61 | 0.73 | LG + Γ |
| R12 | Protein of unknown function (PF01973) | Archaea and Bacteria | 23 | 522 | 0.30 | 0.47 | 0.42 | 0.23 | 0.42 | WAG + F + Γ, LG + F + Γ |
| R13 | EPH receptor A4 | Mammalia | 21 | 3,141 | 0.18 | 0.09 | 0.51 | 0.05 | 0.56 | M0 + Γ |
| R14 | Transcription factor 20 | Mammalia | 21 | 6,081 | 0.21 | 0.09 | 0.41 | 0.10 | 0.41 | M3 |
| R15 | WD repeat domain 23 | Mammalia | 21 | 1,677 | 0.20 | 0.07 | 0.51 | 0.31 | 0.56 | M3 |
| R16 | Tu translation elongation factor, mitochondrial | Mammalia | 21 | 1,377 | 0.22 | 0.11 | 0.51 | 0.10 | 0.46 | M3 |
| R17 | Zinc finger protein 641 | Mammalia | 21 | 1,323 | 0.21 | 0.10 | 0.62 | 0.10 | 0.62 | M3 |
| R18 | Nucleoporin like 2 | Mammalia | 21 | 1,380 | 0.22 | 0.15 | 0.26 | 0.10 | 0.31 | M5 |
| R19 | Gm527 | Mammalia | 21 | 930 | 0.16 | 0.05 | 0.72 | 0.31 | 0.72 | M0 + Γ |
| R20 | Integrin β1 binding protein 1 | Mammalia | 21 | 600 | 0.27 | 0.11 | 0.87 | 0.26 | 0.87 | M3 |
| R21 | GALA (type III effectors)^d | Bacterium | 426 | 81 | 0.84 | 0.16 | 0.41 | 0.30 | 0.39 | ECM07 + ωM5 + κ |
| R22 | Lady bird early (lbe)^e | Drosophila | 73 | 429 | 0.37 | 0.005 | 0.97 | 0.45 | 0.97 | M0 + Γ |
| R23 | Lady bird late (lbl)^e | Drosophila | 72 | 420 | 0.31 | 0.002 | 0.45 | 0.38 | 0.10 | M0 + Γ |
Note: Data sets R1–R12 are from PANDIT (Whelan et al. 2006), with detailed annotations available from PANDITplus (Dimitrieva and Anisimova 2010); their Pfam IDs (Punta et al. 2012) are given in parentheses. Data sets R13–R20 are OMA orthologs (Altenhoff et al. 2011).
^a The phylogenetic signal is the proportion of the total tree length taken up by internal branches (Phillips et al. 2001).
^b Pairwise normalized Robinson–Foulds (RF) distances between the phylogenies inferred under the best-fitting codon, amino acid (AA), and nucleotide (NT) models.
^c Models whose AICc lies within two units of the minimum AICc; where several models qualify, all are listed.
^d Data from Kajava et al. (2008).
^e Data from Balakirev et al. (2011).
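Footnote a defines the phylogenetic signal as internal tree length divided by total tree length. The Python sketch below illustrates that ratio on a small made-up tree; the `Node` class, the helper names, and the branch lengths are hypothetical and are not taken from CodonPhyML. It assumes the unrooted tree is drawn with a trifurcating root, so every branch whose lower end is an internal node counts as an internal branch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; `length` is the branch leading to this node."""
    length: float = 0.0
    children: list = field(default_factory=list)


def branch_lengths(root: Node) -> tuple:
    """Return (internal, total) branch length, skipping the root's own branch."""
    internal = total = 0.0
    stack = list(root.children)
    while stack:
        node = stack.pop()
        total += node.length
        if node.children:  # this branch ends in an internal node
            internal += node.length
            stack.extend(node.children)
    return internal, total


def phylogenetic_signal(root: Node) -> float:
    """Proportion of total tree length on internal branches (footnote a)."""
    internal, total = branch_lengths(root)
    return internal / total


def leaf(length: float) -> Node:
    return Node(length)


# Hypothetical unrooted tree ((A,B),C,(D,E)) drawn with a trifurcating root.
tree = Node(children=[
    Node(0.2, [leaf(0.1), leaf(0.1)]),
    leaf(0.3),
    Node(0.2, [leaf(0.05), leaf(0.05)]),
])
print(round(phylogenetic_signal(tree), 3))  # prints 0.4
```

Here the two internal branches (0.2 each) account for 0.4 of a total tree length of 1.0; a star tree, which has no internal branches, would score 0.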
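Footnote b compares trees with the normalized Robinson–Foulds distance. A minimal sketch, assuming unrooted, fully resolved trees on the same taxon set written as nested Python tuples (a hypothetical encoding): each tree is reduced to its set of nontrivial bipartitions (splits), and the size of the symmetric difference is divided by its maximum possible value, 2(n − 3). That this standard normalization matches the one used for the table is an assumption.

```python
def leaf_set(tree) -> frozenset:
    """All taxon names below `tree` (a leaf name or a tuple of subtrees)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree) -> set:
    """Nontrivial bipartitions of an unrooted tree. Each split is stored as the
    side that excludes a fixed reference taxon, so equal splits compare equal
    however the tree happens to be rooted in the tuple encoding."""
    taxa = leaf_set(tree)
    ref = min(taxa)
    result = set()

    def walk(node) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = below if ref not in below else taxa - below
        if 2 <= len(side) <= len(taxa) - 2:  # both sides need at least 2 taxa
            result.add(side)
        return below

    walk(tree)
    return result


def normalized_rf(t1, t2) -> float:
    """Symmetric-difference count of splits divided by its maximum, 2(n - 3)."""
    taxa = leaf_set(t1)
    if taxa != leaf_set(t2):
        raise ValueError("trees must share the same taxa")
    if len(taxa) < 4:
        return 0.0  # with fewer than 4 taxa there is only one unrooted topology
    return len(splits(t1) ^ splits(t2)) / (2 * (len(taxa) - 3))


t1 = (("A", "B"), "C", ("D", "E"))
t2 = (("A", "C"), "B", ("D", "E"))
t3 = ("A", ("B", ("C", ("D", "E"))))  # same unrooted topology as t1
print(normalized_rf(t1, t2))  # prints 0.5
print(normalized_rf(t1, t3))  # prints 0.0
```

Storing each split relative to a reference taxon is what lets `t1` and `t3`, which encode the same unrooted tree with different rootings, compare as identical.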
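Footnote c lists every model whose AICc is within two units of the minimum. The sketch below applies that rule using the standard small-sample correction AICc = AIC + 2k(k + 1)/(n − k − 1), with AIC = 2k − 2 ln L. The model labels, log-likelihoods, parameter counts, and sample size are hypothetical, and taking n to be the number of alignment columns is an assumption; the table does not say how n or k were counted.

```python
def aicc(log_lik: float, k: int, n: int) -> float:
    """Corrected Akaike information criterion for k free parameters, sample size n."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = 2 * k - 2 * log_lik
    return aic + 2 * k * (k + 1) / (n - k - 1)


def best_fitting(models: dict, n: int, delta: float = 2.0) -> list:
    """Names of models within `delta` AICc units of the minimum, best first.

    `models` maps a model name to (log-likelihood, number of free parameters).
    """
    scores = {name: aicc(lnl, k, n) for name, (lnl, k) in models.items()}
    best = min(scores.values())
    within = [name for name, score in scores.items() if score - best <= delta]
    return sorted(within, key=scores.get)


# Hypothetical fits; n = 200 is assumed to be the number of alignment columns.
fits = {
    "model_A": (-1000.0, 10),
    "model_B": (-999.0, 11),
    "model_C": (-995.0, 20),
}
print(best_fitting(fits, n=200))  # prints ['model_A', 'model_B']
```

In this toy case `model_C` has the highest likelihood but pays for its extra parameters, ending about 13.5 units above the minimum, while `model_B` trails `model_A` by only about 0.24 units, so both are reported, as in the multi-model entries of the table.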