2013 Feb 23;30(6):1270–1280. doi: 10.1093/molbev/mst034

Table 2.

Real Data Sets Used for Testing CodonPhyML.

| ID | Protein-Coding Sequence | Organism Range | No. Taxa | Sequence Length (Nucleotides) | Phylog. Signal^a | Average Branch Length | RF Distance^b: Codon vs. AA | RF Distance^b: Codon vs. NT | RF Distance^b: AA vs. NT | Best-Fitting Model^c |
|---|---|---|---|---|---|---|---|---|---|---|
| R1 | Caudal-like protein activation region (PF04731) | Metazoa | 8 | 567 | 0.42 | 0.38 | 0.00 | 0.15 | 0.15 | LG + F + Γ |
| R2 | PetM family of cytochrome b6f complex subunit 7 (PF08041) | Bacteria and Eukaryota | 12 | 135 | 0.26 | 0.39 | 0.10 | 0.67 | 0.67 | WAG + F + Γ, LG + F + Γ |
| R3 | DKCLD (NUC011) domain (PF08068) | Archaea and Eukaryota | 18 | 177 | 0.24 | 0.24 | 0.42 | 0.67 | 0.67 | LG + F + Γ |
| R4 | 6-phosphofructo-2-kinase (PF01591) | Eukaryota | 11 | 726 | 0.17 | 0.57 | 0.00 | 0.32 | 0.32 | LG + F + Γ |
| R5 | Intermediate filament head (DNA binding) region (PF04732) | Eukaryota | 30 | 315 | 0.38 | 0.46 | 0.42 | 0.39 | 0.49 | WAG + F + Γ, LG + F + Γ |
| R6 | Zinc finger, ZZ type (PF00569) | Eukaryota | 8 | 138 | 0.24 | 0.62 | 0.15 | 0.77 | 0.77 | WAG + F, WAG + F + Γ |
| R7 | Protein of unknown function (PF08004) | Archaea | 6 | 393 | 0.13 | 1.29 | 0.00 | 0.00 | 0.00 | ECM07 + ωM0 + κ + Γ |
| R8 | Repeated sequence found in lipoprotein LPP (PF04728) | Bacteria | 9 | 33 | 0.41 | 0.32 | 0.53 | 0.13 | 0.53 | ECM07 + ωM0 + κ + Γ, ECM07 + ωM5 + κ + Γ |
| R9 | Myogenic basic domain (PF01586) | Metazoa | 6 | 345 | 0.58 | 0.46 | 0.00 | 0.00 | 0.00 | WAG + F + Γ, LG + F + Γ |
| R10 | 7 transmembrane receptor, rhodopsin family (PF00001) | Metazoa | 64 | 819 | 0.23 | 0.61 | 0.58 | 0.45 | 0.64 | LG + F + Γ |
| R11 | Homeobox domain (PF00046) | Eukaryota | 179 | 174 | 0.34 | 0.24 | 0.62 | 0.61 | 0.73 | LG + Γ |
| R12 | Protein of unknown function (PF01973) | Archaea and Bacteria | 23 | 522 | 0.30 | 0.47 | 0.42 | 0.23 | 0.42 | WAG + F + Γ, LG + F + Γ |
| R13 | EPH receptor A4 | Mammalia | 21 | 3,141 | 0.18 | 0.09 | 0.51 | 0.05 | 0.56 | M0 + Γ |
| R14 | Transcription factor 20 | Mammalia | 21 | 6,081 | 0.21 | 0.09 | 0.41 | 0.10 | 0.41 | M3 |
| R15 | WD repeat domain 23 | Mammalia | 21 | 1,677 | 0.20 | 0.07 | 0.51 | 0.31 | 0.56 | M3 |
| R16 | Tu translation elongation factor, mitochondrial | Mammalia | 21 | 1,377 | 0.22 | 0.11 | 0.51 | 0.10 | 0.46 | M3 |
| R17 | Zinc finger protein 641 | Mammalia | 21 | 1,323 | 0.21 | 0.10 | 0.62 | 0.10 | 0.62 | M3 |
| R18 | Nucleoporin like 2 | Mammalia | 21 | 1,380 | 0.22 | 0.15 | 0.26 | 0.10 | 0.31 | M5 |
| R19 | Gm527 | Mammalia | 21 | 930 | 0.16 | 0.05 | 0.72 | 0.31 | 0.72 | M0 + Γ |
| R20 | Integrin β11 binding protein 1 | Mammalia | 21 | 600 | 0.27 | 0.11 | 0.87 | 0.26 | 0.87 | M3 |
| R21 | GALA (type III effectors)^d | Bacterium | 426 | 81 | 0.84 | 0.16 | 0.41 | 0.30 | 0.39 | ECM07 + ωM5 + κ |
| R22 | Lady bird early (lbe)^e | Drosophila | 73 | 429 | 0.37 | 0.005 | 0.97 | 0.45 | 0.97 | M0 + Γ |
| R23 | Lady bird early (lbl)^e | Drosophila | 72 | 420 | 0.31 | 0.002 | 0.45 | 0.38 | 0.10 | M0 + Γ |

Note. Data sets R1–R12 are from PANDIT (Whelan et al. 2006); detailed annotations are available from PANDITplus (Dimitrieva and Anisimova 2010), and the respective Pfam IDs (Punta et al. 2012) are shown in parentheses. Data sets R13–R20 are OMA orthologs (Altenhoff et al. 2011).

^a The phylogenetic signal is the proportion of the total tree length that is taken up by internal branches (Phillips et al. 2001).

^b Pairwise normalized Robinson-Foulds (RF) distance between phylogenies inferred with the best-fitting codon, amino acid (AA), and nucleotide (NT) models.

^c Models within two units of the minimum AICc.
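
The three quantities defined in the footnotes above can be illustrated with a minimal Python sketch. It is not the procedure used to produce the table. The function names, the input encodings, and the toy values are all hypothetical. The RF normalization by 2(n − 3) assumes unrooted, fully resolved trees, and the AICc formula is the standard small-sample correction; what counts as the sample size `n_obs` (for example, the number of alignment sites) is left as an assumption.

```python
def phylogenetic_signal(branches):
    """Footnote a: fraction of total tree length on internal branches.

    `branches` is a list of (length, is_internal) pairs (hypothetical format).
    """
    total = sum(length for length, _ in branches)
    internal = sum(length for length, is_internal in branches if is_internal)
    return internal / total


def normalized_rf(splits_a, splits_b, n_taxa):
    """Footnote b: RF distance scaled to the range [0, 1].

    Each tree is given as a set of non-trivial bipartitions, each encoded as
    a frozenset of the taxa on the side that excludes a fixed reference taxon.
    Assumes unrooted binary trees, whose maximum RF distance is 2 * (n_taxa - 3).
    """
    return len(splits_a ^ splits_b) / (2 * (n_taxa - 3))


def aicc(log_likelihood, n_params, n_obs):
    """AIC with the standard small-sample correction term."""
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


def best_fitting(scores, window=2.0):
    """Footnote c: names of models within `window` AICc units of the minimum."""
    best = min(scores.values())
    return sorted(name for name, score in scores.items() if score - best <= window)


# Toy 4-taxon tree: four terminal branches and one internal branch.
toy_branches = [(0.5, False), (0.5, False), (1.0, False), (1.0, False), (1.0, True)]
print(phylogenetic_signal(toy_branches))  # 0.25

# Toy 5-taxon trees ((A,B),C,(D,E)) and ((A,C),B,(D,E)); reference taxon is E.
tree_1 = {frozenset("AB"), frozenset("ABC")}
tree_2 = {frozenset("AC"), frozenset("ABC")}
print(normalized_rf(tree_1, tree_2, n_taxa=5))  # 0.5

# Made-up AICc scores for three placeholder models.
toy_scores = {"model_A": 1000.0, "model_B": 990.0, "model_C": 991.5}
print(best_fitting(toy_scores))  # ['model_B', 'model_C']
```

When several models fall inside the two-unit window, `best_fitting` returns all of them, which mirrors the rows of the table that list more than one best-fitting model.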