2020 Feb 3;86(4):e02051-19. doi: 10.1128/AEM.02051-19

TABLE 1.

Predictive performance of the statistical fitness when different groups of homologous sequences are used in the starting MSA

| Designation | Length (no. of aa)^a | No. of sequences | Effective no. of sequences^b (±SD) | AUC (±SE)^c, epistatic model | AUC (±SE)^c, independent model |
|---|---|---|---|---|---|
| All_lax-29k | 141–199 | 29,498 | 6,352 | 0.894 | 0.807 |
| All_stringent-23k | 155–186 | 23,176 | 4,809 | 0.856 | 0.857 |
| All_lax-3k | | 3,037 | 1,420 ± 33 | 0.815 ± 0.013 | 0.744 ± 0.026 |
| Bacteria_lax-27k | 137–200 | 26,950 | 5,565 | 0.868 | 0.888 |
| Bacteria_stringent-23k | 149–188 | 23,194 | 4,595 | 0.851 | 0.871 |
| Bacteria_lax-3k | | 3,037 | 1,309 ± 39 | 0.839 ± 0.006 | 0.782 ± 0.013 |
| Firmicutes_lax-3k | 133–201 | 3,037 | 940 | 0.852 | 0.795 |
| Firmicutes_stringent-2k | 163–192 | 2,007 | 600 | 0.840 | 0.830 |
| Firmicutes+all_lax-3k | | 3,037 | 1,344 ± 34 | 0.856 ± 0.005 | 0.818 ± 0.014 |
| Firmicutes+Bacteria_lax-3k | | 3,037 | 1,317 ± 30 | 0.851 ± 0.004 | 0.814 ± 0.014 |
| Firmicutes+nonbacteria_lax-2k | | 5,585 | | 0.865 | 0.799 |

^a The acceptable amino acid lengths across the six initial groupings.

^b The effective number is the sum, over all sequences, of the inverse of each sequence's neighborhood size, where the neighborhood size is the number of sequences within 80% identity. SD, standard deviation.

^c Results are presented as mean values from 20 subsamplings from the parent group(s). SE, standard error.
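
The effective number of sequences defined in footnote b can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`fractional_identity`, `effective_number`) are hypothetical, and three details are assumptions the footnote does not settle: identity is computed over all aligned columns (gaps included), the 80% cutoff is inclusive, and each sequence counts itself as a member of its own neighborhood.

```python
from typing import Sequence


def fractional_identity(a: str, b: str) -> float:
    """Fraction of aligned columns in which two equal-length sequences match.

    Assumption: every column counts, including gap characters.
    """
    if len(a) != len(b) or not a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def effective_number(msa: Sequence[str], threshold: float = 0.8) -> float:
    """Sum over sequences of 1 / (neighborhood size).

    The neighborhood size of a sequence is the number of sequences in the
    alignment (itself included, by assumption) whose identity to it is at
    least `threshold` (inclusive cutoff, also an assumption).
    """
    total = 0.0
    for s in msa:
        # Count neighbors with a simple O(N^2) scan; fine for a sketch,
        # too slow for alignments with tens of thousands of sequences.
        size = sum(1 for t in msa if fractional_identity(s, t) >= threshold)
        total += 1.0 / size
    return total


# Toy alignment: three near-identical sequences plus one unrelated sequence.
toy_msa = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIKV", "WWWWWWWWWW"]
print(effective_number(toy_msa))
```

In the toy alignment the first three sequences are each within 80% identity of one another, so each contributes 1/3, and the unrelated sequence contributes 1, giving an effective number of 2 for 4 sequences. This down-weighting of redundant sequences is why the effective numbers in the table are much smaller than the raw sequence counts.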