Figure 1.
Recoding data
The six-bin Dayhoff-6 recoding scheme,33 see STAR methods, is the most widely used recoding strategy, and is the one primarily tested in this study. Dayhoff-6 recoding partitions amino acids into six differently sized bins (see STAR methods), based on how frequently they are expected to exchange with each other.
(A) The bins of the Dayhoff-6 scheme and the biochemical properties of the amino acids in each bin.
(B) An exemplar amino acid dataset and its Dayhoff-6 recoded representation. Dayhoff-6 recoding is achieved by replacing, in multiple sequence alignments, one letter amino acid codes with one letter codes representing the bin where the considered amino acid is clustered.
