2022 Apr 8;13:1914. doi: 10.1038/s41467-022-29443-w

Fig. 6. The effect of alignment preprocessing on the ability of representations to reliably decode back to protein sequences.


Box plots (median, upper and lower quartiles, whiskers extending up to 1.5× the interquartile range) show the distribution of reconstruction accuracies for the two subclasses of β-lactamase (A1: n = 14 sequences; A2: n = 13 sequences). Query-centric denotes an alignment from which every column containing a gap in the query sequence of interest has been removed. Reweighted refers to the standard practice of weighting each protein sequence according to its similarity to the other sequences in the alignment. All four cases contain the same protein sequences. A2 sequences have substantially worse representations (lower reconstruction accuracy) when the alignment is centered on a query from the A1 class.