2019 Jun 18;36(10):2340–2351. doi: 10.1093/molbev/msz142

Fig. 1.

A schematic of our graph-based filtering method. On the left is the original MSA. First, the unaligned sequences are used to calculate posterior probabilities (PPs) of shared homology between pairs of residues using a pair-HMM. Our method then considers each column in turn (middle left), with the residues arranged in a circular ordering and the gaps set aside. The darkness of the lines linking the residues represents the relative PPs of the residues being homologous. The aim of our method is to break that residue graph apart in a meaningful manner. We use a heuristic form of agglomerative clustering with an appropriate cutoff to identify the groups of putatively homologous residues (middle right). Based on these homologous clusters, we propose two filtering schemes. The first is partial filtering (bottom right), which selects the largest cluster of residues and filters out the remaining residues, resulting in a partial column in the alignment. The second is divvying (top right), which splits each cluster into its own new column in the MSA. We insert a “static” character “=” for sequences with a known residue in another column to represent missing data not arising as the result of an insertion or deletion event. For both partial filtering and divvying, the set-aside gaps are restored to the MSA in all relevant columns.
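The per-column procedure described in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: single-linkage grouping with a union-find stands in for the heuristic clustering, the function names (`cluster_column`, `partial_filter`, `divvy`) are hypothetical, and the PP values and the 0.8 cutoff are invented for illustration. The sketch also assumes that residues removed by partial filtering are marked with "=", which the caption states only for divvying.

```python
from itertools import combinations

GAP = "-"      # indel gap, set aside during clustering and restored afterwards
STATIC = "="   # missing data that does not arise from an indel event


def cluster_column(column, pp, cutoff):
    """Group the non-gap rows of one column.

    Assumption: two rows are linked when their PP is >= cutoff, and clusters
    are the connected components (single linkage), a stand-in for the
    heuristic clustering described in the caption.
    `pp` maps (i, j) with i < j to a posterior probability.
    """
    rows = [i for i, c in enumerate(column) if c != GAP]
    parent = {i: i for i in rows}

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(rows, 2):
        if pp.get((i, j), 0.0) >= cutoff:
            parent[find(j)] = find(i)

    clusters = {}
    for i in rows:
        clusters.setdefault(find(i), []).append(i)
    # Largest cluster first; ties broken by lowest row index.
    return sorted(clusters.values(), key=lambda c: (-len(c), c[0]))


def partial_filter(column, clusters):
    """Keep only the largest cluster; mask other residues (assumed "=")."""
    keep = set(clusters[0]) if clusters else set()
    return "".join(
        c if c == GAP or i in keep else STATIC for i, c in enumerate(column)
    )


def divvy(column, clusters):
    """Split one column into one new column per cluster."""
    new_columns = []
    for cluster in clusters:
        members = set(cluster)
        new_columns.append("".join(
            c if c == GAP or i in members else STATIC
            for i, c in enumerate(column)
        ))
    return new_columns


# Illustrative data only: rows 0-2 pair strongly, rows 3-4 pair strongly.
column = "AAAGG-"
pp = {(0, 1): 0.95, (1, 2): 0.90, (2, 3): 0.20, (3, 4): 0.85}
clusters = cluster_column(column, pp, cutoff=0.8)

print(partial_filter(column, clusters))  # AAA==-
print(divvy(column, clusters))           # ['AAA==-', '===GG-']
```

In this toy column, the gap in the last row survives both schemes unchanged, while the two residues that fall outside the largest cluster are either masked (partial filtering) or moved into a column of their own (divvying).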