(A) A hypothetical situation in which the query protein A is fragmented into subsequences A1 and A2 by the inclusion of a match partner E that is only weakly similar (and perhaps non-homologous) to part of A. The similarity graph shows strong links among the real homologs (A, B, C and D) and a weak link between A2 and E. Subsequent Markov clustering effectively cuts the weak link, yielding one MACHOS that contains both A1 and A2; these are then joined and treated as a single subsequence, which corrects the initial over-fragmentation. (B) The distributions of subsequence lengths before and after the subsequences are clustered into MACHOS at different granularities (indicated by coloured lines), shown with a bin size of 10 residues. Although the underlying data are discrete points, they are drawn as smooth coloured lines to aid visual analysis.
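The situation in panel (A) can be illustrated with a minimal sketch of Markov clustering on a toy similarity graph. This is not the authors' implementation: the node layout, the edge weights (1.0 for strong links, 0.05 for the weak A2-E link), the inflation value, the self-loops and the function name `markov_cluster` are all assumptions chosen for illustration only.

```python
# Toy similarity graph loosely modelled on panel (A); all weights are hypothetical.
NODES = ["A1", "A2", "B", "C", "D", "E"]
EDGES = {
    ("A1", "B"): 1.0, ("A1", "C"): 1.0, ("A1", "D"): 1.0,
    ("A2", "B"): 1.0, ("A2", "C"): 1.0, ("A2", "D"): 1.0,
    ("B", "C"): 1.0, ("B", "D"): 1.0, ("C", "D"): 1.0,
    ("A2", "E"): 0.05,  # weak, possibly non-homologous match
}


def markov_cluster(nodes, edges, inflation=2.0, iterations=50, eps=1e-6):
    """Basic Markov clustering: alternate expansion and inflation steps."""
    n = len(nodes)
    index = {name: i for i, name in enumerate(nodes)}

    # Symmetric weighted adjacency matrix with self-loops (assumed weight 1.0).
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (u, v), w in edges.items():
        m[index[u]][index[v]] = m[index[v]][index[u]] = w

    def normalise(mat):
        # Rescale each column so that it sums to 1 (column-stochastic matrix).
        sums = [sum(mat[i][j] for i in range(n)) for j in range(n)]
        return [[mat[i][j] / sums[j] for j in range(n)] for i in range(n)]

    m = normalise(m)
    for _ in range(iterations):
        # Expansion: square the matrix, letting flow spread along paths.
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power and renormalise, which
        # strengthens strong flows and suppresses weak ones.
        m = normalise([[x ** inflation for x in row] for row in m])

    # Each row that retains flow belongs to an attractor; the columns with
    # non-negligible entries in that row form one cluster.
    clusters = []
    for row in m:
        members = frozenset(nodes[j] for j, x in enumerate(row) if x > eps)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


clusters = markov_cluster(NODES, EDGES)
for cluster in clusters:
    print(sorted(cluster))
```

In this toy graph the weak A2-E link carries too little flow to survive inflation, so E should end up on its own while A1 and A2 fall into the same cluster as B, C and D. A higher `inflation` value generally gives finer clusters, which is one way to think about the different granularities mentioned in panel (B).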