Table 2.
Sample detections in the human X chromosome obtained with TRF at different alignment weights, Sputnik at different mismatch penalties, and Mreps at different resolutions.
| parameter | variation | start | end | divergence | motif | sequence |
|---|---|---|---|---|---|---|
| TRF (alignment scores) | | | | | | |
| 2,7,7 | | 304646 | 304658 | 0 | CTCTC | CTCTCCTCTCCTC |
| | | 304696 | 304713 | 5.55 | TCCTC | TCCTCTTCTCTCCTCTCC |
| | | 305863 | 305872 | 0 | CCTTC | CCTTCCCTTC |
| 2,5,7 | c | 304646 | 304713 | 18.3099 | TCTCC | CTCTCCTCTCCTCCTTCTCCGCTCCCTGCACTGCCCTCCGCTCCCTCCGGTCCTCTTCTCTCCTCTCC |
| | | 305863 | 305872 | 0 | TTCCC | CCTTCCCTTC |
| 2,5,5 | | 304646 | 304713 | 18.0556 | TCTCC | CTCTCCTCTCCTCCTTCTCCGCTCCCTGCACTGCCCTCCGCTCCCTCCGGTCCTCTTCTCTCCTCTCC |
| | e | 305836 | 305872 | 17.9487 | TTCCC | CCCTCTCCACTTCCTTCTCTTCCACCTCCTTCCCTTC |
| 2,3,5 | e | 304643 | 304713 | 18.9189 | CTCCT | CTGCTCTCCTCTCCTCCTTCTCCGCTCCCTGCACTGCCCTCCGCTCCCTCCGGTCCTCTTCTCTCCTCTCC |
| | n | 305765 | 305800 | 25.641 | CCA | CCACACCACCTCTGACGCCCACCACAGCCCCCCACC |
| | | 305836 | 305872 | 17.9487 | CCCTT | CCCTCTCCACTTCCTTCTCTTCCACCTCCTTCCCTTC |
| Sputnik (mismatch penalty) | | | | | | |
| -10 | | 552928 | 552935 | 0 | AG | GAGAGAGA |
| | | 552939 | 552948 | 0 | AG | GAGAGAGAGA |
| | | 552954 | 552963 | 0 | AAGAG | AAGAGAAGAG |
| | | 552964 | 552975 | 0 | AG | AGAGAGAGAGAG |
| -6 | | 552928 | 552935 | 0 | AG | GAGAGAGA |
| | | 552939 | 552948 | 0 | AG | GAGAGAGAGA |
| | c | 552954 | 552975 | 9.09 | AAGAG | AAGAGAAGAGAGAGAGAGAGAG |
| -5 | c | 552928 | 552948 | 9.52 | AG | GAGAGAGAAAGGAGAGAGAGA |
| | | 552954 | 552975 | 9.09 | AAGAG | AAGAGAAGAGAGAGAGAGAGAG |
| Mreps (resolution) | | | | | | |
| 1 | | 119591 | 119610 | 20 | AAT | ACAAAAAATAATAATTATAA |
| | | 119611 | 119628 | 5.56 | AAAAAT | ATAAATAAAAATAAAAAT |
| 2 | e | 119591 | 119615 | 24 | AAT | ACAAAAAATAATAATTATAAATAAA |
| | | 119611 | 119628 | 5.56 | AAAAAT | ATAAATAAAAATAAAAAT |
| 3 | c | 119591 | 119638 | 33.33 | A | ACAAAAAATAATAATTATAAATAAATAAAAATAAAAATTCAACTGTAA |
| 6 | e | 119590 | 119638 | 34.69 | A | TACAAAAAATAATAATTATAAATAAATAAAAATAAAAATTCAACTGTAA |
The threshold alignment score of TRF was set to 20, and the alignment weights varied from {2,7,7} to {2,3,5}. The Sputnik mismatch penalty was set to -10, -6, and -5. The Mreps resolution value varied from 1 to 6. For each detection, we report the start and end positions, the divergence from a pure repeat, the motif, and the actual sequence. Variation of detection when reducing weights is marked as follows: n: newly detected sequence; e: enlargement of a previous sequence; c: concatenation of previous sequences. New nucleotides detected by enlarging or concatenating previous sequences are underlined.
The sequence at position 305765 is an example of a microsatellite detected only at low values of the TRF alignment weights. It is not detected until the alignment weights are reduced to {2,3,5}, because with larger weights the match bonuses cannot compensate for the imperfection penalties. Reducing alignment weights may also enlarge detections, as shown for alignment weights {2,5,5} at position 305836. A succession of close errors (in boldface) decreases the alignment score, which falls under the threshold score for weight values larger than {2,5,5}. Reducing alignment weights also provokes concatenation, when an enlarged tandem repeat overlaps with one of its neighbors. At position 304696, two substitutions (in boldface) stop detection when alignment weights are set to {2,7,7}. With a smaller substitution penalty (5 or less), the detection extends back to position 304646 and overlaps with the other detection.
Reducing the Sputnik mismatch penalty allows detection of larger microsatellites by concatenating shorter, perfect ones. The two detections at positions 552928 and 552939 are concatenated with a mismatch penalty of -5, because the penalty induced by the two errors at positions 552936 and 552938 is compensated by the second detection. A second concatenation occurs at position 552964 with a mismatch penalty of -6. The two merged detections do not have the same motif, but with low values of the mismatch penalty the two errors induced by this difference are compensated by the matching bases.
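The divergence values of the two merged Sputnik detections can be checked with simple arithmetic. The sketch below assumes that divergence is the percentage of erroneous positions over the inclusive length of the detection; the legend does not state this definition, and the function name `divergence_percent` is hypothetical. Under this assumption, two errors reproduce the values reported for both concatenations. The TRF rows may use a different denominator, so the sketch is not claimed to apply to them.

```python
def divergence_percent(n_errors: int, start: int, end: int) -> float:
    """Percentage of erroneous positions in a detection.

    Assumption (not stated in the legend): divergence is
    100 * errors / length, with start and end both inclusive.
    """
    length = end - start + 1
    return 100.0 * n_errors / length


# Sputnik, mismatch penalty -5: two errors in the detection 552928..552948.
print(round(divergence_percent(2, 552928, 552948), 2))  # 9.52

# Sputnik, mismatch penalty -6: two errors in the detection 552954..552975.
print(round(divergence_percent(2, 552954, 552975), 2))  # 9.09
```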
A larger resolution value for Mreps enlarges already-detected tandem repeats. In the first part of the tandem repeat at position 119591, adjacent repeats are separated by at most one error, and this part is detected at resolution 1; however, the repeats TAT and AAA are separated by two errors, so the second part can only be found at resolution 2 or higher. Finally, increasing the resolution provokes concatenation. The detections for resolution 2 at positions 119591 and 119611 are enlarged when the resolution is 3; both periods are reduced to 1 (see explanations in Methods), and the two sequences are merged.
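The n/e/c markers in the table can be derived from the coordinates alone. The following is a minimal sketch, not the procedure used by the authors: it assumes that each detection is compared with the detections obtained at the previous, stricter parameter value, and that a previous detection counts only if it is fully contained in the new one. The function name `classify` and the empty-string marker for unchanged detections are hypothetical choices.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), both inclusive


def classify(detection: Interval, previous: List[Interval]) -> str:
    """Return 'n', 'e', 'c', or '' (unchanged) for one detection.

    Assumption: only previous detections fully contained in the new
    one are counted; a partial overlap alone is treated as 'n'.
    """
    if detection in previous:
        return ""  # same coordinates as before: no marker
    start, end = detection
    contained = [p for p in previous if start <= p[0] and p[1] <= end]
    if len(contained) >= 2:
        return "c"  # concatenation of previous sequences
    if len(contained) == 1:
        return "e"  # enlargement of a previous sequence
    return "n"      # newly detected sequence


# Coordinates taken from the TRF rows of the table.
trf_277 = [(304646, 304658), (304696, 304713), (305863, 305872)]
trf_257 = [(304646, 304713), (305863, 305872)]
trf_255 = [(304646, 304713), (305836, 305872)]
trf_235 = [(304643, 304713), (305765, 305800), (305836, 305872)]

print([classify(d, trf_277) for d in trf_257])  # ['c', '']
print([classify(d, trf_257) for d in trf_255])  # ['', 'e']
print([classify(d, trf_255) for d in trf_235])  # ['e', 'n', '']
```

Under the same containment rule, the Mreps detection 119591 to 119615 at resolution 2 is an enlargement rather than a concatenation, because it fully contains only one of the two detections found at resolution 1.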