Table 5. Comparison of selected short sequence repeats (SSRs) in PRV strains Kaplan, Becker, and Bartha.
ID (strain, genome position) a | Location b | Kaplan # units | Becker # units | Bartha # units | Repeat unit consensus (unit length) c |
Intergenic: | |||||
SSRKa151 | left terminus | 3 | NF d | NF | TACCTGGCACCCTGCCAACCCCAATCCCCCTCC (33 mer) |
SSRKa2093 | between ORF-1 & UL54 | 8.3 e | NF | 6 | GGGGAGATGGGGAGAGGAGAT (21 mer) |
SSRKa15795 | between UL46 & UL27 | 13.7 e | 59 e | 18.4 e | ACGGAGGGGAGAGGG (15 mer) |
SSRKa31884 | between UL35 & UL36 | 5.6 | 2.7 | 4.8 | CCCCAAGTCCCCCAATCC (18 mer) |
SSRKa62103 | between UL22 & OriL | 6.3 | NF | NF | CGCCCTCTCTCCCAC (15 mer) |
SSRKa62261 | between UL22 & OriL | 6.5 | 10.5 | 11.5 e | AAGGGGTCTCT (11 mer) |
SSRKa79207 | between UL11 & UL10 | 6.4 | 7.4 | 8.1 e | TGGGGGAGAGGA (12 mer) |
SSRKa94997 | between UL1 & EP0 | 17 | 18 | 15 | GGAGCA (6 mer) |
SSRBe100922 | between left edge IR & IE180 | 2.1 | 3 | 3 | CCCCCCCCCCCATTTGCATATGACCGCTTCCCCCGGACGTGACGCTCGGG (50 mer) |
SSRBe101633 | between left edge IR & IE180 | 3.3 | 3.1 | NF | GACCACCGGGACCACCAACACCGTCTACCTCCCACCAG (38 mer) |
SSRKa106596 | promoter f: IE180 | 3.2 | NF | 3.2 | CGGCCAATGGGATTTCTCTCGCCAACTTCCTCTCGCGTCTACTTTGCATGTCCGGCCCCCGCGGCGGCCATCTTGGCCCCTCGA (84 mer) |
SSRKa107138 | between IE180 & OriS (in IR) | 12.5 e | 4.1 | 8.8 | TGTGGTGGTCTCTGTGTTG (19 mer) |
SSRKa115377 | between US1 & edge of IR | 3.1 | NF | NF | GGGGAGTGGGATGGGGGTGGAGACGGTGGAGGGAGA (36 mer) |
SSRBe115911 | between US1 & edge of IR | NF | 20.6 e | NF | GGTGGAGGGAGAGGGGGAC (19 mer) |
SSRKa115550 | promoter: US3 | 9 | 3 | 10 | GGGGGAGTCC (10 mer) |
In coding sequences: | |||||
SSRBe33478 | UL36 | 1.4 | 5.2 | 5.4 | GGGGCCGGCCGCGAAGGTGGT (21 mer) |
SSRBa32980 | UL36 | 1.1 | NF | 3.1 | GGCCGGCCGCGAAGGTGGTGGGGCCGGCGGTGGTGC (36 mer) |
SSRKa57529 | UL25 | 3.2 | 3.2 | 3.2 | CCTCGGGCGCCTCCTCGGCGGCGCGCG (27 mer) |
SSRKa114728 | US1 | 72.8 | 63.8 | 80.2 | CGAGGA (6 mer) |
a Repeats were selected if they had a TRF alignment score ≥100 and/or a VarScore ≥1, a repeat unit length ≥6 bp, and ≥3 repeat units. Repeat screening was performed primarily on the PRV Kaplan genome, with additional searches of the Becker and Bartha genomes to detect SSRs that scored highly in those strains but not in Kaplan. SSR identifiers (IDs) denote the strain in which the SSR was first detected (Ka, Kaplan; Be, Becker; Ba, Bartha) and its start position on that genome. For clarity, only the IR copy of SSRs falling within the large IR/TR repeats is shown (see Table S7 for a full listing of all SSRs).
b Boldface indicates SSRs previously noted in the initial description of the mosaic PRV genome [30].
c Boldface indicates CTCF binding sites within these SSRs, as defined by Amelio et al. [104].
d NF, not found: no homologous repeat was found in this strain, or the repeat had diverged beyond detection.
e CAPRE was used to estimate the repeat unit length of these perfect SSRs. See Figure S1 (in Text S1) and Methods for details.
f Promoter refers to sequences within 500 bp upstream of a start codon.
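The selection criteria and ID convention in the first table note can be expressed as a small filter. The sketch below is illustrative only: `RepeatCall`, its fields, and `passes_filter` are hypothetical names, not part of TRF or of the authors' pipeline, and it assumes the score, unit length, and copy number for each repeat have already been obtained from the repeat finder.

```python
from dataclasses import dataclass


# Hypothetical record for one tandem repeat call; field names are
# illustrative and do not correspond to any real TRF output parser.
@dataclass
class RepeatCall:
    strain_code: str   # "Ka" (Kaplan), "Be" (Becker) or "Ba" (Bartha)
    start: int         # start position on that strain's genome
    unit_length: int   # length of the repeat unit, in bp
    copies: float      # number of repeat units (may be fractional)
    trf_score: int     # TRF alignment score
    var_score: float   # VarScore

    @property
    def ssr_id(self) -> str:
        # "SSR" + strain where first detected + start position, e.g. SSRKa94997
        return f"SSR{self.strain_code}{self.start}"


def passes_filter(call: RepeatCall) -> bool:
    """Apply the legend's criteria: TRF score >= 100 and/or VarScore >= 1,
    unit length >= 6 bp, and at least 3 repeat units."""
    high_score = call.trf_score >= 100 or call.var_score >= 1
    return high_score and call.unit_length >= 6 and call.copies >= 3
```

Note that "and/or" is read here as an inclusive OR on the two scores, with the length and copy-number thresholds applied to every repeat.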
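Fractional unit counts such as 8.3 or 13.7 arise when a perfect tandem run ends partway through a repeat unit. The sketch below shows that arithmetic for a perfect repeat (run length divided by unit length). It is a simple illustration under that assumption, not the CAPRE or TRF algorithm, and `perfect_copy_number` is a hypothetical helper.

```python
def perfect_copy_number(seq: str, start: int, unit_length: int) -> float:
    """Return the number of repeat units in the perfect tandem run that
    begins at 0-based `start` with period `unit_length`.

    The run is extended one base at a time for as long as each base equals
    the base one period earlier, so a trailing partial unit contributes a
    fractional count (e.g. 3 full 6-mers plus 2 bases gives 20/6, about 3.33).
    """
    if unit_length <= 0 or start < 0 or start + unit_length > len(seq):
        raise ValueError("repeat unit does not fit in the sequence")
    end = start + unit_length  # the first unit trivially matches itself
    while end < len(seq) and seq[end] == seq[end - unit_length]:
        end += 1
    return (end - start) / unit_length
```

For an imperfect repeat this naive extension stops at the first mismatch, which is why alignment-based tools are needed in general.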
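The promoter definition in the last table note can be checked with a simple interval test. This is a minimal sketch assuming 1-based, inclusive forward-strand coordinates and treating an SSR as "in the promoter" if it overlaps the 500 bp window (whether the legend means overlap or full containment is an assumption). `in_promoter` and its parameters are hypothetical.

```python
def in_promoter(ssr_start: int, ssr_end: int, codon_start: int,
                strand: str, window: int = 500) -> bool:
    """Return True if the SSR [ssr_start, ssr_end] overlaps the `window` bp
    immediately upstream of a start codon.

    Coordinates are 1-based and inclusive on the forward strand.
    `codon_start` is the position of the A of ATG; for a '-' strand gene
    this is the highest coordinate of the codon, so upstream lies to the
    right on the forward strand.
    """
    if strand == "+":
        up_lo, up_hi = codon_start - window, codon_start - 1
    elif strand == "-":
        up_lo, up_hi = codon_start + 1, codon_start + window
    else:
        raise ValueError("strand must be '+' or '-'")
    # Two closed intervals overlap if each starts before the other ends.
    return ssr_start <= up_hi and ssr_end >= up_lo
```

Handling the minus strand explicitly matters because "upstream" flips direction relative to genome coordinates.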