TABLE 2.
Motif | Sequence | Width (bases) | No. of sites | Avg no. of occurrences |
---|---|---|---|---|
1 | **GGCGCGCGCCTGTAATCCCAGCACCTCGGGAGGCCGAGGCGGGGGGATCA** | 50 | 500 | 1.17 |
2 | **CCCCGGGTGGCGGGGATTGCAGGGATCTGCGATCACGCCAAGC** | 43 | 500 | 1.17 |
3 | **CCAGCCTGGGCAACAGAGTGAGACCCCGTCT** | 31 | 461 | 1.07 |
4 | TGCCTCAGCCTCCCAAATAGCTGGGATTACAGGCGTGAGCCACCACGCCC | 50 | 450 | 0.99 |
5 | AGACCAGCCTGGGCAACATAGTGAAACCCCGTCTCTACAAAAAAAAAAAA | 50 | 450 | 0.99 |
6 | GCAGTGGCGCGATCTCGGCTCACTGCAACCTCCGCCTCCCGGGTTCAAGC | 50 | 348 | 0.77 |
HIV-1 integration sequences were downloaded from the public nucleotide database as reported by Schröder et al. (47) and analyzed with MEME (3). Consensus motifs were calculated from the following parameters: the type of sequence, the number of sequences in the data set, the weight assigned to each sequence (1), the minimum width of a consensus (5 bp), the maximum width of a consensus (50 bp), the number of times a consensus is expected to occur in a single sequence (zero, one, or more than one), and the number of sequences found in the total data set. A position-specific probability matrix was then plotted, and the consensus sequence determined from it is presented in bold in rows 1 through 3. A similar analysis was carried out on sequences picked randomly from the human genome, each 2,000 bp long; the results are given in rows 4 through 6. The motifs obtained from the sequences flanking integration sites are significantly different from those obtained from the randomly picked human genome sequences. Note the average number of occurrences of each motif per sequence, which was obtained by dividing the total number of occurrences by the total number of sequences used for analysis: 429 for rows 1 through 3 and 452 for rows 4 through 6.
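
The averaging step described in the legend can be checked with a minimal sketch. It assumes that the "No. of sites" column is the total number of occurrences being divided; the names `ROWS`, `N_INTEGRATION`, `N_RANDOM`, and `average_occurrences` are illustrative and do not come from the original analysis.

```python
# Rows of Table 2 as (motif, number of sites, average reported in the table).
# Assumption: "No. of sites" is the total number of occurrences of the motif.
ROWS = [
    (1, 500, 1.17),
    (2, 500, 1.17),
    (3, 461, 1.07),
    (4, 450, 0.99),
    (5, 450, 0.99),
    (6, 348, 0.77),
]

N_INTEGRATION = 429  # sequences flanking integration sites (rows 1 through 3)
N_RANDOM = 452       # randomly picked human genome sequences (rows 4 through 6)


def average_occurrences(motif: int, sites: int) -> float:
    """Divide the site count by the size of the data set the motif came from."""
    n_sequences = N_INTEGRATION if motif <= 3 else N_RANDOM
    return sites / n_sequences


for motif, sites, reported in ROWS:
    computed = average_occurrences(motif, sites)
    print(f"motif {motif}: {sites} sites -> {computed:.3f} (table: {reported:.2f})")
```

Under this assumption, rows 1 through 3 and row 6 reproduce the tabulated values when rounded to two decimals. Rows 4 and 5 compute to about 0.996, which the table lists as 0.99.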