2007 Mar 21;81(11):5617–5627. doi: 10.1128/JVI.01405-06

TABLE 2.

Consolidated data for the consensus motifs identified in sequences flanking HIV-1 integration sites and in randomly picked human genomic sequences of the same length^a

Motif	Data set	Sequence	Width (bases)	No. of sites	Avg no. of occurrences per sequence
1	Integration-site flanks	GGCGCGCGCCTGTAATCCCAGCACCTCGGGAGGCCGAGGCGGGGGGATCA	50	500	1.17
2	Integration-site flanks	CCCCGGGTGGCGGGGATTGCAGGGATCTGCGATCACGCCAAGC	43	500	1.17
3	Integration-site flanks	CCAGCCTGGGCAACAGAGTGAGACCCCGTCT	31	461	1.07
4	Random human genome	TGCCTCAGCCTCCCAAATAGCTGGGATTACAGGCGTGAGCCACCACGCCC	50	450	0.99
5	Random human genome	AGACCAGCCTGGGCAACATAGTGAAACCCCGTCTCTACAAAAAAAAAAAA	50	450	0.99
6	Random human genome	GCAGTGGCGCGATCTCGGCTCACTGCAACCTCCGCCTCCCGGGTTCAAGC	50	348	0.77

^aHIV-1 integration site sequences were downloaded from the public nucleotide database as reported by Schröder et al. (47) and analyzed with MEME (3). Consensus motifs were computed using the following parameters: the sequence type, the number of sequences in the data set, the weight assigned to each sequence (1), the minimum motif width (5 bp), the maximum motif width (50 bp), the expected number of occurrences of a motif in a single sequence (zero, one, or more than one per sequence), and, finally, the number of sequences found in the total data set. A position-specific probability matrix was then plotted, and the consensus sequence was derived from it; the resulting consensus sequences are shown in rows 1 through 3. The same analysis was carried out on sequences picked randomly from the human genome, each 2,000 bp long; the results are shown in rows 4 through 6. The motifs found in the sequences flanking integration sites differ significantly from those found in the randomly picked human sequences. Note in particular the average number of occurrences of each motif per sequence, obtained by dividing the total number of occurrences by the total number of sequences analyzed: 429 for rows 1 through 3 and 452 for rows 4 through 6.
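The step of reading a consensus off a position-specific probability matrix can be sketched as follows. This is only an illustration of that final step, not MEME's motif-discovery procedure (MEME fits a probabilistic motif model by expectation maximization, which is not reproduced here). The sketch assumes the motif sites are already aligned and of equal width; the site strings and the function names `position_probability_matrix` and `consensus` are hypothetical.

```python
from collections import Counter

BASES = "ACGT"

def position_probability_matrix(sites):
    """Return, for each column, the fraction of aligned sites carrying each base."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same width")
    matrix = []
    for i in range(width):
        counts = Counter(s[i] for s in sites)
        matrix.append({b: counts[b] / len(sites) for b in BASES})
    return matrix

def consensus(matrix):
    """Pick the most probable base in each column (ties resolved in ACGT order)."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in matrix)

# Hypothetical aligned sites, loosely modeled on a stretch of motif 1.
sites = ["TGTAATCC", "TGTAATCC", "TGCAATCC", "TATAATCC"]
ppm = position_probability_matrix(sites)
print(consensus(ppm))
```

Each column of the matrix sums to 1, and the consensus simply takes the highest-probability base per position. A real motif logo would additionally show how strongly each position is conserved.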
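The per-sequence average in the last column can be recomputed from the table. This sketch assumes that the "No. of sites" column is the total number of occurrences referred to in the footnote, an inference supported by 500/429 ≈ 1.17. The dictionary and function names are illustrative only.

```python
# Site counts from Table 2 and the sequence totals stated in the footnote.
SITES = {1: 500, 2: 500, 3: 461, 4: 450, 5: 450, 6: 348}
N_SEQUENCES = {1: 429, 2: 429, 3: 429, 4: 452, 5: 452, 6: 452}

def average_occurrences(motif: int) -> float:
    """Total occurrences of a motif divided by the number of sequences analyzed."""
    return SITES[motif] / N_SEQUENCES[motif]

for motif in SITES:
    print(motif, f"{average_occurrences(motif):.3f}")
```

Most values match the table after rounding to two decimals. For motifs 4 and 5, however, 450/452 ≈ 0.996, which conventionally rounds to 1.00, whereas the table lists 0.99.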