TABLE 2.
Integration sites analyzed and their similarities to known sequences
| Sequence name<sup>a</sup> | Length (bp)<sup>b</sup> | Dup seq<sup>c</sup> | Identified similarities<sup>d</sup> | Identified similarities truncated to 50 bp<sup>e</sup> |
|---|---|---|---|---|
| MolH 1 | 106 | ATGTC | *<sup>f</sup> | * |
| MolH 2 | 60 | CAAGC | * | * |
| SupH 1 | 156 | TCTTC | LINE-1 [2–153, SW = 508] | * |
| SupH 2 | 132 | GCTAC | * | * |
| SupH 3 | 91 | GGAAA | * | * |
| SupH 4 | 139 | GTGGT | * | * |
| SupH 5 | 140 | TATAT | * | * |
| SupH 6 | 114 | ATCCC | * | * |
| SupH 7 | 230 | GCATG | * | * |
| SupH 9 | 82 | CTATA | * | * |
| SupH 10 | 212 | TACAC | LINE-1 [2–107, SW = 251] | * |
| SupH 11 | 166 | CATGC | Alu [15–110, SW = 716] | Alu [SW = 304] |
| SupH 12 | 89 | GTTGG | * | * |
| SupH 13 | 63 | CTCAC | Transcription unit (cDNA) [5–62, P = 1.6 × 10<sup>−16</sup>] | Transcription unit (cDNA) [P = 1.9 × 10<sup>−12</sup>] |
| SupH 14 | 111 | GTCAC | * | * |
| SupH 15 | 164 | TATGG | LINE-1 [2–107, SW = 400] | * |
| SupH 16 | 66 | AACAG | * | * |
| SupH 17 | 54 | CTCAC | * | * |
| SupH 18 | 159 | GTTGT | * | * |
| SupH 20 | 342 | GTTTC | Alu [3–125, SW = 956] | Alu [SW = 373] |
| SupH 21 | 173 | CATAT | * | * |
| SupH 22 | 38 | CACAC | * | Excluded |
| SupH 23 | 258 | CATTC | * | * |
| SupH 24 | 110 | GTAAT | * | * |
| SupH 25 | 37 | CTTTT | * | Excluded |
| SupH 27 | 160 | CCATT | * | * |
| SupH 28 | 93 | AATAC | Transcription unit (cDNA) [1–93, P = 3.7 × 10<sup>−33</sup>] | Transcription unit (cDNA) [P = 1.5 × 10<sup>−13</sup>] |
| SupH 29 | 143 | GCCCA | * | * |
| SupH 31 | 188 | ATATT | * | * |
| SupH 32 | 157 | GTTGA | Transcription unit (cDNA) [59–157, P = 5.9 × 10<sup>−34</sup>] | * |
| SupH 33 | 50 | CTTCA | Transcription unit (VACH1 gene) [1–50, P = 6 × 10<sup>−13</sup>] | Transcription unit (VACH1 gene) [P = 6 × 10<sup>−13</sup>] |
| SupH 34 | 50 | AGTTG | * | * |
| SupH 35 | 420 | TTAAC | Transcription unit (cDNA) [52–143, P = 2.8 × 10<sup>−25</sup>]; LINE-2 [223–274, SW = 252] | * |
| SupH 36 | 237 | CTTGT | * | * |
| SupH 37 | 69 | CACAC | Alu [1–69, SW = 471] | Alu [SW = 371] |
| SupH 38 | 68 | GTTAT | * | * |
| SupH 39 | 89 | CAAAA | * | * |
| SupH 41 | 41 | ATGGC | * | Excluded |
| SupH 42 | 437 | AAAAC | LINE-1 [1–437, SW = 2684] | LINE-1 [SW = 264] |
| SupH 43 | 179 | ATAGT | Transcription unit (cDNA) [1–179, P = 9.4 × 10<sup>−65</sup>]; other repeat (LTR element) [98–152, SW = 198] | Transcription unit (cDNA) [P = 3.8 × 10<sup>−13</sup>] |
| SupH 44 | 337 | GAAAC | Other repeat (MIR, SINE) [191–315, SW = 493] | * |
| SupH 46 | 81 | GGGAG | Transcription unit (cDNA) [1–33, P = 3.9 × 10<sup>−6</sup>] | Transcription unit (cDNA) [P = 4.6 × 10<sup>−6</sup>] |
| SupH 47 | 111 | AAAAC | Transcription unit (cDNA) [1–57, P = 2.1 × 10<sup>−13</sup>] | Transcription unit (cDNA) [P = 2.2 × 10<sup>−9</sup>] |
| SupH 48 | 125 | CTGTG | Other repeat (MIR, SINE) [1–123, SW = 474] | Other repeat (MIR, SINE) [SW = 245] |
| SupH 49 | 260 | TTTTG | Alu [1–128, SW = 698] | Alu [SW = 300] |
| SupS 1 | 176 | GCAGG | Transcription unit (CD27 gene) [1–176, P = 2.7 × 10<sup>−62</sup>] | Transcription unit (cDNA) [P = 5.4 × 10<sup>−13</sup>] |
| SupS 2 | 113 | GTTCT | * | * |
| SupS 3 | 125 | ATACC | Alu [4–115, SW = 540] | Alu [SW = 195] |
| SupS 4 | 215 | CCCTC | Other repeat (MER74, LTR element) [1–213, SW = 599] | Other repeat (MER74, LTR element) [SW = 277] |
| SupS 5 | 147 | CAGCA | * | * |
| SupS 7 | 171 | GAGTC | * | * |
| SupS 8 | 85 | TGAGT | Transcription unit (cDNA) [1–81, P = 3.2 × 10<sup>−26</sup>] | Transcription unit (cDNA) [P = 3.6 × 10<sup>−13</sup>] |
| SupS 9 | 86 | GTACC | * | * |
| SupS 10 | 52 | AAAGC | Alu [2–59, SW = 356] | Alu [SW = 310] |
| SupS 11 | 147 | CTAAC | * | * |
| SupS 12 | 131 | GTTTC | * | * |
| SupS 13 | 94 | ATGTG | Transcription unit (cDNA) [1–94, P = 5.1 × 10<sup>−28</sup>] | Transcription unit (cDNA) [P = 3.4 × 10<sup>−12</sup>] |
| SupS 14 | 184 | GAGAC | * | * |
| SupS 15 | 120 | AAATG | * | * |
| SupS 16 | 161 | CTCTG | * | * |
| SupS 17 | 215 | GTATG | * | * |
| Total bp | 8,809 | | | 2,900 |
| Avg | 144 | | | 50 |
<sup>a</sup> Laboratory designation for each DNA clone.
<sup>b</sup> Number of human DNA base pairs sequenced adjacent to the HIV cDNA terminus.
<sup>c</sup> Nucleotide sequence of the 5 bp of human DNA at the junction with viral DNA expected to be duplicated upon integration.
<sup>d</sup> Sequence similarities found by comparison to sequence databases. The first designation is the sequence class given in Table 3, the name in parentheses is a more detailed designation, and the numbers in brackets give the location of the sequence match (e.g., 1 = the first cDNA-proximal base pair in host DNA) and the degree of similarity.
<sup>e</sup> Similarities identified in the 50-bp sequence data set. For an explanation of the bracketed data, see footnote d.
<sup>f</sup> *, anonymous.
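
For readers who want to process the table programmatically, the bracketed notation described in footnote d can be split into its parts with a small parser. The following is only a minimal sketch: the names `Similarity` and `parse_similarity` are hypothetical, and the code assumes each annotation has the form `class (detail) [start–end, SW = n]` or `[..., P = m × 10−k]`, with the match range omitted in the 50-bp column and multiple annotations separated by semicolons.

```python
import re
from typing import List, NamedTuple, Optional


class Similarity(NamedTuple):
    """One annotation from a table cell (hypothetical container)."""
    label: str            # sequence class plus optional detail in parentheses
    start: Optional[int]  # first matched host bp (1 = cDNA-proximal), if given
    end: Optional[int]    # last matched host bp, if given
    metric: str           # "SW" or "P", as printed in the table
    value: float          # score or probability


# label [optional "start–end," ] metric = value
_CELL = re.compile(
    r"^(?P<label>.+?)\s*\[(?:(?P<start>\d+)\s*[–-]\s*(?P<end>\d+),\s*)?"
    r"(?P<metric>SW|P)\s*=\s*(?P<value>[^\]]+)\]$"
)
# mantissa × 10 exponent, tolerating a flattened or tagged superscript
_SCI = re.compile(r"^(\d+(?:\.\d+)?)\s*×\s*10\s*\^?\s*(-?\d+)$")


def _to_float(text: str) -> float:
    # Normalize <sup> tags and the typographic minus sign before parsing.
    text = re.sub(r"</?sup>", "", text).replace("−", "-").strip()
    sci = _SCI.match(text)
    if sci:
        return float(f"{sci.group(1)}e{sci.group(2)}")
    return float(text.replace(",", ""))


def parse_similarity(cell: str) -> List[Similarity]:
    """Parse one table cell; '*' (anonymous) and 'Excluded' yield no hits."""
    cell = cell.strip()
    if cell in {"", "*", "Excluded"}:
        return []
    hits = []
    for part in cell.split(";"):
        m = _CELL.match(part.strip())
        if m is None:
            raise ValueError(f"unrecognized annotation: {part!r}")
        start = int(m["start"]) if m["start"] else None
        end = int(m["end"]) if m["end"] else None
        hits.append(
            Similarity(m["label"], start, end, m["metric"], _to_float(m["value"]))
        )
    return hits
```

Applied to the SupH 35 entry, `parse_similarity` would return two records, one for the cDNA match at positions 52–143 and one for the LINE-2 match at positions 223–274.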