TABLE 1.
Characteristics of HIV-1 integration sites in infected resting CD4+ T cells
| Chromosome | Patient identification no. | Junctional sequenceᵃ | Chromosome locus | Host nt at the junctionᵇ | Integration siteᶜ | Host gene | Description | Orientationᵈ | nt from host transcription start siteᵉ | Host gene expressionᶠ, resting | Host gene expressionᶠ, with PHA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 21 | AAGGTGTTCC | 1p36.23 | 8158129 | I | RERE | Arg-Glu dipeptide (RE) repeats | − | 428941 (0.92) | + | + |
|  | 21 | AAGGTTGCCA | 1p36.22 | 9744195 | I | NMNAT1 | Nicotinamide adenylyltransferase | + | 31162 (0.78) | + | + |
|  | 149 | AAGGTATGAG | 1p36.11 | 23360670 | I | FUSIP1 | FUS-interacting protein 1 | − | 6447 (0.46) | + | + |
|  | 22 | AAGGTCCAAC | 1p34.3 | 35378363 | I | PKD1-like | Polycystic kidney disease 1 related | + | 72287 (0.59) | + | + |
|  | 21 | AAGGTGAGGG | 1p34.3 | 35405894 | I | PKD1-like | Polycystic kidney disease 1 related | − | 44772 (0.36) | + | + |
|  | 150 | AAGGTCTATA | 1p31.3 | 64357286 | I | KIAA1573 | Unknown function | − | 15771 (0.08) | + | + |
|  | 144 | AAGGTCTAAT | 1q21.3 | 151082976 | I | p66beta | Transcription repressor p66 beta | − | 29040 (0.25) | + | + |
|  | 153 | AAGGTTCTCC | 1q22 | 151774696 | I | ADAR | Adenosine deaminase, RNA isoform | + | 22765 (0.87) | + | + |
|  | 143 | AAGGTCAGAA | 1q32.1 | 196246333 | Int |  |  |  |  |  |  |
| 2 | 153 | AAGGTGGTTC | 2p23.2 | 28443777 | I | BRE | Brain and reproductive organ expressed | + | 355646 (0.79) | + | + |
|  | 153 | AAGGTGAGTC | 2q37.3 | 241972574 | I | Predictedᵍ |  | − | 22723 (0.89) | − | − |
| 3 | 21 | AAGGTTAATC | 3p24.1 | 30695399 | I | TGFBR2 | Transforming growth factor β receptor II | − | 72396 (0.85) | + | + |
|  | 21 | AAGGTTACAC | 3p21.31 | 49514417 | I | DAG1 | Dystroglycan 1 precursor | + | 47814 (0.73) | + | + |
|  | 21 | AAGGTAGAGG | 3q29 | 198226821 | I | DLG1 | Synapse-associated protein 97 | − | 125147 (0.49) | + | + |
| 4 | 21 | AAGGTATATG | 4p14 | 38495952 | I | Predictedᵍ |  | − | 37220 (0.69) | − | − |
|  | 21 | AAGGTCAAAA | 4q25 | 111007127 | I | FLJ20647 | Hypothetical protein | − | 66890 (0.52) | + | + |
| 5 | 103 | AAGGTTGACG | 5q11.2 | 54977619 | I | FLJ90709 | Hypothetical protein | − | 46361 (0.54) | + | + |
|  | 21 | AAGGTTATTC | 5q15 | 96212730 | E | ARTS-1 | Aminopeptidase regulator of shedding | + | 4985 (0.11) | + | + |
|  | 142 | AAGGTGCATA | 5q23.1 | 121481991 | E | LOX | Lysyl oxidase preproprotein | − | 8154 (0.68) | − | − |
|  | 21 | AAGGTTATGT | 5q23.2 | 127559778 | I | SLC12A2 | Solute carrier family 12 | − | 64079 (0.62) | + | + |
|  | 22 | AAGGTCGTAC | 5q31.3 | 139878769 | I | FLJ20288 | Hypothetical protein | − | 68775 (0.47) | + | + |
| 6 | 21 | AAGGTAATTG | 6p25.1 | 5551277 | I | FARS1 | Phenylalanine-tRNA synthetase | + | 237482 (0.59) | + | + |
|  | 144 | AAGGTCAGAA | 6p21.33 | 30612175 | I | Predictedᵍ |  | + | 48696 (0.89) | + | + |
|  | 144 | AAGGTGATGG | 6p21.32 | 32940155 | I | TAP2 | TAP2 | + | 203049 (0.68) | + | + |
|  | 79 | AAGGTGACAC | 6p21.31 | 35693844 | I | FKBP51 | FK506-binding protein 51 | + | 9725 (0.09) | + | + |
|  | 21 | AAGGTTGTAA | 6p21.2 | 38617237 | I | BTBD9 | KIAA1880 protein | − | 37208 (0.08) | + | + |
|  | 21 | AAGGTTTTTG | 6q23.3 | 135304182 | I | HBS1L | HBS1-like | − | 52410 (0.55) | + | + |
|  | 21 | AAGGTGATCC | 6q23.3 | 135501140 | I | MYB | v-myb myeloblastosis homolog | + | 18030 (0.48) | + | + |
|  | 22 | AAGGTTGATA | 6q23.3 | 135792304 | I | FLJ20069 | Hypothetical protein | + | 7149 (0.03) | − | − |
| 7 | 21 | AAGGTAATAT | 7q22.3 | 106071860 | I | PIK3CG | Phosphatidylinositol 3-kinase, catalytic, gamma | + | 5405 (0.13) | + | + |
| 8 | 21 | AAGGTGGGTA | 8q24.3 | 144830024 | E | hmRNAʰ |  | + | 935 (0.28) | + | + |
| 9 | 22 | AAGCTGCAGC | 9q21.2 | 75991003 | I | GNAQ | GTP-binding protein, q protein | + | 112487 (0.36) | + | + |
|  | 22 | AAGGTTACCG | 9q32 | 112854741 | I | Predictedᵍ |  | + | 46833 (0.43) | − | − |
| 10 | 21 | AAGGTAAGGA | 10q25.1 | 111475750 | I | ADD3 | Adducin 3 isoform a | − | 45357 (0.35) | − | − |
| 11 | 21 | AAGGTAAAAA | 11p15.4 | 9744195 | I | NAP1L4 | Nucleosome assembly protein | − | 7800 (0.16) | + | + |
|  | 21 | AAGGTCACAG | 11q13.1 | 64319021 | I | SF1 | Splicing factor 1 | − | 2574 (0.18) | + | + |
|  | 21 | AAGGTATTAT | 11q13.4 | 71857871 | I | SKD3 | Suppressor of K transport defect 3 | + | 14005 (0.10) | + | + |
| 12 | 143 | AAGGTCGATT | 12p13.32 | 3825992 | I | C12orf6 | Chromosome 12 open reading frame 6 | + | 26809 (0.44) | + | + |
|  | 152 | AAGGTCAAAG | 12p11.21 | 31748771 | I | hmRNAʰ |  | + | 24508 (0.42) | − | + |
|  | 150 | AAGGTGAAAC | 12q13.12 | 49358494 | I | FLJ34278 | Hypothetical protein | − | 173459 (0.99) | + | + |
|  | 151 | AAGGTGTCGA | 12q13.13 | 52922906 | I | CBX5 | Heterochromatin protein 1 | + | 16679 (0.93) | + | + |
|  | 145 | AAGGTAATGG | 12q22 | 91298430 | I | Predictedᵍ |  | + | 47946 (0.32) | − | − |
| 15 | 146 | AAGGTCACTC | 15q22.2 | 57090553 | I | RNF111 | Ring finger protein 111 | − | 94632 (0.87) | + | + |
|  | 21 | AAGGTGAATC | 15q23 | 66135863 | I | PIAS1 | Inhibitor of activated STAT | + | 73473 (0.55) | + | + |
|  | 153 | AAGGTTATAG | 15q25.3 | 83919694 | I | AKAP13 | A-kinase (PRKA) anchor protein 13 | − | 266055 (0.72) | + | + |
| 16 | 153 | AAGGTTTTTC | 16p11.2 | 29719156 | I | hmRNAʰ |  | + | 522325 (0.81) | − | − |
|  | 21 | AAGGTTCCGA | 16p11.2 | 29762139 | I | hmRNAʰ |  | − | 479342 (0.74) | − | − |
|  | 21 | AAGGTGAGAA | 16p11.2 | 29847207 | I | KIF22 | Kinesin family member 22 | − | 7512 (0.51) | + | + |
|  | 21 | AAGGTCATTG | 16q13 | 57040912 | I | hmRNAʰ |  | + |  |  |  |
|  | 79 | GAGGTTTTAG | 16q22.1 | 67956187 | I | NFATc3 | Nuclear factor of activated T cells | + | 60576 (0.43) | + | + |
| 17 | 21 | AAGGTGTCAG | 17p13.3 | 2435251 | E | FLJ10543 | Hypothetical protein | − | 10740 (0.80) | + | + |
|  | 21 | AAGGTGAAAG | 17p11.2 | 16262774 | I | NCoR1 | Nuclear receptor corepressor 1 | − | 56637 (0.31) | + | + |
|  | 21 | AAGGTGGAGG | 17p11.2 | 20012102 | I | AKAP10 | A-kinase anchor protein 10 | + | 31056 (0.43) | + | + |
|  | 150 | AAGGTGGACC | 17q21.2 | 38936487 | I | TOP2A | DNA topoisomerase II alpha | + | 10739 (0.38) | − | + |
|  | 103 | AAGGTGGTTT | 17q21.31 | 42202698 | I | MEOX1 | Mesenchyme homeobox 1 | + | 11398 (0.54) | − | + |
|  | 21 | AAGGTGAAAG | 17q25.1 | 73929654 | I | GRB2 | Growth factor receptor-bound 2 | − | 57225 (0.78) | + | + |
|  | 21 | AAGGTCACTT | 17q25.1 | 74812346 | I | hmRNAʰ,ⁱ |  | − |  | + | + |
|  | 153 | AAGGTGAAGG | 17q25.1 | 74938954 | I | PRPSAP1 | PRPP synthetase associated | − | 8100 (0.19) | + | + |
|  | 153 | AAGGTGACTG | 17q25.3 | 76515866 | I | EVER1 | Epidermodysplasia verruciformis 1 | + | 206035 (0.55) | + | + |
|  | 21 | AAGGTCAGGC | 17q25.3 | 80511487 | Int |  | LINE (L1) |  |  |  |  |
|  | 20 | AAGGTGGATC | 17q25.3 | 81407969 | I | TBCD | β-Tubulin cofactor D | − | 19195 (0.10) | + | + |
| 19 | 99 | AAGGTCTCTA | 19p13.2 | 10153709 | I | DNMT1 | DNA methyltransferase 1 | + | 13102 (0.21) | + | + |
|  | 21 | AAAGTGATCC | 19p13.2 | 10309206 | I | ICAM3 | Intercellular adhesion molecule 3 | + | 2094 (0.36) | + | + |
|  | 21 | AAGGTTATTC | 19p13.11 | 17800839 | I | JAK3 | Janus kinase 3 | − | 15443 (0.84) | + | + |
|  | 22 | AAGCTGCAGT | 19q13.11 | 39535086 | I | KIAA0355 | Hypothetical protein | + | 97769 (0.97) | + | + |
|  | 21 | AAGGTGGGTC | 19q13.12 | 40889171 | Int |  |  |  |  |  |  |
|  | 152 | AAGGTCTACC | 19q13.33 | 55192036 | I | VRK3 | Vaccinia virus-related kinase 3 | − | 28412 (0.58) | + | + |
|  | 21 | AAGGTGAGTT | 19q13.43 | 61779316 | I | Predictedᵍ |  | − | 9227 (0.17) | + | + |
| 20 | 153 | AAGGTGAGGA | 20q13.13 | 50223866 | I | ADNP | Activity-dependent neuroprotector | − | 9083 (0.22) | + | − |
| 22 | 22 | AAGGTCAATT | 22q11.21 | 19609567 | I | CRKL | v-crk oncogene homolog | + | 13299 (0.40) | + | + |
|  | 153 | AAGGTGAGGA | 22q13.1 | 35834266 | Int |  | LTR (endogenous retrovirus 1) |  |  |  |  |
|  | 21 | AAGGTTAGTG | 22q13.1 | 36501308 | I | EIF3S6IP | Eukaryotic translation initiation factor 3 | − | 12844 (0.33) | + | + |
|  | 22 | AAGGTCATGA | 22q13.33 | 49112299 | Int |  | Alu |  |  |  |  |
| X | 22 | AAGGTCTAAA | Xp11.3 | 41971993 | I | Predictedᵍ |  | + | 75610 (0.48) | − | − |
ᵃ Junction between the 5′ end of the HIV-1 LTR (first five letters) and the host cell DNA (last five letters).
ᵇ The host nucleotide number at the junction was determined by using the UCSC Bioinformatics Human Genome Database (July 2003 assembly freeze). Bold type indicates clusters of two integration events within a 1-Mb window (P = 3.9 × 10⁻⁵). Italic type indicates clusters of three integration events within a 1-Mb window (P = 2.6 × 10⁻⁵).
ᶜ Nature of the integration site: I, intron; E, exon; Int, intergenic. Overall, 93.2% (69 of 74) of the integration sites were in defined or predicted genes, and 6.8% (5 of 74) were in intergenic regions. Of the integrations in genes, 94.2% (65 of 69) were in introns and 5.8% (4 of 69) were in exons.
ᵈ Transcriptional orientation: +, the host gene and the HIV-1 insert have the same transcriptional orientation; −, the gene and the insert have opposite orientations. Of the 69 integrations within transcription units, 49.3% (34 of 69) were in the + orientation and 50.7% (35 of 69) were in the − orientation.
ᵉ Distance, in nucleotides, between the transcription start site of the host gene and the HIV-1 integration site. Numbers in parentheses give the relative position of the integration site within the gene, where 0 is the start of transcription and 1 is the end of the transcript.
ᶠ For integration sites within known or predicted genes, RT-PCR was carried out with gene-specific primers spanning an intron, using total RNA isolated from purified resting or PHA-activated CD4+ T cells. +, presence of a PCR product of the predicted size; −, no PCR product under conditions that gave a readily detectable band for a ubiquitously expressed gene (GAPDH) and, for characterized genes, a correct product from cell lines or primary tissues known to express the gene. These included chondrocytes (LOX), kidney cells (ADD3), and mesenchymal stem cells (MEOX1). Expression of TOP2A, MEOX1, and a human mRNA from chromosome 12p11.21 was detected in activated but not resting CD4+ T cells. For integration events in well-characterized (RefSeq) genes, expression of the targeted gene in resting CD4+ T cells was observed for 91.1% (51 of 56) of the genes.
ᵍ Gene predicted by the Genscan algorithm.
ʰ hmRNA, human mRNA from GenBank.
ⁱ The start site for transcription has not yet been determined.
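The footnote on host nucleotide positions flags clusters of two or three integration events within a 1-Mb window. The Python sketch below shows one simple way such clusters could be flagged from the chromosome and host-nucleotide columns. It is an illustration under stated assumptions, not the authors' procedure: the greedy window rule and the function name `find_clusters` are hypothetical, the site list is only a subset of rows copied from the table, and the code does not compute the P values quoted in the footnote.

```python
from collections import defaultdict

# (chromosome, host nt at the junction) for a SUBSET of rows in Table 1.
# Coordinates refer to the UCSC July 2003 assembly used in the table.
SITES = [
    ("1", 35378363),   # PKD1-like
    ("1", 35405894),   # PKD1-like
    ("6", 135304182),  # HBS1L
    ("6", 135501140),  # MYB
    ("6", 135792304),  # FLJ20069
    ("16", 29719156),  # hmRNA
    ("16", 29762139),  # hmRNA
    ("16", 29847207),  # KIF22
    ("16", 57040912),  # hmRNA
    ("19", 10153709),  # DNMT1
    ("19", 10309206),  # ICAM3
    ("19", 17800839),  # JAK3
]

WINDOW_NT = 1_000_000  # 1-Mb window


def find_clusters(sites, window=WINDOW_NT):
    """Return (chromosome, positions) for runs of >= 2 sites on one chromosome.

    Assumed rule (hypothetical, not the authors' stated procedure): sweep the
    sorted positions and keep adding sites while each lies within `window` nt
    of the first site in the current run.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in sites:
        by_chrom[chrom].append(pos)

    clusters = []
    for chrom, positions in by_chrom.items():  # dict keeps insertion order
        positions = sorted(positions)
        run = [positions[0]]
        for pos in positions[1:]:
            if pos - run[0] <= window:
                run.append(pos)
            else:
                # Close the current run; keep it only if it has 2+ sites.
                if len(run) >= 2:
                    clusters.append((chrom, run))
                run = [pos]
        if len(run) >= 2:
            clusters.append((chrom, run))
    return clusters


if __name__ == "__main__":
    for chrom, run in find_clusters(SITES):
        print(f"chr{chrom}: {len(run)} sites {run}")
```

With this subset, the sweep reports two-site runs on chromosomes 1 and 19 and three-site runs on chromosomes 6 and 16. A different window convention (for example, comparing only adjacent sites) can group the same positions differently, and coordinates from later genome assemblies should not be mixed with the July 2003 values used here.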