Table 1.

Integration site data sets used in the study.

					Average values for the 1 Mb surrounding each integration site^c
Virus	Target cell	Set name	Total sequence reads^a	Unique integration sites^b	% G/C content/Mb	No. Transc. units/Mb	No. CpG islands/Mb
HIV	Activated CD4⁺ T cell	Activated	1183	524	46.7^***	9.5^***	60.9^***
HIV	Resting CD4⁺ T cell	Resting	1955	947	44.5^***	6.7^***	40.0^***
HIV	Activated CD4⁺ T cell	Activated+dN	1500	663	47.3^***	9.5^**	67.0^***
HIV	Resting CD4⁺ T cell	Resting+dN	1076	527	44.4^***	6.9^**	38.0^***

Transc., transcription.

^a The number of sequences recovered by pyrosequencing that contained the correct barcode and long terminal repeat (LTR) primer.
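As a rough illustration of this read filter, the Python sketch below keeps only reads that begin with an expected barcode immediately followed by the LTR primer. The barcode and primer sequences, the exact-prefix matching rule and the read layout are all hypothetical assumptions. The table does not say how matching was done (for example, whether mismatches were tolerated).

```python
from typing import Iterable, List

# Hypothetical example sequences; the actual barcodes and LTR primer
# used in the study are not given in this table.
BARCODE = "ACGAGTGCGT"
LTR_PRIMER = "GCCCTTCCA"


def has_barcode_and_primer(read: str, barcode: str = BARCODE,
                           primer: str = LTR_PRIMER) -> bool:
    """True if the read starts with the barcode immediately followed by
    the LTR primer (exact, case-insensitive match; an assumed layout)."""
    return read.upper().startswith(barcode + primer)


def filter_reads(reads: Iterable[str]) -> List[str]:
    """Keep only reads carrying the expected barcode + LTR primer."""
    return [r for r in reads if has_barcode_and_primer(r)]


reads = [
    "ACGAGTGCGTGCCCTTCCATTTGCA",  # barcode + primer: kept
    "ACGAGTGCGTGCCCTTGGATTTGCA",  # primer mismatch: dropped
    "TTTTTTTTTTGCCCTTCCATTTGCA",  # wrong barcode: dropped
]
print(len(filter_reads(reads)))  # 1
```

A real pipeline would likely need to tolerate sequencing errors in the primer; exact matching is used here only to keep the sketch short.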

^b The number of total sequence reads (^a) that had a single best match to the human genome with >98% identity, in which the terminal viral sequence (5′-CA-3′) lay within 3 bp of the high-quality match, counted after each group of duplicate integration sites was condensed into a single entry.
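The filtering and de-duplication criteria described above can be sketched in Python as follows. The `Hit` record and its fields, and the reading of "within 3 bp" as a distance of at most 3 bp, are hypothetical assumptions made for illustration; the aligner and its output format are not specified here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Hit:
    """One genomic alignment of a read (hypothetical record layout)."""
    chrom: str
    position: int      # genomic coordinate of the virus-host junction
    strand: str        # "+" or "-"
    identity: float    # percent identity of the aligned region
    gap_to_ca: int     # bp between the viral 5'-CA-3' end and the match


Site = Tuple[str, int, str]


def best_unique_hit(hits: List[Hit]) -> Optional[Hit]:
    """Return the single best hit, or None if there is none or the best is tied."""
    if not hits:
        return None
    top = max(h.identity for h in hits)
    best = [h for h in hits if h.identity == top]
    return best[0] if len(best) == 1 else None


def unique_sites(reads: Dict[str, List[Hit]],
                 min_identity: float = 98.0,
                 max_gap: int = 3) -> Set[Site]:
    """Apply the identity and CA-proximity rules, then collapse duplicates."""
    sites: Set[Site] = set()
    for hits in reads.values():
        hit = best_unique_hit(hits)
        if hit is None:
            continue  # unaligned, or ambiguous (tied) best match
        if hit.identity <= min_identity or hit.gap_to_ca > max_gap:
            continue  # fails >98% identity or CA within 3 bp
        # A set keeps one entry per (chrom, position, strand).
        sites.add((hit.chrom, hit.position, hit.strand))
    return sites


reads = {
    "r1": [Hit("chr1", 1000, "+", 99.5, 0)],
    "r2": [Hit("chr1", 1000, "+", 99.0, 2)],  # duplicate of r1's site
    "r3": [Hit("chr2", 500, "-", 97.0, 0)],   # identity too low
    "r4": [Hit("chr3", 42, "+", 99.9, 0),
           Hit("chr5", 7, "+", 99.9, 1)],     # tied best hits: ambiguous
    "r5": [Hit("chr4", 300, "-", 100.0, 5)],  # CA too far from match
}
print(len(unique_sites(reads)))  # 1
```

Here five reads yield one unique integration site, mirroring how the "Unique integration sites" column is smaller than "Total sequence reads".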

^c The average values of ‘% G/C content/Mb’, ‘No. Transc. units/Mb’ and ‘No. CpG islands/Mb’ correspond to the data sets used to generate the heatmap tiles in Fig. 4 for ‘GC content, 1 Mb’, ‘Expression density, Unigene, gene density, 1 Mb’ and ‘CpG Islands, Density, 1 Mb’, respectively.
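A minimal sketch of how per-site window averages like these could be computed, assuming a 1 Mb window centered on each site, genome sequences held as plain strings, and annotation features (for example transcription units or CpG islands) given as half-open (start, end) intervals. All function names are hypothetical; the study's actual pipeline and the annotation sources behind Fig. 4 are not described in this table.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

WINDOW = 1_000_000  # assumed: 1 Mb window centered on each integration site


def window_bounds(pos: int, chrom_len: int, window: int = WINDOW) -> Tuple[int, int]:
    """Half-open [lo, hi) window of `window` bp centered on pos, clipped to the chromosome."""
    half = window // 2
    return max(0, pos - half), min(chrom_len, pos + half)


def gc_percent(seq: str) -> float:
    """Percent G+C among A/C/G/T bases; N and other symbols are ignored."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    gc = s.count("G") + s.count("C")
    return 100.0 * gc / acgt if acgt else float("nan")


def count_overlaps(intervals: Sequence[Tuple[int, int]], lo: int, hi: int) -> int:
    """Number of half-open (start, end) intervals that overlap [lo, hi)."""
    return sum(1 for start, end in intervals if start < hi and end > lo)


def site_averages(sites: List[Tuple[str, int]],
                  genome: Dict[str, str],
                  features: Dict[str, List[Tuple[int, int]]],
                  window: int = WINDOW) -> Tuple[float, float]:
    """Mean %GC and mean feature count per window, averaged over all sites.

    With the default 1 Mb window a per-window count is a per-Mb count;
    windows clipped at chromosome ends are not rescaled in this sketch.
    """
    gcs, counts = [], []
    for chrom, pos in sites:
        lo, hi = window_bounds(pos, len(genome[chrom]), window)
        gcs.append(gc_percent(genome[chrom][lo:hi]))
        counts.append(count_overlaps(features.get(chrom, []), lo, hi))
    return mean(gcs), float(mean(counts))


# Toy example using a 20 bp window instead of 1 Mb.
genome = {"chrT": "ACGT" * 25}
features = {"chrT": [(45, 55), (70, 80)]}
sites = [("chrT", 50)]
print(site_averages(sites, genome, features, window=20))  # (50.0, 1.0)
```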

^*P < 0.05.

^**P < 0.01.

^***P < 0.001.