PacBio long-read sequencing, TCGA, and UCSC Genome Browser (http://genome.ucsc.edu) data were used to annotate the HPV integration site within the candidate IDG, USP36. TCGA-C5-A8XH long-read data (PacBio) are shown with TCGA RNAseq, CNV (blue = loss; red = gain), and methylation data (blue = hypomethylation; red = hypermethylation) covering the integration event spanning ~200 kb of the human genome (a). The Ribbon programme was used to generate a schematic of two PacBio reads covering the area of integration (b). Thick bars across the top represent the HPV and human reference genomes; dashed lines connect them to two unique PacBio reads covering the integration, showing where each read maps on each genome. Data from all PacBio long reads covering the integration event were used to manually annotate the event (c). Breakpoints identified from TCGA short-read (SR) sequencing are highlighted in yellow boxes. The diagonal double line represents a breakpoint connecting two non-contiguous sequences of the human genome. TCGA RNAseq data suggest expression of USP36-encoding DNA fused to upstream intergenic DNA located between the SCAT and CYTH1 genes, with a sharp upregulation of USP36 expression beginning at intron 4, potentially driven by the inserted viral URR.