Detection of integrations not in the human reference sequence. (A) Schematic of the pipeline for finding loci, showing how the mapping of trimmed reads is linked to the results of the RetroSeq analysis. Mapping creates a cluster of trimmed reads derived from HK2 loci, and this cluster lies inside the cluster of RetroSeq anchor reads. In contrast, trimmed reads derived from other regions through chance sequence similarity are scattered around the genome. The next stage is confirmation of the integration by BreakAlign analysis. Chr, chromosome. (B) Example Integrative Genomics Viewer genome browser (49) screenshot showing evidence for the 4q22.3 locus (from chromosome 4 [chr4], coordinates 9602941 to 9603548). (Top) Mapping of all reads, with colored reads representing RetroSeq anchors (see Materials and Methods; the color indicates the chromosome on which the mate has been mapped to another HK2 locus in the reference genome); (middle) mapping of trimmed reads, with the coverage at each nucleotide position shown above the reads. The short overlap representing the 6-nt target site duplication doubles the coverage at these 6 nt, forming the tower in the characteristic submarine-shaped coverage profile. (Bottom) RepeatMasker track. In this instance, the HK2 virus has integrated into an existing ERV belonging to another lineage, HERVS71.
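
The coverage doubling described for the middle panel can be illustrated with a minimal sketch. This is not the pipeline used in the study: the function names `coverage_profile` and `find_tower`, the half-open interval convention, and the toy read coordinates are all assumptions made for illustration. Trimmed reads from the two flanks overlap only across the 6-nt target site duplication, so per-base coverage peaks there.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), hypothetical convention


def coverage_profile(reads: List[Interval], start: int, end: int) -> List[int]:
    """Per-base coverage over the window [start, end)."""
    cov = [0] * (end - start)
    for r_start, r_end in reads:
        # Clip each read to the window before counting.
        for pos in range(max(r_start, start), min(r_end, end)):
            cov[pos - start] += 1
    return cov


def find_tower(cov: List[int], start: int) -> Tuple[int, int, int]:
    """Return (tower_start, tower_end, peak) for the first run at peak coverage."""
    if not cov:
        raise ValueError("empty coverage window")
    peak = max(cov)
    first = cov.index(peak)
    last = first
    while last + 1 < len(cov) and cov[last + 1] == peak:
        last += 1
    return start + first, start + last + 1, peak


# Toy data (invented): left-flank trimmed reads end at position 106,
# right-flank trimmed reads begin at position 100, so they share 6 nt.
left_flank = [(60, 106), (70, 106), (80, 106)]
right_flank = [(100, 140), (100, 150), (100, 130)]

cov = coverage_profile(left_flank + right_flank, 50, 160)
tower = find_tower(cov, 50)
print(tower)  # (100, 106, 6): a 6-nt tower at twice the flank coverage
```

In this toy case each flank contributes a coverage of 3 and the 6-nt overlap reaches 6, which is the tower of the submarine-shaped profile. Real data would be noisier, so a tolerance around the peak would likely be needed.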