2018 Dec 18;9:36. doi: 10.1186/s13100-018-0142-3

Fig. 2.

Flowchart of the findprovirus pipeline. The first step indexes the coordinates of the solo LTRs of a HERV family in the reference genome. Mapped reads (with a mapping quality score (MAPQ) equal to or greater than 30) and mates of discordant reads are extracted in a window extending ±100 bp from each LTR. Homology-based searches are performed with the mates of discordant reads against the consensus internal sequence of the respective HERV family to infer the presence of a provirus allele at the locus. The read depth for each locus is calculated and compared to the average read depth across all solo LTRs of that family in the same individual. Increased read depth may be observed at some candidate loci, reflecting the presence of a provirus allele. A local de novo assembly of the reads is also performed to infer the presence or absence of a solo LTR allele at the locus. These two additional approaches (enclosed by dashed lines) are performed by the pipeline but are not primarily used to infer the presence of a provirus.
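
The window extraction and read-depth comparison described in the caption can be illustrated with a minimal Python sketch. This is not the findprovirus implementation: the function names, the dictionary-based read records, and the half-open coordinate convention are all assumptions made for illustration, and a real pipeline would operate on BAM files rather than in-memory lists. Only the ±100 bp flank and the MAPQ threshold of 30 come from the caption.

```python
from statistics import mean

FLANK_BP = 100  # window extends 100 bp on each side of the LTR (from the caption)
MIN_MAPQ = 30   # minimum mapping quality (from the caption)


def ltr_window(start, end, flank=FLANK_BP):
    """Return the (start, end) window around a solo LTR, clamped at position 0.

    Hypothetical helper; coordinates are assumed 0-based, half-open.
    """
    return max(0, start - flank), end + flank


def reads_in_window(reads, window, min_mapq=MIN_MAPQ):
    """Keep reads that overlap the window and have MAPQ >= min_mapq.

    Each read is assumed to be a dict with 'start', 'end' and 'mapq' keys.
    """
    w_start, w_end = window
    return [
        r for r in reads
        if r["mapq"] >= min_mapq and r["start"] < w_end and r["end"] > w_start
    ]


def depth_ratios(depth_by_locus):
    """Divide each locus's read depth by the mean depth over all loci.

    A ratio well above 1 marks a locus with increased read depth; what
    cutoff to apply is left open, since the caption does not state one.
    """
    avg = mean(depth_by_locus.values())
    return {locus: depth / avg for locus, depth in depth_by_locus.items()}


if __name__ == "__main__":
    window = ltr_window(1000, 1968)
    reads = [
        {"start": 950, "end": 1050, "mapq": 60},   # overlaps, passes MAPQ
        {"start": 950, "end": 1050, "mapq": 10},   # overlaps, fails MAPQ
        {"start": 5000, "end": 5100, "mapq": 60},  # outside the window
    ]
    print(window, len(reads_in_window(reads, window)))
    print(depth_ratios({"locus_a": 10, "locus_b": 30}))
```

In this sketch the depth ratio is only a relative measure within one individual and one family, matching the caption's comparison against the average of all solo LTRs of that family.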