2019 Apr 11;15(4):e1008043. doi: 10.1371/journal.pgen.1008043

Table 1. Error filters used in the computational pipeline.

Filter name Definition
Improper alignment We rejected reads with an alignment shorter than 30 bp or with more than 3 mismatches to the reference genome.
Diagnostic motif We rejected reads without the L1Hs diagnostic G motif (position 6012 relative to the L1Hs Repbase consensus) [29].
Chimera within L1 segment We rejected reads with less than 95% identity (> 4 mismatches) to the L1Hs 3' end consensus sequence.
Chimera within poly-A tail We rejected reads at risk of being chimeric [12]. We applied BLAST to find the best alignments for retrotransposon and non-retrotransposon segments from hg19. Reads were removed as a putative chimera when the sequences of two segments overlapped > 10 bp with A% (percentage of adenine nucleotides) ≥ 50% or overlapped 6–10 bp with A% < 50%.
Subfamily filter We rejected putative somatic insertion sites that overlapped with reference insertions of young L1 subfamilies (L1Hs and L1PA2–4).
Known non-reference filter We rejected putative somatic insertion sites that overlapped with known non-reference L1 insertions in euL1db [30].
Misaligned reads We rejected reads at risk of being misaligned, defined as reads with inconsistent BWA and BLAT alignments.
Local structural variation (SV) We rejected reads at risk of being derived from a nearby reference L1Hs [12]. We extracted 2 kb downstream of the aligned non-retrotransposon segment from hg19 and aligned the full contig against this sequence by BLAT to exclude potential genomic rearrangement events.
Observed in common We rejected putative somatic insertion sites observed in two or more individuals.
PCR duplicate We rejected somatic insertion sites without supporting PCR duplicates.
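
The numeric thresholds in Table 1 can be written as simple predicates. The following is a minimal sketch, not the pipeline's actual code: all function names and input representations are hypothetical, and it assumes that A% is computed over the overlapping sequence between the two segments and that each read's alignment length, mismatch count, and percent identity have already been obtained from the aligner. The poly-A rule is implemented literally as stated in the table, so overlaps of 5 bp or less, and overlaps > 10 bp with A% < 50%, are not flagged.

```python
from typing import Dict, Set


def fails_improper_alignment(aligned_bp: int, mismatches: int) -> bool:
    """Improper alignment: alignment shorter than 30 bp or more than 3 mismatches."""
    return aligned_bp < 30 or mismatches > 3


def fails_l1_identity(identity_pct: float) -> bool:
    """Chimera within L1 segment: less than 95% identity to the L1Hs 3' end consensus."""
    return identity_pct < 95.0


def is_putative_polya_chimera(overlap_seq: str) -> bool:
    """Chimera within poly-A tail, applied to the overlap between the two segments.

    Assumption: A% is the percentage of adenines within the overlapping sequence.
    """
    n = len(overlap_seq)
    if n == 0:
        return False
    a_pct = 100.0 * overlap_seq.upper().count("A") / n
    return (n > 10 and a_pct >= 50.0) or (6 <= n <= 10 and a_pct < 50.0)


def sites_observed_in_common(site_to_individuals: Dict[str, Set[str]]) -> Set[str]:
    """Observed in common: sites seen in two or more individuals (to be rejected)."""
    return {site for site, inds in site_to_individuals.items() if len(inds) >= 2}
```

A read would be discarded if any read-level predicate returns True; sites returned by `sites_observed_in_common` would be dropped from the candidate somatic set. The remaining filters (diagnostic motif, subfamily, euL1db, BWA/BLAT consistency, local SV, PCR duplicate) depend on external alignments and annotation and are not sketched here.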