Table 1.
TE library | Reference TP^a | Reference FN^b | NA12878 TP^c | NA12878 FN^d | FP^e | Validated Novel^f |
---|---|---|---|---|---|---|
L1HS | 589 (84642) | 35 | 54 (1493) | 22 | 19 (74) | 10 (38) |
AluYa5/8 | 2335 (51529) | 404 | 143 (874) | 91 | 9 (44) | 6 (32) |
AluYb8/9 | 1664 (61099) | 183 | 119 (953) | 29 | 3 (12) | 1 (4) |
^a Reference TP, observed TE insertions (reads) in the reference truth set that have a TE cluster within a 600 bp window of the 3′ terminal position and match the predicted TE subfamily. Clusters contain filtered reads, with at least two Illumina read 1 sequences derived from the unique flanking sequence. See text for details.
^b Reference FN, false negatives, computed as reference TE subfamily members lacking a cluster within a 600 bp window of the TE 3′ terminal position.
^c NA12878 TP, observed 1000 Genomes Phase 3 MEI calls in NA12878 that have an identified TE cluster within a 600 bp window of the 3′ terminal position and match the predicted TE class (Alu, LINE1).
^d NA12878 FN, MEI calls with a TE subfamily classification that lack an observed cluster within a 600 bp window of the TE 3′ terminal position.
^e FP, false positive clusters lacking previous evidence of a TE insertion within a 600 bp window of the cluster position, before validation with GiaB and ONT long-read data.
^f Validated Novel, FP clusters supported by evidence from GiaB and ONT long-read data.
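
The TP, FN, and FP definitions in the footnotes can be illustrated with a minimal sketch. It is not the pipeline used to produce Table 1. The `Site` record, the `classify` and `sensitivity` helpers, and the reading of "within 600 bp window" as an absolute position difference of at most 600 bp are all assumptions made for illustration.

```python
from dataclasses import dataclass

WINDOW_BP = 600  # window size stated in the table footnotes


@dataclass(frozen=True)
class Site:
    """Hypothetical record for a truth-set insertion or an observed cluster."""
    chrom: str
    pos: int        # truth site: TE 3' terminal position; cluster: cluster position
    subfamily: str  # annotated (truth) or predicted (cluster) TE subfamily


def _near(a, b, window):
    # Assumption: "within a 600 bp window" means |distance| <= window on the same chromosome.
    return a.chrom == b.chrom and abs(a.pos - b.pos) <= window


def classify(truth, clusters, window=WINDOW_BP):
    """Split truth sites into TP/FN and collect clusters with no nearby truth site as FP."""
    tp, fn = [], []
    for t in truth:
        # TP requires a nearby cluster whose predicted subfamily matches the truth site.
        hit = any(_near(t, c, window) and c.subfamily == t.subfamily for c in clusters)
        (tp if hit else fn).append(t)
    # FP: clusters with no previously known insertion nearby (subfamily is not checked).
    fp = [c for c in clusters if not any(_near(t, c, window) for t in truth)]
    return tp, fn, fp


def sensitivity(n_tp, n_fn):
    """Fraction of truth-set insertions recovered: TP / (TP + FN)."""
    return n_tp / (n_tp + n_fn)


if __name__ == "__main__":
    # Counts taken from the Reference TP and Reference FN columns of Table 1.
    for name, n_tp, n_fn in [("L1HS", 589, 35), ("AluYa5/8", 2335, 404), ("AluYb8/9", 1664, 183)]:
        print(f"{name}: reference sensitivity = {sensitivity(n_tp, n_fn):.3f}")
```

Under these assumptions, a cluster that lies near a known insertion but carries a different predicted subfamily counts toward neither TP nor FP; how the actual analysis handles that case is described in the text rather than in the table.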