Table 1. VNTRseek accuracy, minimum flank length 20.
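Read set | Read mapping: Sens | Read mapping: Spec | Reference TR mapping: Sens | Reference TR mapping: Spec | Genotype calling, unmodified TR: Sens | Genotype calling, unmodified TR: Spec | Genotype calling, homozygous VNTR: Sens | Genotype calling, homozygous VNTR: Spec | Genotype calling, homozygous VNTR: PPV | Genotype calling, heterozygous VNTR†: Sens | Genotype calling, heterozygous VNTR†: Spec | Genotype calling, heterozygous VNTR†: PPV |
---|---|---|---|---|---|---|---|---|---|---|---|---|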
454 Exact (avg. 261 nt) | 97.5% | 99.6% | 96.9% | 99.2% | 97.7% | 100%* | 95.8% | 100%* | 96.3% | 84.2% | 100%* | 91.6% |
454 Errors (avg. 261 nt) | 90.1% | 99.5% | 94.7% | 99% | 93.7% | 99.9% | 91.9% | 100%* | 91.8% | 76.6% | 100%* | 85.8% |
Illumina Exact (100 nt) | 94.5% | 99.5% | 96.8% | 99.6% | 95.0% | 100%* | 93.6% | 100%* | 98.1% | 83.1% | 100%* | 86.5% |
Illumina Errors (100 nt) | 70.4% | 97.7% | 78.1% | 97.4% | 67.6% | 100%* | 64.8% | 100%* | 94.2% | 47.3% | 100%* | 74.5% |
Average accuracy measures for 12 simulated read sets: for each of two technologies (454 and Illumina), three sets generated from the reference genome (Exact) and three obtained by introducing errors into the exact reads (Errors). Sensitivity (Sens) and specificity (Spec) are reported for every task. Read mapping is the accuracy of assigning reads to the correct reference TRs. Reference TR mapping is the accuracy with which reference TRs were assigned reads. Genotype calling covers two experiments. In the first, unmodified reference TRs and homozygous VNTRs were called against a modified reference set in which 1118 randomly selected reference TRs (approximately 0.5% of the total) were altered by adding or subtracting one or two pattern copies. In the second, heterozygous VNTRs were called against the unmodified reference set, with reads selected equally from two chromosome sets, one exact and one modified to match the modified references. PPV is positive predictive value, the fraction of called VNTRs that were correct. Typical data are shown in Supplementary Tables S2-S6. *Specificity for unmodified TR calling and VNTR calling is slightly less than 100%. †Heterozygous VNTR values for Illumina reads were obtained by combining three data sets into one in order to obtain enough reference TR loci spanned by at least two reads in both the modified and unmodified chromosome sets.
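
The three measures reported in the table follow the standard confusion-matrix definitions, which can be sketched as below. This is an illustration only, not the evaluation code used for VNTRseek: the `Counts` container, the function names, and the example counts are all hypothetical and are not taken from the paper's data. The made-up counts are chosen so that positives are rare, as in the simulation where only about 0.5% of reference TRs were modified, to show how a specificity slightly below 100% can still display as 100% after rounding.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for one calling task (hypothetical container)."""
    tp: int  # true positives, e.g. modified TRs correctly called as VNTRs
    fp: int  # false positives, e.g. unmodified TRs wrongly called as VNTRs
    tn: int  # true negatives, e.g. unmodified TRs correctly not called
    fn: int  # false negatives, e.g. modified TRs that were missed


# All three functions assume a nonzero denominator.
def sensitivity(c: Counts) -> float:
    """Fraction of actual positives that were called."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Counts) -> float:
    """Fraction of actual negatives that were left uncalled."""
    return c.tn / (c.tn + c.fp)


def ppv(c: Counts) -> float:
    """Positive predictive value: fraction of calls that were correct."""
    return c.tp / (c.tp + c.fp)


# Made-up counts for illustration only (NOT from the paper).
example = Counts(tp=900, fp=50, tn=199_050, fn=100)

print(f"Sens {sensitivity(example):.1%}")
print(f"Spec {specificity(example):.1%}")
print(f"PPV  {ppv(example):.1%}")
```

Because negatives vastly outnumber positives in such a setting, a modest number of false positives barely moves specificity while lowering PPV noticeably, which is why the table reports PPV alongside specificity for VNTR calling.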