Table 3. Comparison of gene abundance estimates for simulated CAST and DO RNA-seq data after alignment to NCBIM37 reference and individualized transcriptomes.
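Aligned to | Mismatches allowed | Genes above threshold | No. genes with estimates <5% from ground truth | No. genes with estimates <10% from ground truth (%) | No. genes with estimates >10% from ground truth (%) | No. genes with estimates >50% from ground truth |
---|---|---|---|---|---|---|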
CAST reads | ||||||
NCBIM37 | 3 | 12,186 | 4,319 | 8,217 (67) | 3,969 (33) | 485 |
CAST | 3 | 12,108 | 8,718 | 10,544 (87) | 1,542 (13) | 174 |
NCBIM37 | 0 | 12,137 | 1,465 | 2,925 (24) | 9,212 (76) | 1,576 |
CAST | 0 | 12,059 | 7,023 | 9,568 (79) | 2,491 (21) | 152 |
DO reads | ||||||
NCBIM37 | 3 | 11,899 | 7,260 | 9,805 (82) | 2,094 (18) | 230 |
DO IRG | 3 | 11,863 | 8,569 | 10,471 (88) | 1,380 (12) | 161 |
NCBIM37 | 0 | 11,879 | 2,309 | 4,810 (40) | 7,069 (60) | 530 |
DO IRG | 0 | 11,857 | 7,110 | 9,575 (81) | 2,262 (19) | 164 |
With three mismatches allowed, aligning simulated CAST reads to the individualized CAST transcriptome rather than to NCBIM37 yields roughly twice as many gene estimates within 5% of the ground-truth value (8,718 vs. 4,319; N = +4399) and fewer than half as many estimates that deviate >10% from the ground truth (1,542 vs. 3,969; N = −2427). Gene estimates in the simulated DO sample also improve with alignment to the individualized transcriptome (DO IRG), with 18% more estimates within 5% of the ground-truth value (8,569 vs. 7,260; N = +1309) and 34% fewer estimates deviating >10% from the ground truth (1,380 vs. 2,094; N = −714).
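The differences quoted above can be recomputed directly from the table rows with three mismatches allowed. The following is a minimal sketch of that arithmetic, not part of the original analysis; the dictionary keys (`within_5`, `over_10`, `individualized`) and the helper `change` are illustrative names introduced here.

```python
# Counts copied from Table 3 (rows with three mismatches allowed).
# Key names and the helper below are illustrative assumptions,
# not identifiers from the original analysis.
TABLE3 = {
    "CAST": {
        "NCBIM37": {"within_5": 4319, "over_10": 3969},
        "individualized": {"within_5": 8718, "over_10": 1542},
    },
    "DO": {
        "NCBIM37": {"within_5": 7260, "over_10": 2094},
        "individualized": {"within_5": 8569, "over_10": 1380},
    },
}


def change(sample, column):
    """Return (absolute change, ratio, percent change) when moving from
    the NCBIM37 alignment to the individualized alignment."""
    ref = TABLE3[sample]["NCBIM37"][column]
    ind = TABLE3[sample]["individualized"][column]
    delta = ind - ref
    return delta, ind / ref, 100.0 * delta / ref


for sample in TABLE3:
    for column in ("within_5", "over_10"):
        delta, ratio, pct = change(sample, column)
        print(f"{sample} {column}: {delta:+d} genes, "
              f"ratio {ratio:.2f}, change {pct:+.1f}%")
```

The absolute changes come out to +4399 and −2427 for CAST and +1309 and −714 for DO, matching the values of N given in the text.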