Table 1. Summary of computation.
Organism name | #reference bp (millions) | #unique traces (millions) | Mean coverage | Space (Gb) | Time (millions of node seconds) |
Anopheles gambiae | 260 | 4.3 | 9.9 | 13 | 0.56 |
Callithrix jacchus | 2,900 | 22 | 4.6 | 160 | 1.5 |
Canis familiaris | 2,400 | 33 | 8.3 | 370 | 3.4 |
Drosophila melanogaster | 160 | 0.67 | 2.5 | 2.5 | 0.06 |
Gallus gallus | 1,000 | 12 | 7.2 | 30 | 1.3 |
Homo sapiens | 2,900 | 85 | 18 | 530 | 30 |
Mus musculus | 2,600 | 93 | 21 | 4,200 | 114 |
Pan troglodytes | 2,900 | 32 | 6.6 | 150 | 7.0 |
Takifugu rubripes | 350 | 2.5 | 4.2 | 6.4 | 1.2 |
Xenopus tropicalis | 1,400 | 14 | 6.0 | 360 | 4.8 |
Total | | 298.47 | | 5,821.90 | 163.82 |
Data were generated from the analysis of 603,249,815 traces, 30% of the total number of traces at NCBI (outside the short-read archive). Approximately half were placed uniquely under our cutoffs; the total data consume about six terabytes of disk and required more than five “node years” of CPU time. The computation on mouse traces produced the bulk of the data.
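The totals and the summary figures in the caption can be cross-checked from the per-organism rows. The following is a minimal sketch, not part of the original analysis: the variable and function names are illustrative, the year length (365.25 days) is an assumption, and the conversion of 1,000 Gb per terabyte is likewise an assumption about the table's units.

```python
# Per-organism values copied from Table 1:
# (unique traces in millions, space in Gb, time in millions of node seconds)
TABLE_1 = {
    "Anopheles gambiae": (4.3, 13, 0.56),
    "Callithrix jacchus": (22, 160, 1.5),
    "Canis familiaris": (33, 370, 3.4),
    "Drosophila melanogaster": (0.67, 2.5, 0.06),
    "Gallus gallus": (12, 30, 1.3),
    "Homo sapiens": (85, 530, 30),
    "Mus musculus": (93, 4200, 114),
    "Pan troglodytes": (32, 150, 7.0),
    "Takifugu rubripes": (2.5, 6.4, 1.2),
    "Xenopus tropicalis": (14, 360, 4.8),
}

TOTAL_TRACES_ANALYZED = 603_249_815     # trace count stated in the caption
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumption: 365.25-day year
GB_PER_TB = 1000                        # assumption: decimal units


def column_totals(rows):
    """Sum the traces, space, and time columns over all organisms."""
    traces = sum(r[0] for r in rows.values())
    space = sum(r[1] for r in rows.values())
    time = sum(r[2] for r in rows.values())
    return traces, space, time


traces_m, space_gb, time_m = column_totals(TABLE_1)

# Fraction of analyzed traces that were placed uniquely.
unique_fraction = traces_m * 1e6 / TOTAL_TRACES_ANALYZED

# CPU time expressed in node years, and disk usage in terabytes.
node_years = time_m * 1e6 / SECONDS_PER_YEAR
space_tb = space_gb / GB_PER_TB

# Share of the total disk space taken by the mouse computation.
mouse_share = TABLE_1["Mus musculus"][1] / space_gb

print(f"unique traces: {traces_m:.2f} million ({unique_fraction:.1%} of analyzed)")
print(f"space: {space_gb:.1f} Gb ({space_tb:.1f} Tb), mouse share {mouse_share:.0%}")
print(f"time: {time_m:.2f} million node seconds ({node_years:.1f} node years)")
```

Under these assumptions the column sums reproduce the Total row, the uniquely placed fraction comes out at roughly one half, the CPU time at a little over five node years, and the mouse computation accounts for roughly 70% of the disk space.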