Table 1.
Volume of data. MB = megabyte = 10^6 bytes, GB = gigabyte = 10^9 bytes, TB = terabyte = 10^12 bytes, PB = petabyte = 10^15 bytes. The columns under A give the storage needed for a single cryo brain volume (about 1600 cm^3) at each voxel resolution.

| A. Voxel size | A. Voxel count | A. Gray scale, 8 bits | A. Gray scale, 16 bits | A. RGB color, 24 bits | B. Neuroimaging (annually) | C. Genomics (BP/yr) | D. Computational power (CPU transistor counts, Moore's law) | Years |
|---|---|---|---|---|---|---|---|---|
| | | | | | 200 GB | 10 MB | 1×10^5 | 1985-1989 |
| | | | | | 1 TB | 100 MB | 1×10^6 | 1990-1994 |
| 1 cm | 12×15×9 | 1620 B | 3240 B | 4860 B | 50 TB | 10 GB | 5×10^6 | 1995-1999 |
| 1 mm | 120×150×90 | 1.62 MB | 3.24 MB | 4.86 MB | 250 TB | 1 TB | 1×10^7 | 2000-2004 |
| 100 μm | 1200×1500×900 | 1.62 GB | 3.24 GB | 4.86 GB | 1 PB | 30 TB | 8×10^6 | 2005-2009 |
| 10 μm | 12000×15000×9000 | 1.62 TB | 3.24 TB | 4.86 TB | 5 PB | 1 PB | 1×10^9 | 2010-2014 |
| 1 μm | 120000×150000×90000 | 1.62 PB | 3.24 PB | 4.86 PB | 10+ PB | 20+ PB | 1×10^11 | 2015-2019 (estimated) |
Legend:
A. Recent technological advances have significantly increased the level of detail of optical imaging (e.g., cryotomographic brain images), down to micron (μm) resolution [6-8].
B. By 2012, there were 55 PB of neuroimaging data [9, 10], a figure that may overstate the true volume because different publications share the same datasets. As of 2010, the Imaging Data Archive, a Laboratory of Neuro Imaging brain database, stored about 5×10^15 B = 5 PB of data. Recent neuroimaging studies may generate 1.5 TB of data each week [11].
C. In 2011, the size of genetics data was estimated at 30 TB (based on 10,000 human genomes) [12, 13]. Because more than 10,000 complete human genomes had been sequenced worldwide by the end of 2011, this figure may be orders of magnitude smaller than the real genomics data size. Furthermore, data derived from genome sequencing of other species and 'partial genomes' (e.g., exome capture sequencing, RNA sequencing, and chromatin immunoprecipitation sequencing) are not included in this estimate. By 2015, more than 10^6 human genomes will be sequenced [12]. Assuming each genome takes about 10^11 B (100 GB), this translates into a total data volume of 10^17 B (100 PB). Some of the sequences may be whole-genome acquisitions at 100X depth/coverage, and some may be acquired at lower depth.
D. Data volume may be increasing faster than computational power, whose well-established growth is described by Moore's law [14, 15].
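The storage figures in column A and the genome total in legend C follow from simple multiplication, which can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `volume_bytes` and `human` are hypothetical, the 12×15×9 cm bounding box is taken from the 1 cm row of the table, and the images are assumed to be stored uncompressed with decimal units (1 MB = 10^6 bytes).

```python
# Sketch: reproduce the storage arithmetic behind Table 1 (decimal units).

UNITS = [("PB", 10**15), ("TB", 10**12), ("GB", 10**9), ("MB", 10**6)]


def human(n_bytes):
    """Format a byte count using the decimal units from the table caption."""
    for name, size in UNITS:
        if n_bytes >= size:
            return f"{n_bytes / size:.2f} {name}"
    return f"{n_bytes} B"


def volume_bytes(dims, bits_per_voxel):
    """Uncompressed size of a 3D volume: voxel count times bytes per voxel."""
    nx, ny, nz = dims
    return nx * ny * nz * bits_per_voxel // 8


# Assumption: brain bounding box of 12 x 15 x 9 cm (the 1 cm row of the table).
BOX_CM = (12, 15, 9)

# Voxels per centimetre along each axis for the resolutions in column A.
VOXELS_PER_CM = {"1 cm": 1, "1 mm": 10, "100 um": 100, "10 um": 1_000, "1 um": 10_000}

for label, k in VOXELS_PER_CM.items():
    dims = tuple(side * k for side in BOX_CM)
    # 8-bit and 16-bit gray scale, then 24-bit RGB color.
    sizes = [human(volume_bytes(dims, bits)) for bits in (8, 16, 24)]
    print(label, dims, sizes)

# Legend C: 10^6 genomes at an assumed 10^11 B (100 GB) per genome.
genomes = 10**6
bytes_per_genome = 10**11
print("genomics total:", human(genomes * bytes_per_genome))
```

Under these assumptions the 1 cm row evaluates to 1620, 3240, and 4860 bytes, each tenfold increase in linear resolution multiplies the storage by 1000, and the genome estimate comes to 100 PB.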