Author manuscript; available in PMC: 2013 Aug 27.
Published in final edited form as: Nat Biotechnol. 2010 Nov;28(11):1181–1185. doi: 10.1038/nbt1110-1181

Figure 1.

Layers of reproducible computing in the cloud

The reproducibility of scientific computing in the cloud can be understood at three layers of scientific computation. (A) Data Layer: Generators of large scientific data sets can publish their data to the cloud as large data volumes (1), and substantial updates to these data volumes can exist in parallel without loss or modification of the previous volume (2). Primary investigators can clone entire data volumes within the cloud (3) and apply custom scripts or software computations (4) to derive published results (5). An independent investigator can obtain digital replicates of the original primary data set, software, and published results within the cloud to replicate a published analysis and compare the outcome with the published results (6). (B) System Layer: Investigators can set up and conduct scientific computations using cloud-based virtual machine images that incorporate all the software, configuration, and scripts necessary to execute the analysis. The customized machine image can be cloned wholesale and shared with other investigators within the cloud for replicate analyses. (C) Service Layer: Instead of making in-place modifications or updates to the systems or data comprising the underlying infrastructure of a scientific computing service, the entire infrastructure can be virtualized in the cloud and cloned prior to update or modification, retaining the state and characteristics of the previous version of the service. Requests made by external tools or applications through the external service interface could include a version parameter, so that published results citing previous versions of the service can be evaluated for reproducibility.
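
Two steps in the caption lend themselves to a brief illustration: comparing a replicated result with the published one (step 6 of the Data Layer) and pinning a request to a specific service version (Service Layer). The Python sketch below is only one way this might look and is not taken from the article. The endpoint URL, the `version` parameter name, and both function names are hypothetical, and byte-for-byte digest comparison is an assumed criterion for "replicates the published result"; a real analysis might need a looser, tolerance-based comparison.

```python
import hashlib
from urllib.parse import urlencode


def results_match(published: bytes, replicated: bytes) -> bool:
    """Compare two result files by SHA-256 digest.

    Assumption: an exact byte-level match is the reproducibility criterion.
    """
    published_digest = hashlib.sha256(published).hexdigest()
    replicated_digest = hashlib.sha256(replicated).hexdigest()
    return published_digest == replicated_digest


def versioned_request_url(base_url: str, params: dict, service_version: str) -> str:
    """Build a request URL that names the service version to query.

    The parameter name "version" is hypothetical; a real service would
    define its own name and version format.
    """
    query = dict(params)  # copy so the caller's dict is not modified
    query["version"] = service_version
    return f"{base_url}?{urlencode(query)}"


if __name__ == "__main__":
    # Hypothetical endpoint and query, for illustration only.
    url = versioned_request_url(
        "https://service.example.org/annotate",
        {"gene": "TP53"},
        "2010-11",
    )
    print(url)
    print(results_match(b"score\t0.91\n", b"score\t0.91\n"))
```

A publication citing the service would record the version string it used, so that a later request can be directed to the cloned infrastructure for that version rather than to the current one.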