2023 Aug 23;2023:gigabyte87. doi: 10.46471/gigabyte.87

Table 1.

Details of the four datasets used to test the aws-s3-integrity-check tool. Each dataset was tested independently, and the log file produced by each test is available on GitHub [21]. All processing times were measured with the built-in Linux time tool (version 1.7) [22]. Processing time is the time (in minutes and seconds) that the aws-s3-integrity-check tool required to process and evaluate the integrity of all files in each dataset.

| Amazon S3 bucket | Data origin | Details | Number of files tested | Bucket size | Processing time | Log file |
| --- | --- | --- | --- | --- | --- | --- |
| mass-spectrometry-imaging | GigaDB | Imaging-type supporting data for the publication “Delineating Regions-of-interest for Mass Spectrometry Imaging by Multimodally Corroborated Spatial Segmentation” [23]. | 36 | 16 GB | real 1m52.193s, user 1m8.964s, sys 0m24.404s | logs/mass-spectrometry-imaging.S3_integrity_log.2023.07.31-22.59.01.tx |
| rnaseq-pd | EGA | Contents of the EGA dataset EGAS00001006380, containing bulk-tissue RNA-sequencing paired nuclear and cytoplasmic fractions of the anterior prefrontal cortex, cerebellar cortex, and putamen tissues from post-mortem neuropathologically-confirmed control individuals [24]. | 872 | 479 GB | real 62m56.793s, user 36m26.604s, sys 16m10.548s | logs/rnaseq-pd.S3_integrity_log.2023.07.31-23.02.47.txt |
| tf-prioritizer | GigaDB | Software-type supporting data for the publication “TF-Prioritizer: a Java pipeline to prioritize condition-specific transcription factors” [25]. | 6 | 3.7 MB | real 0m15.131s, user 0m2.012s, sys 0m0.240s | logs/tf-prioritizer.S3_integrity_log.2023.07.31-22.58.33.txt |
| ukbec-unaligned-fastq | EGA | A subset of the EGA dataset EGAS00001003065, containing RNA-sequencing Fastq files generated from 180 putamen and substantia nigra control samples [26]. | 131 | 440 GB | real 51m12.058s, user 31m27.348s, sys 14m7.084s | logs/ukbec-unaligned-fastq.S3_integrity_log.2023.08.01-01.03.58.txt |
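
To compare the processing times reported in Table 1 across datasets, it can help to convert them to seconds. The following is a minimal Python sketch, assuming the timings are available as `real`/`user`/`sys` lines in the `<minutes>m<seconds>s` form shown in the table. The helper names `to_seconds` and `parse_time_report` are hypothetical and are not part of the aws-s3-integrity-check tool.

```python
import re

# Matches durations written as "<minutes>m<seconds>s", e.g. "62m56.793s".
_DURATION = re.compile(r"^(\d+)m(\d+(?:\.\d+)?)s$")


def to_seconds(duration: str) -> float:
    """Convert a duration such as '1m52.193s' to seconds."""
    match = _DURATION.match(duration.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def parse_time_report(report: str) -> dict[str, float]:
    """Extract the real, user and sys timings (in seconds) from a report."""
    timings = {}
    for line in report.splitlines():
        parts = line.split()
        # Keep only lines of the form "<label> <duration>".
        if len(parts) == 2 and parts[0] in ("real", "user", "sys"):
            timings[parts[0]] = to_seconds(parts[1])
    return timings


# Timings reported in Table 1 for the rnaseq-pd bucket.
report = """real 62m56.793s
user 36m26.604s
sys 16m10.548s"""
timings = parse_time_report(report)
```

Here `real` is the elapsed wall-clock time, while `user` and `sys` are CPU time spent in user and kernel mode, so `timings["real"]` is the value to use when comparing how long each dataset took to verify.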