Table 2.
Benchmarking study | Application | No. of tools | Model of study | Raw input data type | Gold standard data preparation method | Parameter optimization |
---|---|---|---|---|---|---|
Yang et al. 2013 | Error correction | 7 | I | R | SIMUL | N |
Aghaeepour et al. 2013 | Flow cytometry analysis | 14 | C | R | EXPERT | N |
Bradnam et al. 2013 | Genome assembly | 21 | C | R | ALTECH | n/a |
Hunt et al. 2014 | Genome assembly | 10 | I | R, S | SOFTWARE | N |
Lindgreen et al. 2016 | Microbiome analysis | 14 | I | S | SIMUL | N |
McIntyre et al. 2017 | Microbiome analysis | 11 | I | R, S | MOCK | N |
Sczyrba et al. 2017 | Microbiome analysis | 25 | C | S | SIMUL | n/a |
Altenhoff et al. 2016 | Ortholog prediction | 15 | I | DB | DB | Y |
Jiang et al. 2016 | Protein function prediction | 121 | C | R | DB | n/a |
Radivojac et al. 2013 | Protein function prediction | 54 | C | R | DB | n/a |
Baruzzo et al. 2017 | Read alignment | 14 | I | S | SIMUL | Y |
Earl et al. 2014 | Read alignment | 12 | C | R, S | SIMUL | n/a |
Hatem et al. 2013 | Read alignment | 9 | I | R, S | SIMUL | Y |
Hayer et al. 2015 | RNA-Seq analysis | 7 | I | R, S | ALTECH | N |
Kanitz et al. 2015 | RNA-Seq analysis | 11 | I | R, S | ALTECH | N |
Łabaj et al. 2016 | RNA-Seq analysis | 7 | I | R | ALTECH | N |
Łabaj et al. 2016 | RNA-Seq analysis | 4 | I | R | DB | N |
Li et al. 2014 | RNA-Seq analysis | 5 | I | R | ALTECH | Y |
Steijger et al. 2013 | RNA-Seq analysis | 14 | C, I | R | ALTECH | n/a |
Su et al. 2014 | RNA-Seq analysis | 6 | I | R | ALTECH | Y |
Zhang et al. 2014 | RNA-Seq analysis | 3 | I | R | ALTECH | Y |
Thompson et al. 2011 | Sequence alignment | 8 | I | DB | DB | N |
Bohnert et al. 2017 | Variant analysis | 19 | I | R, S | I&A | Y |
Ewing et al. 2015 | Variant analysis | 14 | C | S | SIMUL | n/a |
Pabinger et al. 2014 | Variant analysis | 32 | I | R, S | SIMUL | N |
Surveyed benchmarking studies published from 2011 to 2017 are grouped by area of application (column “Application”). For each study, we recorded the number of tools benchmarked (“No. of tools”) and the coordinating model used to conduct the study (“Model of study”): independently performed by a single group (“I”), competition-based (“C”), or a hybrid combining elements of both (“C, I”). We also documented the type of raw omics data (“Raw input data type”) and the method used to prepare the gold standard data (“Gold standard data preparation method”) for each benchmarking study. Studies that used computationally simulated data are marked “S”; studies that used real data experimentally generated in the wet lab are marked “R”; studies that used both are marked “R, S”. Gold standard data were computationally simulated (“SIMUL”), manually evaluated by experts (“EXPERT”), prepared by an alternative technology (“ALTECH”), prepared as curated software input (“SOFTWARE”), prepared as a mock community (“MOCK”), prepared from curated databases (“DB”), or prepared using an integration and arbitration approach (“I&A”). In the “Parameter optimization” column, “Y” and “N” indicate whether parameter optimization was performed; in competition-based benchmarking studies, parameter optimization is performed by each team and is not mandatory (marked here as “n/a”). More details about the techniques used to prepare gold standard data sets are provided in Table 1.
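
For readers who want to process the table programmatically, the abbreviations in the legend can be expanded with simple lookup tables. The following is a minimal sketch rather than part of the original study: the dictionary names, the `decode_row` helper, and the wording of the expanded labels are illustrative assumptions, and codes not defined in the legend (such as “DB” in the raw input column) are passed through unchanged.

```python
# Lookup tables built from the Table 2 legend. All names here are illustrative.
MODEL = {
    "I": "independent (single group)",
    "C": "competition-based",
    "C, I": "hybrid",
}
RAW = {
    "R": "real (experimentally generated)",
    "S": "simulated",
    "R, S": "real and simulated",
}
GOLD = {
    "SIMUL": "computationally simulated",
    "EXPERT": "manually evaluated by experts",
    "ALTECH": "alternative technology",
    "SOFTWARE": "curated software input",
    "MOCK": "mock community",
    "DB": "curated database",
    "I&A": "integration and arbitration",
}


def decode_row(line):
    """Split one pipe-delimited table row and expand its abbreviations."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    study, application, n_tools, model, raw, gold, param = cells
    return {
        "study": study,
        "application": application,
        "n_tools": int(n_tools),
        # Codes missing from the legend fall back to the raw code.
        "model": MODEL.get(model, model),
        "raw": RAW.get(raw, raw),
        "gold": GOLD.get(gold, gold),
        "param_opt": param,
    }


row = decode_row("Hunt et al. 2014 | Genome assembly | 10 | I | R, S | SOFTWARE | N |")
print(row["model"], "/", row["raw"], "/", row["gold"])
```

Keeping the three mappings separate reflects that “DB” appears in two columns with potentially different meanings, so a single shared dictionary would conflate them.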