Table 2.
Generation of a curated benchmark ORF set. The benchmark set archives contain GFF files for the labels of all annotated ORF sets (positive/negative), MS labels, tool predictions, close-proximity genes, genome sequences, and reference annotations to enable inspection in a genome browser. Links to the original data sources are provided. For each dataset, the sequencing depth is given (total number of reads times average read length, divided by genome length) [80]. For each annotated ORF set (translatome, sORFs, close-proximity genes, and stand-alone genes), the numbers of ORFs identified as translated (positive) or nontranslated (negative) are listed.
| Organism | E. coli | | L. monocytogenes [56] | | P. aeruginosa [59] | | S. typhimurium [58] | |
|---|---|---|---|---|---|---|---|---|
| Benchmark set [zip] | E. coli | | L. monocytogenes | | P. aeruginosa | | S. typhimurium | |
| Growth conditions | WT, LB @ 37 °C | | WT, BHI @ 37 °C | | WT, n-alkanes | | WT, LB @ 37 °C | |
| Data | GSE131514 | | SAMEA3864955, SAMEA3864956 | | SAMN06617371 | | SRX3456030, SRX3456038 | |
| Sequencing depth | 42.98 | | 939.76 | | 81.92 | | 38.92 | |
| Set | Positive | Negative | Positive | Negative | Positive | Negative | Positive | Negative |
| Translatome | 2763 (65%) | 1485 (35%) | 2288 (80%) | 579 (20%) | 3935 (71%) | 1638 (29%) | 3284 (66%) | 1689 (34%) |
| sORFs | 54 (48%) | 60 (52%) | 7 (100%) | 0 (0%) | 7 (58%) | 5 (42%) | 31 (31%) | 69 (69%) |
| Close-proximity genes | 1794 (64%) | 1015 (36%) | 1622 (80%) | 432 (20%) | 2511 (69%) | 1113 (31%) | 1947 (66%) | 1010 (34%) |
| Stand-alone genes | 969 (67%) | 470 (33%) | 666 (82%) | 147 (18%) | 1424 (73%) | 525 (27%) | 1337 (66%) | 679 (34%) |
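
The sequencing depth defined in the caption (total number of reads times average read length, divided by genome length) can be sketched as a small Python helper. This is only an illustration of the arithmetic: the function name `sequencing_depth` and the input values are hypothetical and are not the read counts or genome sizes underlying the depths reported in Table 2.

```python
def sequencing_depth(total_reads: int, avg_read_length: float, genome_length: int) -> float:
    """Return depth = total reads x average read length / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return total_reads * avg_read_length / genome_length


# Hypothetical inputs chosen only to illustrate the formula;
# they do not correspond to any dataset listed in the table.
example_depth = sequencing_depth(
    total_reads=8_000_000,
    avg_read_length=25,
    genome_length=4_600_000,
)
print(round(example_depth, 2))
```

Because depth scales linearly with both read count and read length, datasets with similar genome sizes but very different depths (as in the table) differ mainly in the amount of sequenced material.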


