Bioinformatics. 2016 Mar 19;32(14):2103–2110. doi: 10.1093/bioinformatics/btw152

Table 4.

Evaluation datasets

Name           Species                      Size (bp)  Cov. (×)  N50 (bp)
PB-ce-40X      Caenorhabditis elegans       104M       45        16 572
ERS473430      Citrobacter koseri           4.9M       106       7543
ERS544009      Yersinia pseudotuberculosis  4.7M       147       9002
ERS554120      Pseudomonas aeruginosa       6.4M       90        7106
ERS605484      Vibrio vulnificus            5.0M       155       5091
ERS617393      Acinetobacter baumannii      4.0M       237       7911
ERS646601      Haemophilus influenzae       1.9M       258       4081
ERS659581      Klebsiella sp.               5.1M       129       8031
ERS670327      Shimwellia blattae           4.2M       155       6765
ERS685285      Streptococcus sanguinis      2.4M       224       5791
ERS743109      Salmonella enterica          4.8M       188       6051
PB-ecoli       Escherichia coli             4.6M       160       13 976
PBcR-PB-ec     Escherichia coli             4.6M       30        11 757
PBcR-ONT-ec    Escherichia coli             4.6M       29        9356
MAP-006-1      Escherichia coli             4.6M       54        10 892
MAP-006-2      Escherichia coli             4.6M       30        10 794
MAP-006-pcr-1  Escherichia coli             4.6M       30        8080
MAP-006-pcr-2  Escherichia coli             4.6M       60        8064

Evaluation dataset name, species, reference genome size, theoretical sequencing coverage and the N50 read length. Names starting with ‘MAP’ are unpublished recent ONT data provided by the Loman lab (http://bit.ly/loman006). Names starting with ‘ERS’ are accession numbers of unpublished PacBio data from the NCTC project (http://bit.ly/nctc3k). PB-ecoli and PB-ce-40X are PacBio public datasets sequenced with the P6/C4 chemistry (http://bit.ly/pbpubdat; retrieved on 11/03/2015). PBcR-PB-ec is the PacBio sample data (P5/C3 chemistry) used in the tutorial of the PBcR pipeline; PBcR-ONT-ec is the ONT example originally used by Loman et al. (2015). ‘pls2fasta -trimByRegion’ was applied to ERS* and PB-ecoli datasets as they do not provide read sequences in the FASTQ format.
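
The two derived columns of the table can be illustrated with a minimal sketch. It assumes the usual definitions: the N50 read length is the largest length L such that reads of length L or longer contain at least half of all sequenced bases, and theoretical coverage is the total number of read bases divided by the reference genome size. The function names `n50` and `theoretical_coverage` and the toy read lengths are hypothetical, and the source does not say how the authors computed these values.

```python
from typing import Sequence


def n50(read_lengths: Sequence[int]) -> int:
    """Largest length L such that reads >= L hold at least half of all bases."""
    if not read_lengths:
        raise ValueError("no read lengths given")
    ordered = sorted(read_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: the cumulative sum always reaches half")


def theoretical_coverage(read_lengths: Sequence[int], genome_size: int) -> float:
    """Total read bases divided by the reference genome size (assumed definition)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


if __name__ == "__main__":
    # Toy read lengths for illustration only; not taken from any dataset above.
    toy_reads = [2, 3, 4, 5, 6]
    print("N50:", n50(toy_reads))
    print("coverage:", theoretical_coverage(toy_reads, genome_size=10))
```

In practice the read lengths would be collected from the FASTA/FASTQ records of a dataset and the genome size taken from the reference assembly.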