Author manuscript; available in PMC: 2021 Jun 2.
Published in final edited form as: Quant Biol. 2020 Mar;8(1):64–77. doi: 10.1007/s40484-019-0187-4

Table 1.

The number of viral sequences of each length derived from viral genomes discovered before January 2014, between January 2014 and May 2015, and after May 2015.

Length     Training (Before 1/2014)    Validation (1/2014–5/2015)    Test (After 5/2015)    Total
150 bp     505,259                     164,918                       355,204                1,025,381
300 bp     252,630                     82,458                        177,416                512,504
500 bp     154,640                     50,350                        106,298                311,288
1000 bp    77,014                      25,087                        52,956                 155,057
3000 bp    25,263                      8,246                         17,385                 50,894

The three parts of the dataset partitioned by dates were used for training, validation, and testing, respectively.
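
A date-based partition of this kind could be sketched as follows. This is only an illustration, not the authors' code: the exact cutoff days, the record layout, and the names `TRAIN_END`, `VALID_END`, `assign_split`, and `genomes` are assumptions, since the table gives only month-level boundaries.

```python
from datetime import date

# Assumed cutoffs: the table states only month-level boundaries, so the exact
# days chosen here (and whether May 2015 itself counts as validation) are guesses.
TRAIN_END = date(2014, 1, 1)  # training: discovered before this date
VALID_END = date(2015, 6, 1)  # validation: from TRAIN_END up to, not including, this date


def assign_split(discovery_date: date) -> str:
    """Return 'train', 'validation', or 'test' for a genome's discovery date."""
    if discovery_date < TRAIN_END:
        return "train"
    if discovery_date < VALID_END:
        return "validation"
    return "test"


# Hypothetical records: (genome identifier, discovery date).
genomes = [
    ("genome_A", date(2012, 7, 15)),
    ("genome_B", date(2014, 3, 2)),
    ("genome_C", date(2016, 11, 30)),
]

# Map each genome to its partition.
splits = {name: assign_split(d) for name, d in genomes}
```

Assigning the split per genome, before cutting genomes into fixed-length fragments, would keep all fragments of one genome in the same partition.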