2020 May 20;9(5):giaa048. doi: 10.1093/gigascience/giaa048

Table 1:

Synthetic and real datasets used in the experiments

| Sequence | Group | Length (bp) | Description |
|---|---|---|---|
| GGA18 | Aves | 11,373,140 | Access. CM000110. Gallus gallus chromosome 18 |
| MGA20 | Aves | 10,730,484 | Access. CM000981. Meleagris gallopavo isolate NT-WF06-2002-E0010 breed Aviagen turkey brand Nicholas breeding stock chromosome 20 |
| GGA14 | Aves | 16,219,308 | Access. CM000106. G. gallus chromosome 14 |
| MGA16 | Aves | 14,878,991 | Access. CM000977. Meleagris gallopavo isolate NT-WF06-2002-E0010 breed Aviagen turkey brand Nicholas breeding stock chromosome 16 |
| HS12 | Mammalia | 133,275,309 | Access. NC_000012. Homo sapiens chromosome 12, GRCh38.p13 Primary Assembly |
| PT12 | Mammalia | 130,995,916 | Access. NC_036891. Pan troglodytes isolate Yerkes chimp pedigree #C0471 (Clint) chromosome 12 |
| PXO99A | Bacteria | 5,238,555 | Access. CP000967. Xanthomonas oryzae pv. oryzae causes bacterial blight, a major disease of rice (Oryza sativa L.). The X. oryzae pv. oryzae PXO99A strain is virulent toward a large number of rice varieties representing diverse genetic sources of resistance [25] |
| MAFF 311018 | Bacteria | 4,940,217 | Access. AP008229. X. oryzae pv. oryzae MAFF 311018 is a Japanese race 1 strain [26] |
| ScVII | Fungi | 1,090,940 | Access. NC_001139. Saccharomyces cerevisiae S288C chromosome VII |
| SpVII | Fungi | 1,105,967 | Access. CP020299. Saccharomyces paradoxus strain UFRJ50816 chromosome VII |
| RefS | Synthetic | 1,500 | Consists of 3 segments of 500 bp each |
| TarS | Synthetic | 1,500 | To build TarS, segment I of RefS is mutated 2%, segment II is inversely repeated, and segment III is duplicated |
| RefM | Synthetic | 100,000 | Consists of 4 segments of 25 kb each |
| TarM | Synthetic | 100,000 | To build TarM, segment I of RefM is inversely repeated, segment II is mutated 90%, segment III is duplicated, and segment IV is mutated 3% |
| RefL | Synthetic | 5,000,000 | Consists of 2 segments of 2,500,000 bp each |
| TarL | Synthetic | 5,000,000 | To build TarL, segment I of RefL is inversely repeated and segment II is mutated 2% |
| RefXL | Synthetic | 100,000,000 | Consists of 4 segments of 25,000,000 bp each |
| TarXL | Synthetic | 100,000,000 | To build TarXL, segment I of RefXL is mutated 1%, segments II and III are inversely repeated, and segment IV is duplicated |
| RefMut | Synthetic | 60,000 | Consists of 60 segments of 1 kb each |
| TarMut | Synthetic | 60,000 | To build TarMut, the first segment of RefMut is mutated 1%, the second 2%, the third 3%, and so on |
| RefComp | Synthetic | 1,000,000 | Consists of 10 segments of 100 kb each |
| TarComp | Synthetic | 1,000,000 | To build TarComp, segment I of RefComp is duplicated, and segments II, III, and IV are mutated 1%, 2%, and 3%, respectively. Segments V, VI, and VII are inversely repeated and then mutated 4%, 5%, and 6%, respectively. Finally, segments VIII, IX, and X are mutated 7%, 8%, and 9%, respectively |
| RefPerm | Synthetic | 3,000,000 | Consists of 3 segments of 1 Mb each. In addition to the original sequence, it is permuted, using the GOOSE toolkit, in blocks of 450 kb, 30 kb, 1 kb, and 30 bp |
| TarPerm | Synthetic | 3,000,000 | To build TarPerm, the first segment of RefPerm is mutated 1%, the second segment is inversely repeated, and the third segment is mutated 2% |
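The construction of the synthetic targets can be illustrated with a minimal Python sketch of the smallest pair, RefS and TarS. This is not the tool used to build the actual data, and it rests on three labeled assumptions: "mutated 2%" is modeled as substitutions at exactly 2% of positions (length-preserving, since target and reference have the same length), "inversely repeated" is modeled as the reverse complement, and "duplicated" is modeled as an unchanged copy of the segment. The function names and the random seed are hypothetical.

```python
import random

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def random_dna(length, rng):
    """Return a uniformly random DNA string of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def inverted_repeat(segment):
    """Assumption: 'inversely repeated' means the reverse complement."""
    return segment.translate(COMPLEMENT)[::-1]


def mutate(segment, rate, rng):
    """Assumption: substitute round(rate * len) distinct positions,
    each with a base different from the original one."""
    bases = list(segment)
    n_sites = round(rate * len(bases))
    for i in rng.sample(range(len(bases)), n_sites):
        bases[i] = rng.choice([b for b in "ACGT" if b != bases[i]])
    return "".join(bases)


def build_tar_s(ref_s, rng):
    """Build a TarS-like target from a 1,500 bp reference:
    segment I mutated 2%, II inversely repeated, III copied unchanged."""
    seg1, seg2, seg3 = ref_s[:500], ref_s[500:1000], ref_s[1000:1500]
    return mutate(seg1, 0.02, rng) + inverted_repeat(seg2) + seg3


rng = random.Random(0)  # hypothetical seed, only for reproducibility
ref_s = random_dna(1500, rng)
tar_s = build_tar_s(ref_s, rng)
```

The other synthetic pairs follow the same pattern with different segment sizes and rates; TarMut, for instance, would apply `mutate` with rates 0.01, 0.02, 0.03, and so on to successive 1 kb segments.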

The real datasets can be downloaded from NCBI using the accession numbers (Access.) given in the descriptions.
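One possible way to retrieve a real sequence by accession number is through the NCBI E-utilities `efetch` endpoint, sketched below with the Python standard library. The source does not state which download route was used, so this is only an assumption; the helper names `efetch_fasta_url` and `download_fasta` and the output file name are hypothetical.

```python
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint (assumed route, not stated in the source).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build the efetch URL that returns a nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def download_fasta(accession, path):
    """Stream the FASTA record for `accession` into the file at `path`."""
    with urlopen(efetch_fasta_url(accession)) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)


# Example (requires network access; file name is hypothetical):
# download_fasta("CM000110", "GGA18.fa")
```

Large records such as the human and chimpanzee chromosomes are over 100 Mb, so streaming to disk rather than reading the whole response into memory is the safer choice.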