2008 Feb;18(2):331–342. doi: 10.1101/gr.7004508

Table 3.

A list of 11 structural GI models built from different training sets


Training sets include (1) 421 Salmonella regions, (2) 107 Streptococcus regions, (3) 140 Staphylococcus regions (including two regions overlapping rRNA operons), (4) 138 Staphylococcus regions (no rRNA operons), (5) 245 Staphylococcus–Streptococcus regions, (6) 559 Salmonella–Staphylococcus regions, (7) 528 Salmonella–Streptococcus regions, and (8) 666 Salmonella–Staphylococcus–Streptococcus regions. Training sets 9–11 include three subsets of ∼140 different Salmonella-specific regions combined with the Staphylococcus- and Streptococcus-specific regions. Each model, expressed through function Si, is the weighted sum of eight basis functions (structural features): the Interpolated Variable Order Motif (IVOM) score, which measures both low- and high-order compositional deviation from the backbone composition and is expressed as the relative entropy between the query and the genome-backbone (variable-order) compositional distributions; the insertion point (INSP) of each genomic region, evaluated as a binary state: insertion within a CDS locus (disrupting the corresponding CDS) or insertion within an intergenic part of the chromosome; the size (SIZE) of each genomic region; the gene density (DENS, the number of genes per kilobase) of each region; presence or absence (binary) of direct/inverted repeats (REPEATS) flanking the boundaries of each genomic region; presence or absence (binary) of integrase and/or integrase-like (INT) protein domains; presence or absence (binary) of phage-related protein domains (PHAGE); and presence or absence (binary) of non-coding RNA (RNA) in the proximity of each region.
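
As an illustration of the weighted-sum form described in this legend, the following minimal Python sketch scores one genomic region from its eight structural features. It is not the authors' implementation: the function name `score_region`, the example weights, the example feature values, the kilobase unit for SIZE, and the choice of encoding INSP as 1 for an insertion within a CDS are all assumptions made for illustration. The actual weights of the 11 models are given in the table itself and are not reproduced here.

```python
# Hypothetical sketch: a model score as the weighted sum of eight
# structural features. All names and numbers below are placeholders.

# The eight basis functions named in the legend, in the order listed there.
FEATURES = ("IVOM", "INSP", "SIZE", "DENS", "REPEATS", "INT", "PHAGE", "RNA")


def score_region(features, weights):
    """Return sum_j weights[j] * features[j] over the eight features.

    Both arguments are dicts keyed by the names in FEATURES. Binary
    features are expected as 0 or 1; continuous ones as floats.
    """
    missing = [
        name for name in FEATURES
        if name not in features or name not in weights
    ]
    if missing:
        raise KeyError(f"missing features or weights: {missing}")
    return sum(weights[name] * float(features[name]) for name in FEATURES)


# Placeholder weights (NOT taken from the table).
example_weights = {
    "IVOM": 2.0, "INSP": 0.5, "SIZE": 0.01, "DENS": -1.0,
    "REPEATS": 0.5, "INT": 1.0, "PHAGE": 1.0, "RNA": 0.5,
}

# Placeholder feature values for one region.
example_features = {
    "IVOM": 0.4,    # relative entropy vs. the genome backbone
    "INSP": 1,      # assumed encoding: 1 = within a CDS, 0 = intergenic
    "SIZE": 25.0,   # region size, assumed here to be in kilobases
    "DENS": 1.1,    # genes per kilobase
    "REPEATS": 1,   # flanking direct/inverted repeats present
    "INT": 1,       # integrase or integrase-like domain present
    "PHAGE": 0,     # no phage-related domains
    "RNA": 1,       # non-coding RNA in the proximity
}

if __name__ == "__main__":
    print(score_region(example_features, example_weights))
```

Keeping the features in a fixed, named order makes it straightforward to compare the weights that different training sets assign to the same feature, which is the comparison the table supports.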