PLoS Comput Biol. 2020 Nov 30;16(11):e1008085. doi: 10.1371/journal.pcbi.1008085

Table 1. Benchmark dataset statistics and training alignment information content.

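| Family  | N_train | N_test | Consensus length | 1° info (bits) | 2° info (bits) | 3° info (bits) |
|---------|---------|--------|------------------|----------------|----------------|----------------|
| tRNA    | 1357    | 30     | 72               | 40.3           | 26.7           | 1.2            |
| Twister | 1005    | 40     | 65               | 57.5           | 4.5            | 1.3            |
| SAM     | 192     | 8      | 108              | 121.8          | 11.1           | 0.0            |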

Hand-curated MSAs are split into training and test sets based on [45]. For each training MSA, the information content of the primary sequence (in bits) is calculated following [39], while the information in secondary structure (nested base pairs) and tertiary structure (all other disjoint pairwise interactions between sites) is estimated using mutual information [6]. Each family’s consensus structure is inferred by running CaCoFold and R-scape on the training alignment [41, 42]. Although R-scape identifies no tertiary structure in the SAM riboswitch training alignment, a four-base-pair pseudoknot has been observed experimentally [46]. This missed pseudoknot is specific to our SAM training alignment: R-scape predicts the pseudoknot when analyzing the RF00162 seed alignment.
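To illustrate the two quantities named in the caption, the following is a minimal sketch of per-column information content and pairwise mutual information on a toy alignment. It is not the procedure of [39] or [6]: it assumes a uniform background over A, C, G, U, ignores gaps, and uses naive plug-in frequency estimates with no small-sample or phylogenetic correction. The function names `column_information` and `mutual_information` and the toy `msa` are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGU"  # assumption: gaps and ambiguity codes are simply ignored


def column_information(column):
    """Information content (bits) of one alignment column.

    Computed as log2(4) minus the Shannon entropy of the observed
    nucleotide frequencies, i.e. relative to a uniform background.
    """
    counts = Counter(c for c in column if c in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(len(ALPHABET)) - entropy


def mutual_information(col_i, col_j):
    """Plug-in mutual information (bits) between two alignment columns.

    Only rows where both positions hold a nucleotide are counted.
    """
    pairs = [(a, b) for a, b in zip(col_i, col_j)
             if a in ALPHABET and b in ALPHABET]
    total = len(pairs)
    if total == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), n in joint.items():
        # p(a,b) / (p(a) p(b)) written with raw counts
        mi += (n / total) * math.log2(n * total / (left[a] * right[b]))
    return mi


# Toy alignment: columns 0, 2, 4 are invariant; columns 1 and 3 covary
# as Watson-Crick pairs (A-U, G-C, C-G, U-A).
msa = ["GACUC", "GGCCC", "GCCGC", "GUCAC"]
columns = list(zip(*msa))

primary_bits = sum(column_information(c) for c in columns)
pair_bits = mutual_information(columns[1], columns[3])
```

In this toy case the conserved columns carry all of the primary-sequence information, while the covarying pair carries none individually and contributes only through mutual information. Summing such pairwise terms over nested base pairs versus all remaining pairs would give a rough analogue of the 2° and 3° columns, under the assumptions stated above.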