Table 1.
Selected features (n = 16) identified using the most parsimonious and performant model (prediction accuracy = 90.9%).
Id | Selected Features | Description |
---|---|---|
Genome | bp_genome_total | Genome size |
Genome | bp_genA | Total number of Adenines within the genome |
Genome | bp_genT | Total number of Thymines within the genome |
Genome | fr_genG | Frequency of Guanines within the genome (number of Guanines divided by total DNA length) |
Genome | genomic_shannon_score | Shannon entropy of the total genome sequence |
CDS | n_cds_total | Total number of CDS (Coding DNA Sequence) elements |
CDS | bp_cds_total | Total number of CDS nucleotides |
CDS | bp_cdsA | Total number of CDS Adenines |
CDS | bp_cdsG | Total number of CDS Guanines |
CDS | bp_cdsT | Total number of CDS Thymines |
CDS | cds_chargaff_score_ct | Chargaff’s second parity rule score of the total CDS sequence (ct method) |
CDS | cds_chargaff_score_pf | Chargaff’s second parity rule score of the total CDS sequence (pf method) |
CDS | cds_shannon_score | Shannon entropy of the total CDS sequence |
tRNA | tRNA_chargaff_score_ct | Chargaff’s second parity rule score of the total tRNA sequence (ct method) |
tRNA | tRNA_chargaff_score_pf | Chargaff’s second parity rule score of the total tRNA sequence (pf method) |
tRNA | tRNA_shannon_score | Shannon entropy of the total tRNA sequence |
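
As an illustration of how the count, frequency, and entropy features listed in Table 1 could be derived from a raw sequence, the following is a minimal Python sketch. It is not the authors' implementation: the function name `nucleotide_features` and the dictionary keys are hypothetical, the logarithm base (2) is an assumption, and every symbol in the input (including ambiguous bases such as N) is assumed to count toward the total length. The Chargaff scores are left out because the ct and pf methods are not defined in this table.

```python
import math
from collections import Counter


def nucleotide_features(seq: str) -> dict:
    """Compute count, frequency, and entropy features for one DNA sequence.

    Hypothetical helper: key names only loosely mirror the Table 1 naming.
    """
    seq = seq.upper()
    length = len(seq)
    if length == 0:
        raise ValueError("sequence must not be empty")

    counts = Counter(seq)  # occurrences of each symbol in the sequence

    # Shannon entropy H = sum over symbols of -p * log2(p).
    # Assumption: base-2 logarithm, so the result is in bits.
    entropy = sum(-(n / length) * math.log2(n / length) for n in counts.values())

    return {
        "bp_total": length,              # total sequence length
        "bp_A": counts["A"],             # absolute number of Adenines
        "bp_T": counts["T"],             # absolute number of Thymines
        "fr_G": counts["G"] / length,    # Guanines divided by total length
        "shannon_score": entropy,
    }


if __name__ == "__main__":
    # Toy sequence for illustration only, not real data.
    print(nucleotide_features("AACCGGTT"))
```

The same function could be applied to the whole genome, to the concatenated CDS sequence, or to the concatenated tRNA sequence to obtain the corresponding per-region features, assuming those sequences have already been extracted.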