2019 Oct 8;36(5):1382–1390. doi: 10.1093/bioinformatics/btz753

Table 1.

RNAIndel features

| No. | Category | Feature identifier (a) | Feature description | Selection status (b) |
|---|---|---|---|---|
| 1 | Sequence/Alignment | repeat | Count of repeat units, including homopolymers and STRs, in the indel flanking region | s, m |
| 2 | Sequence/Alignment | lc (linguistic complexity) | Diversity of k-mers in flanking 50-bp region | |
| 3 | Sequence/Alignment | local_lc | Diversity of k-mers in flanking 6-bp region | s, m |
| 4 | Sequence/Alignment | gc | GC content in flanking 50-bp region | |
| 5 | Sequence/Alignment | local_gc | GC content in flanking 6-bp region | |
| 6 | Sequence/Alignment | strength | DNA pair-bond strength of 2-mers in flanking 50-bp region | m |
| 7 | Sequence/Alignment | local_strength | DNA pair-bond strength of 2-mers in flanking 6-bp region | s |
| 8 | Sequence/Alignment | dissimilarity** | Edit distance between indel and flanking sequences | m |
| 9 | Sequence/Alignment | indel_complexity | Mismatches around the indel measured by edit distance | s |
| 10 | Sequence/Alignment | indel_size** | Length of inserted or deleted nucleotides | m |
| 11 | Sequence/Alignment | is_ins | True for insertions | m |
| 12 | Sequence/Alignment | is_at_ins* | True for ‘A’ or ‘T’ insertions | s |
| 13 | Sequence/Alignment | is_at_del* | True for ‘A’ or ‘T’ deletions | |
| 14 | Sequence/Alignment | is_gc_ins* | True for ‘G’ or ‘C’ insertions | |
| 15 | Sequence/Alignment | is_gc_del* | True for ‘G’ or ‘C’ deletions | s |
| 16 | Sequence/Alignment | ref_count | Count of RNA-Seq reads representing the reference allele | s, m |
| 17 | Sequence/Alignment | alt_count | Count of RNA-Seq reads representing the indel allele | s, m |
| 18 | Sequence/Alignment | is_bidirectional | True if an indel is supported by forward and reverse reads | s |
| 19 | Sequence/Alignment | is_uniq_mapped | True if an indel is supported by uniquely mapped reads | s, m |
| 20 | Sequence/Alignment | is_near_exon_boundary | True if an indel is within an exon but at the exon boundary | s, m |
| 21 | Sequence/Alignment | equivalence_exists | True if alternative indel alignments are observed | s, m |
| 22 | Sequence/Alignment | is_multiallelic | True if multiple indels are observed at the locus | s, m |
| 23 | Transcript/Protein | is_inframe** | True if an indel is in-frame | |
| 24 | Transcript/Protein | is_splice | True if an indel is in an intronic region within 10 bp of an exon | |
| 25 | Transcript/Protein | is_truncating | True if an indel causes a frameshift or stop gain, or destroys a splice motif | |
| 26 | Transcript/Protein | is_in_cdd** | True if an indel is located in a conserved domain | |
| 27 | Transcript/Protein | indel_location | Relative indel location in coding region | s |
| 28 | Transcript/Protein | is_nmd_insensitive | True if an indel is insensitive to nonsense-mediated decay | |
| 29 | Transcript/Protein | indels_per_gene | Number of indels detected in the gene in the sample | |
| 30 | Transcript/Protein | cds_length | Length of the coding region | s |
| 31 | DB | is_on_db | True if an indel is present in the default germline database | s, m |

Note: A total of 31 features related to sequence/alignment, biological effect on transcription and protein coding, and match to germline variant database are examined.

(a) Features marked with * were used only for training the single-nucleotide indel model, while those marked with ** were used only for training the multi-nucleotide indel model.

(b) Features selected by the single-nucleotide or the multi-nucleotide model are marked as s and m, respectively.
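
To make a few of the simpler sequence features in Table 1 concrete, the following is a minimal Python sketch of how gc, local_gc, indel_size, is_ins and the single-nucleotide A/T and G/C flags could be computed from an indel and its flanking sequence. It is an illustration only, not the RNAIndel implementation: the function names, the argument layout, and the choice to take the window from both sides of the indel (50 bp or 6 bp each) are assumptions made here.

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def simple_indel_features(left_flank: str, right_flank: str,
                          indel_seq: str, is_insertion: bool) -> dict:
    """Hypothetical helper computing a subset of the Table 1 features.

    Assumption: the "flanking N-bp region" is read as N bases on each
    side of the indel; the paper's table does not spell this out.
    """
    indel_seq = indel_seq.upper()
    single = len(indel_seq) == 1          # flags below apply to 1-nt indels
    is_at = single and indel_seq in "AT"
    is_gc = single and indel_seq in "GC"
    return {
        "gc": gc_content(left_flank[-50:] + right_flank[:50]),
        "local_gc": gc_content(left_flank[-6:] + right_flank[:6]),
        "indel_size": len(indel_seq),
        "is_ins": is_insertion,
        "is_at_ins": is_at and is_insertion,
        "is_at_del": is_at and not is_insertion,
        "is_gc_ins": is_gc and is_insertion,
        "is_gc_del": is_gc and not is_insertion,
    }


# Example: a single 'A' insertion between two short flanks.
features = simple_indel_features("ACGTGG", "CCATAT", "A", True)
```

In this sketch the four A/T and G/C flags are mutually exclusive for a given indel and are all False for multi-nucleotide indels, which matches footnote (a): those flags are only used by the single-nucleotide model.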