Table 2.
Category | Predictor | Studied in Human, piggyBat and SB | Studied in Bat | Measure (transformation)^a | Source | Description |
---|---|---|---|---|---|---|
DNA conformation | A-phased repeat | X | X | Content | Cer et al. (2011) | Runs of four or more As or Ts without the flexible TpA step (Rohs et al. 2009) |
DNA conformation | Direct repeat | X | X | Content | Cer et al. (2011) | Two tracts of 10–50 nt separated by 0–5 nt that have the same composition |
DNA conformation | G-quadruplex repeat | X | X | Count (log10) for human, content for bat | Cer et al. (2011) | Four blocks, with the same number of G bases (from 3 to 7), separated by 1–7 nt |
DNA conformation | Inverted repeat | X | X | Content | Cer et al. (2011) | Two consecutive DNA sequences (10–100 nt long) separated by 0–100 nt that are palindromic on the same strand. They may fold back and generate double helices |
DNA conformation | Mirror repeat | X | X | Count | Cer et al. (2011) | Two perfect repeats of 10–100 nt separated by 0–100 nt on the same strand |
DNA conformation | Triplex motif | X |  | Count (log10) | Cer et al. (2011) | Similar to a mirror repeat, but the repeat contains only purines or only pyrimidines on the same strand and the separation is at most 8 nt. These sequences can form three-stranded isoforms |
DNA conformation | Z-DNA repeat | X | X | Content | Cer et al. (2011) | Five or more tandem repeats, each with an alternating pyrimidine–purine dinucleotide motif in which the pattern YG is maintained on one of the DNA strands. These motifs can take the Z-DNA conformation |
DNA sequence | Mononucleotide STR | X | X | Content | Genome-wide screen | All repeats of one nucleotide with length >9 bp |
DNA sequence | Dinucleotide STR | X | X | Content | Genome-wide screen | All repeats of two-nucleotide motifs with length >10 bp |
DNA sequence | Trinucleotide STR | X | X | Content | Genome-wide screen | All repeats of three-nucleotide motifs with length >12 bp |
DNA sequence | Tetranucleotide STR | X | X | Content | Genome-wide screen | All repeats of four-nucleotide motifs with length >16 bp |
DNA sequence | TRF | X |  | Content (log10) |  | Repeats identified by the program Tandem Repeat Finder (Benson 1999) |
DNA sequence | SINE | X | X | Count (log10) for human, content for bat | UCSC Genome Browser | Transposable element (Short INterspersed Elements) |
DNA sequence | LINE | X | X | Count for human, content for bat | UCSC Genome Browser | Transposable element (Long INterspersed Elements) |
DNA sequence | L1 target sequence | X | X | Count | Cost et al. (2002) | Sequence associated with target-primed reverse transcription, which is characteristic of L1 and Alu mobilization |
DNA sequence | Telomere hexamer sequence | X | X | Count | Morrish et al. (2007); Nergadze et al. (2007) | Sequence associated with double-strand breaks (DSBs) and their repair by telomerases, as well as with L1 retrotransposition |
DNA sequence | CpG dinucleotides | X |  | Content | Genome-wide screen | Proportion of CpG dinucleotides in each window |
DNA sequence | Gene or exon | X | X | Content | UCSC Genome Browser | All annotated genes or exons in these genomes |
DNA sequence | Most conserved elements | X |  | Count | Siepel et al. (2005) | Regions of the genome that contain functional elements identified by comparative genomics |
Expression and regulation | Chromatin accessibility | X |  | Count (sqrt) | UCSC Genome Browser | Regions of the genome digested by DNase I, reflecting active chromatin |
Expression and regulation | RNA polymerase II occupancy | X |  | Content | UCSC Genome Browser | Regions of the genome transcribed by RNA polymerase II |
Expression and regulation | DNA methylation | X |  | Count (sqrt) | Down et al. (2008) | Epigenetic modification that can modify the chromatin structure and regulate gene expression |
Expression and regulation | CpG islands | X | X | Content (sqrt) for human, count for bat | UCSC Genome Browser | GC-rich regions of the genome close to promoters |
Recombination | Recombination hotspots | X |  | Count (log10) | Myers et al. (2005) | Recombination hotspots predicted from SNP data |
Position on the chromosome | Distance to telomere | X |  | Distance in bp (sqrt) | Genome-wide screen | Distance from the tip of the chromosome to each genome window defined here |
Position on the chromosome | Distance to centromere | X |  | Distance in bp (sqrt) | Genome-wide screen | Distance from the centromere to each genome window defined here |
Replication | Replication timing | X |  | Weighted average (sqrt) | Ryba et al. (2010) | Genome-wide microarray data measuring the time at which replication occurs in human embryonic stem cells (hESC) |
^a Count is the number of each feature in a particular window. Content is the fraction of a particular window that is occupied by a feature. Weighted average is used when several data intervals overlap within a window border.
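
As an illustration of the three measures defined in the footnote, the following minimal Python sketch computes count, content, and a weighted average for a single genome window. It is not the authors' pipeline. The function names, the half-open coordinate convention, the 1,000 bp window, and the example intervals are all hypothetical. Two further assumptions are made: a feature is counted if it overlaps the window by at least one base, and the weighted average weights each interval's score by the length of its overlap with the window.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def merge(intervals):
    """Merge overlapping intervals so that shared bases are counted once."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def count(window, features):
    """Count: number of features overlapping the window (assumed: >= 1 base)."""
    w_start, w_end = window
    return sum(1 for s, e in features if overlap(w_start, w_end, s, e) > 0)


def content(window, features):
    """Content: fraction of the window occupied by the feature."""
    w_start, w_end = window
    covered = sum(overlap(w_start, w_end, s, e) for s, e in merge(features))
    return covered / (w_end - w_start)


def weighted_average(window, scored_intervals):
    """Weighted average of interval scores, weighted by overlap length (assumed)."""
    w_start, w_end = window
    pairs = [(overlap(w_start, w_end, s, e), v) for s, e, v in scored_intervals]
    total = sum(w for w, _ in pairs)
    return sum(w * v for w, v in pairs) / total if total else None


# Hypothetical 1,000 bp window and feature annotations.
window = (0, 1000)
features = [(100, 200), (150, 300), (950, 1100)]
scored = [(0, 600, 1.0), (600, 1200, 2.0)]

print(count(window, features))           # 3
print(content(window, features))         # 0.25
print(weighted_average(window, scored))  # 1.4
```

The transformations listed in parentheses in the Measure column (log10, sqrt) would then be applied to these per-window values.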