2014 Apr 22;31(7):1816–1832. doi: 10.1093/molbev/msu138

Table 2.

Significant Predictors (and Their Sources) in the Regression Analyses of Human, Bat, and Ex Vivo Integrations of DNA Transposons.

Category Predictor Studied in Human, piggyBat and SB Studied in Bat Measure (transformation)a Source Description
DNA conformation A-phased repeat X X Content Cer et al. (2011) Runs of four or more As or Ts without the flexible TpA step (Rohs et al. 2009)
Direct repeat X X Content Cer et al. (2011) Two tracts of 10–50 nt separated by 0–5 nt that have the same composition
G-quadruplex repeat X X Count (log10) for human, content for bat Cer et al. (2011) Four blocks, with the same number of G bases (from 3 to 7), separated by 1–7 nt
Inverted repeat X X Content Cer et al. (2011) Two consecutive DNA sequences (10–100 nt long) separated by 0–100 nt that are palindromic on the same strand. They may fold back and generate double helices
Mirror repeat X X Count Cer et al. (2011) Two perfect repeats of 10–100 nt separated by 0–100 nt on the same strand
Triplex motif X Count (log10) Cer et al. (2011) Similar to Mirror repeat, but the repeat can contain only purines or pyrimidines on the same strand and the separation is at most 8 nt. These sequences can form three-stranded isoforms
Z-DNA repeat X X Content Cer et al. (2011) Five or more tandem repeats, each with alternating pyrimidine–purine dinucleotide motif in which the pattern YG is maintained on one of the DNA strands. These motifs can take the Z-DNA conformation
DNA sequence Mononucleotide STR X X Content Genome-wide screen All repeats of one nucleotide with length >9 bp
Dinucleotide STR X X Content Genome-wide screen All repeats of two nucleotide motifs with length >10 bp
Trinucleotide STR X X Content Genome-wide screen All repeats of three nucleotide motifs with length >12 bp
Tetranucleotide STR X X Content Genome-wide screen All repeats of four nucleotide motifs with length >16 bp
TRF X Content (log10) Repeats identified by the program Tandem Repeats Finder (Benson 1999)
SINE X X Count (log10) for human, content for bat UCSC Genome Browser Transposable element (Short INterspersed Elements)
LINE X X Count for human, content for bat UCSC Genome Browser Transposable element (Long INterspersed Elements)
L1 target sequence X X Count Cost et al. (2002) Sequence associated with target-primed reverse transcription, which is characteristic of L1 and Alu mobilization
Telomere hexamer sequence X X Count Morrish et al. (2007); Nergadze et al. (2007) Sequence associated with double-strand breaks (DSBs) and their repair by telomerase, as well as with L1 retrotransposition
CpG dinucleotides X Content Genome-wide screen Proportion of CpG dinucleotides in each window
Gene or exon X X Content UCSC Genome Browser All annotated genes or exons in these genomes
Most conserved elements X Count Siepel et al. (2005) Regions of the genome that contain functional elements identified by comparative genomics
Expression and regulation Chromatin accessibility X Count (sqrt) UCSC Genome Browser Regions of the genome digested by DNase I, reflecting active chromatin
RNA polymerase II occupancy X Content UCSC Genome Browser Regions of the genome transcribed by RNA polymerase II
DNA methylation X Count (sqrt) Down et al. (2008) Epigenetic modification that can modify the chromatin structure and regulate gene expression
CpG islands X X Content (sqrt) for human, count for bat UCSC Genome Browser Regions of the genome rich in GC close to promoters
Recombination Recombination hotspots X Count (log10) Myers et al. (2005) Predicted recombination hotspots using SNP data
Position on the chromosome Distance to telomere X Distance in bp (sqrt) Genome-wide screen Distance from the tip of the chromosome to each genome window defined here
Distance to centromere X Distance in bp (sqrt) Genome-wide screen Distance from the centromere to each genome window defined here
Replication Replication timing X Weighted average (sqrt) Ryba et al. (2010) Genome-wide microarray data measuring the time at which replication occurs in hESC (human embryonic stem cells)

(a) Count is the number of occurrences of each feature in a particular window. Content is the fraction of a particular window that is occupied by a feature. Weighted average is used when several data intervals overlap within a window border.
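The three measures defined in the footnote can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names, the half-open `(start, end)` interval convention, the rule that any overlap with the window counts toward Count, and the choice to weight each data interval by its overlap length are all assumptions made for illustration.

```python
def window_measures(window_start, window_end, features):
    """Return (count, content) for one genome window.

    features: list of (start, end) half-open intervals (assumed convention).
    count:    number of features overlapping the window (assumption: any
              overlap, even partial, counts as one occurrence).
    content:  fraction of the window covered by at least one feature.
    """
    # Clip every feature to the window and drop those that fall outside it.
    clipped = []
    for start, end in features:
        s, e = max(start, window_start), min(end, window_end)
        if s < e:
            clipped.append((s, e))
    count = len(clipped)

    # Merge overlapping clipped intervals so shared bases are counted once.
    covered = 0
    cur_start = cur_end = None
    for s, e in sorted(clipped):
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start

    content = covered / (window_end - window_start)
    return count, content


def weighted_average(window_start, window_end, scored_intervals):
    """Average interval values, weighted by overlap with the window.

    scored_intervals: list of (start, end, value) tuples.
    Weighting by overlap length is an assumption; the footnote only says
    that a weighted average is used when intervals straddle a window border.
    Returns None when no interval overlaps the window.
    """
    total = 0.0
    weight = 0.0
    for start, end, value in scored_intervals:
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            total += value * overlap
            weight += overlap
    return total / weight if weight else None
```

For a hypothetical 1,000 bp window starting at position 0, the features `(100, 200)`, `(150, 300)`, and `(950, 1100)` would give a Count of 3 and a Content of 0.25 under these assumptions, because the first two features overlap and the third is clipped at the window border. The log10 and sqrt transformations listed in the Measure column would then be applied to these raw values; the table does not state how zero counts were handled before taking log10.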