Author manuscript; available in PMC: 2007 Apr 2.
Published in final edited form as: Gene. 2006 Jan 10;365:11–20. doi: 10.1016/j.gene.2005.09.031

Table 2.

Motif frequency for the first nick sites for the Alu insertion polymorphisms

Count (a)  Motif (number of sites in genome (b) / site usage (c))
78  ttAAAA (6844770/1.14)
61  atAAAA (6115727/0.97)
37  ctAAAA (3427545/1.08)
36  atAAGA (2040242/1.76)
35  ttAAGA (2248938/1.55)
32  aaAAAA (8006008/0.40) taAAAA (7446893/0.43)
23  gtAAGA (1266992/1.82)
20  ttAAAG (2636960/0.75) caAAAA (5905656/0.34)
19  aaAAGA (5553248/0.34)
18  gtAAAA (2525766/0.58)
17  (2940505/0.58) ctAAGA (1418507/1.20)
13  caAAGA (2875842/0.45)
12  gaAAAA (5451601/0.22)
11  aaAGAA (6061956/0.18)
10  taAAAG (3079886/0.32) atAAAG (2801071/0.36) atAGAA (2476795/0.40) tgAAAA (4505577/0.22)
9  gaAAGA (3257715/0.28) ctAGAA (1925300/0.47) agAAAA (6851406/0.13)
8  ctAAAG (1465413/0.48) aaAAAG (5450737/0.15)
7  taAGAA (2719394/0.26) gaAGAA (3200992/0.22) atGAAA (3793799/0.18)
6  ccAAAA (2952060/0.20) tgAAGA (2585632/0.23) ttAGAA (2603330/0.23)
5  gaAAAG gtAAAG acAAAA tcAAAA atAATA ttGAAA gtAGAA
4  tcAGAA gcAAAG ctAAAT aaAAAT ttAAAT acAAGA agAAGA tcAAAG tgAAAG ttTAAA caAAAG
3  ggAAAA tgAGAA ggAAAG gtAATA ccAAGA atAAAT ggAGAA atAACA tcAAGA ctGAAA aaAATG acAAAG ttAATA acAGAA
2  aaAAGT gcAAAA gaGAAA gtTAAA taAAAT caAATA cgAAAA aaATAT ttAACA aaGTTA aaTAAA aaGAAA ggGAAA atTAAA ggAAGA
1  taAAGC atACAA taAAGG aaATTG cgCTTT ttGGGA aaAGTT agAGAA aaTGAT acCTTC aaAAAC agGAGA gaGCCC taAAAC acTAAA gtGAAA caAGAA atAAAC tcAATA agAATA aaATGA ttAAAC aaGTCA agGAAA atAGGA atAGGC gcAGAA tgAGCA tcGAAT aaCCAC caAATC gtAAGG aaAACT acAAGC aaAGAG agCTGT agTTGT aaGCAG caGAAA gaAAAC ccAAAT tgGGGG ctAATA aaGGTC atTAGA ctTAAA taTTTA agATTC atAGAT gtAAAC aaAATA ttATAA ttTAGA taAATA aaAATT aaATCA caAAGG agAAAG ccAGAA taGAAA taAATT agAAAT aaATCT aaTTGG gcAAGA ttCAAA atTAAT aaGTGC aaCACA aaAAGC atGCCT ggCCTA agATGT tgTATT aaACAT
(a) Occurrence of each motif among the 800 polymorphic Alu loci.

(b) Occurrence of the motif in the human genome, based on UCSC hg15 with both strands considered. The second and third bases of the motif represent the first nick site made by EN. For the motif “aaAAAA”, the genome count does not include every position obtained by shifting 1 bp at a time within a run of “A”; instead, for “A” runs, the count is the number of possible shifts of 6 bp at a time. The eight sites following the “NT-AARA” motif are underlined.

(c) Site usage is the number of observed occurrences per 1×10^5 sites of that motif in the genome. Site counts and site usage are shown only for motifs with more than 5 occurrences among the 800 polymorphic Alu loci.
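
The site-usage column and the “NT-AARA” grouping can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `site_usage` and `follows_nt_aara` are hypothetical, the rows are a small sample copied from the table, and the regular expression reads N as any base and R as A or G (the standard IUPAC meaning), which the table notes do not spell out.

```python
import re

# A few rows copied from Table 2: (count among the 800 loci, motif, sites in genome).
ROWS = [
    (78, "ttAAAA", 6844770),
    (37, "ctAAAA", 3427545),
    (36, "atAAGA", 2040242),
    (23, "gtAAGA", 1266992),
    (12, "gaAAAA", 5451601),
]

def site_usage(count, genome_sites, per=100_000):
    """Observed insertions per `per` genomic occurrences of the motif."""
    return count / genome_sites * per

# Assumed reading of "NT-AARA": N = any base, R = A or G (IUPAC codes).
NT_AARA = re.compile(r"^[ACGT]TAA[AG]A$", re.IGNORECASE)

def follows_nt_aara(motif):
    """True if the 6-bp motif has the form N-T-A-A-R-A."""
    return NT_AARA.match(motif) is not None

if __name__ == "__main__":
    for count, motif, sites in ROWS:
        usage = site_usage(count, sites)
        print(f"{motif}  usage={usage:.2f}  NT-AARA={follows_nt_aara(motif)}")
```

For the sample rows above, the recomputed values agree with the tabulated site usage to two decimals, and only `gaAAAA` falls outside the NT-AARA pattern.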