Table 1.
Repeat sequence family | Fraction of genome (%) | Typical unit (encoded proteins) | Typical length (kb) | Ref. (Accession nos†) |
---|---|---|---|---|
LINEs | 21 | P□□A/T | 6–8* | [14–16] |
Long interspersed elements (e.g. LINE1) | (17·4) | (reverse transcriptase and endonuclease) | | (M22333) |
SINEs | 13·6 | PA/T | 0·1–0·3 | [17,18] |
Short interspersed elements (e.g. Alu) | (10·7) | (non-protein coding) | | (L35531) |
LTRs | 8·6 | ⇒□□□⇒ | 1·5–11 | [3,7,9,19,69] |
Long terminal repeats (e.g. HERV class I + II + III, MaLR III) | (4·8, 3·8) | (reverse transcriptase, protease, RNase H and integrase) | | (AY208136, M14123, AF020092, U07856) |
DNA transposons | 3·0 | ⇒□□⇐ | 0·08–3·0 | [17] |
(e.g. MER-1 Charlie) | (1·4) | (transposase) | | (L13659) |
Unclassified | 0·15 | | | |
Total | Circa 46·4% | | | |
†Further details of human repeat sequences can be found at http://www.girinst.org/Repbase_Update.html. Note that HERVs are included in the LTR repeat family.
*Reverse transcription, having primed at the 3′ end, often fails to proceed to the 5′ end, so many LINEs are shorter than 1 kb. SINEs share a common 3′ sequence for reverse transcription, so active SINEs can exploit a LINE reverse transcriptase. SINEs and LINEs may also show short flanking repeat sequences (e.g. 5′TTAAAA/3′AATTTT), which act as signal sequences for integration [17–20].
Code: ⇒, repeat sequence; P, RNA polymerase promoter (LINE: RNA pol II; SINE: RNA pol III); A/T, polyA/polyT sequences; □□, open reading frame (ORF).
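
As a quick cross-check of the table, the following minimal Python sketch sums the family-level fractions and confirms that the 3′ strand of the example flanking repeat is the base-by-base complement of the 5′ strand. The names `FAMILY_FRACTIONS`, `total_fraction` and `complement` are illustrative only. The sketch assumes that the parenthesised values are subsets of their family totals, so they are left out of the sum.

```python
from decimal import Decimal

# Family-level genome fractions (%) copied from Table 1.
# Parenthesised sub-entries (e.g. LINE1, Alu) are assumed to be subsets
# of these totals and are therefore not listed here.
FAMILY_FRACTIONS = {
    "LINEs": Decimal("21"),
    "SINEs": Decimal("13.6"),
    "LTRs": Decimal("8.6"),
    "DNA transposons": Decimal("3.0"),
    "Unclassified": Decimal("0.15"),
}

# Watson-Crick pairing for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def total_fraction(fractions):
    """Sum the per-family fractions exactly, avoiding float rounding."""
    return sum(fractions.values(), Decimal("0"))


def complement(seq):
    """Return the base-by-base complement in the same left-to-right order."""
    return seq.upper().translate(COMPLEMENT)


if __name__ == "__main__":
    print(total_fraction(FAMILY_FRACTIONS))  # 46.35, i.e. circa 46.4
    print(complement("TTAAAA"))              # AATTTT
```

The exact sum of 46·35 agrees with the "Circa 46·4%" total, and the complement of TTAAAA is AATTTT, which matches the paired strands written as 5′TTAAAA/3′AATTTT.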