Repetitive sequences, especially transposon-derived interspersed repetitive elements, account for a large fraction of the genome in most eukaryotes. Despite the repetitive nature, these transposable elements display quantitative and qualitative differences even among species of the same lineage. Although transposable elements contribute greatly as a driving force to the biological diversity during evolution, they can induce embryonic lethality and genetic disorders as a result of insertional mutagenesis and genomic rearrangement. Temporary relaxation of the epigenetic control of retrotransposons during early germline development opens a risky window that can allow retrotransposons to escape from host constraints and to propagate abundantly in the host genome. Because germline mutations caused by retrotransposon activation are heritable and thus can be deleterious to the offspring, an adaptive strategy has evolved in host cells, especially in the germline. In this review, we will attempt to summarize general defense mechanisms deployed by the eukaryotic genome, with an emphasis on pathways utilized by the male germline to confer retrotransposon silencing.
Keywords: epigenetics, genetics, germline, infertility, small noncoding RNAs, spermatogenesis, transposon
A comprehensive review of the currently known and possible mechanisms underlying the control of transposable elements during germline development is presented.
Protein-coding genes make up only a small portion of the genome in most organisms, and the remainder of the genome is largely occupied by repetitive sequences that do not appear to encode mRNAs for protein production and were thus once called junk DNA [1, 2]. These noncoding DNA sequences can be generally categorized into two classes: one is represented by tandem repeats, such as the major satellite repeats, minor satellite repeats, and microsatellite repeats; the other is the interspersed repeats, most of which are derived from the transposable elements (TEs) [3, 4]. Transposable elements, also called selfish elements or parasitic elements, are mobile DNA segments present in the genomes of organisms from prokaryotes to eukaryotes [4, 5]. Transposable elements can move and replicate themselves in the host genome through a process called transposition [5]. To defend the genome integrity, host cells have developed mechanisms that can prevent TEs from propagating and transposing [6]. In general, DNA methylation and heterochromatin formation are the primary mechanisms responsible for TE control at transcriptional levels [7, 8]. Small RNA-mediated degradation is an efficient way of restraining TEs at posttranscriptional levels [9]. Recent discoveries have drawn our attention to the star molecules, PIWI-interacting RNAs (piRNAs; initially named rasiRNAs), which exclusively function in the germline to suppress TEs probably at both transcriptional and posttranscriptional levels [10–12].
Transposable Elements as Main Building Blocks of the Eukaryotic Genome
With rare exceptions, eukaryotic genomes are suffused with tandem repetitive sequences in the centromeric and telomeric regions and with interspersed repetitive sequences throughout the genome [13]. Whole genome sequencing analyses reveal that most of these repetitive DNAs result from the activity of TEs [14, 15] and that the proportion of TEs in a genome correlates with the haploid genome size of the organism. For example, TEs make up 52% of the genome in the short-tailed opossum, 45%∼50% in primates, 39%∼40% in the mouse and rat, 34% in the dog, 10% in the domestic chicken, 7% in the medaka, and 3%∼4% in pufferfish species, including Takifugu rubripes and Tetraodon nigroviridis [13, 16–19]. These figures suggest that TEs are basic constituents of the genome.
Transposable elements can generally be categorized into two classes according to their mechanism of transposition: retrotransposons (class I) require an RNA transcript as an intermediate and propagate in a copy-and-paste manner, whereas DNA transposons (class II) are excised directly and propagate in a cut-and-paste fashion (Fig. 1, A and B) [2, 6]. Most mammalian genomes are occupied largely by retrotransposons, which can be further subcategorized into long terminal repeat (LTR) retrotransposons with LTRs at both ends, and non-LTR retrotransposons, including LINEs (long interspersed nucleotide elements), SINEs (short interspersed nucleotide elements) and SVAs (named after SINE, VNTR [variable number of tandem repeats], and Alu) (Fig. 1, C–E) [2, 6, 20]. LTR retrotransposons, also known as endogenous retroviruses (ERVs), can encode two proteins, POL and GAG, but lack the coding sequence for the ENV protein that is critical for the full life cycle of a retrovirus [2, 6, 20]. LINE1, whose typical full-length structure is shown in Figure 1C, contains two ORFs: ORF1, encoding an RNA-binding protein, and ORF2, encoding two enzymes, reverse transcriptase and endonuclease, that are essential for the autonomous movement of LINEs [21–23]. An internal promoter residing in the 5′-untranslated region (UTR) drives the transcription of the non-LTR LINE by RNA polymerase II. SINEs, of which a typical representative is Alu (Fig. 1D), comprise ∼18% of the human genome; they are short DNA sequences that cannot encode any functional reverse transcriptase and thus rely on other mobile elements for transposition [24–26]. SVAs (Fig. 1E) contain two typical sequence motifs, namely Alu-like and SINE-R, which are separated by a VNTR sequence [27, 28]. Most TEs are kept in a quiescent state, but a small fraction (<0.05% of TEs in humans) remains active in eukaryotic genomes (Table 1) [29–34].
In humans, currently known active TEs include the autonomous LINE1 (L1) element and the nonautonomous SINEs and SVAs [32].
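The classification described above can be written down as a small lookup structure. The following Python sketch is only an illustration of the taxonomy in the text: the class name `TEGroup`, its field names, and the helper functions are hypothetical, and the `autonomous` field is left as `None` wherever the text makes no statement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TEGroup:
    """One group of transposable elements, as described in the text."""
    name: str
    te_class: str                  # "I" = retrotransposon, "II" = DNA transposon
    subclass: str                  # "LTR", "non-LTR", or "n/a" for DNA transposons
    mechanism: str                 # "copy-and-paste" or "cut-and-paste"
    autonomous: Optional[bool]     # None where the text makes no statement
    listed_active_in_humans: bool  # True only if the text lists it as active


# Values below restate the text; they are not an independent annotation.
TE_GROUPS = [
    TEGroup("DNA transposon", "II", "n/a", "cut-and-paste", None, False),
    TEGroup("LTR retrotransposon (ERV)", "I", "LTR", "copy-and-paste", None, False),
    TEGroup("LINE", "I", "non-LTR", "copy-and-paste", True, True),
    TEGroup("SINE", "I", "non-LTR", "copy-and-paste", False, True),
    TEGroup("SVA", "I", "non-LTR", "copy-and-paste", False, True),
]


def listed_active(groups: List[TEGroup]) -> List[str]:
    """Names of groups the text lists as currently active in humans."""
    return [g.name for g in groups if g.listed_active_in_humans]


def nonautonomous(groups: List[TEGroup]) -> List[str]:
    """Names of groups described as depending on other elements to move."""
    return [g.name for g in groups if g.autonomous is False]


if __name__ == "__main__":
    print(listed_active(TE_GROUPS))   # ['LINE', 'SINE', 'SVA']
    print(nonautonomous(TE_GROUPS))   # ['SINE', 'SVA']
```

Keeping `autonomous` as an optional value avoids asserting anything about DNA transposons or LTR retrotransposons that the text does not state.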
FIG. 1. Schematic depiction of different classes of transposable elements (TEs) in the mammalian genome. Transposable elements can be divided overall into DNA transposons (class II; A) and retrotransposons (class I; B–E), according to their sequence features and mode of transposition. Most mammalian genomes are occupied largely by retrotransposons, which can be further subcategorized into long terminal repeat (LTR) retrotransposons with LTRs at both ends (B) and non-LTR retrotransposons, including LINEs (long interspersed nucleotide elements) (C), SINEs (short interspersed nucleotide elements) (D), and SVAs (E). LTR retrotransposons, also known as endogenous retroviruses (ERVs), can encode two proteins, POL and GAG, but lack the coding sequence for the ENV protein that is critical for the full life cycle of a retrovirus. LINE1, whose typical full-length structure is shown in C, contains two ORFs: ORF1, encoding an RNA-binding protein, and ORF2, encoding two enzymes, reverse transcriptase and endonuclease, that are essential for the autonomous movement of LINEs. An internal promoter residing in the 5′-UTR drives the transcription of the non-LTR LINE by RNA polymerase II. SINEs, of which a typical representative is Alu (D), comprise ∼18% of the human genome; they are short DNA sequences that cannot encode any functional reverse transcriptase and thus rely on other mobile elements for transposition. SVAs (E) contain two typical sequence motifs, namely, Alu-like and SINE-R, which are separated by a variable number of tandem repeats (VNTR) locus. To date, known active TEs include LINEs, SINEs and SVAs. LINEs can transpose through autonomous mobilization, whereas both SINEs and SVAs depend on LINE1 activity for nonautonomous transposition.
TABLE 1. Distribution of mobile DNA elements in the genomes of three organisms.
Several thousand are active.
80∼100 elements among 500 000 copies are still active.
Transposable elements have been regarded as parasites of the genome because they benefit at the expense of the host, and thus, TEs are also considered as selfish DNA [5, 35, 36]. Transposable element activity has resulted in broad variations in quantity, diversity, and chromosomal loci that are often found between or sometimes even within species [5, 35, 36]. Despite the selfish and parasitic nature, the movement and accumulation of TEs may have contributed to the creation of new genes and the formation of a complex regulatory network system in the host genome [8]. Transposable elements may also serve as building blocks for the assembly of a variety of regulatory systems to direct and coordinate eukaryotic gene expression or gene silencing under selective pressure during evolution, both at the level of single genes and across larger chromosomal regions [5, 8]. For example, >16% of eutherian-specific conserved noncoding elements are derived from various classes of TEs [37]. As a supply source, TEs can provide regulatory elements to the genome in diverse manners, such as producing a new promoter or a binding site for microRNA (miRNA) or RNA-binding proteins, introducing a new cis-regulatory element or alternative polyadenylation sites, driving antisense transcription, or disrupting the existing regulatory elements [8, 38]. A survey of mammalian genomes has revealed that 25% of human promoter regions consist of sequences that are derived originally from mobile DNA elements for the control of gene expression [36]. Alu, belonging to the SINE family, is the most abundant repetitive element group in the human genome (Table 1). Alu elements are found in >5% of human 3′-UTRs and are probable miRNA targets, suggesting that they have coevolved with their hosts [39, 40].
Although mobile elements have contributed greatly to the evolution of the host genome, once activated they can damage the host genome by generating insertional mutations or DNA rearrangements [41, 42]. Above all, if TE-induced genetic disruptions occur in the germline [43], these mutations will be passed on to the next generation, and the impact thus becomes transgenerational. Therefore, the activity of mobile elements must be under tight control by the host defense system to maintain the integrity and function of the host genome. Indeed, most eukaryotic organisms appear to have evolved elaborate and efficient mechanisms to cope with the presence and activity of mobile DNA elements.
Developmental Reprogramming Opens a Risky Window for TE Escape
Reprogramming refers to the capability of a nucleus to modify epigenetic marks on its chromatin (e.g., DNA methylation and histone modifications) and thus control specific gene expression profiles (e.g., levels of both coding and noncoding RNAs) at a specific developmental stage [44]. There are at least two such events during mammalian development: one occurs during fetal germline development, and the other takes place during preimplantation embryonic development (Fig. 2).
FIG. 2. Schematic representation of two genomic reprogramming events during murine development. Genomic methylation levels (vertical axis) during preimplantation and fetal germ cell development (horizontal axis) in mice are shown. Blue and red lines represent paternal and maternal genomes, respectively. Dashed lines represent the progression of demethylation. Developmental time points are indicated on the horizontal axis. A) Fertilization: demethylation of the paternal genome initiates immediately after the formation of the pronucleus, while methylation of the maternal genome is unchanged. B) First cleavage: demethylation of the maternal genome commences, whereas demethylation of the paternal genome is complete prior to the first cleavage. C) Blastomere stage: there is notable passive demethylation of the maternal genome concomitant with embryo cleavages between stages B and C. D) At 3.5 dpc: the embryo is at the blastocyst stage, and both paternal and maternal genomes start the remethylation process. E) At ∼4.5 dpc: methylation levels of both paternal and maternal genomes peak in the embryo. F) At 7.5 dpc: PGCs appear in the allantois as a population of ∼40 cells and continue to proliferate and migrate toward the gonad. G) At 10.5–11.5 dpc: PGCs reach the genital ridge, and demethylation occurs in the PGC genome. H) At 13.5 dpc: PGCs cease proliferation, and sex-specific gonad differentiation is being completed. PGCs become prospermatogonia in the testis and oogonia in the ovary. Prospermatogonia continue to proliferate mitotically while oogonia proceed to meiosis. I) At 15.5 dpc: de novo methylation continues in prospermatogonia while oogonia become oocytes that are in prophase I of meiosis. J) At 17.5 dpc: methylation levels in prospermatogonia approach their peak. K) Birth: de novo remethylation of the maternal genome commences in growing oocytes close to or shortly after birth in females.
In the first case, reprogramming occurs in primordial germ cells (PGCs) during fetal germ cell development. At 7.5 dpc (days postcoitum) in the mouse, PGCs first appear in the epiblast. These cells proceed to proliferate and migrate into the genital ridge between 10.5 and 11.5 dpc, where they continue to multiply to increase the PGC population until 13.5 dpc [45]. Genomewide DNA demethylation commences as early as ∼7.5 dpc and terminates before 13.5 dpc in PGCs of both male and female mice [19, 45]. During this process, most, if not all, DNA methylation marks are removed, including those of differentially methylated regions (DMRs) of imprinted genes, TEs, and satellite DNA repeats [19, 46, 47]. It is unclear to what extent demethylation takes place in other regions across the genome [48–50]. Once the genome of male or female PGCs has been demethylated, these cells enter mitotic (male) or meiotic (female) arrest. After several days, genomewide DNA remethylation initiates (Fig. 2). Interestingly, this de novo remethylation commences earlier (∼13.5 dpc) in male germ cells and proceeds until reentry into mitosis at Postnatal Day 3 [48, 49]. In the female, however, remethylation in oocytes starts close to or shortly after birth and continues during folliculogenesis after puberty [48–51] (Fig. 2). Because oocyte growth is a lengthy developmental process, remethylation occurs in different chromosomal regions at different stages during folliculogenesis [51].
The second reprogramming event takes place during preimplantation embryonic development, in which demethylation commences soon after fertilization through either an active (male) or a passive (female) mechanism [19, 51]. During sperm-egg fusion, the paternal genome undergoes a rapid transformation inside the oocyte cytoplasm in which the sperm-derived chromatin is remodeled through the removal of protamines and their replacement by acetylated histones [19, 51]. During this process, genomewide demethylation occurs and proceeds until DNA replication starts [52]. In contrast, genomewide demethylation of the maternal genome occurs later than that of the paternal genome and through a passive mechanism; that is, the number of methylated DNA sites is reduced with cell proliferation as a result of semiconservative DNA replication [45] (Fig. 2). This differential pattern of demethylation between paternal and maternal genomes may suggest that some paternal gene products are required for early embryonic development because the rapid reprogramming can eliminate repressive epigenetic marks (e.g., methylation) and thus induce gene expression. Intriguingly, most, if not all, of the germline imprinted genes and TEs are not subject to demethylation during this preimplantation reprogramming, suggesting that a protective mechanism has evolved to maintain the silenced state of imprinted genes and TEs [45]. Despite these interesting observations, the underlying mechanism remains elusive. Taken together, the two reprogramming events during mammalian development open a risky time window for TEs to escape from the control of the host genome, especially when the genome is mostly demethylated and remethylation is not yet complete.
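The arithmetic behind passive demethylation can be illustrated with a deliberately idealized sketch. It assumes that no maintenance or de novo methylation occurs and that no active demethylation takes place, so that each round of semiconservative replication pairs every old (methylated) strand with a new, unmethylated one and the fraction of methylated strands halves per division. The function name and parameters are hypothetical, and real embryos will deviate from this simple model.

```python
def methylated_strand_fraction(divisions: int, initial: float = 1.0) -> float:
    """Fraction of DNA strands still methylated after a number of replications.

    Idealized assumption: no methyltransferase activity at all, so newly
    synthesized strands are never methylated and parental strands are only
    diluted by semiconservative replication.
    """
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    if not 0.0 <= initial <= 1.0:
        raise ValueError("initial must be a fraction between 0 and 1")
    # Each replication doubles the number of strands while the number of
    # methylated (parental) strands stays constant.
    return initial / (2 ** divisions)


if __name__ == "__main__":
    for n in range(4):
        print(n, methylated_strand_fraction(n))
    # 0 1.0
    # 1 0.5
    # 2 0.25
    # 3 0.125
```

Under these assumptions, three cleavage divisions would already dilute the methylated strands to one-eighth of their starting level, which is why a passive mechanism can lower methylation substantially without any demethylating enzyme.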
Male Germinal Granules Are Involved in TE Suppression
Most germline cells possess a cytoplasmic structure known as the germinal granule, or nuage, which is probably conserved in many eukaryotic organisms under different names, such as the polar granule in Drosophila [53], the P granule in Caenorhabditis elegans and Xenopus [54], and the intermitochondrial cement and the chromatoid body in mice and humans [55]. Germinal granules are rife with membrane-free amorphous ribonucleoprotein (RNP) aggregates, which are localized to the cytoplasmic side of the nuclear envelope [56, 57]. In several lower eukaryotic organisms, whether embryonic cells develop into germ or soma depends on maternally inherited factors during the early stages of embryogenesis [58]. In Drosophila, maternal germline determinants are concentrated in the so-called germ plasm in oocytes and further assemble into dense bodies, known as germinal granules, in the germ plasm [58, 59]. Because both the germinal granule and the germ plasm are partitioned asymmetrically into daughter cells during cell division, prospective cells that possess germ granules will contribute to germline specification, indicating that germinal granules are tightly associated with the establishment and differentiation of germline cells [59]. In mammals, PGCs are derived from a population of epiblast cells induced by signals produced by the surrounding somatic cells [45]. Thus far, no evidence supports a maternal contribution to PGC determination because no accumulation of nuage-like organelles has been observed in either mature oocytes or zygotes [45]. During male germline development in the mouse, two types of germinal granules have been identified. The first has been termed intermitochondrial cement because these granules are localized between mitochondria in the cytoplasm of prospermatogonia, type A spermatogonia, and late pachytene spermatocytes [55].
The second type of male germinal granules, termed chromatoid body, is found to localize to the perinuclear regions in late pachytene spermatocytes and round spermatids [55] (Fig. 3). Although the precise molecular composition of these two types of granules remains an enigma, the data available so far have found that they indeed share numerous common components, such as RNA helicase Vasa (MVH), RNA decapping enzymes (DCPs), Argonaute family proteins (AGO1-4), Tudor domain-containing proteins (TDRD1–9), Sm (spliceosomal complex protein), and small RNAs [60, 61] (Table 2). Interestingly, these components are all involved in RNA fate control [62, 63]. Recent studies on components of the germinal granule link its function to the repression of retrotransposons based on observations that some RNA-silencing pathway components are colocalized and concentrated in the nuage [59] and that knockout (KO) mice lacking components of the RNA interference (RNAi) pathway, for example, Mvh [64] and maelstrom (Mael) [65], display TE derepression. In addition, it has been reported that targeting ORF1p and ORF2p proteins as well as L1 RNA of LINE1 retrotransposon to the stress granule, which shares components similar to the germinal granule, is one of the proposed mechanisms of controlling retrotransposons [66–68].
FIG. 3. Male germinal granules, including the intermitochondrial cement and the chromatoid body, are present in the cytoplasm during male germline development in the mouse. Spermatogenic cells containing the intermitochondrial cement are framed with a blue dashed line, whereas the red dashed-line frame indicates spermatogenic cells possessing the chromatoid body. Note that both the intermitochondrial cement and the chromatoid body are present in late pachytene spermatocytes.
TABLE 2. Summary of murine germinal granule- and piRNA pathway-associated components that are involved in TE repression.
In male mice, accumulating evidence points to a pivotal role of the germinal granule in the repression of transposable elements. GASZ, a germ cell protein with Ankyrin repeats, colocalizes in the intermitochondrial cement of prospermatogonia with the Argonaute subfamily protein MILI (PIWIL2) during early germline development [69]. Gasz-null testes show increased hypomethylation and elevated expression of retrotransposons, similar to MILI KO testes [70]. Spermatogenesis is blocked at the zygotene to pachytene transition, resulting in male sterility [69]. Maelstrom is a highly conserved protein found predominantly in female germline nuage of Drosophila [71]. In the mouse male germline, maelstrom accumulates throughout the cytoplasm and in prominent perinuclear nuage in late pachytene and diplotene spermatocytes and colocalizes with other germinal granule components: MIWI (PIWIL1), TDRD1, and MVH [65, 68]. Mael-null testes display severe LINE1 derepression, and spermatocytes exhibit massive DNA damage and L1 RNP aggregates at the onset of meiosis, indicating that MAEL is a component of the germinal granule in the male germline and is indispensable for TE silencing [65, 68] (Table 2). MILI (PIWIL2) and MIWI2 (PIWIL4) localize to two distinct cytoplasmic foci in fetal prospermatogonia in mice [72]. The first type of granule, termed pi-bodies, constitutes the MILI-TDRD1 module of the piRNA pathway and is likely analogous to the intermitochondrial cement present in early fetal development. The second type, containing the MIWI2-TDRD9-MAEL complex, has been shown to correspond to a subpopulation of processing bodies (P-bodies), as verified by immunostaining using the P-body markers DCP1A, DDX6, XRN1, and TNRC6A (GW182); this granule has subsequently been named piP-bodies [72, 73].
Because the PIWI-piRNA pathway is critical to the establishment and maintenance of transposon silencing, the colocalization of both distinct complexes to the germinal granules suggests a close relationship between the nuage and TE silencing [72–74]. Given the complex molecular composition of germinal granules during germ cell development, analyses of the transcriptome and proteome of these male germinal granules will help us gain more insights into the function of this enigmatic structure [75, 76].
Transposable elements, especially LINE1 retroelements, are interspersed throughout the host genome, and TE derepression can compromise the integrity of the host genome through insertional mutations, alternative splicing driven from different promoters, antisense transcripts that can trigger RNAi or an interferon response, and dysregulation of gene expression as a result of an abnormal nearby chromatin context [30]. Transposable elements have the inherent ability to replicate and propagate within the host genome, increasing their own fitness at the expense of that of their host. Given that TEs can be derepressed as a result of germline reprogramming and that TE-inflicted deleterious effects can be transmitted to the next generation, germline cells are under particularly strong selective pressure to curb TE activity and have coevolved complex strategies, which we will discuss.
DNA Methylation
To date, it has been well accepted that DNA methylation represents the primary mechanism of transposon suppression in the host genome [7, 8]. The majority of cytosine methylations reside in transposons that are in a silenced state in mice. In turn, demethylation will recover TE activity [7, 8]. Methylation of cytosine-5 at the CpG dinucleotide has been commonly found in eukaryotic organisms. DNA methylation, the most basic and frequent epigenetic modification, often leads to gene silencing, which is apparent for genes subject to X-chromosome inactivation [77], for imprinted genes [7, 8], and for a large number of TEs that belong to interspersed repetitive elements occupying up to 50% of the human genome [78]. Promoters of retrotransposons are silenced in prospermatogonia in the male and meiotic dictyotene oocytes in the female [79].
In mammals, five members of the DNA cytosine-5 methyltransferase family (DNMT1, DNMT2, DNMT3A, DNMT3B, and DNMT3L), which share similar sequence motifs, are known to be responsible for de novo methylation or the maintenance of methylation of corresponding DNA substrates [80, 81] (Table 2). However, only three members (DNMT1, DNMT3A, and DNMT3B) have been identified as active methyltransferases in vivo and in vitro through biochemical and genetic studies [78]. DNMT1 contains a carboxyl-terminal domain that is closely related to the bacterial cytosine-5 restriction methyltransferases and an amino-terminal domain that possesses multiple regulatory functions, including maintenance of methylation, import into the nucleus, inhibition of de novo methylation, and regulation of cell cycle-dependent protein degradation [78]. Dnmt1-deficient mouse embryos die at the head-fold stage, with severely demethylated genomes, reduced monoallelic expression at most imprinted loci, and elevated levels of intracisternal A particle (IAP) retrotransposon transcripts [82, 83]. DNMT1 appears to be most active against hemimethylated DNA that is produced by semiconservative DNA replication and appears to function predominantly as a maintenance methyltransferase [84]. However, there is also evidence of overlapping functions in de novo methylation between DNMT1 and DNMT3 [85, 86]. In the male, DNMT1 is detected in spermatogonial stem cells (SSCs), spermatogonia, and preleptotene and leptotene spermatocytes, but not in pachytene spermatocytes, despite the fact that meiotic germ cells at all stages express Dnmt1 mRNA, suggesting that Dnmt1 expression must be subject to posttranscriptional regulation [87] (Table 2). DNMT1 can silence postmigratory germ cell-specific genes by DNA methylation in both premigratory germ cells and somatic cells [88]. Meanwhile, Dnmt1 knockdown can induce rapid apoptosis of germline stem cells [89].
All of these lines of evidence are consistent with a role for DNMT1 in establishing and maintaining the methylation pattern in male germline cells and in silencing repetitive elements [83].
DNMT3L, a truncated form strikingly distinct from the other two DNMT3 family members, lacks motifs conserved in the DNA methyltransferase family and is thus deficient in enzymatic activity [90]. It is required for de novo methylation of imprinted genes in oocytes and for transposon repression in male germ cells [90]. In male germ cells, full-length DNMT3L protein is encoded by a transcript expressed at a stage when retrotransposons undergo de novo methylation (Fig. 4). A truncated form of DNMT3L is produced from a second promoter residing in intron 9 of the Dnmt3l gene in late spermatocytes, whereas only one full-length, oocyte-specific promoter-derived DNMT3L has been found in growing oocytes [90, 91]. The expression peaks of DNMT3L and DNMT3A overlap during a critical time window (E15.5–E17.5) when genomewide de novo DNA methylation commences, supporting an interplay between these two proteins during prenatal male germline development [86] (Table 2 and Fig. 4).
FIG. 4. Schematic summary of expression profiles of the PIWI-piRNA pathway components and currently known male germ cell-associated epigenetic modifiers during male germline development. Reprogramming occurs in fetal male germ cells (primordial germ cells and prospermatogonia) between 7.5 dpc and birth. DNMT3L is responsible for de novo methylation, while DNMT3A/B serve as the major maintenance methyltransferases. PIWI family proteins associate with respective TDRD members, such as MILI-TDRD1 and MIWI2-TDRD9, and localize to different subcellular compartments. Moreover, these proteins most probably complex with other cofactors, such as MOV10L1, MAEL, GASZ, and HSPA2 (HSP70-2), to fulfill their functions in piRNA production and TE suppression. The height of the thick lines represents the relative expression levels of a protein at the stages indicated. Prospg, prospermatogonia; SSC, spermatogonial stem cell; Spg, spermatogonia; L, leptotene spermatocytes; P, pachytene spermatocytes; D, diplotene spermatocytes; Spd, spermatids.
Dnmt3l-deficient postnatal spermatogonia show hypomethylation not only in the paternal DMRs but also in repetitive elements such as IAP, LINE1, and SINEB1 [92]. DNMT3L can interact with DNMT3A and DNMT3B in vitro and is assumed to target them to the DMRs of imprinted genes, thereby facilitating genomewide de novo methylation during germline development [85]. DNMT3A and DNMT3B are regarded as de novo methyltransferases in the establishment of genomewide methylation patterns, and both proteins are localized to germ cell nuclei at the stage when de novo methylation is ongoing [92]. Mice with germline conditional deletion of Dnmt3a show impaired spermatogenesis and a lack of methylation, similar to the phenotype of Dnmt3l-null mice, suggesting that both DNMT3A and DNMT3L are essential for the establishment of methylation patterns (Table 2). Interestingly, DNMT2 has long been regarded as a conserved member of the methyltransferase family, and it appears to play a role in RNA methylation in zebrafish because Dnmt2-null zebrafish display reduced RNA methylation levels as well as developmental defects [93, 94]. However, the exact function of Dnmt2 in mammals remains to be elucidated [95].
Heterochromatin Formation
Similar to the immune defense system, safeguarding the genome from invaders like TEs is essential for the survival of the host. One of the defense mechanisms to counteract and silence those genomic parasites is to package DNA sequence containing repetitive elements into transcriptionally inactive heterochromatin.
The genome of eukaryotes can be roughly partitioned into regions composed of euchromatin, which are rich in active coding genes, and regions composed of heterochromatin, which remain condensed and are much less transcriptionally active [96]. Heterochromatin is associated with telomeres and centromeric regions of chromosomes, in which large segments of the genome are packaged in a permanently inactive form known as constitutive heterochromatin [96]. Further characterization later identified another kind of heterochromatin, termed facultative heterochromatin, which is represented by genomic regions whose heterochromatic packaging is associated with cellular development and differentiation [97, 98]. Heterochromatin regions are composed predominantly of repetitive DNA, including tandemly repeated satellite DNA and interspersed repetitive sequences such as TEs [7, 99].
It has long been known that heterochromatin silencing is mostly dependent on the covalent modifications present on either DNA or the conserved core histones. Various combinations of modification patterns on histones are viewed as an information code, the histone code, which remains poorly understood. Some previous data have unveiled the relationship between histone modifications and the epigenetic state. Histone H3 methylation on lys-4, lys-36, and lys-79 has been implicated in active gene expression, whereas methylation of histone H3 on lys-9 and lys-27 and of histone H4 on lys-20 is closely related to gene silencing [100]. Acetylation of histones H3/4 by histone acetyltransferase loosens the chromatin structure and promotes polymerase II-mediated gene activation. In turn, deacetylation by histone deacetylase leads to gene silencing [101–104]. Histone phosphorylation takes place at the serine residues of all core histones and usually correlates with up-regulated gene expression [101–104]. The impact of histone ubiquitination varies and appears to depend on which core histone is modified [100–104].
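The methylation marks listed above can be summarized as a simple lookup table. The Python sketch below only restates the associations given in the text; the names `HISTONE_METHYLATION_EFFECT` and `associated_state` are hypothetical, and a single-mark lookup is a simplification, since the text stresses that combinations of marks matter and that the code is still poorly understood.

```python
# Associations between histone lysine methylation sites and transcriptional
# state, as stated in the text. Keys are (histone, residue) pairs.
HISTONE_METHYLATION_EFFECT = {
    ("H3", "K4"): "active",
    ("H3", "K36"): "active",
    ("H3", "K79"): "active",
    ("H3", "K9"): "silenced",
    ("H3", "K27"): "silenced",
    ("H4", "K20"): "silenced",
}


def associated_state(histone: str, residue: str) -> str:
    """Return the state the text associates with methylation at this site.

    Sites not mentioned in the text yield "unknown" rather than a guess.
    """
    return HISTONE_METHYLATION_EFFECT.get((histone, residue), "unknown")


if __name__ == "__main__":
    print(associated_state("H3", "K4"))    # active
    print(associated_state("H3", "K9"))    # silenced
    print(associated_state("H3", "K14"))   # unknown
```

Returning "unknown" for unlisted sites keeps the table from implying more than the text says.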
In the germline, histone modification patterns in PGCs fluctuate concomitantly with developmental reprogramming. For example, levels of H3K9me reach their highest level at E7.5 and gradually decrease thereafter. From E7.5 onward, however, both the repressive marks of histone modification (e.g., H3K27me and H3K9me) and the activating forms of modification (e.g., H3K4me and H3K9ac) start to emerge simultaneously [52, 105–107]. In addition, previous data show that histone modification can be influenced by the preexisting epigenetic marks of the histone; for example, H3K9me and H3K4me are mutually exclusive. Moreover, some effector complexes can recognize dual histone modification marks [108]. Given that global demethylation occurs during reprogramming, when higher risks are posed by the repetitive elements, diverse combinations of histone modifications and the cross-talk among various epigenetic marks may be responsible for the repression of target DNA segments in the genome through distinct pathways [108–111]. During mouse spermatogenesis, the dynamic distribution of distinct histone-carried covalent modifications has been well characterized. Levels of both H3K4me (H3 methylation at lys-4) and H3/4ac (acetylated H3/4) peak in spermatogonial stem cells and decline sharply when meiosis initiates [112–114], suggesting that these histone modifications direct the expression of meiosis-associated genes. At the spermatocyte stage, the increase in H3K9me and H3K27me levels is reminiscent of the silencing mechanisms required for the meiotic division [112–114]. Once meiosis is completed, levels of H3K4me and H3/4ac go up again, presumably ensuring the supply of proteins essential for the proper histone to protamine exchange during spermiogenesis [115].
Two interesting phenomena are worth mentioning. First, repetitiveness within TEs alone is sufficient for heterochromatin formation. In Drosophila, bulk repetitive DNA sequences that apparently share nothing in common are packaged into heterochromatin, giving rise to the notion that features residing in the repeats themselves could initiate the formation of heterochromatin [116]. Previous studies have demonstrated that as few as three copies of P elements in the fly genome suffice to induce position-effect variegation (PEV) silencing, a phenomenon occurring when a gene is placed adjacent to regions of heterochromatin [116]. PEV-induced gene silencing can encompass genes that are hundreds of kilobases away from the heterochromatin/euchromatin boundary and can be explained as the local formation of heterochromatin around the genes through the recruitment of a large number of modifier components [117, 118]. Increases in copy number of P elements result in enhanced silencing, and reversing the orientation of a transposon within a transgene array strengthens silencing further, suggesting that highly repetitive elements and the inverted repeats produced by neighboring inverted repeat-containing transposons are more likely to trigger the assembly of the heterochromatin state [119].
Second, features buried within some repetitive elements can bind specific DNA-interacting proteins (trans-acting factors) to guide heterochromatin formation. L1 elements are dispersed throughout the genome with high contents of CpG islands in the promoter region [120]. Retinoblastoma (Rb) family members can bind CpG-rich regions in the genome with high affinity and mediate gene silencing through the recruitment of corepressor proteins that modify the chromatin epigenetically [121]. Recent data also demonstrate that Rb protein can bind the L1 element promoter in vivo and in vitro, while Rb-null cells exhibit aberrant histone methylation patterns in pericentromeric heterochromatin regions [120]. Consequently, it has been postulated that L1 elements may function as a master regulator of the chromatin structure through heterochromatin silencing of discrete chromosomal regions where structural genes that are regulated in a cell cycle-dependent fashion reside [120]. It is possible that Rb functions as an epigenetic silencer of L1 elements in wild-type cells by recruiting corepressors (e.g., histone deacetylase or histone methyltransferase) that carry out histone deacetylation/methylation and/or DNA methylation [122, 123].
The RNA Interference Pathway
RNAi is a highly conserved gene-silencing mechanism triggered by processing double-stranded RNAs (dsRNAs) into single-stranded RNAs (ssRNAs) that are subsequently loaded into effector complexes to modulate gene expression in a sequence-specific fashion [124]. In mammals, three distinct classes of small RNAs, including miRNAs, endogenous small interfering RNAs (endo-siRNAs), and piRNAs, have been identified based upon differences in size, source of biogenesis, and mode of action [125, 126]. In this section, we mainly focus on the TE control function of siRNAs and miRNAs, both of which exploit DICER to produce mature ssRNAs, whereas piRNAs, the most intriguing class of small RNAs confined to the germline, will be discussed separately in the section to follow.
Small interfering RNAs refer to synthetic siRNAs or those derived from synthetic dsRNAs introduced into the host cells. These exogenous siRNAs bind to their complementary sequences in target mRNAs and induce degradation or repress translation. Recent reports demonstrate that many eukaryotes can generate endo-siRNAs, which are generally derived from retrotransposons, pseudogenes, and other genomic regions that can produce transcripts capable of forming dsRNA structures [127–132]. In plants and lower animals, endo-siRNAs have been hypothesized to protect the host genome from invading TEs mainly through two mechanisms: one is RNA-directed DNA methylation (RdDM) [133–135], and the other is RNA-mediated heterochromatin formation [136–141]. In both situations, the siRNAs function as guide molecules that direct associated protein factors to specific genomic regions for DNA cytosine methylation or for the formation of heterochromatin. In the first instance, initially discovered in viroid-infected tobacco, dsRNAs harboring sequences homologous to promoter regions can elicit promoter methylation and gene silencing by methylation of not only symmetrical CpG dinucleotides, but also non-CpG nucleotides [133]. In plants, the proposed model involving RdDM points to a positive feedback mechanism through which RNAi is amplified by the synthesis of additional dsRNAs using the target as a template and catalyzed by RNA-directed RNA polymerase (RdRP), which is conserved in fission yeast, Neurospora crassa, C. elegans, and plants, but not found in flies and mammals [134, 135]. This is a stepwise process in which site-specific DNA methyltransferase (MET1) can be recruited to the target-specific DNA regions by the RNAi-induced transcriptional-silencing complex (RITS) for de novo cytosine methylation, and then histone modification helps maintain the silencing state, which can otherwise be lost by a passive or active demethylation process [136].
How do siRNAs recognize target-specific regions in the genome? Currently, two potential pathways have been proposed. The first model states that an siRNA incorporated into a RITS can base pair with its complementary strand in the genome, implying that the DNA helix must be unwound to facilitate the interaction. In the second model, the RITS complex can be recruited to specific DNA by base pairing of an siRNA with a nascent RNA transcript [137, 138]. This pathway highlights the role of RdRP, but it remains unknown whether this pathway operates in mammals. Recent data show that some of the known components of RdDM are present in mammals as well [139, 140], suggesting the significance and generality of this mechanism in genome defense and transposon silencing.
The RNAi pathway has been linked to heterochromatin formation by the discovery that defects in RNAi pathway components can cause aberrant assembly of heterochromatin. First identified in fission yeast, RNAi mutants deficient in components of the RNAi pathway, such as Argonaute, Dicer, and RdRP, display the derepression of centromeric outer–transposon repeats, suggesting the role of siRNAs in targeting chromatin modification for heterochromatin formation [130, 142, 143]. Further evidence came from the observation that a transgene inserted in the centromeric repeats, which normally exhibit a silent state, was activated in RNAi mutant Drosophila [144]. These data led to the proposed model for RNAi-mediated heterochromatin formation: the inverted repeats located in the centromeric region generate dsRNA molecules that are then processed into effector siRNAs by DICER. Such siRNAs might be amplified by the synthetic activity of RdRP, and subsequently guide the histone methyltransferases (HMTs) to the respective sites of the chromosome to catalyze methylation of H3K9, a typical hallmark of heterochromatin [145]. The modified form of H3 can recruit HP1 and other associated corepressors to maintain the heterochromatin state [143]. Furthermore, apart from centromeric heterochromatin silencing, RNAi has also been implicated in the targeting of interstitial sites of euchromatin (noncentromeric repeats) for silencing in a polycomb-dependent manner through methylation of H3K27 rather than H3K9 [142, 146]. The transcription of an introduced hairpin RNA can trigger silencing of homologous sequences throughout the host genome, and retrotransposons with LTRs have been demonstrated to participate in this pathway [147, 148].
Commonly, RNAi-mediated methylation of cytosine is confined to the region of RNA-DNA homology, with a minimum target DNA size of ∼30 bp [133], whereas RNAi-directed heterochromatin assembly can spread over 7 to 8 kilobases from the siRNA-targeted sites [149]. It should be noted that these two pathways might interconnect to some extent because they share some common components. However, the notion that DNA methylation always precedes histone modification does not hold in general, because this sequence of events is reversed in N. crassa, in which methylation of H3K9 is a prerequisite for all cytosine methylation [150].
MicroRNAs, with a size between 18 and 25 nucleotides, represent a highly conserved small regulatory RNA family present from prokaryotes to eukaryotes [151]. Biogenesis of miRNAs is a multistep process through which the stem-loop hairpin RNA precursors encoded by the host genome are cleaved by DICER into mature miRNAs, which are subsequently incorporated into their effector complexes [152]. Expression of miRNAs is subject to tight spatiotemporal regulation and has been linked to suppressing gene expression by completely or partially complementary base pairing to the target mRNAs in most organisms at the posttranscriptional level [151]. Abnormal expression of miRNAs has been demonstrated to be a cause of developmental deformity and tumorigenesis in humans and animals [153, 154]. In mice, a set of miRNAs, such as the miR-17-92 cluster (MIRC1) and the miR-290-295 cluster (MIRC5), are highly expressed in primordial germ cells and subject to developmental regulation [155, 156]. Previously, it has been demonstrated that disruption of Dicer results in the up-regulation of retrotransposons in preimplantation embryos and embryonic stem (ES) cells, indicating that miRNAs play significant roles in the repression of retrotransposons [157, 158]. In the germline, however, Dicer-null PGCs unexpectedly exhibit down-regulated expression of LINE1 and IAP to levels that are comparable to those seen in Dicer-null neonatal prospermatogonia, suggesting that DICER is dispensable for the suppression of retrotransposons in male germ cells [155]. Given that many miRNAs are derived from transposons and that retrotransposon sequences are highly prevalent in the UTR regions of mammalian mRNAs [152, 159, 160], miRNAs might have a function similar to that of other small RNA species (e.g., siRNAs and piRNAs) in suppressing transposable elements and regulating mRNAs at posttranscriptional levels.
Compared to the high degree of evolutionary conservation of miRNAs that are present from prokaryotes to eukaryotes, endo-siRNAs have so far only been identified in a few organisms. For example, endo-siRNAs exist not only in somatic cells but also in the germline in Drosophila [129, 132], whereas in the mouse only spermatogenic cells [132], oocytes [128, 161], and ES cells [131] have been shown to express endo-siRNAs, although no RdRP activity has been found in these cell types [128, 161]. The lack of endo-siRNA expression among diverse organisms and tissues may signify overlapping functions between endo-siRNAs and other small RNA pathways.
The PIWI-piRNA Pathway
Increasing lines of evidence point to an indispensable role of the PIWI-piRNA pathway in the regulation of germline development in most organisms, from fission yeasts to humans [10]. Argonaute proteins, also known as PAZ-PIWI domain proteins, consist of two main subfamilies, namely AGO and PIWI, and the latter is expressed exclusively in the germline and associates with a diverse class of germline-specific small RNAs, called piRNAs, to control the activity of mobile elements from invertebrates to mammals [10–12]. In Drosophila, there are three members in the PIWI family associated with piRNAs: AGO3, which tightly binds sense strand piRNAs, and AUB and PIWI, which bind antisense strand piRNAs [74, 162, 163]. This binding preference was proposed to produce distinct piRNAs through a ping-pong amplification model based on the sequence characteristics of the piRNAs that they bind, most of which start with a 5′-end uridine and have an adenine at the 10th position [74, 162, 163]. This amplification mechanism highlights the continuing recycling of PIWI components in recognition and excision of target RNAs into mature piRNAs. However, it remains unclear how the preexisting primary piRNAs are transcribed in the host cells and initiate the amplification cycle. Recent studies report that mouse MITOPLD (official symbol PLD6 and ortholog of Drosophila Zucchini), a conserved member of the phospholipase D superfamily, is important for primary piRNA biogenesis [164, 165]. Pld6 KO mice display notable demethylation and derepression of retroelements concurrent with meiotic arrest and severe defects in primary piRNA generation [164, 165]. In the mouse, there are three PIWI subfamily members: MIWI, MILI, and MIWI2, alternatively named PIWIL1, PIWIL2, and PIWIL4, respectively, in the official nomenclature. The expression of MILI and MIWI2 overlaps in prospermatogonia during early germline development from 13.5 dpc to 3 days postpartum [166–168] (Fig. 4).
The timing coincides with the temporal window of de novo genomewide remethylation in the nondividing prospermatogonia of the male germline, whereas MIWI is expressed predominantly in the meiotic spermatocytes and round spermatids [166–168] (Table 2). Both Piwil2 and Piwil4 KO mice display elevated expression of retrotransposons (IAP and LINE1) and spermatogenic arrest at meiotic prophase similar to Dnmt3l KO mice, causing male sterility [70, 73]. MILI and MIWI2 have therefore been proposed to direct de novo methylation by recruiting other protein factors, including DNMT3L, via an unidentified mechanism [92, 169]. Interestingly, fetal mouse testes also exhibit a high proportion of piRNAs with uridine at the first position and adenine at the 10th [166–168]. In addition, piRNAs expressed in the fetal gonad display sequence complementarity between MILI- and MIWI2-bound piRNA populations, suggesting that a similar ping-pong amplification cycle may function in the male germline [166]. However, piRNAs expressed in postnatal testes/ovaries do not appear to result from the ping-pong mechanism [166–168], suggesting the existence of other mechanisms of piRNA biogenesis.
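The sequence features described above (a 5′ uridine, an adenine at position 10, and complementarity between partner piRNAs over their first 10 nucleotides) are commonly checked computationally. The following is a minimal Python sketch of such a check, offered for illustration only: the function names and the two toy sequences are hypothetical and are not taken from any cited study or published pipeline.

```python
# Illustrative sketch only: function names and toy reads are hypothetical,
# not data or software from any study cited in this review.

# Translation table for complementing RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def fraction_with_base(reads, position, base):
    """Fraction of reads carrying `base` at the 1-based `position`.

    Reads shorter than `position` are ignored.
    """
    eligible = [r for r in reads if len(r) >= position]
    if not eligible:
        return 0.0
    return sum(r[position - 1] == base for r in eligible) / len(eligible)


def is_ping_pong_pair(antisense_read, sense_read, overlap=10):
    """True if the first `overlap` nt of the two reads are reverse complements.

    This is the 10-nt 5' overlap expected of piRNA partners generated by
    the proposed ping-pong cycle.
    """
    if len(antisense_read) < overlap or len(sense_read) < overlap:
        return False
    return antisense_read[:overlap] == reverse_complement(sense_read[:overlap])


# Toy example: an antisense read starting with U and a sense read whose
# first 10 nt are its reverse complement, so position 10 is A.
antisense = "UAGCCGAUACGGAUCCAUGGCAAUUC"
sense = "GUAUCGGCUAACCGGUUAGCAUGCAU"

print(is_ping_pong_pair(antisense, sense))
print(fraction_with_base([antisense, sense], 1, "U"))
print(fraction_with_base([sense], 10, "A"))
```

In a real analysis, the reads would come from small RNA sequencing of immunoprecipitated PIWI complexes, and the overlap test would be run across all read pairs mapping to opposite strands rather than on a single hand-built pair.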
It appears that piRNAs not only participate in the degradation of cytoplasmic repetitive element-derived RNA at posttranscriptional levels in a manner similar to miRNAs and siRNAs, but also are involved in the recruitment of repressive machinery to guide DNA methylation and heterochromatin formation and thus regulate gene expression at transcriptional levels in the germline [10] (Fig. 5). In Drosophila, large piRNA clusters act as master regulators of transposon silencing [170], whereas in mammals, transposon copies interspersed in the genome initiate the cycling of piRNA biogenesis. This produces primary piRNAs corresponding to antisense strand of DNA, which predominantly join the MILI complex in the cytoplasm. MIWI2-associated piRNAs have a bias for sequences matching to the sense strand of transposons; hence MIWI2 has been suggested to function in a fashion similar to AGO3 in flies. Furthermore, additional evidence suggests that there is sequence complementarity at the first 10 nucleotides between MILI- and MIWI2-associated piRNAs. Taken together, these data raise the possibility of a ping-pong mechanism being adopted in mammals [167, 168]. Once mature piRNAs are produced, different sets of piRNAs are subsequently sorted into the respective effector complexes wherein PIWI proteins reside. Recent reports reveal that MILI and MIWI2 localize to distinct germinal granules, which are membraneless cytoplasmic aggregates of RNA and protein present in close proximity to the nuclear envelope of germ cells [72].
Sense strand piRNAs encoded by the genomic piRNA cluster often bind to the MILI-TDRD1 complex and localize within the intermitochondrial cement in prospermatogonia, in granules known as pi-bodies, whereas antisense piRNAs recognize the MIWI2-TDRD9 complex and have been shown to constitute the granular structure together with other related proteins, including Maelstrom, in piP-bodies, which also contain signature proteins of RNA-processing bodies (or P bodies) enriched in translationally inactive mRNAs in somatic cells (Fig. 5) [72]. Furthermore, in contrast to pi-bodies, which mainly reside in the cytoplasm, components of piP-bodies can shuttle between the cytoplasm and nucleus, and hence may be able to target the genomic DNA inside the nucleus. To date, it has been demonstrated that both granules share some common components, and different sets of piRNAs emanating from the respective granules can interchange if needed at a specific time or stage [72]. The antisense piRNAs that form complexes with piP-body components translocating from the cytoplasmic compartment to the nucleus could facilitate de novo methylation and the construction of repressive chromatin states through recruitment of other epigenetic modifiers as discussed above. The PIWI-piRNA pathway thus appears to be able to convert a posttranscriptional response to TE reanimation into a silent state of transcriptional suppression. However, it remains unclear what other significant effectors or cofactors are involved in this process, how the different piRNA pathway components are recruited to the corresponding complexes and cooperate to function, and finally how the piRNAs direct the core effector complex to specific genomic regions for DNA methylation and/or histone modification.
FIG. 5.
Subcellular localization of components of the PIWI-piRNA pathway and the proposed mechanism of TE suppression by this pathway. MILI associates with TDRD1 and other cofactors localized to pi-bodies, whereas MIWI2 binds TDRD9 and several other P body components localized to piP-bodies. Sense and antisense piRNAs produced from both granules can exchange through the presumptive ping-pong cycle to enhance piRNA production. MIWI2 can enter the nucleus by recruiting other epigenetic modifiers such as DNMT3L, CBX5 (HP1a), and HMTs (histone methyltransferases). The resulting secondary antisense piRNAs can presumably function to suppress transposons via two mechanisms: 1) degradation of the retrotranscripts by recognizing complementary RNA in an RNA:RNA hybrid at the posttranscriptional level and 2) recruitment of epigenetic repressors responsible for DNA methylation and heterochromatin formation via an unidentified mechanism.
It is noteworthy that several Tudor domain family proteins have been demonstrated to tightly interact with specific PIWI family proteins by recognizing the symmetrical dimethyl arginines and have therefore been suggested as the docking sites for PIWI-like proteins [171, 172]. Supporting this notion, KO mice deficient in these Tudor domain-containing proteins display enhanced retrotransposon activation and male sterility [73, 173, 174]. There are considerable differences among these members although all of them possess at least one Tudor domain. For example, TDRD1 contains four Tudor domains and a zinc finger motif, whereas TDRD9 contains only one Tudor domain combined with an additional ATPase/DExH-type helicase domain, indicating that TDRDs serve not only as scaffolds on which PIWI proteins assemble but also have additional functions, exerted through an unidentified mechanism in combination with other protein factors [168, 175]. A good example is MAEL, a murine homolog of the Drosophila nuage protein maelstrom; MAEL has recently been shown to localize to the piP-bodies [72]. Mael KO mice exhibit LINE1 derepression and spermatogenic defects as seen in Piwil4 (Miwi2)- and Tdrd9-null mice, with the exception that DNA methylation at LINE1 loci is only moderately impaired in prospermatogonia at 16.5 dpc, suggesting that MAEL facilitates the MIWI2-TDRD9-mediated piRNA pathway [65, 176]. TDRD7, a member of the TUDOR family [62], has recently been found to be essential for nuage/chromatoid body formation and LINE1 suppression, and interestingly, these actions are independent of piRNA biogenesis [177]. Furthermore, in mice, another requisite component in the piRNA pathway is MOV10L1. Two recent reports demonstrate that MOV10L1 is able to substantially restrict the activity of mouse IAP and LINE1 by interacting with MILI and MIWI2 [178, 179] (Table 2).
Taken together, there are generally two views on the mechanism by which piRNAs modulate TE activities (Fig. 5). First, piRNAs could target complementary retrotranscripts encoded by the host retrotransposons through an RNA:RNA hybrid and thus degrade the aberrant TE transcripts. Second, the antisense piRNAs can also recognize the partner DNA strand via an RNA:DNA hybrid to guide the recruitment of repressive chromatin modifiers such as DNA methyltransferases and histone deacetylases. Some recent reports appear to support the latter view [180, 181]. Given the dynamic changes and cooperation among different germinal granules in the germline, deciphering the complete proteomic profile, including core effector proteins and possibly the small RNA composition, will help shed light on the hidden side of the PIWI-piRNA pathway and also on the nuage function in transposon suppression and germline specification.
The main theme of the male germline defense system is to suppress TEs using the mechanisms reviewed above. However, two waves of TE activation during male germline development have been observed [19, 46, 182]. The first one coincides with the reprogramming event in PGCs in the fetal testes [19, 46, 182], whereas the second wave is observed in meiotic male germ cells in the adult mouse testes [19, 46, 182, 183]. The global demethylation may explain the transient increase in TE activity in PGCs. However, a functional role of this TE activation cannot be completely excluded. This transiently elevated TE activity is quickly suppressed in the subsequent de novo remethylation. Although we now know that the PIWI-piRNA pathway is required for the remethylation process [70, 73, 167], the molecular processes through which piRNAs induce targeted remethylation of TEs remain elusive. How do piRNAs attract and guide the methylation machineries to the TE loci? Do those demethylated and transcribed TEs produce any protein products during the transient activation? What is the fate of those TE-derived piRNAs? Are those TE-derived piRNAs required for the maintenance of the hypermethylated status of TEs? Further study is needed to answer these critical questions.
Reprogramming events that occur in PGCs and in preimplantation embryos have been linked to germline fate determination and the acquisition of pluripotency, respectively [19, 45, 50]. The PGC reprogramming erases all DNA methylation marks, including those of TEs [19, 45]. In contrast, the preimplantation embryonic reprogramming erases most of the DNA methylation marks except those on imprinted genes and TEs [19]. This may explain why the PIWI-piRNA pathway does not function during preimplantation embryonic development [12, 73]. However, the molecular mechanism underlying the difference between these two processes of global demethylation and de novo remethylation remains unknown. It would be interesting to investigate how TEs escape the global demethylation and maintain their hypermethylated status in preimplantation embryos.
The second wave of TE activation in the male germline has been observed in the meiotic phase of spermatogenesis in adult testes [182, 183]. The physiological significance of this phenomenon remains unknown. Theoretically, TEs should be entirely suppressed at this stage of germ cell development because disruption of the genome will jeopardize fertility and TE-inflicted genetic defects could be transmitted to the next generation, exerting deleterious effects on the offspring. A mild activation of TEs in meiotic male germ cells may imply an essential role of TEs in meiotic cellular events (e.g., formation of the synaptonemal complex or crossover/homologous recombination). However, because TEs exist in large numbers interspersed throughout the genome with variations in their sequences, it is very difficult to identify which ones are activated and contribute to these cellular functions. For the same reason, it is close to impossible to inactivate those activated TEs and then study the physiological effects. Given that the PIWI-piRNA pathway can produce TE-derived piRNAs that induce TE suppression, it may be feasible to introduce the PIWI-piRNA pathway components into the meiotic phase of spermatogenesis and then observe the effects on meiosis, assuming that the transient TE activation is suppressed.
In summary, suppression of TE activity in the male germline is achieved through both epigenetic mechanisms, including DNA methylation and heterochromatin formation, and small RNA-mediated posttranscriptional regulation. In addition, the transient TE activation during the meiotic phase of spermatogenesis suggests a role of TEs in this specific stage of germ cell development, which may warrant further study to elucidate its physiological significance.
We would like to thank Keegan R. Idler for editing the text. We apologize to colleagues whose relevant work is not cited here because of the page limitation.
Research in the Yan Laboratory is supported by National Institutes of Health grants HD050281 and HD060858.
