Abstract
Transposable elements (TEs) can affect the structure of genomes through their acquisition and transposition of novel DNA sequences. The 134-bp repetitive elements, Lep1, are conserved non-autonomous Helitrons in lepidopteran genomes that have characteristic 5′-CT and 3′-CTAY nucleotide termini, a 3′-terminal hairpin structure, a 5′- and 3′-subterminal inverted repeat (SIR), and integrations that occur between AT or TT nucleotides. Lep1 Helitrons have acquired and propagated sequences downstream of their 3′-CTAY termini that are 57–344-bp in length and have termini composed of a 3′-CTRR preceded by a 3′-hairpin structure and a region complementary to the 5′-SIR (3′-SIRb). Features of both the Lep1 Helitron and multiple acquired sequences indicate that secondary structures at the 3′-terminus may have a role in rolling circle replication or genome integration mechanisms, and are a prerequisite for novel end creation by Helitron-like TEs. The preferential integration of Lep1 Helitrons in proximity to gene-coding regions results in the creation of genetic novelty that is shown to impact gene structure and function through the introduction of novel exon sequence (exon shuffling). These findings are important in understanding the structural requirements of genomic DNA sequences that are acquired and transposed by Helitron-like TEs.
Keywords: Helitron, sequence gain, genome rearrangement
1. Introduction
Helitrons are class II transposable elements (TEs) that are proposed to propagate by rolling circle replication (RCR) at the DNA level1 and are dependent upon the function of Replicase/Helicase (RepHel) proteins for autonomous transposition.1,2 Helitrons show a high degree of sequence plasticity, such that computational predictions mainly rely upon the identification of conserved 5′-CT and 3′-CTRR termini,3 a 6–20-bp stem-loop structure near the 3′-terminus,4,5 and predicted integration between AT1 or TT nucleotides.3 Non-autonomous Helitrons are small in size due to lack of an internal protein-coding region, but often remain mobile through the retention of functional trans-acting RepHel proteins.1,2 A high copy number of non-autonomous Helitrons within genomes likely results from evasion of host repression.6 The observed paucity of sequence variation among Helitrons appears indicative of recent bursts in transposition.7,8 Certain groups of class II DNA transposons and Helitrons integrate frequently in proximity to protein-coding regions9 and can affect the structure and function of genes and gene products.7,10 These integrations can modify transcriptional efficiency,11 introduce transcript splice variation and polyadenylation sites, change transcription start and stop sites, and incorporate novel exon sequence.12,13
Helitrons are potent modifiers of genome structure and function due to frequent acquisition and transposition of host genomic DNA, which oftentimes results in exon shuffling or the duplication of gene sequences.4,14–18 The mechanism by which Helitrons acquire novel sequence remains largely unknown,19 but is hypothesized to occur at the DNA level due to transposition of both intron and exon sequences. Furthermore, Helitrons are often chimeric constructs that have acquired DNA from multiple independent loci,16,17 which may occur by a step-wise addition of novel 5′- and 3′-ends that are compatible with the minimal requirements for functioning during RCR.6,10,20 These instances indicate that class II TEs and Helitrons participate in the rearrangement and duplication of genome regions and contribute to the evolution of novel eukaryotic genome functions.21,22
TE integration and excision mutations cause phenotypic variation at the individual and population scale,23,24 may be contributing factors to speciation events25,26 and provide the genetic novelties for local adaptation via natural selection.27 The insect order Lepidoptera contains the second largest number of species on earth, some of which cause widespread damage to crop plants. Short repetitive sequences and TEs are important players in the generation of genetic diversity and evolution of Lepidoptera due to integrations within genes,28,29 but the genome-wide effects of TE-derived mutations upon genetic and phenotypic variation remain relatively unknown. The silkmoth, Bombyx mori, is the lepidopteran model species whose ∼420 Mb genome sequence assembly is composed of ∼43.6% repetitive DNA,30 with most being <500 bp.31,32 Nearly 13% of the B. mori genome comprises short interspersed nuclear elements (SINEs),33 which are class I TEs derived from tRNA, 5S rRNA or 7SL RNA-like sequences that propagate by retrotransposition.34 In contrast, DNA-based class II TEs occupy ∼3% of the B. mori genome (Helitrons 0.1%),33 and are mostly located within introns and non-coding DNA of Lepidoptera.35 This dearth of TE knowledge among lepidopteran species has made it difficult to interpret their role in the generation of structural and functional genome variance.
The Lep1 box is a conserved 99-bp repetitive DNA sequence originally described within intron and untranslated regions from eight lepidopteran species.36 The Lep1 box retains homology +10 to −50 of the core repeat, which was later described as the 134-bp lepidopteran-specific common sequence 3 (LSCS3).37 In the following, we annotate the Lep1 element as a Helitron-like TE and indicate that multiple novel DNA sequences have been acquired and propagated by the Lep1 Helitron. The conservation of predicted secondary structures between the ancestral Helitron and an acquired terminal region offers significant insight into the mode of Helitron propagation and features within the acquired sequence required for propagation. Lep1 Helitrons and acquired sequences co-localize with gene-coding regions in the B. mori genome, cause structural gene mutations, and are important mediators of genome-wide mutation.
2. Materials and methods
2.1. Annotation of the Lep1 Helitron
All annotations are made with respect to the reverse complement of the LSCS3 sequence37 that includes the 99-bp Lep1 box36 and is hereafter referred to as Lep1. Sequences showing homology to Lep1 were retrieved from the GenBank non-redundant (nr) nucleotide database via a BLASTn search that used Lep1 as the query (conducted 11-04-2010), with output filtered for ≥70% homology over ≥100 bp. Similarly, GenBank dbEST accessions for species of Lepidoptera were downloaded in FASTA format (10 November 2010), imported into a local database using BioEdit,38 queried using Lep1, and results filtered and aligned as described previously. All lepidopteran DNA sequence accessions that passed filter criteria were downloaded in FASTA format, imported into the MEGA 5.0 sequence alignment application,39 and a multiple sequence alignment was made using the ClustalW algorithm (default parameters: gap opening penalty, 15; gap extension penalty, 6.66; weight matrix, IUB; and transition weight, 0.5) in the MEGA 5.0 alignment module.39
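The similarity and length filter applied to the BLASTn output can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes hits are available in tab-delimited BLAST tabular format (query, subject, percent identity, alignment length as the first four columns), and the function name, record type, and example rows are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    subject_id: str
    percent_identity: float
    alignment_length: int


def filter_hits(lines: Iterable[str],
                min_identity: float = 70.0,
                min_length: int = 100) -> List[Hit]:
    """Keep hits that meet both the identity and the length threshold."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # Assumed column order: query, subject, % identity, alignment length
        hit = Hit(fields[1], float(fields[2]), int(fields[3]))
        if hit.percent_identity >= min_identity and hit.alignment_length >= min_length:
            kept.append(hit)
    return kept


# Invented rows for illustration only (not data from this study).
example_rows = [
    "\t".join(["Lep1", "acc_A", "80.3", "120"]),
    "\t".join(["Lep1", "acc_B", "68.9", "130"]),  # identity below threshold
    "\t".join(["Lep1", "acc_C", "91.0", "60"]),   # alignment too short
]
kept_hits = filter_hits(example_rows)
print([hit.subject_id for hit in kept_hits])
```

The same function could be reused with `min_identity=80.0` and `min_length=50` for the genome-wide search thresholds given in Section 2.2.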
Sequence homologies flanking the Lep1 sequence were identified by performing an ‘all vs. all’ search using the BLASTn algorithm using all nr and dbEST ‘hits’ to Lep1. The regions of intraspecific DNA sequence homology were extracted from accessions using a custom PERL script, and then used as input into the Mfold DNA secondary structure server40 (http://mfold.rna.albany.edu/?q=mfold/DNA-Folding-Form) with the partial function and pair probabilities = 25°C.
2.2. Estimation of Lep1 genome copy number and distribution
Scaffolds from the B. mori whole genome sequence build v. 2.3 were downloaded from Kaikobase (http://sgp.dna.affrc.go.jp/pubdata/genomicsequences.html; file assembledset.txt.gz) and imported into a local database using BioEdit.38 Build v. 2.3 was searched with Lep1 as the query using the BLASTn algorithm and results filtered for ‘hits’ showing ≥80% similarity over ≥50 bp. The putative BmLep1 integration positions were called the BmLep1 model v. 2.3, which was then merged with positions of the B. mori assembly v. 2.3 gene models (file: glean_cds_on_chr.gff at http://sgp.dna.affrc.go.jp/pubdata/genomicsequences.html). The combined features were displayed using CMap.41 Sequence intervals for BmLep1 elements were retrieved from the assembled B. mori scaffold for chromosome 1 (Z chromosome) using a custom PERL script, reverse complements generated using the Sequence Manipulation Suite (SMS) at http://www.bioinformatics.org/sms2/rev_comp.html, and FASTA formatted text imported into MEGA 5.0.39 A multiple sequence alignment was made for all B. mori Lep1 elements using parameters described earlier, gamma parameter estimated, and a maximum likelihood-based estimation of Lep1 phylogenetic relationship made using the general time reversible model of sequence evolution. Nucleotide sites were chosen using a partial deletion of missing characters (cut-off = 0.05), and all possible trees were interrogated using the Close-Neighbor-Interchange heuristic. Node support was acquired using 1000 bootstrap pseudoreplicates reported within a strict consensus tree.
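The reverse-complement step performed here with the Sequence Manipulation Suite can be illustrated with a short sketch. The function below is a hypothetical stand-in that also handles IUPAC ambiguity codes such as R and Y, which appear in the terminal motifs discussed in this article.

```python
# Complement table covering the four bases and the IUPAC ambiguity codes.
_COMPLEMENT = str.maketrans(
    "ACGTRYKMSWBDHVNacgtrykmswbdhvn",
    "TGCAYRMKSWVHDBNtgcayrmkswvhdbn",
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (IUPAC-aware)."""
    return seq.translate(_COMPLEMENT)[::-1]


# The 3'-CTAY motif reads RTAG on the opposite strand.
print(reverse_complement("CTAY"))
```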
The frequency of Lep1 integrations within species of Lepidoptera was investigated using full bacterial artificial chromosome (BAC) insert sequences from Bicyclus anynana, B. mori, Heliconius melpomene, H. numata, Helicoverpa armigera, Papilio dardanus, and Spodoptera frugiperda. These sequences were downloaded from NCBI, imported into BioEdit,38 and a search for Lep1 positions was performed as described previously. The mean frequency of Lep1 integrations was calculated manually and significance of frequency differences among species was assessed using F-statistics (significance threshold α = 0.05).
2.3. Predictions of haplotype variation caused by Lep1 elements
Nucleotide accessions from the NCBI nr database were used to query derived protein sequences within the nr protein database using the blastx algorithm and results filtered for ‘hits’ showing ≥50% identity. Proteins derived from orthologous lepidopteran genes that were not identified within the initial Lep1 screen (see 2.1), but present within the blastx output were compared manually to predict instances of copy number variation within or between species. Lep1 copy number variation at orthologous loci among species was further investigated using integration/excision variation among cadherin gene sequences from B. mori (gene model: BTGIBMGA013616), H. armigera (GenBank accession: AY714876.1), and Ostrinia nubilalis (DQ000165.1). Sequences were imported into a local database in BioEdit38 and searched with Lep1 as the query, and an alignment of exon 1, intron 1, and exon 2 was created using the MEGA 5.0 sequence alignment application39 as described previously.
2.4. Predictions of Lep1 modification to gene structure
De novo acquisition of transcribed Lep1 sequences was performed for O. nubilalis. Total RNA was isolated from whole larvae using the RNAgents kit (Promega, Madison, WI) and cDNA was synthesized using the SMART cDNA Synthesis Kit (Clontech, Mountain View, CA) according to manufacturer's instructions. RACE reactions were performed using the CDSIII primer (Clontech) with OnLep1-f2 (5′-TAC TRA TAT TAT AAA GCT GAA GAG TT-3′) and SMART V with OnLep1-r (5′-GAT AAA TGG GCT ATC TAA CAC TGA AAG-3′), and products were amplified according to manufacturer's instructions. PCR fragments were cloned and sequenced using T7 and SP6 primers, and resulting sequence data were assembled into contigs using CAPs42 as described previously. Contigs were annotated using the Blast2Go suite,43,44 where the GenBank nr protein database was interrogated using the BLASTx algorithm.
3. Results and discussion
3.1. Annotation of the Lep1 Helitron
A conserved 99-bp Lep1 box was previously described in eight lepidopteran species36 and later described as a portion of a 134-bp consensus LSCS3.37 Our homology-based searches of the GenBank nr/nt database resulted in the estimation of 618 regions within 210 nucleotide sequence accessions from Lepidoptera that show ≥70.0% similarity to Lep1 (32 species; mean similarity: 80.3 ± 5.8% over 120.5 ± 19.0 bp; Supplementary Table S1). Eighteen of these Lep1-containing accessions were annotated as microsatellite loci and 51 Lep1s were within introns or untranslated regions of known genes. The remaining 526 Lep1s were within un-annotated sequences from BAC full inserts of the lepidopteran species B. mori, B. anynana, S. frugiperda, H. armigera, H. melpomene, H. erato and P. dardanus. An analogous search of the GenBank dbEST database identified 443 accessions from lepidopteran species that showed ≥71.1% interspecific similarity, with 84 of these dbEST hits being ≥130 bp (Supplementary Table S2). Multiple sequence alignment of GenBank nr/nt and dbEST accessions that contained full-length Lep1 elements resulted in a 197-bp consensus that shared Helitron-like 5′-CT and 3′-CTRY termini,1 which were, respectively, designated as region H1 and region H2 of Lep1 (Fig. 1A). The nucleotides directly adjacent to the 5′- and 3′-ends consisted of TA or AA in 96.2% of predicted Lep1s and showed no discernible target site duplications, which is consistent with previously described Helitron genomic integration events.1,3,7
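The terminal-motif criteria described above (a 5′-CT start and a 3′-CTRY or 3′-CTRR end, where R = A or G and Y = C or T) can be expressed as a simple pattern check. The sketch below is illustrative only: the function names are hypothetical, the test sequences are invented, and the check deliberately ignores internal structure and flanking nucleotides.

```python
import re

# Regular-expression classes for the IUPAC codes used in the terminal motifs.
IUPAC_CLASSES = {"R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def motif_to_regex(motif: str) -> str:
    """Translate an IUPAC motif such as 'CTRY' into a regular expression."""
    return "".join(IUPAC_CLASSES.get(base, base) for base in motif.upper())


def has_helitron_like_termini(element: str,
                              five_prime: str = "CT",
                              three_prime: str = "CTRY") -> bool:
    """True if the element starts with the 5' motif and ends with the 3' motif."""
    pattern = "^" + motif_to_regex(five_prime) + ".*" + motif_to_regex(three_prime) + "$"
    return re.match(pattern, element.upper()) is not None


# Invented toy sequences, not Lep1 data.
print(has_helitron_like_termini("CTAATTGGCTAC"))                      # ends in CTAC
print(has_helitron_like_termini("CTAATTGGCTAG"))                      # ends in CTAG
print(has_helitron_like_termini("CTAATTGGCTAG", three_prime="CTRR"))  # CTRR variant
```

Passing `three_prime="CTRR"` switches the check to the terminus reported for other Helitrons and for the acquired region described in Section 3.2.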
Structural homology among Helitrons is often used for prediction and characterization, and is based upon a conserved 3′-stem-loop (hairpin) formed upstream of the 3′-CTRR terminus.4,5,8 We predicted that a 3′-stem-loop would form at the 3′ Lep1 terminus in B. mori accession D86623.1 [AATCTACATCATTCGCGAGTGACTTAGGCTA] (nucleotides involved in base pairing are double underlined) with a Gibbs free-energy change (ΔG) = −2.82 kcal mol−1 (Figs 1A and 2A), as well as at the 3′-terminus of other lepidopteran Lep1s (Fig. 1A; not all data shown) including H. melpomene (Fig. 2B; ΔG = –3.67 kcal mol−1). Together with the 5′-CT and 3′-CTRY termini, a 3′-stem-loop (hairpin) near the 3′-terminus is a hallmark of Helitron TEs, which suggests important roles of these structures in RCR or genome integration.6,10,20 A second conserved stem-loop (hairpin) structure was predicted from B. mori and H. melpomene Lep1s that we designated the 5′-inverted repeat (5′-IR). This 5′-IR involves base pairing between portions of the 5′-subterminal inverted repeat (SIRa) and the microsatellite loop, and is analogous to the 5′-IR formed by the Helitron-like Drosophila interspersed nuclear element (DINE-1) family of TEs7 and the lepidopteran microsatellite associated interspersed nuclear element (MINE-1).35 Specifically, the B. mori GenBank accession D86623.1 is predicted to have a 7-bp 5′-IR formed by a portion of the 5′-SIRa sequence [ACTAATATT-7nt-GGAAAGATTTGTTT] (nucleotides involved in intramolecular base pairing are underlined; ΔG = –0.83 kcal mol−1; Fig. 2A), and the 6-bp 5′-IR in H. melpomene Lep1s is formed between the latter six nucleotides of the 5′-SIRa and the (GTTT)n microsatellite loop (ΔG = –0.99 kcal mol−1; Fig. 2B).
The conservation of hairpin structures adjacent to both the 5′- and 3′-termini among 460 of 618 (74.4%) Lep1s predicted among nr database accessions suggests a role in RepHel protein recognition, nascent strand cleavage, or another portion of the RCR mechanism,5,7,10,18,20 but further investigation is required to elucidate their role.
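To make the notion of a stem-loop concrete, the sketch below scans a sequence for perfect inverted repeats separated by a short loop. It is a deliberately simplified illustration and is not the method used here: the predictions in this study come from Mfold, which evaluates free energy and tolerates mismatches and bulges, whereas this hypothetical function only reports exact complementary stems of a fixed length. The example sequence is invented.

```python
from typing import List, Tuple

_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def _revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return "".join(_PAIR[base] for base in reversed(seq))


def find_hairpins(seq: str,
                  stem: int = 4,
                  min_loop: int = 3,
                  max_loop: int = 8) -> List[Tuple[int, int, int]]:
    """Return (stem_start, stem_length, loop_length) for perfect inverted repeats."""
    seq = seq.upper()
    found = []
    for start in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[start:start + stem]
        if any(base not in _PAIR for base in left):
            continue  # skip windows containing ambiguity codes
        target = _revcomp(left)
        for loop in range(min_loop, max_loop + 1):
            right_start = start + stem + loop
            # A slice running past the end is shorter than the stem and cannot match.
            if seq[right_start:right_start + stem] == target:
                found.append((start, stem, loop))
    return found


# Invented toy sequence: GACT stem, AAAA loop, AGTC complementary stem.
print(find_hairpins("TTGACTAAAAAGTCTT"))
```

A candidate found this way would still need a thermodynamic evaluation, such as the ΔG values reported above, before being treated as a plausible hairpin.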
Structural predictions for Lep1 further indicated that intramolecular base pairing may occur between nucleotides immediately downstream of the 5′-CT and a region 26–37 nt upstream of the 3′-CTRY terminus and involve interaction of the SIRa. Specifically, nucleotides of the 5′-SIRa from B. mori accession D86623.1 [ACTAATATT-7nt-ATAAAGATTTGTTT] are predicted to pair with the 3′-SIRa [AAAATTCTTTTCCATTAGA] located 49 bp downstream (nucleotides involved in base pairing are underlined; ΔG = –5.92 kcal mol−1; Fig. 1A). Analogous structures were predicted from H. melpomene (Figs 1A and 2B) as well as all other full-length Lep1s (Fig. 1A; not all data shown). These molecular interactions of Lep1 contribute to an overall secondary structure that is stable at 25°C for B. mori (ΔG = –19.74 kcal mol−1) and H. melpomene (ΔG = –11.90 kcal mol−1; Fig. 2), and is analogous to the structure formed by the MINE-1 Helitron from Lepidoptera.35
3.2. Annotation of acquired end sequences
Helitrons modify the structure of genomes through the acquisition and transposition of novel DNA sequence that occurs by a largely unresolved mechanism.19,20 We described the consensus Lep1 element sequence from 31 accessions for species of Lepidoptera that shared Helitron-like termini at the border of regions H1 and H2 (Fig. 1A), but also identified 62–342-bp sequence regions downstream of the 3′-CTRY terminus at the boundary of region H2 that are shared within a species. The novel shared sequences downstream of the 3′-CTRY were referred to as region H3. Specifically, region H3 showed ≤62.3% similarity between species and ≥95.7% similarity within a species or between closely related species, such as between the 55- and 59-bp region H3 from the Helicoverpa species H. armigera (GenBank accession: FP340429.1) and H. zea (EU327673.1; Fig. 1B). Among region H3 sequences from the same species or closely related species, sequence similarity terminated at a ubiquitous 3′-CTAG motif that was followed by a thymidine nucleotide (T; Figs 1B and 3), and only one variant of the H3 region was described within a species. The Lep1s predicted from the B. mori genome assembly show two unique 87- or 335-bp sequences within region H3 that were, respectively, represented by GenBank accessions DQ242656.1 and D86623.1 (Fig. 1B). The B. mori Lep1 Helitrons with an 87- or 335-bp region H3 were subsequently called BmLep1_87 and BmLep1_335 variants, respectively, and alignments showed no discernible homology between them (Fig. 3). Phylogenetic reconstruction of B. mori Lep1_87 and _335-bp Helitron variants from chromosome 1 suggested that two weakly supported clades may exist, which indicated that Lep1s have evolved independently within the B. mori genome through the acquisition of two different downstream sequences (Supplementary Fig. S2; gamma parameter rate distribution γ = 3.9894; log likelihood = –3729.34).
Independent gain-of-sequence mutations have previously been identified for Helitrons in the maize genome,6,10,20 and our results indicate that arthropod genomes are also modified by Helitron movements.
Although the sequences in region H3 share little interspecific sequence similarity, the secondary structures predicted to form appear to be identical by state with those formed in region H2. Specifically, the 7–9-bp 3′-stem-loops (hairpins), respectively, formed in region H3 of BmLep1_87 and BmLep1_335 Helitrons (ΔG ≤ –7.90 kcal mol−1) are more stable than the analogous structure in the ancestral region H2 (ΔG ≤ –2.99 kcal mol−1; Fig. 3). This evidence may suggest that Helitrons are dependent upon a 3′-stem-loop directly upstream of the 3′-CTAG terminus for RCR function,5 and that acquired sequences undergo selection for the capacity to support propagation by RCR.20 A functional switch from use of ancestral to derived Lep1 Helitron ends may have been influenced by the comparatively higher stability we predicted for 3′-stem-loops within the Lep1 acquired sequence. Thereby, functional shifts could occur when equally or more efficient terminal structures are encountered by change within flanking DNA or swapped between other Helitrons. Alternatively, reduction in 3′-stem-loop stability in region H2 could have resulted from degradation following the relaxation of selective constraints, and would mirror the degradation of a Helitron-like 3′-CTAG terminus that followed accretion by maize Helitrons.20
Additionally, intramolecular base pairings we predicted between the 5′-SIRa (region H1) and 3′-SIRa (region H2; Fig. 1A) are analogously formed through interaction of nucleotides within the 5′-SIRa of the ancestral Lep1 region H1 and a 3′-SIRb within the derived region H3 (Fig. 1B). The predicted stability of the interactions between the 5′-SIRb and 3′-SIRb in B. mori Lep1_87 and _335-bp Helitrons (ΔG ≤ −2.56 kcal mol−1) is lower than that between the 5′-SIRa and 3′-SIRa (ΔG ≤ −5.92 kcal mol−1; Fig. 3), but the structures remain consistent with the characteristic secondary structures described previously for insect Helitrons.7,45 The formation of a hairpin between the 5′-SIR and 3′-SIRs within both the ancestral and acquired regions indicates that, in addition to the 3′-stem-loop structure, the base pairing between SIRs at proximal and distal ends of Lep1 may be required for RCR. Furthermore, the requirement for a 3′-SIR within the independently acquired DNA sequences in proximity to a novel 3′-CTAG terminal motif could suggest that gain-of-sequence mutations by Lep1 may be rare. This hypothesis could be supported by our description of only two Helitron variants in the B. mori genome, which contrasts with the high number of cryptic Helitrons from the maize genome where functional constraints appear to be relaxed.17
The described accretion by lepidopteran Helitrons involves sequences that are relatively small and added in a unidirectional fashion. DNA sequence upstream of the single 5′-CT terminal motif showed no homology among integrations in lepidopteran genomes or conserved secondary structures that would be indicative of a chimeric Helitron.46 These observations are analogous to those described by Yang and Bennetzen.8 Lep1 was not shown to capture entire genes, but this may be due to an inability to accurately describe haplotype variation from available data resources or the possible culling of such mutations from genomes due to negative effects on genome function.47 Smaller non-autonomous Helitrons may be better able to evade host repression6 or may show greater replication efficiency within the RCR mechanism.48 Lep1s also appear to gain sequence only at the 3′ end, which contrasts with bidirectional end creation by maize Helitrons.20 This directionality of Lep1s may result from the preferential capture that is known to occur in the same orientation as the RepHel-coding sequence of the autonomous Helitron, but this cannot be ascertained until further research is performed to identify the parent TE of the non-autonomous Lep1 element.
3.3. Lep1 copy number and genome distribution
The variation in non-autonomous Helitron copy number among species of Lepidoptera may result from differential effects of replicative repression via DNA methylation,49 mutation-selection balance within the overall genome architecture,50 or random genetic drift at the population scale. We showed that Lep1 Helitron integration densities range from 1.04 × 10−5 to 1.8 × 10−4 based upon annotation of full BAC insert sequences. These estimates indicated that Lep1s have an ∼13-fold copy number difference across species' genomes (Table 1), which is comparable to the ∼11-fold variance observed among DINE-1 insertions within Drosophila genomes.7 The highest Lep1 copy number density was estimated for the butterfly P. dardanus, wherein Lep1 abundance was significantly higher than for all other species except for B. anynana (P-values ≥0.0089; Student's-t values not shown). The mean Lep1 integration density among butterflies (6.9 × 10−5 ± 6.8 × 10−5) is not significantly different from the densities estimated among moth species (mean: 2.5 × 10−5 ± 2.0 × 10−5; F-statistic = 1.4; d.f.num = 1; d.f.den = 2; P-value = 0.3583).
Table 1.
Estimates from BAC full-insert sequences. Respective GenBank accessions and positional information are provided in Supplementary Table S1.
The only whole genome sequence available to directly estimate the TE frequencies in Lepidoptera is the 432 Mb B. mori assembly (build v. 2.3),30 from which we estimated 5541 putative BmLep1 integrations. Each putative BmLep1 Helitron was assigned a unique identifier (BmLep1_000001 to BmLep1_005541), and further categorized as containing either the BmLep1_87 or BmLep1_335 variant downstream sequence described previously (Supplementary Table S3). The ancestral regions of the Lep1 Helitron (regions H1 through H2) within build v. 2.3 showed ≥94.3 ± 1.9% similarity with the B. mori Lep1 elements from accessions DQ242656.1 and D86623.1 (Fig. 1B). This paucity of Lep1 sequence evolution within the B. mori genome may indicate a recent burst in transposition,7,8 or a high degree of functional conservation. The density of B. mori Lep1s estimated from the genome sequence (1.3 × 10−5) is ∼2-fold lower than that estimated from BAC full inserts (2.5 × 10−5; Table 1), and highlights the error that may be associated with subsampling from BAC sequences. Mapping the positions of BmLep1 onto chromosome assemblies indicated that integrations are co-localized with protein-coding genes (Supplementary Fig. S1; chromosome 1 shown in Fig. 4), which agrees with seminal evidence of Lep1 being within gene intervals.36 Furthermore, Lep1 proximity to B. mori gene-coding regions suggests that Lep1s may affect gene structure and function on a genome-wide scale.9,20 The effects of Lep1 Helitron integrations upon gene structure and function are presented for the first time in Section 3.4.
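The genome-wide density quoted above can be reproduced from the reported counts, assuming that density is expressed as integrations per base pair (an interpretation that matches the published value but is not stated explicitly). The function and variable names in this sketch are hypothetical.

```python
def integration_density(n_integrations: int, sequence_length_bp: int) -> float:
    """Integrations per base pair of surveyed sequence."""
    return n_integrations / sequence_length_bp


# Values reported in the text for B. mori build v. 2.3.
genome_density = integration_density(5541, 432_000_000)

# BAC-based estimate for B. mori reported in the text.
bac_density = 2.5e-5
fold_difference = bac_density / genome_density

print(f"genome-wide density: {genome_density:.2e} per bp")
print(f"BAC / genome ratio: {fold_difference:.1f}-fold")
print(f"mean spacing: one integration per {1 / genome_density / 1000:.0f} kb")
```

Under this assumption the genome-wide value rounds to 1.3 × 10−5, and the BAC-based estimate is a little under twice as large, in line with the ∼2-fold difference stated above.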
3.4. Lep1 elements modify gene structure
The movement and propagation of TEs introduce haplotype variation,51 and alter gene functions when integrated within coding regions of a genome.28 TEs are present within insect ESTs52 and mature transcripts of D. melanogaster,53 and can cause the modification of cis-regulatory function,54 introduce frameshift and premature stop codon mutations,28 and be involved in exon shuffling18 or insertion of introns.55 Evidence that Lep1s co-localize with protein-coding genes in the B. mori genome (Section 3.3) suggests that integrations may affect the structure and function of lepidopteran genes, and that presence/absence mutation at orthologous loci may be a source of functional genetic variation among species. Comparison among orthologs of the cadherin gene from B. mori, H. armigera, and O. nubilalis genomes showed that Helitron copy number variation is present. Specifically, a nucleotide sequence with 68.7% similarity to the Lep1 consensus was predicted within the H. armigera cadherin intron 1, whereas orthologous introns from B. mori or O. nubilalis show no Lep1-like sequence (Fig. 5A). Although the effects of this integration upon gene function were not investigated, TE integrations within introns are known to affect splicing efficiencies,56 and this example indicates that Lep1s are a source of genome copy number variation between lepidopteran species.
Analogously, Lep1 Helitron integrations were described within cDNA-RACE products from 46 O. nubilalis clones (29.6 kb total; mean insert size: 616.1 ± 244.5 bp; GenBank accessions: JG732059–JG732089; JG744027–JG744041). Sequences from RACE products were assembled into 8 contigs and 14 singletons (3.56 ± 1.94 reads per contig), and 21 of these contigs were subsequently annotated as having a Lep1 integration (Supplementary Fig. S3). Functional annotation of these contigs indicated that all transcript-derived O. nubilalis Lep1 Helitrons were within intron or untranslated regions (data not shown), with the exception of contig04. Contig04 was predicted to show 85% amino acid similarity to the S. frugiperda allatotropin neuropeptide (at2a; GenBank accession CAD98809.1), and a Lep1 Helitron integration was predicted to have occurred within the protein-coding region of the O. nubilalis ortholog (Fig. 5B). When compared to the 53 aa S. frugiperda at2a gene sequence, the C-terminal 37 aa of the 64 residue O. nubilalis ortholog was predicted to be encoded by regions H2 and H3 of an integrated Lep1 Helitron (Fig. 5B). The integration inserted a novel protein-coding sequence that contains a TAA stop codon and changed the predicted molecular weight and isoelectric point of the O. nubilalis at2 protein (pI ∼10.87; 13.6 kDa) compared with that of S. frugiperda (pI = 11.4; 6.1 kDa). The effect of these changes on protein function was not investigated further, but the result indicates that the Lep1 Helitron can affect the structure and function of gene coding sequences in Lepidoptera.
In conclusion, a comparative genomics approach was used to identify novel sequences acquired by the highly conserved ancestral Lep1 Helitron. Although the primary sequence among gained sequences is variable, a conservation of secondary structures showed that structural identity by state is an important factor in determining the success of acquired genomic regions in subsequent transposition events within the genome. Lep1 provides insight into the structural requirements for RCR in animal Helitrons. Furthermore, the prevalence and preference of Lep1 integrations in proximity to gene-coding regions shows that this class of Helitrons impacts the structure and function of genomes in which they reside.
Supplementary data
Supplementary Data are available at www.dnaresearch.oxfordjournals.org.
Funding
This work was supported by the USDA Current Research Information System (CRIS) project number 3625-22000-017-00D.
Acknowledgements
This research was a joint contribution of the United States Department of Agriculture (USDA), Agricultural Research Service and the Iowa State University outreach station, Ames, IA (Project 3543). This article reports the results of research only. Mention of a proprietary product or service does not constitute an endorsement or recommendation by USDA or Iowa State University for its use.
References
- 1. Kapitonov V.V., Jurka J. Rolling-circle transposons in eukaryotes. Proc. Natl. Acad. Sci. USA. 2001;98:8714–9. doi: 10.1073/pnas.151269298.
- 2. Feschotte C., Mouches C. Recent amplification of miniature inverted-repeat transposable elements in the vector mosquito Culex pipiens: characterization of the Mimo family. Gene. 2000;250:109–16. doi: 10.1016/s0378-1119(00)00187-6.
- 3. Kapitonov V.V., Jurka J. Helitrons on a roll: eukaryotic rolling-circle transposons. Trends Genet. 2007;23:521–9. doi: 10.1016/j.tig.2007.08.004.
- 4. Lai J., Li Y., Messing J., Dooner H.K. Gene movement by Helitron transposons contributes to haplotype variability of maize. Proc. Natl. Acad. Sci. USA. 2005;102:9068–73. doi: 10.1073/pnas.0502923102.
- 5. Galagan J.E., Calvo S.E., Cuomo C., et al. Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae. Nature. 2005;438:1105–15. doi: 10.1038/nature04341.
- 6. Dooner H.K., Weil C.F. Give-and-take: interactions between DNA transposons and their host plant genomes. Curr. Opin. Genet. Dev. 2007;17:486–92. doi: 10.1016/j.gde.2007.08.010.
- 7. Yang H.P., Barbash D.A. Abundant and species specific DINE-1 transposable elements in 12 Drosophila genomes. Genome Biol. 2008;9:R39. doi: 10.1186/gb-2008-9-2-r39.
- 8. Yang L., Bennetzen J.L. Structure-based discovery and description of plant and animal Helitrons. Proc. Natl. Acad. Sci. USA. 2009;106:12832–7. doi: 10.1073/pnas.0905563106.
- 9. Bureau T.E., Wessler S.R. Tourist: a large family of small inverted repeat elements frequently associated with maize genes. Plant Cell. 1992;4:1283–94. doi: 10.1105/tpc.4.10.1283.
- 10. Li Y., Dooner H.K. Excision of Helitron transposons in maize. Genetics. 2009;182:399–402. doi: 10.1534/genetics.109.101527.
- 11.Ohmori Y., Abiko M., Horibata A., Hirano H.Y. A transposon, Ping, is integrated into intron 4 of the DROOPING LEAF gene of rice, weakly reducing its expression and causing a mild drooping leaf phenotype. Plant Cell Physiol. 2008;49:1176–84. doi: 10.1093/pcp/pcn093.
- 12.Varagona M.J., Purugganan M., Wessler S.R. Alternative splicing induced by insertion of retrotransposons into the maize waxy gene. Plant Cell. 1992;4:811–20. doi: 10.1105/tpc.4.7.811.
- 13.Benjak A., Boue S., Forneck A., Casacuberta J.M. Recent amplification and impact of MITEs on the genome of grapevine (Vitis vinifera L.). Genome Biol. Evol. 2009;1:75–84. doi: 10.1093/gbe/evp009.
- 14.Feschotte C., Wessler S.R. Treasures in the attic: rolling circle transposons discovered in eukaryotic genomes. Proc. Natl. Acad. Sci. USA. 2001;98:8923–4. doi: 10.1073/pnas.171326198.
- 15.Fu H., Dooner H.K. Intraspecific violation of genetic colinearity and its implications in maize. Proc. Natl. Acad. Sci. USA. 2002;99:9573–8. doi: 10.1073/pnas.132259199.
- 16.Morgante M., Brunner S., Pea G., Fengler K., et al. Gene duplication and exon shuffling by Helitron-like transposons generate intraspecies diversity in maize. Nat. Genet. 2005;37:997–1002. doi: 10.1038/ng1615.
- 17.Lal S.K., Hannah L.C. Helitrons contribute to the lack of gene colinearity observed in modern maize inbreds. Proc. Natl. Acad. Sci. USA. 2005;102:9993–4. doi: 10.1073/pnas.0504713102.
- 18.Britten R.J. Coding sequences of functioning human genes derived entirely from mobile element sequences. Proc. Natl. Acad. Sci. USA. 2004;101:16825–30. doi: 10.1073/pnas.0406985101.
- 19.Bennetzen J.L. Transposable elements, gene creation and genome rearrangement in flowering plants. Curr. Opin. Genet. Dev. 2005;15:621–7. doi: 10.1016/j.gde.2005.09.010.
- 20.Yang L., Bennetzen J.L. Distribution, diversity, evolution, and survival of Helitrons in the maize genome. Proc. Natl. Acad. Sci. USA. 2009;106:19922–7. doi: 10.1073/pnas.0908008106.
- 21.Gupta S., Gallavotti A., Stryker G.A., et al. A novel class of Helitron-related transposable elements in maize contain portions of multiple pseudogenes. Plant Mol. Biol. 2005;57:115–27. doi: 10.1007/s11103-004-6636-z.
- 22.Zhang J., Yu C., Pulletikurti V., et al. Alternative Ac/Ds transposition induces major chromosomal rearrangements in maize. Genes Dev. 2009;23:755–65. doi: 10.1101/gad.1776909.
- 23.McClintock B. The origin and behavior of mutable loci in maize. Proc. Natl. Acad. Sci. USA. 1950;36:344–55. doi: 10.1073/pnas.36.6.344.
- 24.Wendel J.F., Wessler S.R. Retrotransposon-mediated genome evolution on a local ecological scale. Proc. Natl. Acad. Sci. USA. 2000;97:6250–2. doi: 10.1073/pnas.97.12.6250.
- 25.Bingham P.M., Kidwell M.G., Rubin G.M. The molecular basis of P-M hybrid dysgenesis: the role of the P element, a P-strain-specific transposon family. Cell. 1982;29:995–1004. doi: 10.1016/0092-8674(82)90463-9.
- 26.Noor M.A.F., Chang A.S. Evolutionary genetics: jumping into a new species. Curr. Biol. 2006;16:R890–2. doi: 10.1016/j.cub.2006.09.022.
- 27.Gonzalez J., Karasov T., Messer P.W., Petrov D.A. Genome-wide patterns of adaptation to temperate environments associated with transposable elements in Drosophila. PLoS Genet. 2010;6:e1000905. doi: 10.1371/journal.pgen.1000905.
- 28.Gahan L.J., Gould F., Heckel D.G. Identification of a gene associated with Bt resistance in Heliothis virescens. Science. 2001;293:857–60. doi: 10.1126/science.1060949.
- 29.d'Alençon E., Sezutsu H., Legeai F., et al. Extensive synteny conservation of holocentric chromosomes in Lepidoptera despite high rates of local genome rearrangements. Proc. Natl. Acad. Sci. USA. 2010;107:7600–5. doi: 10.1073/pnas.0910413107.
- 30.International Silkworm Genome Consortium. The genome of a lepidopteran model insect, the silkworm Bombyx mori. Insect Biochem. Mol. Biol. 2008;38:1036–45. doi: 10.1016/j.ibmb.2008.11.004.
- 31.Mita K., Kasahara M., Sasaki S., et al. The genome sequence of silkworm, Bombyx mori. DNA Res. 2004;11:27–36. doi: 10.1093/dnares/11.1.27.
- 32.Xia Q., Zhou Z., Lu C., et al. A draft sequence for the genome of the domesticated silkworm (Bombyx mori). Science. 2004;306:1937–40. doi: 10.1126/science.1102210.
- 33.Osanai-Futahashi M., Suetsugu Y., Mita K., Fujiwara H. Genome-wide screening and characterization of transposable elements and their distribution in the silkworm, Bombyx mori. Insect Biochem. Mol. Biol. 2008;38:1046–57. doi: 10.1016/j.ibmb.2008.05.012.
- 34.Okada N. SINEs: short interspersed repeated elements of the eukaryotic genome. Trends Ecol. Evol. 1991;6:358–61. doi: 10.1016/0169-5347(91)90226-N.
- 35.Coates B.S., Sumerford D.V., Hellmich R.L., Lewis L.C. A Helitron-like transposon superfamily from Lepidoptera disrupts (GAAA)n microsatellites and is responsible for flanking sequence similarity within a microsatellite family. J. Mol. Evol. 2010;70:278–88. doi: 10.1007/s00239-010-9330-6.
- 36.Yang C., Teng X., Zurovec M., et al. Characterization of the P25 silk gene and associated insertion elements in Galleria mellonella. Gene. 1998;209:157–65. doi: 10.1016/s0378-1119(98)00029-8.
- 37.Van't Hof A.E., Brakefield P.M., Saccheri I.J., Zwaan B.J. Evolutionary dynamics of multilocus microsatellite arrangements in the genome of the butterfly Bicyclus anynana, with implications for other Lepidoptera. Heredity. 2007;98:320–8. doi: 10.1038/sj.hdy.6800944.
- 38.Hall T.A. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl. Acids Symp. Ser. 1999;41:95–8.
- 39.Tamura K., Dudley J., Nei M., Kumar S. MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0. Mol. Biol. Evol. 2007;24:1596–9. doi: 10.1093/molbev/msm092.
- 40.Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucl. Acids Res. 2003;31:3406–15. doi: 10.1093/nar/gkg595.
- 41.Youens-Clark K., Faga B., Yap I.L., Stein L., Ware D. CMap 1.01: a comparative mapping application for the Internet. Bioinformatics. 2009;25:3040–2. doi: 10.1093/bioinformatics/btp458.
- 42.Huang X., Madan A. CAP3: a DNA sequence assembly program. Genome Res. 1999;9:868–77. doi: 10.1101/gr.9.9.868.
- 43.Conesa A., Götz S., Garcia-Gomez J.M., et al. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21:3674–6. doi: 10.1093/bioinformatics/bti610.
- 44.Götz S., Garcia-Gomez J.M., Terol J., et al. High-throughput functional annotation and data mining with the Blast2GO suite. Nucl. Acids Res. 2008;36:3421–35. doi: 10.1093/nar/gkn176.
- 45.Coates B.S., Kroemer J.A., Sumerford D.V., Hellmich R.L. A novel class of miniature inverted repeat transposable elements (MITEs) that contain hitchhiking (GTCY)n microsatellites. Insect Mol. Biol. 2011;20:15–27. doi: 10.1111/j.1365-2583.2010.01046.x.
- 46.Tempel S., Nicolas J., Amrani A.E., Couee I. Model-based identification of Helitrons results in a new classification of their families in Arabidopsis thaliana. Gene. 2007;403:18–28. doi: 10.1016/j.gene.2007.06.030.
- 47.Sweredowski M., Wilson L.D.R., Gaut B.S. A comparative computational analysis of nonautonomous Helitron elements between maize and rice. BMC Genomics. 2008;9:467. doi: 10.1186/1471-2164-9-467.
- 48.Leung S.K., Wong J.T.Y. The replication of plastid minicircles involved rolling circle intermediates. Nucl. Acids Res. 2009;37:1991–2002. doi: 10.1093/nar/gkp063.
- 49.Yoder J.A., Walsh C.P., Bestor T.H. Cytosine methylation and the ecology of intragenomic parasites. Trends Genet. 1997;13:335–40. doi: 10.1016/s0168-9525(97)01181-5.
- 50.Orgel L.E., Crick F.H.C. Selfish DNA: the ultimate parasite. Nature. 1980;284:604–7. doi: 10.1038/284604a0.
- 51.Wang W., Kirkness E.F. Short interspersed elements (SINEs) are a major source of canine genome diversity. Genome Res. 2005;15:1798–808. doi: 10.1101/gr.3765505.
- 52.Sunter J.D., Patel S.P., Skilton R.A., et al. A novel SINE family occurs frequently in both genomic DNA and transcribed sequences in ixodid ticks from the arthropod sub-phylum Chelicerata. Gene. 2008;415:13–22. doi: 10.1016/j.gene.2008.01.026.
- 53.Lipatov M., Lenkov K., Petrov D.A., Bergman C.M. Paucity of chimeric gene-transposable element transcripts in the Drosophila melanogaster genome. BMC Biol. 2005;3:24. doi: 10.1186/1741-7007-3-24.
- 54.van de Lagemaat L.N., Landry J.R., Mager D.L., Medstrand P. Transposable elements in mammals promote regulatory variation and diversification of genes with specialized functions. Trends Genet. 2003;19:530–6. doi: 10.1016/j.tig.2003.08.004.
- 55.Giroux M.J., Clancy M., Baier J., et al. De novo synthesis of an intron by the maize transposable element Dissociation. Proc. Natl. Acad. Sci. USA. 1994;91:12150–4. doi: 10.1073/pnas.91.25.12150.
- 56.Davis M.B., Dietz J., Standiford D.M., Emerson C.P., Jr. Transposable element insertions respecify alternative exon splicing in three Drosophila myosin heavy chain mutants. Genetics. 1998;150:1105–14. doi: 10.1093/genetics/150.3.1105.