eLife. 2020 Jun 1;9:e58436. doi: 10.7554/eLife.58436

Retrocopying expands the functional repertoire of APOBEC3 antiviral proteins in primates

Lei Yang 1, Michael Emerman 2, Harmit S Malik 3,4, Richard N McLaughlin Jnr 1,3
Editors: Karla Kirkegaard 5, Karla Kirkegaard 6
PMCID: PMC7263822  PMID: 32479260

Abstract

Host-virus arms races are inherently asymmetric; viruses evolve much more rapidly than host genomes. Thus, there is high interest in discovering mechanisms by which host genomes keep pace with rapidly evolving viruses. One family of restriction factors, the APOBEC3 (A3) cytidine deaminases, has undergone positive selection and expansion via segmental gene duplication and recombination. Here, we show that new copies of A3 genes have also been created in primates by reverse transcriptase-encoding elements like LINE-1 or endogenous retroviruses via a process termed retrocopying. First, we discovered that all simian primate genomes retain the remnants of an ancient A3 retrocopy: A3I. Furthermore, we found that some New World monkeys encode up to ten additional APOBEC3G (A3G) retrocopies. Some of these A3G retrocopies are transcribed in a variety of tissues and able to restrict retroviruses. Our findings suggest that host genomes co-opt retroelement activity in the germline to create new host restriction factors as another means to keep pace with the rapid evolution of viruses.

Research organism: Other

Introduction

Host genomes have an ancient history of coevolution with selfish genetic elements. One type of these selfish elements, called endogenous retroelements, created a substantial fraction of most animal genomes (Canapa et al., 2016; de Koning et al., 2011; Lander et al., 2001; Smit et al., 2015; Sotero-Caio et al., 2017). Endogenous retroelements such as endogenous retroviruses (ERVs) and Long Interspersed Element-1s (LINE-1s) reside in host genomes where they ‘copy-and-paste’ themselves via the action of their reverse transcriptase. These retroelements can negatively impact host fitness by disrupting genes or regulatory regions, and by increasing the likelihood of ectopic recombination (Boissinot et al., 2001; Hancks and Kazazian, 2012; Kaer and Speek, 2013; Petrov et al., 2003; Song and Boissinot, 2007).

In addition to acting on their own RNA to ensure duplication, the reverse transcription/integration functions encoded by LINE-1s and ERVs also occasionally act on host mRNAs. This ‘off-target’ activity, termed retrocopying, entails the duplication of a host gene via the reverse transcription and integration of an mRNA. These ‘retrocopies’ are intronless and removed from the chromosomal location of the parental intron-containing gene. Previous studies estimated that 3,700–18,000 retrocopies are present in the human genome (Casola and Betrán, 2017; Navarro and Galante, 2015; Potrzebowski et al., 2008).

Two features distinguish retrocopies from other types of gene duplications. First, ‘DNA-based’ mechanisms of duplication (e.g., segmental gene duplications) result in a new copy of the gene including its promoter and distal regulatory elements. In contrast, retrocopying typically duplicates only the exons, leading to the moniker ‘processed pseudogene’. Thus, transcription of a new retrocopy depends on the genomic neighborhood into which it integrates (Carelli et al., 2016). Second, retrocopying relies on the machinery encoded by endogenous retroelements like LINE-1, which are highly active in germline and early embryo (Friedli et al., 2014; Garcia-Perez et al., 2007; Klawitter et al., 2016; Muotri, 2016; Wissing et al., 2012). Therefore, unlike DNA-based duplications, retrocopying is almost exclusively limited to RNAs expressed in germline or early embryonic tissues. It follows that the level of germline expression of host mRNAs should be highly correlated to their probability of generating retrocopies. For example, ribosomal proteins that are highly expressed in germline tissues represent the most abundant class of processed pseudogenes in the human genome (Balasubramanian et al., 2009). While germline expression of host mRNAs predicates the generation of retrocopies, the vast majority of these retrocopies show characteristic signatures of pseudogenization (Casola and Betrán, 2017; Navarro and Galante, 2015; Potrzebowski et al., 2008).

While most retrocopies do not increase the genic capacity of the host due to inactivating mutations, a subset of retrocopies escaped mutational abrasion, presumably because they provide a selective advantage to the host. Indeed, evidence of functional retention in retrocopied sequences has been found in diverse organisms and includes functions such as novel subcellular localization of proteins (Rosso et al., 2008), neurotransmitter metabolism (Burki and Kaessmann, 2004), courtship (Wang et al., 2002), fertility (Kalamegham et al., 2007), and pathogen restriction (Malfavon-Borja et al., 2013; Sayah et al., 2004). Such functional retention may be particularly beneficial in the case of host defense genes, whose functional diversification is necessary for host genomes to keep pace with pathogens. For example, retrocopying of the CypA gene between coding exons of the TRIM5 gene has created novel TRIMCyp fusion genes that can potently restrict retroviruses including HIV-1 (Malfavon-Borja et al., 2013; Newman et al., 2008; Nisole et al., 2004; Sayah et al., 2004; Virgen et al., 2008; Wilson et al., 2008). In a remarkable case of convergent evolution, retrocopying has created TRIMCyp fusion genes multiple times during primate evolution, further expanding and diversifying the TRIM gene family for retroviral defense (Brennan et al., 2008; Virgen et al., 2008). In other examples, mobile element and viral genes themselves have been retrocopied and domesticated for various functions including antiviral defense (Best et al., 1996; Fujino et al., 2014; Malik and Henikoff, 2005; McLaughlin et al., 2014; Ito et al., 2013; Yan et al., 2009).

Here, we investigated whether retrocopying may have similarly diversified another family of host defense genes: the APOBEC3 (A3) cytidine deaminases. Although the common ancestor of placental mammals likely encoded three A3 genes (Münk et al., 2012), this locus has recurrently expanded and contracted throughout mammalian evolution (Ito et al., 2020), including a dramatic expansion to seven paralogous genes in catarrhine primates (Old World monkeys and hominoids) followed by recurrent positive selection of this expanded gene set (Bulliard et al., 2009; Compton et al., 2012; Duggal et al., 2011; Henry et al., 2012; McLaughlin et al., 2016; OhAinle et al., 2006). We found that ancient and recent retrocopying has further diversified the already expansive A3 gene repertoire in primates, adding as many as ten new A3s outside the well-studied ‘A3 locus’. Our work uncovered an ancient A3 born via retrocopying in the common ancestor of simian primates and a dramatic, ongoing history of A3 retrocopying in New World monkeys (NWMs). Many of these NWM-specific A3 retrocopies are expressed and some retain the capability to restrict retroviruses. Thus, retrocopied A3s have continually expanded host defense repertoires in primate genomes.

Results

A3I: an ancient A3 retrocopy in simian primates

The A3 locus of human, other hominoids, and Old World monkeys comprises seven clustered genes, A3A-A3H (Jarmuz et al., 2002; OhAinle et al., 2006; Silvas and Schiffer, 2019; Figure 1A). We undertook an analysis to search for any variation in this gene structure in other primate genomes. We used BLASTn with each of the seven human A3 nucleotide sequences to query all sequenced primate genomes on NCBI. As expected, each genome contained a series of proximal hits, presumably comprising the A3 locus. However, in every simian primate genome examined, we also found exactly one shared syntenic region, distinct from the A3 locus, with high similarity to A3G. This sequence match spanned the exonic sequences of A3G in a single contiguous region of around 1,100 bp (Figure 1B). Based on the absence of introns, we concluded that this sequence represents a retrocopy of an A3 gene. We found no evidence of the syntenic copy of this A3 retrocopy in the genomes of prosimians including the tarsier, bushbaby, and mouse lemur. We, therefore, conclude that this retrocopy was born in the common ancestor of simian primates, and hence propose the name A3I for this retrocopy, extending the nomenclature scheme for other human A3 genes.
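
The intron-absence criterion described above can be illustrated with a short sketch. It is not the procedure used in this study: it assumes hypothetical BLAST-style alignment segments from an mRNA query against a single genomic contig, and it calls a hit intronless when the genomic span is about as long as the matched query span. The function name `looks_intronless` and the 50 bp tolerance are illustrative assumptions.

```python
def looks_intronless(hsps, max_extra_bp=50):
    """Decide whether a set of alignment segments resembles a retrocopy.

    hsps: list of (q_start, q_end, s_start, s_end) tuples, 1-based inclusive,
          all on the same contig (hypothetical input format).
    max_extra_bp: tolerated excess of genomic span over query span
                  (an arbitrary illustrative threshold).
    """
    # Length of mRNA (query) covered by the hits
    q_span = max(h[1] for h in hsps) - min(h[0] for h in hsps) + 1
    # Length of genome (subject) covered; min/max handles minus-strand hits
    s_lo = min(min(h[2], h[3]) for h in hsps)
    s_hi = max(max(h[2], h[3]) for h in hsps)
    s_span = s_hi - s_lo + 1
    # Introns inflate the genomic span relative to the mRNA span
    return s_span - q_span <= max_extra_bp


# One contiguous ~1,100 bp match: consistent with a retrocopy
retrocopy_like = [(1, 1100, 5000, 6099)]
# Two exons separated by ~2.7 kb of genomic sequence: intron-containing
gene_like = [(1, 300, 1000, 1299), (301, 600, 4000, 4299)]
print(looks_intronless(retrocopy_like), looks_intronless(gene_like))
```

In practice, such a check would be combined with the synteny and phylogenetic evidence described in the text, since short or fragmented hits can mislead a span-based test on its own.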

Figure 1. Identification and phylogenetic distribution of A3I.

(A) A3I is located away from the A3 locus at a distant but highly conserved syntenic locus in all simian primates. The human genome is shown as an example. (B) ORF structure of A3I in various primate species. Purple boxes represent sequences that can be aligned to the intron-containing A3 copies, whereas yellow boxes represent the longest ORF of the A3I in corresponding species. Stars (*) indicate the position of stop codons. (C) Maximum likelihood phylogeny of A3Is and the intron-containing A3Bs and A3Gs. Clusters of A3Is, A3Gs, and A3Bs are highlighted by their respective color, and bootstrap values leading to these clusters are shown on the nodes. (D) Expansion of A3 retrocopies along the primate phylogeny. The number of retrocopies of each A3 is shown in color boxes at the inferred point of retrocopy birth in the primate phylogeny. The white ‘A3’ box represents a sequence that could not be assigned to a particular ortholog.

Figure 1—figure supplement 1. A phylogeny of the domains of primate A3s and A3G retrocopies.

A PhyML tree of the domains of A3s and A3Is. Primate A3s with two deaminase domains were split into their constituent N-terminal and C-terminal domains and aligned together with all single deaminase domain A3s. Black triangles indicate bootstraps > 90%.

Figure 1—figure supplement 2. A phylogeny of primate A3s and A3G retrocopies.

A PhyML tree of nucleotide sequences of an alignable region of A3Gs and A3Bs from diverse simian primates in addition to A3G retrocopies from New World monkeys. Tree is rooted on the A3B group. New World monkey retrocopies group with New World monkey A3Gs which group in a larger group with A3Gs from all simian primates. Exceptionally, one retrocopy (A3I) is found in all simian primate genomes and forms a monophyletic group that branches before the diversification of simian primates (and their A3Gs).

To understand the relationship between A3I and other A3 genes, we created a maximum likelihood tree using the nucleotide sequences from simian primate A3B, A3G, and A3I genes. A3I could be aligned to primate A3Gs and A3Bs, since A3I shares the deaminase domain organization of these A3s (A3Z2-A3Z1; Figure 1A). With high bootstrap support, we found that all A3Is share a common ancestor to the exclusion of A3B and A3G genes. This pattern held with a tree of individual deaminase domains from all human A3s (Figure 1—figure supplement 1). Our analyses found the A3G genes to be the closest phylogenetic neighbors of the A3Is, suggesting a common ancestry of these two genes (Figure 1C). Since A3G is predicted to have been born soon after the simian-prosimian split (Münk et al., 2012), we propose that A3I arose via retrocopying of A3G in the common ancestor of simian primates, approximately 43 million years ago (MYA) (Perelman et al., 2011).

In all species analyzed, A3I has acquired potentially inactivating mutations relative to A3G. We found that all A3I retrocopies share a nonsense mutation at codon position 261 (Supplementary file 3), and thus, in most species, A3I encodes only a short putative open reading frame (ORF) of 153 codons (compared to the 384 codon ORF of A3G) which spans the N-terminal deaminase domain. Following this initial truncation, there were additional lineage-specific disruptions of the A3I ORF during the diversification of simian primates. These results suggest that A3I was born in the simian ancestor either as a truncated retrocopy or acquired a truncating mutation shortly following birth. Nonetheless, it is possible that the ancient A3Is encoded functional A3 proteins before becoming disabled by mutation.

Multiple A3G retrocopies in New World monkeys

A3I was not the only hit uncovered by our search for A3 retrocopies. Our analyses also revealed A3F retrocopies in three Old World monkeys within the Colobinae subfamily, A3H retrocopies in two New World monkeys, and a single A3 retrocopy that cannot be assigned to a specific A3 parent in the greater galago (Otolemur garnettii) (Figure 1D). However, our most striking finding was that every sequenced NWM genome contained numerous A3G retrocopies. This abundance of A3 retrocopies motivated a deeper investigation of the evolution and function of these NWM A3G retrogenes.

Initially focusing on the common marmoset (Callithrix jacchus) genome, we found three intron-containing A3 genes on chromosome 1 (likely orthologous to human A3A, A3G, and A3H). We also found nine loci outside of the A3 locus with high sequence similarity to human and marmoset A3G genes (Figure 2A). In contrast to the marmoset A3G gene, each of these additional hits lacked introns suggesting they were retrocopies. Seven A3G retrocopies spanned more than 1,100 bp and most of the coding exons of the marmoset A3G. In addition, one retrocopy spanned 811 bp, and one shorter retrocopy spanned 260 bp (Figure 2A). These shorter retrocopies showed a marked 3’ bias, consistent with the 5’-truncation-prone, target-primed reverse transcription mechanism of LINE-1 and related retrotransposons (Cost et al., 2002; Luan et al., 1993). Seven A3G retrocopies possessed premature stop codons, deletions, or early truncations (Figure 2A). Two of these retrocopies encode ORFs that span a single cytidine deaminase domain (‘C1’ and ‘C2’). However, two A3G retrocopies (‘SS1’ and ‘SS2’, Figure 2A) were predicted to encode a 382 amino acid protein (comparable to the full-length 384 amino acid protein encoded by the intron-containing A3G gene). Thus, the marmoset genome contains nine A3G retrocopies, which may encode 2–4 additional A3G-like proteins.
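
To make the notion of a retrocopy's putative ORF concrete, the following minimal sketch scans the three forward reading frames of a nucleotide sequence and reports the longest ATG-to-stop ORF. It is an illustration only, not the ORF-calling method used in this study; the function name `longest_orf` and the toy sequence are hypothetical, and the reverse strand and ORFs lacking an in-frame stop codon are ignored for brevity.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end, n_codons) for the longest forward-strand ORF.

    start and end are 0-based, end-exclusive nucleotide coordinates; the stop
    codon is excluded from the coordinates and from the codon count.
    Returns None when no ATG...stop ORF is found.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if best is None or n_codons > best[2]:
                    best = (start, i, n_codons)
                start = None  # close this ORF and look for the next ATG
    return best


# Toy sequence (hypothetical): ATG AAA TTT GGG TAA embedded in frame 2
print(longest_orf("CCATGAAATTTGGGTAACC"))
```

Under this kind of scan, a premature stop codon or a frameshifting deletion shortens the reported ORF, which is how truncated retrocopies would be distinguished from those retaining a near-full-length coding sequence.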

Figure 2. Discovery and phylogenetic analysis of A3G retrocopies in New World monkeys.

(A) The common marmoset (Callithrix jacchus) genome encodes a single A3G and nine retrocopies of A3G (orange boxes). A3G resides on four coding exons at the A3 locus on chromosome 1, while the retrocopies are intronless and found throughout the genome. Some retrocopies contain putative protein-coding ORFs (yellow boxes) of varying lengths that retain alignable sequence similarity to the A3G protein (gray within yellow boxes, regions of poor alignment caused by frame-shifting mutations). (B) PhyML tree of A3G and A3G retrocopies from the four sequenced and assembled New World monkey genomes suggests that six retrocopies are orthologous and conserved in all four species (clusters C1-C5 and A3I). The genome of each species (colors correspond to species) contains an intron-containing A3G (dotted line) as well as retrocopies that are closely related to A3G. These more recent copies are found in only one genome, without identifiable orthologs in the other three species. Some retrocopies retain a putative protein coding ORF (indicated by a circle at the branch tip).

Figure 2—figure supplement 1. Synteny analysis of retrocopies in marmoset and squirrel monkey for inference of orthology.

The UCSC Table Browser was used to identify the genes on either side of each retrocopy in marmoset and squirrel monkey. An occurrence of the same genes on both sides of a copy in each species was used as an indicator of synteny and therefore orthology. All inferred orthologous pairs were also supported by phylogenetic groupings.

Figure 2—figure supplement 2. PCR of genomic DNA of New World monkeys to date retrocopy births.

(A) PCR amplification of marmoset genomic DNA using oligos designed to amplify each retrocopy locus in marmoset shows filled loci for 6/7 retrocopies and a failed PCR reaction for retrocopy SS1. Oligos designed to amplify a squirrel-monkey copy, not found in marmoset, show no band. (B) Amplification of marmoset retrocopy-containing loci in a panel of New World monkeys plus human genomic DNA shows variable presence and/or retention of retrocopies across species.

Next, we expanded our search for A3G retrocopies to the other assembled NWM genomes on NCBI: the Bolivian squirrel monkey (Saimiri boliviensis), the white-faced capuchin (Cebus capucinus), and Ma’s night monkey (Aotus nancymaae). As in marmoset, we found multiple A3G retrocopies in each of these genomes – eight in capuchin, seven in night monkey, and ten in squirrel monkey (Figure 2A). All but one of the NWM A3G retrocopies aligned (without large gaps) to each other and intron-containing A3Gs, and a phylogenetic tree confirmed these retrocopies cluster with NWM A3Gs (Figure 1—figure supplement 2). The exception, one squirrel monkey A3G retrocopy (GenBank: JH378161), contains a 286 bp insertion of another shorter A3G retrocopy, most likely the result of a nested insertion of one retrocopy into another.

A maximum likelihood phylogenetic tree (Figure 2B) using the alignable NWM A3Gs and A3G retrocopies revealed six bootstrap-supported ‘clusters’ of retrocopies with representatives from all four analyzed NWM species (Figure 2B and Supplementary file 1 clusters C1-C5 and A3I). One of these clusters contains the NWM A3I sequences which date back to at least the last common ancestor of simian primates. Our findings suggest that the other five clusters represent orthologs of A3G retrocopies born in or before the last common ancestor of these four NWM species. This orthology was further supported by shared synteny in two NWM genomes (marmoset and squirrel monkey) for all clusters (Figure 2—figure supplement 1). We, therefore, conclude that five orthologous NWM A3G retrocopies were likely born via independent retrotransposition events in or prior to the most recent common ancestor of these four species analyzed.

To more precisely date the origins and species distribution of these A3G retrocopies, we investigated their presence in additional NWM species lacking publicly available genome sequences. For each retrocopy, we used UCSC MultiZ alignments (Blanchette et al., 2004) to find flanking sequence conservation in shared syntenic locations of marmoset, human, and mouse genomes to design oligos specific to a single retrogene-containing locus in marmoset. We were able to do so for seven of nine retrocopies (all but SS3 and C4, Figure 2A). Confirming the specificity of these oligos, six of these seven oligo pairs reproducibly amplified a single locus from marmoset genomic DNA with touchdown PCR whereas only A3I was amplified from human genomic DNA (Figure 2—figure supplement 2). Using these oligos and genomic DNA from other species, we observed that many retrocopies were present in the shared syntenic loci in other NWMs. Specifically, retrocopies C1, C2, C3, and C5 of marmoset were also present in titi (Plecturocebus moloch) and saki monkeys (Pithecia pithecia), two species in the basal family of the NWM phylogeny, suggesting these retrocopies were born in or prior to the most recent common ancestor of all NWMs (~25 MYA).

In addition to the six orthologous ‘clusters’ of retrocopies found in all or many NWMs, our phylogenetic analysis also reveals ‘species-specific’ retrocopies (SS) with no apparent ortholog in the other three species with genome assemblies (Figure 2B). These A3G retrocopies instead share a recent common ancestor with the intron-containing A3G gene from the same species, suggesting that they were born recently. Our PCR analysis revealed that some ‘marmoset-specific’ retrocopies were also present in the closely related tamarin (Saguinus oedipus). Thus, A3G retrocopies vary in age, from being found in only one species, in a few closely related species, in all NWM species, or in all simian primates. The different branch lengths leading to each of the NWM A3G retrocopies or retrocopy ortholog clusters also reflect their variable ages (Figure 2B). Our findings suggest that rather than a single burst, A3G retrocopies have been continually born throughout the evolutionary history of NWMs.

Retention of putatively functional NWM A3G retrogenes

Retrocopies are often assumed to be nonfunctional at birth since they usually consist of only the sequence within the mRNA of the parent gene and therefore lack promoters and enhancers. However, there are well-documented examples of retrogenes that have been retained for their functionality (Casola and Betrán, 2017). To investigate whether any of the A3 retrocopies might be functional, we used several criteria to eliminate retrocopies likely to be non-functional. We assumed that functional retrocopies should be transcribed and should have evolved under selective constraint on an intact open reading frame with an intact cytidine deaminase motif. To be conservative, we narrowed our focus to the A3G retrocopies detectable in the four sequenced, publicly available New World monkey genomes – nine copies in common marmoset, eight copies in white-faced capuchin, ten copies in Bolivian squirrel monkey, and seven copies in Nancy Ma’s night monkey.

Many A3 genes are expressed in the germline and early development (Friedli et al., 2014; Marchetto et al., 2013; Refsland et al., 2010) where they protect against a diverse range of infectious and endogenous elements including retroviruses and LINE-1 (Arias et al., 2012; Harris and Dudley, 2015). First, to assess expression of A3G retrocopies in vivo, we queried publicly available NWM RNA-seq datasets (Supplementary file 2) using all reference A3G retrocopy sequences. We organized all perfect-matching, uniquely-mapping read counts by species (Figure 3, x-axis) and specific retrocopy (Figure 3, y-axis). The intron-containing A3G gene itself showed detectable expression in most datasets in all four species (organized by tissue source of each dataset, colored bars down columns denote tissues). We also found that several A3G retrocopies showed expression in each species (dark bars within each species). For example, C2 retrocopies (Figure 3) are expressed in three of the four analyzed NWM species. In marmoset, the C2 retrocopy is expressed in stem cells and induced pluripotent cells, similar to the intron-containing A3G gene. We also found that each species expressed at least one species-specific retrocopy. In several species, these younger, species-specific A3G retrocopies were expressed in ovaries and testes just like the intron-containing A3G. Overall, our data suggest that a large subset of NWM A3G retrocopies are expressed in vivo, including in tissues relevant for defense against pathogens.

Figure 3. A3G retrocopies are transcribed in New World monkey tissues.

A heat map shows the counts of RNA-seq reads (log10 of read count + 1) that map uniquely at 100% identity and coverage. Each pixel represents the average read counts of available data for the corresponding tissue type and A3G retrocopy. Tissue types are marked by the colored lines behind the pixels. Green represents germline tissues including iPSC, ESC, testis and ovary; orange represents brain tissues of various regions; red represents blood samples including whole blood and lymphocytes. Retrocopies which retain a putative protein coding ORF are labeled with ‘ORF’.
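
The heat-map transformation described in the legend can be sketched as follows. This is an illustrative reading of the legend, not the code used to draw Figure 3: it assumes that read counts are first averaged across datasets of the same tissue type and retrocopy and then transformed as log10(count + 1). The function name `heatmap_values`, the input format, and the example counts are hypothetical.

```python
import math
from collections import defaultdict


def heatmap_values(records):
    """Collapse per-dataset read counts into one heat-map value per pixel.

    records: iterable of (tissue, retrocopy, read_count) tuples, one per
             RNA-seq dataset (hypothetical input format).
    Returns {(tissue, retrocopy): log10(mean_read_count + 1)}.
    Assumption: averaging happens before the log transform.
    """
    totals = defaultdict(float)
    n_datasets = defaultdict(int)
    for tissue, retrocopy, count in records:
        key = (tissue, retrocopy)
        totals[key] += count
        n_datasets[key] += 1
    # The +1 pseudocount keeps unexpressed retrocopies at 0 instead of -inf
    return {key: math.log10(totals[key] / n_datasets[key] + 1) for key in totals}


# Made-up counts for illustration only
example = [("testis", "C2", 9), ("testis", "C2", 189), ("brain", "C2", 0)]
for pixel, value in sorted(heatmap_values(example).items()):
    print(pixel, round(value, 3))
```

The pseudocount is the design choice worth noting: it lets retrocopies with zero uniquely mapping reads appear on the same scale as expressed ones.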

Second, we evaluated the A3G retrocopies for their predicted ORFs. Full-length NWM A3G genes encode a ~384 amino acid protein. In contrast, most A3G retrocopies encode only short putative ORFs, typically fewer than 100 codons (Figure 2A, yellow boxes). However, a subset of A3G retrocopies have retained a predicted ORF of at least 250 codons (Figure 2A, yellow boxes; Figure 2B, empty circles on branches) which encompasses one or two of the core deaminase domains. Most of these retrocopies conserve the core amino acids within these domains required for antiviral or anti-retroelement activities of A3G (Figure 4; Huthoff and Malim, 2007; Navarro et al., 2005).

Figure 4. A3G retrocopies retain core deaminase motifs.

An amino acid alignment of the core deaminase motifs shows that A3Gs of various primates have conserved HxE-CxxC motifs in both the N- and C-terminal domains. The putative ORF-encoding retrocopies all retain a conserved C-terminal motif, and most retain an N-terminal motif.
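
A motif check of this kind can be sketched with a regular expression. The pattern below is an assumption rather than the criterion used for Figure 4: it encodes the HxE and CxxC elements named in the legend, joined by a spacer of 23 to 28 residues followed by a proline, following the zinc-coordinating motif commonly written for cytidine deaminases as H-x-E-x(23–28)-P-C-x(2–4)-C. The function name and the toy protein are hypothetical and do not represent a real A3G sequence.

```python
import re

# Assumed form of the core deaminase motif; spacer lengths are taken from the
# general cytidine deaminase literature, not from this article.
DEAMINASE_MOTIF = re.compile(r"H.E.{23,28}PC.{2,4}C")


def find_deaminase_motifs(protein):
    """Return the 0-based start position of every motif match in a protein."""
    return [m.start() for m in DEAMINASE_MOTIF.finditer(protein.upper())]


# Toy single-domain protein (hypothetical): HAE, 25-residue spacer, PCAAC
toy_domain = "MK" + "HAE" + "G" * 25 + "PCAAC" + "LL"
two_domain = toy_domain + toy_domain           # mimics a double-domain A3
broken = toy_domain.replace("HAE", "HAQ")      # catalytic glutamate lost

print(find_deaminase_motifs(toy_domain))
print(find_deaminase_motifs(two_domain))
print(find_deaminase_motifs(broken))
```

A retrocopy ORF with two matches would retain both the N- and C-terminal motifs, whereas a single match would indicate that only one domain remains intact.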

Third, we evaluated the A3G retrocopies for evidence of selective retention. Although A3G retrocopies are expected to lack stop codons upon their birth from an intact A3G gene, absence of stop codons in older A3G retrocopies could indicate functional retention. We adapted a previously published approach (Young et al., 2018) to simulate the rate of decay of ORFs in the absence of selection based on ORF length and conservative and liberal bounds on NWM background mutation frequency and generation time (Campbell and Eichler, 2013; Tacutu et al., 2018; Thomas et al., 2018). We found that fewer than 5% of A3G ORFs were expected to remain intact after 20 million years (fewer than 1% after 40 million years) (Figure 5). In contrast to this expectation, we found two A3G retrocopies have remained intact despite being at least 20 million years old (Bininda-Emonds et al., 2007). These include one C1 A3G retrocopy (with a preserved ORF in capuchin monkey) and a C2 retrocopy (with a preserved ORF in marmoset and night monkey). Based on these findings, we hypothesize that some NWM A3G retrocopies have been retained for their function.
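
The logic of an ORF-decay simulation can be sketched in a few lines. The code below is a simplified Monte Carlo illustration, not the adapted Young et al. (2018) procedure used in this study: it assumes per-site, per-year substitution and indel rates supplied by the user, treats every indel as ORF-disrupting, allows at most one substitution per site, and ignores loss of the start codon. All function names and any rate values passed to them are hypothetical placeholders.

```python
import math
import random

STOPS = {"TAA", "TAG", "TGA"}
BASES = "ACGT"


def random_orf(n_codons, rng):
    """Return a list of n_codons random sense (non-stop) codons."""
    codons = []
    while len(codons) < n_codons:
        codon = "".join(rng.choice(BASES) for _ in range(3))
        if codon not in STOPS:
            codons.append(codon)
    return codons


def orf_survives(n_codons, years, sub_rate, indel_rate, rng):
    """Evolve one ORF without selection; return True if it is still intact."""
    p_sub = 1.0 - math.exp(-sub_rate * years)      # P(site substituted)
    p_indel = 1.0 - math.exp(-indel_rate * years)  # P(site hit by an indel)
    for codon in random_orf(n_codons, rng):
        bases = list(codon)
        for pos in range(3):
            if rng.random() < p_indel:
                return False  # simplification: any indel disables the ORF
            if rng.random() < p_sub:
                bases[pos] = rng.choice([b for b in BASES if b != bases[pos]])
        if "".join(bases) in STOPS:
            return False  # substitution(s) created a premature stop codon
    return True


def intact_fraction(n_codons, years, sub_rate, indel_rate, n_orfs=10_000, seed=0):
    """Fraction of n_orfs simulated ORFs that remain intact after `years`."""
    rng = random.Random(seed)
    kept = sum(orf_survives(n_codons, years, sub_rate, indel_rate, rng)
               for _ in range(n_orfs))
    return kept / n_orfs
```

A call such as `intact_fraction(384, 20e6, sub_rate=1e-9, indel_rate=1e-10, n_orfs=1000)` would estimate survival of a 384-codon ORF over 20 million years under those placeholder rates; the rates are illustrative and are not the parameter sets used for Figure 5.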

Figure 5. Simulation and evolution suggest selection to retain ORFs in A3G retrocopies.

(A) A simulation of ORF retention suggests most are lost within 10–20 million years in the absence of any selection to retain the ORF. Dots indicate the proportion of simulated ORFs (10,000 total) that were still intact after a given time. Colors represent three sets of parameters intended to match New World monkeys (green) or provide liberal (orange, mouse-like) and conservative (blue, human-like) bounds on the parameter sets of indel rate and generation time. The substitution rate of Ma’s night monkey was used for all three sets of simulations. Horizontal red lines indicate the 1st and 5th percentiles of intact ORFs. Vertical red lines mark the key time points of last common ancestors (LCA) among New World monkeys.

Figure 5—figure supplement 1. Analysis of selection in the evolution of retrocopies.

Top, PAML free ratio analysis of selection along branches (omega values for terminal branches shown in orange). Omega values less than one for all terminal branches leading to A3Gs suggest these genes have evolved under purifying selection. The branches leading to the two retrocopies that restrict retrovirus have elevated omega values (significantly higher than dN/dS = 1, p=0.058 for capuchin-C1, p=0.025 for marmoset-SS1) suggesting these retrocopies have evolved under positive selection. Bottom, RELAX analysis of overall selection when comparing A3Gs (purple) to A3G retrocopies (black) suggests the retrocopies have evolved under intensified selection (no detection of relaxation of selection) relative to the presumably functional A3Gs.

To further evaluate the selective constraint acting on A3G retrocopies, we used computational models to test whether their evolution more closely resembles a functional gene or a pseudogene. We first used the RELAX method (Wertheim et al., 2015) to test whether the A3G retrocopies show relaxed selection relative to intron-containing A3G genes. Significant relaxed selection was not detected in the putatively intact retrocopies relative to the intron-containing A3Gs (Figure 5—figure supplement 1). Instead, RELAX suggests that the A3G retrocopies have evolved more rapidly than the intron-containing A3G genes in the same set of species (Figure 5—figure supplement 1). Next, using a branch model of PAML (Yang, 2007), we observed that two retrocopies (capuchin-C1 and marmoset-SS1) had elevated dN/dS (2.6 and 2.9 respectively, significantly greater than the neutral expectation of 1), while the rest of the branches were suggestive of neutral evolution or purifying selection (Figure 5—figure supplement 1). These analyses suggested that, overall, the retrocopies evolved at a similar or accelerated rate compared to intron-containing A3Gs. Further, capuchin-C1 and marmoset-SS1 show evidence of accelerated evolution. Overall, our three lines of evidence suggest that at least a subset of the A3G retrocopies are likely to have been retained for their function.

Antiviral activity of NWM A3G retrocopies

We reasoned that A3G retrocopies could have a role in innate immunity/genome defense similar to intron-containing A3 genes. To test this possibility, we cloned and assayed intron-containing A3Gs and each A3G retrocopy encoding an intact near-full-length ORF for its ability to restrict the endogenous retroelement LINE-1 using established in vitro retrotransposition assays (Dewannieux et al., 2003; Moran et al., 1996). These assays require that a LINE-1 sequence be transcribed, spliced, and reverse transcribed back into the genome. As controls, we tested the anti-LINE-1 restriction of human A3A and human A3G. Consistent with previous reports (Bogerd et al., 2006; Chen et al., 2006; Muckenfuss et al., 2006; Niewiadomska et al., 2007), we observed potent restriction of LINE-1 by human A3A, and no restriction by human A3G. In contrast to human A3G, we found that the intron-containing A3Gs from marmoset and squirrel monkey restricted LINE-1 more than 10-fold, comparable to A3A. However, we observed no appreciable restriction of LINE-1 by any of the A3G retrocopies (Figure 6; Figure 6—figure supplement 1). Thus, despite potent anti-LINE-1 restriction by NWM A3Gs, it appears that this activity is not retained by any of the retrocopies tested.

Figure 6. A3G retrocopies restrict HIV-1 but not LINE-1.

Bar charts of measured restriction of LINE-1 (retrotransposition assays) and HIV-1ΔVif (single cycle infectivity assays) show that NWM A3Gs and some A3G retrocopies restrict retrovirus. Only NWM A3Gs, but not retrocopies, restrict LINE-1.

Figure 6—figure supplement 1. Western blot of A3s and A3G retrocopies.

For each construct, 50 ng plasmid was transfected into 25,000 293T cells in a single well of a 24 well plate. Forty-eight hours later, cells were harvested, lysed, and probed with Covance mouse HA.11 Clone 16B12 anti-HA monoclonal antibody.

Next, we investigated the antiviral restriction by NWM A3G genes and retrocopies. Using single-cycle infectivity assays, we measured the ability of NWM A3G genes and retrocopies to block infectivity of HIV-1ΔVif, which lacks Vif, a known antagonist of APOBEC3 proteins. Consistent with previous results, we found that human A3G potently restricts HIV-1ΔVif but human A3A is a poor restrictor; this restriction pattern is the opposite of that observed for LINE-1 restriction here and in previous findings (Bogerd et al., 2006; Chen et al., 2006; Turelli et al., 2004; Figure 6). We also observed 100-fold or greater restriction of HIV-1ΔVif infectivity by intron-containing A3G genes from marmoset and squirrel monkey (Figure 6; Figure 6—figure supplement 1), consistent with a previous report of restriction by NWM A3Gs (Wong et al., 2009). Finally, we observed that two retrocopies – marmoset-SS1 and capuchin-C1 – restrict HIV-1ΔVif at least 10-fold, suggesting that these two A3G retrocopies encode bona fide A3G-like anti-retroviral activity. Thus, retrocopying has expanded the functional repertoire of A3 antiviral genes in NWMs. At least two of these genes are expressed at the RNA level in at least some tissues and encode a functional protein with antiviral activity.

Discussion

Replicating retrotransposons inflict deleterious consequences on host genomes via insertional mutagenesis, ectopic recombination, and dysregulation of proximal genes (Beck et al., 2011). Despite these negative consequences, retrotransposons can bring about innovation in host genomes via the birth of new exons or genes (Mi et al., 2000; Schmitz and Brosius, 2011), or novel regulatory mechanisms and gene-regulatory networks (Chuong et al., 2016; Kunarso et al., 2010; Wang et al., 2007). In this work, we show that retrotransposon-mediated gene birth can lead to continual evolution of new innate immune genes. We show that all simian primate genomes contain the remnants of A3I, an ancient A3 retrocopy. We further find that NWM genomes have continually acquired A3G-derived retrocopies, a subset of which are transcribed, retain intact ORFs and functional motifs, and are capable of restricting retroviruses.

This history of ancient and young retrocopies provides a valuable resource in understanding how antiviral genes coevolve with pathogens, including changes in Vif-interacting residues or viral restriction profile (Krupp et al., 2013). Although numerous methods exist for reconstruction of ancestral sequences, rapidly evolving genes like the A3s violate assumptions and often limit the utility of these methods, thereby preventing reliable reconstruction of ancestral sequences. However, retrocopies are molecular fossils, an evolutionary snapshot of the ancient parental gene sequence which presumably evolved neutrally after inserting into the genome. A3I provides such a record of an A3G-like gene from 40 MYA, which was present in the common ancestor of simian primates. Given the rapid gene turnover of the A3 locus in mammals, it is possible that the parent of A3I no longer exists in modern primates. In this scenario, the A3I retrocopy may be all that remains of this ancient A3 gene which predates simian primate diversification.

Recent computational analysis corroborates the presence of A3 retrocopies in two of the genomes we analyzed and adds to a growing literature suggesting the A3 content of mammalian genomes may be even more variable and dynamic than previously appreciated (Hayward et al., 2018; Ito et al., 2020). Our data suggest that A3 retrocopying is more prevalent in NWM genomes than in other simian primates. This abundance is consistent with a previous study that reported an increased number of retrocopies of all genes in marmoset and squirrel monkey genomes, correlated with an increase in the activity of two LINE-1 subfamilies, L1PA7 and L1PA3 (Navarro and Galante, 2015). It is unclear whether increased LINE-1 activity is sufficient to explain our observations since some NWMs like the Ateles lineage may have low or no retroelement activity (Boissinot et al., 2004). Even if NWM LINE-1 activity is high, it would not necessarily explain why A3G rather than the other NWM A3 genes is subject to recurrent retrocopying. Although duplication of some nuclear A3 proteins like human A3A or A3B is likely to be more toxic due to increased genomic mutation (Hultquist et al., 2011; McLaughlin et al., 2016), we favor the alternate hypothesis that A3G expression in the germline/early embryos of NWMs is unusually high, rendering it a more likely substrate for retrocopying relative to other NWM A3 genes. Following their insertion into a new genomic location, these retrocopies could be expressed by exaptation of a neighboring transposable element, promoter piggybacking, or recruitment of a novel promoter (Carelli et al., 2016). Recent work suggests that most of the mouse genome is transcribed over relatively short evolutionary timescales (Neme and Tautz, 2016). Such ‘genome-wide’ transcription could be the first step in exposing an advantageous function of a retrogene (Jaganathan et al., 2019).

We showed that intron-containing NWM A3G genes restricted both LINE-1 and HIV-1. Thus, it is likely that A3G retrocopies retained both of these functions immediately following birth. Yet, over time, all retrocopies that restrict HIV-1 have lost the ability to restrict LINE-1 (Figure 6). Although this could reflect idiosyncratic events, our finding that anti-LINE-1 activity, but not anti-retroviral activity, was repeatedly lost, suggests otherwise. It is possible that the anti-LINE-1 function is simply more sensitive to random mutation, such that mutations are more likely to result in loss of LINE-1 restriction; we also cannot rule out the possibility that certain A3G retrocopies retain the capacity to restrict NWM-specific LINE-1 lineages. Alternatively, A3G retrocopies may have been absolved of selection for LINE-1 restriction, perhaps due to sufficient silencing by A3G and other restriction factors. Nevertheless, the retrocopies present a natural ‘separation of function’ event that can delineate the requirements for A3G proteins to restrict LINE-1 versus retroviruses.

Although we used the lentivirus HIV-1ΔVif to measure the anti-retroviral activity of NWM A3G retrocopies, lentiviruses have not yet been found in NWMs. Even apart from lentiviruses, few active retroviruses in general have been found in NWMs; those that have been found likely represent the tip of an understudied aspect of monkey and virus biology (Colcher et al., 1977; Muniz et al., 2013). Thus, HIV-1ΔVif only serves as a proxy for the activity of A3G retrocopies towards some relevant viral pathogen in the natural environment of these monkeys.

While the NWM A3G retrocopies did not restrict LINE-1, such a mechanism of gene duplication could, in theory, function as a feedback mechanism on excess retroelement activity in the germline/early embryo. Retroelement restriction factors expressed in these tissues could be retrocopied and increase dosage or diversity of anti-retroelement restriction factors (Kondrashov et al., 2002). In this way, the retrocopies may represent a ‘revolving door’ of new gene substrates for neo- or sub-functionalization; the needs of the genome would dictate which functions persist.

In conclusion, our findings suggest retrocopied gene sequences represent a prevalent, recurrent, and rapid mechanism in primates and other organisms to evolve new genome defense functions including restriction of viruses. Although the presence of endogenous retroelements is probably net deleterious to the host, retrogene birth represents a mechanism whereby host genomes could nevertheless take advantage of the activities of these genomic pathogens to protect themselves against endogenous and infectious pathogens.

Materials and methods

Identification of A3 retrocopies

A3G retrocopies were identified using BLAT of UCSC genome databases for marmoset (Callithrix jacchus draft assembly, WUGSC 3.2, GCA_000004665.1) and squirrel monkey (Saimiri boliviensis, saiBol1, GCA_00023585.1) with marmoset A3G (NM_001267742) as a query sequence. Additional copies were identified using BLASTn of the NCBI genome assemblies of Ma’s night monkey (Aotus nancymaae, Anan_2.0) and capuchin (Cebus capucinus imitator, Cebus_imitator-1.0). The spider monkey retrocopy was identified using BLAST to query the NCBI HTGS database for reads from New World monkeys. See Supplementary file 1 for detailed coordinates of each sequence.

Mapping inactivating mutations in retrocopies

A3I sequences were queried using the codon-based and indel-sensitive alignment program LAST (http://last.cbrc.jp). The translated A3G sequence of Callithrix jacchus (NC_013914.1) was used as the reference sequence and indexed using the setting of ‘lastdb -p -cR01’ of the LAST aligner, and then the A3I sequences were queried using the setting of ‘lastal -F15’ to output in ‘maf’ format. The longest indel-sensitive translation of each A3I was then manually extracted from the maf output and aligned with mafft (https://mafft.cbrc.jp/alignment/software/) using the setting of ‘--anysymbol’ to allow stop codons and frame shifting changes to be shown.
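As an illustration of how inactivating mutations could be tallied from such an indel-sensitive translation, the following Python sketch counts stop codons and frameshift marks in one aligned sequence. It assumes the symbol convention given in the legend of Supplementary file 3 (`*` for a stop codon, `/` for a frameshift caused by deletion, `\` for a frameshift caused by insertion). The function name and the example fragment are hypothetical and are not taken from the data; this is not the procedure used to build the alignment.

```python
from collections import Counter

# Symbol convention assumed from the Supplementary file 3 legend.
SYMBOLS = {
    "*": "stop_codon",
    "/": "frameshift_deletion",
    "\\": "frameshift_insertion",
}


def count_inactivating_mutations(aligned_translation: str) -> dict:
    """Tally stop codons and frameshift marks in one aligned translation."""
    counts = Counter(SYMBOLS[ch] for ch in aligned_translation if ch in SYMBOLS)
    # Report every category, including those that never occur.
    return {name: counts.get(name, 0) for name in SYMBOLS.values()}


# Hypothetical aligned fragment (gaps shown as '-'), for illustration only.
example = "MKPH*RNT/VYL--WE\\RG*"
print(count_inactivating_mutations(example))
```

Applied to each A3I sequence in turn, a tally like this gives a quick per-species summary of how far each copy has decayed relative to the reference translation.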

Analysis of syntenic A3G retrocopies

Synteny of A3G retrocopies in marmoset and squirrel monkey was analyzed using the UCSC Table Browser to download gene names within 1 Mbp on either side of the retrocopy. Synteny was confirmed if the same gene was adjacent to the retrocopy in both species. For five pairs of sequences that the tree suggested should be orthologous, we found shared genes on both sides of the retrocopies. For one retrocopy (C5), we found a shared gene on only one side of the retrocopies (Figure 2—figure supplement 1).
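The flanking-gene comparison described above can be sketched in a few lines of Python. This is a simplified illustration only: it assumes the gene names on each side of a retrocopy have already been collected into sets, it ignores strand orientation (which could swap the two sides between assemblies), and the function names and gene names are hypothetical placeholders.

```python
def shared_flanking_genes(flank_a: dict, flank_b: dict) -> dict:
    """Return the genes shared by two species on each side of a retrocopy.

    Each argument maps 'upstream' and 'downstream' to the set of gene names
    found within the search window around the retrocopy in one species.
    """
    return {side: flank_a[side] & flank_b[side] for side in ("upstream", "downstream")}


def synteny_support(flank_a: dict, flank_b: dict) -> str:
    """Summarize whether shared genes occur on both sides, one side, or neither."""
    shared = shared_flanking_genes(flank_a, flank_b)
    sides_with_shared_gene = sum(1 for genes in shared.values() if genes)
    return {2: "both sides", 1: "one side", 0: "none"}[sides_with_shared_gene]


# Hypothetical gene names, for illustration only.
species_1 = {"upstream": {"GENE_A", "GENE_B"}, "downstream": {"GENE_C"}}
species_2 = {"upstream": {"GENE_B"}, "downstream": {"GENE_D"}}
print(synteny_support(species_1, species_2))
```

In this scheme, the five orthologous pairs reported above would fall in the "both sides" class and the C5 retrocopy in the "one side" class.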

Construction of A3 phylogeny

A3G and A3G retrocopies were aligned using MAFFT (Katoh and Standley, 2013) with auto algorithm parameters within Geneious version 11.1.4 (Kearse et al., 2012). All retrocopies (both ORF-containing and retropseudogenes) were aligned using the complete alignable region defined by BLASTn. Trees were constructed using PHYML (Guindon et al., 2010) with NNIs topology search, BioNJ initial tree, HKY85 nucleotide substitution model, and 100 bootstraps.

ORF retention simulation

To simulate the decay of A3G retrocopy ORFs, we used the 'mutator' and 'orf_scanner' scripts developed by Young et al., 2018. The ORF of Callithrix jacchus A3G (identified from GenBank accession NC_013914.1, 1,150 bp) was used as the starting ORF. Combinations of several substitution, insertion, and deletion rates and sexual maturation times were used for the simulation. We used substitution, insertion and deletion rates of 1.16 × 10⁻⁸, 2 × 10⁻¹⁰ and 5.5 × 10⁻¹⁰ per site per generation for human (Campbell and Eichler, 2013), substitution, insertion and deletion rates of 5.4 × 10⁻⁹, 1.55 × 10⁻¹⁰ and 1.55 × 10⁻¹⁰ per site per generation for mouse (Uchimura et al., 2015), and a substitution rate of 8.1 × 10⁻⁹ for Night monkey (Aotus nancymaae) (Thomas et al., 2018). Sexual maturation times of human, mouse and New World monkeys were estimated to be 25, 0.3 and 1–9 years, respectively (http://genomics.senescence.info; https://animaldiversity.org). Each run simulates the mutation of the starting ORF for 50 million years, and the simulations with each set of parameters were repeated 10,000 times. The number of ORFs that were still open and of the same length as the starting ORF was counted every 50,000 years of each simulation.
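To make the logic of such a simulation concrete, the following Python sketch models ORF decay under a deliberately simplified scheme. It is not the 'mutator' or 'orf_scanner' code of Young et al., 2018. It assumes substitutions only (no insertions or deletions), a uniform substitution spectrum, and a Poisson-distributed number of substitutions per 50,000-year interval. The toy ORF, the 5-year generation time (chosen from within the 1–9 year range above) and all function names are hypothetical.

```python
import math
import random

STOPS = {"TAA", "TAG", "TGA"}


def is_intact(seq: str, original_length: int) -> bool:
    """True if seq is still an open ORF of the starting length."""
    if len(seq) != original_length or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return (codons[0] == "ATG"
            and codons[-1] in STOPS
            and not any(c in STOPS for c in codons[1:-1]))


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's Poisson sampler; adequate for the small means used here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_decay(orf, sub_rate, generation_years, total_years, step_years, rng):
    """Apply random substitutions and record ORF status at each checkpoint."""
    seq = list(orf)
    gens_per_step = step_years / generation_years
    status = []
    for _ in range(int(total_years // step_years)):
        # Expected substitutions = rate per site per generation x sites x generations.
        n_subs = poisson(sub_rate * len(seq) * gens_per_step, rng)
        for _ in range(n_subs):
            pos = rng.randrange(len(seq))
            seq[pos] = rng.choice([b for b in "ACGT" if b != seq[pos]])
        status.append(is_intact("".join(seq), len(orf)))
    return status


# Toy 246 bp ORF (not the A3G sequence), 50 replicate runs.
toy_orf = "ATG" + "CAGTGGAAAGCT" * 20 + "TAA"
runs = [simulate_decay(toy_orf, 8.1e-9, 5, 50_000_000, 50_000, random.Random(seed))
        for seed in range(50)]
print("fraction still open at 50 My:", sum(r[-1] for r in runs) / len(runs))
```

Adding insertions and deletions would only accelerate the loss, since any single-base indel changes the ORF length and therefore fails the same-length criterion.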

Analysis of selective constraints in A3G retrocopies

RELAX (Wertheim et al., 2015) was carried out using the Datamonkey webserver (Weaver et al., 2018) and a PhyML (Guindon et al., 2010) tree of the MAFFT (Katoh and Standley, 2013) aligned nucleotide sequences of the subset of retrocopies that encode an ORF longer than 250 amino acids in addition to the New World monkey A3Gs with or without human A3G. We defined the branches leading to the A3Gs as reference branches and all of the other branches as test branches. The above nucleotide alignment and PhyML tree were used as input for the CODEML NSsites model of PAML (Yang, 2007). To test for selection along branches, the same files were used as input for the branch model of PAML. To test the significance of branches with apparent dN/dS < 1, we fixed that branch at dN/dS = 1 and calculated the likelihood of this tree.
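A comparison between a tree with a branch fixed at dN/dS = 1 and a tree where that branch is free is commonly evaluated with a likelihood ratio test. The Python sketch below shows that calculation under the assumption that twice the log-likelihood difference is compared with a chi-square distribution with one degree of freedom; the text does not state the exact test used, and the log-likelihood values shown are hypothetical.

```python
import math


def lrt_p_value(lnl_free: float, lnl_fixed: float) -> float:
    """P-value of a likelihood ratio test with one degree of freedom.

    lnl_free  : log-likelihood with dN/dS estimated freely on the branch
    lnl_fixed : log-likelihood with dN/dS fixed at 1 on the branch
    """
    statistic = max(0.0, 2.0 * (lnl_free - lnl_fixed))
    # For one degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical log-likelihoods, for illustration only.
print(lrt_p_value(lnl_free=-2450.10, lnl_fixed=-2453.60))
```

A small p-value here would indicate that dN/dS on the tested branch differs significantly from the neutral expectation of 1.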

RNA-seq analysis for retrocopy and A3G expression

We searched the NCBI GEO and SRA databases (October 2018) with the keywords ‘Callithrix’, ‘Aotus’, ‘Saimiri’ and ‘Cebus’ to find existing RNA-seq datasets from these species. Datasets from Callithrix jacchus, Aotus nancymaae, and Cebus capucinus were used, matching the available species where retrocopies of A3G were identified. For Saimiri, we used RNA-seq data from Saimiri sciureus, for which no genome sequence has been published, whereas the retrocopy analysis in the rest of the text uses Saimiri boliviensis. All RNA-seq datasets (Supplementary file 2) were queried using the default parameters of the ‘blastn_vdb’ tool of SRA toolkit (Leinonen et al., 2011) and the A3Gs and A3G retrocopies identified in this work as query sequences. RNA-seq reads hit by blastn_vdb were then processed with a custom perl (https://www.perl.org) script to keep only the reads that match the query sequence at 100% identity across the entire RNA-seq read and map uniquely to only one of the queried retrocopies or A3G. Reads that passed these filters were tallied and organized by species, tissue type, and the retrocopy or A3G copy they match.
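The read-filtering step can be illustrated with a short Python sketch. This is not the custom perl script used in the study, which is not reproduced here. It assumes each hit has already been parsed into a tuple of read identifier, query name, percent identity, alignment length and read length, and it assumes that uniqueness is judged among perfect full-length matches only. The function name and example records are hypothetical.

```python
from collections import Counter, defaultdict


def tally_unique_perfect_reads(hits):
    """Count reads that match exactly one query perfectly over their full length.

    `hits` is an iterable of
    (read_id, query, pct_identity, aln_length, read_length) tuples.
    """
    perfect = defaultdict(set)
    for read_id, query, pct_identity, aln_length, read_length in hits:
        # Keep only 100% identity hits that span the entire read.
        if pct_identity == 100.0 and aln_length == read_length:
            perfect[read_id].add(query)
    # Discard reads that match more than one query equally well.
    return Counter(next(iter(queries))
                   for queries in perfect.values() if len(queries) == 1)


# Hypothetical hit records, for illustration only.
hits = [
    ("read1", "retrocopy_X", 100.0, 100, 100),  # unique perfect match: kept
    ("read2", "retrocopy_X", 100.0, 100, 100),  # perfect match to two queries: dropped
    ("read2", "A3G", 100.0, 100, 100),
    ("read3", "A3G", 99.0, 100, 100),           # mismatch: dropped
    ("read4", "A3G", 100.0, 80, 100),           # partial alignment: dropped
]
print(tally_unique_perfect_reads(hits))
```

Requiring both full-length identity and uniqueness is conservative: it undercounts expression but avoids assigning reads from the parental A3G gene to a closely related retrocopy.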

LINE-1 retrotransposition assays

LINE-1 retrotransposition assays were carried out as previously described (Xie et al., 2011). For the mouse ORFeus luciferase assays 25,000 HEK293T cells (ATCC Cat# CRL-3216, RRID:CVCL_0063) were seeded into each well of a 96-well clear bottom, white-wall plate. 24 hr later, each well was transfected with 200 ng pYX016 (CAG promoter driving mouse ORFeus LINE-1 with globin intron and luciferase reporter) or pYX015 (Xie et al., 2011) (JM111 inactive human LINE-1 construct which contains loss-of-function mutations in ORF1p of LINE-1) and pCMV-HA-A3 or pCMV-HA-empty (RRID:Addgene_32530). 24 hr post-transfection, transfected cells were selected with 2.5 μg/ml puromycin for 72 hr. Cells were lysed and luciferase substrates provided using the Dual-Glo Luciferase Assay System (Promega E2920). Renilla and firefly luciferase activity were measured using the LUMIstar Omega luminometer. Retrotransposition is reported as firefly/renilla activity to control for toxicity.
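The normalization described above amounts to a per-well ratio followed by a comparison with the empty-vector control. The Python sketch below shows one way to compute it. Expressing restriction as the ratio of mean control signal to mean A3 signal is an assumption made for illustration, and the luminescence readings are hypothetical.

```python
from statistics import mean


def relative_retrotransposition(firefly, renilla):
    """Per-well firefly/renilla ratio, which controls for toxicity."""
    return [f / r for f, r in zip(firefly, renilla)]


def fold_restriction(control_ratios, a3_ratios):
    """Fold reduction of retrotransposition relative to the empty-vector control."""
    return mean(control_ratios) / mean(a3_ratios)


# Hypothetical triplicate luminescence readings, for illustration only.
empty_vector = relative_retrotransposition([9000, 9500, 10000], [1000, 1000, 1000])
a3_construct = relative_retrotransposition([800, 900, 1000], [1000, 950, 1050])
print(round(fold_restriction(empty_vector, a3_construct), 1))
```

Dividing by the renilla signal means that a construct that simply kills or slows cells lowers both readings and does not register as restriction.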

Virus infectivity assays

Single-round HIV-1 infectivity assays were performed as described previously (OhAinle et al., 2006; Yamashita and Emerman, 2004). To produce VSV-G-pseudotyped HIV-1, 50,000 HEK293T cells (ATCC Cat# CRL-3216, RRID:CVCL_0063) were plated in a 24-well plate, and 24 hr later, co-transfected with 0.3 μg lentiviral vector encoding luciferase in the place of the nef gene (pLai3ΔenvLuc2 (Yamashita and Emerman, 2004), pLai3ΔenvLuc2ΔVif (OhAinle et al., 2006)), 50 ng L-VSV-G, and 300 ng pCMV-HA-A3G or pCMV-HA-empty. All viruses were harvested 48 hr after transfection and filtered through a 0.2 μm filter. p24 gag in viral supernatants was quantified using an HIV-1 p24 Antigen Capture Assay (ABL Inc). Virus equivalent to 2 ng of p24 gag was used to infect 50,000 SupT1 cells (ATCC Cat# CRL-1942, RRID:CVCL_1714) per well in a 96-well plate in the presence of 20 μg/ml DEAE-dextran. Forty-eight hours after infection, cells from triplicate infections were lysed in 100 μl Bright-Glo luciferase assay reagent (Promega) and read on a LUMIstar Omega luminometer (BMG Labtech). A3A Western blots were carried out using Covance mouse HA.11 Clone 16B12 anti-HA monoclonal antibody (Covance Cat# MMS-101P-200, RRID:AB_10064068).

Acknowledgements

We thank members of the Malik, Emerman, and McLaughlin labs for valuable discussions. We especially thank Janet Young for comments and suggestions critical for the generation of this manuscript, and Lily Wu for technical training and consultation on virus restriction assays. This work was supported by a Howard Hughes Medical Institute postdoctoral fellowship of the Helen Hay Whitney Foundation, a National Institute of General Medical Sciences (NIGMS) at the National Institutes of Health (NIH) K99/R00 Pathway to Independence Award (grant number GM112941) to RNM; National Institute of Allergy and Infectious Disease at the NIH R01 (grant number AI3092) to ME; grants from the Mathers Foundation and an NIGMS at the NIH R01 (grant number GM074108) to HSM. HSM is an Investigator of the Howard Hughes Medical Institute.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Harmit S Malik, Email: hsmalik@fhcrc.org.

Richard N McLaughlin, Jnr, Email: rmclaughlin@pnri.org.

Karla Kirkegaard, Stanford University School of Medicine, United States.

Funding Information

This paper was supported by the following grants:

  • National Institute of General Medical Sciences GM112941 to Richard N McLaughlin.

  • Helen Hay Whitney Foundation to Richard N McLaughlin.

  • National Institute of Allergy and Infectious Diseases AI3092 to Michael Emerman.

  • G. Harold and Leila Y. Mathers Foundation to Harmit S Malik.

  • National Institute of General Medical Sciences GM074108 to Harmit S Malik.

  • Howard Hughes Medical Institute to Harmit S Malik.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing.

Conceptualization, Supervision, Funding acquisition, Methodology, Writing - review and editing.

Conceptualization, Supervision, Funding acquisition, Methodology, Writing - review and editing.

Conceptualization, Data curation, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing.

Additional files

Supplementary file 1. Sequence coordinates, orthology groups, and ORF retainment for A3Gs and A3G retrocopies.
elife-58436-supp1.xlsx (15.6KB, xlsx)
Supplementary file 2. Read counts of retrocopies across 98 New World monkey RNAseq datasets.
elife-58436-supp2.xlsx (21.9KB, xlsx)
Supplementary file 3. Codon-based and indel-sensitive alignment of primate A3Is.

Stop codons and frame shifts were included in the alignment: star (*) represents a stop codon, slash (/) represents a frame shift caused by deletion, and backslash (\) represents a frame shift caused by insertion. Sequence headers indicate the species names and the NCBI accession numbers from which the sequences were extracted.

elife-58436-supp3.fa (8.7KB, fa)
Transparent reporting form

Data availability

All data generated or analyzed during this study are included in the manuscript, supporting files, or publicly available databases as listed in the Supplementary files 1 and 2. Raw data files have been provided for Figure 3.

References

  1. Arias JF, Koyama T, Kinomoto M, Tokunaga K. Retroelements versus APOBEC3 family members: no great escape from the magnificent seven. Frontiers in Microbiology. 2012;3:275. doi: 10.3389/fmicb.2012.00275.
  2. Balasubramanian S, Zheng D, Liu YJ, Fang G, Frankish A, Carriero N, Robilotto R, Cayting P, Gerstein M. Comparative analysis of processed ribosomal protein pseudogenes in four mammalian genomes. Genome Biology. 2009;10:R2. doi: 10.1186/gb-2009-10-1-r2.
  3. Beck CR, Garcia-Perez JL, Badge RM, Moran JV. LINE-1 elements in structural variation and disease. Annual Review of Genomics and Human Genetics. 2011;12:187–215. doi: 10.1146/annurev-genom-082509-141802.
  4. Best S, Le Tissier P, Towers G, Stoye JP. Positional cloning of the mouse retrovirus restriction gene Fv1. Nature. 1996;382:826–829. doi: 10.1038/382826a0.
  5. Bininda-Emonds OR, Cardillo M, Jones KE, MacPhee RD, Beck RM, Grenyer R, Price SA, Vos RA, Gittleman JL, Purvis A. The delayed rise of present-day mammals. Nature. 2007;446:507–512. doi: 10.1038/nature05634.
  6. Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, Haussler D, Miller W. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Research. 2004;14:708–715. doi: 10.1101/gr.1933104.
  7. Bogerd HP, Wiegand HL, Hulme AE, Garcia-Perez JL, O'Shea KS, Moran JV, Cullen BR. Cellular inhibitors of long interspersed element 1 and alu retrotransposition. PNAS. 2006;103:8780–8785. doi: 10.1073/pnas.0603313103.
  8. Boissinot S, Entezam A, Furano AV. Selection against deleterious LINE-1-Containing loci in the human lineage. Molecular Biology and Evolution. 2001;18:926–935. doi: 10.1093/oxfordjournals.molbev.a003893.
  9. Boissinot S, Roos C, Furano AV. Different rates of LINE-1 (L1) Retrotransposon amplification and evolution in new world monkeys. Journal of Molecular Evolution. 2004;58:122–130. doi: 10.1007/s00239-003-2539-x.
  10. Brennan G, Kozyrev Y, Hu SL. TRIMCyp expression in old world primates Macaca nemestrina and Macaca fascicularis. PNAS. 2008;105:3569–3574. doi: 10.1073/pnas.0709511105.
  11. Bulliard Y, Turelli P, Röhrig UF, Zoete V, Mangeat B, Michielin O, Trono D. Functional analysis and structural modeling of human APOBEC3G reveal the role of evolutionarily conserved elements in the inhibition of human immunodeficiency virus type 1 infection and alu transposition. Journal of Virology. 2009;83:12611–12621. doi: 10.1128/JVI.01491-09.
  12. Burki F, Kaessmann H. Birth and adaptive evolution of a hominoid gene that supports high neurotransmitter flux. Nature Genetics. 2004;36:1061–1063. doi: 10.1038/ng1431.
  13. Campbell CD, Eichler EE. Properties and rates of germline mutations in humans. Trends in Genetics. 2013;29:575–584. doi: 10.1016/j.tig.2013.04.005.
  14. Canapa A, Barucca M, Biscotti MA, Forconi M, Olmo E. Transposons, genome size, and evolutionary insights in animals. Cytogenetic and Genome Research. 2016;147:217–239. doi: 10.1159/000444429.
  15. Carelli FN, Hayakawa T, Go Y, Imai H, Warnefors M, Kaessmann H. The life history of retrocopies illuminates the evolution of new mammalian genes. Genome Research. 2016;26:301–314. doi: 10.1101/gr.198473.115.
  16. Casola C, Betrán E. The genomic impact of gene retrocopies: what have we learned from comparative genomics, population genomics, and transcriptomic analyses? Genome Biology and Evolution. 2017;9:1351–1373. doi: 10.1093/gbe/evx081.
  17. Chen H, Lilley CE, Yu Q, Lee DV, Chou J, Narvaiza I, Landau NR, Weitzman MD. APOBEC3A is a potent inhibitor of adeno-associated virus and retrotransposons. Current Biology. 2006;16:480–485. doi: 10.1016/j.cub.2006.01.031.
  18. Chuong EB, Elde NC, Feschotte C. Regulatory evolution of innate immunity through co-option of endogenous retroviruses. Science. 2016;351:1083–1087. doi: 10.1126/science.aad5497.
  19. Colcher D, Heberling RL, Kalter SS, Schlom J. Squirrel monkey retrovirus: an endogenous virus of a new world primate. Journal of Virology. 1977;23:294–301. doi: 10.1128/JVI.23.2.294-301.1977.
  20. Compton AA, Hirsch VM, Emerman M. The host restriction factor APOBEC3G and retroviral vif protein coevolve due to ongoing genetic conflict. Cell Host & Microbe. 2012;11:91–98. doi: 10.1016/j.chom.2011.11.010.
  21. Cost GJ, Feng Q, Jacquier A, Boeke JD. Human L1 element target-primed reverse transcription in vitro. The EMBO Journal. 2002;21:5899–5910. doi: 10.1093/emboj/cdf592.
  22. de Koning AP, Gu W, Castoe TA, Batzer MA, Pollock DD. Repetitive elements may comprise over two-thirds of the human genome. PLOS Genetics. 2011;7:e1002384. doi: 10.1371/journal.pgen.1002384.
  23. Dewannieux M, Esnault C, Heidmann T. LINE-mediated retrotransposition of marked Alu sequences. Nature Genetics. 2003;35:41–48. doi: 10.1038/ng1223.
  24. Duggal NK, Malik HS, Emerman M. The breadth of antiviral activity of Apobec3DE in chimpanzees has been driven by positive selection. Journal of Virology. 2011;85:11361–11371. doi: 10.1128/JVI.05046-11.
  25. Friedli M, Turelli P, Kapopoulou A, Rauwel B, Castro-Díaz N, Rowe HM, Ecco G, Unzu C, Planet E, Lombardo A, Mangeat B, Wildhaber BE, Naldini L, Trono D. Loss of transcriptional control over endogenous retroelements during reprogramming to pluripotency. Genome Research. 2014;24:1251–1259. doi: 10.1101/gr.172809.114.
  26. Fujino K, Horie M, Honda T, Merriman DK, Tomonaga K. Inhibition of borna disease virus replication by an endogenous bornavirus-like element in the ground squirrel genome. PNAS. 2014;111:13175–13180. doi: 10.1073/pnas.1407046111.
  27. Garcia-Perez JL, Marchetto MC, Muotri AR, Coufal NG, Gage FH, O'Shea KS, Moran JV. LINE-1 retrotransposition in human embryonic stem cells. Human Molecular Genetics. 2007;16:1569–1577. doi: 10.1093/hmg/ddm105.
  28. Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0. Systematic Biology. 2010;59:307–321. doi: 10.1093/sysbio/syq010.
  29. Hancks DC, Kazazian HH. Active human retrotransposons: variation and disease. Current Opinion in Genetics & Development. 2012;22:191–203. doi: 10.1016/j.gde.2012.02.006.
  30. Harris RS, Dudley JP. APOBECs and virus restriction. Virology. 2015;479-480:131–145. doi: 10.1016/j.virol.2015.03.012.
  31. Hayward JA, Tachedjian M, Cui J, Cheng AZ, Johnson A, Baker ML, Harris RS, Wang LF, Tachedjian G. Differential evolution of antiretroviral restriction factors in pteropid bats as revealed by APOBEC3 gene complexity. Molecular Biology and Evolution. 2018;35:1626–1637. doi: 10.1093/molbev/msy048.
  32. Henry M, Terzian C, Peeters M, Wain-Hobson S, Vartanian JP. Evolution of the primate APOBEC3A cytidine deaminase gene and identification of related coding regions. PLOS ONE. 2012;7:e30036. doi: 10.1371/journal.pone.0030036.
  33. Hultquist JF, Lengyel JA, Refsland EW, LaRue RS, Lackey L, Brown WL, Harris RS. Human and rhesus APOBEC3D, APOBEC3F, APOBEC3G, and APOBEC3H demonstrate a conserved capacity to restrict Vif-deficient HIV-1. Journal of Virology. 2011;85:11220–11234. doi: 10.1128/JVI.05238-11.
  34. Huthoff H, Malim MH. Identification of amino acid residues in APOBEC3G required for regulation by human immunodeficiency virus type 1 vif and virion encapsidation. Journal of Virology. 2007;81:3807–3815. doi: 10.1128/JVI.02795-06.
  35. Ito J, Watanabe S, Hiratsuka T, Kuse K, Odahara Y, Ochi H, Kawamura M, Nishigaki K. Refrex-1, a soluble restriction factor against feline endogenous and exogenous retroviruses. Journal of Virology. 2013;87:12029–12040. doi: 10.1128/JVI.01267-13.
  36. Ito J, Gifford RJ, Sato K. Retroviruses drive the rapid evolution of mammalian APOBEC3 genes. PNAS. 2020;117:610–618. doi: 10.1073/pnas.1914183116.
  37. Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, Kosmicki JA, Arbelaez J, Cui W, Schwartz GB, Chow ED, Kanterakis E, Gao H, Kia A, Batzoglou S, Sanders SJ, Farh KK. Predicting splicing from primary sequence with deep learning. Cell. 2019;176:535–548. doi: 10.1016/j.cell.2018.12.015.
  38. Jarmuz A, Chester A, Bayliss J, Gisbourne J, Dunham I, Scott J, Navaratnam N. An anthropoid-specific locus of orphan C to U RNA-editing enzymes on chromosome 22. Genomics. 2002;79:285–296. doi: 10.1006/geno.2002.6718.
  39. Kaer K, Speek M. Retroelements in human disease. Gene. 2013;518:231–241. doi: 10.1016/j.gene.2013.01.008.
  40. Kalamegham R, Sturgill D, Siegfried E, Oliver B. Drosophila mojoless, a retroposed GSK-3, has functionally diverged to acquire an essential role in male fertility. Molecular Biology and Evolution. 2007;24:732–742. doi: 10.1093/molbev/msl201.
  41. Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Molecular Biology and Evolution. 2013;30:772–780. doi: 10.1093/molbev/mst010.
  42. Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, Buxton S, Cooper A, Markowitz S, Duran C, Thierer T, Ashton B, Meintjes P, Drummond A. Geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012;28:1647–1649. doi: 10.1093/bioinformatics/bts199.
  43. Klawitter S, Fuchs NV, Upton KR, Muñoz-Lopez M, Shukla R, Wang J, Garcia-Cañadas M, Lopez-Ruiz C, Gerhardt DJ, Sebe A, Grabundzija I, Merkert S, Gerdes P, Pulgarin JA, Bock A, Held U, Witthuhn A, Haase A, Sarkadi B, Löwer J, Wolvetang EJ, Martin U, Ivics Z, Izsvák Z, Garcia-Perez JL, Faulkner GJ, Schumann GG. Reprogramming triggers endogenous L1 and alu retrotransposition in human induced pluripotent stem cells. Nature Communications. 2016;7:10286. doi: 10.1038/ncomms10286.
  44. Kondrashov FA, Rogozin IB, Wolf YI, Koonin EV. Selection in the evolution of gene duplications. Genome Biology. 2002;3:research0008. doi: 10.1186/gb-2002-3-2-research0008.
  45. Krupp A, McCarthy KR, Ooms M, Letko M, Morgan JS, Simon V, Johnson WE. APOBEC3G polymorphism as a selective barrier to cross-species transmission and emergence of pathogenic SIV and AIDS in a primate host. PLOS Pathogens. 2013;9:e1003641. doi: 10.1371/journal.ppat.1003641.
  46. Kunarso G, Chia NY, Jeyakani J, Hwang C, Lu X, Chan YS, Ng HH, Bourque G. Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nature Genetics. 2010;42:631–634. doi: 10.1038/ng.600.
  47. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann Y, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blöcker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowki J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ, Szustakowki J, International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921. doi: 10.1038/35057062.
  48. Leinonen R, Sugawara H, Shumway M, International Nucleotide Sequence Database Collaboration The sequence read archive. Nucleic Acids Research. 2011;39:D19–D21. doi: 10.1093/nar/gkq1019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Luan DD, Korman MH, Jakubczak JL, Eickbush TH. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell. 1993;72:595–605. doi: 10.1016/0092-8674(93)90078-5. [DOI] [PubMed] [Google Scholar]
  50. Malfavon-Borja R, Wu LI, Emerman M, Malik HS. Birth, decay, and reconstruction of an ancient TRIMCyp gene fusion in primate genomes. PNAS. 2013;110:E583–E592. doi: 10.1073/pnas.1216542110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Malik HS, Henikoff S. Positive selection of iris, a retroviral envelope-derived host gene in Drosophila melanogaster. PLOS Genetics. 2005;1:e44. doi: 10.1371/journal.pgen.0010044. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Marchetto MCN, Narvaiza I, Denli AM, Benner C, Lazzarini TA, Nathanson JL, Paquola ACM, Desai KN, Herai RH, Weitzman MD, Yeo GW, Muotri AR, Gage FH. Differential L1 regulation in pluripotent stem cells of humans and apes. Nature. 2013;503:525–529. doi: 10.1038/nature12686. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. McLaughlin RN, Young JM, Yang L, Neme R, Wichman HA, Malik HS. Positive selection and multiple losses of the LINE-1-derived L1TD1 gene in mammals suggest a dual role in genome defense and pluripotency. PLOS Genetics. 2014;10:e1004531. doi: 10.1371/journal.pgen.1004531. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. McLaughlin RN, Gable JT, Wittkopp CJ, Emerman M, Malik HS. Conservation and innovation of APOBEC3A restriction functions during primate evolution. Molecular Biology and Evolution. 2016;33:1889–1901. doi: 10.1093/molbev/msw070. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Mi S, Lee X, Li X, Veldman GM, Finnerty H, Racie L, LaVallie E, Tang XY, Edouard P, Howes S, Keith JC, McCoy JM. Syncytin is a captive retroviral envelope protein involved in human placental morphogenesis. Nature. 2000;403:785–789. doi: 10.1038/35001608. [DOI] [PubMed] [Google Scholar]
  56. Moran JV, Holmes SE, Naas TP, DeBerardinis RJ, Boeke JD, Kazazian HH. High frequency retrotransposition in cultured mammalian cells. Cell. 1996;87:917–927. doi: 10.1016/S0092-8674(00)81998-4. [DOI] [PubMed] [Google Scholar]
  57. Muckenfuss H, Hamdorf M, Held U, Perkovic M, Löwer J, Cichutek K, Flory E, Schumann GG, Münk C. APOBEC3 proteins inhibit human LINE-1 retrotransposition. Journal of Biological Chemistry. 2006;281:22161–22172. doi: 10.1074/jbc.M601716200. [DOI] [PubMed] [Google Scholar]
  58. Muniz CP, Troncoso LL, Moreira MA, Soares EA, Pissinatti A, Bonvicino CR, Seuánez HN, Sharma B, Jia H, Shankar A, Switzer WM, Santos AF, Soares MA. Identification and characterization of highly divergent simian foamy viruses in a wide range of new world primates from Brazil. PLOS ONE. 2013;8:e67568. doi: 10.1371/journal.pone.0067568. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Münk C, Willemsen A, Bravo IG. An ancient history of gene duplications, fusions and losses in the evolution of APOBEC3 mutators in mammals. BMC Evolutionary Biology. 2012;12:71. doi: 10.1186/1471-2148-12-71. [DOI] [PMC free article] [PubMed] [Google Scholar]
  60. Muotri AR. L1 retrotransposition in neural progenitor cells. Methods in Molecular Biology. 2016;1400:157–163. doi: 10.1007/978-1-4939-3372-3_11. [DOI] [PubMed] [Google Scholar]
  61. Navarro F, Bollman B, Chen H, König R, Yu Q, Chiles K, Landau NR. Complementary function of the two catalytic domains of APOBEC3G. Virology. 2005;333:374–386. doi: 10.1016/j.virol.2005.01.011. [DOI] [PubMed] [Google Scholar]
  62. Navarro FC, Galante PA. A Genome-Wide landscape of retrocopies in primate genomes. Genome Biology and Evolution. 2015;7:2265–2275. doi: 10.1093/gbe/evv142. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Neme R, Tautz D. Fast turnover of genome transcription across evolutionary time exposes entire non-coding DNA to de novo gene emergence. eLife. 2016;5:e09977. doi: 10.7554/eLife.09977. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Newman RM, Hall L, Kirmaier A, Pozzi LA, Pery E, Farzan M, O'Neil SP, Johnson W. Evolution of a TRIM5-CypA splice isoform in old world monkeys. PLOS Pathogens. 2008;4:e1000003. doi: 10.1371/journal.ppat.1000003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Niewiadomska AM, Tian C, Tan L, Wang T, Sarkis PT, Yu XF. Differential inhibition of long interspersed element 1 by APOBEC3 does not correlate with high-molecular-mass-complex formation or P-body association. Journal of Virology. 2007;81:9577–9583. doi: 10.1128/JVI.02800-06. [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Nisole S, Lynch C, Stoye JP, Yap MW. A Trim5-cyclophilin A fusion protein found in owl monkey kidney cells can restrict HIV-1. PNAS. 2004;101:13324–13328. doi: 10.1073/pnas.0404640101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. OhAinle M, Kerns JA, Malik HS, Emerman M. Adaptive evolution and antiviral activity of the conserved mammalian cytidine deaminase APOBEC3H. Journal of Virology. 2006;80:3853–3862. doi: 10.1128/JVI.80.8.3853-3862.2006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Perelman P, Johnson WE, Roos C, Seuánez HN, Horvath JE, Moreira MA, Kessing B, Pontius J, Roelke M, Rumpler Y, Schneider MP, Silva A, O'Brien SJ, Pecon-Slattery J. A molecular phylogeny of living primates. PLOS Genetics. 2011;7:e1001342. doi: 10.1371/journal.pgen.1001342. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Petrov DA, Aminetzach YT, Davis JC, Bensasson D, Hirsh AE. Size matters: non-ltr retrotransposable elements and ectopic recombination in Drosophila. Molecular Biology and Evolution. 2003;20:880–892. doi: 10.1093/molbev/msg102. [DOI] [PubMed] [Google Scholar]
  70. Potrzebowski L, Vinckenbosch N, Marques AC, Chalmel F, Jégou B, Kaessmann H. Chromosomal gene movements reflect the recent origin and biology of therian sex chromosomes. PLOS Biology. 2008;6:e80. doi: 10.1371/journal.pbio.0060080. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Refsland EW, Stenglein MD, Shindo K, Albin JS, Brown WL, Harris RS. Quantitative profiling of the full APOBEC3 mRNA repertoire in lymphocytes and tissues: implications for HIV-1 restriction. Nucleic Acids Research. 2010;38:4274–4284. doi: 10.1093/nar/gkq174. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Rosso L, Marques AC, Weier M, Lambert N, Lambot MA, Vanderhaeghen P, Kaessmann H. Birth and rapid subcellular adaptation of a hominoid-specific CDC14 protein. PLOS Biology. 2008;6:e140. doi: 10.1371/journal.pbio.0060140. [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. Sayah DM, Sokolskaja E, Berthoux L, Luban J. Cyclophilin A retrotransposition into TRIM5 explains owl monkey resistance to HIV-1. Nature. 2004;430:569–573. doi: 10.1038/nature02777. [DOI] [PubMed] [Google Scholar]
  74. Schmitz J, Brosius J. Exonization of transposed elements: A challenge and opportunity for evolution. Biochimie. 2011;93:1928–1934. doi: 10.1016/j.biochi.2011.07.014. [DOI] [PubMed] [Google Scholar]
  75. Silvas TV, Schiffer CA. APOBEC3s: dna-editing human cytidine deaminases. Protein Science. 2019;28:1552–1566. doi: 10.1002/pro.3670. [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. Smit A, Hubley R, Green P. RepeatMasker Open-4.0. 2015 http://www.repeatmasker.org
  77. Song M, Boissinot S. Selection against LINE-1 retrotransposons results principally from their ability to mediate ectopic recombination. Gene. 2007;390:206–213. doi: 10.1016/j.gene.2006.09.033. [DOI] [PubMed] [Google Scholar]
  78. Sotero-Caio CG, Platt RN, Suh A, Ray DA. Evolution and diversity of transposable elements in vertebrate genomes. Genome Biology and Evolution. 2017;9:161–177. doi: 10.1093/gbe/evw264. [DOI] [PMC free article] [PubMed] [Google Scholar]
  79. Tacutu R, Thornton D, Johnson E, Budovsky A, Barardo D, Craig T, Diana E, Lehmann G, Toren D, Wang J, Fraifeld VE, de Magalhães JP. Human ageing genomic resources: new and updated databases. Nucleic Acids Research. 2018;46:D1083–D1090. doi: 10.1093/nar/gkx1042. [DOI] [PMC free article] [PubMed] [Google Scholar]
  80. Thomas GWC, Wang RJ, Puri A, Harris RA, Raveendran M, Hughes DST, Murali SC, Williams LE, Doddapaneni H, Muzny DM, Gibbs RA, Abee CR, Galinski MR, Worley KC, Rogers J, Radivojac P, Hahn MW. Reproductive longevity predicts mutation rates in primates. Current Biology. 2018;28:3193–3197. doi: 10.1016/j.cub.2018.08.050. [DOI] [PMC free article] [PubMed] [Google Scholar]
  81. Turelli P, Vianin S, Trono D. The Innate Antiretroviral Factor APOBEC3G Does Not Affect Human LINE-1 Retrotransposition in a Cell Culture Assay. Journal of Biological Chemistry. 2004;279:43371–43373. doi: 10.1074/jbc.C400334200. [DOI] [PubMed] [Google Scholar]
  82. Uchimura A, Higuchi M, Minakuchi Y, Ohno M, Toyoda A, Fujiyama A, Miura I, Wakana S, Nishino J, Yagi T. Germline mutation rates and the long-term phenotypic effects of mutation accumulation in wild-type laboratory mice and mutator mice. Genome Research. 2015;25:1125–1134. doi: 10.1101/gr.186148.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  83. Virgen CA, Kratovac Z, Bieniasz PD, Hatziioannou T. Independent genesis of chimeric TRIM5-cyclophilin proteins in two primate species. PNAS. 2008;105:3563–3568. doi: 10.1073/pnas.0709258105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  84. Wang W, Brunet FG, Nevo E, Long M. Origin of Sphinx, a young chimeric RNA gene in Drosophila melanogaster. PNAS. 2002;99:4448–4453. doi: 10.1073/pnas.072066399. [DOI] [PMC free article] [PubMed] [Google Scholar]
  85. Wang T, Zeng J, Lowe CB, Sellers RG, Salama SR, Yang M, Burgess SM, Brachmann RK, Haussler D. Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53. PNAS. 2007;104:18613–18618. doi: 10.1073/pnas.0703637104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  86. Weaver S, Shank SD, Spielman SJ, Li M, Muse SV, Kosakovsky Pond SL. Datamonkey 2.0: a modern web application for characterizing selective and other evolutionary processes. Molecular Biology and Evolution. 2018;35:773–777. doi: 10.1093/molbev/msx335. [DOI] [PMC free article] [PubMed] [Google Scholar]
  87. Wertheim JO, Murrell B, Smith MD, Kosakovsky Pond SL, Scheffler K. RELAX: detecting relaxed selection in a phylogenetic framework. Molecular Biology and Evolution. 2015;32:820–832. doi: 10.1093/molbev/msu400. [DOI] [PMC free article] [PubMed] [Google Scholar]
  88. Wilson SJ, Webb BL, Ylinen LM, Verschoor E, Heeney JL, Towers GJ. Independent evolution of an antiviral TRIMCyp in rhesus macaques. PNAS. 2008;105:3557–3562. doi: 10.1073/pnas.0709003105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  89. Wissing S, Muñoz-Lopez M, Macia A, Yang Z, Montano M, Collins W, Garcia-Perez JL, Moran JV, Greene WC. Reprogramming somatic cells into iPS cells activates LINE-1 retroelement mobility. Human Molecular Genetics. 2012;21:208–218. doi: 10.1093/hmg/ddr455. [DOI] [PMC free article] [PubMed] [Google Scholar]
  90. Wong SK, Connole M, Sullivan JS, Choe H, Carville A, Farzan M. A new world primate deficient in tetherin-mediated restriction of human immunodeficiency virus type 1. Journal of Virology. 2009;83:8771–8780. doi: 10.1128/JVI.00112-09. [DOI] [PMC free article] [PubMed] [Google Scholar]
  91. Xie Y, Rosser JM, Thompson TL, Boeke JD, An W. Characterization of L1 retrotransposition with high-throughput dual-luciferase assays. Nucleic Acids Research. 2011;39:e16. doi: 10.1093/nar/gkq1076. [DOI] [PMC free article] [PubMed] [Google Scholar]
  92. Yamashita M, Emerman M. Capsid is a dominant determinant of retrovirus infectivity in nondividing cells. Journal of Virology. 2004;78:5670–5678. doi: 10.1128/JVI.78.11.5670-5678.2004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  93. Yan Y, Buckler-White A, Wollenberg K, Kozak CA. Origin, antiviral function and evidence for positive selection of the Gammaretrovirus restriction gene Fv1 in the genus mus. PNAS. 2009;106:3259–3263. doi: 10.1073/pnas.0900181106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  94. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution. 2007;24:1586–1591. doi: 10.1093/molbev/msm088. [DOI] [PubMed] [Google Scholar]
  95. Young GR, Yap MW, Michaux JR, Steppan SJ, Stoye JP. Evolutionary journey of the retroviral restriction gene Fv1. PNAS. 2018;115:10130–10135. doi: 10.1073/pnas.1808516115. [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision letter

Editor: Karla Kirkegaard

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

[Editors' note: this paper was reviewed by Review Commons.]

Acceptance summary:

We appreciate the well-supported notion that retrocopying rather than de novo integration has contributed to the evolution of the APOBEC family, and judge that this manuscript represents a significant addition to the field.

eLife. 2020 Jun 1;9:e58436. doi: 10.7554/eLife.58436.sa2

Author response


We thank the reviewers and the editor for the insightful and thorough assessment of our manuscript. In this response to review letter, we have listed the original review and responded to each critique after it.

Reviewer #1 (Evidence, reproducibility and clarity):

Yang et al. submitted a manuscript describing the detection of pseudogenes ("retrocopies") of APOBEC3 (A3) genes in primates. The evolutionary history and relationship to specific A3s were analyzed, and the authors speculate that the maintained A3 retrocopies had a functional role at least early in their evolution and that some may still have one now. Functional data on some of the expressed retrocopies are presented for L1 and HIV.

The authors claim that "retrocopying expands the functional repertoire of A3 antiviral proteins in primates". While almost all of the genetic findings were published recently (Ito et al., 2020), the authors should more clearly describe how their data differ from or confirm the data of Ito et al.

We thank the reviewer for their helpful comments which have guided revisions to our manuscript. We have taken steps to clarify the dramatic differences between our work and the recent publication from Ito, Gifford, and Sato.

Foremost, we respectfully disagree with the reviewer that the genetic findings in our work were contained within the Ito et al. manuscript. Using a computational screen of assembled mammalian genomes, the Sato group catalogued the gain and loss of APOBEC3 genes during the evolution of mammals. They found a fascinating correlation between the dynamics of A3s and ERVs that formed the premise of the paper. From their genome-wide search for A3s, Ito et al. describe several retrocopies of A3s in two New World monkey species, one of which retains a full-length open reading frame, leading to the statement that this gene may be functional.

We note that the retrocopies found in the Ito et al. paper span only two of the more than 20 species in which we identify A3 retrocopies. Further, as a result of the breadth of our search for A3s, we find additional retrocopies in the same two New World monkey species that were examined in the Ito et al. paper. Finally, our study also examined functional capabilities of these additional A3s. These differences are highlighted by reviewer #3 who writes that relative to Ito et al., our manuscript studies the phenomenon of A3 retrocopies “more deeply both by in silico analyses and cell culture experiments.” Reviewer #3 also summarizes the most important difference in our studies – our work presents a “conceptual advance that the antiviral gene expansion has achieved not only via tandem gene duplication but also via gene retrocopying”.

Lastly, we want to point out that the findings of our manuscript and Ito et al., 2020 were made concurrently. Indeed, throughout the preparation process of this manuscript, we were both aware of each other’s findings and shared preprints with each other. Most of the participating journals in Review Commons have “scoop protection” mechanisms that typically extend 6 months after the publication of the first article (Ito et al. was published Jan 2020), and our article was first submitted to Review Commons on February 14, 2020. Therefore, we feel confident that the “no scoop” policy applies to the minimal overlap between our paper and that of Ito et al.

Nevertheless, we have modified the text to more clearly acknowledge the parallel finding of some New World monkey retrogenes in the Ito et al. paper.

The functional data (Figure 6) are interesting, but in the current form not complete. The authors have to show protein expression in the transfected cells (A3, L1, HIV) and level of encapsidation into viral particles. In addition, please analyze if the retrocopies express cytidine deaminase active enzymes.

We thank the reviewer for this comment, and we have added a Western blot of the six long-ORF-containing retrocopies as Figure 6—figure supplement 1. In this blot (from early in the project), we detected protein production in 293T cells for three of the six retrocopies. In later optimizations of subsets of this blot, we were able to detect expression of the marmoset A3G and the other two marmoset retrocopies (marmoset-2 and marmoset-4). Despite optimization attempts, we were unable to detect protein for one of the retrocopies that restricts HIV-1ΔVif (capuchin-C1). Unfortunately, at this time the included blot is the only one we have in which all six constructs appear together. Optimally, all six constructs would be side-by-side in a single blot with optimized conditions, and we are happy to complete this experiment as soon as we are able to return to our lab after the SARS-2 quarantine is lifted. However, we think the added blot shows that some of the retrocopies produce protein. The absence of detectable protein from capuchin-C1 could indicate either that this retrocopy is especially potent in its restriction function or that there is an idiosyncratic problem with detecting this protein by Western blot.

We have not previously tested our lentiviral particles for levels of encapsidation of protein from each retrocopy. The value we see in this experiment is in explaining why some of the retrocopies that are expressed in producer cells may not restrict in target cells. While we note that precedent in the literature suggests that A3 proteins which restrict HIV-1ΔVif are invariably encapsidated, we would be happy to carry out this experiment when our lab reopens.

In response to the reviewer’s request to test deaminase activity for each retrocopy, we note that Figure 4 shows the intactness of the deaminase motif in each retrocopy. However, we feel that a description of the mechanisms of restriction of these retrocopies is not a major point of this paper and is beyond the scope of the current investigation.

Reviewer #1 (Significance):

Minor advance compared to Ito et al., 2020.

We respectfully and rigorously disagree with this assessment. Please refer back to the reviewer’s first comment. We defer, again, to reviewer 3’s assessment that our work presents a “conceptual advance that the antiviral gene expansion has achieved not only via tandem gene duplication but also via gene retrocopying”. Moreover, we must point out that the Ito et al., 2020 paper was entirely computational; indeed, several retrogenes that could computationally be predicted to be “dead” were confirmed by us as having antiviral activity.

Reviewer #2 (Evidence, reproducibility and clarity):

Summary:

Yang et al. study the expansion of APOBEC3 (A3) cytidine deaminase genes in primates. Authors find A3 retrocopies in several lineages in primates using Blast searches. Some are old and some are species specific. Some have disablements and some have intact ORFs. Authors study their mode of evolution, expression and functionality. Authors have performed detailed analyses including functional analyses. Some A3 retrocopies are broadly expressed and some have retained ability to restrain retroelements. I agree with the authors that their data support that retrocopying has contributed to the turnover in the repertoire of host retroelement restriction factors. Authors show that some retrocopies have remained active for long periods of time and that they can still restrict retroelements/retroviruses. This work provides an interesting example of immune system diversification. The A3 family of proteins studied here is part of the vertebrate innate immune system, and the data supporting turnover of these kinds of immune system genes are strong. The work underscores that this is a way immunity genes evolve and it has parallels in the evolution of the TRIM gene family of immune genes. I just have a few comments. I think the work can gain from analyzing some aspects of the data in more detail and presenting the big picture in a summary table, even if it is just supplementary.

Major comments:

1) A3I is in many species. Does this mean it was preserved (i.e., functional for a while)? For how long have disabling mutations been accumulating? Can we get a sense of that? Even for other retrocopies, do we have a sense of how recent has the pseudogenization been? If it is very recent that means that the gene was active until not long ago.

Our analyses suggest that A3I was born in the common ancestor of simian primates and pseudogenized before the Catarrhini/Platyrrhini split. It is possible that A3I was functional within this extended period (~12-15 million years), but the presence of a shared truncating stop codon amongst all simian A3Is suggests the gene was no longer full-length at the time of diversification of the simians. Instead, the simian LCA likely encoded an A3I with a predicted ORF of 261 codons; if this truncated ORF were functional, it was then further truncated/pseudogenized with additional frame-breaking mutations which follow the phylogeny of primates.

We estimated the timeline of pseudogenization of each retrocopy using the species distribution of each syntenic retrocopy. We also note that we find full-length ORFs in three retrocopies which have been retained for a period of time at least as long as the age of the last common ancestor of the four New World monkeys. These old but intact retrocopies motivated our simulations of ORF retention rates (Figure 5).

2) In the PAML analyses, a test could be performed to determine whether rates of evolution that are higher or lower than 1 for particular genes are actually significantly different from 1, by comparing the likelihood of the model with the estimated rate against that of the model with the rate fixed to 1. Is there enough power to do this?

We thank the reviewer for pointing out this omission in our analysis. We did perform these tests and find a significant p-value for two of the nodes (p=0.058 and p=0.025, respectively). We have updated the legend for Figure 5—figure supplement 1 to incorporate these p-values.
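
For illustration, the comparison requested by the reviewer is a likelihood ratio test between two nested models: one in which the rate (dN/dS) is estimated freely and one in which it is fixed to 1. The sketch below is a minimal example, not the analysis reported in the manuscript. It assumes a single constrained parameter (one degree of freedom), and the function name and log-likelihood values are hypothetical.

```python
import math


def lrt_pvalue_1df(lnl_free: float, lnl_fixed: float) -> float:
    """P-value of a likelihood ratio test with one degree of freedom.

    lnl_free  -- log-likelihood of the model in which dN/dS is estimated
    lnl_fixed -- log-likelihood of the nested model with dN/dS fixed to 1
    """
    # Test statistic: twice the log-likelihood difference, floored at zero
    # because the nested model cannot fit better than the free model.
    statistic = max(0.0, 2.0 * (lnl_free - lnl_fixed))
    # Chi-square survival function for 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical log-likelihoods, for illustration only.
p_value = lrt_pvalue_1df(lnl_free=-1234.5, lnl_fixed=-1237.0)
print(f"p = {p_value:.4f}")
```

A small p-value under these assumptions would indicate that the estimated rate differs significantly from 1. Whether the test has enough power depends on the length and divergence of the sequences on the branch in question.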

3) It seems to me that the synteny data Figure 2—figure supplement 1 reveals they are derived from independent retroposition events and not duplications of segments because those would include flanking genes. Is this correct? Authors could comment on that.

Yes, we think that each retrocopy we show in Figure 2—figure supplement 1 is likely created via an independent retrotransposition event. We have clarified in the text that Figure 2—figure supplement 1 shows the genes used to establish synteny to support orthology of the retrocopies shared amongst multiple species and that each of these ortholog groups presumably originated via distinct retrotransposition events.

4) In Figure 5—figure supplement 1, I am not sure why orthologous genes are not grouped together in the phylogeny and why p is smaller than 0.05. How should that figure, and the probability be interpreted?

We thank the reviewer for their comments on this figure. First, the reviewer identified an error in the tree in which the branch labels for “night monkey-C2” and “night monkey-SS1” were inadvertently switched. The corrected tree now follows the pattern expected by the reviewer. Second, we employed RELAX to “determine whether selective strength was relaxed or intensified in one of these subsets relative to the other” (Wertheim et al., 2015). In this case, the p-value corresponds to the finding that the retrocopies (test branches) show intensification of selection relative to the intron-containing A3Gs (reference branches).

We have modified Figure 5—figure supplement 1 and the associated text to more clearly explain the specific hypothesis test we report.
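
To make the direction of the RELAX test concrete: as described by Wertheim et al., the dN/dS rate classes on test branches are modeled as the reference-branch classes raised to an exponent k. The null model fixes k = 1, k > 1 indicates intensified selection, and k < 1 indicates relaxed selection. The sketch below only illustrates this relationship with made-up values. The function names are hypothetical and are not part of HyPhy or Datamonkey.

```python
def scale_omegas(reference_omegas, k):
    """Return test-branch dN/dS classes as reference classes raised to the power k."""
    return [omega ** k for omega in reference_omegas]


def interpret_k(k):
    """Describe selection on test branches relative to reference branches."""
    if k > 1:
        return "intensified"
    if k < 1:
        return "relaxed"
    return "unchanged"


# Made-up reference classes: purifying, neutral, and positive selection.
reference = [0.25, 1.0, 4.0]
for k in (0.5, 1.0, 2.0):
    print(k, interpret_k(k), scale_omegas(reference, k))
```

With k > 1, classes below 1 move further below 1 and classes above 1 move further above 1, so both purifying and positive selection become stronger. A significant p-value therefore supports k ≠ 1, and the estimated k indicates whether selection is intensified or relaxed.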

5) It would be good to have a summary table that summarizes what genes have support for past or current functionality (preservation for long time or recent pseudogenization, expression, purifying or positive selection, ability to restrict retroelements) and in what lineages.

We agree with this reviewer suggestion. We have added the additional information including the number of frame disrupting mutations as a measure of age, intactness, and ability to restrict retroelements to Supplementary file 1. Thanks to this suggestion, Supplementary file 1 now serves as the master table to summarize the analyses of each retrocopy.

Reviewer #2 (Significance):

This work provides an interesting example of immune system diversification. Authors study the APOBEC3 family of proteins that is part of the vertebrate innate immune system and the data supporting turnover of these kind of immune system genes. The work underscores that this is a way immunity genes evolve and it has parallels in the evolution of the TRIM gene family of immune genes revealing patterns in the mode of evolution of immunity genes. The audience of this work will be people interested in evolution of immunity, arms races and gene diversification and all evolutionary biologists interested in adaptation. I work in the field of comparative genomics and molecular evolution.

Reviewer #3 (Evidence, reproducibility and clarity):

Summary:

This manuscript by Yang et al. is a well-written, intriguing paper highlighting the evolutionary significance of the gene creation via "retrocopying". The authors investigated the expansion of antiviral A3 genes via retrocopy in Primates and found that A3G-like retrocopies have been generated repeatedly during primate evolution. A part of A3 retrocopies found in New World monkeys retained full length open reading frames and anti-lentiviral capacities. Interestingly, the spectrum of anti-retroelement activity of A3 retrocopies was different from the original (i.e., intron-containing) A3G gene in these species, suggesting the occurrence of the functional differentiation followed by gene amplification. However, one of the main findings, that many A3 retrocopies are present in New World monkeys, is in line with a previous report (i.e., Ito et al., 2020), and the experimental validations were based on the human (not New World monkey's) retroelements. Nevertheless, this study deeply investigated the possible importance of A3 retrocopies for the host defense system evolution both by in silico analyses and cell culture experiments. This study provides the findings that can potentially expand our knowledge on the evolutionary arms races between retroelements and the hosts.

Major:

To strengthen the impact of this work, it would be better to increase the numbers of retroviruses in which the anti-retroviral capacities are investigated. I understand that it is difficult to examine retroviruses or L1s that are colonized naturally with New World monkeys, but I suppose it is not so difficult to investigate a variety of representative retroviruses such as murine leukemia virus (MLV) or the reconstructed human endogenous retrovirus K (HERV-Kcon). This additional experiment would be helpful to highlight that the spectrum of anti-retroviral activity of A3 retrocopies is divergent from the original A3G gene in these species and strengthen the concept to be proposed by this study.

The reviewer raises a fascinating question about whether retrocopies might have different restriction abilities relative to the other A3s in a given species. First, we feel that showing activity against one pathogen is sufficient for our claim that some of the A3 retrocopies have antiviral potential. Second, we discuss in the paper the idea that HIV-1 is not the actual target of these (or any) innate immune genes in New World primates. We argue that any other targets we might test would also be surrogates for the “true” target of these genes.

Specific:

1) Since the authors found the expansion of "functional" repertoire of A3 retrocopies specifically in New World monkey, it would be better to rephrase the title as

"Retrocopying expands the functional repertoire of APOBEC3 antiviral proteins in New World monkeys".

We thank the reviewer for this comment but point out that a large portion of our manuscript presents our work on primates outside the New World monkeys. The reviewer is correct to note that our finding of restriction activity is limited to New World monkey retrocopies, but we feel that the current title will attract a broader audience and reflects the broader relevance of this work.

2) It might be better to add a figure summarizing which A3 retrocopies in which species retain nearly full length ORFs. For example, how about making a figure like Figure 2A for all the four representative New World monkey species?

We agree. We have added the length of the longest ORF for each retrocopy to Supplementary file 1.
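
As an illustration of how a longest-ORF length could be computed for a retrocopy sequence, the sketch below scans the three forward reading frames of a DNA string and reports the longest stretch from an ATG to the first in-frame stop codon. This is an assumption-laden example, not the pipeline used for Supplementary file 1. The function name is hypothetical, the reverse strand is not scanned, and ORFs that run off the end of the sequence without a stop codon are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Length in codons of the longest ATG-initiated ORF in the forward frames.

    The start codon is counted and the stop codon is not.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # codon index of the ATG that opened the current ORF
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        for index, codon in enumerate(codons):
            if start is None and codon == "ATG":
                start = index
            elif start is not None and codon in STOP_CODONS:
                best = max(best, index - start)
                start = None
    return best


# Toy sequence, for illustration only: ATG AAA TTT TAG
print(longest_orf_codons("ATGAAATTTTAG"))
```

Comparing this length with that of the intron-containing parent gene gives a rough measure of how intact a retrocopy is.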

3) Figure 3. It would be helpful to clarify that which cell of the heatmap corresponds to the intact A3 retrocopies.

We have added labels to indicate the intact A3 retrocopies and adjusted the legend accordingly.

4) Introduction, paragraph four. It would be better to replace the word "protected" with "escaped" because this retrocopy subset should include the ones that are intact but not functional.

Changed as suggested.

5) Introduction, final paragraph. It would be better to rephrase "the common ancestor of mammals" as "the common ancestor of placental mammals" because the A3 gene is absent in marsupials.

Changed as suggested.

6) Introduction, final paragraph. Please rephrase "ongoing" as "recently-occurred".

Changed as suggested.

7) Results, paragraph three. I checked the multiple sequence alignment in Supplementary file 3 and suspect that the codon (alignment) position of the shared premature stop codon is 261 (not 264).

We thank the reviewer for pointing out this discrepancy. We have revised the text to reflect the correct position of the shared stop.

8) Results paragraph three. I could not understand the meaning of the sentence "Intriguingly, one lineage-specific mutation…". Please specify the position of mutation which the authors mentioned (in Supplementary file 3 or Figure 1B).

This portion of the text refers to a reversion of a stop codon in the orangutan A3I; specifically, the stop codon shared in all simians acquired a second mutation that created a longer ORF in only this species. We have removed this sentence from the text for the sake of clarity.

9) Subsection “Retention of putatively functional NWM A3G retrogenes” paragraph five

Please refer to Figure 5—figure supplement 1 here.

Changed as suggested.

10) Subsection “Retention of putatively functional NWM A3G retrogenes” paragraph five

Please say "Significant relaxed selection was not detected" rather than "Our analysis detected no relaxation…".

Changed as suggested.

11) Figure 5—figure supplement 1 indicates "p=0.015", but the authors regard it as "not significant"?

We thank the reviewer for pointing out this confusing wording. We employ RELAX to “determine whether selective strength was relaxed or intensified in one of these subsets relative to the other” (Wertheim et al., 2014). In this case, the p-value corresponds to the finding that the retrocopies show intensification of selection.

We have modified Figure 5—figure supplement 1 to more clearly explain the specific hypothesis test for this p-value. We have also modified the text to clarify this point.

12) Subsection “Retention of putatively functional NWM A3G retrogenes” paragraph five

Please refer here to the data supporting the claim "Instead, these A3G retrocopies have evolved more rapidly than…".

Changed as suggested; see previous point.

13) Did the authors perform a statistical test on the dN/dS ratio analysis? If so, please report the result of the test.

Yes, we did. Please refer to reviewer #2’s major point 3.

14) It would be better to modify the phrase "show evidence of recurrent selection for functional innovation".

Changed as suggested.

Reviewer #3 (Significance):

This study provides a conceptual advance: antiviral gene expansion has been achieved not only via tandem gene duplication but also via gene retrocopying.

Compare to existing published knowledge.

Although one of the main findings, that many A3 retrocopies are present in New World monkeys, is in line with a previous report (i.e., Ito et al., 2020), this study investigated that finding more deeply through both in silico analyses and cell culture experiments.

Audience.

Evolutionary biologists and researchers in the field of viruses (particularly retroviruses including HIV-1) and transposable elements would be interested in this work.

Your expertise.

Bioinformatics, genome biology, viruses, and transposable elements.

Associated Data

    Supplementary Materials

    Supplementary file 1. Sequence coordinates, orthology groups, and ORF retainment for A3Gs and A3G retrocopies.
    elife-58436-supp1.xlsx (15.6KB, xlsx)
    Supplementary file 2. Read counts of retrocopies across 98 New World monkey RNAseq datasets.
    elife-58436-supp2.xlsx (21.9KB, xlsx)
    Supplementary file 3. Codon-based and indel-sensitive alignment of primate A3Is.

    Stop codons and frame shifts are included in the alignment: a star (*) represents a stop codon, a slash (/) represents a frame shift caused by a deletion, and a backslash (\) represents a frame shift caused by an insertion. The sequence headers indicate the species names and the NCBI accession numbers from which the sequences were extracted.

    elife-58436-supp3.fa (8.7KB, fa)
    Transparent reporting form

    Data Availability Statement

    All data generated or analyzed during this study are included in the manuscript, supporting files, or publicly available databases as listed in Supplementary files 1 and 2. Raw data files have been provided for Figure 3.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd