Skip to main content
eLife logoLink to eLife
. 2021 Jan 29;10:e65041. doi: 10.7554/eLife.65041

Pervasive duplication of tumor suppressors in Afrotherians during the evolution of large bodies and reduced cancer risk

Juan M Vazquez 1,, Vincent J Lynch 2,
Editors: Antonis Rokas3, Patricia J Wittkopp4
PMCID: PMC7952090  PMID: 33513090

Abstract

The risk of developing cancer is correlated with body size and lifespan within species. Between species, however, there is no correlation between cancer and either body size or lifespan, indicating that large, long-lived species have evolved enhanced cancer protection mechanisms. Elephants and their relatives (Proboscideans) are a particularly interesting lineage for the exploration of mechanisms underlying the evolution of augmented cancer resistance because they evolved large bodies recently within a clade of smaller-bodied species (Afrotherians). Here, we explore the contribution of gene duplication to body size and cancer risk in Afrotherians. Unexpectedly, we found that tumor suppressor duplication was pervasive in Afrotherian genomes, rather than restricted to Proboscideans. Proboscideans, however, have duplicates in unique pathways that may underlie some aspects of their remarkable anti-cancer cell biology. These data suggest that duplication of tumor suppressor genes facilitated the evolution of increased body size by compensating for decreasing intrinsic cancer risk.

Research organism: Other

eLife digest

From the gigantic blue whale to the minuscule bumblebee bat, animals come in all shapes and sizes. Any species can develop cancer, but some are more at risk than others. In theory, if every cell has the same probability of becoming cancerous, then bigger animals should get cancer more often since they have more cells than smaller ones. Amongst the same species, this relationship is true: taller people and bigger dogs have a greater cancer risk than their smaller counterparts.

Yet this correlation does not hold when comparing between species: remarkably large creatures, like elephants and whales, are not more likely to have cancer than any other animal. But how have these gigantic animals evolved to be at lower risk for the disease?

To investigate, Vazquez and Lynch compared the cancer risk and the genetic information of a diverse group of closely related animals with different body sizes. This included elephants, woolly mammoths and mastodons as well as their small relatives, the manatees, armadillos, and marmot-sized hyraxes. Examining these species’ genomes revealed that, during evolution, elephants had acquired extra copies of ‘tumour suppressor genes’ which can sense and repair the genetic and cellular damages that turn healthy cells into tumours. This allowed the species to evolve large bodies while lowering their risk of cancer.

Further studies could investigate whether other gigantic animals evolved similar ways to shield themselves from cancer; these could also examine precisely how having additional copies of cancer-protecting genes helps reduce cancer risk, potentially paving the way for new approaches to treat or prevent the disease.

Introduction

Among the constraints on the evolution of large bodies and long lifespans in animals is an increased risk of developing cancer. If all cells in all organisms have a similar risk of malignant transformation and equivalent cancer suppression mechanisms, then organisms with many cells should have a higher prevalence of cancer than organisms with fewer cells, particularly because large and small animals have similar cell sizes (Savage et al., 2007). Consistent with this expectation there is a strong positive correlation between body size and cancer incidence within species; for example, cancer incidence increases with increasing adult height in humans (Million Women Study collaborators et al., 2011; Nunney, 2018) and with increasing body size in dogs, cats, and cattle (Dobson, 2013; Dorn et al., 1968; Lucena et al., 2011). There is no correlation, however, between body size and cancer risk between species; this lack of correlation is often referred to as ‘Peto’s Paradox’ (Caulin and Maley, 2011; Leroi et al., 2003; Peto et al., 1975). Indeed, cancer prevalence is relatively stable at ~5% across species with diverse body sizes ranging from the minuscule 51 g grass mouse to the gargantuan 4800 kg African elephant (Abegglen et al., 2015; Boddy et al., 2020; Tollis et al., 2020). The ultimate resolution to Peto’s Paradox is trivial, large-bodied and long-lived species evolved enhanced cancer protection mechanisms, but identifying and characterizing the mechanisms that underlie the evolution of augmented cancer protection has proven difficult (Ashur-Fabian et al., 2004; Seluanov et al., 2008; Gorbunova et al., 2012; Tian et al., 2013; Sulak et al., 2016).

One of the challenges for discovering how animals evolved enhanced cancer protection mechanisms is identifying lineages in which large-bodied species are nested within species with small body sizes. Afrotherian mammals are generally small-bodied, but also include the largest extant land mammals. For example, maximum adult weights are ~70 g in golden moles,~120 g in tenrecs,~170 g in elephant shrews,~3 kg in hyraxes, and ~60 kg in aardvarks (Tacutu et al., 2013). In contrast, while extant hyraxes are relatively small, the extinct Titanohyrax is estimated to have weighed ~1300 kg (Schwartz et al., 1995). The largest living Afrotheria are also dwarfed by the size of their recent extinct relatives: extant sea cows such as manatees are large bodied (~322–480 kg) but are relatively small compared to the extinct Stellar’s sea cow which is estimated to have weighed ~8000–10,000 kg (Scheffer, 1972). Similarly African Savannah (4800 kg) and Asian elephants (3200 kg) are large, but are dwarfed by the truly gigantic extinct Proboscideans such as Deinotherium (~12,000 kg), Mammut borsoni (16,000 kg), and the straight-tusked elephant (~14,000 kg) (Larramendi, 2015). Remarkably, these large-bodied Afrotherian lineages are nested deeply within small-bodied species (Figure 1; O Leary et al., 2013a; Springer et al., 2013; O Leary et al., 2013b; Puttick and Thomas, 2015), indicating that gigantism independently evolved in hyraxes, sea cows, and elephants (Paenungulata). Thus, Paenungulates are an excellent model system in which to explore the mechanisms that underlie the evolution of large body sizes and augmented cancer resistance.

Figure 1. Large-bodied Afrotherians are nested within species with smaller body sizes (Tacutu et al., 2013; Puttick and Thomas, 2015).

Figure 1.

(A) Phylogenetic relationships between Eutherian orders, examples of each order are given in parenthesis. Horizontal branch lengths are proportional to time since divergence between lineages (see scale, Millions of Ago [MYA]). The clades Atlantogenata and Boreoeutheria are indicated, the order Proboscidea is colored blue, Sirenia is colored orange, and Hyracoidea is colored red. (B) Phylogenetic relationships of extant and recently extinct Atlantogenatans with available genomes are shown along with clade names and maximum body sizes. Note that horizontal branch lengths are arbitrary, species indicated with skull and crossbones are extinct, and those in parentheses do not have genomes. The order Proboscidea is colored blue, Sirenia is colored orange, and Hyracoidea is colored red.

Box 1.

Eutherian phylogenetic relationships.

Eutheria (eu- ‘good’ or ‘right’ and thēríon ‘beast’, hence ‘true beasts’) is one of three living (extant) mammalian lineages (Monotremes, Marsupials, and Eutherians) that diverged in the early–late Cretaceous. Eutheria was named in 1872 by Theodore Gill and refined by Thomas Henry Huxley in 1880. Living Eutherians are comprised of 18 orders, divided into two major clades (Figure 1A): Atlantogenata including the superorders Xenarthra (armadillos, anteaters, and sloths) and Afrotheria (Proboscidea, Sirenia, Hyracoidea, Tublidentata, Afroinsectivora, Cingulata, and Pilosa), and Boreoeutheria including the superorders Laurasiatheria (Insectivora, Artodactyla, Pholidota, and Carnovora) and Euarchontoglires (Lagomorpha, Rodentia, Scandentia, Dermoptera, and Primates). In our analyses, we have focused on identifying gene duplications in Afrotherian and Xenarthran genomes (Figure 1B), using the Xenarthrans Hoffmans two-toed sloth (Choloepus hoffmanni) and nine-banded armadillo (Dasypus novemcinctus) as out-groups to the Afrotherians. This approach allows us to use phylogenetic methods to polarize gene duplication events and identify genes that duplicated in the Afrotherian stem-lineage.

Many mechanisms have been suggested to resolve Peto’s paradox, including a decrease in the copy number of oncogenes, an increase in the copy number of tumor suppressor genes (Caulin and Maley, 2011; Leroi et al., 2003; Nunney, 1999), reduced metabolic rates, reduced retroviral activity and load (Katzourakis et al., 2014), and selection for ‘cheater’ tumors that parasitize the growth of other tumors (Nagy et al., 2007), greater sensitivity of cells to DNA damage (Abegglen et al., 2015; Sulak et al., 2016), enhanced recognition of neoantigens by T cells, among many others. Among the most parsimonious routes to enhanced cancer resistance may be through an increased copy number of tumor suppressors. For example, transgenic mice with additional copies of TP53 have reduced cancer rates and extended lifespans (García-Cao et al., 2002), suggesting that changes in the copy number of tumor suppressors can affect cancer rates. Indeed, candidate genes studies have found that elephant genomes encode duplicate tumor suppressors such as TP53 and LIF (Abegglen et al., 2015; Sulak et al., 2016; Vazquez et al., 2018) as well as other genes with putative tumor suppressive functions (Caulin et al., 2015; Doherty and de Magalhães, 2016). These studies, however, focused on a priori candidate genes; thus it is unclear whether duplication of tumor suppressor genes is a general phenomenon in the elephant lineage or reflects an ascertainment bias.

Here we trace the evolution of body mass, cancer risk, and gene copy number variation across Afrotherian genomes, including multiple living and extinct Proboscideans (Figure 1), to investigate whether duplications of tumor suppressors coincided with the evolution of large body sizes. Our estimates of the evolution of body mass across Afrotheria show that large body masses evolved in a stepwise manner, similar to previous studies (O Leary et al., 2013a; Springer et al., 2013; O Leary et al., 2013b; Puttick and Thomas, 2015) and coincident with dramatic reductions in intrinsic cancer risk. To explore whether duplication of tumor suppressors occurred coincident with the evolution of large body sizes, we used a genome-wide Reciprocal Best BLAT Hit (RBBH) strategy to identify gene duplications and used maximum likelihood to infer the lineages in which those duplications occurred. Unexpectedly, we found that duplication of tumor suppressor genes was common in Afrotherians, both large and small. Gene duplications in the Proboscidean lineage, however, were uniquely enriched in pathways that may explain some of the unique cancer protection mechanisms observed in elephant cells. These data suggest that duplication of tumor suppressor genes is pervasive in Afrotherians and preceded the evolution of species with exceptionally large body sizes.

Results

Step-wise evolution of body size in Afrotherians

Similar to previous studies of Afrotherian body size (Puttick and Thomas, 2015; Elliot and Mooers, 2014), we found that the body mass of the Afrotherian ancestor was inferred to be small (0.26 kg, 95% CI: 0.31–3.01 kg) and that substantial accelerations in the rate of body mass evolution occurred coincident with a 67.36× increase in body mass in the stem-lineage of Pseudoungulata (17.33 kg); a 1.45× increase in body mass in the stem-lineage of Paenungulata (25.08 kg); a 11.82× increase in body mass in the stem-lineage of Tehthytheria (296.56 kg); a 1.39× increase in body mass in the stem-lineage of Proboscidea (412.5 kg); and a 2.69× increase in body mass in the stem-lineage of Elephantimorpha (4114.39 kg), which is the last common ancestor of elephants and mastodons using the fossil record (Figure 2A,B). The ancestral Hyracoidea was inferred to be relatively small (2.86–118.18kg), and rate accelerations were coincident with independent body mass increases in large hyraxes such as Titanohyrax andrewsi (429.34 kg, 67.36× increase) (Figure 2A,B). While the body mass of the ancestral Sirenian was inferred to be large (61.7–955.51 kg), a rate acceleration occurred coincident with a 10.59× increase in body mass in Stellar’s sea cow (Figure 2A,B). Rate accelerations also occurred coincident with dramatic reductions in body mass (36.6× decrease) in the stem-lineage of the dwarf elephants Elephas (Palaeoloxodon) antiquus falconeri and Elephas cypriotes (Figure 2A,B). These data indicate that gigantism in Afrotherians evolved step-wise, from small to medium bodies in the Pseudoungulata stem-lineage, medium to large bodies in the Tehthytherian stem-lineage and extinct hyraxes, and from large to exceptionally large bodies independently in the Proboscidean stem-lineage and Stellar’s sea cow (Figure 2A,B).

Figure 2. Convergent evolution of large-bodied, cancer resistant Afrotherians.

Figure 2.

(A) Atlantogenatan phylogeny, with branch lengths scaled by log2 change in body size (left) or log2 change in intrinsic cancer risk (right). Branches are colored according to ancestral state reconstruction of body mass or estimated intrinsic cancer risk. Clades and lineages leading to extant Proboscideans and dwarf elephants are labeled. (B) Extant and ancestral body size (left), lifespan (middle), and estimated intrinsic cancer risk reconstructions; data are shown as mean (dot) and 95% confidence interval (CI, whiskers).

Step-wise reduction of intrinsic cancer risk in large, long-lived Afrotherians

In order to account for a relatively stable cancer rate across species (Abegglen et al., 2015; Boddy et al., 2020; Tollis et al., 2020), intrinsic cancer risk must also evolve with changes in body size and lifespan across species. We used empirical body size and lifespan data from extant species and empirical body size and estimated lifespan data from extinct species to estimate intrinsic cancer risk (K) with the simplified multistage cancer risk model KDt6, where D is the maximum body size and t is the maximum lifespan (Peto et al., 1975: Peto, 2015; Armitage, 1985; Armitage and Doll, 2004). As expected, intrinsic cancer risk in Afrotheria also varies with changes in body size and longevity (Figure 2A,B), with a 6.41-log2 decreases in the stem-lineage of Xenarthra, followed by a 13.37-log2 decrease in Pseudoungulata, and a 1.49-log2 decrease in Aardvarks (Figure 2A). In contrast to the Paenungulate stem-lineage, there is a 7.84-log2 decrease in cancer risk in Tethytheria, a 0.67-log2 decrease in Manatee, a 3.14-log2 decrease in Elephantimorpha, and a 1.05-log2 decrease in Proboscidea. Relatively minor decreases occurred within Proboscidea including a 0.83-log2 decrease in Elephantidae and a 0.57-log2 decrease in the American Mastodon. Within the Elephantidae, Elephantina and Loxodontini have a 0.06-log2 decrease in cancer susceptibility, while susceptibility is relatively stable in Mammoths. The three extant Proboscideans, Asian Elephant, African Savana Elephant, and the African Forest Elephant, meanwhile, have similar decreases in body size, with slight increases in cancer susceptibility (Figure 2A,B).

Pervasive duplication of tumor suppressor genes in Afrotheria

Our hypothesis was that genes which duplicated coincident with the evolution of increased body mass (IBM) and reduced intrinsic cancer risk (RICR) would be uniquely enriched in tumor suppressor pathways compared to genes that duplicated in other lineages. Therefore, we identified duplicated genes in each Afrotherian lineage (Table 1 and Figure 3A) and tested if they were enriched in Reactome pathways related to cancer biology (Figure 3B, Table 2). No pathways related to cancer biology were enriched in either the Pseudoungulata (67.36-fold IBM, 13.37-log2 RICR), but few genes were inferred to be duplicated in this lineage reducing power to detect enriched pathways. Consistent with our hypothesis, 18.18% of the pathways that were enriched in the Paenungulate stem-lineage (1.45-fold IBM, 1.17-log2 RICR), 63% of the pathways that were enriched in the Tethytherian stem-lineage (11.82-fold IBM, 7.84-log2 RICR), and 38.81% of the pathways that were enriched in the Proboscidean stem-lineage (1.06-fold IBM, 3.14-log2 RICR) were related to tumor suppression (Figure 3B, Table 2). Similarly, 21.28% and 38.00% of the pathways that were enriched in manatee (1.11-fold IBM, 0.89-log2 RICR) and aardvark (67.36-fold IBM, 1.49-log2 RICR), respectively, were related to tumor suppression. In contrast, only 2.86% of the pathways that were enriched in hyrax (1.6-fold IBM, 1.49-log2 RICR) were related to tumor suppression (Figure 3B, Table 2). Unexpectedly, however, lineages without major increases in body size or lifespan, or decreases in intrinsic cancer risk, were also enriched for tumor suppressor pathways. For example, 13.85%, 37.04%, and 22.00% of the pathways that were enriched in the stem-lineages of Afroinsectivoa and Afrosoricida, and in E. telfairi, respectively, were related to cancer biology (Figure 3B, Table 2).

Table 1. Genomes used in this study.

Species Common Name Genomes Highest Quality Genome Reference(s)
Choloepus hoffmanni Hoffmans two-toed sloth choHof1, choHof-C_hoffmanni-2.0.1_HiC Dudchenko et al., 2017
choHof2,
choHof-C_hoffmanni-2.0.1_HiC
Chrysochloris asiatica Cape golden mole chrAsi1m chrAsi1m GCA_000296735.1
Dasypus novemcinctus Nine-banded armadillo dasNov3 dasNov3 GCA_000208655.2
Echinops telfairi Lesser Hedgehog Tenrec echTel2 echTel2 GCA_000313985.1
Elephantulus edwardii Cape elephant shrew eleEdw1m eleEdw1m GCA_000299155.1
Elephas maximus Asian elephant eleMaxD eleMaxD Palkopoulou et al., 2018
Loxodonta africana African savanna elephant loxAfr3, loxAfr4 ftp://ftp.broadinstitute.org/pub/assemblies/mammals/elephant/loxAfr4
loxAfrC,
loxAfr4
Loxodonta cyclotis African forest elephant loxCycF loxCycF Palkopoulou et al., 2018
Mammut americanum American mastodon mamAmeI mamAmeI Palkopoulou et al., 2018
Mammuthus columbi Columbian mammoth mamColU mamColU Palkopoulou et al., 2018
Mammuthus primigenius Woolly mammoth mamPriV mamPriV Palkopoulou et al., 2015
Orycteropus afer Aardvark oryAfe1, oryAfe2 oryAfe2 Dudchenko et al., 2017
Palaeoloxodon antiquus Straight tusked elephant palAntN palAntN Palkopoulou et al., 2018
Procavia capensis Rock hyrax proCap1, proCap2, proCap-Pcap_2.0_HiC proCap-Pcap_2.0_HiC Dudchenko et al., 2017; Lindblad-Toh et al., 2011
Trichechus manatus latirostris Manatee triMan1, triManLat2 triManLat2 Dudchenko et al., 2017; Foote et al., 2015

Figure 3. Pervasive duplication of tumor suppressors in Atlantogenata.

(A) Afrotherian phylogeny indicating the number of genes duplicated in each lineage, inferred by maximum likelihood with Bayesian posterior probability (BPP) ≥0.80. Branches are colored according to log2 change in body size. Inset, phylogeny with branch lengths proportional to gene expression changes per gene. (B) Upset plot of cancer related Reactome pathways enriched in each Afrotherian lineage; lineages in which the cancer pathway enrichment percentage is less than background are shown in gray. Note that Upset plots are Euler diagrams showing intersections between sets; lines indicate intersections in pathway terms between lineages connected by that line (for example, the line connecting the points for Aardvark and Tenrec indicate pathway indications for those two lineages), and empty sets are not shown. (C) Wordcloud of pathways enriched exclusively in the Proboscidean stem-lineage (purple), shared between Proboscidea and Tethytheria (blue), or shared between Proboscidea and any other lineage (green).

Figure 3.

Figure 3—figure supplement 1. Estimated Copy Number by Coverage (ECNC) consolidates fragmented genes while accounting for missing domains in homologs.

Figure 3—figure supplement 1.

(A) A single, contiguous gene homolog in a target genome with 100% query length coverage has an ECNC of 1.0. (B) Two contiguous gene homologs each with 100% query length coverage have an ECNC of 2.0. (C) A single gene homolog, split across multiple scaffolds and contigs in a fragmented target genome; BLAT identifies each fragment as a single hit. Per nucleotide of query sequence, there is only one corresponding nucleotide over all the hits, thus the ECNC is 1.0. (D) Two gene homologs, one fragmented and one contiguous. 100% of nucleotides in the query sequence are represented between all hits; however, every nucleotide in the query has two matching nucleotides in the target genome, thus the ECNC is 2.0. (E) One true gene homolog in the target genome, plus multiple hits of a conserved domain that span 20% of the query sequence. While 100% of the query sequence is represented in total, 20% of the nucleotides have four hits. Thus, the ECNC for this gene is 1.45. (F) Two real gene homologs; one hit is contiguous, one hit is fragmented in two, and the tail end of both sequences was not identified by BLAT due to sequence divergence. Only 75% of the query sequence was covered in total between the hits, but for that 75%, each nucleotide has two hits. As such, ECNC is equal to 2.0 for this gene.
Figure 3—figure supplement 2. Correlations between genome quality metrics and ECNC metrics.

Figure 3—figure supplement 2.

Gene copy number metrics, and the genome quality metrics most strongly associated with them, are highlighted in red.

Table 2. Summary of reactome pathways in Atlantogenata.

Number of Percentage Cancer pathways greater than simulated?
Genes Pathways Cancer pathways Simulated cancer pathways
Afroinsectivora 36 65 13.85% 15.42% No
Afrosoricida 79 27 37.04% 15.42% Yes
Chrysochloris asiatica 1591 100 27.00% 15.42% Yes
Echinops telfairi 587 100 22.00% 15.42% Yes
Elephantidae 61 25 20.00% 13.03% Yes
Elephantulus edwardii 2103 100 22.00% 15.42% Yes
Elephas maximus 94 32 40.63% 17.73% Yes
Loxodona 12 60 10.00% 14.53% No
Loxodonta africana 100 47 53.19% 15.42% Yes
Loxodonta cyclotis 76 35 34.29% 16.11% Yes
Loxodontini 15 12 0.00% 13.82% No
Mammut americanum 52 16 0.00% 12.91% No
Mammuthus 5 62 4.84% 15.29% No
Mammuthus columbi 28 26 26.92% 12.88% Yes
Mammuthus primigenius 35 16 0.00% 12.28% No
Orycteropus afer 504 100 38.00% 15.42% Yes
Paenungulata 28 22 18.18% 12.88% Yes
Palaeoloxodon antiquus 35 8 0.00% 12.28% No
Proboscidea 157 67 38.81% 9.52% Yes
Procavia capensis 383 35 2.86% 15.42% No
Pseudoungulata 9 10 0.00% 14.90% No
Tethytheria 83 46 63.04% 18.52% Yes
Trichechus manatus 484 47 21.28% 15.42% Yes

Our observation that gene duplicates in most lineages are enriched in cancer pathways suggest either that duplication of genes in cancer pathways is common in Afrotherians, or that there may be a systemic bias in the pathway enrichment analyses. For example, random gene sets may be generally enriched in pathway terms related to cancer biology. To explore this latter possibility, we generated 5000 randomly sampled gene sets of between 10 and 5000 genes, and tested for enriched Reactome pathways using ORA. We found that no cancer pathways were enriched (median hypergeometric p-value ≤0.05) among gene sets tested greater than 157 genes; however, in these smaller gene sets, 12–18% of enriched pathways were classified as cancer pathways. Without considering p-value thresholds, the percentage of enriched cancer pathways approaches ~15% (213/1381) in simulated sets. Thus, for larger gene sets, we used a simulated threshold of ~15% to determine if pathways related to cancer biology were enriched more than one would expect from sampling bias (Table 2). We directly compared our simulated and observed enrichment results by lineage and gene set size, and found that Afrosoricida, Cape golden mole, tenrec, Elephantidae, elephant shrew, Asian elephant, African Savannah elephant, African Forest elephant, Columbian mammoth, aardvark, Paenungulata, Proboscidea, Tethytheria, and manatee had enriched cancer pathway percentages above background with respect to their gene set sizes, that is expected enrichments based on random sampling of small gene sets (Table 2). Thus, we conclude that duplication of genes in cancer pathways is common in many Afrotherians but that the inference of enriched cancer pathway duplication is not different from background in some lineages, particularly in ancestral nodes with a small number of estimated duplicates.

Tumor suppressor pathways enriched exclusively within Proboscideans

While duplication of cancer associated genes is common in Afrotheria, the 157 genes that duplicated in the Proboscidean stem-lineage (Figure 3A) were uniquely enriched in 12 pathways related to cancer biology (Figure 3B). Among these uniquely enriched pathways (Figure 3C) were pathways related to the cell cycle, including ‘G0 and Early G1’, ‘G2/M Checkpoints’, and ‘Phosphorylation of the APC/C’, pathways related to DNA damage repair including ‘Global Genome Nucleotide Excision Repair (GG-NER)’, ‘HDR through Single Strand Annealing (SSA)’, ‘Gap-filling DNA repair synthesis and ligation in GG-NER’, ‘Recognition of DNA damage by PCNA-containing replication complex’, and ‘DNA Damage Recognition in GG-NER’, pathways related to telomere biology including ‘Extension of Telomeres’ and ‘Telomere Maintenance’, pathways related to the apoptosome including ‘Activation of caspases through apoptosome-mediated cleavage’, and pathways related to ‘mTORC1-mediated signaling’ and ‘mTOR signaling’, which play important roles in the biology of aging. Thus, duplication of genes with tumor suppressor functions is pervasive in Afrotherians, but genes in some pathways related to cancer biology and tumor suppression are uniquely duplicated in large-bodied (long-lived) Proboscideans (Figure 4A,B).

Figure 4. Duplications in the African savannah elephant (Loxodonta africana) are enriched for TP53-related and other tumor suppressor processes.

Figure 4.

(A) Upset plot of cancer-related Reactome pathways in African savannah elephant, highlighting shared genes in each set, and the pathway class represented by the combinations (see Figure 3 for a description of Upset plots). (B) Inverted Upset plot from A showing the pathways shared by genes highlighted by WEBGESTALT in each pathway. (C) Cladogram of Afrotheria with sequenced genomes. Exemplar tumor suppressor duplicates are mapped onto lineages in which those genes are duplicated. Dots represent a duplication event of the color-coded genes. Note that we are unable to determine duplication status for some genes in Proboscideans because of assembly gaps in ancient genomes (indicated with skull and crossbones); these genes appear to be independently duplicated in extant species (African Forest, African Savanah, and Asian elephants) because they are missing from ancient genomes, biasing ancestral reconstructions of duplication status. (D) Gene expression levels of genes from panel C that have two or more expressed duplicates.

Figure 4—source data 1. Data set used for manual coding gene potential associated with Figure 4C,D.

Among the genes uniquely duplicated within Proboscideans are TP53, COX20, LAMTOR5, PRDX1, STK11, BRD7, MAD2L1, BUB3, UBE2D1, SOD1, LIF, MAPRE1, CNOT11, CASP9, CD14, and HMGB2 (Figure 4C). Two of these, TP53 and LIF, have been previously described (Abegglen et al., 2015; Sulak et al., 2016; Vazquez et al., 2018). These genes are significantly enriched in pathways involved in apoptosis, cell cycle regulation, and both upstream and downstream pathways involving TP53. The majority of these genes are expressed in African Elephant transcriptome data (Figure 4D), suggesting that they maintained functionality after duplication.

Coordinated duplication of TP53-related genes in Proboscidea

Prior studies found that the ‘master’ tumor suppressor TP53 duplicated multiple times in elephants (Abegglen et al., 2015; Sulak et al., 2016), motivating us to further study duplication of genes involved in TP53-related pathways in Proboscidea. We traced the evolution of genes in the TP53 pathway that appeared in one or more Reactome pathway enrichments for genes duplicated recently in the African Elephant, which has the most complete genome among Proboscideans and for which several RNA-Seq data sets are available. We found that the initial duplication of TP53 in Tethytheria, where body size expanded, was preceded by the duplication of GTF2F1 and STK11 in Paenungulata and was coincident with the duplication of BRD7. These three genes are involved in regulating the transcription of TP53 (Liang and Mills, 2013; Launonen, 2005; Drost et al., 2010; Burrows et al., 2010), and their duplication prior to that of TP53 may have facilitated re-functionalization of TP53 retroduplicates. Interestingly, STK11 is also tumor suppressor that mediates tumor suppression via p21-induced senescence (Launonen, 2005). The other genes that are duplicated in the pathway are downstream of TP53; these genes duplicated either coincident with TP53, as in the case of SIAH1, or subsequently in Proboscidea, Elephantidae, or extant elephants (Figure 4). These genes are expressed in RNA-Seq data (Figure 4D), suggesting that they are functional.

While transcript abundance estimates inferred from RNA-Seq data can suggest that genes are functional, recent non-functional duplicates can still be transcribed. Therefore we inferred if each duplicate shown in Figure 4C/D encoded a putatively function protein by manually curation, specifically to identify premature stop codons and overall sequence conservation. Most genes in Figure 4C/D, such as STK11, CD14, SOD1, and BRD7, were well conserved and lacked premature stop codons. We also find that the STK11, CD14, and BRD7 genes in the manatee were also well conserved, suggesting that extant manatees may also have enhanced tumor suppression and an augmented stress response. However, some of the duplicate genes in the mantatee genome have premature stop codons suggesting they are not translated into functional proteins, including the additional copies of MAPRE1, BUB3, and COX20 as well as at least one of the duplicate copies of CNOT11, HMGB2, MAD2L1, LIF, and TP53. For TP53, we have previously shown that duplicate copies of genes containing premature stop codons may still serve a functional role in regulating its progenitor's function. Thus, some of the genes with premature stop codons, such as duplicate COX20 and MAD2L1 which are expressed in RNA-Seq data, may encode functional lncRNA transcripts or truncated proteins. Some copies, including for CASP9 and PRDX1, contained partial RBBH hits with no premature stop codons; however, they also lacked the totality of the coding sequence and thus may represent cases of pseudogenization, subfunctionalization, or neofunctionalization.

Discussion

Among the evolutionary, developmental, and life history constraints on the evolution of large bodies and long lifespans is an increased risk of developing cancer. While body size and lifespan are correlated with cancer risk within species, there is no correlation between species because large and long-lived organisms have evolved enhanced cancer suppression mechanisms. While this ultimate evolutionary explanation is straightforward (Peto, 2015), determining the mechanisms that underlie the evolution of enhanced cancer protection is challenging because many mechanisms with relatively small effects likely contribute to evolution of reduced cancer risk. Previous candidate gene studies in elephants have identified duplications of tumor suppressors such as TP53 and LIF, among others, suggesting that an increased copy number of tumor suppressors may contribute to the evolution of large body sizes in the elephant lineage (Abegglen et al., 2015; Sulak et al., 2016; Vazquez et al., 2018; Caulin et al., 2015; Doherty and de Magalhães, 2016). Here we: (1) trace the evolution of body size and lifespan in Eutherian mammals, with particular reference to Afrotherians; (2) infer changes in cancer susceptibility across Afrotherian lineages; (3) use a genome-wide screen to identify gene duplications in Afrotherian genomes, including multiple living and extinct Proboscideans; and (4) show that while duplication of genes with tumor suppressor functions is pervasive in Afrotherian genomes, Proboscidean gene duplicates are enriched in unique pathways with tumor suppressor functions.

Correlated evolution of large bodies and reduced cancer risk

The hundred- to hundred-million-fold reductions in intrinsic cancer risk associated with the evolution of large body sizes in some Afrotherian lineages, in particular Elephantimorphs such as elephants and mastodons, suggests that these lineages must have also evolved remarkable mechanisms to suppress cancer. While our initial hypothesis was that large-bodied lineages would be uniquely enriched in duplicate tumor suppressor genes compared to other smaller-bodied lineages, we unexpectedly found that the duplication of genes in tumor suppressor pathways occurred at various points throughout the evolution of Afrotheria, regardless of body size. These data suggest that this abundance of tumor suppressors may have contributed to the evolution of large bodies and reduced cancer risk, but that these processes were not necessarily coincident. Interestingly, pervasive duplication of tumor suppressors may also have contributed to the repeated evolution of large bodies in hyraxes and sea cows, because at least some of the genetic changes that underlie the evolution of reduced cancer risk were common in this group. It remains to be determined whether our observation of pervasive duplication of tumor suppressors also occurs in other multicellular lineages. Using a similar reciprocal best BLAST/BLAT approach that focused on estimating copy number of known tumor suppressors in mammalian genomes, for example, Caulin et al., 2015 found no correlation between copy number or tumor suppressors with either body mass or longevity, whereas Tollis et al., 2020 found a correlation between copy number and longevity (but not body size) (Tollis et al., 2020; Caulin et al., 2015). These opposing conclusions may result from differences in the number of genes (81 vs 548) and genomes (8 vs 63) analyzed, highlighting the need for genome-wide analyses of many species that vary in body size and longevity.

There’s no such thing as a free lunch: Trade-offs and constraints on tumor suppressor copy number

While we observed that duplication of genes in cancer related pathways – including genes with known tumor suppressor functions – is pervasive in Afrotheria, the number of duplicate tumor suppressor genes was relatively small, which may reflect a trade-off between the protective effects of increased tumor suppressor number on cancer risk and potentially deleterious consequences of increased tumor suppressor copy number. Overexpression of TP53 in mice, for example, is protective against cancer but associated with progeria, premature reproductive senescence, and early death; however, transgenic mice with a duplication of the TP53 locus that includes native regulatory elements are healthy and experience normal aging, while also demonstrating an enhanced response to cellular stress and lower rates of cancer (García-Cao et al., 2002; Tyner et al., 2002). These data suggest that duplication of tumor suppressors can contribute to augmented cancer resistance, if the duplication includes sufficient regulatory architecture to direct spatially and temporally appropriate gene expression. Thus, it is interesting that duplication of genes that regulate TP53 function, such as STK11, SIAH1, and BRD7, preceded the retroduplication TP53 in the Proboscidean stem-lineage, which may have mitigated toxicity arising from dosage imbalances. Similar co-duplication events may have alleviated the negative pleiotropy of tumor suppressor gene duplications to enable their persistence and allow for subsequent co-option during the evolution of cancer resistance.

Caveats and limitations

Our genome-wide results suggest that duplication of tumor suppressors is pervasive in Afrotherians and may have enabled the evolution of larger body sizes in multiple lineages by lowering intrinsic cancer risk either prior to or coincident with increasing body size. However, our study has several inherent limitations. For example, we have shown that genome quality plays an important role in our ability to identify duplicate genes, and several species have poor quality genomes (and thus were excluded from further analyses). While several efforts have been established with the goal of generating high quality (chromosome length) reference genomes for mammals, such as DNAZoo, The Zoonomia Project, the Vertebrate Genomes Project, and Genome 10K, Atlantogenatans represent a minority of available genome projects. And while a few high quality Atlantogenatan genomes are available, they lack reference gene and transcriptome annotations, and genome browser graphical user interfaces that allow for easy access to genome data for the broader community, limiting their usefulness. Similarly, without comprehensive gene expression data we cannot be certain that duplicate genes are actually expressed, and thus functional. Our results on genome quality suggest several research priorities for these less well-studies species, including generating chromosome length reference genomes and genome annotations, and incorporating these species into existing genome browsers (such as UCSC Genome Browser).

We also assume that gene duplicates either maintain ancestral tumor suppressor functions and increase cancer resistance through dosage effects or provide redundancy to loss of function mutations thereby increasing robustness of tumor suppression. Many processes, such as developmental systems drift, neofunctionalization, and sub-functionalization, can cause divergence in gene functions and invalidate the assumption of conservation of gene function (Rastogi and Liberles, 2005; Qian and Zhang, 2014; Stoltzfus, 1999), leading to inaccurate inferences in gene and pathway functions which is a common problem in comparative genomic studies using pathway and gene ontologies to categorize gene function. In addition, we assume that most duplicate genes are functional but it is likely that some of the duplicates were identify are non-functional pseudogenes. Differentiating between functional and non-functional genes using comparative genomics can be challenging. For example, non-functional pseudogenes often accumulate non-synonymous amino acid substitutions and premature stop codons but these same changes can also occur in functional genes. For example, we have found that the elephant genome encodes TP53 retogenes (TP53RTGs) all of which encode premature stop codons suggesting they are pseudogenes, but these TP53TRGs are expressed, encode functional separation of function mutants of the ancestral TP53 gene, and contribute to enhanced DNA damage sensitivity in elephant cells. Similarly, we have characterized duplicate LIF gene in elephants (LIF6) that lacks the start codon and exon 1 of the parent LIF gene. LIF6 is expressed, encodes a functional protein with translation initiated at an alternative downstream start site, and also contributes to enhanced DNA damage sensitivity in elephant cells. In addition, duplicate genes that lack coding potential, such as PTENP1, can also be expressed and while not translated function as LINC RNAs (in this case acting as a sponge for microRNAs that target the parent PTEN transcript). In each case classifying duplicates into putatively functional and non-functional categories based on sequence characteristic would misclassify TP53RTGs, LIF6, and PTENP1. Thus, sequence features of pseudogenes may maintain function, as a consequence of not excluding putative pseudogenes some of the genes we include in downstream analyses may be non-functional. Further experimental studies are needed to determine which duplicates are expressed and functional.

The focus of this study, motivated by our previous identification of TP53 and LIF duplicates, was on the role gene duplication in general may have played in the resolution of Peto’s paradox in large-bodied Afrotherians, particularly Proboscidea. Duplication of tumor suppressor genes, however, is unlikely to be the sole mechanism responsible for the evolution of large body sizes, long lifespans, and reduced cancer risk. The evolution of regulatory elements, coding genes, genes with non-canonical tumor suppressor functions, and immune cell recognition of cancerous cells are also likely important for reducing the risk of cancer.

Conclusions: All Afrotherians are equal, but some are more equal than others

While we found that duplication of tumor suppressor genes is common in Afrotheria, genes that duplicated in the Proboscidean stem-lineage (Figure 3A,B) were uniquely enriched in functions and pathways that may be related to the evolution of unique anti-cancer cellular phenotypes in the elephant lineage (Figure 3C). Elephant cells, for example, cannot be experimentally immortalized (Fukuda et al., 2016; Gomes et al., 2011), rapidly repair DNA damage (Sulak et al., 2016; Hart and Setlow, 1974; Francis et al., 1981), are extremely resistant to oxidative stress (Gomes et al., 2011), and yet are also extremely sensitive to DNA damage (Abegglen et al., 2015; Sulak et al., 2016; Vazquez et al., 2018). Several pathways related to DNA damage repair, in particular nucleotide excision repair (NER), were uniquely enriched among genes that duplicated in the Proboscidean stem-lineage, suggesting a connection between duplication of genes involved in NER and rapid DNA damage repair (Hart and Setlow, 1974; Francis et al., 1981). Similarly, we identified a duplicate SOD1 gene in Proboscideans that may confer the resistance of elephant cells to oxidative stress (Gomes et al., 2011). Pathways related to the cell cycle were also enriched among genes that duplicated in Proboscideans, and cell cycle dynamics are different in elephants compared to other species; population doubling (PD) times for African and Asian elephant cells are 13–16 days, while PD times are 21–28 days in other Afrotherians (Gomes et al., 2011). Finally, the role of ‘mTOR signaling’ in the biology of aging is well known. Collectively these data suggest that gene duplications in Proboscideans may underlie some of their cellular phenotypes that contribute to cancer resistance.

Materials and methods

Ancestral body size reconstruction

We first assembled a time-calibrated supertree of Eutherian mammals by combining the time-calibrated molecular phylogeny of Bininda-Emonds et al., 2007; Bininda-Emonds et al., 2008 with the time-calibrated total evidence Afrotherian phylogeny from Puttick and Thomas, 2015. While the Bininda-Emonds et al., 2007; Bininda-Emonds et al., 2008 phylogeny includes 1679 species, only 34 are Afrotherian, and no fossil data are included. The inclusion of fossil data from extinct species is essential to ensure that ancestral state reconstructions of body mass are not biased by only including extant species. This can lead to inaccurate reconstructions, for example, if lineages convergently evolved large body masses from a small-bodied ancestor. In contrast, the total evidence Afrotherian phylogeny of Puttick and Thomas, 2015 includes 77 extant species and fossil data from 39 extinct species. Therefore, we replaced the Afrotherian clade in the Bininda-Emonds et al., 2008 phylogeny with the Afrotherian phylogeny of Puttick and Thomas, 2015 using Mesquite. Next, we jointly estimated rates of body mass evolution and reconstructed ancestral states using a generalization of the Brownian motion model that relaxes assumptions of neutrality and gradualism by considering increments to evolving characters to be drawn from a heavy-tailed stable distribution (the ‘Stable Model’) implemented in StableTraits (Elliot and Mooers, 2014). The stable model allows for large jumps in traits and has previously been shown to outperform other models of body mass evolution, including standard Brownian motion models, Ornstein–Uhlenbeck models, early burst maximum likelihood models, and heterogeneous multi-rate models (Elliot and Mooers, 2014).

Reciprocal Best Hit BLAT

We developed a reciprocal best hit BLAT (RBHB) pipeline to identify putative homologs and estimate gene copy number across species. The Reciprocal Best Hit (RBH) search strategy is conceptually straightforward: (1) Given a gene of interest GA in a query genome A, one searches a target genome B for all possible matches to GA; (2) For each of these hits, one then performs the reciprocal search in the original query genome to identify the highest-scoring hit; (3) A hit in genome B is defined as a homolog of gene GA if and only if the original gene GA is the top reciprocal search hit in genome A. We selected BLAT (Kent, 2002) as our algorithm of choice, as this algorithm is sensitive to highly similar (>90% identity) sequences, thus identifying the highest-confidence homologs while minimizing many-to-one mapping problems when searching for multiple genes. RBH performs similar to other more complex methods of orthology prediction and is particularly good at identifying incomplete genes that may be fragmented in low quality/poorly assembled regions of the genome (Altenhoff and Dessimoz, 2009; Salichos and Rokas, 2011).

Effective copy number by coverage

In low-quality genomes, many genes are fragmented across multiple scaffolds, which results in BLA(S)T-like methods calling multiple hits when in reality there is only one gene. To compensate for this, we developed a novel statistic, Estimated Copy Number by Coverage (ECNC), which averages the number of times we hit each nucleotide of a query sequence in a target genome over the total number of nucleotides of the query sequence found overall in each target genome (Figure 3—figure supplement 1). This allows us to correct for genes that have been fragmented across incomplete genomes, while accounting for missing sequences from the human query in the target genome. Mathematically, this can be written as:

ECNC=n=1lCnn=1lbool(Cn) (1)

where n is the given nucleotide in the query, l is the total length of the query, Cn is the number of instances that n is present within a reciprocal best hit, and bool (Cn) is 1 if Cn >1 Cn>0 or 0 if Cn =1 Cn=0.

RecSearch pipeline

We created a custom Python pipeline for automating RBHB searches between a single reference genome and multiple target genomes using a list of query sequences from the reference genome. For the query sequences in our search, we used the hg38 UniProt proteome (The UniProt Consortium, 2017), which is a comprehensive set of protein sequences curated from a combination of predicted and validated protein sequences generated by the UniProt Consortium. Next, we excluded genes from downstream analyses for which assignment of homology was uncertain, including uncharacterized ORFs (991 genes), LOC (63 genes), HLA genes (402 genes), replication dependent histones (72 genes), odorant receptors (499 genes), ribosomal proteins (410 genes), zinc finger transcription factors (1983 genes), viral and repetitive-element-associated proteins (82 genes), and ‘Uncharacterized’, ‘Putative’, or ‘Fragment’ proteins (30,724 genes), leaving a final set of 37,582 query protein isoforms, corresponding to 18,011 genes. We then searched for all copies of 18,011 query genes in publicly available Afrotherian genomes (Dobson, 2013), including African savannah elephant (Loxodonta africana: loxAfr3, loxAfr4, loxAfrC), African forest elephant (Loxodonta cyclotis: loxCycF), Asian Elephant (Elephas maximus: eleMaxD), Woolly Mammoth (Mammuthus primigenius: mamPriV), Colombian mammoth (Mammuthus columbi: mamColU), American mastodon (Mammut americanum: mamAmeI), Rock Hyrax (Procavia capensis: proCap1, proCap2, proCap2HiC), West Indian Manatee (Trichechus manatus latirostris: triManLat1, triManLat1HiC), Aardvark (Orycteropus afer: oryAfe1, oryAfe1HiC), Lesser Hedgehog Tenrec (Echinops telfairi: echTel2), Nine-banded armadillo (Dasypus novemcinctus: dasNov3), Hoffman’s two-toed sloth (Choloepus hoffmannii: choHof1, choHof2, choHof2HiC), Cape golden mole (Chrysochloris asiatica: chrAsi1), and Cape elephant shrew (Elephantulus edwardii: eleEdw1) (Dudchenko et al., 2017; Palkopoulou et al., 2015; Palkopoulou et al., 2018; Foote et al., 2015).

A summary of gene duplications in each species is available in Supplementary file 1.

Duplication gene inclusion criteria

In order to condense transcript-level hits into single gene loci, and to resolve many-to-one genome mappings, we removed exons where transcripts from different genes overlapped, and merged overlapping transcripts of the same gene into a single gene locus call. The resulting gene-level copy number table was then combined with the maximum ECNC values observed for each gene in order to call gene duplications. We called a gene duplicated if its copy number was two or more, and if the maximum ECNC value of all the gene transcripts searched was 1.5 or greater; previous studies have shown that incomplete duplications can encode functional genes (Sulak et al., 2016; Vazquez et al., 2018), therefore partial gene duplications were included provided they passed additional inclusion criteria (see below). The ECNC cut-off of 1.5 was selected empirically, as this value minimized the number of false positives seen in a test set of genes and genomes. The results of our initial search are summarized in Figure 3A. Overall, we identified 13,880 genes across all species, or 77.1% of our starting query genes.

Genome quality assessment using CEGMA

In order to determine the effect of genome quality on our results, we used the gVolante webserver and CEGMA to assess the quality and completeness of the genome (Nishimura et al., 2017; Parra et al., 2009). CEGMA was run using the default settings for mammals (‘Cut-off length for sequence statistics and composition’=1; ‘CEGMA max intron length’=100,000; ‘CEGMA gene flanks’=10,000, ‘Selected reference gene set’ = CVG). For each genome, we generated a correlation matrix using the aforementioned genome quality scores, and either the mean copy number or mean ECNC for all hits in the genome. We observed that the percentage of duplicated genes in non-Pseudoungulatan genomes was higher (12.94–23.66%) than Pseudoungulatan genomes (3.26–7.80%). Mean copy number, mean ECNC, and mean CN (the lesser of copy number and ECNC per gene) moderately or strongly correlated with genomic quality, such as LD50, the number of scaffolds, and contigs with a length above either 100K or 1M (Figure 3—figure supplement 2). The Afrosoricidians had the greatest correlation between poor genome quality and high gene duplication rates, including larger numbers of private duplications. The correlations between genome quality metric and number of gene duplications were particularly high for Cape golden mole (Chrysochloris asiatica: chrAsi1) and Cape elephant shrew (Elephantulus edwardii: eleEdw1); therefore we excluded these species from downstream pathway enrichment analyses.

Determining functionality of duplicated via gene expression

In order to ascertain the functional status of duplicated genes, we generated de novo transcriptomes using publicly available RNA-sequencing data for African savanna elephant, West Indian manatee, and nine-banded armadillo (Supplementary file 2). We mapped reads to the highest quality genome available for each species, and assembled transcripts using HISAT2 and StringTie (Kim et al., 2015; Pertea et al., 2015; Pertea et al., 2016). We found that many of our identified duplicates had transcripts mapping to them above a Transcripts Per Million (TPM) score of 2, suggesting that many of these duplications are functional. RNA-sequencing data was not available for Cape golden mole, Cape elephant shrew, rock hyrax, aardvark, or the lesser hedgehog tenrec.

Reconstruction of ancestral copy numbers

We encoded the copy number of each gene for each species as a discrete trait ranging from 0 (one gene copy) to 31 (for 32+ gene copies) and used IQ-TREE to select the best-fitting model of character evolution (Minh et al., 2020; Hoang et al., 2018; Kalyaanamoorthy et al., 2017; Wang et al., 2018; Schrempf et al., 2019), which was inferred to be a Jukes-Cantor type model for morphological data (MK) with equal character state frequencies (FQ) and rate heterogeneity across sites approximated by including a class of invariable sites (I) plus a discrete Gamma model with four rate categories (G4). Next we inferred gene duplication and loss events with the empirical Bayesian ancestral state reconstruction (ASR) method implemented in IQ-TREE (Minh et al., 2020; Hoang et al., 2018; Kalyaanamoorthy et al., 2017; Wang et al., 2018; Schrempf et al., 2019), the best fitting model of character evolution (MK+FQ+GR+I) (Soubrier et al., 2012; Yang et al., 1995), and the unrooted species tree for Atlantogenata. We considered ancestral state reconstructions to be reliable if they had Bayesian Posterior Probability (BPP) ≥ 0.80; less reliable reconstructions were excluded from pathway analyses. We note that there may be 'ghost' duplication events, that is genes that duplicated in, for example, the Tethytherian stem-lineage that are maintained in the Stellar’s sea cow genome and lost in the manatee genome. These genes will be reconstructed as a Proboscidean-specific duplication events because we cannot determine copy number in extinct species that lack genomes.

Pathway enrichment analysis

To determine if gene duplications were enriched in particular biological pathways, we used the WEB-based Gene SeT AnaLysis Toolkit (WebGestalt) (Liao et al., 2019) to perform Over-Representation Analysis (ORA) using the Reactome database (Jassal et al., 2020). Gene duplicates in each lineage were used as the foreground gene set, and the initial query set was used as the background gene set. WebGestalt uses a hypergeometric test for statistical significance of pathway over-representation, which we refined using two methods: a False Discovery Rate (FDR)-based approach and an empirical p-value approach (Chen et al., 2013). The Benjamini–Hochberg FDR multiple-testing correction was generated by WebGestalt. In order to correct p-values based on an empirical distribution, we modified the approach used by Chen et al. in Enrichr (Chen et al., 2013) to generate a 'combined score' for each pathway based on the hypergeometric p-value from WebGestalt, and a correction for expected rank for each pathway. In order to generate the table of expected ranks and variances for this approach, we randomly sampled foreground sets of 10–5000 genes from our background set 5000 times, and used WebGestalt ORA to obtain a list of enriched terms and P-values for each run; we then compiled a table of Reactome terms with their expected frequencies and standard deviation. These data were used to calculate a Z-score for terms in an ORA run, and the combined score was calculated using the formula C=logpz.

Estimating the evolution of cancer risk

The dramatic increase in body mass and lifespan in some Afrotherian lineages, and the relatively constant rate of cancer across species of diverse body sizes (Abegglen et al., 2015), indicates that those lineages must have also evolved reduced cancer risk. To infer the magnitude of these reductions we estimated differences in intrinsic cancer risk across extant and ancestral Afrotherians. Following Peto, 2015, we estimate the intrinsic cancer risk (K) as the product of risk associated with body mass and lifespan. In order to determine (K) across species and at ancestral nodes (see below), we first estimated ancestral lifespans at each node. We used Phylogenetic Generalized Least-Square Regression (PGLS) (Felsenstein, 1985; Martins and Hansen, 1997), using a Brownian covariance matrix as implemented in the R package ape (Paradis and Schliep, 2019), to calculate estimated ancestral lifespans across Atlantogenata using our estimates for body size at each node. In order to estimate the intrinsic cancer risk of a species, we first inferred lifespans at ancestral nodes using PGLS (Supplementary file 3) and the model. Next, we calculated K1 at all nodes, and then estimated the fold-change in cancer susceptibility between ancestral and descendant nodes (Figure 2). Next, in order to calculate K1 at all nodes, we used a simplified multistage cancer risk model for body size D and lifespan t: KDt6 (Peto et al., 1975: Peto, 2015; Armitage, 1985; Armitage and Doll, 2004). The fold change in cancer risk between a node and its ancestor was then defined as log2(K2K1).

Data analysis

All data analysis was performed using Python version 3.8 and R version 4.0.2 (2020-06-22), and the complete reproducible manuscript, along with code and data generation pipeline, can be found on our GitHub page at https://github.com/docmanny/atlantogenataGeneDuplication (Vazquez and Lynch, 2021; copy archived at swh:1:rev:6bc68ac31ef148131480710e50b0b75d06077db2; Paradis and Schliep, 2019; Paradis et al., 2020; R Development Core Team, 2019; Xie, 2020; Bolker and Robinson, 2020; Dowle and Srinivasan, 2019; Wickham et al., 2020a; Wickham, 2020; Harmon et al., 2020; Yutani, 2020; Yu, 2020a; Campitelli, 2020; Wickham et al., 2020b; Yu, 2020b; Kassambara, 2020; Slowikowski, 2020; Xiao, 2018; Yu and Lam, 2020c; Zhu, 2019; Ooms, 2020; Bache and Wickham, 2014; Pinheiro and Bates, 2020; Sievert et al., 2020; Henry and Wickham, 2020; Wickham et al., 2018; Hlavac, 2018; Wickham, 2019a; Müller and Wickham, 2020; Wickham and Henry, 2020; Yu, 2020d; Wickham, 2019b; Yu, 2020e; Gehlenborg, 2019; Xie, 2016; Alfaro et al., 2009; Eastman et al., 2011; Slater et al., 2012; Harmon et al., 2008; Pennell et al., 2014; Wickham, 2016; Yu, 2020f; Yu et al., 2018; Yu et al., 2017; Sievert, 2020; Wickham et al., 2019; Wang et al., 2020). All files necessary to reproduce the data in this manuscript are provided in Source data 1.

Manual verification of duplicate genes

We manually verified the coding potential of the 16 genes shown in Figure 4 by first identifying the reciprocal best (DNA sequence) BLAT hits in the elephant and manatee genomes, which allowed us to determine conservation and presence of premature stop codons in the each open reading frame (ORF). We translated the ORF for each hit into amino acid sequences and grouped up hits for each gene into one FASTA file along with the UniProt protein sequences for the human, dog, cat, and cow orthologs. Using a pipeline hosted at NGPhlyogeny.fr (Lemoine et al., 2019), the homologs were aligned using MAFFT Katoh and Standley, 2013; the aligned sequences were cleaned using BMGE (Criscuolo and Gribaldo, 2010). Finally we used FastME (Lefort et al., 2015) to infer a gene tree for each duplicate. Alignments were then visually inspected for conservation and presence of premature stop codons.

Acknowledgements

We would like to thank Dr. Olga Dudchenko and Dr. Erez Aiden at Baylor College of Medicine for the Hi-C scaffolded Procavia capensis, Trichechus manatus, Orycteropus afer, and Choloepus hoffmannii genomes. We would also like to thank D.H. Vazquez for his indispensable support.

Appendix 1

Summary of duplicate gene annotations associated with Figure 4C,D.

BRD7:

Three manatee copies, but tml_BRD7_3 has PMSs. Overall high sequence similarity.

BUB3:

#manatee=#elephant.

PMS in loxafr #2–3, PMS in triman #2–3.

High AA conservation even in pseudogenes.

Casp9:

Two extra elephant copies are identical in AA sequence but hits only encompass a 47-AA hit that is highly conserved with matching domain in CASP9.

CD14:

#Manatee = #Elephant, all four copies are highly conserved.

CNOT11:

9/11 elephant copies do contain PMS; however, many of these are still mostly full length and highly conserved. The ones with an early PMS are less well conserved.

COX20:

Second elephant copy has two PMSs, but is still well conserved.

HMGB2:

2/4 elephant copies have PMSs, all copies highly conserved. Manatee 2/3 copies have PMS, lower conservation.

LAMTOR5:

Only one elephant copy has a PMS; three very highly conserved copies, others with conserved domains.

LIF:

Manatee's have 4 of 13 copies previously reported in elephants. See Vazquez et al., 2018.

TP53:

Elephant: 9/19 with PMSs. Low N' conservation, but high conservation on C' end. See Sulak et al., 2016.

STK11:

Very highly conserved, with divergence in the second copies in the elephant and manatee.

SOD1:

Highly conserved elephant copies. Manatee copies are only partial hits, with moderate conservation.

PRDX1:

Extremely strong conservation for main copies. Second elephant copy is a partial hit and has some divergence from the main elephant copy.

MAPRE1:

2/3 elephant copies have early PMSs, but with subsequent ATGs. Very high conservation in main and 1/2 duplicate elephant copies, only partial hits on third copy. Very high conversation between manatee copies.

MAD2L1:

7/9 elephant copies have PMSs. 2/6 manatee copies have PMSs. However, all copies are very highly conserved.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Vincent J Lynch, Email: vjlynch@buffalo.edu.

Antonis Rokas, Vanderbilt University, United States.

Patricia J Wittkopp, University of Michigan, United States.

Funding Information

This paper was supported by the following grant:

  • University of Chicago to Juan M Vazquez, Vincent J Lynch.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing.

Conceptualization, Data curation, Formal analysis, Supervision, Investigation, Visualization, Writing - original draft, Project administration, Writing - review and editing.

Additional files

Source data 1. All necessary data sets and scripts to reproduce results presented in this manuscript.
elife-65041-data1.zip (16.1MB, zip)
Supplementary file 1. Summary of duplications in Atlantogenata.
elife-65041-supp1.docx (15.5KB, docx)
Supplementary file 2. RNA-Seq data sets used in this study, along with key biological and genome information.
elife-65041-supp2.docx (13.3KB, docx)
Supplementary file 3. Summary of PGLS model used to estimate lifespan.
elife-65041-supp3.docx (13KB, docx)
Transparent reporting form

Data availability

All data generated or analysed during this study are included in the manuscript and supporting files.

The following previously published datasets were used:

Di Palma F, Alfoldi J, Johnson J, Berlin A, Gnerre S, Jaffe D, MacCallum I, Young S, Walker BJ, Lindblad-Toh K. 2013. Chrysochloris asiatica (Cape golden mole) genome. NCBI Assembly. ChrAsi1.0

Gnerre S, Heiman D, Young S, Fulton L, Delehaunty K, Minx P, Chinwalla A, Mardis E, Wilson R, Warren W. 2018. The Genome Sequence of Choloepus hoffmanni (sloth) NCBI Assembly. C_hoffmanni-2.0.1_HiC

Lindblad-Toh K, Chang JL, Gnerre S, Clamp M, Lander ES. 2012. Dasypus novemcinctus (nine-banded armadillo) genome. NCBI Assembly. Dasnov3.0

Di Palma F, Alfoldi J, Johnson J, Berlin A, Gnerre S, Jaffe D, MacCallum I, Young S, Walker BJ, Lindblad-Toh K. 2013. Echinops telfairi (small Madagascar hedgehog) NCBI Assembly. EchTel2.0

Di Palma F, Alfoldi J, Johnson J, Berlin A, Gnerre S, Jaffe D, MacCallum I, Young S, Walker BJ, Lindblad-Toh K. 2013. Elephantulus edwardii (Cape elephant shrew) NCBI Assembly. EleEdw1.0

Palkopoulou E, Lipson M, Mallick S, Nielsen S, Rohland N, Baleka S, Karpinski E, Ivancevic AM, To TH, Kortschak RD, Raison JM, Qu Z, Chin TJ, Alt KW, Claesson S, Dalén L, MacPhee RDE, Meller H, Roca AL, Ryder OA, Heiman D, Young S, Breen M, Williams C, Aken BL, Ruffier M, Karlsson E, Johnson J, Di Palma F, Alfoldi J, Adelson DL, Mailund T, Munch K, Lindblad-Toh K, Hofreiter M, Poinar H, Reich D. 2018. A comprehensive genomic history of extinct and living elephants. NCBI Assembly. PRJEB24361

Di Palma F, Alfoldi J, Johnson J, Berlin A, Gnerre S, Jaffe D, MacCallum I, Young S, Walker BJ, Lindblad-Toh K. 2017. Aardvark (Orycteropus afer) genome. NCBI Assembly. oryAfe2

Di Palma F, Alfoldi J, Johnson J, Berlin A, Gnerre S, Jaffe D, MacCallum I, Young S, Walker BJ, Lindblad-Toh K. 2013. A comprehensive genomic history of extinct and living elephants. NCBI Assembly. proCap-Pcap_2.0_HiC

Andrew DF, Liu Y, Thomas GWC, Vinar T, Alfoldi J, Deng J, Dugan S. 2015. West Indian manatee (Trichechus manatus) genome. NCBI Assembly. triManLat2

References

  1. Abegglen LM, Caulin AF, Chan A, Lee K, Robinson R, Campbell MS, Kiso WK, Schmitt DL, Waddell PJ, Bhaskara S, Jensen ST, Maley CC, Schiffman JD. Potential mechanisms for Cancer resistance in elephants and comparative cellular response to DNA damage in humans. Jama. 2015;314:1850–1860. doi: 10.1001/jama.2015.13134. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Alfaro ME, Santini F, Brock C, Alamillo H, Dornburg A, Rabosky DL, Carnevale G, Harmon LJ. Nine exceptional radiations plus high turnover explain species diversity in jawed vertebrates. PNAS. 2009;106:13410–13414. doi: 10.1073/pnas.0811087106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Altenhoff AM, Dessimoz C. Phylogenetic and functional assessment of orthologs inference projects and methods. PLOS Computational Biology. 2009;5:e1000262. doi: 10.1371/journal.pcbi.1000262. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Armitage P. Multistage models of carcinogenesis. Environmental Health Perspectives. 1985;63:195–201. doi: 10.1289/ehp.8563195. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Armitage P, Doll R. The age distribution of Cancer and a multi-stage theory of carcinogenesis. British Journal of Cancer. 2004;91:1983–1989. doi: 10.1038/sj.bjc.6602297. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Ashur-Fabian O, Avivi A, Trakhtenbrot L, Adamsky K, Cohen M, Kajakaro G, Joel A, Amariglio N, Nevo E, Rechavi G. Evolution of p53 in hypoxia-stressed Spalax mimics human tumor mutation. PNAS. 2004;101:12236–12241. doi: 10.1073/pnas.0404998101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Bache SM, Wickham H. Magrittr: A Forward-Pipe Operator for R. 2014 https://CRAN.R-project.org/package=magrittr
  8. Bininda-Emonds OR, Cardillo M, Jones KE, MacPhee RD, Beck RM, Grenyer R, Price SA, Vos RA, Gittleman JL, Purvis A. The delayed rise of present-day mammals. Nature. 2007;446:507–512. doi: 10.1038/nature05634. [DOI] [PubMed] [Google Scholar]
  9. Bininda-Emonds ORP, Cardillo M, Jones KE, MacPhee RDE, Beck RMD, Grenyer R, Price SA, Vos RA, Gittleman JL, Purvis A. Erratum: the delayed rise of present-day mammals. Nature. 2008;456:274. doi: 10.1038/nature07347. [DOI] [PubMed] [Google Scholar]
  10. Boddy AM, Abegglen LM, Pessier AP, Aktipis A, Schiffman JD, Maley CC, Witte C. Lifetime Cancer prevalence and life history traits in mammals. Evolution, Medicine, and Public Health. 2020;2020:187–195. doi: 10.1093/emph/eoaa015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Bolker B, Robinson D. Broom.mixed: Tidying Methods for Mixed Models. 2020 https://CRAN.R-project.org/package=broom.mixed
  12. Burrows AE, Smogorzewska A, Elledge SJ. Polybromo-associated BRG1-associated factor components BRD7 and BAF180 are critical regulators of p53 required for induction of replicative senescence. PNAS. 2010;107:14280–14285. doi: 10.1073/pnas.1009559107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Campitelli E. Ggnewscale: Multiple Fill and Colour Scales In’ Ggplot2’. 2020 https://CRAN.R-project.org/package=ggnewscale
  14. Caulin AF, Graham TA, Wang LS, Maley CC. Solutions to Peto's paradox revealed by mathematical modelling and cross-species cancer gene analysis. Philosophical Transactions of the Royal Society B: Biological Sciences. 2015;370:20140222. doi: 10.1098/rstb.2014.0222. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Caulin AF, Maley CC. Peto's Paradox: evolution's prescription for cancer prevention. Trends in Ecology & Evolution. 2011;26:175–182. doi: 10.1016/j.tree.2011.01.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Chen EY, Tan CM, Kou Y, Duan Q, Wang Z, Meirelles GV, Clark NR, Ma'ayan A. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics. 2013;14:128. doi: 10.1186/1471-2105-14-128. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Criscuolo A, Gribaldo S. BMGE (Block mapping and gathering with entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evolutionary Biology. 2010;10:210. doi: 10.1186/1471-2148-10-210. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Dobson JM. Breed-Predispositions to Cancer in pedigree dogs. ISRN Veterinary Science. 2013;2013:1–23. doi: 10.1155/2013/941275. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Doherty A, de Magalhães JP. Has gene duplication impacted the evolution of eutherian longevity? Aging Cell. 2016;15:978–980. doi: 10.1111/acel.12503. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Dorn CR, Taylor DO, Schneider R, Hibbard HH, Klauber MR. Survey of animal neoplasms in alameda and contra costa counties, California. II. Cancer morbidity in dogs and cats from alameda county. Journal of the National Cancer Institute. 1968;40:307–318. [PubMed] [Google Scholar]
  21. Dowle M, Srinivasan A. Data.table: Extension of ‘Data.frame. 2019 https://CRAN.R-project.org/package=data.table
  22. Drost J, Mantovani F, Tocco F, Elkon R, Comel A, Holstege H, Kerkhoven R, Jonkers J, Voorhoeve PM, Agami R, Del Sal G. BRD7 is a candidate tumour suppressor gene required for p53 function. Nature Cell Biology. 2010;12:380–389. doi: 10.1038/ncb2038. [DOI] [PubMed] [Google Scholar]
  23. Dudchenko O, Batra SS, Omer AD, Nyquist SK, Hoeger M, Durand NC, Shamim MS, Machol I, Lander ES, Aiden AP, Aiden EL. De novo assembly of the aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–95. doi: 10.1126/science.aal3327. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Eastman JM, Alfaro ME, Joyce P, Hipp AL, Harmon LJ. A novel comparative method for identifying shifts in the rate of character evolution on trees. Evolution. 2011;65:3578–3589. doi: 10.1111/j.1558-5646.2011.01401.x. [DOI] [PubMed] [Google Scholar]
  25. Elliot MG, Mooers AØ. Inferring ancestral states without assuming neutrality or gradualism using a stable model of continuous character evolution. BMC Evolutionary Biology. 2014;14:226. doi: 10.1186/s12862-014-0226-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Felsenstein J. Phylogenies and the comparative method. The American Naturalist. 1985;125:1–15. doi: 10.1086/284325. [DOI] [PubMed] [Google Scholar]
  27. Foote AD, Liu Y, Thomas GW, Vinař T, Alföldi J, Deng J, Dugan S, van Elk CE, Hunter ME, Joshi V, Khan Z, Kovar C, Lee SL, Lindblad-Toh K, Mancia A, Nielsen R, Qin X, Qu J, Raney BJ, Vijay N, Wolf JB, Hahn MW, Muzny DM, Worley KC, Gilbert MT, Gibbs RA. Convergent evolution of the genomes of marine mammals. Nature Genetics. 2015;47:272–275. doi: 10.1038/ng.3198. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Francis AA, Lee WH, Regan JD. The relationship of DNA excision repair of ultraviolet-induced lesions to the maximum life span of mammals. Mechanisms of Ageing and Development. 1981;16:181–189. doi: 10.1016/0047-6374(81)90094-4. [DOI] [PubMed] [Google Scholar]
  29. Fukuda T, Iino Y, Onuma M, Gen B, Inoue-Murayama M, Kiyono T. Expression of human cell cycle regulators in the primary cell line of the african savannah elephant (loxodonta africana) increases proliferation until senescence, but does not induce immortalization. In Vitro Cellular & Developmental Biology - Animal. 2016;52:20–26. doi: 10.1007/s11626-015-9943-6. [DOI] [PubMed] [Google Scholar]
  30. García-Cao I, García-Cao M, Martín-Caballero J, Criado LM, Klatt P, Flores JM, Weill JC, Blasco MA, Serrano M. "Super p53" mice exhibit enhanced DNA damage response, are tumor resistant and age normally. The EMBO Journal. 2002;21:6225–6235. doi: 10.1093/emboj/cdf595. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Gehlenborg N. UpSetR: A More Scalable Alternative to Venn and Euler Diagrams for Visualizing Intersecting Sets. 2019 https://CRAN.R-project.org/package=UpSetR
  32. Gomes NM, Ryder OA, Houck ML, Charter SJ, Walker W, Forsyth NR, Austad SN, Venditti C, Pagel M, Shay JW, Wright WE. Comparative biology of mammalian telomeres: hypotheses on ancestral states and the roles of telomeres in longevity determination. Aging Cell. 2011;10:761–768. doi: 10.1111/j.1474-9726.2011.00718.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Gorbunova V, Hine C, Tian X, Ablaeva J, Gudkov AV, Nevo E, Seluanov A. Cancer resistance in the blind mole rat is mediated by concerted necrotic cell death mechanism. PNAS. 2012;109:19392–19396. doi: 10.1073/pnas.1217211109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Harmon LJ, Weir JT, Brock CD, Glor RE, Challenger W. GEIGER: investigating evolutionary radiations. Bioinformatics. 2008;24:129–131. doi: 10.1093/bioinformatics/btm538. [DOI] [PubMed] [Google Scholar]
  35. Harmon L, Pennell M, Brock C, Brown J, Challenger W, Eastman J, FitzJohn R, Glor R, Hunt G, Revell L, Slater G, Uyeda J, Weir J. Geiger: Analysis of Evolutionary Diversification. 2020 https://CRAN.R-project.org/package=geiger
  36. Hart RW, Setlow RB. Correlation between deoxyribonucleic acid excision-repair and life-span in a number of mammalian species. PNAS. 1974;71:2169–2173. doi: 10.1073/pnas.71.6.2169. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Henry L, Wickham H. Purrr: Functional Programming Tools. 2020 https://CRAN.R-project.org/package=purrr
  38. Hlavac M. Stargazer: Well-Formatted Regression and Summary Statistics Tables. 2018 https://CRAN.R-project.org/package=stargazer
  39. Hoang DT, Chernomor O, von Haeseler A, Minh BQ, Vinh LS. UFBoot2: improving the ultrafast bootstrap approximation. Molecular Biology and Evolution. 2018;35:518–522. doi: 10.1093/molbev/msx281. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Jassal B, Matthews L, Viteri G, Gong C, Lorente P, Fabregat A, Sidiropoulos K, Cook J, Gillespie M, Haw R, Loney F, May B, Milacic M, Rothfels K, Sevilla C, Shamovsky V, Shorser S, Varusai T, Weiser J, Wu G, Stein L, Hermjakob H, D'Eustachio P. The reactome pathway knowledgebase. Nucleic Acids Research. 2020;48:D498–D503. doi: 10.1093/nar/gkz1031. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. ModelFinder: fast model selection for accurate phylogenetic estimates. Nature Methods. 2017;14:587–589. doi: 10.1038/nmeth.4285. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Kassambara A. Ggpubr:’ Ggplot2’ Based Publication Ready Plots. 2020 https://CRAN.R-project.org/package=ggpubr
  43. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular Biology and Evolution. 2013;30:772–780. doi: 10.1093/molbev/mst010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Katzourakis A, Magiorkinis G, Lim AG, Gupta S, Belshaw R, Gifford R. Larger mammalian body size leads to lower retroviral activity. PLOS Pathogens. 2014;10:e1004214. doi: 10.1371/journal.ppat.1004214. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Kent WJ. BLAT--the BLAST-like alignment tool. Genome Research. 2002;12:656–664. doi: 10.1101/gr.229202. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nature Methods. 2015;12:357–360. doi: 10.1038/nmeth.3317. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Larramendi A. Shoulder height, body mass, and shape of proboscideans. Acta Palaeontologica Polonica. 2015;61:2014. doi: 10.4202/app.00136.2014. [DOI] [Google Scholar]
  48. Launonen V. Mutations in the human LKB1/STK11 gene. Human Mutation. 2005;26:291–297. doi: 10.1002/humu.20222. [DOI] [PubMed] [Google Scholar]
  49. Lefort V, Desper R, Gascuel O. FastME 2.0: a comprehensive, accurate, and fast Distance-Based phylogeny inference program. Molecular Biology and Evolution. 2015;32:2798–2800. doi: 10.1093/molbev/msv150. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Lemoine F, Correia D, Lefort V, Doppelt-Azeroual O, Mareuil F, Cohen-Boulakia S, Gascuel O. Ngphylogeny.fr: new generation phylogenetic services for non-specialists. Nucleic Acids Research. 2019;47:W260–W265. doi: 10.1093/nar/gkz303. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Leroi AM, Koufopanou V, Burt A. Cancer selection. Nature Reviews Cancer. 2003;3:226–231. doi: 10.1038/nrc1016. [DOI] [PubMed] [Google Scholar]
  52. Liang J, Mills GB. AMPK: a contextual oncogene or tumor suppressor? Cancer Research. 2013;73:2929–2935. doi: 10.1158/0008-5472.CAN-12-3876. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Liao Y, Wang J, Jaehnig EJ, Shi Z, Zhang B. WebGestalt 2019: gene set analysis toolkit with revamped UIs and APIs. Nucleic Acids Research. 2019;47:W199–W205. doi: 10.1093/nar/gkz401. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Lindblad-Toh K, Garber M, Zuk O, Lin MF, Parker BJ, Washietl S, Kheradpour P, Ernst J, Jordan G, Mauceli E, Ward LD, Lowe CB, Holloway AK, Clamp M, Gnerre S, Alföldi J, Beal K, Chang J, Clawson H, Cuff J, Di Palma F, Fitzgerald S, Flicek P, Guttman M, Hubisz MJ, Jaffe DB, Jungreis I, Kent WJ, Kostka D, Lara M, Martins AL, Massingham T, Moltke I, Raney BJ, Rasmussen MD, Robinson J, Stark A, Vilella AJ, Wen J, Xie X, Zody MC, Baldwin J, Bloom T, Chin CW, Heiman D, Nicol R, Nusbaum C, Young S, Wilkinson J, Worley KC, Kovar CL, Muzny DM, Gibbs RA, Cree A, Dihn HH, Fowler G, Jhangiani S, Joshi V, Lee S, Lewis LR, Nazareth LV, Okwuonu G, Santibanez J, Warren WC, Mardis ER, Weinstock GM, Wilson RK, Delehaunty K, Dooling D, Fronik C, Fulton L, Fulton B, Graves T, Minx P, Sodergren E, Birney E, Margulies EH, Herrero J, Green ED, Haussler D, Siepel A, Goldman N, Pollard KS, Pedersen JS, Lander ES, Kellis M, Broad Institute Sequencing Platform and Whole Genome Assembly Team. Baylor College of Medicine Human Genome Sequencing Center Sequencing Team. Genome Institute at Washington University A high-resolution map of human evolutionary constraint using 29 mammals. Nature. 2011;478:476–482. doi: 10.1038/nature10530. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Lucena RB, Rissi DR, Kommers GD, Pierezan F, Oliveira-Filho JC, Macêdo JT, Flores MM, Barros CS. A retrospective study of 586 tumours in brazilian cattle. Journal of Comparative Pathology. 2011;145:20–24. doi: 10.1016/j.jcpa.2010.11.002. [DOI] [PubMed] [Google Scholar]
  56. Martins EP, Hansen TF. Phylogenies and the comparative method: a general approach to incorporating phylogenetic information into the analysis of interspecific data. The American Naturalist. 1997;149:646–667. doi: 10.1086/286013. [DOI] [Google Scholar]
  57. Million Women Study collaborators. Green J, Cairns BJ, Casabonne D, Wright FL, Reeves G, Beral V. Height and Cancer incidence in the million women study: prospective cohort, and meta-analysis of prospective studies of height and total Cancer risk. The Lancet Oncology. 2011;12:785–794. doi: 10.1016/S1470-2045(11)70154-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, Lanfear R. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Molecular Biology and Evolution. 2020;37:1530–1534. doi: 10.1093/molbev/msaa015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Müller K, Wickham H. Tibble: Simple Data Frames. 2020 https://CRAN.R-project.org/package=tibble
  60. Nagy JD, Victor EM, Cropper JH. Why don't all whales have Cancer? A novel hypothesis resolving Peto's paradox. Integrative and Comparative Biology. 2007;47:317–328. doi: 10.1093/icb/icm062. [DOI] [PubMed] [Google Scholar]
  61. Nishimura O, Hara Y, Kuraku S. gVolante for standardizing completeness assessment of genome and transcriptome assemblies. Bioinformatics. 2017;33:3635–3637. doi: 10.1093/bioinformatics/btx445. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Nunney L. Lineage selection and the evolution of multistage carcinogenesis. Proceedings of the Royal Society of London. Series B: Biological Sciences. 1999;266:493–498. doi: 10.1098/rspb.1999.0664. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Nunney L. Size matters: height, cell number and a person's risk of cancer. Proceedings of the Royal Society B: Biological Sciences. 2018;285:20181743. doi: 10.1098/rspb.2018.1743. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. O Leary MA, Bloch JI, Flynn JJ, Gaudin TJ, Giallombardo A, Giannini NP, Goldberg SL, Kraatz BP, Luo ZX, Meng J, Ni X, Novacek MJ, Perini FA, Randall ZS, Rougier GW, Sargis EJ, Silcox MT, Simmons NB, Spaulding M, Velazco PM, Weksler M, Wible JR, Cirranello AL. The placental mammal ancestor and the post-K-Pg radiation of placentals. Science. 2013a;339:662–667. doi: 10.1126/science.1229237. [DOI] [PubMed] [Google Scholar]
  65. O Leary MA, Bloch JJ, Flynn TJ, Gaudin A, Giallombardo NP, Giannini SL, Goldberg BPK, Luo Z-X, Meng J, Ni X, Novacek MJ, Perini FA, Randall Z, Rougier GW, Sargis EJ, Silcox MT, Simmons NB, Spaulding M, Velazco PM, Weksler M, Wible JR, Cirranello AL. Response to comment on "The placental mammal ancestor and the post-K-Pg radiation of placentals". Science. 2013b;341:613. doi: 10.1126/science.1238162. [DOI] [PubMed] [Google Scholar]
  66. Ooms J. Magick: Advanced Graphics and Image-Processing in R. 2020 https://CRAN.R-project.org/package=magick
  67. Palkopoulou E, Mallick S, Skoglund P, Enk J, Rohland N, Li H, Omrak A, Vartanyan S, Poinar H, Götherström A, Reich D, Dalén L. Complete genomes reveal signatures of demographic and genetic declines in the woolly mammoth. Current Biology. 2015;25:1395–1400. doi: 10.1016/j.cub.2015.04.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Palkopoulou E, Lipson M, Mallick S, Nielsen S, Rohland N, Baleka S, Karpinski E, Ivancevic AM, To TH, Kortschak RD, Raison JM, Qu Z, Chin TJ, Alt KW, Claesson S, Dalén L, MacPhee RDE, Meller H, Roca AL, Ryder OA, Heiman D, Young S, Breen M, Williams C, Aken BL, Ruffier M, Karlsson E, Johnson J, Di Palma F, Alfoldi J, Adelson DL, Mailund T, Munch K, Lindblad-Toh K, Hofreiter M, Poinar H, Reich D. A comprehensive genomic history of extinct and living elephants. PNAS. 2018;115:E2566–E2574. doi: 10.1073/pnas.1720554115. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Paradis E, Blomberg S, Bolker B, Brown J, Claramunt S, Claude J, Cuong HS, Desper R, Didier G, Durand B, Dutheil J, Ewing R, Gascuel O, Guillerme T, Heibl C, Ives A, Jones B, Krah F, Lawson D, Lefort V, Legendre P, Lemon J, Louvel G, Marcon E, McCloskey R, Nylander J, Opgen-Rhein R, Popescu A-A, Royer-Carenzi M, Schliep K, Strimmer K, de Vienne D. Ape: Analyses of Phylogenetics and Evolution. 2020 https://CRAN.R-project.org/package=ape
  70. Paradis E, Schliep K. Ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics. 2019;35:526–528. doi: 10.1093/bioinformatics/bty633. [DOI] [PubMed] [Google Scholar]
  71. Parra G, Bradnam K, Ning Z, Keane T, Korf I. Assessing the gene space in draft genomes. Nucleic Acids Research. 2009;37:289–297. doi: 10.1093/nar/gkn916. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Pennell MW, Eastman JM, Slater GJ, Brown JW, Uyeda JC, FitzJohn RG, Alfaro ME, Harmon LJ. Geiger v2.0: an expanded suite of methods for fitting macroevolutionary models to phylogenetic trees. Bioinformatics. 2014;30:2216–2218. doi: 10.1093/bioinformatics/btu181. [DOI] [PubMed] [Google Scholar]
  73. Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology. 2015;33:290–295. doi: 10.1038/nbt.3122. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Pertea M, Kim D, Pertea GM, Leek JT, Salzberg SL. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and ballgown. Nature Protocols. 2016;11:1650–1667. doi: 10.1038/nprot.2016.095. [DOI] [PMC free article] [PubMed] [Google Scholar]
  75. Peto R, Roe FJ, Lee PN, Levy L, Clack J. Cancer and ageing in mice and men. British Journal of Cancer. 1975;32:411–426. doi: 10.1038/bjc.1975.242. [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. Peto R. Quantitative implications of the approximate irrelevance of mammalian body size and lifespan to lifelong Cancer risk. Philosophical Transactions of the Royal Society B: Biological Sciences. 2015;370:20150198. doi: 10.1098/rstb.2015.0198. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Pinheiro J, Bates D. R-Core, Nlme: Linear and Nonlinear Mixed Effects Models. 2020 https://CRAN.R-project.org/package=nlme
  78. Puttick MN, Thomas GH. Fossils and living taxa agree on patterns of body mass evolution: a case study with afrotheria. Proceedings of the Royal Society B: Biological Sciences. 2015;282:20152023. doi: 10.1098/rspb.2015.2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  79. Qian W, Zhang J. Genomic evidence for adaptation by gene duplication. Genome Research. 2014;24:1356–1362. doi: 10.1101/gr.172098.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  80. R Development Core Team . Vienna, Austria: R Foundation for Statistical Computing; 2019. https://www.R-project.org/ [Google Scholar]
  81. Rastogi S, Liberles DA. Subfunctionalization of duplicated genes as a transition state to neofunctionalization. BMC Evolutionary Biology. 2005;5:28. doi: 10.1186/1471-2148-5-28. [DOI] [PMC free article] [PubMed] [Google Scholar]
  82. Salichos L, Rokas A. Evaluating ortholog prediction algorithms in a yeast model clade. PLOS ONE. 2011;6:e18755. doi: 10.1371/journal.pone.0018755. [DOI] [PMC free article] [PubMed] [Google Scholar]
  83. Savage VM, Allen AP, Brown JH, Gillooly JF, Herman AB, Woodruff WH, West GB. Scaling of number, size, and metabolic rate of cells with body size in mammals. PNAS. 2007;104:4718–4723. doi: 10.1073/pnas.0611235104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  84. Scheffer VB. The weight of the steller sea cow. Journal of Mammalogy. 1972;53:912–914. doi: 10.2307/1379236. [DOI] [Google Scholar]
  85. Schrempf D, Minh BQ, von Haeseler A, Kosiol C. Polymorphism-Aware species trees with advanced mutation models, bootstrap, and rate heterogeneity. Molecular Biology and Evolution. 2019;36:1294–1301. doi: 10.1093/molbev/msz043. [DOI] [PMC free article] [PubMed] [Google Scholar]
  86. Schwartz GT, Rasmussen DT, Smith RJ. Body-Size diversity and community structure of fossil hyracoids. Journal of Mammalogy. 1995;76:1088–1099. doi: 10.2307/1382601. [DOI] [Google Scholar]
  87. Seluanov A, Hine C, Bozzella M, Hall A, Sasahara TH, Ribeiro AA, Catania KC, Presgraves DC, Gorbunova V. Distinct tumor suppressor mechanisms evolve in rodent species that differ in size and lifespan. Aging Cell. 2008;7:813–823. doi: 10.1111/j.1474-9726.2008.00431.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  88. Sievert C, Parmer C, Hocking T, Chamberlain S, Ram K, Corvellec M, Despouy P. Plotly: Create Interactive Web Graphics Via’ plotly.js’. 2020 https://CRAN.R-project.org/package=plotly
  89. Sievert C. Interactive Web-Based Data Visualization with R, Plotly, and Shiny. Chapman: Hall/CRC; 2020. [Google Scholar]
  90. Slater GJ, Harmon LJ, Wegmann D, Joyce P, Revell LJ, Alfaro ME. Fitting models of continuous trait evolution to incompletely sampled comparative data using approximate bayesian computation. Evolution. 2012;66:752–762. doi: 10.1111/j.1558-5646.2011.01474.x. [DOI] [PubMed] [Google Scholar]
  91. Slowikowski K. Ggrepel: Automatically Position Non-Overlapping Text Labels With’ Ggplot2’. 2020 https://CRAN.R-project.org/package=ggrepel
  92. Soubrier J, Steel M, Lee MS, Der Sarkissian C, Guindon S, Ho SY, Cooper A. The influence of rate heterogeneity among sites on the time dependence of molecular rates. Molecular Biology and Evolution. 2012;29:3345–3358. doi: 10.1093/molbev/mss140. [DOI] [PubMed] [Google Scholar]
  93. Springer MS, Meredith RW, Teeling EC, Murphy WJ. Technical comment on "The placental mammal ancestor and the post-K-Pg radiation of placentals". Science. 2013;341:613.2–61613. doi: 10.1126/science.1238025. [DOI] [PubMed] [Google Scholar]
  94. Stoltzfus A. On the possibility of constructive neutral evolution. Journal of Molecular Evolution. 1999;49:169–181. doi: 10.1007/PL00006540. [DOI] [PubMed] [Google Scholar]
  95. Sulak M, Fong L, Mika K, Chigurupati S, Yon L, Mongan NP, Emes RD, Lynch VJ. TP53 copy number expansion is associated with the evolution of increased body size and an enhanced DNA damage response in elephants. eLife. 2016;5:e11994. doi: 10.7554/eLife.11994. [DOI] [PMC free article] [PubMed] [Google Scholar]
  96. Tacutu R, Craig T, Budovsky A, Wuttke D, Lehmann G, Taranukha D, Costa J, Fraifeld VE, de Magalhães JP. Human ageing genomic resources: integrated databases and tools for the biology and genetics of ageing. Nucleic Acids Research. 2013;41:D1027–D1033. doi: 10.1093/nar/gks1155. [DOI] [PMC free article] [PubMed] [Google Scholar]
  97. The UniProt Consortium UniProt: the universal protein knowledgebase. Nucleic Acids Research. 2017;45:D158–D169. doi: 10.1093/nar/gkw1099. [DOI] [PMC free article] [PubMed] [Google Scholar]
  98. Tian X, Azpurua J, Hine C, Vaidya A, Myakishev-Rempel M, Ablaeva J, Mao Z, Nevo E, Gorbunova V, Seluanov A. High-molecular-mass hyaluronan mediates the Cancer resistance of the naked mole rat. Nature. 2013;499:346–349. doi: 10.1038/nature12234. [DOI] [PMC free article] [PubMed] [Google Scholar]
  99. Tollis M, Ferris E, Campbell MS, Harris VK, Rupp SM, Harrison TM, Kiso WK, Schmitt DL, Garner MM, Aktipis CA, Maley CC, Boddy AM, Yandell M, Gregg C, Schiffman JD, Abegglen LM. Elephant genomes reveal insights into differences in disease defense mechanisms between species. bioRxiv. 2020 doi: 10.1101/2020.05.29.124396. [DOI] [PMC free article] [PubMed]
  100. Tyner SD, Venkatachalam S, Choi J, Jones S, Ghebranious N, Igelmann H, Lu X, Soron G, Cooper B, Brayton C, Park SH, Thompson T, Karsenty G, Bradley A, Donehower LA. p53 mutant mice that display early ageing-associated phenotypes. Nature. 2002;415:45–53. doi: 10.1038/415045a. [DOI] [PubMed] [Google Scholar]
  101. Vazquez JM, Sulak M, Chigurupati S, Lynch VJ. A zombie LIF gene in elephants is upregulated by TP53 to induce apoptosis in response to DNA damage. Cell Reports. 2018;24:1765–1776. doi: 10.1016/j.celrep.2018.07.042. [DOI] [PubMed] [Google Scholar]
  102. Vazquez JM, Lynch VJ. Pervasive duplication of tumor suppressors in Afrotherians during the evolution of large bodies and reduced cancer risk. swh:1:rev:6bc68ac31ef148131480710e50b0b75d06077db2Software Heritage. 2021 doi: 10.7554/eLife.65041. https://archive.softwareheritage.org/swh:1:dir:ae8a48052c5d5e4e3757e45dc5d632d07c443e3b;origin=https://github.com/docmanny/atlantogenataGeneDuplication;visit=swh:1:snp:d095eb620de426a079259bc6c9f6e8ebd57daf06;anchor=swh:1:rev:6bc68ac31ef148131480710e50b0b75d06077db2/ [DOI] [PMC free article] [PubMed]
  103. Wang HC, Minh BQ, Susko E, Roger AJ. Modeling site heterogeneity with posterior mean site frequency profiles accelerates accurate phylogenomic estimation. Systematic Biology. 2018;67:216–235. doi: 10.1093/sysbio/syx068. [DOI] [PubMed] [Google Scholar]
  104. Wang LG, Lam TT, Xu S, Dai Z, Zhou L, Feng T, Guo P, Dunn CW, Jones BR, Bradley T, Zhu H, Guan Y, Jiang Y, Yu G. Treeio: an R package for phylogenetic tree input and output with richly annotated and associated data. Molecular Biology and Evolution. 2020;37:599–603. doi: 10.1093/molbev/msz240. [DOI] [PMC free article] [PubMed] [Google Scholar]
  105. Wickham H. Ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag; 2016. [Google Scholar]
  106. Wickham H, Hester J, Francois R. Readr: Read Rectangular Text Data. 2018 https://CRAN.R-project.org/package=readr
  107. Wickham H, Averick M, Bryan J, Chang W, McGowan L, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen T, Miller E, Bache S, Müller K, Ooms J, Robinson D, Seidel D, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H. Welcome to the tidyverse. Journal of Open Source Software. 2019;4:1686. doi: 10.21105/joss.01686. [DOI] [Google Scholar]
  108. Wickham H. Stringr: Simple, Consistent Wrappers for Common String Operations. 2019a https://CRAN.R-project.org/package=stringr
  109. Wickham H. Tidyverse: Easily Install and Load The’ Tidyverse. 2019b https://CRAN.R-project.org/package=tidyverse
  110. Wickham H. Forcats: Tools for Working with Categorical Variables (Factors) 2020 https://CRAN.R-project.org/package=forcats
  111. Wickham H, François R, Henry L, Müller K. Dplyr: A Grammar of Data Manipulation. 2020a https://CRAN.R-project.org/package=dplyr
  112. Wickham H, Chang W, Henry L, Pedersen TL, Takahashi K, Wilke C, Woo K, Yutani H, Dunnington D. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. 2020b https://CRAN.R-project.org/package=ggplot2
  113. Wickham H, Henry L. Tidyr: Tidy Messy Data. 2020 https://CRAN.R-project.org/package=tidyr
  114. Xiao N. Ggsci: Scientific Journal and Sci-Fi Themed Color Palettes For’ Ggplot2’. 2018 https://CRAN.R-project.org/package=ggsci
  115. Xie Y. Bookdown: Authoring Books and Technical Documents with R Markdown. Florida: Hall/CRC; 2016. [Google Scholar]
  116. Xie Y. Bookdown: Authoring Books and Technical Documents with R Markdown. 2020 https://CRAN.R-project.org/package=bookdown
  117. Yang Z, Kumar S, Nei M. A new method of inference of ancestral nucleotide and amino acid sequences. Genetics. 1995;141:1641–1650. doi: 10.1093/genetics/141.4.1641. [DOI] [PMC free article] [PubMed] [Google Scholar]
  118. Yu G, Smith DK, Zhu H, Guan Y. Ggtree : an r package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution. 2017;8:28–36. doi: 10.1111/2041-210X.12628. [DOI] [Google Scholar]
  119. Yu G, Lam TT, Zhu H, Guan Y. Two methods for mapping and visualizing associated data on phylogeny using Ggtree. Molecular Biology and Evolution. 2018;35:3041–3043. doi: 10.1093/molbev/msy194. [DOI] [PMC free article] [PubMed] [Google Scholar]
  120. Yu G. Ggimage: Use Image In’ Ggplot2’. 2020a https://CRAN.R-project.org/package=ggimage
  121. Yu G. Ggplotify: Convert Plot To’ Grob’ Or’ Ggplot’ Object. 2020b https://CRAN.R-project.org/package=ggplotify
  122. Yu G. Tidytree: A Tidy Tool for Phylogenetic Tree Data Manipulation. 2020d https://yulab-smu.github.io/treedata-book/
  123. Yu G. Treeio: Base Classes and Functions for Phylogenetic Tree Input and Output. 2020e https://rdrr.io/bioc/treeio/
  124. Yu G. Using ggtree to Visualize Data on Tree‐Like Structures. Current Protocols in Bioinformatics. 2020f;69:e96. doi: 10.1002/cpbi.96. [DOI] [PubMed] [Google Scholar]
  125. Yu G, Lam TT. Ggtree: An R Package for Visualization of Tree and Annotation Data. 2020c doi: 10.1002/imt2.56. https://yulab-smu.github.io/treedata-book/ [DOI] [PMC free article] [PubMed]
  126. Yutani H. Gghighlight: Highlight Lines and Points In’ Ggplot2’. 2020 https://CRAN.R-project.org/package=gghighlight
  127. Zhu H. KableExtra: Construct Complex Table With’ Kable’ and Pipe Syntax. 2019 https://CRAN.R-project.org/package=kableExtra

Decision letter

Editor: Antonis Rokas1
Reviewed by: Stephen C Stearns2, Vera Gorbunova3

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

The study couples careful phylogenetic work with genomics to demonstrate that all Afrotherians are equal, but some (those with large body sizes) are more equal than others with respect to genetic mechanisms that reduce cancer risk. This is a significant advance in our understanding of the genetic evolution of cancer risk with body size, and in so doing it considerably lengthens the list of candidate genes that contribute to tumor suppression in Afrotherian mammals.

Decision letter after peer review:

Thank you for submitting your article "Pervasive duplication of tumor suppressors in Afrotherians during the evolution of large bodies and reduced cancer risk" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Patricia Wittkopp as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Stephen C Stearns (Reviewer #1); Vera Gorbunova (Reviewer #2). Please note the changes in our policy on revisions we have made in response to COVID-19 (https://elifesciences.org/articles/57162).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This study addresses the question of whether duplication of tumor suppressor genes occurred coincidently with the enlarged body size of Afrotherian mammals. By reconstructing ancestral body sizes, cancer risks and gene duplication events across the Afrotherian phylogeny, the study shows that both increased body sizes and reduced cancer risks evolved gradually. Reactome pathway enrichment analysis for gene duplicates showed that gene duplicates in both lineages with or without major increases in body size / lifespan / decreases in cancer risk are enriched in many cancer related pathways. However, the authors found that 157 genes duplicated in Proboscidean stem-lineage, in which extremely large species evolved, were uniquely enriched in 12 cancer pathways. These genes might facilitate the further body enlargement and cancer resistance evolution in Proboscideans. Most interestingly, the authors found that several genes both upstream and downstream of a famous tumor suppressor TP53 have also been duplicated, either before or after initial TP53 duplication. These genes are involved in transcriptional regulation of TP53 and may have facilitated re-functionalization of TP53 retroduplicates. Overall, this is an important and interesting study that can help us understand the evolution of body size, lifespan and cancer risk in mammals more deeply.

Revisions:

1) In general, the evolutionary fate of gene duplication includes: 1) Conservation of gene function; 2) Neofunctionalization; 3) Pseudogenization; 4) Subfunctionalization (doi:10.1016/S01695347(03)00033-8). To execute the function of tumor suppression, as this study focused on, gene duplicates were supposed to be functionally conserved or subfunctionalized. Gene duplicates that have been neofunctionalized or pseudogenized will not be helpful (also mentioned by authors in the Caveats section). Therefore, it might be more convincing to investigate the functional status of each gene duplicate, especially those in Figure 4C/D. In many cases, however, a related function, rather than an entirely new function, evolves by neofunctionalization after gene duplication, and also that to check new functions for a batch of genes is not realistic, the authors could simply check the coding sequences to ensure these genes duplicates are not pseudogenes and are functional. This is necessary because in Figure 4D, many genes have only 2 copies expressed. If one of them is a young pseudogene, it could be stochastically expressed and will encode a dysfunctional protein.

2) In Results section 3, the cancer pathway frequency data of many nodes seem to not be consistent with data shown in Table 2. For example, "55.8% (29/52) of the pathways that were enriched in the Tethytherian stem-lineage…, 27.8% (20/72) of the pathways that were enriched in the Proboscidean stem-lineage…were related to tumor suppression", the cancer pathway percentages shown in Table 2 for these 2 nodes are 63.4% and 38.81%, respectively. While the frequency data in Table 2 are consistent with Supplementary file 3. It is possible that the frequency data shown in the main text are specific to pathways of tumor suppression, rather than cancer related pathways. If this is the case, more detailed data should be shown somewhere else.

3) The titles of Results section 3 and section 4 are highly similar and actually the data in section 4 seems to be used to further solidify the conclusion of section 3. Therefore, is it possible to merge them into one single section?

4) A major concern with the paper is that while the methods are very well-thought out, there are some unresolvable limitations to the data available. As it stands, the manuscript does not sufficiently guide the reader through how these issues might influence the findings. One example of this problem is in the estimation of cancer risk. The risk is estimated on the basis of body size and life span. However, that lifespan is itself phylogenetically estimated from body size at least for the non-extant species. It is not clear from the manuscript whether all lifespans are so estimated, or whether observations are used for the lifespan of the extant species. If the later, caution is indicated, because lifespan data are highly uneven and often given as observed maximal lifespans, which can be misleading if taken from, for instance, zoo specimens. In either case, the manuscript needs to more clearly emphasize that these are statistically-predicted risks, not measured risks.

5) On a broader note, the authors have done their best with a dataset that suffers from a couple of problems. First, all of the extant very large-bodied animals form a single clade, with the hyrax as the sole small-bodied member of that clade. And since the titanohyrax is extinct, among the extant organisms (an available large-bodies species with genomes) there is then a true large-bodied clade of the Sirenia and elephants and relatives. We understand that other evolutionary data make it clear that these represent two (three including titanohyrax) independent transitions to large-body sizes. But with only the modern or nearly modern genomes to work with, we are not sure that the duplication inference procedures and their coupling to the body size analysis statistically represent more than a single observation (e.g., a default of a single transition to large size along the tethytheria branch).

6) Similarly, the authors observe what appears to be a number of independent duplications of tumor suppressors in African and Asian elephants: duplications that are lacking in many of the ancient genomes considered. The authors used rigorous statistical methods to correct for the fragmented nature of these ancient genomes, but it is very hard not to wonder if some of the data in Figure 4 is really not an artifact of using ancient genomes, where detecting recent gene duplications may be very difficult (several of the Asian and African elephant duplications in Figure 4 appear to be of the same genes). If these events are truly independent and not genome assembly/annotation artifacts, there is then an alternative hypothesis to propose. Thus, are the authors suggesting that there is a rapid turnover in the duplication of tumor suppressors, such that all elephants have such duplicates, but the particular duplications have short life spans and differ from species to species?

7) It would be nice to see a few more comments on the manatee genome and why it does (or doesn't) show the expected patterns for the genome evolution in the face of the evolution of larger body sizes.

8) Figures 3 and 4 would benefit from greatly expanded captions: we do not fully understand what is being illustrated in, for instance, Figure 3B-why are certain dots connected with lines? Intersections between what in the y-axis label?

eLife. 2021 Jan 29;10:e65041. doi: 10.7554/eLife.65041.sa2

Author response


Revisions:

1) In general, the evolutionary fate of gene duplication includes: 1) Conservation of gene function; 2) Neofunctionalization; 3) Pseudogenization; 4) Subfunctionalization (doi:10.1016/S01695347(03)00033-8). To execute the function of tumor suppression, as this study focused on, gene duplicates were supposed to be functionally conserved or subfunctionalized. Gene duplicates that have been neofunctionalized or pseudogenized will not be helpful (also mentioned by authors in the Caveats section). Therefore, it might be more convincing to investigate the functional status of each gene duplicate, especially those in Figure 4C/D. In many cases, however, a related function, rather than an entirely new function, evolves by neofunctionalization after gene duplication, and also that to check new functions for a batch of genes is not realistic, the authors could simply check the coding sequences to ensure these genes duplicates are not pseudogenes and are functional. This is necessary because in Figure 4D, many genes have only 2 copies expressed. If one of them is a young pseudogene, it could be stochastically expressed and will encode a dysfunctional protein.

We agree with the reviewers that this is a very important concern. We assume that gene duplicates either maintain ancestral tumor suppressor functions, and increase cancer resistance though dosage effects or by providing redundancy to loss of function mutations thereby increasing robustness of tumor suppression, or subfunctionalize expression domains.

However, we are wary of excluding genes from our analyses that have premature stop codons and other sequence features of pseudogenes because such genes may maintain tumor suppressor functions. For example, we have identified duplicate TP53 retrogenes (TP53TRGs) in elephants, all of which encode premature stop codons suggesting they are pseudogenes. However, these TP53TRGs are expressed, encode functional separation of function mutants of the ancestral TP53 gene, and contribute to enhanced DNA damage sensitivity in elephant cells.

Similarly, we have characterized duplicate LIF gene in elephants that lack the start codon of the parent LIF gene. This duplicate gene (LIF6) is also expressed, encodes a functional transcript with translation initiated at an alternative downstream start site, and contributes to enhanced DNA damage sensitivity in elephant cells.

In addition, duplicate genes that lack coding potential, such as PTENP1, can also be expressed and while not translated function as LINC RNAs (in this case acting as a sponge for microRNAs that target the parent PTEN transcript). In each case classifying duplicates into putatively functional and non-functional categories would misclassify TP53RTG, LIF6, and PTENP1.

Because we believe this reviewer concern is a serious one, we have expanded our Caveats and limitations section to include a discussion of this concern and our rationale for including all duplicate genes in our analyses. In this expanded Caveats and limitations section we include a discussion of the evolutionary fates of duplicate genes, including a discussion of genes that encode functional transcripts and proteins but appear to be non-functional based on sequence characteristics. We also discuss the limitations of our approach.

Furthermore, we manually verified the genes discussed in Figure 4C/D to identify premature stop codons and assess divergence between duplicate copies. We have added a new section in the Materials and methods titled “Manual verification of duplicate genes” describing our verification process, and an additional paragraph to Results corresponding to our findings. For the reviewers’ convenience, we include them here:

Manual verification of duplicate genes. For a subset of 16 genes (Figure 4), we pulled the DNA sequence matches from the reciprocal best BLAT hits in the elephant and manatee genomes to determine conservation and presence of premature stop codons in the open reading frame (ORF). We translated the ORF for each hit into amino acid sequences, and grouped up hits for each gene into one FASTA file along with the UniProt protein sequences for the human, dog, cat, and cow orthologs. Using a pipeline hosted at NGPhlyogeny.fr[@Lemoine_2019], the homologs were aligned using MAFFT[@Katoh_2013]; the aligned sequences were cleaned using BMGE[@Criscuolo_2010]; and then a gene tree was constructed using FastME[@Lefort_2015]. Alignments were then visually inspected for conservation and presence of premature stop codons.

Results of manual verification: For the genes shown in Figure 4C and D, we did additional manual curation to determine if any of them contained premature stop codons, as well as to inspect conservation. For most of these genes, such as STK11, CD14, SOD1, and BRD7, all copies are strongly conserved, and lack premature stop codons. We also find that for STK11, CD14, and BRD7, manatees also possess an additional copy with high conservation, suggesting that extant manatees may also have enhanced tumor suppression and stress response. Some of these hits, however, do have premature stop codons. These include the additional copies of MAPRE1, BUB3, and COX20; and some copies of CNOT11, HMGB2, MAD2L1, LIF, and TP53. For TP53, we have previously shown that duplicate copies of genes containing premature stop codons may still serve a functional role in regulating its progenitor's function. As such, and especially in the case of the copies of COX20 and MAD2L1 which are expressed, the presence of a premature stop codon relative to the progenitor gene may not indicate a true pseudogenization and loss of functionality. Some copies, including for CASP9 and PRDX1, contained partial hits with no neighboring or overlapping hits; while these contained no premature stops, they also do not likely contain the totality of the coding sequence of their hits, and thus may represent cases of sub- or neo-functionalization.

2) In Results section 3, the cancer pathway frequency data of many nodes seem to not be consistent with data shown in Table 2. For example, "55.8% (29/52) of the pathways that were enriched in the Tethytherian stem-lineage…, 27.8% (20/72) of the pathways that were enriched in the Proboscidean stem-lineage…were related to tumor suppression", the cancer pathway percentages shown in Table 2 for these 2 nodes are 63.4% and 38.81%, respectively. While the frequency data in Table 2 are consistent with Supplementary file 3. It is possible that the frequency data shown in the main text are specific to pathways of tumor suppression, rather than cancer related pathways. If this is the case, more detailed data should be shown somewhere else.

The data shown in Table 2 and Supplementary file 3 are correct, and the discrepancy was due to a typo during writing. We thank the reviewers for catching this, and we have corrected the text in the manuscript to reflect the data in Table 2.

3) The titles of Results section 3 and section 4 are highly similar and actually the data in section 4 seems to be used to further solidify the conclusion of section 3. Therefore, is it possible to merge them into one single section?

We agree these sections are very similar and have combined them, as suggested.

4) A major concern with the paper is that while the methods are very well-thought out, there are some unresolvable limitations to the data available. As it stands, the manuscript does not sufficiently guide the reader through how these issues might influence the findings. One example of this problem is in the estimation of cancer risk. The risk is estimated on the basis of body size and life span. However, that lifespan is itself phylogenetically estimated from body size at least for the non-extant species. It is not clear from the manuscript whether all lifespans are so estimated, or whether observations are used for the lifespan of the extant species. If the later, caution is indicated, because lifespan data are highly uneven and often given as observed maximal lifespans, which can be misleading if taken from, for instance, zoo specimens. In either case, the manuscript needs to more clearly emphasize that these are statistically-predicted risks, not measured risks.

We agree that this is an important concern and that estimation of lifespan and cancer risk has inherent limitations for extinct species, we apologize if we did not sufficiently guide the reader through our rationale and logic. We used empirical body size and lifespan for extant species, and estimates of lifespan based on body size for extinct species, and both of these estimates to statistically-predict cancer risks. We also note that, at least for elephants, captive individuals have shorter lifespans than wild individuals (Lahdenperä, L. et al. Differences in age-specific mortality between wild-caught and captive-born Asian elephants. Nature Communications (2018) DOI: 10.1038/s41467-018-05515-8; Clubb R. et al. Compromised survivorship in zoo elephants. Science (2008). DOI: 10.1126/science.1164298). Thus, it is not necessarily the case that data from zoo specimens represent maximum longevity.

We have updated the “Step-wise reduction of intrinsic cancer risk in large, long-lived Afrotherians” Results section to ensure that our methods and rationale are clear. Specifically, we have edited the first paragraph of this section to start with: “In order to account for a relatively stable cancer rate across species [1012], intrinsic cancer risk must also evolve with changes body size and lifespan across species. We used empirical body size and lifespan data from extant species and empirical body size and estimated lifespan data from extinct species to estimate intrinsic cancer risk (K) with the simplified multistage cancer risk model KDt6, where D is maximum body size and t is maximum lifespan [9, 54, 58, 59].”

5) On a broader note, the authors have done their best with a dataset that suffers from a couple of problems. First, all of the extant very large-bodied animals form a single clade, with the hyrax as the sole small-bodied member of that clade. And since the titanohyrax is extinct, among the extant organisms (an available large-bodies species with genomes) there is then a true large-bodied clade of the Sirenia and elephants and relatives. We understand that other evolutionary data make it clear that these represent two (three including titanohyrax) independent transitions to large-body sizes. But with only the modern or nearly modern genomes to work with, we are not sure that the duplication inference procedures and their coupling to the body size analysis statistically represent more than a single observation (e.g., a default of a single transition to large size along the tethytheria branch).

Phylogenetic analyses utilizing both molecular and morphological data indicate that large bodies independently evolved in hyraxes, sea cows, and elephants (Paenungulata). Unfortunately, without genomes from the largest hyrax and sirenian species (such as Titanohyrax and Stellar’s sea cow) we cannot know whether gene duplications in these lineages are also enriched for tumor suppressor functions. Thus, our use of available genomes and duplication inference methods cannot address the question of convergent molecular changes underlying convergence in large bodies.

However, we do have genomes from hyrax and manatee which allows us to identify gene duplicates in these lineages and determine which gene duplication events are ancestral and derived in each lineage. For example, a gene that is duplicated in all Proboscideans but not in either manatee or hyrax will be reconstructed as duplicated in the Proboscidean stem-lineage.

One caveat to our decision to implement a cut-off of 80% when assigning ancestral copy numbers is the fact that a gene may be missing from ancient genomes due to coverage and assembly errors, and be called as “independently-duplicated” in Manatees and Elephants. While these events would underestimate the number of gene duplications that occurred in the Tethytherian stem lineage, they would ultimately not affect our conclusions that these tumor suppressors duplicated during an increase in body size in these two separate lineages. We address this in more depth in Point 6.

We acknowledge that there may be “ghost” duplication events, ie., genes that duplicated in, for example, the Tethytherian stem-lineage that are maintained in the Stellar’s sea cow genome and lost in the manatee genome; This will be reconstructed as a Proboscidean-specific gene duplication. Without genomes from multiple extinct Sirenian and Hyracoidea lineages we are not able to differentiate these alternate duplication histories. We have added a note to our Materials and methods section “Reconstruction of Ancestral Copy Numbers” to ensure this limitation of the method is clear.

While we only have genomes from extant or recently extinct species, which indeed represents a single macroevolutionary observation of increased body size/lifespan and reduced cancer risk, we identify hundreds of gene duplication events, which represents hundreds of microevolutionary observations. If there are no biases in gene duplication rates between different types of genes (and there may be) or in the probability that different types of gene duplications will reach fixation under a neutral evolutionary process (and there may be), then we should not expect to observe enrichment of a priori defined gene and pathway ontology terms among the fixed duplicate genes. That we observe the “right” pathway enrichments in the “right” lineages indicate that the hundreds of microevolutionary observations we identify support our single macroevolutionary inference.

6) Similarly, the authors observe what appears to be a number of independent duplications of tumor suppressors in African and Asian elephants: duplications that are lacking in many of the ancient genomes considered. The authors used rigorous statistical methods to correct for the fragmented nature of these ancient genomes, but it is very hard not to wonder if some of the data in Figure 4 is really not an artifact of using ancient genomes, where detecting recent gene duplications may be very difficult (several of the Asian and African elephant duplications in Figure 4 appear to be of the same genes). If these events are truly independent and not genome assembly/annotation artifacts, there is then an alternative hypothesis to propose. Thus, are the authors suggesting that there is a rapid turnover in the duplication of tumor suppressors, such that all elephants have such duplicates, but the particular duplications have short life spans and differ from species to species?

We apologize if that this figure was unclear and thank the reviewers for pointing out this problem, which we did not appreciate. We note that we are unable to determine duplication status for some genes in Proboscideans because of assembly gaps in ancient genomes; these genes appear to be independently duplicated in extant species (African Forest, African Savanah, and Asian elephants) because they are missing from ancient genomes leaving their ancestral duplication status unresolved. Thus, some of these apparently independent duplication events are because those genes are missing in the ancient genomes because of assembly gaps. We did not mean for this figure to suggest that there is a rapid turnover of tumor suppressors.

As discussed in Point 5, the absence of a gene duplicate in ancient genomes would result in our maximum likelihood method assigning the gene duplication event to the daughter lineages, in this case either Elephantidae or Sirenia. While this would underrepresent the number of genes duplicated in the Tethytheria stem lineage, we expect that this artifact would affect any duplicated gene in general, and not any particular gene or pathway; as a result, we do not expect that this would bias our outcomes in a particular direction with regards to tumor suppressor pathway enrichments.

To ensure this is clear we have redesigned Figure 4C to match the lineage coloring scheme from Figure 1, and indicate with a skull and cross bones which species are extinct. We have also added a caveat to the description of Figure 4C: “Cladogram of Afrotheria with sequenced genomes. Exemplar tumor suppressor duplicates are mapped onto lineages in which those genes are duplicated. Dots represent a duplication event of the color-coded genes. Note that we are unable to determine duplication status for some genes in Proboscideans because of assembly gaps in ancient genomes (indicated with skull and crossbones); these genes appear to be independently duplicated in extant species (African Forest, African Savanah, and Asian elephants) because they are missing from ancient genomes, biasing ancestral reconstructions of duplication status.”

7) It would be nice to see a few more comments on the manatee genome and why it does (or doesn't) show the expected patterns for the genome evolution in the face of the evolution of larger body sizes.

As we note in section “Pervasive duplication of tumor suppressor genes in Afrotheria,” 17.8% (10/56) of the pathways that were enriched in manatee (1.11-fold IBM, 0.89-log2 RICR) are related to tumor suppression. Manatee also was enriched cancer pathway percentages above background with respect to its gene set sizes, i.e., expected enrichments based on random sampling of small gene sets (shown in Figure 3B). Thus, manatee shows the expected patterns of tumor suppressor gene duplication during the evolution of larger body sizes. In addition, in the recently-added text regarding manual curations in response to Point 1, we highlight how some of the genes in Figure 4C are also highly conserved in Manatee in addition to Elephant.

8) Figures 3 and 4 would benefit from greatly expanded captions: we do not fully understand what is being illustrated in, for instance, Figure 3B-why are certain dots connected with lines? Intersections between what in the y-axis label?

We apologize if these plots were unclear. An Upset plot is a Euler diagram that shows the intersection between multiple sets, and is a way to show intersection between more than 3 sets without using Venn diagram like overlapping shapes. In Upset plots lines indicate the intersection between sets, in this case pathway terms. We have introduced a description of Upset plots in Figure 3 and a “see Figure 3 for description of upset plots” note in the caption for Figure 4.

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Data Citations

    1. Di Palma F, Alfoldi J, Johnson J, Berlin A, Gnerre S, Jaffe D, MacCallum I, Young S, Walker BJ, Lindblad-Toh K. 2013. Chrysochloris asiatica (Cape golden mole) genome. NCBI Assembly. ChrAsi1.0
    2. Gnerre S, Heiman D, Young S, Fulton L, Delehaunty K, Minx P, Chinwalla A, Mardis E, Wilson R, Warren W. 2018. The Genome Sequence of Choloepus hoffmanni (sloth) NCBI Assembly. C_hoffmanni-2.0.1_HiC
    3. Lindblad-Toh K, Chang JL, Gnerre S, Clamp M, Lander ES. 2012. Dasypus novemcinctus (nine-banded armadillo) genome. NCBI Assembly. Dasnov3.0
    4. Di Palma F, Alfoldi J, Johnson J, Berlin A, Gnerre S, Jaffe D, MacCallum I, Young S, Walker BJ, Lindblad-Toh K. 2013. Echinops telfairi (small Madagascar hedgehog) NCBI Assembly. EchTel2.0
    5. Di Palma F, Alfoldi J, Johnson J, Berlin A, Gnerre S, Jaffe D, MacCallum I, Young S, Walker BJ, Lindblad-Toh K. 2013. Elephantulus edwardii (Cape elephant shrew) NCBI Assembly. EleEdw1.0
    6. Palkopoulou E, Lipson M, Mallick S, Nielsen S, Rohland N, Baleka S, Karpinski E, Ivancevic AM, To TH, Kortschak RD, Raison JM, Qu Z, Chin TJ, Alt KW, Claesson S, Dalén L, MacPhee RDE, Meller H, Roca AL, Ryder OA, Heiman D, Young S, Breen M, Williams C, Aken BL, Ruffier M, Karlsson E, Johnson J, Di Palma F, Alfoldi J, Adelson DL, Mailund T, Munch K, Lindblad-Toh K, Hofreiter M, Poinar H, Reich D. 2018. A comprehensive genomic history of extinct and living elephants. NCBI Assembly. PRJEB24361 [DOI] [PMC free article] [PubMed]
    7. Di Palma F, Alfoldi J, Johnson J, Berlin A, Gnerre S, Jaffe D, MacCallum I, Young S, Walker BJ, Lindblad-Toh K. 2017. Aardvark (Orycteropus afer) genome. NCBI Assembly. oryAfe2
    8. Di Palma F, Alfoldi J, Johnson J, Berlin A, Gnerre S, Jaffe D, MacCallum I, Young S, Walker BJ, Lindblad-Toh K. 2013. A comprehensive genomic history of extinct and living elephants. NCBI Assembly. proCap-Pcap_2.0_HiC
    9. Andrew DF, Liu Y, Thomas GWC, Vinar T, Alfoldi J, Deng J, Dugan S. 2015. West Indian manatee (Trichechus manatus) genome. NCBI Assembly. triManLat2

    Supplementary Materials

    Figure 4—source data 1. Data set used for manual coding gene potential associated with Figure 4C,D.
    Source data 1. All necessary data sets and scripts to reproduce results presented in this manuscript.
    elife-65041-data1.zip (16.1MB, zip)
    Supplementary file 1. Summary of duplications in Atlantogenata.
    elife-65041-supp1.docx (15.5KB, docx)
    Supplementary file 2. RNA-Seq data sets used in this study, along with key biological and genome information.
    elife-65041-supp2.docx (13.3KB, docx)
    Supplementary file 3. Summary of PGLS model used to estimate lifespan.
    elife-65041-supp3.docx (13KB, docx)
    Transparent reporting form

    Data Availability Statement

    All data generated or analysed during this study are included in the manuscript and supporting files.

    The following previously published datasets were used:

    Di Palma F, Alfoldi J, Johnson J, Berlin A, Gnerre S, Jaffe D, MacCallum I, Young S, Walker BJ, Lindblad-Toh K. 2013. Chrysochloris asiatica (Cape golden mole) genome. NCBI Assembly. ChrAsi1.0

    Gnerre S, Heiman D, Young S, Fulton L, Delehaunty K, Minx P, Chinwalla A, Mardis E, Wilson R, Warren W. 2018. The Genome Sequence of Choloepus hoffmanni (sloth) NCBI Assembly. C_hoffmanni-2.0.1_HiC

    Lindblad-Toh K, Chang JL, Gnerre S, Clamp M, Lander ES. 2012. Dasypus novemcinctus (nine-banded armadillo) genome. NCBI Assembly. Dasnov3.0

    Di Palma F, Alfoldi J, Johnson J, Berlin A, Gnerre S, Jaffe D, MacCallum I, Young S, Walker BJ, Lindblad-Toh K. 2013. Echinops telfairi (small Madagascar hedgehog) NCBI Assembly. EchTel2.0

    Di Palma F, Alfoldi J, Johnson J, Berlin A, Gnerre S, Jaffe D, MacCallum I, Young S, Walker BJ, Lindblad-Toh K. 2013. Elephantulus edwardii (Cape elephant shrew) NCBI Assembly. EleEdw1.0

    Palkopoulou E, Lipson M, Mallick S, Nielsen S, Rohland N, Baleka S, Karpinski E, Ivancevic AM, To TH, Kortschak RD, Raison JM, Qu Z, Chin TJ, Alt KW, Claesson S, Dalén L, MacPhee RDE, Meller H, Roca AL, Ryder OA, Heiman D, Young S, Breen M, Williams C, Aken BL, Ruffier M, Karlsson E, Johnson J, Di Palma F, Alfoldi J, Adelson DL, Mailund T, Munch K, Lindblad-Toh K, Hofreiter M, Poinar H, Reich D. 2018. A comprehensive genomic history of extinct and living elephants. NCBI Assembly. PRJEB24361

    Di Palma F, Alfoldi J, Johnson J, Berlin A, Gnerre S, Jaffe D, MacCallum I, Young S, Walker BJ, Lindblad-Toh K. 2017. Aardvark (Orycteropus afer) genome. NCBI Assembly. oryAfe2

    Di Palma F, Alfoldi J, Johnson J, Berlin A, Gnerre S, Jaffe D, MacCallum I, Young S, Walker BJ, Lindblad-Toh K. 2013. A comprehensive genomic history of extinct and living elephants. NCBI Assembly. proCap-Pcap_2.0_HiC

    Andrew DF, Liu Y, Thomas GWC, Vinar T, Alfoldi J, Deng J, Dugan S. 2015. West Indian manatee (Trichechus manatus) genome. NCBI Assembly. triManLat2


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd

    RESOURCES