Abstract
Bats possess extraordinary adaptations, including flight, echolocation, extreme longevity and unique immunity. High-quality genomes are crucial for understanding the molecular basis and evolution of these traits. Here we incorporated long-read sequencing and state-of-the-art scaffolding protocols1 to generate, to our knowledge, the first reference-quality genomes of six bat species (Rhinolophus ferrumequinum, Rousettus aegyptiacus, Phyllostomus discolor, Myotis myotis, Pipistrellus kuhlii and Molossus molossus). We integrated gene projections from our ‘Tool to infer Orthologs from Genome Alignments’ (TOGA) software with de novo and homology gene predictions as well as short- and long-read transcriptomics to generate highly complete gene annotations. To resolve the phylogenetic position of bats within Laurasiatheria, we applied several phylogenetic methods to comprehensive sets of orthologous protein-coding and noncoding regions of the genome, and identified a basal origin for bats within Scrotifera. Our genome-wide screens revealed positive selection on hearing-related genes in the ancestral branch of bats, which is indicative of laryngeal echolocation being an ancestral trait in this clade. We found selection and loss of immunity-related genes (including pro-inflammatory NF-κB regulators) and expansions of anti-viral APOBEC3 genes, which highlights molecular mechanisms that may contribute to the exceptional immunity of bats. Genomic integrations of diverse viruses provide a genomic record of historical tolerance to viral infection in bats. Finally, we found and experimentally validated bat-specific variation in microRNAs, which may regulate bat-specific gene-expression programs. Our reference-quality bat genomes provide the resources required to uncover and validate the genomic basis of adaptations of bats, and stimulate new avenues of research that are directly relevant to human health and disease1.
Subject terms: Phylogenetics, Evolutionary biology, Genomics, Virology
Reference-quality genomes for six bat species shed light on the phylogenetic position of Chiroptera, and provide insight into the genetic underpinnings of the unique adaptations of this clade.
Main
With more than 1,400 species identified to date2, bats (Chiroptera) account for about 20% of all extant mammal species. Bats are found around the world and successfully occupy diverse ecological niches1. Their global success is attributed to an extraordinary suite of adaptations1 including powered flight, laryngeal echolocation, vocal learning, exceptional longevity and a unique immune system that probably enables bats to better tolerate viruses that are lethal to other mammals (such as severe acute respiratory syndrome-related coronavirus, Middle East respiratory syndrome-related coronavirus and Ebola virus)3. Bats therefore represent important model systems for the study of extended healthspan4, enhanced disease tolerance3, vocal communication5 and sensory perception6. To understand the evolution of bats and the molecular basis of these traits, we generated reference-quality genomes for six bat species as part of the Bat1K global genome consortium1 (http://bat1k.com) in coordination with the Vertebrate Genome Project (https://vertebrategenomesproject.org). These six bat genera span both major suborders Yinpterochiroptera (R. ferrumequinum and R. aegyptiacus) and Yangochiroptera (P. discolor, M. myotis, P. kuhlii, M. molossus)7 (Supplementary Table 1), represent extremes in bat longevity8, possess major adaptations in bat sensory perception1 and can better survive viral infections as compared with other mammals3.
Genome sequencing and assembly
To obtain genome assemblies of high contiguity and completeness, we developed pipelines that incorporate state-of-the-art sequencing technologies and assembly algorithms (Supplementary Notes 1, 2). In brief, we generated PacBio continuous long reads, 10x Genomics Illumina read clouds, Bionano optical maps and chromosome conformation capture (Hi-C) Illumina read pairs for each bat species (Fig. 1a). We assembled the PacBio reads into contigs using a customized assembler we termed DAmar, a hybrid of the earlier Marvel9, Dazzler and Daccord10,11 systems. Next, we used 10x Illumina read-cloud data to correct base errors and phase haplotypes, arbitrarily picking one haplotype in a phased block. Finally, we used Bionano optical maps and then Hi-C data to produce long-range scaffolds (Extended Data Fig. 1a, b, Supplementary Note 2). For all six bat species, this resulted in assemblies with high contiguity: 96–99% of each assembly is in chromosome-level scaffolds (N50 values of 92–171.1 Mb) (Fig. 1b, Extended Data Figs. 1c, d, 2a). When compared with previously published bat genomes12–19, our assemblies have higher contig N50 values—ranging from 10.6 to 22.2 Mb—and therefore, these are two orders of magnitude more contiguous than bat genomes assembled from short-read data alone (Fig. 1b, Extended Data Fig. 1d, Supplementary Tables 2, 3, Supplementary Note 2). Similarly, our genomes are estimated to have near-100% gene completeness (see ‘Gene annotation’) (Fig. 1c, d, Supplementary Table 4, Supplementary Note 3.1). Furthermore, analysis of 197 nonexonic ultraconserved elements20 indicates a high completeness of nonexonic genomic regions. This analysis also revealed three cases of marked sequence divergence of ultraconserved elements in vespertilionid bats—something rarely observed in these elements, which are highly constrained amongst placental mammals (Extended Data Fig. 2b–d, Supplementary Figs. 1–3, Supplementary Table 5, Supplementary Note 3.2). 
In summary, these genomes are comparable to the best reference-quality genomes that have so far been generated for any eukaryote with a gigabase-sized genome21.
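The contiguity statistic quoted above can be made concrete. N50 is the length L such that sequences of length at least L together contain at least half of all assembled bases. The following is a minimal sketch of that definition; the function name `n50` and the toy lengths are illustrative assumptions, not part of the assembly pipeline or data from our assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy lengths (arbitrary units), not real assembly data.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

The same function applies to contigs or to scaffolds; only the input list of lengths differs.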
Gene annotation
To comprehensively annotate protein-coding genes, we integrated different types of genetic evidence—including short-read (RNA sequencing (RNA-seq)) and long-read (isoform sequencing (Iso-Seq)) transcriptomic data from our bat species, gene projections by TOGA, aligned protein and cDNA sequences of related mammals, and de novo gene predictions (Fig. 1c). For the six bat species, we annotated between 19,122 and 21,303 protein-coding genes (Fig. 1e). Using the 4,104 mammalian genes in the ‘Benchmarking Universal Single-Copy Orthologs’ (BUSCO)22 set, we achieved 99.3–99.7% completeness (Fig. 1d); this shows that our assemblies and annotations are highly complete in protein-coding sequences (Extended Data Fig. 3a). Importantly, the completeness of our gene annotations is higher than available annotations of dog, cat, horse, cow and pig, and is only surpassed by those of human and mouse, which have received extensive manual curation (Fig. 1d, Supplementary Table 4). Thus, reference-quality genome assemblies combined with multiple types of gene evidence can generate high-quality and near-complete gene annotations of bats. This strategy can be extended to other species to improve genome assembly and annotation. All individual evidence and final gene sets can be visualized in the Bat1K genome browser (https://genome-public.pks.mpg.de) and downloaded from https://bds.mpi-cbg.de/hillerlab/Bat1KPilotProject/.
Genome sizes and transposable elements
At about 2 Gb in size, bat genomes are generally smaller than genomes of other placental mammals1 (which are typically 2.5–3.5 Gb). By annotating transposable elements in our genomes (Supplementary Note 3.3), we found that smaller genome size is related to lower transposable element content (Extended Data Fig. 3b). Recently inserted transposable elements in the bat genomes are extremely variable in terms of their type and number, as compared to other mammals (Extended Data Fig. 3c). In vespertilionid bats, we detected recent activity of rolling-circle and DNA transposon classes that have been largely dormant in other mammals for over 40 million years23. In summary, bats exhibit substantial diversity in transposable element content, and diverse transposable element classes show evidence of recent activity.
The phylogenetic origin of Chiroptera
Identifying the evolutionary origin of bats within the mammalian clade Laurasiatheria is a key prerequisite for any comparative analyses. However, the phylogeny of Laurasiatheria and—in particular—the origin of bats is a long-standing and unresolved phylogenetic question24, as multiple phylogenetic and systematic studies support alternative topologies25. These incongruent results have been attributed to the challenge of identifying the two (presumably short) internal branches that link the four key clades that diverged in the Late Cretaceous period26—that is, Chiroptera, Cetartiodactyla, Perissodactyla and (Carnivora + Pholidota) (Fig. 2, Supplementary Table 1).
We revisited this question, leveraging the high completeness of our gene annotations. We extracted a comprehensive dataset of 12,931 orthologous protein-coding genes using TOGA (21,468,943 aligned nucleotides in length and 7,911,881 parsimony-informative sites) and 10,857 orthologous conserved noncoding elements (5,234,049 aligned nucleotides and 1,234,026 parsimony-informative sites) from 48 mammalian genomes (Supplementary Note 4.1). We concatenated each of these datasets, identified the optimal model of sequence evolution with ModelFinder27 (Supplementary Table 6), inferred the species tree under maximum likelihood using the model-partitioned dataset with IQ-TREE28, rooted using Atlantogenata29, and obtained 1,000 bootstrap replicates to estimate branch support (Supplementary Note 4.2). For each protein-coding gene, we also compared the optimal gene tree inferred under maximum likelihood to the species tree, using the Robinson–Foulds distance to identify gene alignments with possibly incorrect homology statements30 (Supplementary Note 4.2.2). Our analysis of concatenated protein-coding genes identified the origin of bats within Laurasiatheria with 100% bootstrap support across the entire tree (Fig. 2). Omitting the top-scoring 100 and 500 genes (based on Robinson–Foulds distance) from the phylogenetic data produced the same tree topology, which suggests a small effect of homology error on the inferred phylogeny (Extended Data Fig. 4a, b). The tree inferred from the conserved noncoding element data identified the same phylogenetic position of bats, and differed from that shown in Fig. 2 only in the position of Perissodactyla (most closely related to Carnivora + Pholidota rather than to Cetartiodactyla) (Extended Data Fig. 5a). 
Therefore, both coding and noncoding regions of the genome support an early split between Eulipotyphla and the rest of the laurasiatherians (that is, Scrotifera); within Scrotifera, Chiroptera is the sister clade to Fereuungulata (Cetartiodactyla + Perissodactyla + Carnivora + Pholidota). This tree challenges the Pegasoferae hypothesis31, which groups bats with Perissodactyla, Carnivora and Pholidota, but agrees with a previous study of concatenated phylogenomic data32. Evolutionary studies of 102 retrotransposons, which considered incomplete lineage sorting, also supported a sister-group relationship between Chiroptera and Fereuungulata, but differ from the present study in supporting a sister-group relationship between Carnivora and Cetartiodactyla25,26.
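To illustrate how a Robinson–Foulds distance summarizes topological disagreement, the sketch below compares the two topologies described above for the four key clades: the coding tree (Perissodactyla with Cetartiodactyla) and the noncoding tree (Perissodactyla with Carnivora + Pholidota). It uses a simplified rooted variant that counts clades present in one tree but not the other. The nested-tuple representation and the function names are assumptions made for illustration, and this is not the implementation used in our analyses.

```python
def clades(tree):
    """Collect the leaf set below every internal node of a nested-tuple tree."""
    found = set()

    def visit(node):
        if isinstance(node, str):  # a leaf is a taxon name
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Rooted Robinson-Foulds distance: clades found in exactly one tree."""
    return len(clades(tree_a) ^ clades(tree_b))


coding = ("Chiroptera",
          (("Cetartiodactyla", "Perissodactyla"), ("Carnivora", "Pholidota")))
noncoding = ("Chiroptera",
             ("Cetartiodactyla", ("Perissodactyla", ("Carnivora", "Pholidota"))))

print(rf_distance(coding, noncoding))
```

A gene tree with an unusually large distance to the species tree is a candidate for closer inspection of its alignment, which is the rationale behind the filtering step described above.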
Next, we considered potential phylogenetic problems with our data and methods. First, as the number of homologous sites increases in phylogenomic datasets, so too does bootstrap support33—sometimes even for an incorrect tree34. Therefore, we estimated the maximum likelihood support of each protein-coding gene (n = 12,931) for the 15 bifurcating trees that represent all possible topologies of the 4 key clades (Supplementary Fig. 4), with Eulipotyphla as the outgroup and the clade subtrees as in Fig. 2. We found that the best-supported tree is identical to the tree estimated from our concatenated protein-coding gene set (Fig. 2; tree 1 with 1,007/10,822 genes, described in Extended Data Fig. 5b and Supplementary Note 4.2.1) and shows the sister-group relationship between Chiroptera and Fereuungulata, which is also supported by the conserved noncoding elements (Extended Data Fig. 5a). Second, model misspecification (owing to a poor fit between phylogenetic data and the model of sequence evolution used) or loss of the historical signal35 can cause biases in phylogenetic estimates36. To assess whether these factors may have confounded our phylogenetic estimate (Fig. 2), we examined the 12,931 alignments of protein-coding genes for evidence of violating the assumption of evolution under homogeneous conditions (assumed by the phylogenetic methods used here) and for evidence that the historical signal has decayed almost completely (owing to multiple substitutions at the same sites; Supplementary Note 4.2). A total of 488 gene alignments, comprising 1st and 2nd codon sites from all 48 taxa (241,098 sites and 37,588 parsimony-informative sites), were considered optimal for phylogenetic analysis and were concatenated into a data matrix (Supplementary Table 7). Maximum likelihood trees were generated but resulted in an ambiguous phylogenetic estimate (Extended Data Fig. 5c, topology 13 in Supplementary Fig. 4, Supplementary Note 4.2). 
Therefore, we analysed these 488 genes individually using SVDquartets37, a single-site coalescence-based method that provides an alternative to phylogenetic analysis of a concatenation26. The inferred optimal tree again supported Chiroptera as sister group to Fereuungulata (Extended Data Fig. 5d, topology 1 in Supplementary Fig. 4), which is the most-supported position from all of our analyses and data partitions. Taken together, multiple lines of evidence from across the genome provide the highest support for Chiroptera as basal within Scrotifera (Fig. 2).
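The figure of 15 candidate topologies follows from simple counting: with the outgroup fixing the root, four clades can be arranged in (2n − 3)!! = 1 × 3 × 5 = 15 rooted bifurcating trees for n = 4. The sketch below enumerates them by attaching each new clade to every branch of each smaller tree. The function names are illustrative assumptions; this is a counting aid, not part of the phylogenetic analysis itself.

```python
def insert_everywhere(tree, taxon):
    """Yield each tree formed by attaching `taxon` to one branch of `tree`."""
    yield (tree, taxon)  # attach on the branch leading to this node
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_everywhere(left, taxon):
            yield (new_left, right)
        for new_right in insert_everywhere(right, taxon):
            yield (left, new_right)


def rooted_topologies(taxa):
    """Return all rooted bifurcating topologies on `taxa` as nested tuples."""
    trees = [taxa[0]]
    for taxon in taxa[1:]:
        trees = [new for tree in trees for new in insert_everywhere(tree, taxon)]
    return trees


key_clades = ["Chiroptera", "Cetartiodactyla", "Perissodactyla",
              "Carnivora+Pholidota"]
print(len(rooted_topologies(key_clades)))
```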
Screens for gene selection, losses and gains
Using our best-supported species phylogeny (Fig. 2), we explored the genomic basis of exceptional traits shared by bats. We performed three unbiased genome-wide screens for gene changes that occurred in the six bat species. First, we screened the 12,931 protein-coding genes classified as 1:1 orthologues for signatures of positive selection on the ancestral bat branch (stem Chiroptera), under the aBSREL38 model using HyPhy39 (false discovery rate < 0.05) (Supplementary Note 4.3). We further required that the branch-site test implemented in codeml40 (part of the PAML package) independently verified positive selection, and manually excluded alignment ambiguities. This strict screen identified nine genes with diverse functions that have robust evidence of positive selection in the bat ancestor (Supplementary Table 8). This included the genes LRP2 and SERPINB6, which—among other functions—are involved in hearing. Both genes are expressed in the cochlea and, in humans, are associated with disorders that involve deafness41,42 (Supplementary Note 4.3). LRP2 has an amino acid substitution that is specific to bats with laryngeal echolocation, as pteropodid bats—which do not have laryngeal echolocation—exhibit a different, derived amino acid (Extended Data Fig. 6a). In a third hearing-related gene TJP243, our analysis identified a putative microduplication that is also found only in echolocating bats (Extended Data Fig. 6b). These echolocator-specific mutations were further confirmed using publicly available bat genomes (n = 6) and all three genes were found not to be under positive selection in the non-bat-ancestral lineages (that is, Cetartiodactyla and Carnivora) using our strict selection protocols (Supplementary Note 4.3.3). 
If these mutations and the ancestral signatures of selection in these genes are indeed related to echolocation, this would provide molecular evidence that laryngeal echolocation evolved once in the bat ancestor with a subsequent loss in pteropodids rather than as multiple independent acquisitions within the echolocating bats, informing a long-standing debate in bat biology on the origin of echolocation44.
In addition to hearing-related genes, our genome-wide screen also revealed bat-specific selection on several immunity-related genes: the B-cell-specific chemokine CXCL1345, the asthma-associated NPSR146 and INAVA, a gene that is involved in intestinal barrier integrity and enhancing NF-κB signalling in macrophages47. Changes in these genes may have contributed to the unique tolerance of pathogens among bats3. By specifically testing 2,453 candidate genes with immune- and age-related Gene Ontology terms (Supplementary Note 4.3), and strictly requiring significance by both aBSREL and codeml with multiple test correction (false discovery rate < 0.05), we found 10 additional genes with robust evidence of positive selection in the ancestral bat lineage (Extended Data Fig. 6c, Supplementary Table 9, Supplementary Note 4.3.2). These additional genes include IL17D48 and IL1B49, which are involved in immune system regulation and NF-κB activation, and LCN250 and GP251, which are involved in responses to pathogens. We further used I-TASSER52 to model the three-dimensional (3D) structure of all of the proteins encoded by the genes under positive selection, and to estimate the effect of the bat-specific residues on protein structure and stability. Our results show that bat-specific substitutions with significant support for positive selection are predicted to have stabilizing or destabilizing effects (for example, AZGP1 and INAVA), which may affect protein function (Supplementary Note 4.4). Some bat-specific substitutions also occur in or near regions that may be directly involved in ligand-binding (for example, DEFB1, LCN2, SERPINB6 and KBTBD11). Overall, combining genome-wide and candidate screens revealed several candidate genes, which suggests that ancestral bats evolved immunomodulatory mechanisms that enabled a higher tolerance to pathogens than is typical amongst mammals. 
Consistent with this, repeating the stringent genome-wide screen to detect selection on comparable, ordinal branches leading to the ancestors of Carnivora and Cetartiodactyla revealed fewer immune-related genes (three and four genes for Carnivora and Cetartiodactyla, respectively) (Supplementary Table 10, Supplementary Note 4.3.3).
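The logic of requiring support from two tests after multiple-test correction can be sketched as follows. This example assumes the Benjamini–Hochberg procedure for controlling the false discovery rate, which is a common choice but is an assumption here; the correction actually used is described in Supplementary Note 4.3. The gene labels and P values are hypothetical, and the function names are illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest P value, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(p_by_gene, alpha=0.05):
    """Return the genes whose adjusted P value is below `alpha`."""
    genes = list(p_by_gene)
    q_values = benjamini_hochberg([p_by_gene[g] for g in genes])
    return {g for g, q in zip(genes, q_values) if q < alpha}


# Hypothetical P values from two independent tests of the same branch.
test_one = {"geneA": 0.001, "geneB": 0.02, "geneC": 0.03, "geneD": 0.5}
test_two = {"geneA": 0.002, "geneB": 0.4, "geneC": 0.01, "geneD": 0.6}

# Keep only genes that pass the threshold in both tests.
robust = significant(test_one) & significant(test_two)
print(sorted(robust))
```

Taking the intersection makes the screen conservative: a gene supported by only one of the two tests is discarded.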
In our second genome-wide screen, we used a previously developed approach53 to systematically screen for gene losses (Supplementary Note 4.5). This revealed 10 genes that are inactivated in our 6 bat species but that are present in the majority of non-bat members of Laurasiatheria (Supplementary Table 11). Two of these lost genes have immune-stimulating functions (Fig. 3a). LRRC70 is a broadly expressed gene that potentiates cellular responses to multiple cytokines and amplifies NF-κB activation mediated by bacterial lipopolysaccharides54. IL36G is overexpressed in patients with psoriasis or inflammatory bowel disease, and encodes a pro-inflammatory interleukin that induces the canonical NF-κB pathway and other pro-inflammatory cytokines55–57. We confirmed the loss of these genes in additional, publicly available bat genomes (n = 9) (Extended Data Fig. 7). Together, genome-wide screens for gene loss and positive selection revealed several genes involved in NF-κB signalling (Fig. 3b, Supplementary Note 4.3), which suggests that altered NF-κB signalling may contribute to immune-related adaptations in bats.
Third, we investigated changes in the sizes of gene families, which revealed 35 gene families that exhibit significant expansions or contractions in the bat ancestor (Supplementary Table 12). Among these, we inferred an expansion of the APOBEC gene family caused by expansion at the APOBEC3 locus (Fig. 3c), which is known to exhibit a complex history of duplication and loss in the flying foxes (Pteropus genus)58 as well as in other mammals59. Our detailed analysis indicates a small expansion of APOBEC3 in the ancestral bat lineage, followed by multiple, lineage-specific expansions that involve up to 14 duplication events (Supplementary Fig. 5, Supplementary Note 4.6), including the generation of a second APOBEC3 locus in Myotis. APOBEC3-type genes encode DNA- and RNA-editing enzymes that can be induced by interferon signalling and are implicated in restricting viral infection and transposon activity60,61. Expansion of APOBEC3 genes in multiple bat lineages may contribute to viral tolerance in these lineages.
Integrated viruses in bat genomes
There is mounting evidence that suggests that bats can better tolerate and survive viral infections than most mammals, owing to adaptations in their immune response3. This is further supported by our findings of selection and loss of immune-related genes and expansions of the viral-restricting APOBEC3 genes. As viral infections can leave traces in host genomes in the form of endogenous viral elements (EVEs)62, we screened our bat genomes to ascertain whether they contain a higher number and diversity of EVEs compared with other mammals (Supplementary Note 3.4). First, we focused on non-retroviral EVEs that generally are less abundant in animal genomes compared to endogenous retroviruses (ERVs)62. We identified three predominant non-retroviral families of EVEs—the Parvoviridae, Adenoviridae and Bornaviridae—in individual bat species and in other mammalian outgroups (Extended Data Fig. 8a). We also detected a partial filovirus EVE in Vespertilionidae (Pipistrellus and Myotis), which is consistent with a previous report that vespertilionid bats have—in the past—been exposed to and can survive filoviral infections63.
Second, we focused on retroviral protein-coding genes from all ERV classes. Consistent with other mammals, the highest number of integrations came from beta- and gamma-like retroviruses64,65 (Extended Data Fig. 8b, Supplementary Fig. 6). Notably, in the genomes of several bat species (Phyllostomus, Rhinolophus, and Rousettus), we found DNA that encodes viral envelope (Env) proteins that are more similar to those of the alpharetroviruses than to other retroviral genera (Extended Data Fig. 8b, c). Until now, alpharetroviruses have been considered as exclusively endogenous avian viruses66; consequently, our discovery of alpharetroviral-like elements in the genomes of several bat species suggests that bats have been infected by these viruses (Extended Data Fig. 8c). Phylogenetic analysis suggests that most viral integrations are relatively recent integration events (Supplementary Fig. 7). This analysis also revealed short gag-like fragments with similarity to lentiviruses in Pipistrellus (a retrovirus genus rarely observed in endogenized form)67, although it is not clear whether these resulted from ancient lentiviral integrations; two families of foamy retroviruses belonging to the spumaretroviruses in Rhinolophus (confirming the presence of endogenous spumaretroviruses in this species); and pol-like sequences clustering with deltaretroviruses in Molossus. Overall, these results show that bat genomes contain a diversity of ERVs, which provides evidence of past viral infections. The integrated ERVs are available as an annotation track in the Bat1K genome browser (https://genome-public.pks.mpg.de) (Extended Data Fig. 8d).
Changes in noncoding RNAs
The role of noncoding RNAs in driving phenotypic adaptation has recently been established68, but little is known about their evolution in bats. We comprehensively annotated noncoding RNAs in our bat genomes, and screened for variation in noncoding RNA by comparing our 6 bat species with 42 other mammals (Fig. 4a, Supplementary Note 5.1). We found that nearly all of the annotated noncoding RNA genes are shared across all six bat genomes (Supplementary Fig. 8), and between bats and other mammals (for example, 95.8–97.4% are shared between bats and humans). Given the importance of microRNAs (miRNAs) as developmental and evolutionary drivers of change69, we specifically investigated the evolution of families of miRNA genes. We identified 286 conserved miRNA gene families across all mammals (Supplementary Table 13), 11 of which were significantly contracted (false discovery rate < 0.05) (Extended Data Fig. 9a, Supplementary Fig. 9), and 13 of which were lost, in the ancestral bat branch (Supplementary Figs. 10, 11, Supplementary Note 5.2)—a pattern comparable to that of other mammal lineages (Extended Data Fig. 9a).
Next, we investigated the evolution of single-copy miRNA genes. Alignments of 98 highly conserved, single-copy miRNAs identified across the 6 bat and 42 other mammalian genomes revealed that one miRNA (miR-337-3p) had unique variation in the seed region in bats, as compared to other mammals (Fig. 4b, Extended Data Fig. 9b). We generated libraries for small RNA-seq from the brain, liver and kidney across the six bat species and showed that miR-337-3p is pervasively expressed (Extended Data Fig. 9c). Because miRNA seed sequences are the strongest determinant of target specificity, these seed changes are expected to alter the repertoire of sequences targeted by miR-337-3p in bats. Indeed, reporter assays (Supplementary Note 5.4, Supplementary Table 14) revealed that bat miR-337-3p strongly repressed the expression of its cognate bat target sequence but had no effect on the human site (and vice versa) (Fig. 4c), which demonstrates that the bat-specific seed sequence changes alter miR-337-3p binding specificity. We further explored whether this difference in binding specificity changes the set of target genes regulated, and found that bat and human miR-337-3p are predicted to regulate a distinct spectrum of gene targets (Supplementary Tables 15, 16, Supplementary Note 5.3). Gene Ontology enrichment analysis of these target gene sets suggests a shift towards regulation of developmental, rhythmic, synaptic and behavioural gene pathways in bats (Extended Data Fig. 9d), pointing to a marked change in processes regulated by miR-337-3p in this clade.
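Why a seed change alters target specificity can be shown with a small sketch. It assumes the common convention that the seed spans nucleotides 2–8 of the mature miRNA and that a canonical target site is the reverse complement of the seed. The two sequences are arbitrary, hypothetical miRNAs that differ at a single seed position; they are not the bat or human miR-337-3p sequences, and the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed(mirna, start=2, end=8):
    """Return the seed (1-based positions start..end) of a mature miRNA."""
    return mirna[start - 1:end]


def seed_site(mirna):
    """Return the target-site sequence complementary to the miRNA seed."""
    return seed(mirna).translate(COMPLEMENT)[::-1]


def has_site(utr, mirna):
    """Report whether a 3' UTR contains a perfect seed match for the miRNA."""
    return seed_site(mirna) in utr


# Hypothetical miRNAs differing only at seed position 5 (C versus G).
mirna_x = "UGGACUCAAGCUUAGGCAUCGA"
mirna_y = "UGGAGUCAAGCUUAGGCAUCGA"

# Hypothetical reporter UTRs, each carrying the site for one miRNA.
utr_x = "AAAA" + seed_site(mirna_x) + "AAAA"
utr_y = "AAAA" + seed_site(mirna_y) + "AAAA"

print(has_site(utr_x, mirna_x), has_site(utr_x, mirna_y))
print(has_site(utr_y, mirna_y), has_site(utr_y, mirna_x))
```

Each miRNA matches only its own site, which mirrors the reciprocal pattern seen in the reporter assays (Fig. 4c). Real target prediction also weighs site context and conservation, which this sketch ignores.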
In addition to losses and variation, continuous miRNA innovation has previously been suggested to act as a key player in the emergence of increasing organismal complexity in eukaryotes68. To identify novel miRNAs (defined as having a novel seed sequence) that evolved in bats, we screened for novel sequences in the small RNA libraries from all six bat species (Supplementary Table 17, Supplementary Note 5.3). This expression analysis revealed 122–261 novel miRNAs across the 6 bat genomes, with only a small number being shared across 2 or more bats (Supplementary Fig. 12). From these, we identified 12 novel miRNAs that are present in the genome of all 6 bat species and that are also without apparent homologues in other mammals (Supplementary Table 18). To test whether these candidates are functional miRNAs, we selected the top three candidates (Supplementary Table 18, Supplementary Note 5.3), and experimentally tested their ability to regulate an ideal target sequence in reporter assays (Supplementary Table 14). Two of the three miRNAs we tested (miR-19125 and miR-6665) were able to regulate their targets, which shows that they are actively processed by endogenous miRNA machinery, loaded onto the RNA-induced silencing complex and able to repress target mRNAs (Fig. 4d). Thus, miR-19125 and miR-6665 represent true miRNAs that are evolutionary novelties in bats. Taken together, these data demonstrate innovation in the bat lineage, both in miRNA seed sequence and novel miRNA emergence. Further detailed mechanistic studies are required to determine the role of these miRNAs in bat physiology and evolution.
All of the results described here are supported by additional material that can be found in the Supplementary Methods, Supplementary Notes 1–5, Supplementary Tables 1–46, Supplementary Figs. 1–20 and Supplementary Data 1–3.
Conclusion
We have generated chromosome-level, near-complete assemblies of six bat species that represent diverse chiropteran lineages. Using the comprehensive annotations of our bat genomes together with phylogenomic methodologies, we address the evolutionary origin of bats within Laurasiatheria and resolve bats as the sister taxon to Fereuungulata. Our conservative genome-wide screens investigating gene gain, loss and selection revealed novel candidate genes that are likely to contribute to tolerance of viral infections among bats. Consistent with this finding, we also found that bat genomes contain a high diversity of endogenized viruses. We also uncovered genes involved in hearing that exhibit mutations specific to laryngeal-echolocating bats and ancestral patterns of selection. If future experiments show that these changes are indeed related to hearing, this would support a single ancestral origin of laryngeal echolocation and its subsequent loss in pteropodid bats. Finally, we identified and experimentally validated miRNAs that are evolutionary novelties or that carry bat-specific changes in their seed sequence. Changes in these important regulators of gene expression may have contributed to changes in developmental and behavioural processes in bats.
These high-quality bat genomes, together with future genomes, will provide a rich resource to address the evolutionary history and genomic basis of bat adaptations and biology, which is the ultimate goal of Bat1K1. These genomes enable a better understanding of the molecular mechanisms that underlie the exceptional immunity and longevity of bats, allowing us to identify and validate molecular targets that ultimately could be harnessed to alleviate human ageing and disease. For example, we predict that our reference-quality bat genomes will be tools that are heavily relied upon in future studies focusing on how bats tolerate coronavirus infections. This is of particular global relevance given the current pandemic of coronavirus disease 2019 (COVID-19), and ultimately may provide solutions to increase human survivability—thus providing a better outcome for this, and future, pandemics.
Methods
No statistical methods were used to predetermine sample size. The experiments were not randomized and investigators were not blinded to allocation during experiments and outcome assessment.
Genome sequencing
Genome sequencing was performed following the protocols of the Bat1K genome consortium (http://bat1k.com) in coordination with the Vertebrate Genome Project (https://vertebrategenomesproject.org/)70. Ultralong and long genomic DNA from various bat tissues was isolated either (a) by phenol–chloroform-based DNA clean-up and precipitation, (b) with the Qiagen MagAttract HMW DNA kit or (c) with the agarose-plug-based Bionano Prep Animal tissue kit following the manufacturer’s instructions. The fragment size of all genomic DNAs was checked by pulsed-field gel electrophoresis before library construction. Size-selected PacBio CLR libraries of at least 20 kb in size were run on the SEQUEL system with 10-h movie times. For Bionano optical mapping, genomic DNA was labelled following either the NLRS or the DLS protocol according to the manufacturer’s instructions. Labelled genomic DNAs were run on the Bionano Saphyr instrument to at least 100× genome coverage. Linked Illumina reads were generated with the 10x Genomics Chromium genome protocol according to the manufacturer’s instructions. These libraries were sequenced on short-read Illumina devices with a 150-bp paired-end regime. Hi-C chromosome conformation capture was performed either by Phase Genomics or ARIMA Genomics, or by applying the ARIMA Genomics Hi-C kit. High-quality RNA was extracted by using commercially available RNA isolation kits. Standard PacBio Iso-Seq SMRTbell libraries were sequenced on the SEQUEL device with 10-h or 20-h movie times. Details of DNA and RNA library preparation are described in Supplementary Note 1, and statistics of all data collected for each bat are provided in Supplementary Note 2.1.
Genome assembly
To reconstruct each genome, we first assembled the PacBio reads ≥ 4 kb in length into contigs with our custom assembler DAmar, which outputs a set of ‘primary’ contigs, each guaranteed not to be a haplotype variant of a segment of another primary contig (such a variant is called an ‘alternate’ contig). Consensus sequences of primary contigs were produced with two rounds of Arrow. The 10x data were subsequently used both to polish the consensus sequence further and to maximally phase heterozygous haplotype variation, after which one haplotype was selected arbitrarily for each phased block. Bionano data were assembled into optical maps with Bionano Solve, which were used to scaffold the primary contigs and occasionally to break a misjoined sequence contig. Finally, we used the Hi-C data with Salsa2 to scaffold the assembly into chromosome-spanning scaffolds. Measurements of karyotype images were used to assess whether scaffold lengths resemble chromosome lengths.
To assess genome completeness, we used BUSCO (version 3)22 with the mammalian (odb9) protein set, applied both to our assemblies and our gene annotations. To assess completeness in noncoding regions, we used Blat (v.36x2)71 with sensitive parameters to determine how many of 197 nonexonic ultraconserved elements20 align at ≥ 85% identity.
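The ultraconserved-element check described above amounts to counting how many elements have at least one alignment at or above the identity threshold. The following is a minimal sketch of that tally, not the authors' pipeline: the `Hit` record, its field names and the definition of identity as matched bases divided by element length are assumptions made for illustration, and the example values are invented.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One alignment of an ultraconserved element (hypothetical record layout)."""
    element: str    # identifier of the element (made-up names below)
    matches: int    # number of identical aligned bases
    query_len: int  # full length of the element


def count_recovered(hits: Iterable[Hit], min_identity: float = 0.85) -> int:
    """Count elements whose best hit reaches the identity threshold.

    Identity is taken here as matches / element length, which is an
    assumption; other definitions (for example per aligned column) exist.
    """
    best = {}
    for hit in hits:
        identity = hit.matches / hit.query_len
        best[hit.element] = max(best.get(hit.element, 0.0), identity)
    return sum(1 for value in best.values() if value >= min_identity)


# Invented example: uce1 and uce3 pass, uce2 (identity 0.80) does not.
example_hits = [
    Hit("uce1", 200, 210),
    Hit("uce1", 150, 210),
    Hit("uce2", 160, 200),
    Hit("uce3", 270, 300),
]
recovered = count_recovered(example_hits)
```

Keeping only the best hit per element avoids counting an element twice when it aligns to more than one location.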
Gene annotation
To comprehensively annotate genes, we integrated four types of evidence. First, we used GenomeThreader (v.1.7.0)72 to align protein and RNA transcript sequences from NCBI or Ensembl for one other closely related bat species that has annotated genes. Second, we projected genes contained in the human, mouse and Myotis lucifugus Ensembl 96 annotation73 and our M. myotis annotation to other bats. To this end, we generated whole-genome alignments as described in ref. 74 and used Tool to infer Orthologs from Genome Alignments (TOGA), a method that identifies the co-linear alignment chain(s)75 aligning the putative orthologue using synteny and the amount of intronic/intergenic alignments, and annotated genes with CESAR 2.076 in multi-exon mode. Third, we generated de novo gene predictions by applying Augustus77 in single-genome mode with a bat-specific gene model trained by BRAKER (v.2.1)78 and extrinsic evidence provided as hints. In addition, we applied Augustus in comparative mode to a multiple genome alignment generated by MultiZ (v.11.2). Fourth, we used transcriptomic data from both publicly available data sources and our own Illumina short-read RNA-seq data. Additionally, we generated PacBio long-read RNA sequences (Iso-Seq) from all six species to capture full-length isoforms and accurately annotate untranslated regions (UTRs). RNA-seq reads were stringently mapped using HISAT2 (v.2.0.0)79. Transcriptomic data were processed using TAMA80. All transcriptomic, homology-based and ab initio evidence was integrated into a consensus gene annotation using EVidenceModeler (v.1.1.1)81. High-confidence transcripts and TOGA projections were added if they provided novel splice site information.
Transposable elements
We annotated each genome for transposable elements (TEs) following previous methods82 that incorporate de novo TE discovery with RepeatModeler83 followed by manual curation of potentially novel TEs (putative elements with mean K2P divergences <6.6% from the relevant consensus). Starting consensus sequences were also filtered for size (>100 bp). To classify final consensus sequences, each TE was examined for structural hallmarks and compared to online databases: blastx to confirm the presence of known ORFs in autonomous elements, RepBase (v.20181026) to identify known elements and TEclass84 to predict TE type. Finally, duplicates were removed via the program cd-hit-est (v.4.6.6)85,86 if they did not pass the 80-80-80 rule as described in ref. 87. The final de novo curated elements were combined with a vertebrate library of known TEs in RepBase (v.20181026) (Supplementary Data 1), and the RepeatMasker results for the bats and seven mammalian outgroups were examined. Full details of these methods are available in Supplementary Note 3.3.
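The K2P divergence used as a curation threshold above is the Kimura two-parameter distance, d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of transitional and transversional differences between two aligned sequences. The sketch below computes it for one pre-aligned pair; the function name, the handling of gaps and ambiguous bases (skipped) and the example sequences are assumptions for illustration, and the mean over all copies of an element is not shown.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns containing anything other than A, C, G or T in either sequence
    are skipped (an assumption). Raises ValueError if the sequences differ
    in length, share no comparable column, or are too diverged for the
    logarithms to be defined.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Invented example: one transition (A->G) in ten aligned sites.
example_divergence = k2p_distance("ACGTACGTAC", "GCGTACGTAC")
below_threshold = example_divergence < 0.066  # the 6.6% cut-off from the text
```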
Phylogenomics
Human transcripts were projected to 41 additional mammal species resulting in 12,931 genes classified as 1:1 orthologues by TOGA (Supplementary Data 2). Non-homologous segments were trimmed and CDS sequences were aligned. The best-fit model of sequence evolution for each alignment was found and used to infer a maximum likelihood (ML) gene tree using IQTREE28. Individual gene alignments were also concatenated into a partitioned supermatrix, which was used to estimate the mammalian species tree. Branch support for this tree was determined using 1,000 bootstrap replicates. This species tree was rooted on Atlantogenata and used to determine the position of Chiroptera within Laurasiatheria. Individual gene trees were compared to the species tree using Robinson–Foulds (RF) distances30. Phylogenomic signal within our genomes was further explored by estimating the ML support of each protein-coding gene for the 15 possible bifurcating laurasiatherian topologies involving four clades, with Eulipotyphla as the outgroup. An additional supermatrix, consisting of 10,857 orthologous conserved noncoding elements (CNEs), was generated and explored using the aforementioned methods.
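Two quantities in the paragraph above can be illustrated with a short sketch. The first is the Robinson–Foulds distance, computed here as the number of non-trivial bipartitions present in exactly one of two trees; the second is the count of rooted bifurcating topologies for n clades, (2n − 3)!!, which gives 15 for four ingroup clades rooted by an outgroup. Trees are written as nested tuples of leaf names, a representation chosen for this example only, and the taxon labels A to E are placeholders rather than real clades.

```python
def leaf_set(tree):
    """Return the set of leaf names below a node (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial splits of the leaf set induced by internal edges."""
    all_leaves = leaf_set(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - leaves
        # Splits with a single leaf on one side occur in every tree; skip them.
        if len(leaves) >= 2 and len(other) >= 2:
            splits.add(frozenset([leaves, other]))
        return leaves

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: size of the symmetric difference of the splits."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


def count_rooted_topologies(n_taxa):
    """Number of rooted bifurcating trees on n_taxa labelled tips: (2n - 3)!!."""
    if n_taxa < 1:
        raise ValueError("need at least one taxon")
    result = 1
    for k in range(3, 2 * n_taxa - 2, 2):
        result *= k
    return result


# Placeholder trees on five tips; they differ in one resolved relationship.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))
distance = rf_distance(tree_1, tree_2)
n_topologies = count_rooted_topologies(4)
```

This variant treats the trees as unrooted, so the two child clades of the root describe the same split and are counted once; whether the published comparison was normalised or rooted is not stated in this section.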
To assess whether model misspecification or loss of historic signal affected our data, all 12,931 alignments were examined for evidence of violating the assumptions of evolution under homogeneous conditions and a decay of signal owing to multiple substitutions. A total of 488 gene alignments, containing all 48 taxa, were considered optimal for phylogenetic analysis under these conditions. These data were explored using the methods above, and the SVDquartets single-site coalescence-based method37, as an alternative to concatenation. A full description of all phylogenetic methods is available in Supplementary Note 4.2.
Gene selection, loss and gain
We screened all 12,931 orthologous genes for signatures of positive selection on the stem Chiroptera branch using the best supported mammalian phylogeny and two state-of-the-art methods, aBSREL implemented in HyPhy39 and codeml in PAML40. We required a HyPhy false discovery rate < 0.05 (using the Benjamini–Hochberg procedure to correct for 12,931 statistical tests) and a codeml P < 0.05. To increase the sensitivity in detecting positive selection in genes relevant for prominent bat traits, we also performed a screen considering 2,453 candidate genes associated with longevity, immunity or metabolism. Genes showing evidence of positive selection were subsequently explored using protein structure prediction and modelling methods (Supplementary Data 3). To systematically screen for gene losses, we used a previously developed approach53 (Supplementary Note 4.5), and required that less than 80% of the ORF was intact in all six bats, excluding genes classified as lost in more than 20% of non-Chiroptera laurasiatherian mammals contained in our 120-mammal multiple genome alignment88. We confirmed the presence of inactivating mutations in independently sequenced bat species. To investigate expansions and contractions of protein families, we used CAFE89 with a false discovery rate < 0.05 cut-off. As input for CAFE, we clustered Ensembl-annotated proteins into families using PorthoMCL90 and the PANTHER Database (v.14.0)91 and our ultrametric time tree, generated using r8s.
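The dual threshold for the selection screen can be sketched as follows: Benjamini–Hochberg adjusted values are computed from one set of P values, and a gene is retained only if its adjusted value and its P value from the second method are both below 0.05. This is a minimal illustration rather than the code used in the study; the function name, gene labels and P values are invented, and in practice the correction would run over all 12,931 tests.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Invented values for four hypothetical genes.
genes = ["g1", "g2", "g3", "g4"]
absrel_p = [0.001, 0.04, 0.03, 0.2]   # stand-in for per-gene aBSREL P values
codeml_p = [0.01, 0.01, 0.2, 0.01]    # stand-in for per-gene codeml P values

absrel_fdr = benjamini_hochberg(absrel_p)
selected = [
    gene
    for gene, fdr, p in zip(genes, absrel_fdr, codeml_p)
    if fdr < 0.05 and p < 0.05
]
```

Requiring both criteria is conservative: in this toy example g2 has a raw P value of 0.04 but is dropped because its adjusted value exceeds 0.05.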
Integrated viruses in bat genomes
The six bat genomes and seven additional mammalian genomes were inspected for the presence of EVEs and ERVs. Potential integrations were identified using local BLAST92 with 14 probes for the viral proteins Gag, Pol and Env from each genus of Retroviridae for ERVs; tblastn92 of an established comprehensive library62 of non-retroviral proteins identified integrations of other viral types. Reciprocal blast of identified regions was used to identify viral family (for EVEs) or closest retroviral genus (for ERVs). Regions for each viral protein family passing quality thresholds were aligned using MUSCLE within Aliview93. A phylogenetic tree for the identified retroviral pol-like sequences from the six bat genomes and probes was then reconstructed using RAxML with the VT + G model94.
Evolution of noncoding genomic regions
Conserved noncoding RNA genes were annotated using the Infernal pipeline95. To gain insights into the evolution of conserved miRNA families along the bat lineages, we performed two analyses that investigate (i) expansion or contraction of members within miRNA gene families, and (ii) gain or loss of miRNA gene families. To explore variation in miRNA sequence unique to bats, we aligned and investigated single-copy miRNA genes across these 48 taxa. We developed a pipeline to predict the gene targets of candidate miRNAs and the biological processes in which they are potentially engaged. To identify novel miRNAs that evolved in bats, we sequenced small RNA libraries from brain, kidney and liver for all six bat species using Illumina miRNA-seq. We applied a comprehensive pipeline to identify novel miRNAs commonly shared by the ancestral bat lineage. We further used luciferase assays96,97 to test the functionality of candidate miRNAs in vitro. A full description is provided in Supplementary Note 5.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.
Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at
Acknowledgements
E.W.M. and M.P. were supported by the Max Planck Society and were partially funded by the Federal Ministry of Education and Research (grant 01IS18026C). All data produced in Dresden were funded directly by the Max Planck Society. S.C.V., P.D. and K.L. were funded by a Max Planck Research Group awarded to S.C.V. from the Max Planck Society, and a Human Frontiers Science Program (HFSP) Research grant awarded to S.C.V. (RGP0058/2016). M.H. was funded by the German Research Foundation (HI 1423/3-1) and the Max Planck Society. E.C.T. was funded by a European Research Council Research Grant (ERC-2012-StG311000), UCD Wellcome Institutional Strategic Support Fund, financed jointly by University College Dublin and SFI-HRB-Wellcome Biomedical Research Partnership (ref 204844/Z/16/Z) and Irish Research Council Consolidator Laureate Award. G.M.H. was funded by a UCD Ad Astra Fellowship. G.J. and E.C.T. were funded from the Royal Society/Royal Irish Academy cost share programme. L.M.D. was supported by NSF-DEB 1442142 and 1838273, and NSF-DGE 1633299. D.A.R. was supported by NSF-DEB 1838283. E.D.J. and O.F. were funded by the Rockefeller University and the Howard Hughes Medical Institute. We thank Stony Brook Research Computing and Cyberinfrastructure, and the Institute for Advanced Computational Science at Stony Brook University for access to the high-performance SeaWulf computing system (which was made possible by a National Science Foundation grant (no. 1531492)); the Long Read Team of the DRESDEN-concept Genome Center, DFG NGS Competence Center, part of the Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden; S. Kuenzel and his team of the Max Planck Institute of Evolutionary Biology; members of the Vertebrate Genomes Laboratory at The Rockefeller University for their support; L. Wiegrebe, U. Firzlaff and M. Yartsev, who gave us access to captive colonies of Phyllostomus and Rousettus bats and aided with tissue sample collection; and M. Springer, for completing the SVDquartets analyses, and providing phylogenetic input and expertise.
Author contributions
M.H., S.C.V., E.W.M. and E.C.T. conceived and supervised the project. M.H., S.C.V., E.W.M. and E.C.T. provided funding. M.L.P., S.J.P., D.K.N.D., G.J., R.D.R., A.G.L., E.C.T. and S.C.V. provided tissue samples for sequencing. Z.H., J.G.R., O.F., P.D. and S.W. were responsible for nucleic acid extraction and sequencing. M.P. assembled and curated all genomes. D.J. provided coding gene annotation and was responsible for coding gene evolutionary analysis. D.J. provided multiple sequence and genome alignments. M.H. and D.J. analysed ultraconserved elements and genome completeness. D.J. and M.H. established the Bat1K genome browser. Z.H. provided non-coding gene annotation and was responsible for non-coding gene evolutionary analysis. K.L. processed Iso-Seq data and provided UTR annotation. Z.H., K.L., P.D. and S.C.V. conducted miRNA target prediction and gene ontology enrichment. P.D. conducted miRNA functional experiments. G.M.H., L.S.J. and E.C.T. provided phylogenomic analyses. G.M.H. and L.M.D. were responsible for codeml analysis. D.J., M.H., Z.H., G.M.H., E.C.T., L.M.D. and A.P.C. interpreted evolutionary analyses. B.M.K. and M.H. developed the TOGA gene projection tool and B.M.K. provided projections for non-bat mammals. E.C.S., L.B.-G. and A.K. provided EVE annotation and analysis. D.A.R. and K.A.M.S. provided transposable element annotation and analysis. E.D.J. provided support for sequencing of Phyllostomus and Rhinolophus genomes. D.J., Z.H., M.P., G.H., M.H., S.C.V., E.W.M. and E.C.T. wrote the manuscript. All authors provided edits and comments.
Data availability
All data generated or analysed during this study are included in the Article and its Supplementary Information. All genomic and transcriptomic data are publicly available for visualization via the open-access Bat1K genome browser (https://genome-public.pks.mpg.de) and for download at https://bds.mpi-cbg.de/hillerlab/Bat1KPilotProject/. In addition, the assemblies have been deposited in the NCBI database under BioProject PRJNA489245 and GenomeArk (https://vgp.github.io/genomeark/). Accession numbers for all the miRNA-seq and RNA-seq data used in this study can be found in Supplementary Tables 17 and 34, respectively.
Code availability
All custom code has been made available on GitHub at https://github.com/jebbd/Bat1K and https://github.com/MartinPippel/DAmar.
Competing interests
The authors declare no competing interests.
Footnotes
Peer review information Nature thanks Brock Fenton and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: David Jebb, Zixia Huang, Martin Pippel
These authors jointly supervised this work: Michael Hiller, Sonja C. Vernes, Eugene W. Myers, Emma C. Teeling
Contributor Information
Michael Hiller, Email: hiller@mpi-cbg.de
Sonja C. Vernes, Email: sonja.vernes@mpi.nl
Eugene W. Myers, Email: gene@mpi-cbg.de
Emma C. Teeling, Email: emma.teeling@ucd.ie
Extended data is available for this paper at 10.1038/s41586-020-2486-3.
Supplementary information is available for this paper at 10.1038/s41586-020-2486-3.
References
- 1. Teeling EC, et al. Bat biology, genomes, and the Bat1K project: to generate chromosome-level genomes for all living bat species. Annu. Rev. Anim. Biosci. 2018;6:23–46. doi: 10.1146/annurev-animal-022516-022811.
- 2. Simmons, N. B. & Cirranello, A. L. Bat Species of the World: A Taxonomic and Geographic Database, https://batnames.org/ (2020).
- 3. Banerjee A, et al. Novel insights into immune systems of bats. Front. Immunol. 2020;11:26. doi: 10.3389/fimmu.2020.00026.
- 4. Huang Z, et al. Longitudinal comparative transcriptomics reveals unique mechanisms underlying extended healthspan in bats. Nat. Ecol. Evol. 2019;3:1110–1120. doi: 10.1038/s41559-019-0913-3.
- 5. Vernes SC, Wilkinson GS. Behaviour, biology and evolution of vocal learning in bats. Phil. Trans. R. Soc. Lond. B. 2020;375:20190061. doi: 10.1098/rstb.2019.0061.
- 6. Jones G, Teeling EC, Rossiter SJ. From the ultrasonic to the infrared: molecular evolution and the sensory biology of bats. Front. Physiol. 2013;4:117. doi: 10.3389/fphys.2013.00117.
- 7. Teeling EC, et al. A molecular phylogeny for bats illuminates biogeography and the fossil record. Science. 2005;307:580–584. doi: 10.1126/science.1105113.
- 8. Wilkinson GS, Adams DM. Recurrent evolution of extreme longevity in bats. Biol. Lett. 2019;15:20180860. doi: 10.1098/rsbl.2018.0860.
- 9. Nowoshilow S, et al. The axolotl genome and the evolution of key tissue formation regulators. Nature. 2018;554:50–55. doi: 10.1038/nature25458.
- 10. Tischler, G. in Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2017) (eds Bartoletti, M. et al.) 103–114 (Springer, 2019).
- 11. Tischler, G. & Myers, E. W. Non hybrid long read consensus using local de Bruijn graph assembly. Preprint at https://www.biorxiv.org/content/10.1101/106252v1 (2017).
- 12. Dong D, et al. The genomes of two bat species with long constant frequency echolocation calls. Mol. Biol. Evol. 2017;34:20–34. doi: 10.1093/molbev/msw231.
- 13. Eckalbar WL, et al. Transcriptomic and epigenomic characterization of the developing bat wing. Nat. Genet. 2016;48:528–536. doi: 10.1038/ng.3537.
- 14. Parker J, et al. Genome-wide signatures of convergent evolution in echolocating mammals. Nature. 2013;502:228–231. doi: 10.1038/nature12511.
- 15. Pavlovich SS, et al. The Egyptian Rousette genome reveals unexpected features of bat antiviral immunity. Cell. 2018;173:1098–1110. doi: 10.1016/j.cell.2018.03.070.
- 16. Seim I, et al. Genome analysis reveals insights into physiology and longevity of the Brandt’s bat Myotis brandtii. Nat. Commun. 2013;4:2212. doi: 10.1038/ncomms3212.
- 17. Wen M, et al. Exploring the genome and transcriptome of the cave nectar bat Eonycteris spelaea with PacBio long-read sequencing. Gigascience. 2018;7:giy116. doi: 10.1093/gigascience/giy116.
- 18. Zepeda Mendoza ML, et al. Hologenomic adaptations underlying the evolution of sanguivory in the common vampire bat. Nat. Ecol. Evol. 2018;2:659–668. doi: 10.1038/s41559-018-0476-8.
- 19. Zhang G, et al. Comparative analysis of bat genomes provides insight into the evolution of flight and immunity. Science. 2013;339:456–460. doi: 10.1126/science.1230835.
- 20. Bejerano G, et al. Ultraconserved elements in the human genome. Science. 2004;304:1321–1325. doi: 10.1126/science.1098119.
- 21. Nature Biotechnology Editorial. A reference standard for genome biology. Nat. Biotechnol. 2018;36:1121. doi: 10.1038/nbt.4318.
- 22. Waterhouse RM, et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol. Biol. Evol. 2018;35:543–548. doi: 10.1093/molbev/msx319.
- 23. Pace JK, II, Feschotte C. The evolutionary history of human DNA transposons: evidence for intense activity in the primate lineage. Genome Res. 2007;17:422–432. doi: 10.1101/gr.5826307.
- 24. Foley NM, Springer MS, Teeling EC. Mammal madness: is the mammal tree of life not yet resolved? Phil. Trans. R. Soc. Lond. B. 2016;371:20150140. doi: 10.1098/rstb.2015.0140.
- 25. Doronina L, et al. Speciation network in Laurasiatheria: retrophylogenomic signals. Genome Res. 2017;27:997–1003. doi: 10.1101/gr.210948.116.
- 26. Springer, M. S. & Gatesy, J. An ABBA-BABA test for introgression using retroposon insertion data. Preprint at https://www.biorxiv.org/content/10.1101/709477v1 (2019).
- 27. Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods. 2017;14:587–589. doi: 10.1038/nmeth.4285.
- 28. Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 2015;32:268–274. doi: 10.1093/molbev/msu300.
- 29. Tarver JE, et al. The interrelationships of placental mammals and the limits of phylogenetic inference. Genome Biol. Evol. 2016;8:330–344. doi: 10.1093/gbe/evv261.
- 30. Springer MS, Gatesy J. On the importance of homology in the age of phylogenomics. Syst. Biodivers. 2018;16:210–228. doi: 10.1080/14772000.2017.1401016.
- 31. Nishihara H, Hasegawa M, Okada N. Pegasoferae, an unexpected mammalian clade revealed by tracking ancient retroposon insertions. Proc. Natl Acad. Sci. USA. 2006;103:9929–9934. doi: 10.1073/pnas.0603797103.
- 32. Tsagkogeorga G, Parker J, Stupka E, Cotton JA, Rossiter SJ. Phylogenomic analyses elucidate the evolutionary relationships of bats. Curr. Biol. 2013;23:2262–2267. doi: 10.1016/j.cub.2013.09.014.
- 33. Jermiin LS, Poladian L, Charleston MA. Is the “Big Bang” in animal evolution real? Science. 2005;310:1910–1911. doi: 10.1126/science.1122440.
- 34. Philippe H, et al. Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biol. 2011;9:e1000602. doi: 10.1371/journal.pbio.1000602.
- 35. Ho SY, Jermiin L. Tracing the decay of the historical signal in biological sequence data. Syst. Biol. 2004;53:623–637. doi: 10.1080/10635150490503035.
- 36. Jermiin LS, Catullo RA, Holland BR. A new phylogenetic protocol: dealing with model misspecification and confirmation bias in molecular phylogenetics. NAR Genom. Bioinf. 2020;2:lqaa041. doi: 10.1093/nargab/lqaa041.
- 37. Chou J, et al. A comparative study of SVDquartets and other coalescent-based species tree estimation methods. BMC Genomics. 2015;16:S2. doi: 10.1186/1471-2164-16-S10-S2.
- 38. Smith MD, et al. Less is more: an adaptive branch-site random effects model for efficient detection of episodic diversifying selection. Mol. Biol. Evol. 2015;32:1342–1353. doi: 10.1093/molbev/msv022.
- 39. Pond SL, Frost SD, Muse SV. HyPhy: hypothesis testing using phylogenies. Bioinformatics. 2005;21:676–679. doi: 10.1093/bioinformatics/bti079.
- 40. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 2007;24:1586–1591. doi: 10.1093/molbev/msm088.
- 41. Kantarci S, et al. Mutations in LRP2, which encodes the multiligand receptor megalin, cause Donnai–Barrow and facio-oculo-acoustico-renal syndromes. Nat. Genet. 2007;39:957–959. doi: 10.1038/ng2063.
- 42. Tan J, Prakash MD, Kaiserman D, Bird PI. Absence of SERPINB6A causes sensorineural hearing loss with multiple histopathologies in the mouse inner ear. Am. J. Pathol. 2013;183:49–59. doi: 10.1016/j.ajpath.2013.03.009.
- 43. Walsh T, et al. Genomic duplication and overexpression of TJP2/ZO-2 leads to altered expression of apoptosis genes in progressive nonsyndromic hearing loss DFNA51. Am. J. Hum. Genet. 2010;87:101–109. doi: 10.1016/j.ajhg.2010.05.011.
- 44. Wang Z, et al. Prenatal development supports a single origin of laryngeal echolocation in bats. Nat. Ecol. Evol. 2017;1:0021. doi: 10.1038/s41559-016-0021.
- 45. Gunn MD, et al. A B-cell-homing chemokine made in lymphoid follicles activates Burkitt’s lymphoma receptor-1. Nature. 1998;391:799–803. doi: 10.1038/35876.
- 46. Vendelin J, et al. Downstream target genes of the neuropeptide S-NPSR1 pathway. Hum. Mol. Genet. 2006;15:2923–2935. doi: 10.1093/hmg/ddl234.
- 47. Luong P, et al. INAVA–ARNO complexes bridge mucosal barrier function with inflammatory signaling. eLife. 2018;7:e38539. doi: 10.7554/eLife.38539.
- 48. Saddawi-Konefka R, et al. Nrf2 induces IL-17D to mediate tumor and virus surveillance. Cell Rep. 2016;16:2348–2358. doi: 10.1016/j.celrep.2016.07.075.
- 49. Barker BR, Taxman DJ, Ting JP. Cross-regulation between the IL-1β/IL-18 processing inflammasome and other inflammatory cytokines. Curr. Opin. Immunol. 2011;23:591–597. doi: 10.1016/j.coi.2011.07.005.
- 50. Flo TH, et al. Lipocalin 2 mediates an innate immune response to bacterial infection by sequestrating iron. Nature. 2004;432:917–921. doi: 10.1038/nature03104.
- 51. Hase K, et al. Uptake through glycoprotein 2 of FimH+ bacteria by M cells initiates mucosal immune response. Nature. 2009;462:226–230. doi: 10.1038/nature08529.
- 52. Yang J, et al. The I-TASSER suite: protein structure and function prediction. Nat. Methods. 2015;12:7–8. doi: 10.1038/nmeth.3213.
- 53. Sharma V, et al. A genomics approach reveals insights into the importance of gene losses for mammalian adaptations. Nat. Commun. 2018;9:1215. doi: 10.1038/s41467-018-03667-1.
- 54. Wang W, Yang Y, Li L, Shi Y. Synleurin, a novel leucine-rich repeat protein that increases the intensity of pleiotropic cytokine responses. Biochem. Biophys. Res. Commun. 2003;305:981–988. doi: 10.1016/S0006-291X(03)00876-3.
- 55. Bridgewood C, et al. IL-36γ has proinflammatory effects on human endothelial cells. Exp. Dermatol. 2017;26:402–408. doi: 10.1111/exd.13228.
- 56. Johnston A, et al. IL-1F5, -F6, -F8, and -F9: a novel IL-1 family signaling system that is active in psoriasis and promotes keratinocyte antimicrobial peptide expression. J. Immunol. 2011;186:2613–2622. doi: 10.4049/jimmunol.1003162.
- 57. Nishida A, et al. Increased expression of interleukin-36, a member of the interleukin-1 cytokine family, in inflammatory bowel disease. Inflamm. Bowel Dis. 2016;22:303–314. doi: 10.1097/MIB.0000000000000654.
- 58. Hayward JA, et al. Differential evolution of antiretroviral restriction factors in pteropid bats as revealed by APOBEC3 gene complexity. Mol. Biol. Evol. 2018;35:1626–1637. doi: 10.1093/molbev/msy048.
- 59. Münk C, Willemsen A, Bravo IG. An ancient history of gene duplications, fusions and losses in the evolution of APOBEC3 mutators in mammals. BMC Evol. Biol. 2012;12:71. doi: 10.1186/1471-2148-12-71.
- 60. Roper N, et al. APOBEC mutagenesis and copy-number alterations are drivers of proteogenomic tumor evolution and heterogeneity in metastatic thoracic tumors. Cell Rep. 2019;26:2651–2666. doi: 10.1016/j.celrep.2019.02.028.
- 61. Salter JD, Bennett RP, Smith HC. The APOBEC protein family: united by structure, divergent in function. Trends Biochem. Sci. 2016;41:578–594. doi: 10.1016/j.tibs.2016.05.001.
- 62. Katzourakis A, Gifford RJ. Endogenous viral elements in animal genomes. PLoS Genet. 2010;6:e1001191. doi: 10.1371/journal.pgen.1001191.
- 63. Taylor DJ, Dittmar K, Ballinger MJ, Bruenn JA. Evolutionary maintenance of filovirus-like genes in bat genomes. BMC Evol. Biol. 2011;11:336. doi: 10.1186/1471-2148-11-336.
- 64. Hayward A, Grabherr M, Jern P. Broad-scale phylogenomics provides insights into retrovirus–host evolution. Proc. Natl Acad. Sci. USA. 2013;110:20146–20151. doi: 10.1073/pnas.1315419110.
- 65. Skirmuntt EC, Katzourakis A. The evolution of endogenous retroviral envelope genes in bats and their potential contribution to host biology. Virus Res. 2019;270:197645. doi: 10.1016/j.virusres.2019.197645.
- 66. Xu X, Zhao H, Gong Z, Han GZ. Endogenous retroviruses of non-avian/mammalian vertebrates illuminate diversity and deep history of retroviruses. PLoS Pathog. 2018;14:e1007072. doi: 10.1371/journal.ppat.1007072.
- 67. Katzourakis A, Tristem M, Pybus OG, Gifford RJ. Discovery and analysis of the first endogenous lentivirus. Proc. Natl Acad. Sci. USA. 2007;104:6261–6265. doi: 10.1073/pnas.0700471104.
- 68. Heimberg AM, Sempere LF, Moy VN, Donoghue PC, Peterson KJ. MicroRNAs and the advent of vertebrate morphological complexity. Proc. Natl Acad. Sci. USA. 2008;105:2946–2950. doi: 10.1073/pnas.0712259105.
- 69. Moran Y, Agron M, Praher D, Technau U. The evolutionary origin of plant and animal microRNAs. Nat. Ecol. Evol. 2017;1:0027. doi: 10.1038/s41559-016-0027.
- 70. Rhie, A. et al. Towards complete and error-free genome assemblies of all vertebrate species. Preprint at https://www.biorxiv.org/content/10.1101/2020.05.22.110833v1 (2020).
- 71. Kent WJ. BLAT—the BLAST-like alignment tool. Genome Res. 2002;12:656–664. doi: 10.1101/gr.229202.
- 72. Gremme G, Brendel V, Sparks ME, Kurtz S. Engineering a software tool for gene structure prediction in higher organisms. Inf. Softw. Technol. 2005;47:965–978. doi: 10.1016/j.infsof.2005.09.005.
- 73. Aken BL, et al. The Ensembl gene annotation system. Database (Oxford) 2016;2016:baw093. doi: 10.1093/database/baw093.
- 74. Sharma V, Hiller M. Increased alignment sensitivity improves the usage of genome alignments for comparative gene annotation. Nucleic Acids Res. 2017;45:8369–8377. doi: 10.1093/nar/gkx554.
- 75. Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution’s cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc. Natl Acad. Sci. USA. 2003;100:11484–11489. doi: 10.1073/pnas.1932072100.
- 76. Sharma V, Schwede P, Hiller M. CESAR 2.0 substantially improves speed and accuracy of comparative gene annotation. Bioinformatics. 2017;33:3985–3987. doi: 10.1093/bioinformatics/btx527.
- 77. Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008;24:637–644. doi: 10.1093/bioinformatics/btn013.
- 78. Hoff KJ, Lange S, Lomsadze A, Borodovsky M, Stanke M. BRAKER1: unsupervised RNA-seq-based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics. 2016;32:767–769. doi: 10.1093/bioinformatics/btv661.
- 79. Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 2019;37:907–915. doi: 10.1038/s41587-019-0201-4.
- 80. Kuo, R. I., Cheng, Y., Smith, J., Archibald, A. L. & Burt, D. W. Illuminating the dark side of the human transcriptome with TAMA Iso-Seq analysis. Preprint at https://www.biorxiv.org/content/10.1101/780015v1 (2019).
- 81. Haas BJ, et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 2008;9:R7. doi: 10.1186/gb-2008-9-1-r7.
- 82. Platt RN, II, Blanco-Berdugo L, Ray DA. Accurate transposable element annotation is vital when analyzing new genome assemblies. Genome Biol. Evol. 2016;8:403–410. doi: 10.1093/gbe/evw009.
- 83. Smit AFA, Hubley R, Green P. RepeatMasker Open-4.0. http://www.repeatmasker.org (2013–2015).
- 84. Abrusán G, Grundmann N, DeMester L, Makalowski W. TEclass—a tool for automated classification of unknown eukaryotic transposable elements. Bioinformatics. 2009;25:1329–1330. doi: 10.1093/bioinformatics/btp084.
- 85. Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28:3150–3152. doi: 10.1093/bioinformatics/bts565.
- 86. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22:1658–1659. doi: 10.1093/bioinformatics/btl158.
- 87. Wicker T, et al. A unified classification system for eukaryotic transposable elements. Nat. Rev. Genet. 2007;8:973–982. doi: 10.1038/nrg2165.
- 88. Hecker N, Hiller M. A genome alignment of 120 mammals highlights ultraconserved element variability and placenta-associated enhancers. Gigascience. 2020;9:giz159. doi: 10.1093/gigascience/giz159.
- 89. De Bie T, Cristianini N, Demuth JP, Hahn MW. CAFE: a computational tool for the study of gene family evolution. Bioinformatics. 2006;22:1269–1271. doi: 10.1093/bioinformatics/btl097.
- 90. Tabari E, Su Z. PorthoMCL: parallel orthology prediction using MCL for the realm of massive genome availability. Big Data Anal. 2017;2:4. doi: 10.1186/s41044-016-0019-8.
- 91. Mi H, et al. The PANTHER database of protein families, subfamilies, functions and pathways. Nucleic Acids Res. 2005;33:D284–D288. doi: 10.1093/nar/gki078.
- 92. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
- 93. Larsson A. AliView: a fast and lightweight alignment viewer and editor for large datasets. Bioinformatics. 2014;30:3276–3278. doi: 10.1093/bioinformatics/btu531.
- 94. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–1313. doi: 10.1093/bioinformatics/btu033.
- 95. Nawrocki EP, Kolbe DL, Eddy SR. Infernal 1.0: inference of RNA alignments. Bioinformatics. 2009;25:1335–1337. doi: 10.1093/bioinformatics/btp157.
- 96. Devanna P, et al. Next-gen sequencing identifies non-coding variation disrupting miRNA-binding sites in neurological disorders. Mol. Psychiatry. 2018;23:1375–1384. doi: 10.1038/mp.2017.30.
- 97. Devanna P, van de Vorst M, Pfundt R, Gilissen C, Vernes SC. Genome-wide investigation of an ID cohort reveals de novo 3′ UTR variants affecting gene expression. Hum. Genet. 2018;137:717–721. doi: 10.1007/s00439-018-1925-9.
Data Availability Statement
All data generated or analysed during this study are included in the Article and its Supplementary Information. All genomic and transcriptomic data are publicly available for visualization via the open-access Bat1K genome browser (https://genome-public.pks.mpg.de) and for download at https://bds.mpi-cbg.de/hillerlab/Bat1KPilotProject/. In addition, the assemblies have been deposited in the NCBI database under BioProject PRJNA489245 and in GenomeArk (https://vgp.github.io/genomeark/). Accession numbers for all the miRNA-seq and RNA-seq data used in this study can be found in Supplementary Tables 17 and 34, respectively.
All custom code has been made available on GitHub at https://github.com/jebbd/Bat1K and https://github.com/MartinPippel/DAmar.