Molecular Biology and Evolution. 2012 Jul 10;29(10):3169–3179. doi: 10.1093/molbev/mss133

Habitat Variability Correlates with Duplicate Content of Drosophila Genomes

Takashi Makino 1,*, Masakado Kawata 1,*
PMCID: PMC3457775  PMID: 22586328

Abstract

The factors limiting the habitat range of species are crucial in understanding their biodiversity and response to environmental change. Yet the genetic and genomic architectures that produce genetic variation to enable environmental adaptation have remained poorly understood. Here we show that the proportion of duplicated genes (PD) in the whole genomes of fully sequenced Drosophila species is significantly correlated with environmental variability within the habitats measured by the climatic envelope and habitat diversity. Furthermore, species with a low PD tend to lose the duplicated genes owing to their faster evolution. These results indicate that the rapid relaxation of functional constraints on duplicated genes resulted in a low PD for species with lower habitat diversity, and suggest that the maintenance of duplicated genes gives organisms an ecological advantage during evolution. We therefore propose that the PD in a genome is related to adaptation to environmental variation.

Keywords: habitat diversity, habitat distributions, evolvability, adaptation, duplicated genes

Introduction

The factors that constrain the evolution of habitat range are of critical importance for understanding the evolution of biodiversity and conservation because these factors are closely related to historical processes creating the current biodiversity and to adaptation to current and future global climate changes (Root et al. 2003; Bridle and Vines 2007; Roy et al. 2009). Even within closely related groups like Drosophila, some species have narrow restricted ranges and inhabit one or a few habitat types, whereas others have wider ranges and live in diverse environments (Kirkpatrick and Barton 1997). Kellermann et al. (2009) showed that low genetic variation in cold and desiccation tolerance limits the distributions of species. This indicates that a lack of genetic variation in key traits within a species is related to failure to expand in range and adapt to environmental change. However, what determines the ability of species to generate genetic variation (i.e., their evolvability) remains unknown. Restricted-range species might have genetic and genomic architectures that do not allow high variation.

We focused on gene duplication as a source of genome-wide genetic variation. One system that produces and maintains a large amount of genetic variation is buffering. By buffering the deleterious consequences of mutations, genetic variation can accumulate in a genome. An obvious mechanism for buffering genetic variation is redundancy, and one of the main factors that generates redundancy is gene duplication (Wilkins 1997; Hartman et al. 2001). This type of mutation is particularly common in eukaryotes, and in yeast, duplication rates are reportedly faster than point mutation rates (Lynch et al. 2008). After gene duplication, one of the pair is redundant, and as such, the functional constraints are relaxed, and one or both copies can differentiate as long as their original function is maintained (Ohno 1970). Therefore, under these relaxed functional constraints, mutations are likely to accumulate in duplicated genes. Furthermore, the functional redundancy of duplicated genes can be maintained for extensive periods of time (Dean et al. 2008). Genetic variations in the duplicated genes within a population are likely to be maintained by their buffering effect (Wilkins 1997; Hartman et al. 2001). Therefore, gene duplication is a major source of genetic variation. In fact, Kliebenstein showed that not only younger tandem duplicated genes but also older duplicated genes elevated intraspecific gene expression variation in a population (Kliebenstein 2008).

Previous studies reported that duplicated olfactory receptor (Or) and gustatory receptor (Gr) genes were likely to be lost in specialist Drosophila species with host specificity; however, these studies focused only on the particular gene families associated with odor response, and did not consider the relationship between the duplicated gene content and habitat (McBride 2007; McBride et al. 2007). The habitats of specialists are necessarily restricted by those of their hosts; hence, host specificity would be expected to be related to habitat (Markow and O’Grady 2007). If species with a larger proportion of duplicated genes (PD) have a greater potential to generate genetic variation for more traits, they might show increased environmental adaptability. Namely, duplicated genes would have contributed to adaptation to diverse environments within the ranges of species.

We propose that species with a higher duplicated gene content would have distribution ranges with higher environmental variability. We tested this hypothesis using Drosophila species that had been fully sequenced (Clark et al. 2007) and had documented habitat ranges (figs. 1 and S1, Supplementary Material online) (Markow and O’Grady 2006). Environmental variability within their habitat range (here referred to as habitat variability) was measured using two different indices, one relating to climatic envelope and the other to habitat diversity. We estimated their climatic envelopes using bioclimatic variables from WORLDCLIM (Hijmans et al. 2005). Variability of Köppen climate classification (Kottek et al. 2006) within their range was estimated as habitat diversity using the Brillouin’s index as a measure of species diversity (Margalef 1958; Legendre and Legendre 1998). These indices were used as measurements to indicate the adaptability of species to environmental variability within their habitats. Examining the effect of duplicated genes on the expansion, contraction, and/or conservation of the habitat ranges of these species during their evolution allowed us to explore the importance of basal genetic diversity for adaptation.

Fig. 1.

Habitat distributions of Drosophila species. Habitat distributions of D. yakuba (pink), D. erecta (red), D. ananassae (purple), D. pseudoobscura (light green), D. persimilis (green), D. willistoni (orange), D. mojavensis (yellow), and D. virilis (blue) are shown. Red arrows indicate islands inhabited by island endemic species (D. sechellia and D. grimshawi). The habitat distribution of the cosmopolitan species D. melanogaster is shown in figure S1, Supplementary Material online.

Materials and Methods

Fully Sequenced Drosophila Species

The genomes of 12 Drosophila species (Drosophila melanogaster, D. sechellia, D. simulans, D. yakuba, D. erecta, D. ananassae, D. pseudoobscura, D. persimilis, D. willistoni, D. mojavensis, D. virilis, and D. grimshawi) have been fully sequenced (Clark et al. 2007). However, the coverage of the genome assembly for D. simulans is comparatively poor. As a result, the number of identified orthologs of D. melanogaster in the D. simulans genome is relatively low (Heger and Ponting 2007), even though these two are among the most closely related Drosophila species. We therefore excluded D. simulans from our analyses.

Drosophila Gene Sequences

The protein sequences corresponding to protein-encoding genes from the 11 Drosophila species were downloaded from the EnsemblMetazoa database, release 4 (http://metazoa.ensembl.org). In some cases, a non-melanogaster gene was split into two genes as a result of sequence or assembly errors (fig. S2A, Supplementary Material online). To minimize these errors, we conducted a homology search using the Basic Local Alignment Search Tool (BLAST; D. melanogaster protein sequences vs. non-melanogaster protein sequences), combined the sequences of physically neighboring genes in a non-melanogaster genome into one sequence when the neighboring genes did not show homology to each other (E value < 10⁻⁵), and identified the best hit for the same gene among those in D. melanogaster (table S1, Supplementary Material online). The merged genes were treated as a single gene in this study. For non-melanogaster species, we also combined the nucleotide sequences of separate genes by the same process.

Duplicated Genes

To identify duplicated genes in the Drosophila genomes, we conducted an all-to-all BLAST search for all protein sequences used in this study. Genes with a homologue (E value < 10⁻⁵ and query coverage > 30%) in the same species were identified as candidate duplicated genes. Importantly, we found that our results were robust at different E value cut-offs (10⁻⁵, 10⁻¹⁰, and 10⁻²⁰), and the trend in our results did not change as a result of different cut-off values.
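The filtering step above can be illustrated with a minimal sketch. The `Hit` record and the function names are hypothetical and not taken from the study's pipeline; the sketch assumes the BLAST hits have already been parsed into (query, subject, E value, query coverage in percent) records for a single species.

```python
from typing import Iterable, NamedTuple, Set


class Hit(NamedTuple):
    """One within-species BLAST hit (hypothetical record layout)."""
    query: str        # gene used as the BLAST query
    subject: str      # gene that was hit in the same species
    evalue: float     # BLAST E value
    query_cov: float  # percentage of the query covered by the alignment


def duplicated_genes(hits: Iterable[Hit],
                     max_evalue: float = 1e-5,
                     min_cov: float = 30.0) -> Set[str]:
    """Return genes that have at least one qualifying homologue."""
    dup = set()
    for h in hits:
        if h.query == h.subject:
            continue  # a gene always hits itself; ignore self hits
        if h.evalue < max_evalue and h.query_cov > min_cov:
            dup.add(h.query)
    return dup


def proportion_duplicated(hits: Iterable[Hit], all_genes: Iterable[str],
                          **thresholds: float) -> float:
    """PD = number of duplicated genes / total number of genes."""
    genes = set(all_genes)
    return len(duplicated_genes(hits, **thresholds) & genes) / len(genes)
```

Re-running `proportion_duplicated` with `max_evalue=1e-10` or `max_evalue=1e-20` corresponds to the robustness check over the alternative cut-offs.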

Synonymous Substitution Rate between Duplicated Genes

To examine the distribution of gene duplication timing for the duplicated genes, we aligned the sequences of the duplicated gene pairs derived from the EnsemblMetazoa database by using the T-COFFEE multiple sequence alignment program (Notredame et al. 2000) and estimated the synonymous substitution rate (KS) between a duplicated gene and its closest paralogue by the Yang and Nielsen method (Yang and Nielsen 2000), which was implemented in the Phylogenetic Analysis by Maximum Likelihood (PAML) program package (Yang 1997). Note that the closest paralogues were determined by identifying the best BLAST hit from the duplicated gene candidates. The distributions of KS are shown in figure S3, Supplementary Material online.

Gene Collapse Based on Sequence Similarity

There are many non-divergent duplicated gene pairs created as a result of recent duplication events or assembly errors in a genome. Homologous gene pairs with KS < 0.1 were collapsed into a single gene (a2–a3–a4 and b1–b2 in fig. S2B, Supplementary Material online). The collapsed genes were classified as duplicated genes on the basis of a BLAST hit in comparison with genes not recently duplicated. If at least one gene in the homologous gene cluster (a2, a3, and a4 in fig. S2B, Supplementary Material online) had a duplicated gene partner (a1 in fig. S2B, Supplementary Material online; KS ≥ 0.1; E value < 10⁻⁵ and query coverage > 30%), the collapsed gene was defined as a duplicated gene. If not, the collapsed gene (b1–b2 in fig. S2B, Supplementary Material online) was defined as a singleton. The duplicated genes in Drosophila are summarized in table S2, Supplementary Material online.
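A minimal sketch of the collapse-and-classify rule follows. It assumes single-linkage grouping of pairs with KS < 0.1 (the text does not state the clustering rule) and that `ks_pairs` holds only homologous pairs that already passed the BLAST thresholds; the function names are hypothetical.

```python
def collapse_recent(genes, ks_pairs, ks_cutoff=0.1):
    """Group genes linked by KS < cutoff into clusters (single linkage)."""
    parent = {g: g for g in genes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), ks in ks_pairs.items():
        if ks < ks_cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), set()).add(g)
    return list(clusters.values())


def classify(clusters, ks_pairs, ks_cutoff=0.1):
    """Label a cluster 'duplicated' if any member has a diverged partner
    (KS >= cutoff) outside the cluster; otherwise label it 'singleton'."""
    labels = {}
    for cluster in clusters:
        has_partner = any(
            ks >= ks_cutoff and ((a in cluster) != (b in cluster))
            for (a, b), ks in ks_pairs.items()
        )
        labels[frozenset(cluster)] = "duplicated" if has_partner else "singleton"
    return labels
```

With the gene names of fig. S2B and made-up KS values, a2–a3–a4 collapse into one duplicated gene (partner a1), whereas b1–b2 collapse into a singleton.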

Lineage-Specific Gene Losses

Orthologs were defined by reciprocal best hits between different species by using the results of the all-to-all BLAST. If the orthologous relationship was obtained by one-to-one best hits, we defined the orthologs as one-to-one. Such a relationship indicated that there has been no lineage-specific gene duplication and loss after speciation. We identified orthologous gene clusters using one-to-one orthologous relationships for closely related species and their outgroups to investigate gene-loss events during evolution (fig. S4A, Supplementary Material online). We did not use genes without orthologs in the outgroups, because it was not easy to predict their ancestral state. If there was no gene-loss event in either of the closely related species, we obtained orthologous trios where possible (e.g., species 1B–species 2B–outgroup B in fig. S4B, Supplementary Material online). When orthologous trios were not available, we inferred that gene-loss events occurred in either lineage of the closely related species (fig. S4A, Supplementary Material online). For the comparison, we typically used D. melanogaster as the outgroup. When we investigated gene-loss events for species that were in the clade including D. melanogaster, we used other closely related species as the outgroups (tables 1 and 2). Note that there were other possible outgroups for the comparisons of some species; however, even when we used other outgroups for estimating the proportion of lost duplicated genes, our result did not change. To identify a species’ lost duplicated genes generated before speciation from another species, we focused on the gene similarity between a species and the outgroup species (fig. S4B, Supplementary Material online). We defined a species (e.g., species 1 in fig. S4B, Supplementary Material online) as having a lost duplicated gene (species 1A in fig. S4B, Supplementary Material online) when the following were observed: an inferred ortholog (species 2A in fig. S4B, Supplementary Material online) in the compared species (species 2 in fig. S4B, Supplementary Material online) was a duplicated gene, and the similarity between the duplicated gene and its duplicated gene partner (similarity between species 2A and species 2B in fig. S4B, Supplementary Material online) was lower than that between either of the duplicated copies and its best hit homolog in the outgroup (similarity between species 2A and outgroup A or that between species 2B and outgroup B in fig. S4B, Supplementary Material online), as determined by a BLAST search.
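The reciprocal-best-hit definition and the trio-versus-loss inference can be sketched as follows. This is a simplified illustration under stated assumptions: hits are ranked by a hypothetical bit-score dictionary, and a loss is inferred whenever an outgroup gene has an ortholog in only one of the two closely related species. It omits the additional similarity comparison used to decide whether the lost gene was a duplicated gene.

```python
def best_hits(scores):
    """scores: {(query, subject): bit_score}. Return the best subject per query."""
    best = {}
    for (q, s), score in scores.items():
        if q not in best or score > best[q][1]:
            best[q] = (s, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) in which each gene is the other's best hit."""
    fwd = best_hits(a_vs_b)
    rev = best_hits(b_vs_a)
    return {(a, b) for a, b in fwd.items() if rev.get(b) == a}


def trios_and_losses(rbh_1o, rbh_2o, rbh_12):
    """rbh_1o / rbh_2o: sets of (species gene, outgroup gene) orthologs;
    rbh_12: set of (species 1 gene, species 2 gene) orthologs.
    Returns complete trios and the outgroup genes whose ortholog is
    missing from species 1 or from species 2."""
    by_out_1 = {o: g for g, o in rbh_1o}
    by_out_2 = {o: g for g, o in rbh_2o}
    trios, lost_in_1, lost_in_2 = [], [], []
    for o in sorted(set(by_out_1) | set(by_out_2)):
        g1, g2 = by_out_1.get(o), by_out_2.get(o)
        if g1 and g2:
            if (g1, g2) in rbh_12:
                trios.append((g1, g2, o))
        elif g2:
            lost_in_1.append(o)  # present in species 2 and outgroup only
        else:
            lost_in_2.append(o)  # present in species 1 and outgroup only
    return trios, lost_in_1, lost_in_2
```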

Table 1.

Differences in Evolutionary Events of Genes between Closely Related Species.

Blank Outgroup and count cells share the value given in the first row of the same species pair.

Species | Outgroup | Habitat variability: climatic tolerance | Habitat variability: habitat diversity | PD | Gene losses: number of lineage-specific gene losses | Gene losses: PD for lineage-specific lost genes | Fast-evolving genes: number of one-to-one orthologous trios | Fast-evolving genes: number of significant fast-evolving genes | Duplicated gene pairs: number of duplicated gene pairs in orthologous trios | Duplicated gene pairs: number having a significant fast-evolving gene
Drosophila melanogaster | D. yakuba | 1,119 | 3.11 | 0.53 | 497 | 133/497 | 11,952 | 482/11,952 | 3,193 | 104/3,193
D. sechellia | | 1 | 0 | 0.49 | 430 | 270/430 | | 1,441/11,952 | | 433/3,193
P value | | | | | | <2.2 × 10⁻¹⁶ | | <2.2 × 10⁻¹⁶ | | <2.2 × 10⁻¹⁶
D. pseudoobscura | D. melanogaster | 599 | 2.71 | 0.53 | 204 | 98/204 | 10,995 | 203/10,995 | 2,599 | 53/2,599
D. persimilis | | 287 | 1.18 | 0.5 | 401 | 229/401 | | 1,502/10,995 | | 507/2,599
P value | | | | | | 0.042 | | <2.2 × 10⁻¹⁶ | | <2.2 × 10⁻¹⁶
D. yakuba | D. melanogaster | 240 | 1.87 | 0.5 | 304 | 149/304 | 12,156 | 712/12,156 | 3,472 | 352/3,472
D. erecta | | 65 | 1.05 | 0.5 | 259 | 145/259 | | 1,082/12,156 | | 608/3,472
P value | | | | | | 0.12 | | <2.2 × 10⁻¹⁶ | | <2.2 × 10⁻¹⁶

Table 2.

Comparison of Lost Duplicated Genes.

Species 1 (higher climatic tolerance) | Climatic tolerance | Number of lost duplicated genes | Total number of lost genes | Proportion of lost duplicated genes | Species 2 (lower climatic tolerance) | Climatic tolerance | Number of lost duplicated genes | Total number of lost genes | Proportion of lost duplicated genes | Outgroup | P value (χ² test) of the significant difference in the lost duplicated genes
Drosophila ananassae | 385 | 114 | 269 | 0.42 | D. sechellia | 1 | 281 | 458 | 0.61 | D. pseudoobscura | 1.1 × 10⁻⁶
D. ananassae | 385 | 148 | 310 | 0.48 | D. yakuba | 240 | 198 | 390 | 0.51 | D. pseudoobscura |
D. ananassae | 385 | 128 | 292 | 0.44 | D. erecta | 65 | 147 | 266 | 0.55 | D. pseudoobscura | 0.0090
D. ananassae | 385 | 137 | 277 | 0.49 | D. persimilis | 287 | 362 | 635 | 0.57 | D. willistoni | 0.042
D. erecta | 65 | 84 | 171 | 0.49 | D. sechellia | 1 | 207 | 355 | 0.58 | D. ananassae |
D. melanogaster | 1,119 | 133 | 497 | 0.27 | D. sechellia | 1 | 270 | 430 | 0.63 | D. yakuba | <2.2 × 10⁻¹⁶
D. melanogaster | 1,119 | 94 | 287 | 0.33 | D. yakuba | 240 | 142 | 283 | 0.50 | D. ananassae | 3.5 × 10⁻⁵
D. melanogaster | 1,119 | 88 | 285 | 0.31 | D. erecta | 65 | 105 | 174 | 0.60 | D. ananassae | 1.1 × 10⁻⁹
D. melanogaster | 1,119 | 152 | 333 | 0.46 | D. ananassae | 385 | 133 | 278 | 0.48 | D. pseudoobscura |
D. melanogaster | 1,119 | 135 | 294 | 0.46 | D. persimilis | 287 | 361 | 599 | 0.60 | D. willistoni | 6.8 × 10⁻⁵
D. melanogaster | 1,119 | 164 | 324 | 0.51 | D. pseudoobscura | 599 | 223 | 394 | 0.57 | D. willistoni |
D. melanogaster | 1,119 | 187 | 366 | 0.51 | D. willistoni | 462 | 218 | 448 | 0.49 | D. virilis |
D. mojavensis | 133 | 109 | 233 | 0.47 | D. grimshawi | 28 | 221 | 410 | 0.54 | D. melanogaster |
D. persimilis | 287 | 326 | 574 | 0.57 | D. sechellia | 1 | 254 | 414 | 0.61 | D. willistoni |
D. persimilis | 287 | 349 | 594 | 0.59 | D. yakuba | 240 | 185 | 346 | 0.53 | D. willistoni |
D. persimilis | 287 | 362 | 628 | 0.58 | D. erecta | 65 | 135 | 231 | 0.58 | D. willistoni |
D. pseudoobscura | 599 | 200 | 373 | 0.54 | D. sechellia | 1 | 285 | 455 | 0.63 | D. willistoni | 0.011
D. pseudoobscura | 599 | 207 | 382 | 0.54 | D. yakuba | 240 | 202 | 376 | 0.54 | D. willistoni |
D. pseudoobscura | 599 | 222 | 394 | 0.56 | D. erecta | 65 | 148 | 248 | 0.60 | D. willistoni |
D. pseudoobscura | 599 | 205 | 382 | 0.54 | D. ananassae | 385 | 156 | 293 | 0.53 | D. willistoni |
D. pseudoobscura | 599 | 98 | 204 | 0.48 | D. persimilis | 287 | 229 | 401 | 0.57 | D. melanogaster | 0.042
D. pseudoobscura | 599 | 223 | 404 | 0.55 | D. willistoni | 462 | 229 | 479 | 0.48 | D. virilis | 0.034
D. virilis | 362 | 81 | 178 | 0.46 | D. mojavensis | 133 | 144 | 262 | 0.55 | D. melanogaster |
D. virilis | 362 | 78 | 167 | 0.47 | D. grimshawi | 28 | 246 | 446 | 0.55 | D. melanogaster |
D. willistoni | 462 | 191 | 438 | 0.44 | D. sechellia | 1 | 303 | 492 | 0.62 | D. virilis | 6.0 × 10⁻⁸
D. willistoni | 462 | 213 | 458 | 0.47 | D. yakuba | 240 | 228 | 424 | 0.54 | D. virilis | 0.037
D. willistoni | 462 | 204 | 453 | 0.45 | D. erecta | 65 | 170 | 304 | 0.56 | D. virilis | 0.0042
D. willistoni | 462 | 220 | 473 | 0.47 | D. ananassae | 385 | 180 | 330 | 0.55 | D. virilis | 0.030
D. willistoni | 462 | 208 | 449 | 0.46 | D. persimilis | 287 | 379 | 638 | 0.59 | D. virilis | 2.7 × 10⁻⁵
D. yakuba | 240 | 112 | 267 | 0.42 | D. sechellia | 1 | 207 | 354 | 0.58 | D. ananassae | 6.4 × 10⁻⁵
D. yakuba | 240 | 149 | 304 | 0.49 | D. erecta | 65 | 145 | 259 | 0.56 | D. melanogaster |
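The χ² comparisons in table 2 can be illustrated with a small sketch of a 2 × 2 test on the counts of lost duplicated versus lost non-duplicated genes in two species. The use of the Yates continuity correction is an assumption (it is the default for 2 × 2 tables in R's `chisq.test`, but the text does not say whether it was applied), and `chi2_2x2` is a hypothetical helper, not the authors' code.

```python
import math


def chi2_2x2(a, b, c, d, yates=True):
    """Chi-square test with 1 degree of freedom on the table [[a, b], [c, d]].

    a, b: lost duplicated / lost non-duplicated genes in species 1
    c, d: the same counts in species 2
    Returns (statistic, P value).
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction (assumed, not stated in the text).
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# D. pseudoobscura (98 of 204 lost genes were duplicated) versus
# D. persimilis (229 of 401), outgroup D. melanogaster.
stat, p = chi2_2x2(98, 204 - 98, 229, 401 - 229)
```

Under this assumption the P value for the D. pseudoobscura and D. persimilis row comes out close to the 0.042 listed in the table.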

Relative Rate Test for Orthologous Gene Pairs

To detect fast-evolving genes after the speciation of closely related species (fig. S4C, Supplementary Material online), we conducted a relative rate test using protein sequences aligned by T-COFFEE for the following closely related species and their outgroups: D. melanogaster, D. sechellia, D. yakuba (outgroup); D. pseudoobscura, D. persimilis, D. melanogaster (outgroup); and D. yakuba, D. erecta, D. melanogaster (outgroup) (Tajima 1993). We used the orthologous trios of closely related species and their outgroups obtained earlier, and counted the number of significant fast-evolving genes for each species (table 1).
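For reference, Tajima's relative rate test compares the number of sites at which only one of the two ingroup sequences differs from the outgroup. The sketch below implements that statistic, χ² = (m1 − m2)² / (m1 + m2) with one degree of freedom, for three pre-aligned sequences. Skipping alignment columns that contain a gap is an assumption, and the function name is hypothetical.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup, gap="-"):
    """Tajima (1993) relative rate test on three aligned sequences.

    m1 counts sites where only seq1 differs from the outgroup,
    m2 counts sites where only seq2 differs from the outgroup.
    Returns (m1, m2, chi-square statistic, P value).
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for x, y, o in zip(seq1, seq2, outgroup):
        if gap in (x, y, o):
            continue  # ignore alignment columns containing a gap
        if y == o and x != o:
            m1 += 1
        elif x == o and y != o:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    stat = (m1 - m2) ** 2 / (m1 + m2)
    p = math.erfc(math.sqrt(stat / 2))  # chi-square upper tail, 1 df
    return m1, m2, stat, p
```

A gene would be counted as fast-evolving in the lineage with the larger count when the P value falls below the chosen significance level.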

Divergence of Duplicated Gene Pairs in Orthologous Trios

We examined whether an acceleration of the evolutionary rates of duplicated genes occurred in species with low habitat variability and low PD. We focused on all the duplicated gene pairs in the aforementioned dataset of orthologous trios, to minimize any effect of lineage-specific extra gene duplications on evolutionary rates (fig. S4D, Supplementary Material online). Note that as long as a relationship among the three species is observed, no gene loss events have occurred in any of the lineages of the orthologous trios. An extra gene copy generated by lineage-specific gene duplication might cause a relaxation of the functional constraints on the gene copies in the lineage; therefore, we used duplicated gene pairs derived from a duplication event before the speciation of the closely related species and their outgroups. Note that no recent gene duplication events occurred after speciation in the datasets from these trios. We counted the number of duplicated gene pairs in which at least one partner was a significantly fast-evolving gene for each species (table 1).

Habitat Area and Habitat Variability

The habitat areas for the Drosophila species were obtained from the literature (Ashburner et al. 1982; Piano et al. 1997; Reed and Markow 2004; Markow and O’Grady 2006) (online: http://scitechlab.wordpress.com/2008/11/02/the-humble-fruit-fly-drosophila-melanogaster). Habitat variability was estimated from climatic envelope and habitat diversity using the Köppen climate classification. Climatic envelope is the range of temperatures, rainfall, and other climate-related parameters in which a species currently exists. We estimated climatic envelope using principal component analysis (PCA) with WORLDCLIM (Hijmans et al. 2005). We obtained world spatial data and the WORLDCLIM climatic dataset (10 minutes latitude/longitude) from DIVA-GIS (http://www.diva-gis.org). The habitat area was measured as the number of grid squares on the climate map. We then extracted the climatic values from 19 bioclimatic variables used for BIOCLIM (Hijmans et al. 2005) in the habitat area of each Drosophila species. We performed PCA using the bioclimatic variables for all of the species, and found that the first 2 principal components (PCs) explained 93.4% of the total variance (table S3, Supplementary Material online). The contributions of PC1 and PC2 were 79.9% and 13.5%, respectively. PCA plots (x-axis: PC1 and y-axis: PC2) and the correlation circle are shown in figures S5 and S6, Supplementary Material online, respectively. On the basis of PCA results, we also plotted values of PC1 and PC2 for each species (fig. S5, Supplementary Material online). We used 107,865 cells (PC1: 799 × PC2: 135) by weighting the relative contribution to PC1 and PC2 for estimating climatic envelope, and defined the number of cell grids overlapping points in the 107,865 cell grids as the climatic envelope of Drosophila species.
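One way to read the cell-counting step is sketched below: the PC1–PC2 plane is divided into 799 × 135 cells, and the climatic envelope of a species is the number of distinct cells occupied by its points. How the plane was bounded and binned is not described, so the shared axis ranges and the equal-width binning here are assumptions, and `climatic_envelope` is a hypothetical name.

```python
def climatic_envelope(points, pc1_range, pc2_range, n1=799, n2=135):
    """Count the distinct grid cells occupied by a species' (PC1, PC2) points.

    pc1_range / pc2_range: (min, max) over all species, so that every
    species is binned on the same 799 x 135 grid (assumed layout).
    """
    lo1, hi1 = pc1_range
    lo2, hi2 = pc2_range
    cells = set()
    for pc1, pc2 in points:
        # Equal-width binning; indices are clamped onto the grid edges.
        i = min(max(int((pc1 - lo1) / (hi1 - lo1) * n1), 0), n1 - 1)
        j = min(max(int((pc2 - lo2) / (hi2 - lo2) * n2), 0), n2 - 1)
        cells.add((i, j))
    return len(cells)
```

The 799:135 aspect ratio mirrors the 79.9% and 13.5% contributions of PC1 and PC2, which gives the 107,865 cells mentioned in the text.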

The Köppen climate classification map was used for estimating the Drosophila species’ habitat diversity (Kottek et al. 2006). This climate map consists of a grid of squares (0.5° latitude/longitude) in which a certain climate is classified by temperature, precipitation, and vegetation (fig. S1, Supplementary Material online). The habitat area was measured as the number of grid squares on the climate map. The number of grid squares varied among the Drosophila species; therefore, habitat diversity was calculated from the variety of climate classes among the grid squares of each species (through logarithmic transformation) using Brillouin’s index, which is robust to sample size (Margalef 1958).
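Brillouin's index for N grid squares, of which n_i fall in climate class i, is H = (ln N! − Σ ln n_i!) / N. A minimal sketch follows, assuming the input is simply the list of Köppen class labels of the grid squares in a species' range; the function name and the example labels are illustrative only.

```python
import math
from collections import Counter


def brillouin_index(labels):
    """Brillouin's index H = (ln N! - sum(ln n_i!)) / N.

    labels: one climate-class label per grid square in the habitat.
    """
    counts = Counter(labels).values()
    n = sum(counts)
    if n == 0:
        raise ValueError("at least one grid square is required")
    # lgamma(k + 1) equals ln(k!) and avoids overflow for large ranges.
    log_n_fact = math.lgamma(n + 1)
    log_class_facts = sum(math.lgamma(c + 1) for c in counts)
    return (log_n_fact - log_class_facts) / n
```

A range that lies entirely within one climate class gives an index of 0, as for a species confined to a single climate type.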

Model Selection

All of the following statistical analyses were executed in R (http://www.r-project.org). We applied model selection using regression to examine which genomic factors affect habitat features (bioclimatic variables and habitat area). We explored the set of predictors of the explanatory variables using the stepwise Akaike’s Information Criterion procedure, and determined the set of variables that yielded the lowest score. In addition, we conducted a multivariate analysis of variance (MANOVA) in which two variables (climatic envelope and habitat area) were used as response variables, and genome size, number of genes and PD were used as explanatory variables (tables S4 and S5, Supplementary Material online).

To remove any phylogenetic constraints on the relationship between genetic architecture and habitat, we used a robust phylogeny derived from Drosophila 12 Genomes Consortium (Clark et al. 2007). Using this phylogenetic tree, we selected a model by applying the generalized least squares model with the Brownian model, as described earlier, and measured the phylogenetically independent contrasts (PICs) (Felsenstein 1985). We performed linear regression analyses for selected explanatory variables using the estimated PICs.
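Phylogenetically independent contrasts can be computed with Felsenstein's pruning procedure, sketched below for a fully bifurcating tree. The nested-tuple tree encoding and the function names are hypothetical conveniences; the contrasts for two traits (for example PD and climatic envelope) would then be regressed through the origin.

```python
import math


def independent_contrasts(tree, trait):
    """Standardized contrasts under Brownian motion (Felsenstein 1985).

    tree: a leaf name, or a tuple (left, left_length, right, right_length).
    trait: {leaf name: trait value}.
    """
    contrasts = []

    def visit(node):
        # Returns (estimated value at the node, extra branch length that
        # accounts for the uncertainty of that estimate).
        if isinstance(node, str):
            return trait[node], 0.0
        left, v1, right, v2 = node
        x1, extra1 = visit(left)
        x2, extra2 = visit(right)
        v1 += extra1
        v2 += extra2
        contrasts.append((x1 - x2) / math.sqrt(v1 + v2))
        value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
        return value, v1 * v2 / (v1 + v2)

    visit(tree)
    return contrasts


def slope_through_origin(x, y):
    """Least-squares slope for a regression forced through the origin."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
```

Because the same traversal order is used for every trait, the contrasts of two traits line up node by node and can be passed directly to `slope_through_origin`.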

Gene Ontology

To investigate whether the lineage-specific lost or fast-evolving duplicated genes of species with low PD were enriched in some particular functional categories, we examined the Gene Ontology (GO) database entries for the duplicated genes between species with different PD (D. melanogaster–D. sechellia and D. pseudoobscura–D. persimilis). The GO identifiers (ids) and GO “slim” annotations for the biological processes of D. melanogaster were downloaded from ftp://ftp.geneontology.org/pub/go/gene-associations/ and ftp://ftp.geneontology.org/pub/go/GO_slims, respectively. We excluded those classified as GO:0008150 (biological process unknown). The frequency of each GO id was counted for the D. melanogaster genes. For the other species, we used the GO ids of the most similar homolog in D. melanogaster. To analyze the GO data for genes that had been lost in the D. melanogaster lineage (mel1 in fig. S2C, Supplementary Material online), we used the GO id of the most similar homolog retained in D. melanogaster (mel2 in fig. S2C, Supplementary Material online) for the orthologs retained in the D. sechellia genome (sec1 in fig. S2C, Supplementary Material online) of the lost genes. The enrichment of GO ids for the genes in species having a low PD was compared with that in species having a high PD. We calculated the P value for each GO id by comparing two different gene sets. The estimated P values were adjusted using Bonferroni correction.

Results and Discussion

Recently Duplicated Genes

We identified duplicated genes by similarity search (blastp) for each species and estimated the synonymous substitution rate (KS) between a duplicated gene and its closest paralogue for the 11 fully sequenced Drosophila species. We observed that the duplicated gene pairs tended to have KS < 0.1 (fig. S3, Supplementary Material online). This observation is consistent with high rates of gene duplications and losses (Lynch and Conery 2000). There was an apparent bias in the number of recent duplication events in particular species. Although the recent burst of gene duplications observed in some particular lineages is biologically feasible, we question the reliability of the enrichment of recently duplicated gene pairs. Indeed, it is difficult to distinguish recently duplicated genes from artifacts of genome assembly. On the other hand, diverged duplicated genes maintained in a genome are obviously derived from ancient gene duplication events (not artifacts), and most of the substitutions can be attributed to diverged duplicated genes (not recently duplicated genes). Therefore, we focused on diverged duplicated genes in the following analyses, and homologous gene pairs with a KS < 0.1 were collapsed into a single gene (see Materials and Methods). We found that there was a significant positive correlation between the proportion of total duplicated genes including recently duplicated genes and climatic envelope estimated by bioclimatic variables (R² = 0.62, P = 0.0066; see the next section for details), but there was no significant correlation between the proportion of only recently duplicated genes and climatic envelope (R² = 0.26, P = 0.13). This observation indicates that in comparison with evolutionarily maintained duplicates, recent duplicates are unlikely to contribute to the climatic envelope. This could be attributed to the lower divergence of recently duplicated genes.

PD Associated with Habitat Diversity

To investigate the relationship between genomic architecture and habitat, we employed a linear model in which genome size (Nardon et al. 2005; taken from Bosco et al. 2007), the PD, and the number of genes were used as explanatory variables, and climatic envelope and habitat area were used as response variables, after removing the phylogenetic constraints. PD was selected as the sole explanatory variable for climatic envelope (R² = 0.82, P = 0.00032; fig. 2). PD and number of genes were selected as explanatory variables for habitat area, but the regression coefficient was statistically significant only for the former (R² = 0.45, P = 0.024); these results were not changed by using a MANOVA for the two response variables (table S4, Supplementary Material online). We then examined the effects of climatic envelope and habitat area on PD, and only climatic envelope was selected as an explanatory variable. These results indicate that PD is strongly correlated with climatic envelope.

Fig. 2.

Correlation between climatic envelope and PD. (A) Relationship between climatic envelope and PD. The x-axis indicates PD for 11 Drosophila species (mel, D. melanogaster; sec, D. sechellia; yak, D. yakuba; ere, D. erecta; ana, D. ananassae; pse, D. pseudoobscura; per, D. persimilis; wil, D. willistoni; moj, D. mojavensis; vir, D. virilis; and gri, D. grimshawi). The y-axis indicates climatic envelope estimated by WORLDCLIM datasets. (B) Relationship between contrasts in climatic envelope and PD. The x-axis indicates PICs in PD, and the y-axis indicates PICs in the climatic envelope. The dashed line represents the regression line.

We were concerned that two extreme contrasts might be driving the relationship in figure 2B. The two extreme contrasts were generated by the large climatic envelope values for D. melanogaster and D. pseudoobscura, which have closely related species with opposite features, that is, D. sechellia and D. persimilis, respectively. However, even when we removed D. melanogaster and D. pseudoobscura from our analyses, we observed the same trends (R² = 0.58, P = 0.029; fig. S7A, Supplementary Material online).

Some species’ habitats are known to have been expanded by human activity, particularly in the case of D. melanogaster, which has been spread around the world. We therefore repeated the analysis without D. melanogaster, and confirmed that the results were not affected (table S4, Supplementary Material online). It has been reported that D. virilis is a holarctic species (Ashburner et al. 1982; Mirol et al. 2008). When D. virilis and/or D. melanogaster were removed from the analysis, the results did not change (data not shown).

We suspected that the differences in genome coverage among Drosophila species might correlate with PD, and therefore examined this relationship. We obtained data on the genome coverage of all Drosophila species except for D. melanogaster from EnsemblMetazoa, and found no correlation between the genome coverage and the PD (R² = 0.014, P = 0.76).

To reinforce our results, we investigated the relationship between PD and the habitat diversity of Drosophila within their range based on Köppen climate classification. The classification considers not only temperature and precipitation but also vegetation (Kottek et al. 2006). Environmental diversity within the habitat was estimated using Brillouin’s index (Margalef 1958). Similar results were obtained even when we used a different measure of environmental variability with a different climatic dataset (R² = 0.93, P = 7.7 × 10⁻⁶; fig. S8 and table S5, Supplementary Material online). These results strongly support the contention that in Drosophila species, PD is correlated with habitat variability.

The Influence of Effective Population Size on Genomic Architecture

It has been proposed that there are correlations between effective population size and genomic contents (Lynch and Conery 2003). In the genus Drosophila, several studies have shown that the evolutionary rates of genes are faster in the host-specific species D. sechellia, which has a small effective population size, than in the cosmopolitan species D. simulans (Kliman et al. 2000; McBride 2007). Similarly, Singh et al. (2009) observed that the evolutionary rates of genes are likely to be accelerated in the host-specific species D. sechellia and D. erecta, which have smaller effective population sizes, compared with D. melanogaster and D. yakuba. These studies suggest that the genes of species with small population sizes evolve fast, possibly due to less effective natural selection. Petit and Barbadilla (2009) examined the effective population sizes of many of the Drosophila species used in the present study and reported that selection efficiency is correlated with effective population size, which, in turn, is correlated with levels of genomic codon bias, proportion of adaptive substitutions, and repetitive sequences. We therefore examined the relationship between effective population size and climatic envelope using synonymous polymorphism in the genes of the seven Drosophila species (D. melanogaster, D. sechellia, D. yakuba, D. erecta, D. mojavensis, and D. virilis) reported in Petit and Barbadilla (2009). Our results showed that there was no significant correlation between effective population size and climatic envelope (R² = 0.061, P = 0.63; fig. S9A, Supplementary Material online). In addition, the results of a linear model, in which effective population size and climatic envelope were used as explanatory variables, showed that PD was explained by climatic envelope (R² = 0.85, P = 0.0089) but not by the effective population size. Further, we examined the relationship between PD and climatic envelope after removing species with a small effective population size (D. sechellia and D. erecta), because the correlation between selection efficiency and population size was strong when the host-specific species were compared with generalist species (Petit and Barbadilla 2009; Singh et al. 2009). However, we still found a significant correlation between PD and climatic envelope (R² = 0.86, P = 0.00086; fig. S9B, Supplementary Material online), suggesting that differences in PD among species are not explained by the effective population sizes.

Evolutionary Processes in Divergence of Duplicate Content

We next investigated the evolutionary processes responsible for differences in the PD between closely related species with different habitat variability. First, we examined the conservation of PD by fitting it to the phylogenetic tree using a Brownian motion model and calculating Pagel’s lambda (Pagel 1999). We found that lambda was 1.2 × 10⁻⁷, and that the value differed significantly from 1 under Brownian motion evolution by comparison of likelihoods (P = 3.3 × 10⁻⁶). This indicates that the phylogeny does not explain the distribution of PD among the Drosophila species. We therefore examined whether the loss of duplicated genes occurred more frequently in species with low PD. We focused on two species pairs, D. melanogaster–D. sechellia and D. pseudoobscura–D. persimilis, in which the habitat variability and PD differed even though they were closely related phylogenetically (figs. 2A and S8A, Supplementary Material online) and found that species with low habitat variability and low PD (D. sechellia and D. persimilis) tended to lose duplicated genes (figs. S4A and S4B, Supplementary Material online, and table 1). D. pseudoobscura and D. persimilis diverged more recently than did D. melanogaster and D. sechellia, and, in fact, the former species pair can easily interbreed. Even though divergence times were different between species pairs, we observed consistent results in which species with low habitat variability tended to lose duplicated genes, when we expanded the estimation for all species used in our study (table 2). In addition, we found that there was a strong negative correlation between the loss rates of duplicated genes and climatic envelope among the species, even after phylogenetic constraints were removed (R² = 0.92, P = 1.1 × 10⁻⁵; fig. 3). Furthermore, we found a significant negative correlation between PD and the loss rates of duplicated genes (R² = 0.73, P = 0.0017; fig. 3C and D).
Note that the extreme contrasts in figure 3B and D did not drive the relationships (fig. S7B, Supplementary Material online; R² = 0.78, P = 0.0038 and fig. S7C; R² = 0.45, P = 0.068). The negative correlation in figure S7C, Supplementary Material online, after removal of the extreme contrasts was not statistically significant, possibly because of the low statistical power of the small dataset. This indicates that the functional constraints on duplicated genes of species with low habitat variability are more relaxed than those of species with high habitat variability. D. sechellia is thought to have lost Or and Gr genes associated with odor response in compensation for specializing on Morinda citrifolia, which is toxic to other Drosophila (McBride 2007; McBride et al. 2007). This is likely to be a case of antagonistic pleiotropy (trade-offs) (Hoffmann 2010). McBride reported that not only Or/Gr genes but also randomly chosen genes in D. sechellia were fast-evolving compared with those in the closely related cosmopolitan species D. simulans (McBride 2007). Our findings derived from genome-wide analyses suggest that DNA decay occurred in climatic specialists rather than generalists (Hoffmann and Willi 2008; Hoffmann 2010), although it is difficult to distinguish this hypothesis from that of antagonistic pleiotropy (Hoffmann 2010). However, we suggest that species with low habitat variability might have lost the functional constraints on genes in general. We conducted a relative rate test (Tajima 1993) to detect lineage-specific fast-evolving genes to understand the general trends in differences in the functional constraints on genes between closely related species with different habitat ranges. The number of fast-evolving genes in species with high habitat variability and high PD (D. melanogaster and D. pseudoobscura) was significantly smaller than that in species with low habitat variability and low PD (D. sechellia, P < 2.2 × 10⁻¹⁶, χ² test; and D. persimilis, P < 2.2 × 10⁻¹⁶, χ² test; fig. S4C, Supplementary Material online, and table 1). Furthermore, duplicated gene pairs including significantly fast-evolving genes were enriched in species with low habitat variability and low PD (D. sechellia, P < 2.2 × 10⁻¹⁶, χ² test; and D. persimilis, P < 2.2 × 10⁻¹⁶, χ² test; fig. S4D, Supplementary Material online, and table 1). These results imply that the low PD in species with low habitat variability was caused by losses both of duplicated genes and of sequence similarity between duplicated gene pairs.
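
The relative rate test cited above (Tajima 1993) can be sketched as follows. This is a minimal illustration that assumes the simplest (1D) variant of the test applied to one gene aligned across two ingroup species and an outgroup; the text does not state which variant or implementation was used, and the function name `tajima_relative_rate` and the toy sequences are hypothetical.

```python
import math

def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima's (1993) 1D relative rate test on three aligned sequences.

    m1 counts sites where only seq1 differs (seq2 == outgroup != seq1);
    m2 counts sites where only seq2 differs (seq1 == outgroup != seq2).
    Under equal rates, (m1 - m2)^2 / (m1 + m2) is approximately
    chi-square distributed with 1 degree of freedom.
    Returns (m1, m2, chi_square, p_value).
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if "-" in (a, b, o):          # skip alignment gaps
            continue
        if b == o and a != o:
            m1 += 1
        elif a == o and b != o:
            m2 += 1
    if m1 + m2 == 0:                  # no informative sites
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return m1, m2, chi2, p_value

# Toy alignment (invented for illustration only).
m1, m2, chi2, p = tajima_relative_rate("CCCCCCCCAAAA", "AAAAAAAAAACC", "AAAAAAAAAAAA")
print(m1, m2, round(chi2, 2), round(p, 3))
```

A gene would be called fast-evolving in the first lineage when m1 exceeds m2 and the P value falls below the chosen threshold.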

Fig. 3.

Correlation between climatic envelope (or PD) and the PD of lost genes. (A) Relationship between climatic envelope and the PD of lost genes. The x-axis indicates the proportion of lost duplicated genes for 11 Drosophila species (mel, D. melanogaster; sec, D. sechellia; yak, D. yakuba; ere, D. erecta; ana, D. ananassae; pse, D. pseudoobscura; per, D. persimilis; wil, D. willistoni; moj, D. mojavensis; vir, D. virilis; and gri, D. grimshawi). The y-axis indicates climatic envelope estimated by WORLDCLIM datasets. Error bars indicate standard error for the PD of lost genes derived from table 2. (B) Relationship between contrasts in climatic envelope and the average proportion of lost duplicated genes for each species. The x-axis indicates PICs in the PD of lost genes, and the y-axis indicates PICs in the climatic envelope. The dashed line represents the regression line. (C) The PD of total genes for each species (black) and the average proportion of lost duplicated genes for each branch (gray) on phylogenetic tree. The phylogenetic tree is from Drosophila 12 Genomes Consortium (2007). (D) Relationship between contrasts in the PD of total genes and the average proportion of lost duplicated genes for each species. The x-axis indicates PICs in the PD of lost genes, and the y-axis indicates PICs in the PD of total genes. The dashed line represents the regression line.
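
The phylogenetically independent contrasts (PICs) referred to in panels B and D follow Felsenstein (1985). A minimal sketch of that algorithm is given here under stated assumptions: the nested-tuple tree format, the function names `contrasts` and `slope_through_origin`, the toy tree, and the trait values are all hypothetical and are not taken from the study.

```python
import math

def contrasts(tree, trait):
    """Felsenstein's (1985) phylogenetically independent contrasts.

    tree:  a leaf is (name, branch_length); an internal node is
           ((left_subtree, right_subtree), branch_length).
    trait: dict mapping leaf name -> trait value.
    Returns (node_value, adjusted_branch_length, standardized_contrasts).
    """
    node, length = tree
    if isinstance(node, str):                      # leaf
        return trait[node], length, []
    left, right = node
    x1, v1, c1 = contrasts(left, trait)
    x2, v2, c2 = contrasts(right, trait)
    # Difference between sister values, scaled by its expected std. dev.
    contrast = (x1 - x2) / math.sqrt(v1 + v2)
    # Ancestral estimate: average weighted by inverse branch length.
    value = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)
    # Lengthen the branch below this node to reflect estimation error.
    adjusted = length + v1 * v2 / (v1 + v2)
    return value, adjusted, c1 + c2 + [contrast]

def slope_through_origin(x, y):
    """Least-squares slope of y on x with the line forced through the origin,
    as is conventional when regressing one set of contrasts on another."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Toy three-species tree ((A:1, B:1):1, C:2) and invented trait values.
toy_tree = (((("A", 1.0), ("B", 1.0)), 1.0), ("C", 2.0)), 0.0
envelope = {"A": 1.0, "B": 3.0, "C": 5.0}
loss_rate = {"A": 0.9, "B": 0.5, "C": 0.2}

_, _, pic_envelope = contrasts(toy_tree, envelope)
_, _, pic_loss = contrasts(toy_tree, loss_rate)
print(slope_through_origin(pic_loss, pic_envelope))
```

The sign of each contrast is arbitrary, which is why both traits are computed on the same tree traversal and the regression is forced through the origin.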

We examined whether lineage-specific lost and fast-evolving genes in species with low PD were enriched in particular functional categories using gene ontology (http://www.geneontology.org). We detected little enrichment of functional categories for the genes in species with low PD; the lost genes in D. sechellia and D. persimilis were enriched only in metabolic process (P = 1.5 × 10⁻⁷) and response to stimulus (P = 9.9 × 10⁻⁴), respectively. Note that the enrichment of lost genes related to metabolic process in D. sechellia could be caused by a trade-off associated with specializing on the fruits of M. citrifolia, which contain substrates toxic to other Drosophila species (Markow and O’Grady 2007). This indicates that both the loss and relaxation of functional constraints are common for genes in species with low habitat variability, rather than being specific to particular genes. Species would need not only cold and desiccation tolerances but also physiological, morphological, behavioral, and certain other adaptations to live in heterogeneous environments.

We also conducted the analyses using the closely related species D. yakuba and D. erecta; although D. yakuba has a wider distribution in Africa than the specialist D. erecta (Markow and O’Grady 2006), both species inhabit tropical regions and have similar PD. We found no significant difference in the PD of lineage-specific lost genes between these species, which was consistent with our hypothesis (table 1). However, the duplicated gene pairs containing significantly fast-evolving genes and the fast-evolving genes themselves were both enriched in D. erecta (P < 2.2 × 10⁻¹⁶, χ² test; table 1). Notably, this difference was smaller than that for species pairs with different habitat variability and PD (D. melanogaster–D. sechellia and D. pseudoobscura–D. persimilis; table 1). Although both D. yakuba and D. erecta have low habitat variability, these results might also be affected by the host specificity, narrow habitat area and/or small population size of D. erecta. Overall, our results suggest that in species with low habitat variability, duplicated genes have been lost from the genome, whereas in species with high habitat variability, high PD has been maintained in the genome.
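
The χ² comparisons between species pairs reported above can be illustrated with a 2 × 2 contingency test. This is only a sketch: the layout of the table (species in rows, fast-evolving versus other genes in columns), the absence of a continuity correction, the function name `chi_square_2x2`, and the counts are all assumptions, since the text does not specify how the test was set up.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) on the table
        [[a, b],
         [c, d]]
    Returns (chi_square, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts (NOT from table 1): rows are two species,
# columns are fast-evolving genes vs. all other genes tested.
chi2, p = chi_square_2x2(120, 880, 60, 940)
print(round(chi2, 2), p)
```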

Cause and Effect of Habitat Variability

Adaptation to homogeneous environments (e.g., host specialization) is probably the main cause of habitat range restriction, because our results show that species with low habitat variability and low PD tend to lose duplicated genes (fig. 3, tables 1 and 2). In addition, we showed that there is no evidence to suggest that species with high habitat variability have gained a greater number of duplicated genes than those with low habitat variability. This indicates that habitat variability cannot be the cause of increasing PD. We propose that selection for retaining genetic diversity operated efficiently in species with high habitat variability. Under this selection, duplicated genes in species with high habitat variability were maintained, in contrast to those species with low habitat variability. Therefore, the loss of duplicated genes could be a reason for restricting habitat expansion to habitats with lower variability after species have adapted to homogeneous environments and lost the genes. Compared with more generalist species, host-specific species (D. sechellia, D. erecta, and D. mojavensis) and island endemic species (D. sechellia and D. grimshawi) are unable to expand their distributions to heterogeneous environments due to a lack of genetic variation conferred by retention of duplicated genes. Therefore, PD can be both a cause and an effect of habitat variability in Drosophila species (fig. 3, tables 1 and 2).

Conclusion

Our findings show that the PD in a genome strongly correlates with the habitat variability of a species. Variable environments within a species’ range must promote the maintenance of duplicated genes. A recent study predicted that duplicated genes could be maintained in gene regulatory networks in randomly fluctuating environments (Tsuda and Kawata 2010). The expression of duplicated genes was more diverse than that of singletons (Kliebenstein 2008; Ha et al. 2009; Dong et al. 2011), and therefore, individuals with more duplicated genes have advantages in diverse environments because they produce more genetically variable offspring. Kellermann et al. (2009) showed that specialist species lacked genetic variation in key traits, thereby limiting their ability to adapt to changed conditions. Our results indicate that features of genetic and genomic architecture, such as the PD in a genome, are fundamental constraints on the production of genetic variation for adaptation to new and varied environments.

Many of the whole genome sequences in the database were determined from inbred individuals. Therefore, these sequences do not provide information about the genetic variation of the population. Although species have gene copy number variations in their genomes, it is highly unlikely that inbreeding or the founder effect immediately reduces their PD. Therefore, PD can be estimated even from the genomic sequences of inbred lines as a representative value of an individual of a population. We suggest that PD is an excellent genetic indicator for adaptation to habitat diversity. Whole genomes can now be sequenced comparatively easily, and techniques continue to rapidly improve (Metzker 2010). Further analyses of duplicated genes in additional species will clarify the relationship between genetic factors and habitat distributions that depend on habitat variability. If the relationship between PD and habitat variability applies to other organisms, it would allow us to predict which species are unlikely to survive environmental change, which could aid future biodiversity conservation efforts. This study provides the first evidence that genome-wide duplicated gene content determines ecological traits. Our results provide new insight into the evolution of duplicated genes, namely that their maintenance might confer an ecological advantage to an organism during evolution.

Supplementary Material

Supplementary figures S1–S9 and tables S1–S5 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank A.A. Hoffman, J. Kitano, and J. Bridle for comments on the manuscript. The study was supported by the Global Centres of Excellence Program “Centre for ecosystem management adapting to global change” (J03) of the Ministry of Education, Culture, Sports, Science, and Technology of Japan to T.M. and M.K.

References

  1. Ashburner M, Thompson J, Carson HL. The genetics and biology of Drosophila. Vol. 3b. San Diego (CA): Academic Press; 1982.
  2. Bosco G, Campbell P, Leiva-Neto JT, Markow TA. Analysis of Drosophila species genome size and satellite DNA content reveals significant differences among strains as well as between species. Genetics. 2007;177:1277–1290. doi: 10.1534/genetics.107.075069.
  3. Bridle JR, Vines TH. Limits to evolution at range margins: when and why does adaptation fail? Trends Ecol Evol. 2007;22:140–147. doi: 10.1016/j.tree.2006.11.002.
  4. Clark AG, Eisen MB, Smith DR, et al. (417 co-authors) Evolution of genes and genomes on the Drosophila phylogeny. Nature. 2007;450:203–218. doi: 10.1038/nature06341.
  5. Dean EJ, Davis JC, Davis RW, Petrov DA. Pervasive and persistent redundancy among duplicated genes in yeast. PLoS Genet. 2008;4:e1000113. doi: 10.1371/journal.pgen.1000113.
  6. Dong D, Yuan Z, Zhang Z. Evidences for increased expression variation of duplicate genes in budding yeast: from cis- to trans-regulation effects. Nucleic Acids Res. 2011;39:837–847. doi: 10.1093/nar/gkq874.
  7. Felsenstein J. Phylogenies and the comparative method. Am Nat. 1985;125:1–15.
  8. Ha M, Kim ED, Chen ZJ. Duplicate genes increase expression diversity in closely related species and allopolyploids. Proc Natl Acad Sci U S A. 2009;106:2295–2300. doi: 10.1073/pnas.0807350106.
  9. Hartman JL, 4th, Garvik B, Hartwell L. Principles for the buffering of genetic variation. Science. 2001;291:1001–1004. doi: 10.1126/science.291.5506.1001.
  10. Heger A, Ponting CP. Evolutionary rate analyses of orthologs and paralogs from 12 Drosophila genomes. Genome Res. 2007;17:1837–1849. doi: 10.1101/gr.6249707.
  11. Hijmans RJ, Cameron SE, Parra JL, Jones PG, Jarvis A. Very high resolution interpolated climate surfaces for global land areas. Int J Climatol. 2005;25:1965–1978.
  12. Hoffmann AA. A genetic perspective on insect climate specialists. Austr J Entomol. 2010;49:93–103.
  13. Hoffmann AA, Willi Y. Detecting genetic responses to environmental change. Nat Rev Genet. 2008;9:421–432. doi: 10.1038/nrg2339.
  14. Kellermann V, van Heerwaarden B, Sgro CM, Hoffmann AA. Fundamental evolutionary limits in ecological traits drive Drosophila species distributions. Science. 2009;325:1244–1246. doi: 10.1126/science.1175443.
  15. Kirkpatrick M, Barton NH. Evolution of a species’ range. Am Nat. 1997;150:1–23. doi: 10.1086/286054.
  16. Kliebenstein DJ. A role for gene duplication and natural variation of gene expression in the evolution of metabolism. PLoS One. 2008;3:e1838. doi: 10.1371/journal.pone.0001838.
  17. Kliman RM, Andolfatto P, Coyne JA, Depaulis F, Kreitman M, Berry AJ, McCarter J, Wakeley J, Hey J. The population genetics of the origin and divergence of the Drosophila simulans complex species. Genetics. 2000;156:1913–1931. doi: 10.1093/genetics/156.4.1913.
  18. Kottek M, Grieser J, Beck C, Rudolf B, Rubel F. World Map of the Köppen-Geiger climate classification updated. Meteorologische Zeitschrift. 2006;15:259–263.
  19. Legendre P, Legendre L. Numerical ecology. 2nd English ed. Amsterdam: Elsevier; 1998.
  20. Lynch M, Conery JS. The evolutionary fate and consequences of duplicate genes. Science. 2000;290:1151–1155. doi: 10.1126/science.290.5494.1151.
  21. Lynch M, Conery JS. The origins of genome complexity. Science. 2003;302:1401–1404. doi: 10.1126/science.1089370.
  22. Lynch M, Sung W, Morris K, et al. (11 co-authors) A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc Natl Acad Sci U S A. 2008;105:9272–9277. doi: 10.1073/pnas.0803466105.
  23. Margalef DR. Information theory in ecology. Gen Syst. 1958;3:36–71.
  24. Markow T, O’Grady P. Drosophila: A guide to species identification and use. San Diego (CA): Academic Press; 2006.
  25. Markow TA, O’Grady PM. Drosophila biology in the genomic age. Genetics. 2007;177:1269–1276. doi: 10.1534/genetics.107.074112.
  26. McBride CS. Rapid evolution of smell and taste receptor genes during host specialization in Drosophila sechellia. Proc Natl Acad Sci U S A. 2007;104:4996–5001. doi: 10.1073/pnas.0608424104.
  27. McBride CS, Arguello JR, O’Meara BC. Five Drosophila genomes reveal nonneutral evolution and the signature of host specialization in the chemoreceptor superfamily. Genetics. 2007;177:1395–1416. doi: 10.1534/genetics.107.078683.
  28. Metzker ML. Sequencing technologies–the next generation. Nat Rev Genet. 2010;11:31–46. doi: 10.1038/nrg2626.
  29. Mirol PM, Routtu J, Hoikkala A, Butlin RK. Signals of demographic expansion in Drosophila virilis. BMC Evol Biol. 2008;8:59. doi: 10.1186/1471-2148-8-59.
  30. Nardon C, Deceliere G, Loevenbruck C, Weiss M, Vieira C, Biemont C. Is genome size influenced by colonization of new environments in dipteran species? Mol Ecol. 2005;14:869–878. doi: 10.1111/j.1365-294X.2005.02457.x.
  31. Notredame C, Higgins DG, Heringa J. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000;302:205–217. doi: 10.1006/jmbi.2000.4042.
  32. Ohno S. Evolution by gene duplication. New York: Springer; 1970.
  33. Pagel M. Inferring the historical patterns of biological evolution. Nature. 1999;401:877–884. doi: 10.1038/44766.
  34. Petit N, Barbadilla A. Selection efficiency and effective population size in Drosophila species. J Evol Biol. 2009;22:515–526. doi: 10.1111/j.1420-9101.2008.01672.x.
  35. Piano F, Craddock EM, Kambysellis MP. Phylogeny of the island populations of the Hawaiian Drosophila grimshawi complex: evidence from combined data. Mol Phylogenet Evol. 1997;7:173–184. doi: 10.1006/mpev.1996.0387.
  36. Reed LK, Markow TA. Early events in speciation: polymorphism for hybrid male sterility in Drosophila. Proc Natl Acad Sci U S A. 2004;101:9009–9012. doi: 10.1073/pnas.0403106101.
  37. Root TL, Price JT, Hall KR, Schneider SH, Rosenzweig C, Pounds JA. Fingerprints of global warming on wild animals and plants. Nature. 2003;421:57–60. doi: 10.1038/nature01333.
  38. Roy K, Hunt G, Jablonski D, Krug AZ, Valentine JW. A macroevolutionary perspective on species range limits. Proc Biol Sci. 2009;276:1485–1493. doi: 10.1098/rspb.2008.1232.
  39. Singh ND, Larracuente AM, Sackton TB, Clark AG. Comparative genomics on the Drosophila phylogenetic tree. Annu Rev Ecol Evol Systemat. 2009;40:459–480.
  40. Tajima F. Simple methods for testing the molecular evolutionary clock hypothesis. Genetics. 1993;135:599–607. doi: 10.1093/genetics/135.2.599.
  41. Tsuda ME, Kawata M. Evolution of gene regulatory networks by fluctuating selection and intrinsic constraints. PLoS Comput Biol. 2010;6:e1000873. doi: 10.1371/journal.pcbi.1000873.
  42. Wilkins AS. Canalization: a molecular genetic perspective. Bioessays. 1997;19:257–262. doi: 10.1002/bies.950190312.
  43. Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997;13:555–556. doi: 10.1093/bioinformatics/13.5.555.
  44. Yang Z, Nielsen R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol. 2000;17:32–43. doi: 10.1093/oxfordjournals.molbev.a026236.

Articles from Molecular Biology and Evolution are provided here courtesy of Oxford University Press

RESOURCES