Abstract
Diatoms are the most species-rich group of microalgae, and their contribution to marine primary production is important on a global scale. Diatoms can form dense blooms through rapid asexual reproduction; mutations acquired and propagated during blooms likely provide the genetic, and thus phenotypic, variability upon which natural selection may act. Positive selection was tested using genome and transcriptome-wide pair-wise comparisons of homologs in three genera of diatoms (Pseudo-nitzschia, Ditylum, and Thalassiosira) that represent decreasing phylogenetic distances. The signal of positive selection was greatest between two strains of Thalassiosira pseudonana. Further testing among seven strains of T. pseudonana yielded 809 candidate genes of positive selection, which are 7% of the protein-coding genes. Orphan genes and genes encoding protein-binding domains and transcriptional regulators were enriched within the set of positively selected genes relative to the genome as a whole. Positively selected genes were linked to the potential selective pressures of nutrient limitation and sea surface temperature based on analysis of gene expression profiles and identification of positively selected genes in subsets of strains from locations with similar environmental conditions. The identification of positively selected genes presents an opportunity to test new hypotheses in natural populations and the laboratory that integrate selected genotypes in T. pseudonana with their associated phenotypes and selective forces.
Keywords: positive selection, natural selection, diatom, evolution
Introduction
Diatoms are evolutionarily the youngest members of the phytoplankton, originating near the boundary of the Permian and Triassic periods ∼250 Ma, a time of mass extinction in the world ocean (Raup 1979; Sorhannus 2007). Radiation of diatoms into marine, freshwater, soil, and ice ecosystems is evidence of genetic specialization and an exemplary ability to adapt. The number of marine diatom species has increased during the last 65 My, reducing the global silicic acid concentration due to the incorporation of precipitated silica into the diatom cell walls (Siever 1991; Falkowski et al. 2004). The biogeochemical cycles of carbon and silicon are coupled by diatoms and to a lesser extent by other siliceous organisms. Diatoms control the biological portion of the silicon cycle in today’s oceans and contribute substantially to primary production and the export of carbon to the deep sea (Maliva et al. 1989; Tréguer and Pondaven 2000).
Diatoms are the most species-rich group of phytoplankton in the ocean (Kooistra et al. 2007), and they appear to adapt rapidly to local conditions. Marine species with circum-global distributions have regionally, genetically structured populations (Rynearson and Armbrust 2004; Casteleyn et al. 2010), despite wide distribution via ocean currents. Growth rates of diatoms occur on the same time scale as environmental fluctuations within different marine ecosystems. Estuarine and coastal species are adapted to daily fluctuations in photosynthetically available light that ranges from extremely low to stressfully intense levels as the turbidity of the water column and their position in it changes; in contrast, open ocean species live in consistently clear water (Lavaud et al. 2007). Over the course of a week, coastal species can experience transitions from cool, high-nutrient, upwelling conditions to warmer, low-nutrient, downwelling states (Austin and Barth 2002). In the nutrient-poor open ocean, normally low diatom abundances increase dramatically in response to ephemeral, cold-core eddies that bring nutrients to the surface, allowing diatoms to grow quickly into blooms that can persist for up to a month (Benitez-Nelson et al. 2007).
Diatoms have a high capacity to accrue mutations upon which selection may act. They are diploid and can theoretically accommodate a high proportion of recessive mutations. Their life history is dominated by asexual reproduction, during which cells divide at least once per day during bloom conditions (Furnas 1990). Sexual generation times are ∼2 years in marine diatoms (D'Alelio et al. 2010; Holtermann et al. 2010); thus, genetic variation increases and surviving mutations are passed along relatively quickly. The mutational load carried by diatoms is evident in the sequenced genomes of Thalassiosira pseudonana and Phaeodactylum tricornutum. The amount of genetic diversity accrued between T. pseudonana and Pha. tricornutum in 90 My is similar to that accrued between mammals and fish in 550 My (Bowler et al. 2008); however, when normalized to generation time, the number of neutral substitutions per generation is equivalent between the pairs. In addition, the genome of T. pseudonana has, on average, one polymorphism per 150 bases (Armbrust et al. 2004), which is an order of magnitude more than humans (Sachidanandam et al. 2001) but an order of magnitude less than the purple sea urchin (Sodergren et al. 2006).
Gene variants associated with adaptive phenotypes are positively selected and identified as the protein-coding genes in which the ratio of nonsynonymous substitutions at nonsynonymous sites to synonymous substitutions at synonymous sites (dN:dS) is greater than one among homologs of two or more organisms. Positively selected genes and their encoded proteins share traits with other divergent and diverging genes such as those under relaxed purifying selection. Rates of protein sequence divergence tend to be higher in lineage-specific, or “young,” proteins, than in proteins with deep evolutionary histories (Alba and Castresana 2005; Wolf et al. 2009). Concomitantly, mRNA expression of highly diverged and positively selected genes is frequently restricted to specific tissues (Kosiol et al. 2008; Oliver et al. 2010) and is lower than that of conserved genes responsible for basic functions such as protein production, cell maintenance, and division (Pál et al. 2001; Subramanian and Kumar 2004). Purifying selection is relaxed in proteins found at the periphery of networks and on the extracellular surface of cells interfacing with the environment (Julenius and Pedersen 2006; Kim et al. 2007). Proteins encoded by positively selected genes are frequently mediators of signal transduction, functioning in cell–cell recognition, immune response, and gamete recognition; other reproductive proteins, membrane and intracellular transporters are also selected (Castillo-Davis et al. 2004; Bustamante et al. 2005; Nielsen et al. 2005; Namroud et al. 2008; Li et al. 2009; Voolstra et al. 2011).
Studies of positive selection in single-celled eukaryotes and phytoplankton, specifically, remain limited (Li et al. 2009; Voolstra et al. 2009, 2011), primarily because the availability of sequence data is limited. Selected sites within the sexually induced gene (SIG1) appear to be under positive selection among four species of Thalassiosira (Sorhannus and Pond 2006) but not within different strains of Thalassiosira weissflogii (Grunow) Fyxell and Hasle (Sorhannus 2003; Suzuki and Nei 2004). The ecologically important silicon transporter (SIT) gene family experiences strong purifying selection among 45 marine and freshwater species within the Thalassiosirales separated by 75 My of divergence (Alverson 2007). It is possible that the detection of positive selection within the SIT gene family was hindered by saturating rates of synonymous mutation, which is evident among the three SIT genes of Pha. tricornutum (Sapriel et al. 2009).
In this study, we quantify positive selection within three genera of diatoms, including six species, using transcriptomic and genomic sequences. These diatoms represent a gradient of phylogenetic relatedness from three well-established species of Pseudo-nitzschia that diverged 5–10 Ma to strains within the cosmopolitan species T. pseudonana that diverged from Detonula confervacea ∼2 Ma (Sorhannus 2007). The two cryptic species of Ditylum brightwellii are differentiated by genome size and likely diverged recently because the larger genome-sized species is thus far found only in the northeastern Pacific Ocean (Koester et al. 2010). By rigorously testing genes within seven strains of T. pseudonana using a phylogenetic framework, we identified a large suite of positively selected genes and explored links between the genes, potential phenotypes, and environmental selective forces using profiles of gene expression and branch-site models of selection among different strains. We used the dN:dS metric instead of population-based polymorphism methods because preliminary observations of the data revealed polymorphic differences between strains, which were sampled from distant locations and across decades suggesting that, similar to other planktonic diatoms, there is population structure within T. pseudonana.
Materials and Methods
Sequence Data and Identification of Homologous Genes
We used transcriptomic and genomic data to detect positive selection in three diatom genera, Pseudo-nitzschia, Ditylum, and Thalassiosira (supplementary table S1, Supplementary Material online: database depositions). The transcriptomes of the three species of Pseudo-nitzschia were sequenced and quality curated before this study (supplementary table S1, Supplementary Material online). For the two species of Ditylum, we extracted total RNA from exponentially growing cultures using Plant RNA reagent (Invitrogen, Life Technologies), selected mRNA with the MicroPoly(A)Purist Kit (Ambion; Life Technologies), and reverse transcribed the mRNA using the SuperScript Double-Stranded cDNA synthesis Kit (Invitrogen, Life Technologies). Pyrosequencing was performed in the Schuster Laboratory (University Park, PA). We used an automated pipeline that integrates custom scripts with existing bioinformatic tools: BLAST to remove ribosomal sequence and Lucy to trim poly-A tails and low-quality sequence with scores <14. Quality-curated reads were assembled with CABOG using default settings (Miller et al. 2008).
We identified homologous genes among transcripts of the Pseudo-nitzschia species and between the two Ditylum species using another pipeline. In brief, the pipeline retrieved all possible open reading frames (ORFs) of at least 25 amino acids in length for each contig using getorf (EMBOSS). ORFs were compared with protein databases including those of several phytoplankton and the nonredundant (nr) protein database of NCBI using BLASTp at an e-value cutoff of 10⁻³ (Altschul et al. 1997). Contigs with the lowest e-value were selected for further analysis. If a contig did not have a protein match, the longest ORF was chosen. Putative homologs between diatom species were paired using a reciprocal best BLASTp hit between the transcripts of each species at an e-value cutoff of 10⁻¹⁰. Homologs were aligned on the translated protein sequence using CLUSTALW2 (Larkin et al. 2007) and converted to the original DNA sequence with revtrans.py (Wernersson and Pedersen 2003). Aligned homologs were trimmed to blunt ends with at least three identical amino acids at each end and converted to PHYLIP format for analysis of positive selection.
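The reciprocal best-hit pairing described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the `(query, subject, e-value)` tuple layout, the function names, and the example identifiers are all assumptions, and real BLAST tabular output would need to be parsed into this form first.

```python
from typing import Dict, List, Tuple

# Assumed hit layout: (query_id, subject_id, e_value).
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit], max_evalue: float) -> Dict[str, str]:
    """Map each query to its lowest-e-value subject, ignoring hits above the cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue in hits:
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    a_vs_b: List[Hit], b_vs_a: List[Hit], max_evalue: float = 1e-10
) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    a_best = best_hits(a_vs_b, max_evalue)
    b_best = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)


# Hypothetical hits between two transcriptomes.
a_vs_b = [("a1", "b1", 1e-50), ("a1", "b2", 1e-20), ("a2", "b2", 1e-30), ("a3", "b3", 1e-5)]
b_vs_a = [("b1", "a1", 1e-45), ("b2", "a1", 1e-25), ("b3", "a3", 1e-4)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this toy input only `a1` and `b1` choose each other; `a2` is dropped because the best hit of `b2` is `a1`, and `a3`/`b3` fail the e-value cutoff.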
Seven strains of Thalassiosira were sequenced with an ABI SOLiD and mapped with BWA 0.5.9 (parameters: bwa aln -k 2 -l 18 -n .001) to the reference strain (CCMP1335) previously sequenced using the Sanger method (Armbrust et al. 2004). SOLiD-sequenced reads were trimmed based on quality (Iverson et al. 2012); reads less than 24 bp that did not meet the quality threshold were discarded. Alignments of the homologous genes consist of the majority consensus sequence of each strain with the introns removed. The T. pseudonana v. 3.0 gene models (http://genome.jgi.doe.gov/Thaps3/Thaps3.home.html) defined the start, stop, and intron boundaries.
Pair-Wise Tests for Detecting Positive Selection at Increasing Phylogenetic Distance
We used five pair-wise tests (table 1) to determine the phylogenetic distance that best detects positive selection in diatoms. Gene sets from two strains of Thalassiosira, two species of Ditylum, and three species pairs of Pseudo-nitzschia were each tested. We calculated species divergence from 18S ribosomal DNA (rDNA) sequences aligned with CLUSTALW2 and trimmed to the length of the shortest partial sequence, which belonged to Pseudo-nitzschia multistriata. Percent divergence was calculated in BioEdit (Hall 1999).
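Percent divergence between two aligned sequences can be illustrated with a short function. This is a sketch under stated assumptions: it counts mismatches over positions where neither sequence has a gap or an `N`, which may differ from how BioEdit treats gaps and ambiguity codes.

```python
def percent_divergence(seq_a: str, seq_b: str, skip_chars: str = "-N") -> float:
    """Percentage of mismatched positions between two aligned sequences.

    Positions where either sequence carries a character in `skip_chars`
    are excluded from both the mismatch count and the denominator
    (an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * differences / compared
```

Purely as arithmetic, 12 mismatches over 1,160 compared positions would give about 1.03% divergence.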
Table 1.
Phylogenetic Divergence and Rates of Synonymous (dS) and Nonsynonymous (dN) Substitutions in Homologs of Three Diatom Genera Using Pair-Wise Tests.
| Species or strain | 18S div. (%) | Seqa (#) | Paired Genes (No.) | Mean Align. Length (bp) | Saturated Pairs (dS ≥ 1.0): No. | Saturated Pairs (dS ≥ 1.0): Percentage | Unsaturated Pairs (dS < 1.0): No. | Unsaturated Pairs (dS < 1.0): Percentage | Unsaturated Pairs (dS < 1.0): Mean dN | Unsaturated Pairs (dS < 1.0): Mean dS |
|---|---|---|---|---|---|---|---|---|---|---|
| Thalassiosira pseudonana (NY) | 0 | 11,390 | 11,355 | 1,488 | 9 | 0.08 | 11,346 | 99.92 | 0.002 | 0.006 |
| T. pseudonana (Wales) | | 11,390 | | | | | | | | |
| Ditylum brightwellii 1 | 0b | 3,910 | 113 | 204 | 4 | 3 | 110 | 96 | 0.007 | 0.048 |
| D. brightwellii 2 | | 477 | | | | | | | | |
| Pseudo-nitzschia multistriata | 0.60c,d | 16,512 | 2,653 | 483 | 2,359 | 89 | 294 | 11 | 0.086 | 0.795 |
| P. multiseries | | 16,535 | | | | | | | | |
| P. australis | 1.03e | 920 | 277 | 324 | 176 | 64 | 101 | 36 | 0.090 | 0.726 |
| P. multistriata | | 16,512 | | | | | | | | |
| P. australis | 1.12f | 920 | 404 | 336 | 266 | 66 | 138 | 34 | 0.103 | 0.718 |
| P. multiseries | | 16,535 | | | | | | | | |
aNumber of starting sequences. T. pseudonana: JGI version 3.0 gene model transcripts; others: BLASTp-annotated or the longest ORF.
c1,160 bp partial alignment of 18S rDNA of the three Pseudo-nitzschia spp.
dAccession: U18241 and P. multistriata (Ruggiero personal communication).
eAccession: AM235384 and P. multistriata.
Phylogenetic analysis by maximum likelihood (PAML version 4.4 [Yang 2007]) was used to analyze each set of paired genes (table 1). We ran the program codeml with runmode = −2 (pairwise), model = 0, NSsites = 0 (one dN:dS value), fix_omega = 0 (omega is estimated), and cleandata = 1, such that sites with ambiguity codes were removed from the analysis. We removed genes saturated for synonymous substitutions (dS ≥ 1.0) and used the dN:dS ratio to evaluate at what phylogenetic distance positive selection could be detected (dN:dS > 1.0).
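For illustration, the codeml settings listed above could be written to a control file with a small helper. This is a sketch only: the file names are hypothetical, and `seqtype = 1` (codon sequences) and the starting `omega` value are assumptions that the text does not state.

```python
def pairwise_codeml_ctl(seqfile: str, outfile: str) -> str:
    """Build the text of a codeml control file for a pairwise dN:dS run."""
    settings = [
        ("seqfile", seqfile),    # hypothetical path to a PHYLIP codon alignment
        ("outfile", outfile),    # hypothetical path for codeml results
        ("runmode", -2),         # pairwise comparison (from the text)
        ("seqtype", 1),          # assumption: codon sequences
        ("model", 0),            # one omega for all branches (from the text)
        ("NSsites", 0),          # one dN:dS value across sites (from the text)
        ("fix_omega", 0),        # estimate omega (from the text)
        ("omega", 0.5),          # assumption: initial value for the estimate
        ("cleandata", 1),        # drop sites with ambiguity codes (from the text)
    ]
    return "\n".join(f"{key} = {value}" for key, value in settings) + "\n"


ctl_text = pairwise_codeml_ctl("pair_0001.phy", "pair_0001.out")
```

Other codeml options are left at their defaults in this sketch; a real analysis would set them deliberately.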
Testing for Positive Selection among Seven Strains of T. pseudonana
We further tested the genes with dN:dS ≥ 0.5 in the pair-wise analysis of T. pseudonana for positive selection among seven genetically distinct strains because approximately 80% of genes with a dN:dS ≥ 0.5 are under positive selection (Swanson et al. 2004). A coalescent tree was generated from the individual gene trees (supplementary fig. S1, Supplementary Material online). Gene trees and the coalescent tree were constructed with RAxML 7.2.5 (GTRPROTGAMMAWAG [Stamatakis 2006]) and PhyloNet (unrooted minimum coalescence [Than and Nakhleh 2009]), respectively. Individual genes were tested for positive selection by comparing how well the data fit a null model (M8a, nearly neutral) versus a selection model (M8). Both models allow omega (dN:dS for the tree of seven strains) to vary at different sites along the gene alignment. In the null model M8a, omega is distributed between zero and one, whereas in the selection model M8, omega may also be greater than one. We ran the null and selection models three and four times, respectively, to evaluate convergence of the likelihood estimates. Any genes that did not converge within the selection model were removed from further analysis. We calculated the likelihood ratio test (lrt) statistic for each gene using the formula: lrt = −2 × [ln(L_null) − ln(L_selection)], where L_null and L_selection are the maximized likelihoods of the null and selection models. The test statistic generally follows a χ² distribution; therefore, nominal significance (P value) was determined by integrating the right-hand tail of the χ² distribution with one degree of freedom (number of parameters: M8a = 16, M8 = 17) from the lrt statistic. The statistical significance for multiple tests was adjusted using a Bonferroni correction and a false discovery rate (FDR) of 0.01 with Q value (parameters: lambda = range 0.0 to 0.9 by 0.05; π0 method = bootstrapped; Storey and Tibshirani 2003).
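The likelihood ratio test can be illustrated with the standard library alone. The sketch below assumes one degree of freedom, for which the χ² upper tail has the closed form erfc(√(x/2)); the log-likelihood values in the example are invented for illustration and are not taken from the study.

```python
import math


def likelihood_ratio_statistic(lnl_null: float, lnl_selection: float) -> float:
    """lrt = -2 * (lnL_null - lnL_selection)."""
    return -2.0 * (lnl_null - lnl_selection)


def chi2_upper_tail_df1(statistic: float) -> float:
    """Right-tail probability of a chi-square distribution with 1 degree of freedom."""
    if statistic <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(statistic / 2.0))


# Invented log-likelihoods for one gene (null M8a vs. selection M8).
lrt = likelihood_ratio_statistic(lnl_null=-1234.5, lnl_selection=-1224.5)
p_value = chi2_upper_tail_df1(lrt)
```

With these invented values the statistic is 20, which corresponds to a nominal P value on the order of 10⁻⁵ to 10⁻⁶.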
Testing for Functional Enrichment and Orphan Status of Genes under Positive Selection
We submitted the T. pseudonana version 3.0 genes to InterProScan (Zdobnov and Apweiler 2001) and performed BLASTp against the nr database at NCBI to assess functional annotations for each gene. Orphans are defined as genes whose translated proteins either have no matches or matches with an e-value of 10⁻⁵ or greater from the BLAST-searched databases. Search databases included genomic and transcriptomic sequence from five additional diatoms, two oomycetes, two cryptophytes, three prasinophytes, one haptophyte, one amoeba, one ciliate, one green alga, and the nr database. Gene ontology (GO) terms were extracted from the InterProScan results to test for significant associations of GO terms within the set of positively selected genes using GOSTATS (Falcon and Gentleman 2007). We performed conditional, hypergeometric tests for over-representation of GO terms separately for each of the GO ontologies: molecular function, cell component, and biological process. A Bonferroni correction and an FDR of 0.05 were used to adjust P values for multiple tests.
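A plain (unconditional) hypergeometric over-representation test can be sketched as follows. It is not the conditional test implemented in GOSTATS, which also accounts for the GO hierarchy, so its P values should not be expected to match published ones; the function and parameter names are assumptions for this example.

```python
from math import comb


def expected_count(population: int, annotated: int, sample: int) -> float:
    """Expected number of annotated genes in the sample if there is no enrichment."""
    return sample * annotated / population


def hypergeom_upper_tail(population: int, annotated: int, sample: int, observed: int) -> float:
    """P(X >= observed) when `sample` genes are drawn without replacement from a
    population in which `annotated` genes carry the GO term."""
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(observed, upper + 1)
    )
    return favourable / comb(population, sample)


# Counts reported in this article for protein binding (GO:0005515):
# 5,875 genes with GO terms, 1,098 with the term, 329 selected genes
# with GO terms, 112 of them carrying the term.
exp_binding = expected_count(5875, 1098, 329)
p_binding = hypergeom_upper_tail(5875, 1098, 329, 112)
```

The expected count of roughly 61.5 agrees with the rounded value of 62 reported for this term, and the tail probability is very small, consistent with strong over-representation.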
Expression of Positively Selected Genes in T. pseudonana (Strain CCMP1335)
We used T. pseudonana transcription data from Mock et al. (2008) to test the hypothesis that positively selected genes have lower levels of expression than neutral or purified genes and to identify growth conditions under which positively selected genes are coexpressed. The relative level of expression was determined for each gene by taking the median fluorescence of the aggregated (previously quantile normalized) probes and replicates of each experimental condition. We tested differences in the distributions of gene expression between positively selected genes and genes evolving neutrally and under purifying selection for each experimental condition using one-sided Mann–Whitney tests (wilcox.test, R v2.12.1). Correlations between gene expression and the number of exons per gene, mature transcript length, and G + C content in the transcript were investigated using descriptive statistics and the linear model implemented in R. Two-way hierarchical clustering (Cluster 3.0 [Eisen et al. 1998]) with the city-block distance metric was used to group both genes and experimental conditions by similarity of expression patterns to identify genes that were coexpressed under specific conditions and thus potentially coevolving.
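A one-sided Mann–Whitney test of whether one group tends to have lower values than another can be sketched with the standard library. This version uses midranks for ties and a normal approximation with continuity correction; it is an illustration only and is not guaranteed to reproduce the output of R's wilcox.test, which can use an exact distribution for small samples.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_less(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U_x, p) for the one-sided alternative that x tends to be smaller than y."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0          # mean of 1-based ranks i+1 .. j
        ties = j - i
        tie_term += ties ** 3 - ties
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0.0:
        return u_x, 1.0
    z = (u_x - mean_u + 0.5) / math.sqrt(var_u)   # continuity correction for "less"
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))      # standard normal CDF at z
    return u_x, p


# Invented expression values: `selected` is shifted toward lower values.
selected = [1.0, 2.0, 3.0, 4.0, 5.0]
others = [6.0, 7.0, 8.0, 9.0, 10.0]
u_stat, p_less = mann_whitney_less(selected, others)
```

Swapping the two arguments tests the opposite direction and, for this toy input, gives a P value close to one.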
Testing Positive Selection along Specified Lineages of T. pseudonana
Four different branch-site selection models were used to test for positively selected genes in subsets of strains grouped by environmental conditions at the site each strain was collected. Two hypotheses addressed seasonal temperature variability. We placed strains in one of two groups based on the variability in seasonal sea surface temperature (SST) where they were collected. Differences in seasonal SST were plotted and calculated using the Smith and Reynolds climatology from the NCEP NOMADS Meteorological Data Server for SST between January and July 1970–2000 at the location each strain was collected (http://www.emc.ncep.noaa.gov/research/cmb/sst_analysis/#_cch2_1007146782, last accessed November 1, 2012). Two hypotheses explored the potential for ecosystem type and geographic isolation to promote positive selection using the open ocean strain (CCMP1014) and the Adriatic strain (RcTP), respectively. The null branch-site model A1 fixes omega to 1.0 on the branch(es) of interest, whereas the selection model A estimates a distribution of omega values across the gene on the chosen branch(es).
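The grouping of strains by seasonal SST variability can be sketched as below, using the ΔSST values listed for each strain in table 3. The 6°C boundary between the low and high groups is an assumption chosen to match the groups described in this article, not a threshold stated in the methods.

```python
from typing import Dict, List

# ΔSST (°C) per strain, as listed in table 3 of this article.
DELTA_SST_C: Dict[str, float] = {
    "CCMP1014": 0,
    "CCMP1015": 6,
    "CCMP1007": 14,
    "CCMP1335": 13,
    "CCMP1013": 6,
    "RcTP": 15,
    "CCMP1012": 2,
}


def group_by_delta_sst(delta_sst: Dict[str, float], low_max: float = 6.0) -> Dict[str, List[str]]:
    """Split strains into low and high seasonal SST variability groups.

    `low_max` is an assumed cutoff: strains at or below it are "low".
    """
    groups: Dict[str, List[str]] = {"low": [], "high": []}
    for strain, delta in sorted(delta_sst.items()):
        groups["low" if delta <= low_max else "high"].append(strain)
    return groups


sst_groups = group_by_delta_sst(DELTA_SST_C)
```

With this cutoff, four strains fall in the low-variability group and three in the high-variability group.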
We also tested the amount of time that strains have been in culture as a selective force using a branch model. Omega values were estimated for strains based on the decade in which they were collected. Branch models estimate one omega value per designated set of lineages for the entire gene. The null model M0 fixes omega to 1.0 and the selection model (model = 2) estimates omega for each set of lineages. Genes in which the lrt is significant and omega > 1.0 for the specified branches are positively selected.
The branch-site and branch models used the same tree topology and statistics as described earlier with the exception that individual model parameters differed. Branches of the tree were annotated to support each of the different hypotheses being tested. The background omega was applied to internal branches unless both tips originating from a node were being tested with the same omega, then that internal branch would be assigned to the alternative, estimated omega.
Results
Pair-Wise Tests for Positive Selection
We determined the best phylogenetic distance at which to detect positive selection in diatoms through pair-wise comparisons of homologs within and between species. The phylogenetic distance, or percent divergence of the 18S rDNA sequence, ranges from zero between both the Thalassiosira strains and Ditylum species to 1.12% between P. australis and P. multiseries (table 1). The proportion of gene pairs saturated for synonymous substitutions is greatest in the interspecies comparisons of Pseudo-nitzschia, accounting for 89% of those tested between P. multistriata and P. multiseries (table 1). We did not use gene pairs saturated for synonymous substitutions to detect positive selection because the synonymous substitution rate is uncertain. Rates of synonymous and nonsynonymous substitutions between homologs decrease with decreasing phylogenetic distance. The rates of synonymous substitution between homologs of the sister species of Ditylum and strains of Thalassiosira are one and two orders of magnitude less, respectively, than the interspecies comparisons of Pseudo-nitzschia (table 1). The length of gene alignments does not appear to affect the estimates of the rates of substitution.
The majority of homologous pairs that are not saturated for synonymous substitutions are under strong purifying selection with dN:dS ≤ 0.1 in all comparisons except the Thalassiosira strains (fig. 1). Most Pseudo-nitzschia homologs have a dN:dS < 0.4, consistent with purifying and relaxed purifying selection and only a few have a dN:dS > 1.0, indicating that they may be subject to positive selection. Intermediate values of dN:dS for the homologs of both Ditylum and Thalassiosira suggest that many are experiencing relaxed purifying selection. Approximately 10% and 20% of the Ditylum and Thalassiosira homologs, respectively, have dN:dS > 1.0. We did not perform lrts of statistical significance for this initial survey of the genes.
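The way pair-wise estimates are screened can be illustrated with a small tally. The bin names and the handling of pairs with dS = 0 (for which the ratio is undefined) are assumptions for this sketch; the cutoffs of dS ≥ 1.0 for saturation, dN:dS ≤ 0.1 for strong purifying selection, and dN:dS > 1.0 for candidate positive selection follow the text.

```python
from typing import Dict, Iterable, Tuple


def summarize_pairs(pairs: Iterable[Tuple[float, float]], saturation_ds: float = 1.0) -> Dict[str, int]:
    """Count gene pairs by dN:dS category; each pair is (dN, dS)."""
    counts = {
        "saturated": 0,           # dS >= saturation_ds, excluded from testing
        "undefined": 0,           # dS == 0, ratio cannot be computed (assumed handling)
        "strong_purifying": 0,    # dN:dS <= 0.1
        "intermediate": 0,        # 0.1 < dN:dS <= 1.0
        "candidate_positive": 0,  # dN:dS > 1.0
    }
    for dn, ds in pairs:
        if ds >= saturation_ds:
            counts["saturated"] += 1
        elif ds <= 0.0:
            counts["undefined"] += 1
        else:
            omega = dn / ds
            if omega > 1.0:
                counts["candidate_positive"] += 1
            elif omega <= 0.1:
                counts["strong_purifying"] += 1
            else:
                counts["intermediate"] += 1
    return counts


# Invented (dN, dS) values for five gene pairs.
example_counts = summarize_pairs(
    [(0.086, 0.795), (0.01, 1.2), (0.004, 0.002), (0.002, 0.006), (0.0, 0.0)]
)
```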
Fig. 1.

Frequency distributions of the dN:dS of homologs between diatom pairs for genes that are not saturated for silent substitutions (i.e., for genes where dS < 1.0). Number of genes analyzed is in parentheses.
Positive Selection among Seven Strains of T. pseudonana
The genome of the T. pseudonana reference strain (CCMP1335) includes 11,390 gene models, of which 11,355 were tested in the pair-wise survey; 35 genes had internal stop codons in one or the other strain and were not tested. In the test of the two T. pseudonana strains, 3,565 genes had a dN:dS ≥ 0.5; therefore, we tested these genes more rigorously among seven strains to determine the statistical support for positive selection. The maximum likelihood approach used here incorporates a phylogenetic model allowing omega (dN:dS for a gene tree) to vary across the alignment of each gene. Maximum likelihood analysis identified 2,035 genes with a nominal P value ≤ 0.05 (supplementary table S2, Supplementary Material online). After correcting for multiple tests, 809 (Bonferroni, P value < 1.5 × 10⁻⁵) and 1,784 (FDR = 0.01) genes emerged as strong candidates of positive selection representing 7% and 16% of the protein-coding genes, respectively (supplementary table S2, Supplementary Material online). The set of 809 genes is a subset of the 1,784 genes and is referred to as the positively selected set.
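The arithmetic behind the multiple-testing cutoff and the reported percentages can be checked in a few lines. The family-wise α of 0.05 and the use of all 3,565 tested genes as the number of tests are assumptions; the number of genes retained after the convergence filter is not stated, so the computed cutoff is only expected to be close to the reported one.

```python
def bonferroni_cutoff(alpha: float, n_tests: int) -> float:
    """Per-test significance cutoff under a Bonferroni correction."""
    return alpha / n_tests


def percent_of(part: int, whole: int) -> float:
    """Express `part` as a percentage of `whole`."""
    return 100.0 * part / whole


cutoff = bonferroni_cutoff(0.05, 3565)       # assumed alpha and test count
pct_bonferroni = percent_of(809, 11390)      # share of protein-coding genes
pct_fdr = percent_of(1784, 11390)
```

Under these assumptions the cutoff is about 1.4 × 10⁻⁵, and the two gene sets round to 7% and 16% of the 11,390 gene models.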
Protein functions encoded by the 809 positively selected genes were assessed using the GO data structure and InterPro for greater resolution of information concerning protein domains. GO terms are currently assigned to 329 (41%) of the 809 positively selected genes and 5,875 (52%) of the 11,390 total coding genes. We performed enrichment analysis for genes with GO terms. Six GO terms, including 146 unique genes, are over-represented in the positively selected set of genes relative to the distribution of GO terms for all protein-coding genes. The majority (112) of over-represented genes encode proteins involved in protein–protein interactions (table 2). One-third (34%) of the 329 positively selected genes with GO terms are represented by GO:0005515, the term for protein binding, in contrast to 18% of all genes with GO terms. Genes encoding biosynthetic and metabolic regulatory proteins are also over-represented within the positively selected set (table 2). Transcription factors dominate the regulatory genes with the majority (25 of 28) represented by GO:0009889; 8.5% of positively selected genes versus 4.3% of all genes with GO terms are represented by this term (supplementary table S3, Supplementary Material online). Protein domains with the greatest representation are WD40, zinc fingers, tetratricopeptides, PDZ, and domains associated with heat shock functions (supplementary table S3, Supplementary Material online). Orphan genes, for which no homologs were found in other organisms, represent 24% (191 genes) of the positively selected genes and 15% (1,718 genes) of all coding genes.
Table 2.
Functional enrichment in 809a positively selected genes of Thalassiosira pseudonana.
| Ontology | P-cutb | GO ID | P | Odds Ratio | Expc | Obsd | Totale | Term |
|---|---|---|---|---|---|---|---|---|
| MF | 18 | GO:0005515 | 6.05E−12 | 2.43 | 62 | 112 | 1,098 | Protein binding*,** |
| | | GO:0009982 | 0.001353 | 4.34 | 2.23 | 8 | 39 | Pseudouridine synthase activity |
| CC | 6 | GO:0005634 | 0.007638 | 1.89 | 16 | 26 | 313 | Nucleus |
| BP | 23 | GO:0009889 | 2.69E−05 | 2.63 | 13 | 29 | 282 | Regulation of biosynthetic process*,** |
| | | GO:0060255 | 3.53E−05 | 2.54 | 14 | 30 | 301 | Regulation of macromolecule metabolic process*,** |
| | | GO:0090304 | 4.20E−05 | 2.19 | 24 | 43 | 530 | Nucleic acid metabolic process*,** |
| | | GO:0051171 | 7.57E−05 | 2.46 | 14 | 29 | 298 | Regulation of nitrogen compound metabolic process*,** |
| | | GO:0031323 | 0.000126 | 2.34 | 15 | 30 | 322 | Regulation of cellular metabolic process*,** |
| | | GO:0080090 | 0.000194 | 2.31 | 15 | 29 | 314 | Regulation of primary metabolic process** |
| | | GO:0001522 | 0.00022 | 5.90 | 2 | 8 | 37 | Pseudouridine synthesis** |
| | | GO:0006355 | 0.00039 | 2.48 | 10 | 22 | 219 | Regulation of transcription, DNA dependent** |
| | | GO:0032774 | 0.000819 | 2.33 | 11 | 22 | 231 | RNA biosynthetic process** |
| | | GO:0050789 | 0.000878 | 1.89 | 24 | 39 | 513 | Regulation of biological process** |
| | | GO:0034641 | 0.001622 | 1.64 | 47 | 65 | 1,017 | Cellular nitrogen compound metabolic process** |
| | | GO:0016567 | 0.001646 | 10.51 | 1 | 4 | 12 | Protein ubiquitination** |
Note.—Ontologies: MF, molecular function; BP, biological process; CC, cellular component.
aThree hundred twenty nine of 809 genes were annotated by GO terms.
bP-cut: the number of GO IDs with P values < 0.05 in the hypergeometric tests for each GO ID.
cExp: expected number of genes for the GO term in positively selected set if the function is not enriched.
dObs: observed number of genes in the positively selected set of genes.
eTotal: the total number of genes in the T. pseudonana genome annotated for a specific GO term.
*Significant with Bonferroni corrections per Ontology (MF = 1.4E−4; CC = 3.7E−4; BP = 9.6E−5).
**Significant with FDR of 0.05 using q-value correction.
Expression of Positively Selected Genes in T. pseudonana (Strain CCMP1335)
We analyzed relative mRNA expression of T. pseudonana CCMP1335 from a publicly available data set with respect to the positively selected genes. Thalassiosira pseudonana had been limited for growth by silicic acid, iron, nitrate, carbon dioxide, or low temperature (4°C) or maintained under nutrient replete conditions (Mock et al. 2008). The data set of expressed genes comprised 8,996 genes, of which 636 are in the positively selected set of 809. The positively selected genes are expressed at lower levels than those under neutral and purifying selection (fig. 2). In each experimental treatment, the highest value of expression is an order of magnitude less for positively selected genes than genes evolving neutrally or under purifying selection (fig. 2). In addition, the frequency distributions of gene expression for positively selected and neutral and purified genes are significantly different (Mann–Whitney; fig. 2). These results are not affected by the number of exons per gene, mature transcript length, or G + C content within the mature transcript (data not shown).
Fig. 2.
Boxplots and statistics of the expression of positively selected (gray) and neutral and purified (white) genes of Thalassiosira pseudonana (CCMP1335) grown in a nutrient replete control (ctrl), nutrient limitation (Si, Fe, NO3, and CO2), and a 4°C cold treatment. Boxplots: box = 1st and 3rd quartiles, whiskers = 1.5 × interquartile range, notch = median of expression as log10(median aggregated probe intensity per gene). Mann–Whitney tests (N = 636 selected genes, N = 8,357 neutral and purified genes; *significance at P ≪ 0.01).
Two-way hierarchical clustering of the expression of the 636 positively selected genes highlights similarities and differences among groups of genes with respect to the experimental treatments (fig. 3). Two groups of genes are of special interest because they have very low and specific patterns of expression, respectively. The first is a group of 69 genes with consistently low expression across all treatments; some of these genes encode proteins putatively associated with sexual reproduction (fig. 3 and supplementary table S4, Supplementary Material online). The second group contains five genes, four of which are differentially upregulated in the silicic acid and iron-limited conditions relative to the control. One gene encodes a transcription factor, three appear to encode extracellular proteins including one with a chitin-binding domain, and the fifth has a transmembrane helix and possibly binds lectins (fig. 4).
Fig. 3.

Relative expression for 636 of 809 positively selected genes in Thalassiosira pseudonana (CCMP1335) grown under nutrient limitation (Si, Fe, NO3, and CO2), a nutrient replete control, and a 4°C cold treatment. Each gene is represented by one row in the heatmap. The similarity relationship of gene expression is represented by the dendrogram to the left. The upper dendrogram represents the similarity of responses among treatments. (A) Genes associated with sexual reproduction. (B) Putative cell wall-associated genes detailed in figure 4. Data are from Mock et al. 2008; log10(median aggregated probe intensity per gene) is plotted.
Fig. 4.
Relative expression and annotations of five coexpressed putative cell wall-associated genes that are positively selected in Thalassiosira pseudonana. Table headers: SP, signal peptide; T, target of peptide; DE, differentially expressed relative to control in Mock et al. (2008). Within table: Y, yes; S, secretory pathway; TF, transcription factor; PP, protein–protein interacting; TM, transmembrane region. The scale is log10(median aggregated probe intensity per gene).
Positive Selection along Specified Lineages of T. pseudonana
The seven different T. pseudonana strains were grouped according to similarities in environmental conditions at the locations where they were collected to test whether those conditions promoted positive selection. Four strains were collected from locations where the SST is relatively stable year round, with a summer–winter difference of no more than 6°C (table 3). These strains have 5-fold more genes under positive selection than strains isolated from regions where the SST is seasonally more variable and the fluctuations are up to 13–15°C (table 3). Forty-one of the 121 positively selected genes associated with stable SSTs have GO terms. Eighteen (44%) of the 41 genes interact with other proteins, GO:0005515, and are statistically enriched after Bonferroni correction (GOSTATS: P value: 0.00012). Nine of the 22 positively selected genes associated with the more variable SSTs have GO terms; four of the encoded proteins have protein-binding activity, but none are related to heat stress.
Table 3.
Number of Genes under Positive Selection among Specific Lineages of Thalassiosira pseudonana.
| Strain | Isolation Location | Latitude | Longitude | ΔSST (°C) | Branch-Site Models: Low ΔSST | Branch-Site Models: High ΔSST | Branch-Site Models: Ocean Environment | Branch-Site Models: Adriatic Isolation | Branch Model: Date of Isolation |
|---|---|---|---|---|---|---|---|---|---|
| CCMP1014 | North Pacific Gyre | 28 N | 155 W | 0 | Low | Low | Open ocean | Connected | 1971a |
| CCMP1015 | San Juan Island, WA | 48.54 N | 123.01 W | 6 | Low | Low | Coastal/Estuarine | Connected | 1985a |
| CCMP1007 | VA | 37.95 N | 79.94 W | 14 | High | High | Coastal/Estuarine | Connected | 1964 |
| CCMP1335 | NY | 40.76 N | 72.82 W | 13 | High | High | Coastal/Estuarine | Connected | 1958 |
| CCMP1013 | Wales, UK | 53.28 N | 3.83 W | 6 | Low | Low | Coastal/Estuarine | Connected | 1973a |
| IT (RcTP) | Adriatic Sea, Italy | 44.9 N | 12.42 E | 15 | High | High | Coastal/Estuarine | Isolated | 2006a |
| CCMP1012 | Perth, Western Australia | 31.99 S | 115.83 E | 2 | Low | Low | Coastal/Estuarine | Connected | 1965 |
| Resultsb | | | | | | | | | |
| Genes with nominal P ≤ 0.05 | | | | | 744 | 353 | 87 | 70 | 468 |
| Significant genes (Bonferroni) | | | | | 121 | 22 | 2 | 0 | 0 |
| Genes in positively selected set of 809 | | | | | 111 | 20 | 2 | 0 | 0 |
Note.—Underline, strains tested for positive selection per experiment. ΔSST, difference in averaged SST between January and July 1979–2000.
aBranches of strains tested for positive selection based on decade of isolation.
bNumber of parameters for branch-site models: selection model = 16, null model = 17, df = 1. Branch model (date): selection model = 14, null model = 17, df = 3.
The strain from the Pacific Gyre is the only representative from the open ocean, which is characterized by lower nutrient concentrations than coastal and estuarine areas. Two genes are positively selected in the Pacific Gyre strain, and neither has a GO term. The population of T. pseudonana from which the northern Adriatic Sea strain was collected is potentially geographically isolated from other populations because three hydrogeographic barriers, which act as genetic barriers for other species, lie between it and the Atlantic Ocean. There are no positively selected genes within the Adriatic Sea strain (table 3).
Six of the seven T. pseudonana strains have been in culture for more than 25 years; therefore, we investigated the influences of culturing using a branch model that tested for positive selection upon strains based on the decade in which they were isolated (table 3). Time in culture does not appear to promote positive selection. There are 468 genes with a nominal P value ≤ 0.05, meaning that estimated omegas of the specified branches were significantly different from one, but they could be significantly greater or less than one. Twenty-five of the 468 genes were significant at a Bonferroni corrected value (P < 1.5 × 10⁻⁵), but none had an omega ≥ 1.0.
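The relationship between the nominal and Bonferroni-corrected thresholds can be illustrated with a minimal Python sketch. It assumes that each gene is evaluated with a likelihood ratio test whose statistic, twice the difference in log likelihood between the selection and null models, is compared with a chi-square distribution with the degrees of freedom listed in the note to table 3 (df = 1 or df = 3). The function names, the log-likelihood values, and the number of tests are hypothetical and are not taken from the analysis itself.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate (closed forms for df = 1 and df = 3)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 3:
        return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    raise ValueError("this sketch only implements df = 1 and df = 3")


def lrt_p_value(lnl_null: float, lnl_selection: float, df: int) -> float:
    """P value of a likelihood ratio test, assuming a chi-square null distribution."""
    statistic = 2.0 * (lnl_selection - lnl_null)
    return chi2_sf(max(statistic, 0.0), df)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical log likelihoods for one gene (not values from the study).
p_gene = lrt_p_value(lnl_null=-2050.0, lnl_selection=-2038.0, df=1)

# Hypothetical number of tests; alpha = 0.05 is also an assumption.
threshold = bonferroni_threshold(alpha=0.05, n_tests=3333)

print(f"nominal P = {p_gene:.3g}; significant at 0.05: {p_gene <= 0.05}")
print(f"Bonferroni threshold = {threshold:.3g}; significant: {p_gene < threshold}")
```

A gene can therefore pass the nominal cutoff and still fail the corrected one. A significant test also says only that omega differs from one, so the estimated omega must be inspected separately to decide whether the signal reflects positive or purifying selection.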
Discussion
Phylogenetic Distance at Which Positive Selection Is Best Detected in Diatoms
The greatest number of genes with a signal of positive selection occurs within a species, between two strains of T. pseudonana. Thalassiosira pseudonana diverged from D. confervacea, its sister species, ∼2 Ma, based on a molecular clock applied to the divergence of 18S rDNA (Sorhannus 2007). The 18S rDNA sequences of T. pseudonana and D. confervacea differ by 0.4% over 1,758 base pairs (from accession numbers used in Sorhannus [2007]). Comparisons of diatoms with similar or equal sequence divergence should be tractable for identifying positively selected genes. Positive selection may also be detected at greater phylogenetic distances for particular genes. Many sexual reproduction genes evolve rapidly, and those involved in gamete recognition are implicated in maintaining reproductive isolation (Lyon and Vacquier 1999). The putative diatom gamete recognition gene, SIG1, is positively selected among four species of Thalassiosira (Sorhannus and Pond 2006). Selecting the appropriate phylogenetic distance to detect positively selected genes is, therefore, also contingent upon the functions of the proteins they encode.
The homologs detected between Pseudo-nitzschia species are currently subject to strong purifying selection, suggesting that their functions are conserved in other groups of organisms as well. Interestingly, the distribution of dN:dS in Pseudo-nitzschia homologs is of the same shape, and the rates of nonsynonymous mutations per codon are in the same range, as those of genes shared by mice and humans that also have homologs in plants and yeast and have deep evolutionary histories of hundreds of millions of years (Alba and Castresana 2005). The apparent difference in the time required to accrue similar amounts of genetic variation, 10 My for Pseudo-nitzschia species versus hundreds of millions of years for metazoans and plants, is likely due to the high frequency of asexual reproduction in diatoms, which provides the opportunity for mutation on a daily basis.
The cryptic sister species of Ditylum have a 2-fold difference in genome size hypothesized to result from whole-genome duplication (Koester et al. 2010). Positively selected genes may be associated with speciation and niche differentiation, reflecting the two species' different adaptive strategies. The proportion of genes indicated to be under positive selection in Ditylum was intermediate between those of the Pseudo-nitzschia species and Thalassiosira strain pairs. This may be because the Ditylum species diverged only recently, and too little time has passed to identify selective substitutions.
Candidate Genes under Positive Selection among Seven Strains of T. pseudonana
Seven percent of the 11,390 known genes in T. pseudonana are strong candidates for positive selection. This estimate is statistically conservative because it relies upon a Bonferroni correction rather than an FDR. One potential caveat is that this estimate may include both intra- and interstrain divergence if unidentified gene duplications were collapsed onto a single locus during genome mapping and assembly. In these instances, the signal of positive selection is accurate, but instead of applying to a single-locus alignment including sequence from seven strains, the alignment contains the consensus sequence of multiple loci for one or more strains.
Different organisms accrue positively selected genes at different rates, highlighting differences in selective pressures and mechanisms retaining beneficial mutations within populations. Both Thalassiosira pseudonana and humans have ∼7% of their loci under positive selection (Biswas and Akey 2006), but the length of time that humans have been diverging from their last common ancestor is two to three times longer than that between T. pseudonana and D. confervacea. Two coral species, Acropora millepora and A. palmata, also have ∼7% of their genomes under positive selection (Voolstra et al. 2011), but they diverged from one another 10–12 Ma (van Oppen et al. 2001). Approximating sexual generation times for diatoms, acroporid corals, and humans as 2, 4, and 20 years, respectively, these groups have accrued the same relative number of positively selected genes in 1.0 × 10⁶, 2.8 × 10⁶, and 3.0 × 10⁵ generations. The intermediate proportion of positively selected genes within T. pseudonana might be due to large population sizes of diatoms, within population allelic variability, gene flow between populations, and the strength and periodicity of selective forces (Lynch et al. 1991).
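The generation counts above can be checked against the quoted divergence times with a short Python sketch. It uses only the generation times and generation counts stated in the text and multiplies them to recover the divergence time each pair implies; the variable names are illustrative, and the calculation is a consistency check rather than part of the original analysis.

```python
# Generation time (years per sexual generation) and number of generations,
# both as stated in the text for each group.
groups = {
    "diatoms": {"generation_years": 2, "generations": 1_000_000},
    "acroporid corals": {"generation_years": 4, "generations": 2_800_000},
    "humans": {"generation_years": 20, "generations": 300_000},
}

# Implied divergence time in millions of years (My).
implied_my = {
    name: g["generation_years"] * g["generations"] / 1e6
    for name, g in groups.items()
}

for name, my in implied_my.items():
    print(f"{name}: about {my:.1f} My of divergence")

# Ratio of the human to the diatom divergence time.
ratio = implied_my["humans"] / implied_my["diatoms"]
print(f"human:diatom divergence ratio = {ratio:.1f}")
```

The implied values, about 2 My for the diatoms and about 11 My for the corals, agree with the ∼2 Ma and 10–12 Ma divergence estimates cited in the text. The human value of about 6 My is three times the diatom value, at the upper end of the "two to three times longer" comparison.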
Versatility within the regulatory networks of gene expression provides a mechanism for cells to react quickly to environmental fluctuations (Li and Chen 2010). Positive selection in a single transcription factor potentially affects the expression of many genes at once. Transcription factors comprise ∼2% of the protein-coding genes in T. pseudonana (Rayko et al. 2010) and are the most specific group of over-represented proteins encoded by the positively selected genes (table 2 and supplementary table S3, Supplementary Material online). Ten percent of the 258 transcription factors are positively selected in T. pseudonana, suggesting that differential gene regulation is an important component of adaptation for these strains. Positively selected genes were identified within the families of basic leucine zipper (bZip), heat shock, Myb, and zinc finger transcription factors. Transcription factors are also over-represented within the positively selected genes among human populations (Bustamante et al. 2005) and between two species of the nematode Caenorhabditis (Castillo-Davis et al. 2004) but not between two closely related corals (Voolstra et al. 2011).
Responding to the molecular signals of biotic and abiotic stressors is within the purview of both bZip and heat shock transcription factors (HSFs); genes from both families also function under normal homeostatic physiologies. These two families are of special interest in diatoms because the size of membership differs from other organisms. There are 10-fold fewer bZip transcription factors in diatoms and their phylogenetic group, the stramenopiles, than in plants and animals (Montsant et al. 2007; Rayko et al. 2010). Yet, 20% of the genes in the bZip transcription family are under positive selection in T. pseudonana. Diatoms have an expanded family of HSFs, currently estimated to contain 94 members in contrast to the one to four HSFs found in yeast and metazoans (Montsant et al. 2007; Fujimoto and Nakai 2010; Rayko et al. 2010). Six of the 39 (∼15%) heat shock factors in a diatom-only clade, Group 2, are positively selected (fig. 1 of Rayko et al. 2010). The Group 2 clade of HSFs highlights the importance of gene duplication in evolution and the potential for mutations in one of the duplicates to eventually provide a selective advantage (Zhang et al. 1998; Briscoe et al. 2010).
In T. pseudonana, 77% (112) of the proteins over-represented within any functional category are engaged in protein–protein interactions. The specific GO term over-represented in T. pseudonana is not among those over-represented among positively selected genes in other organisms, but genes encoding proteins with related functions, including signal transducers and receptors for various proteins, are over-represented in animals (Bustamante et al. 2005; Voolstra et al. 2011).
Lineage-specific, or orphan, genes are found throughout the evolutionary tree of life (Tautz and Domazet-Lošo 2011). These rapidly evolving genes experience relaxed levels of purifying selection; the increased numbers of nonsynonymous mutations may lead to novel protein functions and adaptive advantages (Domazet-Lošo and Tautz 2003; Voolstra et al. 2011). In addition, the creation of orphan genes provides a long-term source of genetic variability upon which selection may act. We define orphan genes of T. pseudonana as those genes lacking any sequence homology to the genes of other organisms at an e-value cutoff of 10⁻⁵. There are proportionally more orphan genes in the positively selected set (24%) than in the genome as a whole (15%). These genes, frequently encoding proteins of unknown function, are important for conferring an adaptive advantage to individuals of T. pseudonana. In diatoms, orphan genes of unknown function may be produced through de novo mechanisms (e.g., Cai et al. 2009), but there is also evidence of lineage-specific genes arising through gene duplication (e.g., Domazet-Lošo and Tautz 2003). For example, the gene families of cyclins and heat shock factors are greatly expanded in diatoms, and both families have genes found only in diatoms (Huysman et al. 2010; Rayko et al. 2010). A small number of genes are positively selected within the diatom-only cyclins and heat-shock factors.
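The over-representation of orphan genes can be illustrated with a one-sided hypergeometric test, sketched here in Python. This is not necessarily the test used in the study. The orphan counts are approximations back-calculated from the rounded percentages in the text (24% of 809 and 15% of 11,390), so the exact counts, and therefore the exact P value, may differ. The function name is hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) when `draws` genes are sampled without replacement from
    `population` genes, of which `successes` belong to the category of interest."""
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return tail / comb(population, draws)


GENOME_GENES = 11_390      # known genes in T. pseudonana (from the text)
SELECTED_GENES = 809       # positively selected set (from the text)
ORPHANS_IN_GENOME = 1_709  # approximation: about 15% of 11,390
ORPHANS_SELECTED = 194     # approximation: about 24% of 809

fold_enrichment = (ORPHANS_SELECTED / SELECTED_GENES) / (ORPHANS_IN_GENOME / GENOME_GENES)
p_orphan = hypergeom_sf(ORPHANS_SELECTED, GENOME_GENES, ORPHANS_IN_GENOME, SELECTED_GENES)

print(f"fold enrichment = {fold_enrichment:.2f}")
print(f"one-sided hypergeometric P = {p_orphan:.3g}")
```

The same function applies to any category tested against the genome background, such as a GO term, provided the number of annotated genes in the background is known.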
Expression of Positively Selected Genes
Levels of gene expression in yeast and vertebrates are inversely correlated to protein divergence and are hypothesized to exercise indirect control over mutation rates, such that highly expressed genes are the most conserved (Pál et al. 2001; Subramanian and Kumar 2004). Positively selected genes are expressed at lower levels than genes subject to neutral or purifying selection and tend to be expressed in restricted conditions or specific tissues in multicellular organisms (Kosiol et al. 2008). Positively selected genes in T. pseudonana CCMP1335 had significantly lower expression than more conserved genes across six different experimental treatments (fig. 2). As in other organisms, the adaptive genes in T. pseudonana likely function in specialized capacities and enhance survival during suboptimal growth conditions.
Sexual reproduction genes are a classic example of genes with restricted expression and rapid rates of evolution in diverse organisms (Clark et al. 2006; Oliver et al. 2010). Thalassiosira pseudonana belongs to the multipolar diatoms that produce flagellated sperm, as do the centric diatoms. Genes encoding flagella-associated and putative sperm-activating proteins are under positive selection in T. pseudonana and are among the genes with the lowest levels of expression among six experimental growth conditions (fig. 3 and Mock et al. 2008). This pattern of expression for genes involved in sexual reproduction is consistent with observations that there were no cells differentiating into gametangia in these experiments (Mock et al. 2008). Sexual reproduction is not yet documented in T. pseudonana, but it is an obligate and episodic phase of the life history of many diatoms (Chepurnov et al. 2004). The signature of positive selection suggests that these genes are functional; otherwise, they would be expected to degrade into nonfunctional pseudogenes.
Coexpression of multiple genes suggests that their protein products function in the same metabolic pathway or affect the same phenotype. The environmental conditions in which the genes are expressed provide additional information when the function of their encoded proteins is unknown. One of the most obvious patterns of coexpression in T. pseudonana CCMP1335 was associated with five positively selected genes that were up-regulated when the cells were limited for silicon and iron (fig. 4; Mock et al. 2008). The first gene in this group is a MYB transcription factor that is also highly expressed under other conditions, similar to expression patterns of MYB genes in plants, where they respond to environmental stress and regulate growth and development (Yanhui et al. 2006). Two of the five genes encode proteins that appear to interact with chitin. Protein 12594 possesses a chitin-binding domain and a signal peptide, suggesting that it is secreted. Protein 21085 does not have a signal peptide, but it does have a transmembrane domain and is a member of a superfamily of proteins that bind lectins, suggesting that it may also interact with chitin. The fourth member of this cluster encodes a protein (21587) that has a transmembrane domain but no other functional annotation. The fifth gene contains a V5/Tpx-1 domain, commonly found in extracellular sensory proteins that recognize signals and proteins from other organisms that are frequently pathogenic (Cantacessi et al. 2009). Gene expression of this cluster is associated with a distinctive morphology induced by both silicon and iron limitation: an elongated, bent-cell phenotype in which chitin is deposited in the girdle region of the wall (Durkin et al. 2009).
Healthy cells have tight connections between silica cell wall components; therefore, secretion of chitin-binding proteins is hypothesized to shore up the silica cell wall when those connections are weakened and vulnerable to pathogenic attack (Davis et al. 2005; Durkin et al. 2009). The pattern of expression and putative localization of these five proteins provides a possible link between selective agents of nutrient limitation, functional genes, and a cell wall phenotype.
Selection of Specified Strains with Respect to Environment
We used branch-site models of positive selection to explore the hypothesized selective forces of geographic isolation, ocean environment, and temperature on subsets of the strains collected from sites associated with these environmental variables (table 3). Three hydrogeographic barriers separate the Adriatic Sea from the Atlantic Ocean, and these boundaries differentiate population structure in planktonic organisms living in the Mediterranean Sea (Patarnello et al. 2007; Yebra et al. 2011). Although the T. pseudonana Adriatic strain is potentially geographically isolated, there do not appear to be any differentiating environmental pressures. Instead, the genes of the Adriatic strain are evolving under purifying and neutral selection.
The open ocean strain from the Pacific Gyre was collected from an environment very different from those of the other six strains, all of which are coastal or estuarine, including the one from the Adriatic. The gyre has stable light and salinity and low nutrient concentrations compared with coastal regions and estuaries, which are characterized by dynamic fluctuations of those same parameters (Karl 1999). Therefore, we hypothesized that the open ocean strain would have a strong genetic signal of differential adaptation to its unique environment. Surprisingly, only two genes were identified as positively selected in the open ocean strain. Both encode predicted proteins with unknown functions.
Temperature can be a strong selective agent (Husby et al. 2011), and we tested two hypotheses associated with seasonal variation in SSTs. The four strains collected from locations where summer–winter SSTs varied by 6°C or less shared 121 positively selected genes. The presence of three heat shock factors, one heat shock protein, and over-represented protein-binding proteins within this set of positively selected genes suggests adaptation to stress conditions. Fewer positively selected genes are shared by the three strains collected from regions where there is a strong seasonal fluctuation in temperature, but one gene is notable because it encodes a UV radiation resistance protein. This protein may not be directly affected by temperature, but it would be affected by available sunlight altering the temperature of water. These data provide circumstantial evidence that seasonal temperature variation may act as a selective agent; however, it is likely that absolute temperature, the timing of seasonal changes, and interactions between available light and temperature are more effective in selecting fit phenotypes (e.g., Namroud et al. 2008).
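The two temperature groups can be reproduced from the ΔSST values in table 3 with a short Python sketch. The 6°C cutoff and the gene counts come from the text and table 3; the variable names are illustrative only.

```python
# ΔSST (°C) per strain, as listed in table 3.
delta_sst = {
    "CCMP1014": 0,
    "CCMP1015": 6,
    "CCMP1007": 14,
    "CCMP1335": 13,
    "CCMP1013": 6,
    "IT (RcTP)": 15,
    "CCMP1012": 2,
}

LOW_DELTA_SST_MAX = 6  # strains at or below this value form the stable-SST group

low_group = sorted(s for s, d in delta_sst.items() if d <= LOW_DELTA_SST_MAX)
high_group = sorted(s for s, d in delta_sst.items() if d > LOW_DELTA_SST_MAX)

# Bonferroni-significant genes per branch-site test (table 3).
significant_genes = {"low": 121, "high": 22}
fold_difference = significant_genes["low"] / significant_genes["high"]

print("Low ΔSST strains:", low_group)
print("High ΔSST strains:", high_group)
print(f"Fold difference in positively selected genes: {fold_difference:.1f}")
```

The split yields four low-ΔSST and three high-ΔSST strains, matching the groups tested with the branch-site models. Because the groups differ in size and in geography, the fold difference alone does not separate the effect of temperature variation from other factors.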
In the two previous sections, we explored associations between genotype and selective force. It is important to note that these strains have been in culture for decades (table 3). Although there is no evidence that culturing promotes positive selection, it is unlikely that any strain retains its original genotype; therefore, care must be taken in interpreting the results. Our results are robust in two ways: first, the most tractable genes for further study tend to be grouped in the expression profile or are shared by a subset of strains from the branch-site models, and second, the majority of positively selected genes detected by the branch-site models are in the original set of 809 genes.
Together, our results present an opportunity to test hypotheses that integrate positively selected genes in T. pseudonana with their associated phenotypes and selective forces through manipulative laboratory experiments and in the field by taking a population genetics approach to determine allele distributions for populations living in different regions.
Supplementary Material
Supplementary figure S1 and tables S1–S4 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
The authors thank Renee George, Megan Kogut, Bob Morris, and Larry Ruzzo for insightful discussion and Robin Kodner for assistance with the T. pseudonana consensus tree. Special thanks are given to those who shared data sets. The T. pseudonana data set was constructed and curated by Rhonda Morales, Ellen Lin, Vaughn Iverson, Dave Schruth, and Chris Berthiaume. The Pseudo-nitzschia multiseries and P. australis data sets are courtesy of Micaela S. Parker. Uwe John and Wiebe Kooistra provided the Pseudo-nitzschia multistriata data set, and M. Valeria Ruggiero provided the P. multistriata 18S sequence. This work was supported by the Gordon and Betty Moore Foundation Marine Microbiology Investigator Award to E.V.A., and by the National Institutes of Health HD057974 and the National Science Foundation's Division of Environmental Biology 0918106 to W.J.S.
References
- Alba MM, Castresana J. Inverse relationship between evolutionary rate and age of mammalian genes. Mol Biol Evol. 2005;22:598–606. doi: 10.1093/molbev/msi045. [DOI] [PubMed] [Google Scholar]
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Alverson AJ. Strong purifying selection in the silicon transporters of marine and freshwater diatoms. Limnol Oceanogr. 2007;52:1420–1429. [Google Scholar]
- Armbrust EV, Berges JA, Bowler C, et al. (45 co-authors) The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism. Science. 2004;306:79–86. doi: 10.1126/science.1101156. [DOI] [PubMed] [Google Scholar]
- Austin JA, Barth JA. Variation in the position of the upwelling front on the Oregon shelf. J Geophys Res. 2002;107(C11):3180. [Google Scholar]
- Benitez-Nelson CR, Bidigare RR, Dickey TD, et al. (23 co-authors) Mesoscale eddies drive increased silica export in the subtropical Pacific Ocean. Science. 2007;316:1017–1021. doi: 10.1126/science.1136221. [DOI] [PubMed] [Google Scholar]
- Biswas S, Akey JM. Genomic insights into positive selection. Trends Genet. 2006;22:437–446. doi: 10.1016/j.tig.2006.06.005. [DOI] [PubMed] [Google Scholar]
- Bowler C, Allen AE, Badger JH, et al. (77 co-authors) The Phaeodactylum genome reveals the evolutionary history of diatom genomes. Nature. 2008;456:239–244. doi: 10.1038/nature07410. [DOI] [PubMed] [Google Scholar]
- Briscoe AD, Bybee SM, Bernard GD, Yuan F, Sison-Mangus MP, Reed RD, Warren AD, Llorente-Bousquets J, Chiao C-C. Positive selection of a duplicated UV-sensitive visual pigment coincides with wing pigment evolution in Heliconius butterflies. Proc Natl Acad Sci U S A. 2010;107:3628–3633. doi: 10.1073/pnas.0910085107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bustamante CD, Fledel-Alon A, Williamson S, et al. (14 co-authors) Natural selection on protein-coding genes in the human genome. Nature. 2005;437:1153–1157. doi: 10.1038/nature04240. [DOI] [PubMed] [Google Scholar]
- Cai JJ, Macpherson JM, Sella G, Petrov DA. Pervasive hitchhiking at coding and regulatory sites in humans. PLoS Genet. 2009;5(1):e1000336. doi: 10.1371/journal.pgen.1000336. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cantacessi C, Campbell BE, Visser A, et al. (12 co-authors) A portrait of the “SCP/TAPS” proteins of eukaryotes—developing a framework for fundamental research and biotechnological outcomes. Biotechnol Adv. 2009;27:376–388. doi: 10.1016/j.biotechadv.2009.02.005. [DOI] [PubMed] [Google Scholar]
- Casteleyn G, Leliaert F, Backeljau T, Debeer A-E, Kotaki Y, Rhodes L, Lundholm N, Sabbe K, Vyverman W. Limits to gene flow in a cosmopolitan marine planktonic diatom. Proc Natl Acad Sci U S A. 2010;107:12952–12957. doi: 10.1073/pnas.1001380107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Castillo-Davis CI, Kondrashov FA, Hartl DL, Kulathinal RJ. The functional genomic distribution of protein divergence in two animal phyla: coevolution, genomic conflict, and constraint. Genome Res. 2004;14:802–811. doi: 10.1101/gr.2195604. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chepurnov VA, Mann DG, Sabbe K, Vyverman W. Experimental studies on sexual reproduction in diatoms. Int Rev Cytol. 2004;237:91–154. doi: 10.1016/S0074-7696(04)37003-8. [DOI] [PubMed] [Google Scholar]
- Clark NL, Aagaard JE, Swanson WJ. Evolution of reproductive proteins from animals and plants. Reproduction. 2006;131:11–22. doi: 10.1530/rep.1.00357. [DOI] [PubMed] [Google Scholar]
- D'Alelio D, d'Alcala MR, Dubroca L, Sarno D, Zingone A, Montresor M. The time for sex: a biennial life cycle in a marine planktonic diatom. Limnol Oceanogr. 2010;55:106–114. [Google Scholar]
- Davis AK, Hildebrand M, Palenik B. A stress-induced protein associated with the girdle band region of the diatom Thalassiosira pseudonana (Bacillariophyta) J Phycol. 2005;41:577–589. [Google Scholar]
- Domazet-Lošo T, Tautz D. An evolutionary analysis of orphan genes in Drosophila. Genome Res. 2003;13:2213–2219. doi: 10.1101/gr.1311003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Durkin CA, Mock T, Armbrust EV. Chitin in diatoms and its association with the cell wall. Eukaryot Cell. 2009;8:1038–1050. doi: 10.1128/EC.00079-09. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. 1998;95:14863–14868. doi: 10.1073/pnas.95.25.14863. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Falcon S, Gentleman R. Using GOstats to test gene lists for GO term association. Bioinformatics. 2007;23:257–258. doi: 10.1093/bioinformatics/btl567. [DOI] [PubMed] [Google Scholar]
- Falkowski PG, Katz ME, Knoll AH, Quigg A, Raven JA, Schofield O, Taylor FJR. The evolution of modern eukaryotic phytoplankton. Science. 2004;305:354–360. doi: 10.1126/science.1095964. [DOI] [PubMed] [Google Scholar]
- Fujimoto M, Nakai A. The heat shock factor family and adaptation to proteotoxic stress. FEBS J. 2010;277:4112–4125. doi: 10.1111/j.1742-4658.2010.07827.x. [DOI] [PubMed] [Google Scholar]
- Furnas MJ. In situ growth rates of marine phytoplankton: approaches to measurement, community and species growth rates. J Plankton Res. 1990;12:1117–1151. [Google Scholar]
- Hall TA. BioEdit: a user friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser. 1999;41:95–98. [Google Scholar]
- Holtermann KE, Bates SS, Trainer VL, Odell A, Virginia Armbrust E. Mass sexual reproduction in the toxigenic diatoms Pseudo-nitzschia australis and P. pungens (Bacillariophyceae) on the Washington Coast, USA. J Phycol. 2010;46:41–52. [Google Scholar]
- Husby A, Visser ME, Kruuk LE. Speeding up microevolution: the effects of increasing temperature on selection and genetic variance in a wild bird population. PLoS Biol. 2011;9(2):e1000585. doi: 10.1371/journal.pbio.1000585. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Huysman MJJ, Martens C, Vandepoele K, et al. (11 co-authors) Genome-wide analysis of the diatom cell cycle unveils a novel type of cyclins involved in environmental signaling. Genome Biol. 2010;11:R17. doi: 10.1186/gb-2010-11-2-r17. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Iverson V, Morris RM, Frazar CD, Berthiaume CT, Morales RL, Armbrust EV. Untangling genomes from metagenomes: revealing an uncultured class of marine euryarchaeota. Science. 2012;335:587–590. doi: 10.1126/science.1212665. [DOI] [PubMed] [Google Scholar]
- Julenius K, Pedersen AG. Protein evolution is faster outside the cell. Mol Biol Evol. 2006;23:2039–2048. doi: 10.1093/molbev/msl081. [DOI] [PubMed] [Google Scholar]
- Karl DM. A sea of change: biogeochemical variability in the North Pacific Subtropical Gyre. Ecosystems. 1999;2:181–214. [Google Scholar]
- Kim PM, Korbel JO, Gerstein MB. Positive selection at the protein network periphery: evaluation in terms of structural constraints and cellular context. Proc Natl Acad Sci U S A. 2007;104:20274–20279. doi: 10.1073/pnas.0710183104. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Koester JA, Swalwell JE, von Dassow P, Armbrust EV. Genome size differentiates co-occurring populations of the planktonic diatom Ditylum brightwellii (Bacillariophyta) BMC Evol Biol. 2010;10:1. doi: 10.1186/1471-2148-10-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kooistra WHCF, Gersonde R, Medlin LK, Mann DG. The origin and evolution of the diatoms: their adaptation to a planktonic existence. In: Falkowski PG, Knoll AH, editors. Evolution of primary producers in the sea. Amsterdam: Elsevier Academic Press; 2007. pp. 210–250. [Google Scholar]
- Kosiol C, Vinar T, da Fonseca RR, Hubisz MJ, Bustamante CD, Nielsen R, Siepel A. Patterns of positive selection in six mammalian genomes. PLoS Genet. 2008;4:e1000144. doi: 10.1371/journal.pgen.1000144. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Larkin MA, Blackshields G, Brown NP, et al. (13 co-authors) Clustal W and Clustal X version 2.0. Bioinformatics. 2007;23:2947–2948. doi: 10.1093/bioinformatics/btm404. [DOI] [PubMed] [Google Scholar]
- Lavaud J, Strzepek RF, Kroth PG. Photoprotection capacity differs among diatoms: possible consequences on the spatial distribution of diatoms related to fluctuations in the underwater light climate. Limnol Oceanogr. 2007;52:1188–1194. [Google Scholar]
- Li CW, Chen BS. Identifying functional mechanisms of gene and protein regulatory networks in response to a broader range of environmental stresses. Comp Funct Genomics. 2010 doi: 10.1155/2010/408705. Article ID 408705. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li YD, Liang H, Gu Z, Lin Z, Guan W, Zhou L, Li YQ, Li WH. Detecting positive selection in the budding yeast genome. J Evol Biol. 2009;22:2430–2437. doi: 10.1111/j.1420-9101.2009.01851.x. [DOI] [PubMed] [Google Scholar]
- Lynch M, Gabriel W, Wood AM. Adaptive and demographic responses of phytoplankton populations to environmental change. Limnol Oceanogr. 1991;36:1301–1312. [Google Scholar]
- Lyon JD, Vacquier VD. Interspecies chimeric sperm lysins identify regions mediating species-specific recognition of the abalone egg vitelline envelope. Dev Biol. 1999;214:151–159. doi: 10.1006/dbio.1999.9411.
- Maliva RG, Knoll AH, Siever R. Secular change in chert distribution: a reflection of evolving biological participation in the silica cycle. Palaios. 1989;4:519–532.
- Miller JR, Delcher AL, Koren S, Venter E, Walenz BP, Brownley A, Johnson J, Li K, Mobarry C, Sutton G. Aggressive assembly of pyrosequencing reads with mates. Bioinformatics. 2008;24:2818–2824. doi: 10.1093/bioinformatics/btn548.
- Mock T, Samanta MP, Iverson V, et al. (15 co-authors) Whole-genome expression profiling of the marine diatom Thalassiosira pseudonana identifies genes involved in silicon bioprocesses. Proc Natl Acad Sci U S A. 2008;105:1579–1584. doi: 10.1073/pnas.0707946105.
- Montsant A, Allen AE, Coesel S, et al. (27 co-authors) Identification and comparative genomic analysis of signaling and regulatory components in the diatom Thalassiosira pseudonana. J Phycol. 2007;43:585–604.
- Namroud M-C, Beaulieu J, Juge N, Laroche J, Bousquet J. Scanning the genome for gene single nucleotide polymorphisms involved in adaptive population differentiation in white spruce. Mol Ecol. 2008;17:3599–3613. doi: 10.1111/j.1365-294X.2008.03840.x.
- Nielsen R, Bustamante C, Clark AG, et al. (13 co-authors) A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biol. 2005;3(6):e170. doi: 10.1371/journal.pbio.0030170.
- Oliver TA, Garfield DA, Manier MK, Haygood R, Wray GA, Palumbi SR. Whole-genome positive selection and habitat-driven evolution in a shallow and a deep-sea urchin. Genome Biol Evol. 2010;2:800–814. doi: 10.1093/gbe/evq063.
- Pál C, Papp B, Hurst LD. Highly expressed genes in yeast evolve slowly. Genetics. 2001;158:927–931. doi: 10.1093/genetics/158.2.927.
- Patarnello T, Volckaert FAMJ, Castilho R. Pillars of Hercules: is the Atlantic–Mediterranean transition a phylogeographical break? Mol Ecol. 2007;16:4426–4444. doi: 10.1111/j.1365-294X.2007.03477.x.
- Raup DM. Size of the Permo-Triassic bottleneck and its evolutionary implications. Science. 1979;206:217–218. doi: 10.1126/science.206.4415.217.
- Rayko E, Maumus F, Maheswari U, Jabbari K, Bowler C. Transcription factor families inferred from genome sequences of photosynthetic stramenopiles. New Phytol. 2010;188:52–66. doi: 10.1111/j.1469-8137.2010.03371.x.
- Rynearson TA, Armbrust EV. Genetic differentiation among populations of the planktonic marine diatom Ditylum brightwellii (Bacillariophyceae). J Phycol. 2004;40:34–43.
- Rynearson TA, Newton JA, Armbrust EV. Spring bloom development, genetic variation, and population succession in the planktonic diatom Ditylum brightwellii. Limnol Oceanogr. 2006;51:1249–1261.
- Sachidanandam R, Weissman D, Schmidt SC, et al. (41 co-authors) A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature. 2001;409:928–933. doi: 10.1038/35057149.
- Sapriel G, Quinet M, Heijde M, Jourdren L, Tanty V, Luo G, Le Crom S, Lopez PJ. Genome-wide transcriptome analyses of silicon metabolism in Phaeodactylum tricornutum reveal the multilevel regulation of silicic acid transporters. PLoS One. 2009;4(10):e7458. doi: 10.1371/journal.pone.0007458.
- Siever R. Silica in the oceans: biological-geochemical interplay. In: Schneider SH, Boston PJ, editors. Scientists on Gaia. Boston: MIT Press; 1991. pp. 287–295.
- Sodergren E, Weinstock GM, Davidson EH, et al. (228 co-authors) The genome of the sea urchin Strongylocentrotus purpuratus. Science. 2006;314:941–952. doi: 10.1126/science.1133609.
- Sorhannus U. The effect of positive selection on a sexual reproduction gene in Thalassiosira weissflogii (Bacillariophyta): results obtained from maximum-likelihood and parsimony-based methods. Mol Biol Evol. 2003;20:1326–1328. doi: 10.1093/molbev/msg145.
- Sorhannus U. A nuclear-encoded small-subunit ribosomal RNA timescale for diatom evolution. Mar Micropaleontol. 2007;65:1–12.
- Sorhannus U, Pond SLK. Evidence for positive selection on a sexual reproduction gene in the diatom genus Thalassiosira (Bacillariophyta). J Mol Evol. 2006;63:231–239. doi: 10.1007/s00239-006-0016-z.
- Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22:2688–2690. doi: 10.1093/bioinformatics/btl446.
- Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A. 2003;100:9440–9445. doi: 10.1073/pnas.1530509100.
- Subramanian S, Kumar S. Gene expression intensity shapes evolutionary rates of the proteins encoded by the vertebrate genome. Genetics. 2004;168:373–381. doi: 10.1534/genetics.104.028944.
- Suzuki Y, Nei M. False-positive selection identified by ML-based methods: examples from the Sig1 gene of the diatom Thalassiosira weissflogii and the tax gene of a human T-cell lymphotropic virus. Mol Biol Evol. 2004;21:914–921. doi: 10.1093/molbev/msh098.
- Swanson WJ, Wong A, Wolfner MF, Aquadro CF. Evolutionary expressed sequence tag analysis of Drosophila female reproductive tracts identifies genes subjected to positive selection. Genetics. 2004;168:1457–1465. doi: 10.1534/genetics.104.030478.
- Tautz D, Domazet-Lošo T. The evolutionary origin of orphan genes. Nat Rev Genet. 2011;12:692–702. doi: 10.1038/nrg3053.
- Than C, Nakhleh L. Species tree inference by minimizing deep coalescences. PLoS Comput Biol. 2009;5(9):e1000501. doi: 10.1371/journal.pcbi.1000501.
- Tréguer P, Pondaven P. Silica control of carbon dioxide. Nature. 2000;406:358–359. doi: 10.1038/35019236.
- van Oppen MJH, McDonald BJ, Willis B, Miller DJ. The evolutionary history of the coral genus Acropora (Scleractinia, Cnidaria) based on a mitochondrial and a nuclear marker: reticulation, incomplete lineage sorting, or morphological convergence? Mol Biol Evol. 2001;18:1315–1329. doi: 10.1093/oxfordjournals.molbev.a003916.
- Voolstra CR, Sunagawa S, Matz MV, et al. (11 co-authors) Rapid evolution of coral proteins responsible for interaction with the environment. PLoS One. 2011;6(5):e20392. doi: 10.1371/journal.pone.0020392.
- Voolstra CR, Sunagawa S, Schwarz JA, Coffroth MA, Yellowlees D, Leggat W, Medina M. Evolutionary analysis of orthologous cDNA sequences from cultured and symbiotic dinoflagellate symbionts of reef-building corals (Dinophyceae: Symbiodinium). Comp Biochem Physiol Part D Genomics Proteomics. 2009;4:67–74. doi: 10.1016/j.cbd.2008.11.001.
- Wernersson R, Pedersen AG. RevTrans: multiple alignment of coding DNA from aligned amino acid sequences. Nucleic Acids Res. 2003;31:3537–3539. doi: 10.1093/nar/gkg609.
- Wolf YI, Novichkov PS, Karev GP, Koonin EV, Lipman DJ. The universal distribution of evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent ages. Proc Natl Acad Sci U S A. 2009;106:7273–7280. doi: 10.1073/pnas.0901808106.
- Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24:1586–1591. doi: 10.1093/molbev/msm088.
- Yanhui C, Xiaoyuan Y, Kun H, et al. (18 co-authors) The MYB transcription factor superfamily of Arabidopsis: expression analysis and phylogenetic comparison with the rice MYB family. Plant Mol Biol. 2006;60:107–124. doi: 10.1007/s11103-005-2910-y.
- Yebra L, Bonnet D, Harris RP, Lindeque PK, Peijnenburg KTCA. Barriers in the pelagic: population structuring of Calanus helgolandicus and C. euxinus in European waters. Marine Ecol Progress Ser. 2011;428:135–149.
- Zdobnov EM, Apweiler R. InterProScan: an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001;17:847–848. doi: 10.1093/bioinformatics/17.9.847.
- Zhang JZ, Rosenberg HF, Nei M. Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc Natl Acad Sci U S A. 1998;95:3708–3713. doi: 10.1073/pnas.95.7.3708.