The fungus Candida albicans exists as a prevalent commensal and an important opportunistic pathogen that can infect multiple niches of its human host. Recent studies have examined the diploid genome of C. albicans by performing both short-term microevolution studies and comparative genomics on collections of clinical isolates. Common mechanisms driving genome dynamics include accumulation of point mutations, loss of heterozygosity (LOH) events, large-scale chromosomal rearrangements, and even ploidy change, with important consequences for both drug resistance and host adaptation. Evidence for recombination between C. albicans lineages also highlights a role for (para)sex in shaping the species population structure. Ongoing work will continue to define the contributions of genome evolution to phenotypic variation and the role of host pressures in driving adaptive processes.
Keywords: Candida albicans, genome evolution, parasex, LOH, ploidy, mutation
The Ascomycete lineage of fungi emerged 400–500 million years ago and now includes approximately 46,000 species with significant environmental and clinical importance [1–3]. Most species occupy environmental niches, yet a handful are capable of infecting humans and causing disease, especially in immunocompromised hosts. Among these, Candida albicans stands out as a widespread human commensal of the skin, oral cavity, and gastrointestinal and genitourinary tracts [4–6]. Overgrowth in these host niches can result in debilitating mucosal infections and life-threatening systemic disease, making C. albicans an opportunistic pathogen of significant clinical importance [7–9]. Although C. albicans has primarily been described as an obligate mammalian commensal, recent evidence suggests that it may also be able to survive and propagate in the environment [10,11].
C. albicans is naturally diploid, with a 14.3 megabase (Mb) genome consisting of eight chromosomes and a GC content of 33.5% [12]. The genome of the standard ‘laboratory’ isolate of C. albicans, SC5314, contains 43,665 heterozygous positions, corresponding to an average of one heterozygous single nucleotide polymorphism (SNP) every ~330 base pairs (bp) [13]. However, as discussed below, heterozygous positions are not evenly distributed across the genome, and their frequency varies considerably between isolates. C. albicans was long thought to be an obligate asexual species, yet a parasexual cycle has been defined in the laboratory; this unusual cycle lacks a conventional meiosis but can still generate highly recombinant progeny.
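The figure of one heterozygous SNP per ~330 bp comes from dividing genome size by the number of heterozygous positions. The Python sketch below reproduces that arithmetic. Because the text stresses that heterozygous sites are unevenly distributed, it also adds a hypothetical windowed count (`het_density_per_window`) that exposes local variation hidden by the genome-wide average. The toy positions are invented for illustration and are not SC5314 data.

```python
# Genome-wide figures quoted above for strain SC5314.
GENOME_SIZE_BP = 14_300_000
HET_POSITIONS = 43_665


def mean_spacing_bp(genome_size_bp: int, het_positions: int) -> float:
    """Average distance between heterozygous sites if they were spread evenly."""
    return genome_size_bp / het_positions


def het_density_per_window(positions: list[int], chrom_length: int,
                           window: int = 10_000) -> list[int]:
    """Count heterozygous sites in consecutive fixed-size windows (0-based positions).

    Hypothetical helper for illustration; a real analysis would work per
    chromosome from filtered variant calls.
    """
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window] += 1
    return counts


spacing = mean_spacing_bp(GENOME_SIZE_BP, HET_POSITIONS)
print(f"~1 heterozygous SNP every {spacing:.0f} bp")  # 327 bp, i.e. roughly 330 bp

# Toy chromosome: the sites cluster in the first window instead of spreading evenly.
print(het_density_per_window([1, 2, 3, 35], chrom_length=40, window=10))  # [3, 0, 0, 1]
```

The uniform-spacing average is only a summary statistic. The windowed counts are what would reveal the long homozygous stretches left behind by LOH.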
Population structure of C. albicans isolates
C. albicans shows a predominantly clonal population structure consisting of 17 clades, as established by multilocus sequence typing (MLST) [14–16]. The population structure is largely independent of geography, although clades SA and E are enriched in South Africa and Europe, respectively [17–20]. Recent whole genome sequencing of multiple clinical isolates (including those from commensal and pathogenic sources) generally supports the previously established clade architecture of C. albicans [20–22]. These studies also highlighted the extensive sequence divergence present within the species, in some cases exceeding 1% nucleotide divergence between isolates [22]. Genome sequencing also strongly supported the designation of one clade, clade 13, as a distinct sub-species named Candida africana. Isolates from this clade were first identified in Africa, have unique morphological and physiological properties, and are associated with genital tract infections [20,23–25]. Their close association with the genital tract may reflect a loss of the ability to colonize other host niches, making C. africana a striking example of niche specialization [20]. C. africana genomes have undergone massive loss of heterozygosity together with the accumulation of a large number of nonfunctional genes, reflecting a reductive mode of evolutionary diversification common among yeasts [26].
Mutational processes in the C. albicans genome
A number of studies have indicated that the C. albicans genome displays high levels of plasticity, most readily evidenced by changes in karyotype [27–32]. Genomic variation contributes to its ability to colonize a variety of host niches, to adapt to diverse selection pressures, and to escape antifungal drugs [6,21,29,33–39]. Recent analyses have provided a nuanced picture of genome dynamics and revealed the many mechanisms driving genome change, from the nucleotide level to the whole-chromosome level.
Point mutations
Point mutations such as base substitutions are an important driver of genomic change because they arise frequently and are often tolerated [13,21,40,41]. The C. albicans base substitution rate is estimated to be 1.2 × 10⁻¹⁰ per base pair per generation during in vitro growth in standard laboratory medium [40], similar to rates observed for other model yeasts [42,43]. Calculating C. albicans mutation rates in the mammalian host is complicated by poorly defined generation times in vivo. However, mutation frequencies over defined time periods were greater in vivo than in vitro. This is consistent with the idea that host pressures may enhance mutational events and/or impose stronger selection for the retention of mutations [38,40].
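To put the in vitro rate in perspective, it can be scaled up to whole-genome expectations. The sketch below rests on two assumptions. First, the 1.2 × 10⁻¹⁰ rate applies uniformly to both homologs of the diploid genome (2 × 14.3 Mb). Second, the rate stays constant over time. Neither holds exactly, given the elevated mutation frequencies observed in vivo. The function name is illustrative.

```python
# Rate and genome size quoted in the text; the uniform, constant-rate model is an assumption.
RATE_PER_BP_PER_GEN = 1.2e-10
HAPLOID_GENOME_BP = 14.3e6
PLOIDY = 2  # both homologs of the diploid genome can acquire substitutions


def expected_substitutions(generations: float,
                           rate: float = RATE_PER_BP_PER_GEN,
                           genome_bp: float = HAPLOID_GENOME_BP,
                           ploidy: int = PLOIDY) -> float:
    """Expected number of new base substitutions accumulated along one cell lineage."""
    return rate * genome_bp * ploidy * generations


per_generation = expected_substitutions(1)
print(f"{per_generation:.4f} substitutions per genome per generation")  # 0.0034
print(f"~1 substitution every {1 / per_generation:.0f} generations")    # 291
print(f"{expected_substitutions(1_000):.1f} after 1,000 generations")   # 3.4
```

Under this model, a single lineage gains only a handful of substitutions over a thousand generations. This is one reason why microevolution studies need whole-genome sequencing to detect changes.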
The frequency of insertions/deletions (indels) is higher in C. albicans [22,40] than in S. cerevisiae [42,44], yet still lower than in other eukaryotic species [45]. This is likely because indels are more often deleterious in compact, gene-rich yeast genomes. Whereas coding sequences comprise 1–2% of the genomes of higher eukaryotes [46], ~36% of the C. albicans genome consists of protein-encoding open reading frames (ORFs) whose function could be disrupted by indel formation [22]. Indeed, strong purifying selection is evident from two observations. Indels are overrepresented in intergenic sequences, and indels within coding sequences are found in multiples of three nucleotides, indicating that frameshift-inducing indels are selected against [22,40].
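The frameshift argument rests on simple modular arithmetic: an indel within a coding sequence preserves the downstream reading frame only when its length is a multiple of three. Below is a minimal Python sketch of that rule. The `(length, in_cds)` records are hypothetical stand-ins for indel calls that have already been annotated against gene models.

```python
from collections import Counter


def indel_effect(length: int, in_cds: bool) -> str:
    """Classify an indel by location and by whether it preserves the reading frame."""
    if length <= 0:
        raise ValueError("indel length must be a positive number of bases")
    if not in_cds:
        return "intergenic"
    # A length divisible by 3 keeps all downstream codons in frame.
    return "in-frame" if length % 3 == 0 else "frameshift"


def summarize(indels: list[tuple[int, bool]]) -> Counter:
    """Tally indel classes for a list of (length, in_cds) records."""
    return Counter(indel_effect(length, in_cds) for length, in_cds in indels)


# Hypothetical calls: (indel length in bp, lies within a coding sequence?)
calls = [(1, False), (3, True), (6, True), (2, False), (4, True), (9, True)]
print(summarize(calls))
```

Under strong purifying selection, the coding indels that persist would be expected to fall mostly into the in-frame class, as in this toy tally.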
The accrual of point mutations across the C. albicans genome is also non-random. Overall, base substitutions preferentially accumulate in intergenic and repetitive regions of the genome (Figure 1a) [40]. In intergenic regions and gene-poor repetitive regions, mutations are less likely to have deleterious consequences. Elevated mutation rates are also observed in genes encoding cell surface proteins that harbor repetitive regions (such as the ALS adhesins). These mutations likely have functional consequences for interactions with the host [47,48]. Analysis of mutation frequencies also reveals that synonymous mutations are better tolerated than non-synonymous mutations during short-term evolution of C. albicans strains. This reflects an important role for purifying selection in limiting the accumulation of non-synonymous mutations in the genome [40]. Over longer time frames, non-synonymous mutations were uniformly distributed across the genome, whereas the number of synonymous mutations fluctuated extensively depending on chromosomal position [21].
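Classifying a base substitution as synonymous or non-synonymous requires translating the codon before and after the change. One detail matters for C. albicans. It belongs to the CTG clade, in which the CUG (CTG) codon is translated mainly as serine rather than leucine, so a standard codon table misclassifies some changes. The sketch below builds the standard table, overrides CTG to serine, and classifies a single-base change. The function and table names are illustrative. Nonsense changes (to a stop codon) are reported separately here, although they are a subset of non-synonymous changes.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# CTG clade (including C. albicans): CTG is read as serine instead of leucine.
CTG_CLADE_CODE = {**STANDARD_CODE, "CTG": "S"}


def classify_substitution(codon: str, position: int, new_base: str,
                          code: dict[str, str] = CTG_CLADE_CODE) -> str:
    """Classify a single-base change at `position` (0, 1 or 2) of a DNA codon."""
    codon, new_base = codon.upper(), new_base.upper()
    if len(codon) != 3 or position not in (0, 1, 2):
        raise ValueError("expected a 3-base codon and a position of 0, 1 or 2")
    if codon[position] == new_base:
        raise ValueError("new base is identical to the original base")
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = code[codon], code[mutant]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"  # a subset of non-synonymous changes
    return "non-synonymous"


print(classify_substitution("CTT", 2, "C"))  # synonymous (Leu -> Leu)
print(classify_substitution("CTG", 0, "T"))  # non-synonymous (Ser -> Leu in the CTG clade)
print(classify_substitution("CTG", 0, "T", code=STANDARD_CODE))  # synonymous under the standard code
```

The last two calls show why counts of synonymous versus non-synonymous changes depend on using the correct genetic code for the species.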
Figure 1. Mutational patterns in the C. albicans genome.
(a) Mutation accumulation varies across C. albicans chromosomes, with telomeres, repetitive regions, and genes containing repeats enriched for mutations. (b) Loss of heterozygosity (LOH) and de novo SNPs are major drivers of genetic variation. Large LOH tracts are rare and arise at the ends of chromosomes, whereas short-tract LOH events are frequent and widespread across the genome. New heterozygous SNPs arise both within and adjacent to LOH tracts. CEN, centromere; HET, heterozygous; HOM, homozygous; MRS, Major Repeat Sequence.
Recent passaging of C. albicans strains in the murine gastrointestinal (GI) tract revealed selection for mutations in certain genes that can dictate the balance between commensalism and pathogenicity [49]. In particular, homozygous mutations that disrupt the function of FLO8, a transcriptional regulator that promotes filamentation, were frequently selected for in this niche [50]. Engineered FLO8 deletion mutants outcompeted wild-type strains when directly evaluated in the GI tract, yet showed reduced virulence in a systemic model of infection [49]. Several C. albicans strains isolated from patients with vulvovaginal candidiasis contained FLO8 mutations associated with either increased or decreased Flo8 activity [51]. This suggests that modulation of FLO8 function is a common mechanism of niche adaptation. Loss-of-function mutations in EFG1, another key regulator of filamentation, have also been identified in clinical isolates. Like FLO8 deletions, these mutations enhance fitness in the GI niche but decrease systemic virulence [21,52]. It therefore appears that regulators of filamentation are commonly disrupted in clinical C. albicans strains, with functional consequences for their fitness in the host.
Loss of heterozygosity
The diploid genome of C. albicans contains a relatively high density of heterozygous positions even when compared to other Candida clade species [13]. However, the extent of heterozygosity varies greatly between different clinical isolates [20–22]. Of note, C. albicans isolates collected from oak trees in the UK showed increased levels of heterozygosity compared to clinical isolates, potentially indicating exposure to distinct selective pressures [11].
LOH occurs when heterozygosity between the two chromosome homologs is lost, rendering the affected regions of the genome homozygous. Thus, an allelic configuration of AB can undergo LOH to become either AA or BB. This process can increase genetic diversity in the population and can uncover recessive alleles with potentially deleterious outcomes [53]. Indeed, haploid forms of C. albicans have poor overall fitness, presumably because recessive alleles with reduced function are unmasked. Consistent with this, haploid-derived cells still showed low fitness even after auto-diploidization [28].
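The AB → AA or BB transition can be expressed as a comparison of a site's genotype before and after evolution. The sketch below treats genotypes as unordered allele pairs (for example `("A", "G")`) and labels each site. `classify_site` is a hypothetical helper. Real data would require genotype calls of sufficient quality to distinguish true LOH from missed heterozygous calls.

```python
def classify_site(parent: tuple[str, str], evolved: tuple[str, str]) -> str:
    """Compare a diploid genotype before (parent) and after (evolved) propagation.

    Genotypes are unordered, so ("A", "G") and ("G", "A") are the same genotype.
    """
    p, e = sorted(parent), sorted(evolved)
    parent_het, evolved_het = p[0] != p[1], e[0] != e[1]
    if parent_het and not evolved_het:
        # AB -> AA or AB -> BB: one parental allele is now present on both homologs.
        if e[0] in p:
            return f"LOH to {e[0]}{e[1]}"
        return "homozygous for a new allele"
    if not parent_het and evolved_het:
        return "new heterozygous site"
    if p == e:
        return "unchanged"
    return "other change"


print(classify_site(("A", "G"), ("A", "A")))  # LOH to AA
print(classify_site(("A", "G"), ("G", "G")))  # LOH to GG
print(classify_site(("A", "A"), ("A", "T")))  # new heterozygous site
```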
In C. albicans, several studies have examined LOH events that cover large chromosomal segments or even entire chromosomes [37,40,54,55]. Large segmental LOH tracts have been attributed to mitotic crossovers or break-induced replication (BIR). They typically extend from the site of the DNA break to the end of the respective chromosome arm [21,40,53]. In line with this, LOH frequency consistently increased from the centromere towards the telomere among sequenced C. albicans genomes [22]. The frequency of large LOH events also increases in response to environmental stressors (oxidative stress, high temperature, antifungal drugs), as well as during infection of the mammalian host [37,40,56–58]. Whole-chromosome LOH can arise through chromosome loss and reduplication, although these events are relatively rare compared to segmental LOH events, at least in clinical isolates [20,21]. These LOH events can dramatically affect the genome, as they can render hundreds or even thousands of heterozygous SNPs homozygous.
In contrast, homozygosis of short regions (<5 kb) has been understudied at the whole-genome level in fungal cells, in part due to the difficulty of accurately resolving heterozygous polymorphisms using short-read sequencing. Short-tract LOH can arise via gene conversion or double crossovers. Recent microevolution studies of C. albicans indicate that such events occur at frequencies similar to those of de novo base substitutions [40]. Consequently, heterozygous positions lost through short-tract LOH are roughly offset by new heterozygous mutations, so genome heterozygosity is maintained at stable levels, at least in the absence of large-tract LOH events. Furthermore, the overwhelming majority of LOH tracts spanned small regions of the chromosome (the average tract was only 368 bp) and affected only 1–2 heterozygous positions. Short-tract LOH events were observed across the genomes of clinical isolates, although their frequency increased from centromeres to telomeres [22]. By comparison, large-scale LOH events were rare during microevolution [40,53]. Short-tract LOH events were enriched at telomeres, in repetitive regions (e.g., the Major Repeat Sequence and long terminal repeats), and in genes harboring repetitive elements (e.g., ALS genes) [40] (Figure 1b). Interestingly, LOH events were also associated with de novo mutations: regions immediately flanking LOH tracts showed increased accumulation of both SNPs and indels [40], suggestive of mutagenic DNA repair mechanisms [59–64]. These trends were not apparent in natural isolates that have diverged over longer evolutionary periods. In these isolates, heterozygous regions accumulated more mutations than homozygous regions [22], similar to other eukaryotes [40].
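Short- and long-tract LOH can be distinguished by grouping consecutive parental heterozygous sites that have become homozygous. The sketch below is a simplified, hypothetical caller, not the pipeline used in the cited studies. It takes (position, lost) records for one chromosome, merges runs of lost sites into tracts, and flags tracts under 5 kb as short, following the cutoff used in the text. True breakpoints lie somewhere between the last retained site and the first lost site, so the reported span is only a minimum length. A real analysis would also need to filter genotyping errors that can masquerade as single-site LOH.

```python
from dataclasses import dataclass

SHORT_TRACT_MAX_BP = 5_000  # "short regions (<5 kb)" as defined in the text


@dataclass
class LohTract:
    start: int    # first parental heterozygous site found homozygous (bp)
    end: int      # last consecutive such site (bp)
    n_sites: int  # number of heterozygous sites lost in the tract

    @property
    def min_length(self) -> int:
        # Breakpoints lie between flanking retained sites, so this is a lower bound.
        return self.end - self.start + 1

    @property
    def is_short(self) -> bool:
        return self.min_length < SHORT_TRACT_MAX_BP


def call_loh_tracts(sites: list[tuple[int, bool]]) -> list[LohTract]:
    """Merge runs of lost heterozygous sites on one chromosome into LOH tracts.

    `sites` holds (position, lost) pairs for every site heterozygous in the parent;
    `lost` is True when the evolved strain is homozygous at that site.
    """
    tracts = []
    current = None  # tract currently being extended, if any
    for pos, lost in sorted(sites):
        if lost and current is None:
            current = LohTract(pos, pos, 1)
        elif lost:
            current.end = pos
            current.n_sites += 1
        elif current is not None:
            tracts.append(current)
            current = None
    if current is not None:  # tract running to the last site, e.g. toward a telomere
        tracts.append(current)
    return tracts


sites = [(100, False), (420, True), (700, False), (9_000, True), (9_300, True),
         (15_000, False), (20_000, True), (60_000, True)]
for tract in call_loh_tracts(sites):
    print(tract.start, tract.end, tract.n_sites, "short" if tract.is_short else "long")
```

In this toy input, the first two tracts resemble the short, one- or two-site events described above. The last tract runs to the final site, as a terminal LOH extending toward a chromosome end would.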
LOH events can drive important phenotypic changes, including changes in clinically relevant traits. For example, LOH of mutated ERG11, TAC1 or MRR1 alleles can enhance antifungal drug resistance by altering the azole drug target or increasing drug efflux [35,65,66]. Homozygosis of a region on the right arm of chromosome (Chr) 3 altered sensitivity to the DNA-damaging agent methyl methanesulfonate (MMS) [67], and large LOH events were associated with increased fluconazole resistance during antifungal treatment [36]. More recently, a number of clinical isolates were shown to be natural EFG1 heterozygotes that can readily undergo LOH to generate efg1 null derivatives with a fitness advantage in the GI tract [68]. LOH therefore represents a key mechanism by which isolates can evolve and adapt to their environment.
Mating, Parasex and Ploidy shifts
Although most natural isolates are diploid or near-diploid [20,21,36], C. albicans can adopt a range of stable ploidy states from haploid to tetraploid [28,69]. Ploidy doubling can occur via conventional, opposite-sex mating. C. albicans a and α cells undergo efficient conjugation once two conditions are met: LOH at the heterozygous a/α mating type-like (MTL) locus, and a switch from the sterile ‘white’ phenotypic state to the mating-competent ‘opaque’ state [70]. Despite efficient mechanisms for cell-cell conjugation, no meiotic program of ploidy reduction has been identified. Instead, tetraploid cells can be induced to return to a diploid or near-diploid state via concerted chromosome loss (CCL), a non-meiotic and relatively uncoordinated process [71,72]. CCL produces genetically diverse parasexual products, in part due to the formation of multiple aneuploid forms. This in turn results in substantial phenotypic heterogeneity among parasexual progeny [72,73].
Treatment of C. albicans cells with antifungal drugs can also drive a transient increase in ploidy, as exposure of diploid cells to fluconazole resulted in cellular ploidies reaching up to 16N [74]. A loss of coordinated cell division events following DNA replication was responsible for these ploidy increases and often produced multinucleate ‘trimeras’ that consisted of contiguous mother, daughter and granddaughter cells [74]. Subsequent mitotic collapse resulted in tetraploid cells with extra spindles that then generated aneuploid forms due to chromosome mis-segregation [74].
A notable characteristic of C. albicans and other fungal species is their high tolerance of different aneuploid forms, which can have important phenotypic consequences. Aneuploidy has been observed during laboratory passaging of C. albicans and is promoted by azoles, certain sugars (e.g., L-sorbose), or low nitrogen [27,30,31,69,75]. While aneuploidy in eukaryotes is commonly associated with fitness defects, aneuploid forms of C. albicans can be advantageous under specific conditions [21,76]. In particular, certain supernumerary chromosomes, such as an isochromosome of the left arm of Chr5, can confer resistance to azole drugs [29].
In one sequencing analysis, ~38% (8/21) of clinical isolates originating from a range of host niches contained supernumerary chromosomes [21]. Chr4 and Chr6 were commonly trisomic in these isolates, suggesting that smaller chromosomes may be retained more frequently because they impose lower fitness costs than larger chromosomes. In contrast, recent sequencing of 182 clinical isolates identified only 18 aneuploid strains (~10%) [20]. This discrepancy may reflect differences in antifungal drug exposure between the two studies. Drug exposure was unknown for isolates in the first study, whereas it is unlikely for many of the 182 isolates, which included numerous commensal isolates. Indeed, aneuploid frequencies increased over time among C. albicans isolates recovered from patients during azole treatment, further supporting a link between drug exposure and aneuploidy [36]. Certain aneuploid forms can increase expression of the azole drug target ERG11 [29,77], genes in the ergosterol biosynthetic pathway [36], or drug efflux pumps [29,77]. Supernumerary chromosomes can also emerge following passage in the mammalian host [38,39]. Chr6 trisomies were often observed in isolates recovered from the oral cavity [38]. Chr7 trisomies emerged during colonization of the murine GI tract, where they increased fitness in this niche [40]. Thus, aneuploidy can increase genetic variation and fuel phenotypic change. These changes are reversible via endoreplication or loss of the supernumerary chromosome.
Recombination in clinical isolates
While efficient mating and the parasexual cycle are observed under laboratory conditions, the extent of genetic exchange between C. albicans strains in nature has long been debated [14,15,78,79]. Genome sequencing of clinical C. albicans isolates is consistent with a largely clonal population structure yet, critically, also reveals compelling evidence for recombination between strains. Early evidence for mating in nature relied on MLST analysis of haploid mitochondrial genomes, some of which encoded allelic variants found in separate C. albicans lineages [78,79]. Recent genome analysis of 182 isolates clearly identified two clades that harbor genetic signatures from two or more other clades, indicative of recombination among distinct C. albicans lineages [20]. Investigation of a second strain set also revealed examples of admixed nuclear and mitochondrial genomes composed of several distinct clades [22]. The number of contributing clades and the length of introgressed tracts in a recombinant genome serve as indicators of the evolutionary age of the mating event. Recently mated isolates will display large contiguous sequence blocks whose genotypes match those of other clades (Figure 2). Over time, these strains can accumulate new mutations and diverge from their parental genotypes, establishing distinct clades but obscuring their evolutionary history. Current genomic studies suggest that both relatively recent and more ancient recombination events have shaped the current C. albicans population structure.
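The idea that recently mated isolates carry long contiguous blocks matching other clades can be illustrated by collapsing per-SNP ancestry assignments into blocks. The sketch below assumes that each SNP on a chromosome has already been assigned to its best-matching clade. This is a hypothetical input; assigning ancestry is itself a substantial analysis. The code reports each run of identical labels with its span. Under the logic in the text, long blocks would point to a more recent mating event, whereas shorter, fragmented blocks would be more consistent with an older one.

```python
from itertools import groupby
from typing import NamedTuple


class Block(NamedTuple):
    clade: str
    start: int   # position of the first SNP in the block (bp)
    end: int     # position of the last SNP in the block (bp)
    n_snps: int

    @property
    def span(self) -> int:
        return self.end - self.start + 1


def ancestry_blocks(calls: list[tuple[int, str]]) -> list[Block]:
    """Collapse (position, clade) calls on one chromosome into contiguous blocks."""
    ordered = sorted(calls, key=lambda call: call[0])
    blocks = []
    # groupby only merges *consecutive* calls with the same label, which is what defines a block.
    for clade, group in groupby(ordered, key=lambda call: call[1]):
        members = list(group)
        blocks.append(Block(clade, members[0][0], members[-1][0], len(members)))
    return blocks


# Hypothetical per-SNP assignments for one chromosome of a putative recombinant isolate.
calls = [(1_000, "clade 1"), (5_000, "clade 1"), (12_000, "clade 1"),
         (15_000, "clade 4"), (18_000, "clade 4"), (30_000, "clade 1")]
for block in ancestry_blocks(calls):
    print(block.clade, block.start, block.end, block.n_snps, block.span)
```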
Figure 2. Parasex contributes to the C. albicans population structure.
Mating between strains from genetically distinct C. albicans clades leads to the production of highly recombinant progeny containing alleles from both parental strains. Over time, the recombinant progeny may cluster closer to one of the parental clades due to maintenance of inherited beneficial mutations that provide a fitness advantage. Alternatively, progeny can follow an independent evolutionary trajectory that includes the acquisition of novel polymorphisms. Such a strain will appear as a new clade or as a distantly related member of an existing clade. Continued mitotic evolution of this strain will lead to a cluster of genetically related but distinct strains within the new clade.
The heterozygous diploid genome of C. albicans supports a diverse array of mutational processes that contribute to genome plasticity. Recent studies have shed light on relatively underappreciated aspects of C. albicans genome evolution. These include short-tract LOH events, which accumulate in the genome at frequencies similar to those of point mutations. In addition, mating and parasex contribute to C. albicans evolution through the generation of novel genome configurations and lineages. Future studies are needed to address how different selective pressures in the host (or possibly the broader environment) affect genome evolution. Such studies should also examine how changes in the genome alter key phenotypic properties that enable C. albicans to infect and adapt to diverse niches in the mammalian host.
C. albicans isolates contain heterozygous diploid genomes that display extensive plasticity.
Genome evolution occurs via a variety of mechanisms including de novo mutation and short-tract gene conversion.
Population structure is primarily shaped by clonal evolution although there is now compelling evidence for (para)sexual recombination in nature.
Aneuploid cells frequently arise in vivo with complex and poorly understood contributions to organismal success.
